J. L. Selden
Chapter 1. Multivariable Calculus
In this chapter, we make the step from single-variable calculus to multivariable calculus by introducing the notions of (scalar) functions of more than one variable, partial differentiation, multiple integrals and vector functions.
[Figure 1.1: the graph of f(x) = x², the standard parabola, plotted for −5 ≤ x ≤ 5.]
3. $h : \mathbb{R}^3 \to \mathbb{R}$, where $h(x, y, z) = x^2 + y^2 + z^2$. The names of the input variables are immaterial: for any triple $(a, b, c)$ we have $h(a, b, c) = a^2 + b^2 + c^2$.
Definition 1.3 For $f : \mathbb{R} \to \mathbb{R}$, we form the graph of f by taking all pairs (x, y) in the plane such that $y = f(x)$.
To make things concrete, let's take the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2$. The graph of f is all pairs (x, y) such that $y = x^2$. We know this as the standard parabola in the plane. By plotting the graph of f in the plane, we can "see" the function, illustrated in Figure 1.1. Using geometric jargon, the graph is generally called a curve. We can use the very same trick with a function of two real variables.
Definition 1.4 For $h : \mathbb{R}^2 \to \mathbb{R}$, we form the graph of h by taking all triples (x, y, z) in space such that $z = h(x, y)$.
In this case, we can sometimes visualize the graph of h. In general, the resulting graph of
a function of two variables will be very difficult to “see” directly. The geometric jargon for
a graph of a function of two variables is a surface.
[Figure 1.2: the graph of a function of two variables, a surface plotted over −5 ≤ x, y ≤ 5.]
[Figure 1.3: the three coordinate planes x = 0, y = 0 and z = 0.]
Planes are a very important part of multivariable function theory, both for visualizing
the graphs of functions as well as for calculus (as we’ll see later). Of particular interest are
the following three planes and their parallel translations.
Definition 1.6 A coordinate plane is a plane gotten by setting one of the three coordinates,
x, y, z, equal to 0. The xy-plane is given by the equation z = 0. The xz-plane is given by
the equation y = 0. The yz-plane is given by the equation x = 0. See Figure 1.3.
Along with these three planes, we are also interested in their parallel translations, by which
we simply mean planes of the form z = c where c is a constant, and similarly for x and y.
We make use of these planes in the following way.
Definition 1.7 A cross-section of a graph is the intersection of the graph with a given
plane.
The idea behind a cross-section is quite simple. It may be difficult to visualize the graph
of a given function, but if we look at cross-sections, we get a “snapshot” of the surface. If
we look at enough of these snapshots, we can reconstruct the surface.
Example 1.8 Let’s find the cross-section of h by the plane z = 2. We do this algebraically
by observing that in order for a point (x, y, z) to be both in the plane as well as on the
graph, it must satisfy the system of equations:
\[
\begin{cases} z = x^2 + y^2, \\ z = 2. \end{cases}
\]
Therefore, we can eliminate z, which gives $x^2 + y^2 = 2$. This is the equation of a circle with radius $\sqrt{2}$. So the cross-section of the graph of h by the plane z = 2 is a circle.
To find out more about the shape of the graph of h, we can find its cross-section with
another plane, for example, x = 2. Again, we can use algebra to find an equation for the
resulting intersection (curve). We require that (x, y, z) satisfy
\[
\begin{cases} z = x^2 + y^2, \\ x = 2. \end{cases}
\]
Eliminating x gives $z = 4 + y^2$, so this cross-section is a parabola in the plane x = 2.
Cross-sections are useful for any surface, not just those that are given by functions of
two variables. For example, we could use cross-sections with a sphere, which is not the
graph of a function of two variables. However, when we are dealing with a graph of a
function of two variables, we assign extra importance to cross-sections with planes parallel
to the xy-plane, i.e., z = c for a constant c.
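The cross-section computations above are easy to check with a computer algebra system. A minimal sketch using SymPy (my choice of tool, not something the text prescribes), substituting each plane equation into $z = x^2 + y^2$:

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)

# Surface z = h(x, y) = x^2 + y^2.
h = x**2 + y**2

# Cross-section by the horizontal plane z = 2: eliminating z gives
# x^2 + y^2 = 2, a circle of radius sqrt(2). Points of the form
# sqrt(2)(cos t, sin t) all lie on it.
for t in (0, sp.pi / 3, sp.pi):
    px, py = sp.sqrt(2) * sp.cos(t), sp.sqrt(2) * sp.sin(t)
    assert sp.simplify(h.subs({x: px, y: py}) - 2) == 0

# Cross-section by the vertical plane x = 2: substituting x = 2 gives
# z = 4 + y^2, a parabola in that plane.
assert sp.expand(h.subs(x, 2)) == y**2 + 4
```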
[Figure 1.6: contour plots. Left: the circular level curves of f(x, y) = x² + y². Right: the parabolic level curves of f(x, y) = 2x + y².]
Examples 1.10 1. The contours of the function $f(x, y) = x^2 + y^2$ are all of the form $x^2 + y^2 = c$. Therefore, all the contours (or level curves) of f are circles, but with varying radii. See Figure 1.6, left plot.
2. For the function $f : \mathbb{R}^2 \to \mathbb{R}$ given by $f(x, y) = 2x + y^2$, the contours are given by the equation $2x + y^2 = c$. I leave it to you to convince yourself that these are parabolas. See Figure 1.6, right plot.
3. A practical example of contours can be found on any topographic (or contour) map.
The curves plotted on the map are level curves of the elevation function, i.e., if you
were to walk along one of the curves, you would never go up or down, but would
instead remain always at the same elevation.
So far, we have only dealt with visualizing functions of two variables. What about
functions of three variables? You might be tempted toward the idea of graphs, but a quick
look back and the pattern becomes obvious. For functions of three variables, the graphs
would require four variables. This makes the thought of graphing a function of three
variables unhelpful. However, there is nothing wrong with using an idea inspired by the
notion of level curves. Therefore, we find that our tool for visualizing functions of three
variables is via their level surfaces.
We will not often try to find the level surfaces of a given function; rather, we will usually
try to realize a given surface as a level surface for some function. For instance, consider
the following.
Example 1.12 The equation of a sphere in $\mathbb{R}^3$ is $x^2 + y^2 + z^2 = c$, where $\sqrt{c}$ is the radius.
If we consider the function f (x, y, z) = x2 + y 2 + z 2 , it pretty quickly becomes apparent that
spheres are the level surfaces of this function. Later on, we’ll be able to use this information
to our advantage.
There is a special class of level surfaces which we have already encountered, just with a different name. Instead of level surfaces, we called these surfaces the graphs of a function of two variables. Recall that if $f : \mathbb{R}^2 \to \mathbb{R}$, then the graph of f is the set of (x, y, z) in space such that $z = f(x, y)$. To see the connection, we define the function $G : \mathbb{R}^3 \to \mathbb{R}$ by $G(x, y, z) = z - f(x, y)$. For c = 0, the level surface for G is the set of all (x, y, z) such that $G(x, y, z) = z - f(x, y) = 0$, i.e., such that $z = f(x, y)$. Therefore, the graph of f is identical to the level surface $G(x, y, z) = 0$.
One could ask the question the other way around, too. In other words, if you have
a level surface, is it the graph of some function of two variables? The sphere provides a
negative answer. The level surface x2 + y 2 + z 2 = 1 cannot be represented by a function of
two variables since there would be two possible z values for each pair (x, y). However, this level surface can be realized as the union of the graphs of two different functions. (This is exactly analogous to the case of representing a circle using two functions.) If we solve for z, we have
\[
z = \pm\sqrt{1 - x^2 - y^2}.
\]
If we define $f_+(x, y) = \sqrt{1 - x^2 - y^2}$ and $f_-(x, y) = -\sqrt{1 - x^2 - y^2}$, it is clear (after a little thought, perhaps) that the union of the graphs of $f_+$ and $f_-$ is the whole sphere.
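This two-graph description of the sphere can be sanity-checked numerically. A small sketch (the helper names `f_plus` and `f_minus` are mine, mirroring $f_+$ and $f_-$ above):

```python
import math

# The two graph functions covering the unit sphere: z = ±sqrt(1 - x^2 - y^2).
def f_plus(x, y):
    return math.sqrt(1.0 - x**2 - y**2)

def f_minus(x, y):
    return -math.sqrt(1.0 - x**2 - y**2)

# Every point on either graph satisfies the sphere equation x^2 + y^2 + z^2 = 1.
for x, y in [(0.0, 0.0), (0.5, 0.5), (0.6, -0.3)]:
    for f in (f_plus, f_minus):
        z = f(x, y)
        assert abs(x**2 + y**2 + z**2 - 1.0) < 1e-12

# A single function cannot do the job: at (0.5, 0.5) the sphere contains
# two distinct z-values, one from each hemisphere.
assert f_plus(0.5, 0.5) != f_minus(0.5, 0.5)
```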
Polar Coordinates
In the plane, we have been using (x, y) to describe the location of a point. The idea
behind polar coordinates is to use two other numbers to describe the location of a point.
One number, r, is the distance from that point to the origin (0, 0), and the other number, θ, is the angle that a ray emanating from the origin and passing through that point makes with the x-axis. To make things precise, we require $r \geq 0$ and $0 \leq \theta < 2\pi$. It is possible to describe how the two coordinate systems relate to each other. If a point is described in polar coordinates $(r, \theta)$, then the corresponding Cartesian coordinates are $x = r\cos(\theta)$ and $y = r\sin(\theta)$. Note that the Pythagorean theorem tells us that $r^2 = x^2 + y^2$.
Examples 1.13 Let’s try representing functions expressed in terms of x and y in polar
coordinates.
1. $f(x, y) = x^2 + y^2$. This becomes $f(r, \theta) = r^2$. Notice that the function depended on both x and y in Cartesian coordinates, but only on r in polar coordinates. This is the whole point of using a different coordinate system: to make things easier.
2. $g(x, y) = xy$. This becomes $g(r, \theta) = r^2\cos(\theta)\sin(\theta)$. This certainly doesn't look any better in polar coordinates, so it's unlikely that it would make our lives easier.
Polar coordinates are great whenever there are circles showing up in the problem.
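The conversion is mechanical enough to script. A minimal sketch (the helper name `polar_to_cartesian` is my own) that checks the relation $r^2 = x^2 + y^2$ and the fact that $f(x, y) = x^2 + y^2$ depends only on r:

```python
import math

def polar_to_cartesian(r, theta):
    """Convert polar (r, theta) to Cartesian via x = r cos(theta), y = r sin(theta)."""
    return r * math.cos(theta), r * math.sin(theta)

# The Pythagorean relation r^2 = x^2 + y^2 holds for every point.
for r, theta in [(1.0, 0.3), (2.5, math.pi / 4), (0.7, 5.0)]:
    x, y = polar_to_cartesian(r, theta)
    assert abs(x**2 + y**2 - r**2) < 1e-12

# f(x, y) = x^2 + y^2 becomes r^2 in polar coordinates: the value is the
# same for every angle theta on a circle of fixed radius.
r = math.sqrt(2.0)
values = {round(polar_to_cartesian(r, t)[0]**2 + polar_to_cartesian(r, t)[1]**2, 9)
          for t in (0.0, 1.0, 2.0)}
assert values == {2.0}
```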
Cylindrical Coordinates
Now we move into space and the different ways of representing points there. The first alternative is known as cylindrical coordinates. Basically, we use polar coordinates to replace x and y and leave z as it was. The three cylindrical coordinates are r, θ and z, where $r \geq 0$, $0 \leq \theta < 2\pi$, and z can be any real number. The equations for representing Cartesian coordinates in terms of cylindrical coordinates are
\[
x = r\cos(\theta), \qquad y = r\sin(\theta), \qquad z = z.
\]
Example 1.14 Given the function $f(x, y, z) = x^2 + y^2 + y + z$, we can represent it in cylindrical coordinates as $f(r, \theta, z) = r^2 + r\sin(\theta) + z$.
A graphic illustration of cylindrical coordinates is given in the left plot of Figure 1.7.
Spherical Coordinates
In spherical coordinates, we use one distance coordinate and two angular coordinates. We denote them by ρ, θ and φ, where $\rho \geq 0$, $0 \leq \theta < 2\pi$ and $0 \leq \varphi \leq \pi$. The equations for representing Cartesian coordinates in terms of spherical coordinates are
\[
x = \rho\cos(\theta)\sin(\varphi), \qquad y = \rho\sin(\theta)\sin(\varphi), \qquad z = \rho\cos(\varphi).
\]
Examples 1.15 Let's try representing functions expressed in terms of x, y and z in spherical coordinates.
A graphic illustration of spherical coordinates is given in the right plot of Figure 1.7.
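Both space coordinate systems can be exercised the same way. A short sketch (helper names are mine) checking that ρ is the distance to the origin, and that Example 1.14 converts correctly:

```python
import math

def spherical_to_cartesian(rho, theta, phi):
    """x = rho cos(theta) sin(phi), y = rho sin(theta) sin(phi), z = rho cos(phi)."""
    return (rho * math.cos(theta) * math.sin(phi),
            rho * math.sin(theta) * math.sin(phi),
            rho * math.cos(phi))

def cylindrical_to_cartesian(r, theta, z):
    """x = r cos(theta), y = r sin(theta), z = z."""
    return r * math.cos(theta), r * math.sin(theta), z

# In spherical coordinates rho is the distance to the origin:
# x^2 + y^2 + z^2 = rho^2 for every (theta, phi).
for rho, theta, phi in [(1.0, 0.2, 0.9), (3.0, 2.5, math.pi / 2)]:
    x, y, z = spherical_to_cartesian(rho, theta, phi)
    assert abs(x**2 + y**2 + z**2 - rho**2) < 1e-12

# Example 1.14: f(x, y, z) = x^2 + y^2 + y + z equals r^2 + r sin(theta) + z.
r, theta, z = 1.5, 0.8, -2.0
x, y, _ = cylindrical_to_cartesian(r, theta, z)
assert abs((x**2 + y**2 + y + z) - (r**2 + r * math.sin(theta) + z)) < 1e-12
```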
x = 3. We have
\[
\lim_{\Delta x \to 0} \frac{f(3 + \Delta x, 2) - f(3, 2)}{\Delta x},
\]
which is the slope of the tangent line to the curve z = f (x, 2) at the point x = 3. Un-
fortunately, this only gives us a seemingly small amount of information about the surface,
since it applies only to a particular curve on the surface. So if we are interested only in
this curve, we have all the information we need. However, what about the behaviour near
(3, 2) but not on this curve?
Of course, we can do the same thing with the cross-section of the graph of f by the
plane x = 3. This cross-section is described by the curve z = f (3, y), which lies on the
graph and also passes through the point (3, 2). Again, we can use single-variable calculus
to find the slope of the tangent line to this curve at the point y = 2. It is equal to
\[
\lim_{\Delta y \to 0} \frac{f(3, 2 + \Delta y) - f(3, 2)}{\Delta y}.
\]
All right, so we have information about the behaviour of f along a different curve, but does
this really help us understand the behaviour of f as a function of two variables? YES! As
it turns out, these two slopes are all we need to find in order to know the local behaviour
of f as a function of two variables. We’ll talk more about this later. For now, we formalize
these notions and set notation.
The key to partial di↵erentiation is only allowing one of the variables to change while
holding the rest constant. As a result, we’ll be able to use almost all of the rules and tricks
from single variable calculus.
\[
\frac{\partial f}{\partial y}(x_0, y_0) := \lim_{\Delta y \to 0} \frac{f(x_0, y_0 + \Delta y) - f(x_0, y_0)}{\Delta y}.
\]
Take note of the fact that if these limits exist, then the result is a number that depends on
the point (x0 , y0 ). In the standard way, we define a function using the above definitions.
If the limits exist at each point in R2 , then we can define the partial derivative functions
\[
\frac{\partial f}{\partial x} : \mathbb{R}^2 \to \mathbb{R} \qquad \text{and} \qquad \frac{\partial f}{\partial y} : \mathbb{R}^2 \to \mathbb{R}.
\]
\[
\frac{\partial f}{\partial x}(x, y, z) := \lim_{\Delta x \to 0} \frac{f(x + \Delta x, y, z) - f(x, y, z)}{\Delta x}.
\]
The partial derivative of f with respect to y is
\[
\frac{\partial f}{\partial y}(x, y, z) := \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y, z) - f(x, y, z)}{\Delta y}.
\]
The partial derivative of f with respect to z is
\[
\frac{\partial f}{\partial z}(x, y, z) := \lim_{\Delta z \to 0} \frac{f(x, y, z + \Delta z) - f(x, y, z)}{\Delta z}.
\]
Provided the limits exist for all points in space, each of these is a function from R3 to R.
Before we concern ourselves any more with the meaning of the partial derivative and
what it tells us about the function, let’s get some practice computing them explicitly.
\[
\frac{\partial f}{\partial x}(1, 4) := \lim_{\Delta x \to 0} \frac{f(1 + \Delta x, 4) - f(1, 4)}{\Delta x}
= \lim_{\Delta x \to 0} \frac{(1 + \Delta x)^2 + 16 - (1 + 16)}{\Delta x}
= \lim_{\Delta x \to 0} \frac{1 + 2\Delta x + \Delta x^2 + 16 - 17}{\Delta x}
= \lim_{\Delta x \to 0} (2 + \Delta x) = 2.
\]
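The limit above can be approximated on a computer with a finite difference quotient; for $f(x, y) = x^2 + y^2$ the quotient equals $2 + \Delta x$ exactly, so the error shrinks linearly as $\Delta x \to 0$. A sketch (helper names are mine):

```python
def f(x, y):
    return x**2 + y**2

def partial_x(f, x0, y0, dx=1e-6):
    """Forward difference quotient (f(x0 + dx, y0) - f(x0, y0)) / dx."""
    return (f(x0 + dx, y0) - f(x0, y0)) / dx

# The limit computed by hand above is exactly 2.
approx = partial_x(f, 1.0, 4.0)
assert abs(approx - 2.0) < 1e-5

# Shrinking dx improves the approximation (the error is exactly dx here,
# up to floating-point rounding).
assert abs(partial_x(f, 1.0, 4.0, dx=1e-3) - 2.0) > abs(approx - 2.0)
```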
2. In single variable calculus, we were able to get the derivative function directly using
rules, e.g., the product rule, the power rule, the quotient rule. Can we do something
similar with partial derivatives? YES. BUT we have to be careful since there are too
many variables floating around to be sloppy. Let’s find the partial derivative function
of f (x, y) = x2 + y 2 with respect to x. To do so, we treat y AS A CONSTANT, and
then use the rules with x. We have
\[
\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}(x^2 + y^2) = 2x,
\]
Since y is constant with respect to x, $\partial y/\partial x = 0$. Note that it is now easy to find the value of the partial derivative of f with respect to x at the point (1, 4).
If you are having any trouble with thinking of y as a constant here, just consider the function $f(x) = x^2 + \pi^2$. What is $f'$? It's just 2x. So treat y just like you treated π.
4. This is a good one for emphasizing the particulars of partial differentiation. Let $h : \mathbb{R}^3 \to \mathbb{R}$ be defined as $h(x, y, z) = x^y + y^z + z^x$. Find the three partial derivatives of h.
\[
\frac{\partial h}{\partial x} = \frac{\partial}{\partial x}(x^y + y^z + z^x) = yx^{y-1} + \ln(z)\,z^x,
\]
\[
\frac{\partial h}{\partial y} = \frac{\partial}{\partial y}(x^y + y^z + z^x) = \ln(x)\,x^y + zy^{z-1},
\]
\[
\frac{\partial h}{\partial z} = \frac{\partial}{\partial z}(x^y + y^z + z^x) = \ln(y)\,y^z + xz^{x-1}.
\]
Just remember that as much as you would like to think of y and z as variables, when
you are taking a partial derivative with respect to x, they are nothing but constants
and should be treated as such.
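A computer algebra system applies exactly the same rule, holding the other variables fixed, so it can confirm the three partial derivatives of h. A sketch using SymPy (my choice of tool):

```python
import sympy as sp

x, y, z = sp.symbols("x y z", positive=True)
h = x**y + y**z + z**x

# Differentiating with respect to one variable treats the others as constants,
# exactly as in the text.
assert sp.simplify(sp.diff(h, x) - (y * x**(y - 1) + sp.log(z) * z**x)) == 0
assert sp.simplify(sp.diff(h, y) - (sp.log(x) * x**y + z * y**(z - 1))) == 0
assert sp.simplify(sp.diff(h, z) - (sp.log(y) * y**z + x * z**(x - 1))) == 0
```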
Since the result of partial differentiation with respect to a given variable is again a function on the same domain, we can consider its partial derivative with respect to any of the variables. For example, if $f(x, y) = 2xy^2 + y^7$, then
\[
\frac{\partial f}{\partial y} = 4xy + 7y^6.
\]
We can then take the partial derivative of this function with respect to x, which gives
\[
\frac{\partial}{\partial x}(4xy + 7y^6) = 4y,
\]
or, with respect to y,
\[
\frac{\partial}{\partial y}(4xy + 7y^6) = 4x + 42y^5.
\]
These are examples of higher order partial derivatives. The difference between these and higher order ordinary derivatives is the fact that we have a choice of which variables to differentiate with respect to. The possibilities for second order partial derivatives are as follows:
\[
\frac{\partial^2 f}{\partial x^2} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right), \qquad
\frac{\partial^2 f}{\partial y^2} := \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right), \qquad
\frac{\partial^2 f}{\partial x \partial y} := \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right), \qquad
\frac{\partial^2 f}{\partial y \partial x} := \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right).
\]
The last two are called mixed partial derivatives. For us, it is safe to assume that these
last two partial derivatives will be equal. However, that is an assumption and one should
be aware that it is not always true. The following theorem tells us when the mixed partial
derivatives are equal. We present it without the slightest bit of proof.
Theorem 1.19 If $\dfrac{\partial^2 f}{\partial x \partial y}$ and $\dfrac{\partial^2 f}{\partial y \partial x}$ are continuous functions on all of $\mathbb{R}^2$, then
\[
\frac{\partial^2 f}{\partial x \partial y}(x, y) = \frac{\partial^2 f}{\partial y \partial x}(x, y).
\]
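Theorem 1.19 is easy to spot-check for a particular smooth function. A sketch using SymPy, with a function of my own choosing (not one from the text):

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
f = sp.sin(x * y) + x**3 * y**2   # smooth everywhere, so Theorem 1.19 applies

fxy = sp.diff(f, y, x)  # first d/dy, then d/dx
fyx = sp.diff(f, x, y)  # first d/dx, then d/dy
assert sp.simplify(fxy - fyx) == 0
```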
The graph of a function of two variables is a surface in space. The linear function
which models surfaces (the way a line models a curve) is a plane. Therefore, one of the
purposes of multivariable calculus is to find “the best linear approximation” of the surface.
We call this ideal plane a tangent plane. To determine the tangent plane to the graph of
a function $f : \mathbb{R}^2 \to \mathbb{R}$ at a point $(x_0, y_0)$, we must identify it amongst all the other planes
which pass through the point in space given by $(x_0, y_0, f(x_0, y_0))$. You will recall that a plane is uniquely determined by two distinct intersecting lines. The tangent lines we obtain through partial differentiation provide two such lines for the tangent plane. That is the reason why
we only need to use two distinct cross-sections to know the local behaviour of a graph.
We choose the planes x = x0 and y = y0 because they are easier for computations. The
resulting equation for the tangent plane at the point (x0 , y0 , f (x0 , y0 )) is
\[
z = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0).
\]
The right-hand side of this equation is the first three terms in the (multivariable) Taylor
series expansion of the function f . When (x, y) is near (x0 , y0 ), we can use the tangent
plane approximation with a certain degree of confidence, i.e., for (x, y) near (x0 , y0 ), we
have
\[
f(x, y) \approx f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0).
\]
A graphic illustration of the tangent plane is given in Figure 1.8.
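As a quick numerical illustration of the approximation, here is a sketch reusing $f(x, y) = x^2 + y^2$ with its partial derivatives $2x$ and $2y$ hard-coded (the helper names are mine):

```python
def f(x, y):
    return x**2 + y**2

def tangent_plane(x, y, x0, y0):
    """z = f(x0, y0) + fx(x0, y0)(x - x0) + fy(x0, y0)(y - y0), with fx = 2x, fy = 2y."""
    return f(x0, y0) + 2 * x0 * (x - x0) + 2 * y0 * (y - y0)

x0, y0 = 1.0, 4.0
# Near (x0, y0) the plane approximates the surface well...
near = abs(f(1.01, 4.02) - tangent_plane(1.01, 4.02, x0, y0))
# ...and the error grows as we move away from (x0, y0).
far = abs(f(2.0, 5.0) - tangent_plane(2.0, 5.0, x0, y0))
assert near < 1e-3
assert far > near
```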
This brings us to a related topic concerning the local behaviour of a function. Our
cross-sections (and resulting partial derivatives) have told us about the rate of change of
f in the x direction and in the y direction, but what about in other directions? We could
use the tangent plane again, but that is a bit too general for what we’re after. Instead,
we take this opportunity to introduce vectors into the game. Directions in space can be
determined using vectors. So that the magnitude of the vector does not affect the result
(since we’re only interested in the direction), we consider only unit vectors, i.e., vectors of
length 1.
Let’s consider a function f : R2 ! R, the point (x0 , y0 ), and a vector ai + bj. We want
to know how f changes at (x0 , y0 ) in the direction of the unit vector ai + bj. It sounds
rough, but in fact, it will be very easy. Let $\Delta t$ be a very small positive number. Starting at the point $(x_0, y_0)$, we go a short distance in the direction of $a\mathbf{i} + b\mathbf{j}$, i.e., to the point whose coordinates are $(x_0 + a\,\Delta t, y_0 + b\,\Delta t)$. We can measure how the function f differs between these two points, i.e.,
\[
f(x_0 + a\,\Delta t, y_0 + b\,\Delta t) - f(x_0, y_0).
\]
Since $\Delta t$ is the only thing that is changing here, in essence, we are in a single variable situation and can therefore consider the difference quotient
\[
\lim_{\Delta t \to 0} \frac{f(x_0 + a\,\Delta t, y_0 + b\,\Delta t) - f(x_0, y_0)}{\Delta t}.
\]
Using the tangent plane approximation from above, we can write
\[
f(x_0 + a\,\Delta t, y_0 + b\,\Delta t) \approx f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(a\,\Delta t) + \frac{\partial f}{\partial y}(x_0, y_0)(b\,\Delta t).
\]
This allows us to rewrite the limit as
\[
\lim_{\Delta t \to 0} \frac{\frac{\partial f}{\partial x}(x_0, y_0)(a\,\Delta t) + \frac{\partial f}{\partial y}(x_0, y_0)(b\,\Delta t)}{\Delta t}
= a\,\frac{\partial f}{\partial x}(x_0, y_0) + b\,\frac{\partial f}{\partial y}(x_0, y_0).
\]
This expression is called the directional derivative of f at $(x_0, y_0)$ in the direction of $a\mathbf{i} + b\mathbf{j}$. An illustration is given in Figure 1.9. If we denote the vector $a\mathbf{i} + b\mathbf{j}$ by $\mathbf{u}$, this directional derivative is sometimes written as
\[
\frac{\partial f}{\partial \mathbf{u}}(x_0, y_0).
\]
However, the important thing to notice here is that the directional derivative can be determined by the vector $\mathbf{u}$ and the partial derivatives with respect to x and y. If we consider
the vector given by
@f @f
(x0 , y0 )i + (x0 , y0 )j,
@x @y
then ✓ ◆
@f @f @f
(x0 , y0 ) = (x0 , y0 )i + (x0 , y0 )j · u,
@u @x @y
where we have taken the dot product of the two vectors on the right-hand side. All of this
can be generalized to three variables and so we make the following definitions.
Definition 1.20 Let $f : \mathbb{R}^3 \to \mathbb{R}$. The gradient of f, denoted by $\vec{\nabla} f$, at the point $(x_0, y_0, z_0)$ is the vector given by
\[
\vec{\nabla} f(x_0, y_0, z_0) := \frac{\partial f}{\partial x}(x_0, y_0, z_0)\,\mathbf{i} + \frac{\partial f}{\partial y}(x_0, y_0, z_0)\,\mathbf{j} + \frac{\partial f}{\partial z}(x_0, y_0, z_0)\,\mathbf{k}.
\]
Definition 1.21 Let $f : \mathbb{R}^3 \to \mathbb{R}$. For a given point $(x_0, y_0, z_0)$ and a given unit vector $\mathbf{u} = a\mathbf{i} + b\mathbf{j} + c\mathbf{k}$, the directional derivative of f at $(x_0, y_0, z_0)$ in the direction of $\mathbf{u}$ is
\[
\frac{\partial f}{\partial \mathbf{u}}(x_0, y_0, z_0) = \vec{\nabla} f(x_0, y_0, z_0) \cdot \mathbf{u}.
\]
\[
\frac{\partial f}{\partial x} = \frac{\partial}{\partial x}(x^2 + y^2 + z^2) = 2x, \qquad
\frac{\partial f}{\partial y} = \frac{\partial}{\partial y}(x^2 + y^2 + z^2) = 2y, \qquad
\frac{\partial f}{\partial z} = \frac{\partial}{\partial z}(x^2 + y^2 + z^2) = 2z.
\]
At the point (1, 2, 3), the partial derivatives have the values
\[
\frac{\partial f}{\partial x}(1, 2, 3) = 2, \qquad \frac{\partial f}{\partial y}(1, 2, 3) = 4, \qquad \frac{\partial f}{\partial z}(1, 2, 3) = 6.
\]
So the gradient of f at (1, 2, 3) is given by the vector
\[
\vec{\nabla} f(1, 2, 3) = 2\mathbf{i} + 4\mathbf{j} + 6\mathbf{k}.
\]
The fact that the directional derivative of f at the point (1, 2, 3) in the direction of u
is equal to 0 tells us something interesting about the function f . Roughly speaking, it
says that f does not change near the point (1, 2, 3) if we move in the direction of u.
To see that this is not the case in every direction, let's find the directional derivative of f at the point (1, 2, 3) in the direction of the unit vector $\mathbf{v} = (2\mathbf{i} + 3\mathbf{j} + \mathbf{k})/\sqrt{14}$. Conveniently, we have already found the gradient of f at (1, 2, 3), so this computation is quicker. We get
\[
\frac{\partial f}{\partial \mathbf{v}}(1, 2, 3) = \vec{\nabla} f(1, 2, 3) \cdot \mathbf{v}
= (2\mathbf{i} + 4\mathbf{j} + 6\mathbf{k}) \cdot (2\mathbf{i} + 3\mathbf{j} + \mathbf{k})/\sqrt{14}
= (4 + 12 + 6)/\sqrt{14} = 22/\sqrt{14}.
\]
\[
f(x + a, y + b) \approx f(x, y) + \frac{\partial f}{\partial x}(x, y)\,a + \frac{\partial f}{\partial y}(x, y)\,b.
\]
This is the chain rule for this situation. Let’s put everything down in the form of a theorem.
\[
\frac{dw}{dt}(t) = \frac{\partial f}{\partial x}(x(t), y(t))\,\frac{dx}{dt}(t) + \frac{\partial f}{\partial y}(x(t), y(t))\,\frac{dy}{dt}(t).
\]
\[
\frac{dw}{dt} = \frac{\partial w}{\partial x}\frac{dx}{dt} + \frac{\partial w}{\partial y}\frac{dy}{dt}.
\]
The problem I have with this formula is the ambiguity with which w is treated. Really, w
does not depend upon x and y as variables, only upon t as a variable. So taking a partial
derivative of w with respect to x is unnatural. However, some people find it handy for
remembering the chain rule. If you remember how to interpret the terms correctly, then
you won’t run in to any problems here and the two formulas say exactly the same thing.
In the second, the term $\partial w/\partial x$ is to be understood as follows: treat w as a function of
x and y and find its partial derivative with respect to the “variable” x. This is nothing
other than the partial derivative of f with respect to x where we have to remember to let
$\partial f/\partial x$ act on the pair (x(t), y(t)) (look back at the single-variable chain rule and you'll see
a similar thing!).
Examples 1.25 Let’s do some examples in which we can use the chain rule.
1. Let $f(x, y) = x\sin(y)$ and suppose that x and y in turn depend upon t as follows: $x(t) = t^2$ and $y(t) = 2t + 1$. As before, we define the composition $w(t) := f(x(t), y(t))$.
If we want to find the derivative of w with respect to t, the chain rule tells us that
\[
\frac{dw}{dt}(t) = \frac{\partial f}{\partial x}(x(t), y(t))\,\frac{dx}{dt}(t) + \frac{\partial f}{\partial y}(x(t), y(t))\,\frac{dy}{dt}(t).
\]
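We can let a computer algebra system confirm that the chain rule agrees with substituting first and then differentiating. A sketch of this example using SymPy (my choice of tool):

```python
import sympy as sp

t = sp.symbols("t", real=True)
x_sym, y_sym = sp.symbols("x y", real=True)

f = x_sym * sp.sin(y_sym)
x_t = t**2
y_t = 2 * t + 1

# Chain rule: dw/dt = f_x(x(t), y(t)) x'(t) + f_y(x(t), y(t)) y'(t).
fx = sp.diff(f, x_sym).subs({x_sym: x_t, y_sym: y_t})
fy = sp.diff(f, y_sym).subs({x_sym: x_t, y_sym: y_t})
chain = fx * sp.diff(x_t, t) + fy * sp.diff(y_t, t)

# Direct route: substitute first, then differentiate w(t) = f(x(t), y(t)).
direct = sp.diff(f.subs({x_sym: x_t, y_sym: y_t}), t)

assert sp.simplify(chain - direct) == 0
```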
We don’t want to have to rely upon geometric intuition to find normal vectors, so we
need to develop some theory. In fact, it is not difficult to explicitly construct the unit
normal vector to a surface at every point. We already have all of the tools needed to do
so.
I have suppressed the dependence upon the point (x, y) to make things look nicer. The unit normal vector is therefore given by
\[
\mathbf{n} = \frac{\mathbf{u} \times \mathbf{v}}{|\mathbf{u} \times \mathbf{v}|}
= \frac{-\frac{\partial f}{\partial x}\,\mathbf{i} - \frac{\partial f}{\partial y}\,\mathbf{j} + \mathbf{k}}{\sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 + 1}}.
\]
It may be good to stress that this vector depends on the coordinates x and y, which makes
it a vector function. Let’s check this formula against the hemisphere, since we know what
the answer should be.
Example 1.28 We have $f : \mathbb{R}^2 \to \mathbb{R}$ defined by $f(x, y) := \sqrt{4 - x^2 - y^2}$ and we want to find a formula for the unit normal vector n(x, y). The formula requires the partial derivatives of f, so we compute them:
\[
\frac{\partial f}{\partial x}(x, y) = \frac{1}{2}(4 - x^2 - y^2)^{-1/2}(-2x) = \frac{-x}{\sqrt{4 - x^2 - y^2}},
\]
\[
\frac{\partial f}{\partial y}(x, y) = \frac{1}{2}(4 - x^2 - y^2)^{-1/2}(-2y) = \frac{-y}{\sqrt{4 - x^2 - y^2}}.
\]
Therefore,
\[
\mathbf{n}(x, y) = \frac{\left(\dfrac{x}{\sqrt{4 - x^2 - y^2}}\right)\mathbf{i} + \left(\dfrac{y}{\sqrt{4 - x^2 - y^2}}\right)\mathbf{j} + \mathbf{k}}{\sqrt{\left(\dfrac{-x}{\sqrt{4 - x^2 - y^2}}\right)^2 + \left(\dfrac{-y}{\sqrt{4 - x^2 - y^2}}\right)^2 + 1}}
= \frac{x\,\mathbf{i} + y\,\mathbf{j} + \sqrt{4 - x^2 - y^2}\,\mathbf{k}}{\sqrt{x^2 + y^2 + 4 - x^2 - y^2}}
= \frac{1}{2}(x\,\mathbf{i} + y\,\mathbf{j} + z\,\mathbf{k}) = \frac{1}{2}\mathbf{r},
\]
where we have used the fact that on the graph of f, we know that $z = \sqrt{4 - x^2 - y^2}$. This is exactly what we expected to get.
As a check for your understanding, find the unit normal vector for the xy-plane, i.e.,
the plane z = f (x, y) where f (x, y) := 0.
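The normal-vector formula can be checked numerically against Example 1.28. A sketch (the helper name `unit_normal` is mine) that also covers the xy-plane case just posed:

```python
import math

def unit_normal(fx, fy):
    """n = (-fx i - fy j + k) / sqrt(fx^2 + fy^2 + 1) for a graph z = f(x, y)."""
    mag = math.sqrt(fx**2 + fy**2 + 1.0)
    return (-fx / mag, -fy / mag, 1.0 / mag)

# Hemisphere f(x, y) = sqrt(4 - x^2 - y^2): fx = -x/z and fy = -y/z with z = f(x, y).
x, y = 0.6, 1.0
z = math.sqrt(4.0 - x**2 - y**2)
n = unit_normal(-x / z, -y / z)

# Example 1.28 predicts n = (x, y, z)/2, i.e. r/2 on the sphere of radius 2.
expected = (x / 2.0, y / 2.0, z / 2.0)
assert all(abs(a - b) < 1e-12 for a, b in zip(n, expected))

# For the xy-plane, f = 0: the unit normal is simply k.
assert unit_normal(0.0, 0.0) == (0.0, 0.0, 1.0)
```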
The directional derivative of f gives us information about how the function f changes in
the direction of v. So the gradient of f holds all of the information about how the function
f changes.
Example 1.29 Let’s consider a concrete example.
Let f (x, y, z) = x2 + y 2 + z 2 . We can easily compute the gradient of f :
\[
\vec{\nabla} f = 2x\,\mathbf{i} + 2y\,\mathbf{j} + 2z\,\mathbf{k} = 2\mathbf{r}.
\]
So the gradient of f is equal to twice the radial vector r. To understand the geometric
significance of this, let’s consider a level surface of f . Recall that a level surface of f is the
set of points (x, y, z) in space which satisfy the equation
x2 + y 2 + z 2 = c,
where c is a constant. We recognize this surface as the sphere of radius $\sqrt{c}$ centered at the
origin. As we have seen before, the unit outward normal to the sphere is r/|r|. So the
gradient of f points in the same direction as the unit normal to any level surface of f .
Is this just a coincidence, or a special case? The answer is no. This is another general
property of the gradient, but let’s look at a class of surfaces for which we already have an
explicit formula for the normal: graphs.
Consider the graph of a function f(x, y). Recall that we can think of graphs as level surfaces in the following way. The graph is given by all points (x, y, z) in space which satisfy $z = f(x, y)$. If we define the function $G(x, y, z) = z - f(x, y)$, then the graph of f is the same as the level surface $G(x, y, z) = 0$. We already have a formula for the unit normal for
the graph of f. It is
\[
\mathbf{n} = \frac{-\frac{\partial f}{\partial x}\,\mathbf{i} - \frac{\partial f}{\partial y}\,\mathbf{j} + \mathbf{k}}{\sqrt{\left(\frac{\partial f}{\partial x}\right)^2 + \left(\frac{\partial f}{\partial y}\right)^2 + 1}}.
\]
How is this connected with a gradient? By analogy with the previous example, we should
expect the gradient of the function G (not the gradient of f !) to point in the same direction
as n. The gradient of G is
\[
\vec{\nabla} G = \frac{\partial}{\partial x}(z - f(x, y))\,\mathbf{i} + \frac{\partial}{\partial y}(z - f(x, y))\,\mathbf{j} + \frac{\partial}{\partial z}(z - f(x, y))\,\mathbf{k}
= -\frac{\partial f}{\partial x}\,\mathbf{i} - \frac{\partial f}{\partial y}\,\mathbf{j} + \mathbf{k}.
\]
This vector field clearly points in the same direction as n at every point. This means that
the gradient of G gives us the normal direction to the level surfaces of G. So we could, in
fact, define the unit normal to a level surface of G as¹
\[
\mathbf{n} := \frac{\vec{\nabla} G}{|\vec{\nabla} G|}.
\]
This definition works for general level surfaces, rather than just for surfaces given as graphs.
¹Notice that this gives the same normal vector field as before in the case of graphs.
To gain some more insight into the gradient and level surfaces, let’s connect these two
notions for the gradient, i.e., the gradient in its role for finding directional derivatives and
in its role as the normal direction to a level surface. Let $G : \mathbb{R}^3 \to \mathbb{R}$ and let's fix a point
(x0 , y0 , z0 ) in space. Then G(x0 , y0 , z0 ) = c for some constant c, which means that the point
(x0 , y0 , z0 ) lies on the level surface G(x, y, z) = c. The unit normal to this level surface at
the point (x0 , y0 , z0 ) is given by the vector
\[
\frac{\vec{\nabla} G(x_0, y_0, z_0)}{|\vec{\nabla} G(x_0, y_0, z_0)|}.
\]
Recall that the directional derivative of G at (x0 , y0 , z0 ) in the direction of a unit vector v
is equal to
\[
\vec{\nabla} G(x_0, y_0, z_0) \cdot \mathbf{v}.
\]
If v is any unit vector which is tangent to the level surface G = c, then the directional
derivative of G in the direction of v is equal to 0 (since $\vec{\nabla} G$ is normal to this surface).
This makes sense for another reason. A level surface of G is the set of points for which the
function G has the same value, i.e., where the function doesn’t change. This is precisely
what a directional derivative of 0 means.
By contrast, the dot product of $\vec{\nabla} G$ and v is maximal if v points in the same direction as the gradient of G. This maximizes the directional derivative of G and means that the direction of greatest change of G is given by the gradient of G.
A simple practical example might be helpful. Consider a topographic map describing a
hill. The curves on the map give the contours, which are precisely the level curves of the
height function. The gradient of the height function is normal to these level curves and
represents the direction of greatest change of the height function. Moving in tangential
directions to the level curves means we don’t change our height and therefore the directional
derivative is 0. However, if we want to change our altitude in the fastest way possible, we
would choose to follow the gradient of the height function.
Example 1.30 Let’s examine an example with these new insights. Let f : R3 ! R be
given by f (x, y, z) := x2 + y 2 + z 1. First, we compute the gradient of f :
~ = 2xi + 2yj + k.
rf
Consider the point (1, 2, 1). Since f (1, 2, 1) = 12 + 22 + 1 1 = 5, the point (1, 2, 1) lies
on the level surface of f described by f (x, y, z) := x2 + y 2 + z 1 = 3. It follows that the
vector
~ (1, 2, 1) = 2i + 4j + k
rf
is normal to the level surface of f at (1, 2, 1). It’s not a unit vector, but we could simply
divide it by its magnitude
p to fix that
p problem. This also gives us the direction of greatest
increase of f and 22 + 42 + 12 = 21 is the maximum rate of change of f at (1, 2, 1).
\[
f(x) - f(y) = f'(\eta)(x - y), \quad \text{where } \eta \text{ is some value between } x \text{ and } y. \tag{1.5.1}
\]
We immediately see that h(0) = f (y) and h(1) = f (x). Since h is a scalar function we can
apply (1.5.1) to obtain
\[
h'(t) = \vec{\nabla} f(y + t(x - y)) \cdot (x - y) \tag{1.5.2}
\]
and therefore
\[
f(x) - f(y) = \vec{\nabla} f(y + \eta(x - y)) \cdot (x - y). \tag{1.5.3}
\]
This is the equivalent of the mean value theorem for a function of several variables; observe that (1.5.3) is indeed similar to (1.5.1), but the derivative is replaced by the directional derivative.
We immediately see that h(0) = f (y) and h(1) = f (x). Since h is a scalar function we can
apply (1.5.4) to h(t), which gives
\[
h(1) = h(0) + h'(0) + \frac{1}{2}h''(\eta), \quad \text{with } \eta \in [0, 1].
\]
Recalling (1.5.2) we see that we only need to evaluate $h''(\eta)$. In order to do this we develop the expression in (1.5.2) and differentiate it term by term. We use the notation $x = x_1\mathbf{i} + x_2\mathbf{j} + x_3\mathbf{k}$ and $y = y_1\mathbf{i} + y_2\mathbf{j} + y_3\mathbf{k}$ in order to simplify the expression:
\[
h''(t) = \frac{d}{dt}\left( \vec{\nabla} f(y + t(x - y)) \cdot (x - y) \right)
= \frac{d}{dt}\left( \sum_{j=1}^{3} \frac{\partial f}{\partial x_j}(y + t(x - y))(x_j - y_j) \right)
= \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{\partial^2 f}{\partial x_i \partial x_j}(y + t(x - y))(x_i - y_i)(x_j - y_j).
\]
Studying the last term of the right-hand side, we observe that this can be written as a product between a vector and a matrix. To see this, define the 3×3 matrix $H(x) = (h_{ij}(x))$ where the elements are defined by
\[
h_{ij}(x) := \frac{\partial^2 f}{\partial x_i \partial x_j}(x).
\]
\[
f(x) = f(y) + \vec{\nabla} f(y) \cdot (x - y) + \frac{1}{2} \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{\partial^2 f}{\partial x_j \partial x_i}(y + \eta(x - y))(x_i - y_i)(x_j - y_j)
= f(y) + \vec{\nabla} f(y) \cdot (x - y) + \frac{1}{2}(x - y)^T H(y + \eta(x - y))(x - y).
\]
The second order Taylor polynomial T(x) of the function f at a point $\bar{x}$ may be written
\[
T(x) = f(\bar{x}) + \vec{\nabla} f(\bar{x}) \cdot (x - \bar{x}) + \frac{1}{2}(x - \bar{x})^T H(\bar{x})(x - \bar{x}).
\]
The function T(x) is a good approximation of f(x) in a neighbourhood of $\bar{x}$. Indeed, for smooth functions this approximation is one order better than the tangent plane approximation at $\bar{x}$.
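To see the claimed improvement concretely, here is a numerical sketch with a function of my own choosing, $f(x, y) = e^x\cos(y)$, expanded around $\bar{x} = (0, 0)$, where $f = 1$, $\vec{\nabla} f = (1, 0)$ and the Hessian is diag(1, −1):

```python
import math

def f(x, y):
    return math.exp(x) * math.cos(y)

def taylor2(x, y):
    """T(x) = f(xbar) + grad.(x - xbar) + (1/2)(x - xbar)^T H (x - xbar) at (0, 0)."""
    return 1.0 + x + 0.5 * (x**2 - y**2)

def tangent_plane(x, y):
    """First-order (tangent plane) approximation at (0, 0)."""
    return 1.0 + x

# Near the expansion point the second-order polynomial beats the tangent plane.
x, y = 0.2, 0.1
err2 = abs(f(x, y) - taylor2(x, y))
err1 = abs(f(x, y) - tangent_plane(x, y))
assert err2 < err1
assert err2 < 1e-3
```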
1.5.2 Optimisation
In many applications in science, engineering and finance we are faced with the problem of finding extremal values of a function in some subset of $\mathbb{R}^n$, that is, the points in some set where the function takes its maximum or minimum values. Here we will consider the framework where $\Omega \subset \mathbb{R}^n$, $n \in \mathbb{N}$, and the problem then reads: find $x \in \Omega$ such that
\[
f(x) \leq f(y) \quad \text{for all } y \in \Omega.
\]
It is of course a relevant question to ask if such a problem always has a solution. This
question is answered in the following theorem that we state without proof.
Theorem 1.32 If $f : \Omega \to \mathbb{R}$ is a differentiable function and either Ω is a closed and bounded subset of $\mathbb{R}^n$, or $f \to \infty$ as $|x| \to \infty$, then there is a minimum point $x \in \Omega$ such that
\[
f(x) \leq f(y) \quad \text{for all } y \in \Omega.
\]
Observe that it is essential that Ω is closed and bounded to guarantee the existence of the minimum, or, for unbounded domains, that $f \to \infty$ as $|x| \to \infty$.
Examples 1.33 1. The function $f : (0, 1) \to \mathbb{R}$ with f(x) = x does not have a minimum point in (0, 1). In this case Ω = (0, 1) is not closed and $f \to 0$ as $x \to 0$. However, the lower bound f = 0 is not attained for any x.
2. The function $f : [1, \infty) \to \mathbb{R}$ with f(x) = 1/x does not have a minimum point in $[1, \infty)$. In this case $\Omega = [1, \infty)$ is not bounded.
3. Even if Ω is unbounded, f may have a minimum point. In particular, if f(x) increases to infinity as |x| increases, one can always restrict the search for a minimum to a bounded set.
When solving the minimisation problem two situations can arise:
1. the minimum value is attained at an interior point x of Ω;
2. the minimum value is attained at a point x on the boundary, ∂Ω, of Ω. We may define the boundary as the points x in Ω such that for every ε > 0 there exist points y not in Ω such that $|x - y| \leq \varepsilon$.
When we solve a minimisation problem we must typically consider the two possibilities
separately and first find all local minima in the interior of the domain and then study the
function on the boundary and find local extrema there. The minimum point is then the
point from the two searches that results in the smallest value of f (x).
Examples 1.34 Throughout this section we will use the following minimisation problem
to illustrate the theory. Find x ∈ Ω realising the minimum

min_{x ∈ Ω} f(x),

where f(x, y) = x² + y³/3 - y²/2 and Ω is the closed disc {(x, y) : x² + y² ≤ 4}.
Let λ₁, …, λₙ denote the eigenvalues of the Hessian matrix H(x) at a critical point x.
Three cases can occur:
1. If λᵢ > 0 for 1 ≤ i ≤ n, then H(x) is positive definite and the critical point
x is a strict local minimum. If λᵢ ≥ 0, with λⱼ = 0 for some j, the critical point is
said to be a degenerate local minimum.
2. If λᵢ < 0 for 1 ≤ i ≤ n, then H(x) is negative definite and the critical point
x is a strict local maximum. If λᵢ ≤ 0, with λⱼ = 0 for some j, the critical point is
said to be a degenerate local maximum.
3. If neither of the above cases holds, i.e. H(x) has eigenvalues of both signs, then we
say that the critical point is a saddle point.
It follows that to identify local extrema in Ω we first find the critical points by solving the
equation ∇f(x) = 0 for x ∈ Ω. Then we classify the critical points as local minima, local
maxima or saddle points using the spectrum of the Hessian.
Solving ∇f(x) = 0 for x ∈ Ω in our example, we find two critical points in Ω, x₁ = (0, 0)
and x₂ = (0, 1).
To classify these points we compute the Hessian matrix

$$H(x) = \begin{pmatrix} 2 & 0 \\ 0 & 2y - 1 \end{pmatrix}.$$

It follows that

$$H(x_1) = \begin{pmatrix} 2 & 0 \\ 0 & -1 \end{pmatrix},$$

and since the eigenvalues of a diagonal matrix coincide with the entries on the diagonal,
we conclude that x₁ is a saddle point. Similarly,

$$H(x_2) = \begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix},$$

and since both eigenvalues are positive, x₂ is a strict local minimum.
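The classification can be checked with a short computation. This is an illustrative Python sketch (the function names are our own): for the running example f(x, y) = x² + y³/3 - y²/2 the Hessian is diagonal, and for a symmetric 2×2 matrix [[a, b], [b, c]] the eigenvalues are ((a + c) ± √((a − c)² + 4b²))/2.

```python
import math

def hessian_eigenvalues(x, y):
    # Hessian of f(x, y) = x**2 + y**3/3 - y**2/2 is [[2, 0], [0, 2*y - 1]];
    # closed-form eigenvalues of a symmetric 2x2 matrix [[a, b], [b, c]].
    a, b, c = 2.0, 0.0, 2.0 * y - 1.0
    d = math.sqrt((a - c) ** 2 + 4 * b ** 2)
    return ((a + c - d) / 2, (a + c + d) / 2)

def classify(x, y, tol=1e-12):
    lo, hi = hessian_eigenvalues(x, y)
    if lo > tol:
        return "strict local minimum"
    if hi < -tol:
        return "strict local maximum"
    if lo < -tol and hi > tol:
        return "saddle point"
    return "degenerate"

print(classify(0.0, 0.0))  # x1 = (0, 0): eigenvalues -1 and 2 -> saddle point
print(classify(0.0, 1.0))  # x2 = (0, 1): eigenvalues 1 and 2 -> strict local minimum
```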
Parametrised boundary
In this case the points on ∂Ω are given by a function γ : ℝⁿ⁻¹ → ℝⁿ such that the
mapping γ : R → ∂Ω, from some parameter set R ⊂ ℝⁿ⁻¹, defines ∂Ω. Then we may find
the extrema on the boundary by solving the minimisation problem

min_{s ∈ R} f(γ(s)).
Alternatively, the boundary may be given implicitly by an equation g(x) = 0, and we then
minimise f(x) under the constraint that g(x) = 0. This means that in the minimisation
we are only allowed to consider such x for which g(x) = 0. In order to include the
constraint we introduce an auxiliary function known as the Lagrangian

L(x, λ) := f(x) + λ g(x),
where λ ∈ ℝ is an additional unknown. We may then find the critical (or stationary)
points of the Lagrangian by solving the set of equations obtained by setting

∇L(x, λ) = 0

and

∂L/∂λ (x, λ) = 0.
Observe that by the definition of L(x, λ) the second equation is simply the constraint
g(x) = 0. In mechanics f(x) is often some energy, and λ then corresponds to the “virtual”
force required to make the system stay on the “trajectory” g(x) = 0. Considering our
example we get
1.6. AN INTERLUDE ON VECTOR FIELDS 33
Examples 1.39

L(x, λ) := f(x) + λ g(x) = x² + (1/3)y³ - (1/2)y² + λ(x² + y² - 4).
To find the stationary points we need to solve

∂L/∂x = 2x + 2λx = 0,
∂L/∂y = y² - y + 2λy = 0,
∂L/∂λ = x² + y² - 4 = 0.
First observe that if λ = 0 the first two equations are exactly those we solved to find the
critical points in the interior, and those points are not on the boundary. Hence λ ≠ 0.
Then assume λ ≠ -1. By the first equation x = 0, and then by the constraint (third
equation) y = ±2. It is then immediate from the second equation that x = 0, y = 2,
λ = -1/2 and x = 0, y = -2, λ = 3/2 are solutions. For λ = -1 we see that the first
equation is always satisfied. Using this value in the second equation we obtain the
solutions y = 0 and y = 3, of which only y = 0 is compatible with the third equation,
leading to x = ±2. Discarding the Lagrange multiplier we see that we have identified the
same four critical points as before: (2, 0), (0, 2), (-2, 0), (0, -2).
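Each candidate (x, y, λ) should make all three stationarity equations vanish, which is quick to verify. A small checking sketch in Python (the residual function is our own naming):

```python
# Stationarity residuals of L = x**2 + y**3/3 - y**2/2 + lam*(x**2 + y**2 - 4).
def residuals(x, y, lam):
    dL_dx = 2 * x + 2 * lam * x
    dL_dy = y ** 2 - y + 2 * lam * y
    dL_dlam = x ** 2 + y ** 2 - 4
    return (dL_dx, dL_dy, dL_dlam)

candidates = [(0.0, 2.0, -0.5), (0.0, -2.0, 1.5), (2.0, 0.0, -1.0), (-2.0, 0.0, -1.0)]
for c in candidates:
    assert all(abs(r) < 1e-12 for r in residuals(*c)), c
print("all four candidates are stationary points of the Lagrangian")
```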
Putting it together
Once we have found and classified the critical points in the interior, and identified the
points on the boundary at which a local extremum may be attained in the constrained
configuration, we are ready to find the point minimising f(x) over Ω. Assume that those
points form the set C = {x₁, …, x_m}, if there are m candidates for the local extremum.
The minimum (or maximum) is then obtained by taking the minimum of f over the set C:

min_{x ∈ C} f(x).
Examples 1.40 In our example we have found an interior local minimum at (0, 1) and
critical points on the boundary at (0, ±2), (±2, 0). If we evaluate the function f at these
points we find that the minimum f = -14/3 is attained at the point (0, -2). At the local
minimum in the interior the function takes the value f(0, 1) = -1/6.
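The final comparison can be done in exact arithmetic. A short Python sketch using the standard library's `fractions` module (not part of the notes) confirms the values above:

```python
from fractions import Fraction

# Compare f at the interior local minimum and the four boundary candidates,
# using exact rational arithmetic to avoid rounding.
def f(x, y):
    x, y = Fraction(x), Fraction(y)
    return x ** 2 + y ** 3 / 3 - y ** 2 / 2

candidates = [(0, 1), (0, 2), (0, -2), (2, 0), (-2, 0)]
best = min(candidates, key=lambda p: f(*p))
print(best, f(*best))  # -> (0, -2) -14/3
```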
A vector field in three dimensions is a function that assigns to each vector in ℝ³ another
vector in ℝ³, i.e.,
F : ℝ³ → ℝ³.
Let’s take a moment to review vector notation. In these notes, boldface letters represent
vectors. When handwriting things, we use an arrow over the top of the symbol to denote
that it’s a vector. We express any vector v as a linear combination of the Cartesian unit
vectors i, j, and k, i.e., given v, there are three unique real numbers v1 , v2 and v3 such
that v = v1 i + v2 j + v3 k. In R3 , any point in space can also be identified with a unique
vector. For example, the point (x0 , y0 , z0 ) is identified with the vector r = x0 i + y0 j + z0 k.
This means that we can think of vectors in R3 as points in R3 and vice versa. This duality
will play an important role in the structure of the theory to come.
Some examples of vector functions:

r(x, y, z) := xi + yj + zk,
F(x, y, z) := i + x²yj + cos(xyz)k,
G(x, y, z) := yi + xj,
F(x, y, z) := i + 2j - 13k.
A practical example of a vector field is the function which assigns to each point in
the Earth’s atmosphere a vector pointing in the direction of the wind at that point and
having a magnitude equal to the wind speed at that point. The information contained
in this function not only tells us about the speed of the wind at each point, but also the
direction. That is the key to vector fields. They contain two pieces of information, direction
and magnitude, because those are exactly the two pieces of information that determine a
vector.
1. Consider the vector function r : ℝ³ → ℝ³ given by

r(x, y, z) := xi + yj + zk.

This one is relatively easy to picture. At each point (x, y, z), we take the vector from
the origin to the point and translate it so that its tail is at the point. For example,
at the point (1, 0, 0), we take the vector i and translate it so that its tail is at
(1, 0, 0). At the point (1, 1, 1), we take the vector i + j + k and move it until its tail
is at (1, 1, 1). You get the picture now. If we consider the restriction r(x, y) := xi + yj,
viewed as a map ℝ² → ℝ², we can visualize the vector field in two dimensions, which
is easier. See the left plot of Figure 1.10 (produced using the matlab script vectors.m).
Figure 1.10: Left: the vector field r(x, y) := xi + yj. Right: the vector field F(x, y) :=
yi + xj.
2. Consider the vector function F : ℝ³ → ℝ³ given by

F(x, y, z) := yi + xj.

Notice that since there is no k component, the resulting vector field is just a
vertical translation of what happens in the xy-plane. To the point (1, 0, z), for any
z, we attach the vector j. To the point (0, 1, z), for any z, we attach the vector i,
and so on. See the right plot of Figure 1.10.
3. Consider the constant vector function F : ℝ³ → ℝ³ given by

F(x, y, z) := i + 2j - 13k.
Given a vector function F : ℝ³ → ℝ³, we can take its dot product with some fixed vector
v. The resulting function, F(x, y, z) · v, is a scalar function. It might look horrible, but
we will have to get used to such things. Another scalar function that we can get from a
vector function is the norm of the vector function, i.e., |F(x, y, z)|. Let's look at some
examples.
Notice that for most points the value of this scalar function will be nonzero. However,
there are points where the value is 0, and it would be worth your time to think a little
bit about what that would mean geometrically. Consider for instance F(x, y, z) :=
yi + xj and r(x, y) := xi + yj. What is the value of F · r? Can you deduce it
from the plots of Figure 1.10?
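One way to explore the question numerically, before reasoning geometrically, is to evaluate the dot product at sample points. This sketch (our own, not from the notes) compares F · r = yx + xy with the closed form 2xy:

```python
import random

# Evaluate F . r with F(x, y) = y*i + x*j and r(x, y) = x*i + y*j at random
# sample points and compare with the closed form 2*x*y, which vanishes
# exactly where the two plotted fields are perpendicular.
random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-5.0, 5.0), random.uniform(-5.0, 5.0)
    dot = y * x + x * y          # F . r = F1*r1 + F2*r2
    assert dot == 2 * x * y
print("F . r agrees with 2*x*y at all sample points")
```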
Consider the norm |r(x, y, z)| = √(x² + y² + z²). This function returns the distance
from the point (x, y, z) to the origin. What are the level surfaces of this function?
The formula gives the answer to you in an algebraic form, but can you explain the
shape of the level surfaces on a purely geometric basis?
These are the sort of games that we will play with vector functions. The notation can
be a little bit intimidating at first, but if you simply keep a clear head and apply the
definitions, things will be all right.
We may regard the gradient symbol as a formal vector of differential operators:

∇ := (∂/∂x) i + (∂/∂y) j + (∂/∂z) k.
We can then formally apply it to a vector field using the operations we know from linear
algebra to obtain other differential operators that may be applied to vector fields (recall
that the gradient is applied to scalar functions). Let F : ℝ³ → ℝ³. If we first consider the
dot product we obtain what is known as the divergence operator:

∇ · F = ((∂/∂x) i + (∂/∂y) j + (∂/∂z) k) · (F₁i + F₂j + F₃k) = ∂F₁/∂x + ∂F₂/∂y + ∂F₃/∂z.
These di↵erential operators are very important in Mathematical Physics and you will see
them repeatedly in coming courses.
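The divergence formula is easy to probe numerically: approximate each partial derivative by a central difference. A Python sketch under that assumption (function names are our own); for F(x, y, z) = (x, y, z) the exact divergence is 1 + 1 + 1 = 3 everywhere.

```python
def divergence(F, x, y, z, h=1e-5):
    # Central-difference approximations of dF1/dx, dF2/dy and dF3/dz.
    dF1_dx = (F(x + h, y, z)[0] - F(x - h, y, z)[0]) / (2 * h)
    dF2_dy = (F(x, y + h, z)[1] - F(x, y - h, z)[1]) / (2 * h)
    dF3_dz = (F(x, y, z + h)[2] - F(x, y, z - h)[2]) / (2 * h)
    return dF1_dx + dF2_dy + dF3_dz

F = lambda x, y, z: (x, y, z)
print(divergence(F, 1.0, 2.0, 3.0))  # approximately 3
```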