
Numerical Methods

for
Mathematics Students
Part 1: The Elements
by

John Loustau
Department of Mathematics and Statistics
Hunter College (CUNY)

© John Loustau 2008 & 2009. All rights reserved.



Table of Contents
Part 1: The Elements
Introduction to Part 1

Chapter 1: Beginnings

Introduction

1. Basics of Programming in Mathematica

2. Errors in Computation

Chapter 2: Finding Roots

Introduction

1. Newton's Method

2. Secant Method

3. Linear Systems of Equations

4. Functions of Several Variables: Finding Roots and Extrema

Chapter 3: Interpolating and Fitting

Introduction

1. Polynomial Interpolation

2. Bezier Interpolation

3. Least Squares Fitting

4. Cubic Splines and B-Splines

Chapter 4: Numerical Differentiation and Integration



Introduction

1. Finite Differences and Finite Difference Method

2. Numerical Integration 1: Trapezoid Method and Simpson's Rule

3. Numerical Integration 2: Midpoint Method and Gaussian Quadrature

4. An Application: First Order ODE

References

Introduction to Part 1

With the advent of the mathematical fourth generation language (4GL) the standard numerical methods course has changed dramatically. At one time the most challenging aspect of a numerical methods course was the programming. Students were able to execute only a few routines. Graphical output was beyond the reach of any beginning class. Now programming with sophisticated graphical output is routine. What was once a two-semester course is now a one-semester course.

This set of lectures is for a one-semester introductory course in numerical methods. The material we present is the standard fare for a first class in numerical methods. We start with a discussion of error and work through standard equation solution procedures in one and several variables. We include several techniques in curve fitting and interpolation. Perhaps the most useful are the cubic splines, and we pay special attention to this topic. When we develop finite differences we also introduce the finite difference method (FDM) to numerically solve a differential equation and apply it to the one dimensional heat equation. Under numerical integration we formulate several numerical techniques including the trapezoid method, the midpoint method and Simpson's rule, and touch on Gaussian quadrature. Whenever there are competing procedures leading to similar results we pay attention to the trade-offs: speed in processing, accuracy and reliability.

We develop numerical methods as a tool more than as a topic. We often omit important topics included in introductory textbooks. It is our intention to prepare students to solve standard undergraduate level problems with a 4GL, to use graphical techniques to gain insight into material arising in other courses, and to open the way to more advanced course work in numerical methods. We certainly do omit material that will only be useful later. For instance we omit Lagrange and Hermite polynomials. These are critical when doing the finite element method. We anticipate that they will be developed then.

These materials are based on Mathematica version 6.0 by Wolfram Research. However, most any mathematical 4GL will do. With this in mind the reader should be aware that each mathematical 4GL is different. Mathematica contains excellent procedures for computing the anti-derivative of a function. When computing the definite integral, it will first attempt to find an anti-derivative and use numerical techniques only when this search fails. In the section on calculation errors we include a problem. The output will be significantly different when executed in a different 4GL.

As our students are mathematics and statistics majors, we do include the mathematical foundations. Since they have already taken a basic course in linear algebra and multivariate calculus, standard results from these topics are referenced as they arise. Other results that may not be part of the students' background are included. On the other hand our students often have little or no prior programming background. Hence, the course is presented in a twice weekly format with one lecture session and one session in the lab. Early in the semester the lab time focuses on programming issues. But as the semester progresses the students become more proficient programmers. Then the lab session becomes another opportunity to develop the mathematical foundation of the topic.

Most students taking the course find that it helps them to feel comfortable with multivariate calculus and linear
algebra. In these terms it provides a foundation for more advanced theoretical course work. Some students find
careers programming in this or another 4GL. Still others use this course as the first step toward course work in
parameter estimation in statistics and course work in numerical solutions to differential equations.


We develop the basics of Mathematica programming in the first two chapters. We begin with the elementary arithmetic operations in Chapter 1. In Chapter 2 we introduce loops and conditionals along with special functions that compute the derivative and integral and solve a linear system of equations. Along the way we develop useful terminology and techniques that are essential in the following chapters.

Numerical methods is a topic of trade-offs. For any given problem, there is rarely a single technique that works best in all circumstances. Rather, we must decide what is important and employ the procedure that will satisfy our needs. Generally speaking, the issues are accuracy, efficiency and control. Each time we develop a technique, we will present its strengths and weaknesses.

Chapter 1: Beginnings

Introduction

This chapter provides a brief introduction to programming for those who have never programmed before. For those who have programmed, this is the introduction to Mathematica. One of the several advantages to using a 4GL such as Mathematica is that it makes numerical methods accessible to all students with multivariate calculus and linear algebra. Indeed most students catch on very quickly to programming in Mathematica and are doing complicated programs well before the end of the course. To support the learning process, there are tutorials available by selecting Help from the system menu and then Documentation Center. The first item available in documentation contains the basic tutorials. For those who prefer hard copy references, we suggest the Schaum's Outline Series: Mathematica listed in the references at the end.

For a reader with prior programming experience, this chapter serves to introduce the quirks of Mathematica. Some of the quirkiness of Mathematica has to do with the fact that it was originally developed by mathematicians, and therefore it has a mathematician's point of view. If your background is with VB or C++ or another of the programming products developed by computer scientists, you will find Mathematica to be similar on the surface but significantly different at lower levels. If you have never programmed in an interpreted 4GL, then you have something to get used to.

In this chapter we introduce the terminology associated to computer error. Computers must necessarily represent decimal values in a finite number of significant digits. Often the representation is close to the actual value but not exact (for instance 1/3 is not 0.3333333). During normal arithmetic operations, the error inherent in the representation is often magnified, yielding ridiculous results. When using any computer system you must always be cognizant of the potential for error in your calculations.

1. Basics of Programming in Mathematica

We use Mathematica version 7.

During this semester you will be programming in Mathematica. To begin with, you will need to be able to do
the following.

1. The basic arithmetic operations (addition, subtraction, multiplication, division, exponentiation and roots)
2. Define a function
3. Loops (Do Loop or While Loop)
4. Conditionals (If ... then ... else )
5. Basic graphics (point plot, function plot, parametric plot)
6. Basic matrix and vector operations.

You can find descriptions of each of these in the first two or three chapters of Schaum's Outline of the Theory and Problems of Mathematica [1]. In addition Mathematica version 6 includes a tutorial accessible via the Help menu. There are some comments that need to be made.

A. Error messages in Mathematica are cryptic at best. After a while you will begin to understand what they
mean and use them to debug your program. But this will take some experience with Mathematica.

B. When multiplying a and b, you may write a*b or a b (with a space between). But if you write ab (without a space), then Mathematica will think that ab is a new variable instead of the product of two variables. Because of how Mathematica spaces input, it is not always easy to distinguish between a b and ab. You will save yourself a lot of time if you always use the asterisk for multiplication.

C. You may be used to using the three grouping symbols, parentheses ( ), square brackets [ ], and braces { }, interchangeably. In Mathematica you may not do this. Parentheses may only be used to group computations. Square brackets are only used around the independent variables of a function, while braces are only used for vectors and matrices. For instance each of the following expressions will cause an error in a Mathematica program:

[a + b]^2, f(x), (x, y).

The correct expressions are

(a + b)^2, f[x], {x, y}.

D. When defining a function in your program, you always follow the independent variable(s) with an underscore. This is how Mathematica identifies the independent variables. Later when you reference the function, you must not use the underscore. For instance the following statements define a function as x e^-x, evaluate the function at x = 1 and then define a second function as the derivative of the first.

f [x_] = x*Exp[-x];

y = f[1];

g[x_] = D[ f[x], x];

E. All reserved words in Mathematica begin with upper case letters. When you define a function or a variable, it is best to use names that begin with lower case letters. This way functions and variables you define will not clash with the ones Mathematica has reserved. For instance, 'Pi' is 3.14..., and 'I' is the square root of -1. Any attempt to use these symbols for any other purpose will at the very least cause your output to be strange.

F. There are two ways to indicate the end of a line of program code. First, you may simply hit the 'return' key and go to the next line. Second, you may end the line with a semicolon. However, the result upon execution is slightly different. In the first case the result of the line of code is printed when the code is executed. In the second the output of the line of code is suppressed. For instance if you enter

3+5

and execute, Mathematica will return the value, 8. On the other hand if you enter

3 + 5;

and execute, then there is no printed output.

G. It is best to have only one line of program code per physical line on the page. For short programs, violating this rule should not cause any problems. For long and involved programs, debugging is often a serious problem. If you have several lines of code on the same physical line, you may have trouble noticing a particular line of code that is causing an error. For instance,

z = x + y;
x = z + 1;

is preferred over

z = x + y; x = z + 1;

2. Errors in Computation

Errors arise from several sources. There are the errors in data collection and data tabulation. This sort of data error needs to be avoided as much as possible. Usually this is accomplished by quality assurance procedures implemented at the team management level. This is not our concern in numerical methods. Programming errors are also a quality control issue. These errors are avoided by following good practices of software engineering. For our own programs, we are best advised to be as simple as possible: simple in program design, simple in coding. A mundane program is much easier to control and modify than a brilliant but somewhat opaque one. The simple one may take longer to code or longer to execute, but within bounds is still preferable.

There are errors that arise because of the processes we use and the equipment that we execute on. Both are
errors due to the discrete nature of the digital computer. These errors cannot be prevented and hence must be
controlled via error estimation.

First, the computer cannot hold an infinite decimal. Hence, the decimal representations of fractions such as 1/3 and 2/3 are inherently incorrect. Further, subsequent computations using these numbers are incorrect. A small error in the decimal representation of a number, when carried forward through an iterated process, may accumulate and result in an error of considerable size. For instance when solving a large linear system of equations, an error introduced in the upper left corner will iterate through the Gauss-Jordan process causing a significant error in the lower right corner.

The second type of error arises from discrete processes. For instance, suppose you have an unknown function f
(x). Suppose also that you know that f (1) = 1 and f (1.1) = 1.21. Then it is reasonable to estimate

df/dx (1) ≈ (1.21 − 1)/(1.1 − 1) = 0.21/0.1 = 2.1.

If in fact f(x) = x^2, then our estimated derivative is off by 0.1. But without knowing the actual function, we have no choice but to use the estimate. We are faced with one of two alternatives: doing nothing or proceeding with values that we expect are flawed. The only reasonable alternative is to proceed with errors provided we can estimate the probable error.
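
For instance, here is a minimal sketch of this estimate in Mathematica (using f(x) = x^2, so that we can compare against the true derivative):

f[x_] = x^2;
approx = (f[1.1] - f[1])/(1.1 - 1)    (* the difference-quotient estimate, about 2.1 *)
exact = f'[1]                         (* the actual derivative at 1, namely 2 *)
Abs[approx - exact]                   (* the estimate is off by 0.1 *)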

We introduce some simple terminology. Suppose that there is a computation, that x denotes the actual value and that x̃ denotes the value that we computed. Then

e = x − x̃

is called the error. In turn |x − x̃| is called the absolute error and (x − x̃)/x is called the relative error. The relative absolute error, |x − x̃|/|x|, is defined analogously.

It is reasonable to ask why we should care about e when computing it requires that we already know x. However it is useful to have precise definitions for these terms. More importantly there are situations where we can estimate the error without knowing the actual value.

A second comment is in order. It is preferable to use the relative or relative absolute error. This is because these values are dimensionless. For instance, consider the example of the derivative of the squaring function. If the data is given in meters, then e = 0.1 meters. If the data were instead displayed in kilometers then e = 0.0001, or in centimeters, e = 10. Even though the error is the same, the impression is different. However, the relative error is dimensionless, so its value is the same no matter how the data is represented.
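
Continuing the example in a short sketch, the quantities just defined are immediate to compute:

actual = 2.0;                (* x, the actual value of f'(1) *)
computed = 2.1;              (* the difference-quotient estimate *)
error = actual - computed    (* e *)
absoluteError = Abs[error]
relativeAbsoluteError = Abs[error]/Abs[actual]   (* dimensionless *)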

In Exercise 1 below, you are asked to execute a simple calculation which should always yield the same result independent of the values used. In this problem you are asked to use several different values of x. Unexpectedly, the results will vary across a broad spectrum of possible answers. In a simple calculation like this it is possible to look at how the numbers are represented and determine why the error occurs. But in any actual situation the calculations are so complex that such an analysis is virtually impossible. When executing a computation you should always have an idea of what the results should be. If you get impossible output and there is no error in your program, then you are looking at small errors in numerical representation compounded over perhaps thousands of separate arithmetic operations.

Exercises:

1. Consider the function f(x, y) = ((x + y)^2 − 2xy − y^2)/x^2. We expect that if x ≠ 0, then f(x, y) = 1. Set y = 10^3 and compute f for x = 10.0^-1, 10.0^-2, 10.0^-3, 10.0^-4, 10.0^-5, 10.0^-6, 10.0^-7, 10.0^-8. For each value of x compute the absolute error.
2. Redo Problem 1 using x = 10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 10^-6, 10^-7, 10^-8. Why are the results different?

Chapter 2: Finding Roots

Introduction

In this chapter we develop the basic procedures necessary to solve equations and locate the extrema of a function. We do this in the context of a course supported by a 4GL programming language. We have all solved max/min problems and systems of linear equations with pencil and paper. For the most part we do this for carefully selected textbook problems. You may have found these problems to be tedious. Now we find that Mathematica makes these problems effortless and routine, even in cases with hundreds of unknowns.

We begin the chapter with Newton's method. Most calculus courses include Newton's method for finding roots of differentiable functions. If you have done a problem with pencil and paper you know that doing two or three iterations of Newton's method is a nightmare. Even the simplest cases are not the sort of thing most students want to do. Now we see that it is easy to program and provides an excellent problem for the beginning student. In addition Mathematica provides a built-in function that performs the method.

We follow Newton's method with the secant method to find the root of a function. Then we look at linear systems of equations with nonsingular coefficient matrix. Again Mathematica provides a built-in function to solve linear systems. You have undoubtedly solved several systems via Gauss-Jordan elimination. Now you can solve systems with thousands of unknowns in a minute or two. Additionally there is a Mathematica function that describes the particulars of any execution of the Gauss-Jordan process.

We end the chapter considering max/min problems in several variables. Along the way we show that finding the roots of a function of several variables may be recast as a max/min problem. In addition these procedures can be used to solve linear systems with singular coefficient matrix.

1. Newton's Method
Suppose that you have a function f(x) = y and want to find a root for f. Recall that x̃ is a root of f provided f(x̃) = 0. If f is continuous and f(x1) > 0 and f(x2) < 0, then you know that f must have at least one root between x1 and x2. This result (the Intermediate Value Theorem) was stated but not proved in Calculus I; it is usually proved in Real Analysis. There is an intuitively simple but inefficient means to determine a good approximation for x̃. First consider the midpoint of x1 and x2, (x1 + x2)/2 = x̂. If f(x̂) = 0, then x̂ is a root. If f(x̂) > 0, then replace x1 by x̂ and repeat the process. If f(x̂) < 0, then replace x2 by x̂ and repeat. The following Mathematica code segment demonstrates this process. We use the fact that a*b > 0 if and only if a and b have the same sign.

testRoot = x1;
testValue = 10^-5;
While[Abs[f[testRoot]] > testValue,
  testRoot = (x1 + x2)/2;
  If[f[testRoot]*f[x1] >= 0,
    x1 = testRoot,
    x2 = testRoot
  ];
];
Print[testRoot];

In this code fragment, we assume that f, x1 and x2 are already known. Notice that we use 10^-5 as the 'kickout' threshold. In particular, as soon as the absolute value of f at the current testRoot is less than 10^-5, the process stops and the approximate root is printed.

A more efficient means to locate an approximate root is called Newton's method. You probably saw this in
Calculus. In this case we suppose that f is differentiable and that we want to find a root for f near x1. Then the
idea is that the tangent line for f at x1 is a good approximation for f near x1. Now the equation for the tangent is
easily solved for a root, say x2. In addition we may suppose that x2 is closer to the root of f than x1.

For instance if f(x) = x Cos(x), then there is a root at A near x = 1.6. (See Figure 1.1)

Figure 1.1: f(x) = x Cos(x)

If we start the process with x = 2, then f'(2) ≈ -2.2347, f(2) ≈ -0.8233 and the tangent to f at 2 is given by h(x) = f'(2)(x − 2) + f(2). Now h crosses the x-axis at 1.62757. Figure 1.2 shows f together with the tangent.

Figure 1.2: f and the tangent at B = (2, f(2))

If we write this out mathematically, then starting at x1, the tangent line has slope f '(x1) and passes through the
point (x1, f(x1)). Hence,

f'(x1) = (y − f(x1))/(x − x1).

Setting y = 0 and solving for x, we get

f'(x1)(x − x1) = −f(x1),

or

(2.1.1) x = x1 − f(x1)/f'(x1).

Replacing x1 by x we get an iterative process that we can repeat until the absolute value of f(x1) is less than some threshold value.
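
A minimal sketch of this iteration in Mathematica (assuming f and a seed x1 are already defined; the iteration limit and kickout threshold are our own choices):

fPrime[x_] = D[f[x], x];    (* the derivative of f *)
iter = 0;
While[Abs[f[x1]] > 10^-5 && iter < 100,
  x1 = x1 - f[x1]/fPrime[x1];    (* the operative statement (2.1.1) *)
  iter = iter + 1;
];
Print[x1];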

Newton's method is implemented in Mathematica via the FindRoot command. For instance f(x) = x Cos(x) has a root between x = 1 and 3. The following Mathematica statement will implement Newton's method to get an approximate value for the root.

xRoot = FindRoot[x*Cos[x] == 0, {x, 2}];

If you do a search on FindRoot in Mathematica help, you will see that there are options available to the programmer. You may set the threshold or use the default kickout value imposed by Mathematica. In addition, you may set the number of iterations for the process or accept the Mathematica preset iteration count.

Before passing on, several words of caution are in order. First, if f'(x1) = 0 then the process will fail. Indeed, in this case, the tangent line is parallel to the x-axis. Second, if f'(x1) is not zero, but nearly zero, then x given by (2.1.1) will be far from x1. Indeed, for f(x) = x Cos(x) and x1 = 0.9, the slope of f at x1 is approximately -0.0833 and the process will locate the root near 8. (See Figure 1.3)

Figure 1.3: f and the tangent at C = (x1, f(x1))

Finally, it is possible that Newton's method will fail to find any approximate root. Indeed the process may cycle. In particular, starting at a value x1 you may pass on to a succession of values x2, x3, ..., xn only to have xn = x1. Once you are back to the original value, then the cycle is set and further processing is useless. The following diagram shows a function f where f(1) = f(-1) = 1, f'(1) = 1/2 and f'(-1) = -1/2. Hence, x1 = 1, x2 = -1, x3 = 1 and so forth. The example is constructed using a Bezier curve. It is indeed a function but not the sort of thing one would expect under normal circumstances. Later when we develop Bezier curves we will revisit this example.

Figure 1.4: A case where Newton's method cycles

The best rule of thumb is that you should know your function before employing Newton's method.

Exercises:

1. Plot the function f(x) = x e^-x − 0.16064.

a. Use FindRoot to locate a root near x = 3.
b. Write a program in Mathematica implementing Newton's method. Use your program to locate a root near x = 3. Set the maximal number of iterations to 100 and the kickout threshold to 10^-5. How many iterations does your program actually execute before it stops?
c. Redo (b) with a kickout threshold set to 10^-2. Using the result of (a) as the actual and this result as the computed, calculate the relative absolute error.

Figure 1.5: f(x) = x/(x^2 + 1)

2. Figure 1.5 shows the graph of f(x) = x/(x^2 + 1) together with the point (1.5, f(1.5)).
a. Use FindRoot to solve f(x) = 0 starting at 1.5. What happens? Why?
b. Write your own program to execute Newton's method starting at x = 1.5. What is the output for the first 10 iterations?

2. Secant Method

The secant method is closely related to Newton's method. On the one hand it is less efficient, while on the other it is free of the anomalies which may arise with Newton's method. In particular, it will always locate an approximate root and the root will be in the intended vicinity.

For the secant method we begin with a known positive and negative value for the function. Suppose that the function is given by f(x) = y. If f is continuous and f(x1) > 0 and f(x2) < 0, then you know that f must have at least one root between x1 and x2. The secant method proceeds by considering the line connecting (x1, f(x1)) and (x2, f(x2)). We let x̂ denote the point where the line (or secant) intersects the x-axis. If f(x̂) = 0 then we have the root between x1 and x2. If f(x̂) > 0 then we replace x1 with x̂ and proceed. Otherwise, we replace x2 with x̂.

For instance if f(x) = x Cos(x), then there is a root between 1 and 2. Setting x1 = 1 and x2 = 2, the secant is given by λ(x) = f(1) + (f(2) − f(1))/(2 − 1) · (x − 1) and x̂ = 1.39364. The following diagram shows f along with the secant.

Figure 2.1: f with the secant joining (1, f(1)) and (2, f(2))

Returning to the general procedure, the points (x, y) on the secant must satisfy

(f(x2) − f(x1))/(x2 − x1) = (y − f(x1))/(x − x1).

Setting y = 0 and solving for x, we get

(2.2.1) x = x1 − f(x1) (x2 − x1)/(f(x2) − f(x1)).

Equation (2.2.1) is the operative statement in the sense that any program that implements the secant method must include this statement. Programming the secant method is nearly the same as programming Newton's method. In this you will replace (2.1.1) for Newton's method with (2.2.1) for the secant method. However, there is one wrinkle. In Newton's method you have a single seed which is replaced at each iteration by the current approximate root. For the secant method there are two seeds, x1 and x2. When you compute x (via Equation 2.2.1), you must replace one of the seeds before you compute the next x. For this purpose you will need to evaluate f at the three points x, x1 and x2. We know that f(x1) and f(x2) have different signs. If f(x) and f(x1) have the same sign, then replace x1; if they have different signs (hence, f(x) and f(x2) have the same sign), then replace x2.
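
A minimal sketch of the resulting program (assuming f, x1 and x2 with f(x1) and f(x2) of opposite sign are already defined; the kickout threshold is our own choice):

x = x1 - f[x1]*(x2 - x1)/(f[x2] - f[x1]);    (* the operative statement (2.2.1) *)
While[Abs[f[x]] > 10^-5,
  If[f[x]*f[x1] >= 0,
    x1 = x,    (* f[x] and f[x1] have the same sign *)
    x2 = x     (* otherwise f[x] and f[x2] have the same sign *)
  ];
  x = x1 - f[x1]*(x2 - x1)/(f[x2] - f[x1]);
];
Print[x];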

The secant method is in fact an approximate form of Newton's method. You see this by rewriting (2.2.1) as

(2.2.2) x = x1 − f(x1) / [(f(x2) − f(x1))/(x2 − x1)].

If |x2 − x1| is small, then the denominator on the right hand side, (f(x2) − f(x1))/(x2 − x1), is very near to f'(x1). As the iterative process proceeds, we should expect the successive values for x2 and x1 to converge together. In this case the expression (2.2.2) for the approximate root via the secant method will converge to the expression given in (2.1.1) when using Newton's method.

Suppose f is decreasing and concave down in the interval [x1, x2], as is the case for the example. Let x̃ denote the approximate root derived from Newton's method and let x̂ denote the approximate root derived from the secant method. It is easy to see that x̂ ≤ x̃ and that the actual root must lie between them. The following figure illustrates this for the example case.

Figure 2.2: The case for a function concave down in an interval

In particular, if x denotes the actual root, then |x − x̂| ≤ |x̃ − x̂|. Similarly, |x − x̃| ≤ |x̃ − x̂|. In particular |x̃ − x̂| is an upper bound on the absolute error. In other words, we may use |x̃ − x̂| to estimate the error without knowledge of x. Notice that the same assertion holds if f is increasing and concave up in the interval [x1, x2].

The secant method is also implemented in Mathematica via the FindRoot command. For instance we know that f(x) = x Cos(x) has a root between x = 1 and 2. The following Mathematica statement will implement the secant method to get an approximate value for the root.

xRoot = FindRoot[x*Cos[x] == 0, {x, 1, 2}];

Again when using this option of the FindRoot statement, you have access to the iteration count and kickout
threshold.

Exercises:

1. Consider the function f(x) = x e^-x − 0.16064.

a. Use FindRoot to locate a root between x = 2 and x = 3.
b. Write a program in Mathematica implementing the secant method. Use your program to locate a root near x = 3. Set the maximal number of iterations to 100 and the kickout threshold to 10^-5. How many iterations does your program actually execute before it stops? How does the secant method compare to Newton's method?
c. Use the result of Exercise 1.b from the previous work on Newton's method along with the result of the prior section to get an upper bound on the absolute error.

3. Linear Systems of Equations

Here we develop procedures for solving a linear system of equations with non-singular coefficient matrix. The
case for singular coefficient matrices will be presented in the next section.

First we need to consider the code necessary to do matrix arithmetic. The following statements define and
display a four dimensional column vector, v.

v = {1,2,3,4} ;
Print[MatrixForm[v]];

The next two statements define 4 by 4 matrices A and B. The statement after them multiplies the two matrices, and the following one prints the product, matC, as a matrix. (We use the name matC because C itself is a reserved Mathematica symbol.) The next step multiplies A times v to get the 4-tuple w. Finally, we multiply w by the scalar 5. Notice that you use a period to multiply matrices and matrices times vectors, whereas you use the * to multiply scalars and vectors.

A = {{1,2,3,4},{2,3,4,5},{5,4,3,2},{4,3,2,1}};
B = {{0,1,0,0},{1,0,0,0},{0,0,0,1},{0,0,1,0}};
matC = A . B;
Print[MatrixForm[matC]];
w = A . v;
w = 5*w;

Some other useful linear algebra functions are Transpose[A], Inverse[A], IdentityMatrix[n], Norm[v] and Length[v]. The first produces the transpose of the matrix A; note that a plain list such as v carries no row or column orientation in Mathematica, so Transpose applies to matrices rather than to vectors. The next two produce the inverse of A and the n by n identity matrix. Norm[v] returns the Euclidean length of the vector v, while Length[v] returns the number of entries (4 if v is a four-tuple). Finally, A[[i]][[j]] = A[[i,j]] returns the ijth entry of A and v[[i]] is the ith entry of the vector v.
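
For instance, the following brief sketch exercises these functions with the A, B and v defined above. (We invert B rather than A here because this particular A happens to be singular.)

Print[MatrixForm[Transpose[A]]];      (* the transpose of A *)
Print[MatrixForm[Inverse[B]]];        (* B is a permutation matrix, hence invertible *)
Print[MatrixForm[IdentityMatrix[4]]];
Print[Norm[v]];       (* the Euclidean length Sqrt[30] *)
Print[Length[v]];     (* the number of entries, 4 *)
Print[A[[2, 3]]];     (* the 2,3 entry of A *)
Print[v[[2]]];        (* the second entry of v *)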

Now consider a linear system Ax = b, where A is a non-singular n by n matrix and x, b lie in R^n. The condition that A is non-singular assures us that the system has a unique solution given by x = A^-1 b. In addition we know that A is non-singular if and only if A is row equivalent to the n by n identity, In.

The process of solving a linear system of equations is called Gauss-Jordan elimination. This method is implemented in Mathematica. Suppose you have the linear system

1 2 3   x1   1
4 5 6   x2 = 2
7 8 9   x3   3

In order to solve this system in Mathematica, you define the coefficient matrix and constant vector via

coefMat = {{1,2,3},{4,5,6},{7,8,9}};
conVec = {1,2,3};

And then solve the system via the statement.

solVec = LinearSolve[coefMat, conVec];

The particular form of Gauss-Jordan elimination employed by Mathematica is called LU-Decomposition. In order to understand what Mathematica does when solving a linear system, we need to look at LU-Decomposition. At this stage you need to recall a number of things about row equivalence, elementary row operations and linear systems of equations.

1. Matrices A and B are row equivalent provided there exist elementary matrices E1, ..., Em with B = (Em ··· E2 E1)A.
2. Multiplication on the left by an elementary matrix implements the corresponding elementary row operation.
3. There are three elementary row operations.
4. The type-1 elementary row operation exchanges two rows of a matrix. It is denoted by E(i,j), indicating that rows i and j are exchanged.
5. The type-2 elementary row operation multiplies a row by a nonzero scalar. It is denoted by Eα(i), indicating that row i is multiplied by α.
6. The type-3 elementary row operation adds a scalar times one row to another. It is denoted by Eα(i)+j, indicating that α times row i is added to row j.
7. All elementary matrices are non-singular. Their inverses are given by: E(i,j)^-1 = E(i,j), Eα(i)^-1 = E1/α(i) and (Eα(i)+j)^-1 = E-α(i)+j. Notice that the inverse of an elementary matrix is the elementary matrix that reverses the operation.
8. If Ax = b and Bx = c are linear systems and D is a product of elementary matrices, then B = DA and c = Db imply that the two systems have the same solutions.

Gauss-Jordan elimination hinges on the fact that if A were upper triangular, then the system, Ax = b, could be
easily solved. Indeed the row-echelon form of A is upper triangular. The U in LU-Decomposition refers to this
upper triangular matrix. The problem of solving the linear system reduces to finding U, row equivalent to A and
upper triangular.

Suppose A = [αi,j] and that α1,1 ≠ 0. Then it is a simple matter to clear out the entries of the first column of A below the 1,1 entry. For this purpose we only need elementary operations of type-3. The corresponding elementary matrices, Eα(1)+j, are all lower triangular since each j > 1. Since the product of lower triangular matrices is lower triangular, we have a lower triangular matrix L1 with L1A = A1 = [α′i,j], where α′j,1 = 0 for all j > 1. Furthermore, L1 is nonsingular.

Next if α′2,2 ≠ 0, then we can clear out below α′2,2 using only type-3 elementary operations. As before, there is a lower triangular matrix L2 with L2L1A = L2A1 = A2 = [α″i,j]. As before, L2L1 is lower triangular and nonsingular. If α″3,3 ≠ 0, then the process continues. Indeed, if at each step the next diagonal entry is not zero, then we clear out below the diagonal element so that at the end we have

(2.3.1) (Ln ··· L2 L1)A = U,

where each Li is lower triangular and nonsingular and U is upper triangular. Multiplying (2.3.1) by (Ln ··· L1)^-1 = L1^-1 ··· Ln^-1 = L yields the relation A = LU, where L is lower triangular (the inverse of a lower triangular matrix is lower triangular) and U is upper triangular. This is the LU-decomposition for A in this case.

Now we only need consider what to do when we encounter 0 on the main diagonal. In this case, if αii = 0, there must be a non-zero entry below the diagonal in the ith column; that is, there is an entry αji ≠ 0 with j > i. If this were not the case then A would be singular. For instance if we began with

1 2 3
A= 4 8 6
7 5 9

Then upon clearing the first column, we have

1 2 3
A1 = 0 0 -6 .
0 -9 -12

Now if we apply a type-1 elementary operation, E(2,3), this will produce the upper triangular matrix

1  2   3
0 -9 -12   = E(2,3)(E-7(1)+3 E-4(1)+2)A
0  0  -6

In particular, whenever we encounter a zero on the diagonal, then we must introduce a type-1 elementary matrix which interchanges the rows to bring a nonzero entry to the diagonal. Looking more closely at the product of a type-1 and a type-3 we see easily that

E(i,j) Eα(s)+t = Eα(s)+t E(i,j)

provided s, t are not elements of the set {i,j}. Otherwise

E(i,j) Eα(i)+t = Eα(j)+t E(i,j) and E(i,j) Eα(j)+t = Eα(i)+t E(i,j).

These equations allow us to restate (2.3.1) as

(2.3.1') (Ln ··· L2 L1)(Pm ··· P2 P1)A = U,

where the Pj are type-1 elementary matrices. This brings us to the following theorem.

Theorem 2.3.1: If A is a non-singular n by n matrix, then there exists a lower triangular matrix L, an upper triangular matrix U and a matrix P with PA = LU. The upper triangular matrix U is the row echelon form of A, L is a product of type-3 elementary matrices and P is a product of type-1 elementary matrices.

The matrix, P, which is the product of type-1 matrices is called a permutation matrix. The idea here is simple.
Each type-1 elementary operation permutes the rows of the matrix A. In turn the product is another permutation
of the rows, hence the term, permutation matrix.

For any nonsingular matrix A, this information is available via the Mathematica function LUDecomposition.
For instance, for the matrix,

1 2 3
A= 4 8 6
7 5 9

LUDecomposition[A]

will return the following output: {{{1,2,3},{7,-9,-12},{4,0,-6}},{1,3,2},1}. The first item in the list is a 3 by 3
matrix. The upper triangular part of this matrix is U. The lower triangular part (after placing 1's along the
diagonal) is L. Now we know that

U =
1  2   3
0 -9 -12
0  0  -6

L =
1 0 0
7 1 0
4 0 1

Next {1,3,2} is interpreted as the permutation that sends 1 → 1, 2 → 3, 3 → 2. Hence,

P =
1 0 0
0 0 1
0 1 0
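
A sketch of how one might unpack this output (UpperTriangularize and LowerTriangularize are built-in functions in version 7; the check at the end uses the fact that L.U reproduces the rows of A in the order given by the permutation):

A = {{1, 2, 3}, {4, 8, 6}, {7, 5, 9}};
{lu, perm, cond} = LUDecomposition[A];
u = UpperTriangularize[lu];
l = LowerTriangularize[lu, -1] + IdentityMatrix[3];
p = IdentityMatrix[3][[perm]];    (* the permutation matrix P *)
Print[MatrixForm[l . u]];         (* equals A[[perm]], that is, P.A *)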

The final entry in the output is called the condition number. This number is useful when estimating the error. We develop this idea now. For this purpose we will denote the standard basis vectors by ei.

Recall that the norm of a vector v = ξ1 e1 + ··· + ξn en is given by ||v|| = (Σi ξi^2)^(1/2). Furthermore if ||v|| ≤ 1, then |ξi| ≤ 1 for each i. Now for this vector v and any linear transformation A, we use the triangle inequality to get

||Av|| = ||A Σi ξi ei|| = ||Σi ξi (A ei)|| ≤ Σi |ξi| ||A ei|| ≤ Σi ||A ei|| = M.

If we set ||A|| = M and call this number the norm of A, then we have proved ||Av|| ≤ ||A|| provided ||v|| ≤ 1. More generally, for any v, (1/||v||) v has norm 1 and so ||Av|| = ||v|| ||A (1/||v||) v|| ≤ ||A|| ||v||. We have derived that for any vector v,

(2.3.2) ||Av|| ≤ ||A|| ||v||.

We are now able to do the error analysis for a linear system of equations. Let Ax = b be a linear system with nonsingular coefficient matrix A. Let x̂ be a computed solution and set b̂ = Ax̂. Then r = b − b̂ is called the residual. Notice that the residual is a vector. This is also the case for the error, e = x − x̂. We now see that the norm of the error is bounded by the norm of A inverse times the norm of the residual.

(2.3.3) ||e|| = ||x − x̂|| = ||A^-1 b − A^-1 b̂|| = ||A^-1 (b − b̂)|| ≤ ||A^-1|| ||b − b̂|| = ||A^-1|| ||r||.

And

(2.3.4) ||b|| = ||Ax|| ≤ ||A|| ||x||.

Therefore, from (2.3.3) and (2.3.4) we get a bound for the norm relative error.

(2.3.5) ||e||/||x|| ≤ ||A^-1|| ||r||/||x|| ≤ ||A^-1|| ||r|| (||A||/||b||) = ||A|| ||A^-1|| (||r||/||b||).

The value ||A|| ||A^-1|| is the condition number for A. Equation 2.3.5 states that the relative norm error is bounded by the condition number times the relative norm residual. The latter quantity is known without knowledge of x. Certainly we want the condition number to be small. If the condition number is very large, then the coefficient matrix for the system is singular or nearly singular and the results returned by LinearSolve are not considered reliable. When this is the case Mathematica will return a warning that the coefficient matrix is 'ill conditioned.'
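
As a rough sketch, the condition number may be estimated with the built-in matrix norm. (For a numerical matrix, Norm gives the spectral norm, which is not identical to the norm M defined above, but it serves the same purpose in (2.3.5).)

a = {{1., 2., 3.}, {4., 8., 6.}, {7., 5., 9.}};
Norm[a]*Norm[Inverse[a]]    (* an estimate of ||A|| ||A^-1|| *)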

Exercises:

1. Apply LUDecomposition to the following matrices. Write out L, U and P. Which matrices are ill conditioned?

a.
1. 1.  0.
1. 1.  3.
0. 1. -1.

b.
1. 2. 1.  7.
2. 0  1.  4.
1. 0  2.  5.
1. 2. 3. 11.

c.
1. 2. 3.
1. 1. 1.
5. 7. 9.

2. The Mathematica statement Eigensystem[A] returns n + 1 vectors for an n by n matrix A. The entries of the first vector are the eigenvalues of A. The remaining vectors are the corresponding eigenvectors. Apply Eigensystem to the matrices listed in (1). Recall that a matrix is singular if it has zero as an eigenvalue. When looking at computer output, a number close to zero should be considered zero. Which of the matrices in (1) are singular?

3. The following exercise leads to an estimator for the condition number of a nonsingular matrix A.
a. Let v be an eigenvector for A with eigenvalue α. Prove that ||Av|| = |α| ||v||. Prove that |α| ≤ ||A||.
b. Prove that α is an eigenvalue for A if and only if 1/α is an eigenvalue for A^-1. (Hint: consider A^-1Av = v.)
c. Let αmax be the eigenvalue of A with largest absolute value, and let αmin be the eigenvalue of A with smallest absolute value. Prove that |αmax| ≤ ||A|| and 1/|αmin| ≤ ||A^-1||.
d. Prove that |αmax|/|αmin| ≤ C, the condition number of A.

4. Functions of Several Variables: Finding Roots and Extrema

The techniques we develop in this section are also referred to as Newton's method since they use derivatives and a single initial estimate to establish an iterative process to search for a root. They apply to differentiable functions f : R^n → R^m. Therefore they apply to linear systems which are not square or to square linear systems whose coefficient matrix is singular. On the other hand, setting g = f·f (alternately g = f^2 if m = 1), the roots of f are minima of g. Hence, we need only consider the problem of finding extrema.

The techniques are applicable to optimal control theory and sensitivity analysis. Sensitivity analysis is of particular interest. Here you define a function f which measures an outcome from given independent (input) variables. However the parameters necessary to express f may not be known with certainty. For instance a formula in finance may depend on the variance of a random variable. But it is often the case that the variance, σ^2, is not known exactly. Sensitivity analysis attempts to determine how the outcome will vary with changes in the estimate for σ^2. In effect the analyst is seeking areas where df/dσ is maximal.

We begin by looking at an example. Consider f(x, y) = x^2 + y^2 mapping R^2 to R (see Figure 4.1). The graph of f is a subset of R^3, and the single minimum of f is at (0, 0). Suppose we start the search at (1, 2) and that γ(t) is a line in the domain of f containing (1, 2). Then h = f ∘ γ is a function from R to R; hence, ordinary single variable max/min techniques apply to h. Now γ(t) = (1, 2) + t v, where v is the direction vector for γ. Recall that the gradient of f is the direction in the domain of maximal change. Hence it is reasonable to take v = ∇f(1, 2). For the case at hand ∇f = (2x, 2y). So evaluating ∇f at (1, 2) we get γ(t) = (1, 2) + t(2, 4) = (1 + 2t, 2 + 4t). Notice that (0, 0), the location of the extremum, lies on γ. Hence, the minimal value of h is also the minimal value for f.

Figure 4.1: f(x, y) = x^2 + y^2

Nevertheless we continue with the calculations: h(t) = (1 + 2t)^2 + (2 + 4t)^2, so dh/dt = 20(1 + 2t). Solving dh/dt = 0 yields t1 = -0.5. Now γ(t1) = (0, 0), the location of the minimum of f.

This was a particularly simple case. In general γ(t1) = (x1, y1) will not be the location of the extremum. But if we start at (x0, y0), are seeking a minimum and f(x1, y1) < f(x0, y0), then it is reasonable to repeat the process using (x1, y1) and end when the value of f is sufficiently small, the iteration count has reached a predetermined limit, or the values of f are no longer decreasing.

We now state the general process for functions of several variables. Suppose we seek a minimum of f mapping R^n to R^m.

1. Start with a function f : R^n → R^m and an initial estimate x0 in R^n.
2a. If m > 1, replace f by f·f so that f takes values in R.
2b. If m = 1, replace f by f^2 so that the roots of f become minima.
3. Compute a = ∇f(x0).
4. Set γ(t) = x0 + t a and h = f(γ(t)).
5. Compute h'(t) and solve h'(t) = 0 via FindRoot to yield t0.
6. Set x1 = γ(t0).
7. If f(x0) < f(x1), then exit (the process has failed).
8. If the iteration count exceeds the maximum, exit (the process has failed).
9. If |f(x1) − f(x0)| is sufficiently small, exit (possible success).
10. Go back to Step 3 using x1.

Before proceeding, we should comment on Steps 4 and 5. If you have defined f in Step 1 using the Mathematica format f[x_, y_] and have defined γ to take values as length 2 lists of the form {a, b} (as 2-tuples), then you cannot directly evaluate f at γ in Step 4. The problem is that Mathematica does not identify functions of two variables with functions of 2-tuples. You get around this by defining h via h[t_] = f[γ[t][[1]], γ[t][[2]]]. In Step 5, the question is how to set the initial estimate for t when setting up the FindRoot. Note that since γ(0) = x0, it is reasonable to use t = 0 as the initial estimate.
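
Here is a minimal sketch of one pass through Steps 3 to 6 for the first example, f(x, y) = x^2 + y^2 starting at (1, 2); the variable names are our own:

f[x_, y_] = x^2 + y^2;
x0 = {1., 2.};
grad = {D[f[x, y], x], D[f[x, y], y]} /. {x -> x0[[1]], y -> x0[[2]]};   (* Step 3 *)
gamma[t_] = x0 + t*grad;                                 (* Step 4 *)
h[t_] = f[gamma[t][[1]], gamma[t][[2]]];
t0 = t /. FindRoot[h'[t] == 0, {t, 0}];                  (* Step 5 *)
x1 = gamma[t0]                                           (* Step 6: returns {0., 0.} *)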

We consider another example. Suppose f(x, y) = Cos^2(x) e^y + 1. The minima for the function occur at the odd multiples of x = π/2. (See Figure 4.2.) If we start the search at (0, 1), then t0 = -35.7979, (x0, y0) = (0, -96.2817) and we are way out on the negative y-axis. Even though f is within about 1.4 × 10^-47 of its minimum value, no further processing will take us any closer to an actual minimum.

Figure 4.2: f(x, y) = Cos^2(x) e^y + 1

There are many alternate choices for the direction vector for γ. One choice is similar to the secant method. In this case we begin with the Taylor expansion for f.

(2.4.1) f(x + s) = f(x) + ∇f(x)·s + (1/2) s^T H(x) s + R2,

where R2 is the remainder term and H(x) is the Hessian of f. The ijth entry of H(x) is given by ∂^2 f/∂xi∂xj (x).

Now if f(x + s) = f(x), we might expect a local extremum between x + s and x. This expectation would follow from Rolle's theorem. Hence, we take γ to be the line connecting x and x + s. In this case (1/2) s^T H(x) s is approximately equal to −∇f(x)·s. Hence if we solve

(2.4.2) (1/2) H(x) s = −∇f(x),

and use the result as the direction vector for γ, then we can expect a local extremum between x and x + s. This is not a problem if the Hessian is nonsingular. In this case replace Step 3 in the process by the following statement.

3'. Compute a as the solution to the linear system (1/2) H(x) s = −∇f(x).
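
A sketch of Step 3' in Mathematica, assuming f[x_, y_] and a current estimate x0 = {a, b} are defined:

hess = D[f[x, y], {{x, y}, 2}] /. {x -> x0[[1]], y -> x0[[2]]};   (* the Hessian H at x0 *)
grad = D[f[x, y], {{x, y}}] /. {x -> x0[[1]], y -> x0[[2]]};      (* the gradient at x0 *)
a = LinearSolve[hess/2, -grad];    (* the direction vector for the line gamma *)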

Exercises:

1. Let f(x, y) = (x + y, x + y), that is, f is given by the 2 by 2 matrix all of whose entries are 1 applied to (x, y). Solve f = 0 using (2, -1) as the initial estimate. Note that the kernel of f has dimension 1. Find all solutions to f(x, y) = (2, 2). (Remember to start with f·f.)

2. Use the Hessian method to find a minimum of f(x, y) = x^2 + y^2. Use (2, 2) as the initial estimate.

3. Consider the linear transformation

L(x, y, z, w) = A · (x, y, z, w) = (4x − 2y + 3z − 5w, 3x + 3y + 5z − 8w, −6x − y + 4z + 3w, −4x + 2y − 3z + 5w),

where

A =
 4 -2  3 -5
 3  3  5 -8
-6 -1  4  3
-4  2 -3  5

a. Use LUDecomposition to determine if A is singular or non-singular. (Do not forget to introduce a decimal point to the data.) How does this impact the problem of solving an equation of the form L(x, y, z, w) = (x0, y0, z0, w0)?

b. Use the gradient method to solve L(x, y, z, w) = (1, 1, 1, -1).

• Use (5, 5, 5, 5) for the initial estimate.
• Use at least 35 iterations.
• Use 10^-5 as the tolerance in Step 9.
• Make certain to use two 'if' statements, one for Step 7 and one for Step 9.

c. Redo part b using (1, 2, 3, 4) as the initial estimate.

d. Why is it possible for the solutions to b and c to be different?

e. Prove that if v is the solution to b and w is the solution to c, then v − w is a solution to L(x, y, z, w) =
(0, 0, 0, 0). What is the kernel of the linear transformation?

f. Use LinearSolve to get a solution to L(x, y, z, w) = (1, 1, 1, −1). Can this solution be trusted? Why? What
was the condition number from part a?

Chapter 3 Interpolating and Fitting

Introduction

We introduce the following terminology. Suppose we are given n points P₁, ···, Pₙ in the plane R². We may
want to find a curve (function) which passes through the points (interpolating) or a curve which passes near the
points (fitting). If we want the curve to pass through the points, then we may have to accept anomalies on the
curve. If we are willing to accept a curve that only passes near the points, then we may place stronger
restrictions on the curve. We will soon see how this give and take materializes.

Among the several techniques there is no 'best of all', no method that gives the best results under all
circumstances. The spline, with applications in computer graphics, visualization, robotics and statistics, is
perhaps the most widely used. The curve is twice continuously differentiable, depends only on point data and
faithfully reflects the tendencies of the input data. On the other hand, among the techniques we present, splines
have the most complex mathematical foundation. For all of these reasons we spend the most time with them.

On the other hand polynomial interpolation is the oldest of the techniques. It has the most developed theory and
is widely used as a technique for approximating integrals and approximating solutions to differential equations.

Least squares fitting in the linear case provides the numerical technique used for linear regression. We see in
Part 3 that higher degree least squares plays an important role in error estimation for the finite element method.

The remaining technique is Bezier interpolation. This procedure was developed originally to be used by
engineers when resolving artist designs. In particular, Bezier curves were developed as a tool to help an
engineer resolve a concept drawing into three dimensional coordinates.

1. Polynomial Interpolation

We begin by looking at the Taylor expansion of a function. Consider the function f(x) = xe⁻ˣ − 1. Plotting this
function on the interval [1, 4]:

f[x_] = x*Exp[-x] - 1;
Plot[f[x], {x, 1, 4}]

shows a decreasing function with an inflection point.

Figure 1.1: f(x) = xe⁻ˣ − 1

Thinking of this curve as being more or less cubic, we can develop the cubic Taylor polynomial for f expanded
at the midpoint, x = 2.5.

g(x) = f(2.5) + f′(2.5)(x − 2.5) + (1/2!) f″(2.5)(x − 2.5)² + (1/3!) f‴(2.5)(x − 2.5)³.

When developing g you will need to compute the derivatives of f. Recall that the derivatives of f are computed
in Mathematica via D[f[x],x], D[f[x],x,x], etc. If you plot f and g on the same axes you will see that the cubic
Taylor polynomial provides a remarkably good approximation of the more complex function. Figure 1.2 shows
the graph of g together with the graph of f. The graph of g is above f on the left and below on the right.
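A minimal sketch, assuming f is defined as above; Series computes the Taylor expansion and Normal strips the
remainder term.

g[x_] = Normal[Series[f[x], {x, 2.5, 3}]];   (* cubic Taylor polynomial at x = 2.5 *)
Plot[{f[x], g[x]}, {x, 1, 4}]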

Figure 1.2: f together with the Taylor expansion at x = 2.5

A numerical measurement of the 'goodness of fit' is given by the L² norm of f − g,

‖f − g‖₂ = [∫₁⁴ (f − g)² dx]^(1/2).

This is called the norm error. The mean norm error is

[(1/(4 − 1)) ∫₁⁴ (f − g)² dx]^(1/2).
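Both quantities are easily computed with NIntegrate; a minimal sketch, assuming f and g are defined as above:

Sqrt[NIntegrate[(f[x] - g[x])^2, {x, 1, 4}]]          (* norm error *)
Sqrt[NIntegrate[(f[x] - g[x])^2, {x, 1, 4}]/(4 - 1)]  (* mean norm error *)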

The finite Taylor expansion produces a high quality one point interpolation provided we know the original
function. However, suppose we have points and no function. Consider points P₁, ···, Pₙ₊₁ in R², Pᵢ = (xᵢ, yᵢ).
We are looking for a polynomial p of degree n that interpolates the n + 1 points. Suppose that p(x) = Σᵢ₌₀ⁿ αᵢxⁱ.
To determine p we must find the coefficients α₀, ···, αₙ. Notice that we can write the polynomial as a row
vector times a column vector,

(x⁰, x¹, ···, xⁿ)·(α₀, α₁, ···, αₙ)ᵀ = Σᵢ₌₀ⁿ αᵢxⁱ.

Our requirement for p is that it interpolate the n + 1 points; in particular, p(xᵢ) = yᵢ for each i. Hence, we have

p(xᵢ) = (xᵢ⁰, xᵢ¹, ···, xᵢⁿ)·(α₀, α₁, ···, αₙ)ᵀ = yᵢ.

Collecting these equations we then get the following matrix equation

( 1   x₁     ···  x₁ⁿ   ) ( α₀ )   ( y₁   )
( ⋮   ⋮           ⋮    ) ( ⋮  ) = ( ⋮    )
( 1   xₙ₊₁   ···  xₙ₊₁ⁿ ) ( αₙ )   ( yₙ₊₁ )

where xᵢ⁰ = 1. This is a linear system of equations where the xᵢ and yᵢ are known while the αᵢ are unknown.
Hence we can use the LinearSolve function in Mathematica to find the coefficients of p provided the coefficient
matrix is non-singular. The matrix is called a Vandermonde matrix. It is always nonsingular provided the
xᵢ are distinct.
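A minimal sketch, assuming f is defined as above, using the four points with xᵢ = 1, 2, 3, 4:

xs = {1., 2., 3., 4.}; ys = f /@ xs;
vand = Table[xs[[i]]^j, {i, Length[xs]}, {j, 0, Length[xs] - 1}];
α = LinearSolve[vand, ys];                      (* the coefficients of p *)
p[x_] = α.Table[x^j, {j, 0, Length[xs] - 1}];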

Theorem 3.1.1: The Vandermonde matrix

( 1   x₁     ···  x₁ⁿ   )
( ⋮   ⋮           ⋮    )
( 1   xₙ₊₁   ···  xₙ₊₁ⁿ )

is nonsingular, provided the scalars xᵢ are distinct.

Proof: Indeed, the Vandermonde matrix is singular only if the columns are dependent, hence only if there are
scalars β₀, ···, βₙ, not all zero, with

β₀ (1, ···, 1)ᵀ + β₁ (x₁, ···, xₙ₊₁)ᵀ + ··· + βₙ (x₁ⁿ, ···, xₙ₊₁ⁿ)ᵀ = (0, ···, 0)ᵀ.

Hence, for each i = 1, 2, ..., n+1 we have

β₀ + β₁xᵢ + ··· + βₙxᵢⁿ = 0.

But the polynomial β₀ + β₁x + ··· + βₙxⁿ is not zero, has degree at most n and therefore has at most n distinct
roots. However, we have just shown that it has n + 1 distinct roots, x₁, ..., xₙ₊₁. As this is impossible, we are led
to the conclusion that the Vandermonde matrix is nonsingular. ∎

There is another way to do polynomial interpolation. The outcome is the same, but nevertheless, the approach
does provide insight. As in the previous case we begin with n+1 points in R², denoted P₁, ..., Pₙ₊₁ with
Pᵢ = (xᵢ, yᵢ). For each i, we set

ℓᵢ(x) = [(x − x₁)···(x − xᵢ₋₁)(x − xᵢ₊₁)···(x − xₙ₊₁)] / [(xᵢ − x₁)···(xᵢ − xᵢ₋₁)(xᵢ − xᵢ₊₁)···(xᵢ − xₙ₊₁)]
      = ∏_{j≠i} (x − xⱼ)/(xᵢ − xⱼ).

It is not difficult to see that the polynomials ℓᵢ(x) satisfy ℓᵢ(xᵢ) = 1, ℓᵢ(xⱼ) = 0 whenever j ≠ i, and that
q(x) = Σᵢ₌₁ⁿ⁺¹ yᵢℓᵢ(x) interpolates the given points. (See Exercise 5.) The polynomials ℓᵢ(x) are called Lagrange
polynomials. We now see that the two polynomial interpolations, p derived from the Vandermonde matrix and q
derived from the Lagrange polynomials, are in fact the same.
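A sketch of the Lagrange construction, assuming the lists xs and ys from the previous sketch:

lag[i_][x_] := Product[(x - xs[[j]])/(xs[[i]] - xs[[j]]),
   {j, Delete[Range[Length[xs]], i]}]
q[x_] := Sum[ys[[i]] lag[i][x], {i, Length[xs]}]
Expand[q[x]]    (* the same polynomial as p[x] from the Vandermonde approach *)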

Theorem 3.1.2: Referring to the given notation, p(x) = q(x).

Proof: We begin by setting r = p − q. Hence, r is a polynomial of degree at most n. Since p(xᵢ) = yᵢ = q(xᵢ) for
each i = 1, 2, ..., n+1, then r has n+1 roots, x₁, ..., xₙ₊₁. But if r is not identically zero, then it can have at most n
roots. Therefore, r = 0 and p = q. ∎

If the points Pᵢ lie on the graph of a function f, then it is natural to ask how well p approximates f. If f has at
least n+1 continuous derivatives then we can estimate the error, e(x) = f(x) − p(x).

Theorem 3.1.3: Suppose that f is a real valued function defined on an interval [a, b] and suppose that f has at
least n+1 continuous derivatives. Further, take a ≤ x₁ < ... < xₙ₊₁ ≤ b, with f(xᵢ) = yᵢ. If p is the polynomial
interpolation of the points (xᵢ, yᵢ), then the error e(x) = f(x) − p(x) is given by

(3.1.1) e(x) = [f⁽ⁿ⁺¹⁾(ξ)/(n+1)!] ∏ᵢ (x − xᵢ),

for some ξ in (a, b). In particular, |e(x)| is bounded by [M/(n+1)!] (b − a)ⁿ⁺¹, where M is the maximal value of
|f⁽ⁿ⁺¹⁾| on the interval.

Proof: We define g(x) = e(x)/∏ᵢ(x − xᵢ), so that e(x) = f(x) − p(x) = ∏ᵢ(x − xᵢ) g(x). Next take ζ ∈ [a, b]
distinct from the xᵢ and set

h(x) = f(x) − p(x) − ∏ᵢ(x − xᵢ) g(ζ).

Note that we cannot be certain that g is defined at the xᵢ, however our choice of ζ ensures that h is defined on
[a, b] with n+1 continuous derivatives.

Now each xᵢ is a root of h and in addition h(ζ) = e(ζ) − ∏ᵢ(ζ − xᵢ) g(ζ) = 0. Hence, h has n+2 roots in the
interval [a, b]. Furthermore, h is continuous on the interval and differentiable on the open interval (a, b). Hence,
we may apply Rolle's theorem to the interval between each pair of successive roots. Therefore, between each pair
of roots there is a root of the derivative of h. Hence, dh/dx has at least n+1 roots on the interval (a, b). Repeating
this argument, d²h/dx² has at least n roots in (a, b). Continuing, the kth derivative of h has at least n + 2 − k
roots, so that the (n+1)st derivative has at least 1 root. We denote this root by ξ = ξ_ζ since ξ depends on our
choice of ζ. Now

0 = h⁽ⁿ⁺¹⁾(ξ) = f⁽ⁿ⁺¹⁾(ξ) − p⁽ⁿ⁺¹⁾(ξ) − g(ζ) dⁿ⁺¹/dxⁿ⁺¹ ∏ᵢ(x − xᵢ) |_{x=ξ}.

But p has degree n, so p⁽ⁿ⁺¹⁾ = 0. Also dⁿ⁺¹/dxⁿ⁺¹ ∏ᵢ(x − xᵢ) = (n+1)! (irrespective of ξ). Therefore,
g(ζ)(n+1)! = f⁽ⁿ⁺¹⁾(ξ), or

e(x) = [f⁽ⁿ⁺¹⁾(ξ)/(n+1)!] ∏ᵢ(x − xᵢ),

where ξ is derived via repeated applications of Rolle's theorem and depends on the choice of ζ. However, the
value f⁽ⁿ⁺¹⁾(ξ) is independent of ζ. Now for the final statement on the bound for the error magnitude we note
that since f is n+1 times continuously differentiable, then f⁽ⁿ⁺¹⁾ is continuous and hence has a maximum value
on the interval. ∎

The bound for the absolute error is important as it forms the basis for numerical integration. In turn it is
therefore critical for the supporting theory of the piecewise collocation method, a technique for approximating
the solution of a partial differential equation. On the other hand, the estimate for the error magnitude is of
little use if we do not know f. Indeed it is not difficult to find functions where M is very large. Nor is it
difficult to find functions where the error is large. The following example is a case in point.

Returning to the function f(x) = xe⁻ˣ − 1 and the four points Pᵢ = (xᵢ, f(xᵢ)), xᵢ = 1, 2, 3 and 4, the polynomial
interpolation, p(x), of the points will again provide an approximation of f by a cubic polynomial. As in the case
of the Taylor interpolation, it is remarkably close to f.

On the other hand consider the function f(x) = 1/(1 + x²). In this case, pick a finite sequence of points along the
graph of f, which are symmetric about the y-axis. Use these points to produce a polynomial interpolation of
f. (See Exercise 3 below.) The problem is that the polynomial will look nothing like the function. Further, the
more points you choose the less the polynomial resembles f. Looking at the graph of f we see that the function
seems not to be a polynomial function. (Note the asymptotic behavior. No polynomial can reproduce this type
of behavior.) Hence we should not expect that there is a polynomial function that interpolates it well.

There is another problem with polynomial interpolation. Consider again the function f(x) = 1/(1 + x²) and select
five points: P₁ = (−4, 1/17), P₂ = (−2, 1/5), P₃ = (2, 1/5), P₄ = (4, 1/17) from the graph of f, and P = (0, y)
where y ∈ [0.2, 0.3]. Figure 1.3 shows the resulting polynomials for three values of y. Suppose that the location
of the points came from some measuring or sampling process. Then small errors just in this one y yield
significantly different results. Looking at the resulting curves we see that the shape of the curves is different.
Further, the change in y is magnified 20 times at p(5). This is an inherent problem with polynomial
interpolation. The technical term for the problem is that polynomial interpolation lacks local control. Later we
develop spline curves. These curves were developed precisely to resolve the local control problem.

Figure 1.3a: y = 0.3, p(5) = 0.11.  Figure 1.3b: y = 0.25, p(5) = −0.04.  Figure 1.3c: y = 0.2, p(5) = −1.9.

In spite of the problem we just noted, polynomial interpolation is an important and productive tool for numeri-
cally solving differential equations. When this technique is used special care is taken to ameliorate the problem
we see in Figure 1.3.

Because the Taylor expansion requires more information than is usually available, it is often ignored as an
interpolation technique. However, there is an important application, which should not be ignored. In the next
section we will develop a class of parametric cubic interpolations. Consider the setting where β(t) = (β₁(t),
β₂(t)) in R² and each βᵢ is an ordinary cubic polynomial. When β represents a function, then it is possible to
solve x = β₁(t) for t and then substitute this into β₂ to yield β = (x, f(x)). However the resulting function is
rarely integrable. On the other hand you can get values for f and its derivatives. Hence, you can write the cubic
Taylor expansion for f and this is easily integrated.

Finally, in Exercise 7 we introduce the idea of piecewise polynomial interpolation. The basic idea of polynomial
interpolation is that the more points we interpolate, the better the polynomial will approximate the original
function. However, as we add more and more points the degree of the polynomial increases. In piecewise
polynomial interpolation, we subdivide the interval into smaller and smaller subintervals while interpolating the
function by polynomials of fixed degree on each subinterval.

Exercises:

1. Compute the norm error and the mean norm error for the function f(x) = xe⁻ˣ − 1 and its cubic Taylor
expansion about x = 2.5. Use the interval [1, 4].

2. For f(x) = xe⁻ˣ − 1,

a. Compute the polynomial interpolation, p, for the points Pᵢ = (xᵢ, f(xᵢ)), xᵢ = 1, 2, 3 and 4.
b. Plot the graph of f and p on the same axes for the interval [1, 4].
c. Compute the norm error and mean norm error.
d. Is p better or worse than the cubic Taylor interpolant?
e. Use Theorem 3.1.3 to estimate the maximal absolute error for the interpolation p.

3. Compute the polynomial interpolation of the points (xᵢ, 1/(1 + xᵢ²)) for xᵢ = −2, −1, 0, 1, 2. Plot the
polynomial against the graph of f(x) = 1/(1 + x²). Compute the norm error and mean norm error.

4. Repeat Exercise 3 with additional points on the x-axis at 3 and −3, 4 and −4. Does this produce a better
approximation of the function f?

5. Consider the points Pᵢ = (xᵢ, yᵢ), i = 1, 2, ..., n+1, in the real plane and the corresponding Lagrange
polynomials ℓᵢ.
a. Prove that for each i, ℓᵢ is a degree n polynomial with ℓᵢ(xⱼ) = 0 if j ≠ i, and ℓᵢ(xᵢ) = 1.
b. Prove that q(x) = Σᵢ₌₁ⁿ⁺¹ yᵢℓᵢ(x) interpolates the given points.

6. Use Theorem 3.1.3 to estimate the maximal absolute error in 3 and in 4. Does adding the additional points
increase or decrease the error estimate?

7. Consider the function f(x) = 1/(1 + x²) on the interval [−4, 4].
a. Determine the maximal value of |f⁽³⁾(x)| on the interval.
b. Divide the interval into 40 subintervals of length 0.2 = (4 − (−4))/40. In particular determine −4 = a₀ < a₁ <
... < a₄₀ with aₖ₊₁ − aₖ = 0.2.
c. Compute the second degree polynomial interpolation of f on the subinterval [aₖ, aₖ₊₁] using the three values
aₖ, (aₖ + aₖ₊₁)/2 and aₖ₊₁.
d. Plot the result of part c and overlay the plot of f.
e. Use Theorem 3.1.3 to prove that the absolute error |e(x)| is bounded by (|μ|/6)(0.2)³, where μ is the value
computed in part a.
f. Prove that as the number of subintervals goes to ∞, the error converges to zero.

8. Researchers reporting in a chemical engineering journal reported the following data on tungsten production
as a function of temperature measured in degrees Kelvin.

t      700    800    900    1000
f(t)   0.067  0.083  0.097  0.111

They determined that the data fit to the following function (to 3 decimal place accuracy),

f(t) = 0.02424 (t/303.16)^1.27591.

a. Use a cubic polynomial to interpolate the given data. Use this polynomial to estimate the values at t = 750,
850 and 950.
b. Assuming that f is the correct predictor for tungsten, determine the mean absolute error for the three esti-
mates in part a.
c. Again assuming that f is correct, use Theorem 3.1.3 to calculate the estimated error for the cubic polynomial
interpolation as a function of t. Then determine the mean absolute estimated error for the three values of t.
d. Is the actual mean absolute error smaller than the estimated mean absolute error?

2. Bezier Interpolation

Bezier interpolation arose around 1960 to solve a problem in the manufacturing industry. When a new product
is begun, a designer will produce a rendering. Engineers will then produce specifications from the designer's
drawing. In the automobile or aircraft industries, the engineers' task was indeed difficult. The designer would
produce a concept drawing of the car or aircraft. From this drawing, the engineers would have to specify the
requirements for the sheets of metal for the exterior and the necessary frame, and then they could infer the
shapes and sizes of the spaces for the passenger compartments, the engine compartment, etc. The task was
nearly impossible, causing cost overruns and time delays.

The tools at their disposal were primitive. Often they would produce a wooden model of the object, then slice
the wooden object with a saw to create a sequence of cross sections. Next the cross sections were projected on
a large screen at actual size, then these images were traced and measured to yield the data necessary for
construction. This was the context when Bezier interpolation was introduced.

The requirement is to infer a parametric cubic curve with end points at B₁ and B₄, and with tangent lines at
these end points passing through B₂ and B₃. If we designate the parametric curve in R² as β(t) = (β₁(t), β₂(t))
with t in [0, 1], then these requirements may be stated as

β(0) = B₁, β(1) = B₄; (d/dt)β(0) = 3(B₂ − B₁), (d/dt)β(1) = 3(B₄ − B₃).

Bezier, an engineer in the French automobile industry, came up with the following procedure. It is based on a
geometric construction. First fix four points, B₁, B₂, B₃, B₄, called guide points. Then connect the four guide
points with line segments. (See Figure 2.1a.) Next fix a real t in the interval [0, 1]. Using t, locate the point B₁₁ = B₁ + t
(B₂ − B₁) on the line segment connecting B₁ to B₂. Similarly, locate B₁₂ and B₁₃ between B₂, B₃ and B₃, B₄,
respectively, and connect these points with line segments. (See Figure 2.1b.) Repeat the process with the three
points B₁₁, B₁₂ and B₁₃ to derive two additional points B₂₁ and B₂₂ on the segments connecting B₁₁ to B₁₂ and
B₁₂ to B₁₃. Finally calculate β(t) = B₂₁ + t(B₂₂ − B₂₁). (See Figure 2.1c.) Writing β(t) in terms of the original
four points we get

(3.2.1) β(t) = (1 − t)³B₁ + 3t(1 − t)²B₂ + 3t²(1 − t)B₃ + t³B₄.
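The construction translates directly into Mathematica; a minimal sketch, assuming the guide points b1, ..., b4
are given as coordinate pairs:

lerp[p_, q_, t_] := p + t (q - p);        (* the point t of the way from p to q *)
bezier[t_] := Module[{b11, b12, b13, b21, b22},
   b11 = lerp[b1, b2, t]; b12 = lerp[b2, b3, t]; b13 = lerp[b3, b4, t];
   b21 = lerp[b11, b12, t]; b22 = lerp[b12, b13, t];
   lerp[b21, b22, t]];

Expanding bezier[t] symbolically reproduces (3.2.1).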

Figure 2.1a: Four guide points with segments.  Figure 2.1b: 2nd-level points with line segments, t = 0.6.
Figure 2.1c: 3rd-level points with line segments and β(t).  Figure 2.1d: The Bezier curve.

Letting t vary in the unit interval, (3.2.1) describes a parametric cubic. Figure 2.1d shows the plot of β(t). The
associated coordinate functions are given by

β₁(t) = (1 − t)³x₁ + 3t(1 − t)²x₂ + 3t²(1 − t)x₃ + t³x₄,

β₂(t) = (1 − t)³y₁ + 3t(1 − t)²y₂ + 3t²(1 − t)y₃ + t³y₄,

where Bᵢ = (xᵢ, yᵢ).

Just as in the case of the polynomial interpolation, we require four points to do a cubic interpolation. However
in this case the necessary information includes only two points on the curve (the starting point and ending
point) and the slope of the curve at these two points. More complicated curves can be constructed by piecing
successive Bezier curves together.

Return to the function f(x) = xe⁻ˣ − 1. We can derive the Bezier interpolation of f by setting B₁ = (1, f(1)) and
B₄ = (4, f(4)). We use the derivative of f at 1 and 4 to determine the other two guide points. Since the points on
the function graph are given by (x, f(x)), the tangent vectors to the graph are (d/dx)(x, f(x)) = (1, f′(x)). Hence,
the tangent vector at B₁ is (1, f′(1)). But this vector must also satisfy (1, f′(1)) = 3(B₂ − B₁). Therefore, B₂ =
B₁ + (1/3)(1, f′(1)). Similarly, B₃ = B₄ − (1/3)(1, f′(4)). As in the previous cases, the Bezier interpolation of f
is a good approximation of the original curve.
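A minimal sketch of this interpolation, assuming f[x_] = x*Exp[-x] - 1 as before:

fp[x_] = D[f[x], x];
b1 = {1., f[1.]}; b4 = {4., f[4.]};
b2 = b1 + (1/3) {1, fp[1.]}; b3 = b4 - (1/3) {1, fp[4.]};
β[t_] = (1 - t)^3 b1 + 3 t (1 - t)^2 b2 + 3 t^2 (1 - t) b3 + t^3 b4;
ParametricPlot[β[t], {t, 0, 1}]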

In Section 2.1 we showed a curve for which Newton's method failed because the process cycled; the third
estimated root was equal to the first, the fourth equal to the second and so forth. We created this curve using a
Bezier curve. We started with B₁ = (−1, 1) and B₄ = (1, 1). Next, we wanted the slope at B₁ to be −1/2 and
equal to 1/2 at B₄, so that Newton's method would return the points (1, 0) and (−1, 0). Using the technique
described above, we had

(1, −1/2) = 3(B₂ − B₁) and (1, 1/2) = 3(B₄ − B₃).

One purpose of interpolating points is to use the interpolating function to compute the integral of an unknown
function inferred from the points. We will see later that the parametric form of the Bezier curve is significantly
more difficult to deal with than the polynomial or Taylor interpolation.

Exercises:

1. Use the Bezier technique to interpolate f(x) = xe⁻ˣ − 1 for x ∈ [1, 4]. Plot f and β on the same axes.

2. Interpolate f(x) = 1/(x² + 1) between −2 and 2. Use one Bezier curve between −2 and 0 and another between
0 and 2.

3. Do the illustration at the end of Section 2.1. If the curve is given by β(t) = (β₁(t), β₂(t)), then use the
Mathematica statement Solve[β₁[t] == x, t] to solve for t in terms of x. Insert the result into β₂(t). The result
will give you an expression for the curve in the form f(x) = y. Use Expand to fully resolve f. Plot f.

4. Given a set of points P₀, P₁, ..., Pₙ with Pᵢ = (xᵢ, yᵢ) in the plane, the nth Bernstein polynomial is given by

pₙ(t) = Σᵢ₌₀ⁿ C(n, i) tⁱ (1 − t)ⁿ⁻ⁱ Pᵢ,

where C(n, i) = n!/(i!(n − i)!) is the usual binomial coefficient.

a. Prove that the cubic Bezier curve β(t) defined on four points is identical with the third Bernstein polynomial.
b. Prove that for any n, pₙ(0) = P₀ and pₙ(1) = Pₙ.
c. Use parts a, b to define a generalization of cubic Bezier curves.

There is a theorem in real analysis that states: any continuous function on a closed interval may be approximated
uniformly by a sequence of Bernstein polynomials.

3. Least Squares Fitting

We begin with a word of caution. Least squares fitting in the linear case also arises in the context of linear
regression. This is more of a coincidence than anything else. It is true that in both cases a line is fit to a finite
set of points. In addition, the line arises from the same minimization process. Beyond that the processes are
different and distinct. Least squares fitting in the numerical methods context is a procedure that begins with a
set of points, and then guides the researcher to a polynomial, which seems to fit well to the points. On the other
hand linear regression begins with a set of points sampled from a distribution and includes assumptions on the
distribution and the sample. Then a line is inferred. In particular the line is derived by minimizing the variance
of a related distribution. Furthermore, statistics are returned indicating confidence intervals for the slope and y-
intercept of the line. In addition a general statistic is returned, which indicates whether linear regression was a
reasonable approach to the data. In short, least squares fitting is a process that begins with a set of points and
returns a best fitting polynomial. Regression is a statistical process that applies to a sample from a distribution,
fits a line to the sample and returns statistical information about the reliability of the process. In this course we
are concerned only with the former.

In Part 3 we will see that least squares fitting plays an important role in error estimation for the finite element
method.

We begin with points P1, ···, Pn; Pi = (xi , yi ). We are expecting to find a line (y = mx + b) which best fits the
point set. In order to proceed we must define 'best fits'. Indeed, the term least squares refers to the following
definition of 'best fits'. Suppose we were to calculate the vertical distance from each of the points to the line
and then total the squares of all the distances. We will say that a line 'best fits' the point set, if this number (sum
of squared distances) is minimal. Notice that we have described a calculus max/min problem.

With this description in mind we write out the term for the total calculated distance as a function of the slope
and y-intercept of the line. Now, the vertical distance from Pᵢ = (xᵢ, yᵢ) to the line y = mx + b is |yᵢ − (mxᵢ + b)|.
Since we are heading toward a calculus style max/min process, the absolute value is inconvenient. Hence,
square each of these terms and then add to get a function σ with independent variables m and b,

σ(m, b) = Σᵢ (yᵢ − (mxᵢ + b))².

Next we apply standard max/min techniques to σ. First, we differentiate the dependent variable with respect to
each independent variable:

∂σ/∂m = Σᵢ 2(yᵢ − (mxᵢ + b))(−xᵢ) = 2 Σᵢ (mxᵢ² + (b − yᵢ)xᵢ),

∂σ/∂b = Σᵢ 2(yᵢ − (mxᵢ + b))(−1) = −2 Σᵢ (yᵢ − (mxᵢ + b)).

Setting these two terms to zero and reorganizing them a little we get

0 = Σᵢ (mxᵢ² + (b − yᵢ)xᵢ) = m(Σᵢ xᵢ²) + b(Σᵢ xᵢ) − Σᵢ xᵢyᵢ,

0 = Σᵢ (yᵢ − (mxᵢ + b)) = −m(Σᵢ xᵢ) − nb + Σᵢ yᵢ.

Now these two equations can be recast as a 2 by 2 linear system with unknowns m and b:

Σᵢ xᵢyᵢ = (Σᵢ xᵢ²)m + (Σᵢ xᵢ)b,

Σᵢ yᵢ = (Σᵢ xᵢ)m + nb.

Or in matrix notation,

( Σᵢxᵢ²   Σᵢxᵢ ) ( m )   ( Σᵢxᵢyᵢ )
( Σᵢxᵢ    n    ) ( b ) = ( Σᵢyᵢ   ).

Next set

A = ( x₁  ···  xₙ )
    ( 1   ···  1  ).

Then it is immediate that

AAᵀ = ( Σᵢxᵢ²   Σᵢxᵢ )
      ( Σᵢxᵢ    n    )

and

A (y₁, ···, yₙ)ᵀ = ( Σᵢxᵢyᵢ )
                   ( Σᵢyᵢ   ).

Hence we may rewrite the 2 by 2 system as

AAᵀ (m, b)ᵀ = A (y₁, ···, yₙ)ᵀ,

where A is as above.

This form of the linear system is most suitable for our calculations. It is straightforward to prove that the
coefficient matrix, AAᵀ, is necessarily non-singular provided that at least two of the xᵢ are distinct.

To get a feel for how this looks, consider the following example. Suppose we have points P₁ = (−5, 3), P₂ =
(−4, 2), P₃ = (−2, 7), P₄ = (0, 0), P₅ = (1, 5), P₆ = (3, 3), P₇ = (5, 5). The following figure shows the points
and the resulting least squares line.
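A minimal sketch of the computation for this example (the built-in Fit[pts, {1, x}, x] returns the same line):

pts = {{-5, 3}, {-4, 2}, {-2, 7}, {0, 0}, {1, 5}, {3, 3}, {5, 5}};
A = {pts[[All, 1]], Table[1, {Length[pts]}]};      (* the matrix A above *)
{m, b} = LinearSolve[A.Transpose[A], A.pts[[All, 2]]]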

Figure 3.1: Seven data points with least squares fit

If we had asked for a quadratic polynomial that best fit the point set, then we would be looking for three
coefficients, a, b and c. Setting up the problem as in the linear case to get σ(a, b, c), differentiating with respect
to the three variables, setting the resulting terms to zero and solving, we would get the following linear system:

AAᵀ (a, b, c)ᵀ = A (y₁, ···, yₙ)ᵀ,

where

    ( x₁²  ···  xₙ² )
A = ( x₁   ···  xₙ  )
    ( 1    ···  1   ).

There are similar expressions for the cubic least squares problem, etc. The data shown in Figure 3.1 would
seem to be cubic (see Exercise 1 below).

At the top of the section we mentioned that least squares fitting was separate and distinct from linear
regression. Before ending the section we add some details to that statement. The setting for linear regression
starts with two random variables, X and Y, together with the hypothesis that Y is a linear function of X. In
particular we are supposing that Y = aX + b, where the parameters a and b are to be determined. Then the
process is to select the parameters so as to minimize the variance of Y − (aX + b). When you do this calculation
against sample data (supposing that the sample was done with replacement), the process is exactly the same as
the one described above for the linear case. However, within the statistical context, the process returns values
that measure the correctness of the hypothesis and provide confidence intervals for the two parameters. In the
numerical methods setting, on the other hand, only the minimization itself is used.

In the numerical methods context there is no means to measure the correctness of the fit and no confidence
intervals for the parameters. However, in Chapter 13 we will see least squares fitting used to approximate the
solution to a partial differential equation. In this case the points that drive the least squares fitting will arise
from numerical processes. We will have a means to measure how well these values approximate the actual
values and then use the least squares process to fill in intermediate values.

We end this section with an important application. Exponential growth is common in biology as well as the
other sciences. For instance bacterial growth is exponential. Epidemics show exponential growth during their
early stages. Exponential growth is characterized by the statement that the rate of change of population size is
proportional to the current size. In particular, if f(t) represents the number of organisms in a bacterial culture
at time t, then 'rate of change is proportional to population size' means that df/dt = γf(t). Hence, solving this
differential equation, we get f(t) = αe^(βt), where β = γ and α = f(0).

Next we turn this situation upside down. Suppose we have pairs (tᵢ, yᵢ) of data, which because of the setting
we know to be related via an exponential, yᵢ = αe^(βtᵢ), but we do not know α and β. We can solve this problem
with least squares fitting. We write y = αe^(βt) and take the log of both sides. This yields Log[y] = Log[α] + βt.
Now we see that Log[y] is a linear function of t. Hence, we have the technique. First we take the Log of the yᵢ,
then we fit these values to the tᵢ using a linear least squares fitting. The result is a line, y = at + b, and β = a
and α = e^b. Exercise 6 is an example of this sort of problem.
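A minimal sketch, assuming the data is given as lists ts and ys:

line = Fit[Transpose[{ts, Log[ys]}], {1, t}, t];   (* fit Log[y] linearly in t *)
a = Coefficient[line, t]; b = line /. t -> 0;
α = Exp[b]; β = a;                                 (* y ≈ α Exp[β t] *)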

Exercises:

1. Fit the data P₁ = (−5, 3), P₂ = (−4, 2), P₃ = (−2, 7), P₄ = (0, 0), P₅ = (1, 5), P₆ = (3, 3), P₇ = (5, 5) to a
linear (Figure 3.1), a quadratic and a cubic. In each case calculate the sum of squares from the curve to the
points. Which curve gives the best fit?

2. For the linear case prove that AAᵀ is non-singular provided that there is a pair i and j with xᵢ ≠ xⱼ, as follows.
a. Let (x₁, ..., xₙ) and (y₁, ..., yₙ) be elements of Rⁿ. Prove that (Σᵢ xᵢyᵢ)² = (Σᵢ xᵢ²)(Σᵢ yᵢ²) − Σ_{i<j} (xᵢyⱼ − xⱼyᵢ)².
b. Set each yᵢ = 1 in (a), and conclude that (Σᵢ xᵢ)² = n(Σᵢ xᵢ²) − Σ_{i<j} (xᵢ − xⱼ)².
c. Prove that AAᵀ is singular if and only if (Σᵢ xᵢ)² = n(Σᵢ xᵢ²). Conclude that AAᵀ is non-singular provided the
stated condition holds.

3. State and prove a result analogous to Exercise 2 for the case of the quadratic least squares fitting.

4. Suppose that P₁, ..., Pₙ are points in R³, Pᵢ = (xᵢ, yᵢ, zᵢ), and let f(x, y) = a + bx + cy + dxy. Derive a
procedure to determine a, b, c and d so that Σᵢ₌₁ⁿ ‖(xᵢ, yᵢ, zᵢ) − (xᵢ, yᵢ, f(xᵢ, yᵢ))‖² is minimized. Note that we
use ‖(xᵢ, yᵢ, zᵢ)‖ to denote [xᵢ² + yᵢ² + zᵢ²]^(1/2).

5. Prove that for the least squares fit of n+1 points to an nth degree polynomial, A is the transpose of the
Vandermonde matrix. Conclude that the least squares fitting for this case is equivalent to the polynomial
interpolation process described in Section 3.1.

6. The following data is known to be related via an exponential, y = αe^(βt). Use the procedure described in
the section to identify α and β.

t 6.221 4.93898 11.4164 9.13563 13.7273 13.3522 10.6354 14.1693 9.67667 16.0862
y 1.66611 2.41672 4.70308 4.62067 7.95687 10.8308 14.5888 31.5388 47.2135 53.374

t 18.2355 13.243 16.2672 20.2108 15.1733 22.8416 23.3023 21.7688 23.9842 25.4667
y 67.8773 98.8846 155.689 274.55 494.796 754.759 758.462 1540.63 2059.3 2380.1

Plot the data along with y = αe^(βt) on the same axes.

7. This exercise is an extension of Exercise 8, Section 1. Recall that we had a function f(t) =
0.02424 (t/303.16)^1.27591 that predicted the amount of tungsten production as a function of temperature.
Temperature is measured in degrees Kelvin. The following data is an extension of the data given before. Plot
the data, then by inspection determine the degree of least squares fit that would fit the data best (degree 1, 2,
or 3). Execute the least squares fitting and use it to predict the three values f(750), f(850) and f(950).
Determine the mean absolute error and compare the result to the previous result. Which method was better?

t     300    400    500    600    700    800    900    1000   1100
f(t)  0.024  0.035  0.046  0.058  0.067  0.083  0.097  0.111  0.125

t     1200   1300   1400   1500   1600   1700   1800   1900   2000
f(t)  0.14   0.155  0.17   0.186  0.202  0.219  0.235  0.252  0.269

4. Cubic Splines and B-Splines

The term spline refers to a large class of curves. Some interpolate the given points while others fit the data.
They are commonly used in many areas of application, including statistics, probability, engineering and com-
puter graphics. One reason for their wide use is that splines exhibit local control. Another advantage to cubic
spline interpolation or fitting over polynomial interpolation is that the curve can simulate asymptotic behavior.
For this reason these curves are often used in statistics to approximate a density function.

In this section we develop two classes of spline, the classical cubic spline and the B-spline. In addition, cubic
Hermite interpolation, introduced in Part 2 and further developed in Part 3, is often referred to as cubic
orthogonal spline interpolation. Similar to the Bezier curve, a spline is a parametric curve. However, the spline
is formed from several segments where each segment is a parametric cubic. We state precisely,

Definition 3.4.1: Let [a, b] ⊂ R with a partition a = t₀ < t₁ < ... < tₙ = b. A spline σ defined on [a, b] is a
parametric curve such that
a. for each i, σ restricted to [tᵢ₋₁, tᵢ] is a parametric cubic, denoted σᵢ,
b. σᵢ(tᵢ) = σᵢ₊₁(tᵢ),
c. σ is twice differentiable at each tᵢ, i = 1, 2, ..., n − 1.

The points σ(tᵢ) are called the knot points. The splines developed in this section will be twice continuously
differentiable at the knot points. Before continuing, we remark that Bezier curves are splines with a single
segment.

The original use of the term spline arose in drafting. A spline was a thin wooden strip that a draftsman bends
and pins on his table to provide a firm edge against which he could draw a curve. As we shall see the paramet-
ric cubic is derived from the physical properties of the wooden object. The result is a linear system with size
equal to the number of points to be interpolated.

We begin with the mathematical analysis of the draftsman's spline. Specifically this spline is a thin elastic strip
(traditionally made of wood), which is attached to a drafting table by pins and bent by weights (called ducks).
Considering the spline as a beam and the ducks as concentrated loads on the beam, then the deformed beam
satisfies the Bernoulli-Euler equation

(3.4.1) EI k(x) = M(x),

where EI is a constant called the coefficient of rigidity, M(x) is the bending moment and k(x) denotes the
curvature of the beam. Further, M(x) is a linear function of x. Letting σ(x) denote the arc represented by the
bent beam, the curvature in terms of σ is given by

k(x) = σ″ / (1 + σ′²)^(3/2).

Assumption 1: the beam deflections are small, ‖σ′‖ ≪ 1. Equivalently stated, σ′ is negligible.

Hence, we may suppose that k(x) = σ″. Since M″ = 0, the fourth derivative σ⁽⁴⁾ = 0. This leads to the following.

Assumption 2: σ is twice continuously differentiable at each duck and cubic between the ducks.

We are now in a position to solve for σ.

Suppose that the pins have x-coordinates at a and b, and that the ducks are located at a = x₀ < x₁ < ··· < xₙ = b.
(3.4.2) On the interval [xᵢ₋₁, xᵢ], σ = σᵢ, a parametric cubic.
(3.4.3) σ is twice continuously differentiable at each xᵢ.
(3.4.4) σᵢ(xᵢ) = yᵢ = σᵢ₊₁(xᵢ).

In turn we set

(3.4.5) σ′(xᵢ) = mᵢ and σ″(xᵢ) = Mᵢ.

Since the second derivative of σᵢ is linear, we have

(3.4.6) σᵢ″(x) = Mᵢ₋₁ (xᵢ − x)/Δᵢ + Mᵢ (x − xᵢ₋₁)/Δᵢ,

where Δᵢ = xᵢ − xᵢ₋₁. Indeed, the right hand side of (3.4.6) is the unique linear function which is Mᵢ₋₁ at xᵢ₋₁
and Mᵢ at xᵢ.

Integrating (3.4.6) twice we get successively

(3.4.7) σᵢ′(x) = −Mᵢ₋₁ (xᵢ − x)²/(2Δᵢ) + Mᵢ (x − xᵢ₋₁)²/(2Δᵢ) + C,

(3.4.8) σᵢ(x) = Mᵢ₋₁ (xᵢ − x)³/(6Δᵢ) + Mᵢ (x − xᵢ₋₁)³/(6Δᵢ) + Cx + D.

Evaluating (3.4.8) at xᵢ₋₁ and xᵢ we get

yᵢ₋₁ = Mᵢ₋₁ (xᵢ − xᵢ₋₁)³/(6Δᵢ) + Cxᵢ₋₁ + D,

yᵢ = Mᵢ (xᵢ − xᵢ₋₁)³/(6Δᵢ) + Cxᵢ + D.

Subtracting yᵢ₋₁ from yᵢ we have

yᵢ − yᵢ₋₁ = Mᵢ (xᵢ − xᵢ₋₁)³/(6Δᵢ) − Mᵢ₋₁ (xᵢ − xᵢ₋₁)³/(6Δᵢ) + C(xᵢ − xᵢ₋₁)

or

(yᵢ − yᵢ₋₁)/(xᵢ − xᵢ₋₁) = (Mᵢ − Mᵢ₋₁)(xᵢ − xᵢ₋₁)/6 + C.

Now solving for C and substituting into (3.4.7) we get

(3.4.7′) σᵢ′(x) = −Mᵢ₋₁ (xᵢ − x)²/(2Δᵢ) + Mᵢ (x − xᵢ₋₁)²/(2Δᵢ) + (yᵢ − yᵢ₋₁)/(xᵢ − xᵢ₋₁) − (Mᵢ − Mᵢ₋₁)(xᵢ − xᵢ₋₁)/6.

Using similar but significantly more complicated calculations we get

D = (yᵢ₋₁/Δᵢ)xᵢ − Mᵢ₋₁ (Δᵢ/6)xᵢ − (yᵢ/Δᵢ)xᵢ₋₁ + Mᵢ (Δᵢ/6)xᵢ₋₁.

Hence we have

(3.4.8′) σᵢ(x) = Mᵢ₋₁ (xᵢ − x)³/(6Δᵢ) + Mᵢ (x − xᵢ₋₁)³/(6Δᵢ) + ((yᵢ − yᵢ₋₁)/(xᵢ − xᵢ₋₁) − (Mᵢ − Mᵢ₋₁)(Δᵢ/6))x
+ (yᵢ₋₁/Δᵢ)xᵢ − Mᵢ₋₁ (Δᵢ/6)xᵢ − (yᵢ/Δᵢ)xᵢ₋₁ + Mᵢ (Δᵢ/6)xᵢ₋₁.

Note that the unknowns in equation (3.4.8′) are Mᵢ₋₁ and Mᵢ, and that the equations are linear in these two
variables. That is, (3.4.8′) describes a linear system of equations. We now organize (3.4.8′) to get an
expression for the second derivatives (the Mᵢ) without reference to x, the independent variable, and then write
the linear system in matrix form.

Collecting the x terms and the constant terms of (3.4.8′) into multiples of (x − xᵢ₋₁) and (xᵢ − x), we have

σᵢ(x) = Mᵢ₋₁ (xᵢ − x)³/(6Δᵢ) + Mᵢ (x − xᵢ₋₁)³/(6Δᵢ) + (yᵢ/Δᵢ − Mᵢ Δᵢ/6)(x − xᵢ₋₁) + (yᵢ₋₁/Δᵢ − Mᵢ₋₁ Δᵢ/6)(xᵢ − x)

and

σᵢ₊₁(x) = Mᵢ (xᵢ₊₁ − x)³/(6Δᵢ₊₁) + Mᵢ₊₁ (x − xᵢ)³/(6Δᵢ₊₁) + (yᵢ₊₁/Δᵢ₊₁ − Mᵢ₊₁ Δᵢ₊₁/6)(x − xᵢ) + (yᵢ/Δᵢ₊₁ − Mᵢ Δᵢ₊₁/6)(xᵢ₊₁ − x).

Now we differentiate these last two expressions with respect to x, evaluate the derivative at xᵢ and use
σᵢ′(xᵢ) = σᵢ₊₁′(xᵢ) to get

Mᵢ Δᵢ/2 + yᵢ/Δᵢ − Mᵢ Δᵢ/6 − yᵢ₋₁/Δᵢ + Mᵢ₋₁ Δᵢ/6 = −Mᵢ Δᵢ₊₁/2 + yᵢ₊₁/Δᵢ₊₁ − Mᵢ₊₁ Δᵢ₊₁/6 − yᵢ/Δᵢ₊₁ + Mᵢ Δᵢ₊₁/6.

Now separating the terms with the M's from the others, we have

Mᵢ₋₁ Δᵢ/6 + Mᵢ Δᵢ/3 + Mᵢ Δᵢ₊₁/3 + Mᵢ₊₁ Δᵢ₊₁/6 = (yᵢ₊₁ − yᵢ)/Δᵢ₊₁ − (yᵢ − yᵢ₋₁)/Δᵢ.

Multiplying by 6/(Δᵢ₊₁ + Δᵢ) yields

(Δᵢ/(Δᵢ₊₁ + Δᵢ)) Mᵢ₋₁ + 2Mᵢ + (Δᵢ₊₁/(Δᵢ₊₁ + Δᵢ)) Mᵢ₊₁ = (6/(Δᵢ₊₁ + Δᵢ)) ((yᵢ₊₁ − yᵢ)/Δᵢ₊₁ − (yᵢ − yᵢ₋₁)/Δᵢ)

or

(3.4.9) μᵢ Mᵢ₋₁ + 2Mᵢ + λᵢ Mᵢ₊₁ = dᵢ,

where

μᵢ = Δᵢ/(Δᵢ₊₁ + Δᵢ),  λᵢ = Δᵢ₊₁/(Δᵢ₊₁ + Δᵢ),  dᵢ = (6/(Δᵢ₊₁ + Δᵢ)) ((yᵢ₊₁ − yᵢ)/Δᵢ₊₁ − (yᵢ − yᵢ₋₁)/Δᵢ).

The system of equations (3.4.9) has n+1 unknowns and n−1 equations. Setting boundary values (M₀ and Mₙ)
yields a square system. Note that setting these boundary values in fact designates the curvature at the end
points. The resulting system is

( 2     λ₁                            ) ( M₁   )   ( d₁ − μ₁M₀    )
( μ₂    2     λ₂                      ) ( M₂   )   ( d₂           )
(       μ₃    2     λ₃                ) ( M₃   )   ( d₃           )
(             ···   ···   ···         ) ( ···  ) = ( ···          )
(             μₙ₋₂  2     λₙ₋₂        ) ( Mₙ₋₂ )   ( dₙ₋₂         )
(                   μₙ₋₁  2           ) ( Mₙ₋₁ )   ( dₙ₋₁ − λₙ₋₁Mₙ )
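A minimal sketch of setting up and solving this tridiagonal system with the natural boundary values M₀ = Mₙ = 0,
assuming the data is given as lists xs and ys of length n + 1:

n = Length[xs] - 1;
Δ = Table[xs[[i + 1]] - xs[[i]], {i, n}];
μ[i_] := Δ[[i]]/(Δ[[i + 1]] + Δ[[i]]);
λ[i_] := Δ[[i + 1]]/(Δ[[i + 1]] + Δ[[i]]);
d[i_] := 6/(Δ[[i + 1]] + Δ[[i]])*
   ((ys[[i + 2]] - ys[[i + 1]])/Δ[[i + 1]] - (ys[[i + 1]] - ys[[i]])/Δ[[i]]);
mat = Table[Which[j == i - 1, μ[i], j == i, 2, j == i + 1, λ[i], True, 0],
   {i, n - 1}, {j, n - 1}];
M = Join[{0}, LinearSolve[mat, Table[d[i], {i, n - 1}]], {0}]   (* the Mi *)

Substituting the Mᵢ into (3.4.8′) then gives each segment σᵢ.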

Notice that spline curves arise from an interpolation process based on a given set of points. Hence, we have a
process similar to polynomial interpolation. However, based on the physical model, splines have local control.
In particular what took place in Figure 1.3 cannot happen here. This fact led researchers to generalize the idea
of a spline to get a broader class of curves which on the one hand interpolate point sets and on the other
exhibit local control.

The basic idea is to remove the physical context from the spline while keeping properties 3.4.2 - 3.4.4. In
particular, suppose we have a list of points, Pᵢ = (xᵢ, yᵢ), i = 0, ..., n. Suppose further that xᵢ ≤ xᵢ₊₁. Now seek
a function σ defined on [x₁, xₙ] with the following properties,

(3.4.10) for each i = 1, 2, ..., n − 1, σᵢ, the restriction of σ to [xᵢ, xᵢ₊₁], is a cubic polynomial,

(3.4.11) σᵢ(xᵢ₊₁) = σᵢ₊₁(xᵢ₊₁), σᵢ′(xᵢ₊₁) = σᵢ₊₁′(xᵢ₊₁), σᵢ″(xᵢ₊₁) = σᵢ₊₁″(xᵢ₊₁).

In short, the function σ is locally a cubic polynomial and globally twice continuously differentiable. In turn we
carry this line of thought a step further. We know that P₃, the set of all polynomials of degree less than or
equal to three, is a four dimensional vector space. Does P₃ have a basis p₁, p₂, p₃, p₄ with the property that
for any x₁, x₂, x₃, x₄ and x₅ in R,

x1 p1(1) + x2 p2(1) + x3 p3(1) + x4 p4(1) = x2 p1(0) + x3 p2(0) + x4 p3(0) + x5 p4(0)

x1 p'1(1) + x2 p'2(1) + x3 p'3(1) + x4 p'4(1) = x2 p'1(0) + x3 p'2(0) + x4 p'3(0) + x5 p'4(0)

and

x1 p''1(1) + x2 p''2(1) + x3 p''3(1) + x4 p''4(1) = x2 p''1(0) + x3 p''2(0) + x4 p''3(0) + x5 p''4(0)?

If the answer is affirmative, then defining σ₁ = Σᵢ xᵢpᵢ(t) and σ₂ = Σᵢ xᵢ₊₁pᵢ(t − 1) we will have (3.4.10) and
(3.4.11) for the intervals [0, 1] and [1, 2]. Now setting pᵢ(X) = Σⱼ₌₀³ αᵢⱼXʲ, so that pᵢ(1) = αᵢ₀ + αᵢ₁ + αᵢ₂ + αᵢ₃
and pᵢ(0) = αᵢ₀, the three equations become

(3.4.12) Σᵢ₌₁⁴ (xᵢ − xᵢ₊₁)αᵢ₀ + xᵢαᵢ₁ + xᵢαᵢ₂ + xᵢαᵢ₃ = 0,

(3.4.13) Σᵢ₌₁⁴ (xᵢ − xᵢ₊₁)αᵢ₁ + 2xᵢαᵢ₂ + 3xᵢαᵢ₃ = 0,

(3.4.14) Σᵢ₌₁⁴ 2(xᵢ − xᵢ₊₁)αᵢ₂ + 6xᵢαᵢ₃ = 0.

Note that the unknowns are the αᵢⱼ and that we may set the xᵢ in any manner we choose. Choosing x₁ = x₂ =
x₃ = x₄ = 0 and x₅ ≠ 0, we get α₄,₀ = 0 from (3.4.12). In turn α₄,₁ = α₄,₂ = 0. Hence p₄ = α₄X³. Using similar
reasoning we get the remaining polynomials, p₁ = α₁(−X³ + 3X² − 3X + 1), p₂ = α₂(3X³ − 6X² + 4), p₃ =
α₃(−3X³ + 3X² + 3X + 1). By setting each αᵢ = 1/6, we have Σᵢ₌₁⁴ pᵢ = 1. These four polynomials are called
the B-spline basis functions.

The B-spline functions may be used to define a powerful fitting process. Given a list of points, Pᵢ = (xᵢ, yᵢ),
i = 1, ..., n, we define

(3.4.15) σⱼ(t) = Σᵢ₌₁⁴ pᵢ(t)Pⱼ₊ᵢ₋₁ = (1/6)[(−t³ + 3t² − 3t + 1)Pⱼ + (3t³ − 6t² + 4)Pⱼ₊₁ + (−3t³ + 3t² + 3t + 1)Pⱼ₊₂ + t³Pⱼ₊₃],

where t ∈ [0, 1] and j ≤ n − 3. By construction, σⱼ(1) = σⱼ₊₁(0), (d/dt)σⱼ(1) = (d/dt)σⱼ₊₁(0) and (d²/dt²)σⱼ(1) =
(d²/dt²)σⱼ₊₁(0). Hence the curve formed by joining the n − 3 curves is twice continuously differentiable and each
σⱼ is a parametric cubic. We call this curve a B-spline, the given points are called the guide points and the σⱼ
are called the spline segments. Finally, the points where the spline segments join are called the knot points.
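A minimal sketch of one segment (3.4.15), assuming pts is the list of guide points:

σ[j_][t_] := (1/6) ((-t^3 + 3 t^2 - 3 t + 1) pts[[j]] + (3 t^3 - 6 t^2 + 4) pts[[j + 1]] +
     (-3 t^3 + 3 t^2 + 3 t + 1) pts[[j + 2]] + t^3 pts[[j + 3]]);
ParametricPlot[σ[1][t], {t, 0, 1}]   (* the segment guided by pts[[1]] through pts[[4]] *)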

Now it is easy to check that the B-spline does not interpolate the guide points. However by construction the
following statements hold. Each point on the curve σⱼ is a linear combination of the four guide points for the
segment. The sum of the coefficients in the linear combination equals 1. Hence, the curve lies within the
convex hull of the four guide points. (The convex hull of a point set is the smallest convex set that contains
the points.) Therefore we may consider the B-spline as a fitting process. The following diagram illustrates the
convex hull property. The spline segment is generated by the four points.

Figure 4.1: The B-spline segment and the convex hull

In summary, B-splines are parametric cubic curves which fit the given set of points, the guide points. Each
B-spline segment is determined by four of the guide points. For instance, segment σⱼ is determined by the four
guide points Pⱼ, Pⱼ₊₁, Pⱼ₊₂ and Pⱼ₊₃. In practice each segment traces a curve very close to its four determining
guide points. The B-spline curve (the union of the segments) is twice continuously differentiable. This high
level of smoothness is the reason that B-splines are so often used. In addition, you can predict the effect on the
curve that arises from the change in one of the guide points. This is because of the convex hull property. Any
given guide point is included in the calculation of at most four curve segments, or equivalently at most four of
the bounding convex hulls. Therefore, if you change a guide point, then you can predict the change in the curve
by looking at how the convex hulls change. Figure 4.2 shows a two segment B-spline. The convex hull for the
first segment is formed by P₁, P₂, P₃, P₄. For the second segment the convex hull is determined by P₂, P₃, P₄,
P₅. In Figure 4.3, we have moved P₃.

Figure 4.2: Two B-spline segments with knot point A

Figure 4.3: Two B-spline segments, P₃ is moved to the left

B-splines improve on polynomial interpolation because of local control. Since there is no need to solve a large
linear system, B-splines are often preferred over cubic splines. Indeed, unless exact interpolation is absolutely
required, B-splines are the technique of choice.

Exercises:

1. Plot the four B-spline basis functions.

2. Complete the derivation of the B-Spline basis functions.

3. Compute the B-spline fit to the points (xᵢ, 1/(1 + xᵢ²)) for xᵢ = −4, −2, −1, 0, 1, 2, 4. Plot the parametric
B-spline against the graph of f(x) = 1/(1 + x²). Compare your output with Exercise 3 of Section 1.

4. In Exercise 3, the fourth guide point is (0, 1). Leaving the other guide points unchanged, repeat Exercise 3
using (0, 1.1) for the fourth guide point. Compare the output to the curve in 3.

5. Repeat Exercise 4 with fourth guide point (0, 1.2), (0, 0.9) or (0, 0.8).

6. Repeat Exercise 7 of Section 3 with the same data. This time fit B-splines to the data and plot the curve on
the same graph with the data. Use the B-spline to predict values for f(750), f(850) and f(950). Determine the
mean absolute error and compare the result to the result using least squares fitting. Which method was better?

Chapter 4 Numerical Differentiation and Integration

Introduction

In this chapter we look at basic constructs of the calculus from the standpoint of numerical methods. The topics
included in this chapter often arise under the title Numerical Analysis.

Finite differences, the discrete form of the derivative, are the entry point into numerical processes that
approximate the solution of a differential equation. In the first section we develop a few of the finite difference
formulae and use them to introduce the finite difference method approximation of the one dimensional heat
equation.

Numerical integration is also important for approximating the solution to differential equations. In particular
numerical integration is used in the finite element method. In this case the integrands have simple indefinite
integrals. Numerical integration is preferred because it is faster. On the other hand the integral arising from arc
length is not usually integrable in the indefinite sense. In this case numerical integration is the only alternative.
In addition there is the case of functions that have been approximated by parametric curves such as Bezier
curves or B-splines. It is possible to realize the parametric representation γ(t) = (γ₁(t), γ₂(t)) in the form
(x, f(x)). For instance you set x = γ₁(t), then use the Solve function in Mathematica to find t in terms of x.
Subsequently you substitute the result into γ₂. However the resulting function, f, commonly has no
anti-derivative. Again numerical integration is essential.

Finally, we end the chapter with a brief development of numerical techniques applied to first order ordinary differential equations. In particular this material provides another application of numerical integration.

1. Finite Differences and Finite Difference Method

If we start with a function f which is real valued and differentiable at x_0, then the derivative is given by f'(x_0) = lim_{x→x_0} [f(x) - f(x_0)]/(x - x_0). Hence for x near x_0, the derivative at x_0 is approximately equal to [f(x) - f(x_0)]/(x - x_0). This
is the basic idea behind the finite difference. Finite differences are useful because the researcher often knows
values of a function without specifically knowing the function. With finite differences the researcher can infer
approximate values of the derivative of an unknown function. As we see below, finite differences provide a
valuable tool for solving differential equations.

Let f be a real valued function defined on an interval [a, b]. Set Δx = (b - a)/n and consider the partition a = x_0 < x_1 < ··· < x_n = b, where for each i, x_i - x_{i-1} = Δx. Suppose further that f is three times continuously differentiable, so that we may write the Taylor expansion for f at x_i, evaluated at x_{i+1}:

(4.1.1) f(x_{i+1}) = f(x_i + Δx) = f(x_i) + (df/dx)(x_i) Δx + (1/2!)(d^2 f/dx^2)(x_i) (Δx)^2 + (1/3!)(d^3 f/dx^3)(x_i) (Δx)^3 + O((Δx)^4),

where the term O((Δx)^4) is derived from the Taylor remainder and is of the form C (Δx)^4 for a constant C. In particular this term converges to zero as (Δx)^4 goes to zero. In turn, we have

(4.1.2) f(x_{i-1}) = f(x_i - Δx) = f(x_i) - (df/dx)(x_i) Δx + (1/2!)(d^2 f/dx^2)(x_i) (Δx)^2 - (1/3!)(d^3 f/dx^3)(x_i) (Δx)^3 + O((Δx)^4).

In (4.1.1), subtracting f(x_i) from both sides and dividing by Δx yields

(4.1.3) [f(x_{i+1}) - f(x_i)]/Δx = (df/dx)(x_i) + (1/2!)(d^2 f/dx^2)(x_i) Δx + (1/3!)(d^3 f/dx^3)(x_i) (Δx)^2 + O((Δx)^3) = (df/dx)(x_i) + O(Δx).

The expression on the left hand side of (4.1.3), [f(x_{i+1}) - f(x_i)]/Δx, is called the forward difference of f at x_i. The right hand side of (4.1.3) ensures that the forward difference converges to the derivative with order Δx.

Repeating the same procedure with (4.1.2) we get

(4.1.4) [f(x_i) - f(x_{i-1})]/Δx = (df/dx)(x_i) - (1/2!)(d^2 f/dx^2)(x_i) Δx + (1/3!)(d^3 f/dx^3)(x_i) (Δx)^2 + O((Δx)^3) = (df/dx)(x_i) + O(Δx).

As with the forward difference, the left hand side of (4.1.4), [f(x_i) - f(x_{i-1})]/Δx, approximates the derivative of f at x_i. We call this expression the backward difference. Again this value converges to the derivative of f with order Δx.

Next we add (4.1.3) to (4.1.4) and divide by 2 to get

(4.1.5) [f(x_{i+1}) - f(x_{i-1})]/(2 Δx) = (df/dx)(x_i) + (1/3!)(d^3 f/dx^3)(x_i) (Δx)^2 + O((Δx)^3) = (df/dx)(x_i) + O((Δx)^2).

The left hand side of (4.1.5) is called the central difference of f at x_i. It converges to the derivative with order (Δx)^2. Since the central difference converges faster, O((Δx)^2), we may conclude that the central difference yields a better approximation of the derivative than either the forward or backward differences.

In turn, if we subtract (4.1.4) from (4.1.3) and divide by Δx we get

(4.1.6) [f(x_{i+1}) - 2 f(x_i) + f(x_{i-1})]/(Δx)^2 = (d^2 f/dx^2)(x_i) + O((Δx)^2).

The term on the left side of (4.1.6) approximates the second derivative of f at x_i. It is called the second difference of f at x_i. Again the second difference converges O((Δx)^2). From the viewpoint of forward, backward and central, the second difference is considered a central difference.
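
As a quick check of these convergence rates, the following Mathematica sketch (our own illustration, with f = Sin and x_0 = 1) tabulates the forward and central difference errors as Δx shrinks; the forward error should drop by a factor of 10 per row and the central error by a factor of 100.

(* errors of the forward and central differences against the exact derivative *)
f = Sin; x0 = 1.0;
Table[
  With[{dx = 10.^-k},
    {dx,
     Abs[(f[x0 + dx] - f[x0])/dx - Cos[x0]],             (* order dx *)
     Abs[(f[x0 + dx] - f[x0 - dx])/(2 dx) - Cos[x0]]}],  (* order dx^2 *)
  {k, 1, 4}] // TableForm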

Notice that when defining the finite differences we used a uniform partition along the x-axis. Generally speaking, finite differences perform best when the partition is uniform. With the aid of more advanced techniques, however, researchers do achieve good results with non-uniform partitions. For our purposes here, we restrict attention to the uniform case. In addition there are many other forms of the finite difference which are used in particular circumstances. For a more complete treatment, see Part 3 below.

We now show how finite differences may be used to approximate the solution of a differential equation. The
treatment of the finite difference method given here is by no means exhaustive. It will serve to only convince
the reader that finite differences are useful. Consider the following commonly occurring setting. Suppose an
unknown function is defined on an interval of the x-axis and a time interval, u = u(x, t), u : [a, b] × [0, 1] → R, and
satisfies the following differential equation.

(4.1.7) ∂u/∂t = α ∂^2 u/∂x^2.

This equation is called the one dimensional heat equation, as it represents the flow of heat across a thin rod. However, this equation also arises in settings which have nothing to do with heat flow. For now we focus on the heat flow setting. In this context, the value u = u(x, t) represents the temperature at location x and time t.

If we partition the intervals [a, b] and [0, 1] in a uniform manner, and write u(x_i, t_n) = u_i^n, then we can recast (4.1.7) in finite difference form using the forward difference for time and the second difference for space.

(u_i^{n+1} - u_i^n)/Δt = α (u_{i+1}^n - 2 u_i^n + u_{i-1}^n)/(Δx)^2.

In the notation of the topic, this rendering is referred to as FTCS, for forward time, central space. Now this equation can be rearranged as

(4.1.8) u_i^{n+1} = λ u_{i-1}^n + (1 - 2λ) u_i^n + λ u_{i+1}^n,

where λ = α Δt/(Δx)^2. Notice that (4.1.8) states that the temperature at a location and time step is a linear combination of the prior temperatures at the location and at its nearest neighbors. Hence, if we knew values for u at t = 0, then we could use (4.1.8) to calculate values at t = t_1. Then by repeating the process we can get values for each successive time step.

But before doing this we need to ensure that the problem has a unique solution. In the literature this is referred to as setting a 'well posed problem'. We generally assume that the underlying physics is deterministic. The requirement that the problem have a unique solution is consistent with this assumption. To achieve a deterministic problem it is necessary that we know the temperatures at the boundary. In particular we must have prior knowledge of u_0^n and u_{k+1}^n for each time step. For simplicity we suppose that the boundary temperatures are time independent. We set u_0^n = τ_0 and u_{k+1}^n = τ_{k+1} and rewrite the first and the last equations of (4.1.8) as

(4.1.9) u_1^{n+1} - λτ_0 = (1 - 2λ) u_1^n + λ u_2^n,   u_k^{n+1} - λτ_{k+1} = λ u_{k-1}^n + (1 - 2λ) u_k^n.

Now we recast (4.1.8) and (4.1.9) in matrix format. In particular we show the derivation of the values at the (n+1)th time step from the nth.

(4.1.10)

| 1-2λ   λ      0      ...  0      0      0    | | u_1^n     |   | u_1^{n+1}     |   | λτ_0     |
| λ      1-2λ   λ      ...  0      0      0    | | u_2^n     |   | u_2^{n+1}     |   | 0        |
| 0      λ      1-2λ   ...  0      0      0    | | u_3^n     |   | u_3^{n+1}     |   | 0        |
| ...    ...    ...    ...  ...    ...    ...  | | ...       | = | ...           | - | ...      |
| 0      0      0      ...  1-2λ   λ      0    | | u_{k-2}^n |   | u_{k-2}^{n+1} |   | 0        |
| 0      0      0      ...  λ      1-2λ   λ    | | u_{k-1}^n |   | u_{k-1}^{n+1} |   | 0        |
| 0      0      0      ...  0      λ      1-2λ | | u_k^n     |   | u_k^{n+1}     |   | λτ_{k+1} |

Consider the following example. Suppose there is a thin rod which is insulated along its length. Suppose that the temperature is initially zero everywhere, and that the left end is suddenly heated and kept at 20 degrees. Finally we set α = 1/2. In notation, the spatial interval is [-5, 5], Δx = 0.1, Δt = 0.01. For the initial setting take u(x, 0) = 0 and for the boundary values take u(-5, t) = 20 and u(5, t) = 0. Using (4.1.10) we solve for approximate values of u along the interval. The following plot shows the local temperatures after 0.1 seconds.

Figure 1.1: Temperatures after 10 iterations
[Figure: temperature profile across [-5, 5]; the temperature falls from 20 at the left end toward 0.]
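
The computation behind Figure 1.1 can be sketched in Mathematica as follows. This is a minimal rendering of (4.1.10), not the author's original program; the names alpha, lambda, A, b and u are ours, and SparseArray is used only as a convenient way to build the tridiagonal matrix.

(* forward Euler (FTCS) for the example: u(x,0) = 0, u(-5,t) = 20, u(5,t) = 0 *)
alpha = 0.5; dx = 0.1; dt = 0.01; lambda = alpha dt/dx^2;
xs = Range[-5, 5, dx];
k = Length[xs] - 2;                        (* number of interior points *)
A = SparseArray[{{i_, i_} -> 1 - 2 lambda,
      {i_, j_} /; Abs[i - j] == 1 -> lambda}, {k, k}];
b = SparseArray[{1 -> lambda*20.0, k -> lambda*0.0}, k];  (* boundary terms *)
u = ConstantArray[0.0, k];                 (* initial interior temperatures *)
Do[u = A.u + b, {10}];                     (* ten time steps: t = 0.1 *)
ListPlot[Transpose[{xs[[2 ;; -2]], u}]]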

This is the basic idea behind the finite difference method for solving differential equations. The particular technique used above is called the explicit or forward Euler method. This and other related techniques are remarkably successful. However, all these techniques have their limitations. Compare the output from Exercises 1 and 3 below. The change in λ causes dramatic changes in the output. Indeed, the output of Exercise 3 is impossible. The difficulty arises because of computational error. In Part 2 we show how λ is related to the error. In Part 3 we revisit this issue, giving a more comprehensive treatment of computational stability.

Exercises:

1. Solve the following problem using the forward Euler method. Take α = 1/2, the spatial interval [-5, 5], Δx = 0.1, Δt = 0.01. For the initial setting take u(x, 0) = 20 and for the boundary values take u(-5, t) = 0 and u(5, t) = 0. Plot the temperatures at t = 0.03, 0.06, 0.08, 0.1.

2. The following problem develops the backward Euler method.

a. Express (4.1.7) by using the backward difference in the time variable to yield a linear relation analogous to (4.1.8).
b. Write the result of (a) in matrix form. You should get a matrix expression of the form u^{n-1} + δ = A u^n, where u^n denotes the state of the system at t = t_n. Now use LinearSolve iteratively to get values for each u^n.
c. Redo problem 1 using the technique that you just developed.

3. Redo problem 1 with α = 2. Notice that the results are not well behaved. The problem here is that the forward Euler method for this geometric setting is not stable when λ > 0.5. Stability of transient or time dependent processes is covered in subsequent material.

4. Consider the following function, u(x, y) = x + x/(x^2 + y^2), defined on the domain D = {(x, y) : -2 ≤ x ≤ 2, 0 ≤ y ≤ 2, x^2 + y^2 ≥ 1}.
a. Graph the domain D.
b. Create a list, PtLst, of points in D, so that P_i = (x_i, y_i) is in PtLst provided x_i = -2 + 0.2m and y_i = 0.2n for some positive integers m and n. Before proceeding, plot the list of points to be certain you have all points in D whose x and y coordinates step by increments of 0.2.
c. For each i, plot the vector (x_i, y_i) + ∇u(x_i, y_i). Use the arrowhead feature when drawing the vectors. You should see the vector field describing the flow of a non-viscous, incompressible fluid around a semi-circular obstruction.
d. For the interior points of D, replace the gradient in c with the central difference in the x and y directions,

(u(x_i + 0.2, y_i) - u(x_i - 0.2, y_i), u(x_i, y_i + 0.2) - u(x_i, y_i - 0.2)).

Plot the vector field.

e. Compare the output in c against the output in d. Compute the mean normed error.

5. Diffusion of a chemical through a permeable membrane is given by ∂u/∂t = α ∂^2 u/∂x^2, where u(t, x) represents the concentration of the chemical at location x and time t. The constant α is determined by the membrane. This setting arises when modelling the absorption of a substance into a cell. In this case the membrane is the cell wall. We want to use FDM to model the diffusion through the membrane and determine when stasis occurs.

Consider a 4 point model, x_0, x_1 outside the cell and x_2, x_3 inside the cell. Set α = 0.05, Δx = 0.25, Δt = 0.01. Set up a FDM model with x_0 and x_3 as boundary points. The initial condition will be u(0, x_0) = u(0, x_1) = 0.075 and u(0, x_2) = u(0, x_3) = 0.025. We want to determine t and u(t, x_2) when stasis has been reached. For our purposes we define stasis as u(t_{n+1}, x_2) - u(t_n, x_2) < 0.0001.

2. Numerical Integration 1: Trapezoid Method and Simpson's Rule

Let f be a bounded real valued function defined on an interval [a, b]. Further suppose we have a partition of [a, b], a = x_0 < x_1 < ··· < x_n = b. If we know f, or at least have the values f(x_i), then we can easily approximate the integral of f over [a, b]. Indeed, if f is positive on [a, b] and we join the points (x_i, f(x_i)) with line segments, then the area under the resulting polygon will approximate the area under f. Since the figure formed by connecting the four points (x_{i-1}, 0), (x_i, 0), (x_i, f(x_i)) and (x_{i-1}, f(x_{i-1})) is a trapezoid, we get the area under the polygon as a sum of areas of trapezoids. (See Figure 2.1.)

Figure 2.1: Function, polygon and a single trapezoid
[Figure: the graph of f, the inscribed polygon, and a single trapezoid over one subinterval.]

Using the standard formula for the area of a trapezoid (the length of the base times the average of the heights) we have

(4.2.1) ∫_a^b f(x) dx ≈ ∑_{i=1}^n [f(x_i) + f(x_{i-1})]/2 (x_i - x_{i-1}).

Notice that if f is negative on [a, b], then (4.2.1) will evaluate negatively for each trapezoid, corresponding to the integral of a negatively valued function. If f(x_i) = 0, then the corresponding trapezoid degenerates to a right triangle with area (1/2) f(x_{i-1})(x_i - x_{i-1}). Hence, (4.2.1) continues to hold. Finally consider the case where f takes both positive and negative values. From the previous remarks we need only consider the case where f(x_i) and f(x_{i-1}) have different sign. In this case there is no trapezoid; rather, the figure formed by the four points (x_{i-1}, 0), (x_i, 0), (x_i, f(x_i)) and (x_{i-1}, f(x_{i-1})) is the union of two disjoint right triangles. One triangle is above the x-axis and the other is below. In this case

[f(x_i) + f(x_{i-1})]/2 (x_i - x_{i-1}) = f(x_i)/2 (x_i - x_{i-1}) + f(x_{i-1})/2 (x_i - x_{i-1}),

the sum of the two areas. Altogether, the right hand side of (4.2.1) provides an approximation for the integral on the left hand side. This technique is called the trapezoid method. It seems intuitively obvious, easy to implement and hence a reasonable procedure. Later we will learn that it is not as accurate as other, less intuitive processes.

In previous sections we have developed procedures to approximate an unknown function by parametric cubics. For these cases the implementation of the trapezoid method, or of any of the techniques developed below, is considerably more difficult. Take γ(t) = (x(t), y(t)). Now suppose that we have γ(t_0) = a, γ(t_1) = b and we are given a partition a = x_0 < x_1 < ··· < x_n = b of [a, b]. Then for each i, we must first solve x(t) = x_i to locate t_i. Afterward, we can evaluate y(t_i) to get the height at x_i. Hence, (4.2.1) now becomes

(4.2.2) ∫_a^b f(x) dx ≈ ∑_{i=1}^n (y_i + y_{i-1})/2 (x_i - x_{i-1}),

where x(t_i) = x_i and y(t_i) = y_i. So the problem becomes how to locate the t_i. For this purpose we can use Newton's method via the FindRoot function.

FindRoot[x[t] == xi, {t, τi}];

where τi is the estimate for t_i required by the method. We claim that τi = t_{i-1} is a reasonable estimate. First of all, we know t_0. Second, in order for the trapezoid method to produce a reasonable estimate of the integral, the x_i must be close together. But since x is continuous, the t_i will then also be close. Nevertheless, Newton's method must always be suspect, as it may produce a t_i which is not between t_0 and t_n. Therefore, it is best to test the value returned for reasonableness. In particular, we expect t_i to lie between t_0 and t_n. Further, it is reasonable to expect that t increases as x increases. We may implement this in the following manner.

tVal = FindRoot[x[t] == xi, {t, τi}];   (* tVal is a rule list, {t -> value} *)

If[tVal[[1, 2]] < τi || tVal[[1, 2]] > tn,
  Print["FindRoot returned t outside the expected range"]; Abort[],
  (* continue with processing *)
];

As mentioned, one of the important applications for numerical integration is computing arc length. In particular, if f(x) = y for x in the interval [a, b], then the length of the graph of f is given by ∫_a^b (1 + (f'(x))^2)^{1/2} dx. But if the curve is given parametrically, γ(t) = (γ_1(t), γ_2(t)), then the arc length is known directly from the parametric formulation as ∫_a^b ((γ_1'(t))^2 + (γ_2'(t))^2)^{1/2} dt.
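
For instance, when no anti-derivative is available, the parametric arc length integrand can be handed directly to NIntegrate. The curve below is a hypothetical example of ours, not one taken from the text.

(* arc length of a parametric curve (x(t), y(t)) over 0 <= t <= 1 *)
x[t_] := t; y[t_] := t^2;
NIntegrate[Sqrt[x'[t]^2 + y'[t]^2], {t, 0, 1}]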

Next we look at an alternative to the trapezoid method. Notice that the area of the trapezoid returns the actual value of the integral when the function is linear. But most functions that we consider have curved graphs, not polygonal graphs. Simpson's rule is characterized by the fact that it returns the actual integral for certain quadratics. In other words, it assumes that the graph of the function is more or less parabolic. First we state Simpson's rule and then we verify the claim. For the interval [a - h, a + h], the estimated integral, via Simpson's rule, is 2h((1/6) f(a+h) + (2/3) f(a) + (1/6) f(a-h)). If f(x) = (x - a)^2, then

2h((1/6) f(a+h) + (2/3) f(a) + (1/6) f(a-h)) = 2h((1/6) h^2 + (1/6) h^2) = (2/3) h^3,

which is exactly the integral of f. Therefore, Simpson's rule is exact in the case of the quadratic polynomial centered at the interval midpoint. Now we recast the technique in the same format as the trapezoid method. Suppose we have an interval [a, b] together with a partition, a = x_0 < x_1 < ··· < x_n = b. Then Simpson's rule states that

(4.2.3) ∫_a^b f(x) dx ≈ ∑_{i=1}^n [(1/6) f(x_i) + (2/3) f((x_i + x_{i-1})/2) + (1/6) f(x_{i-1})] (x_i - x_{i-1}).
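
A minimal Mathematica sketch of (4.2.1) and (4.2.3) follows; the function names trapezoid and simpson are our own, and the final line compares both against NIntegrate for an illustrative integrand.

(* trapezoid method (4.2.1) over a partition xs *)
trapezoid[f_, xs_List] :=
  Sum[(f[xs[[i]]] + f[xs[[i - 1]]])/2 (xs[[i]] - xs[[i - 1]]),
    {i, 2, Length[xs]}];

(* Simpson's rule (4.2.3) over a partition xs *)
simpson[f_, xs_List] :=
  Sum[((1/6) f[xs[[i]]] + (2/3) f[(xs[[i]] + xs[[i - 1]])/2] +
      (1/6) f[xs[[i - 1]]]) (xs[[i]] - xs[[i - 1]]), {i, 2, Length[xs]}];

xs = Range[1., 4., 0.5];
{trapezoid[Sin, xs], simpson[Sin, xs], NIntegrate[Sin[x], {x, 1, 4}]}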

Exercises:

1. Let f(x) = x e^(-x) - 1 and set the interval to [1, 4].


a. Use integration by parts to compute the integral of f on the given interval. Use Mathematica to evaluate the
exponentials.
b. Use the Integrate command in Mathematica. Compare the result with the result in (a).

2. Let f(x) = x e^(-x) - 1 and set the interval to [1, 4]. Consider the partition 1 < 1.5 < 2 < 2.5 < 3 < 3.5 < 4.
a. Compute the integral of f using the trapezoid method for the given partition.
b. Compute the integral of f using Simpson's rule for the given partition. Compare these results with those in 1.

3. Let f(x) = x e^(-x) - 1 and consider the guide points (1, f(1)), (1.5, f(1.5)), (2, f(2)), (2.5, f(2.5)), (3, f(3)), (3.5, f(3.5)) and (4, f(4)).
a. Use a B-spline to fit this set of guide points. Plot the resulting curve.
b. Beginning with the B-spline, use the trapezoid method to approximate the integral of f. Compare this result with the output of 1 and 2 above.

4. Use Simpson's rule to compute the arc length of f(x) = x e^(-x) - 1, x ∈ [1, 4], with the partition given in Problem 2 above.

5. Compute the arc length of the B-spline derived in Problem 3(a) above. In this case the curve is the join of four B-spline segments, σ_i, i = 1, 2, 3, 4. Compute the length of each B-spline segment individually using a uniform partition with Δt = 0.1.

3. Numerical Integration 2: Midpoint Method and Gaussian Quadrature

The midpoint method is remarkable both for its simplicity and its accuracy. As before we begin with a function f defined on an interval [a, b] and a partition a = x_0 < x_1 < ··· < x_n = b. For the interval [x_{i-1}, x_i], let α denote the midpoint of the interval and 2h the length. The interval now becomes [α - h, α + h] and the trapezoid approximation for the integral is 2h [f(α+h) + f(α-h)]/2 = h(f(α+h) + f(α-h)). If we replace f(α+h) + f(α-h) by 2 f(α), then the method is called the midpoint rule. In this case (4.2.1) becomes

(4.3.1) ∫_a^b f(x) dx ≈ ∑_{i=1}^n f((x_i + x_{i-1})/2) (x_i - x_{i-1}).
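
As with the trapezoid sketch above, the midpoint rule (4.3.1) takes only a line or two of Mathematica; again the function name midpoint is our own.

(* midpoint rule (4.3.1) over a partition xs *)
midpoint[f_, xs_List] :=
  Sum[f[(xs[[i]] + xs[[i - 1]])/2] (xs[[i]] - xs[[i - 1]]), {i, 2, Length[xs]}];

midpoint[Sin, Range[1., 4., 0.5]]  (* compare NIntegrate[Sin[x], {x, 1, 4}] *)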

If the function is concave up or concave down on the interval [x_{i-1}, x_i], then the midpoint method is better than the trapezoid method. More precisely, the absolute error from the midpoint method is smaller than the absolute error from the trapezoid method. The proof is a simple argument using elementary geometry. Suppose that the function is concave up on the interval. Consider Figure 3.1a. The error for the midpoint rule is the area between the horizontal line and the curve. The area to the left of the point A, denoted by α, is positive and the area on the right is negative. Denoting the error to the right as -β, the midpoint method error is α - β.

Figure 3.1a: The midpoint method    Figure 3.1b: with tangent at A
[Figure: in 3.1a, the areas α and β between the horizontal line through the midpoint and the curve; in 3.1b, the tangent at A with the areas γ and δ and the points B, D, E.]

If we include the tangent line to f at the midpoint (see Figure 3.1b), then triangles ABC and ADE are congruent (use side-angle-side). Let ζ be the common area of the two triangles; then β = ζ + δ and α = ζ - γ. Hence we may rewrite the midpoint error as α - β = ζ - γ - (ζ + δ) = -(γ + δ). This last term is in fact the error if we had used the tangent line to compute the approximate integral. Taking absolute values, the absolute midpoint method error is γ + δ.

Figure 3.2a: The trapezoid method    Figure 3.2b: with parallelograms
[Figure: in 3.2a, the trapezoid method error ξ + η; in 3.2b, the segments through M and N parallel to the tangent, forming two parallelograms with vertices among A, C, E, M, N, Q, S, T.]

Figure 3.2a shows the trapezoid method error, ξ + η, along with the tangent line at the interval midpoint. Next we construct line segments from M and N parallel to the tangent. (See Figure 3.2b.) We now have two parallelograms, P_1: EASM and P_2: ACNT. Since the triangles MSQ and QNT are congruent (use angle-side-angle), ξ + η is equal to the portion of the area of P_1 and P_2 that lies above the curve. On the other hand, the absolute midpoint method error is equal to the portion of the area of the two parallelograms that lies below the curve. Since the curve is concave up, necessarily ξ + η > γ + δ, and we have proved the claim.

The equivalent Mathematica statements ∫_a^b f[x] dx and Integrate[f[x], {x, a, b}] compute the integral of f(x) by computing the anti-derivative g of f (if one exists) and then evaluating g(b) - g(a). In contrast, the statement NIntegrate[f[x], {x, a, b}] does a numerical integration process called Gaussian quadrature. Alternately, if Mathematica fails to find an anti-derivative, or the function is one of the functions known not to have an anti-derivative, then Mathematica will automatically execute Gaussian quadrature.

There are many different realizations of quadrature. In its simplest form, Gaussian quadrature, the numerical integration is

(4.3.2) ∫_a^b f(x) dx ≈ ∑_{i=1}^n w_i f(x_i).

The quadrature points x_i and weights w_i depend on the interval and on n. The Mathematica function

GaussianQuadratureWeights[n, a, b]

will return the list of quadrature points and the corresponding weights for any value of n. In order to access this function you will need to first execute

<< NumericalDifferentialEquationAnalysis`

This statement loads a library of special functions including GaussianQuadratureWeights. The format of the output is a list of pairs (x_i, w_i).

For one point quadrature, the right hand side of (4.3.2) reduces to the midpoint method. For two point quadrature the points are at a + β(b - a) where β = 0.5 + 0.5/√3 or 0.5 - 0.5/√3; in this case the weights are each (b - a)/2.

There is a more comprehensive treatment of Gaussian quadrature in Part 3. However, it is useful to discuss
some of the properties of Gaussian quadrature.

Given a function f with at least 2n + 1 continuous derivatives on an interval [a, b], the n + 1 Gaussian points and weights are determined so that

i. there is a polynomial p which interpolates f at the Gaussian points,

ii. the degree of p is 2n,

iii. f(x) - p(x) = [μ/(2n + 1)!] ∏_{i=1}^{n+1} (x - x_i), where μ is a value of f^(2n+1),

iv. ∫_a^b p dx = ∑_{i=1}^{n+1} w_i f(x_i).

Note that the error term in (iii) was derived in Theorem 3.1.3. By (iii) and (iv),

|∫_a^b f dx - ∑_{i=1}^{n+1} w_i f(x_i)| = |∫_a^b [μ/(2n + 1)!] ∏_{i=1}^{n+1} (x - x_i) dx| ≤ [|μ_max|/(2n + 1)!] (b - a)^{n+2},

where |μ_max| denotes the maximal value of f^(2n+1) on [a, b]. Setting D = b - a and C_{a,b} = |μ_max|/(2n + 1)!, we write

(4.3.3) |∫_a^b f dx - ∑_{i=1}^{n+1} w_i f(x_i)| ≤ C_{a,b} D^{n+2}.

Next suppose that we subdivide the interval N times so that each subinterval has length h; the ith subinterval has end points a_i = a + ih and a_{i+1} = a + (i+1)h and midpoint a + (i + 1/2)h. Now if we do 1-point Gaussian quadrature on each subinterval, then

(4.3.4) ∫_a^b f dx = ∑_i ∫_{a_i}^{a_{i+1}} f dx ≈ ∑_i h f(a + (i + 1/2)h),

with the error on each subinterval bounded by C_{a,b} h^2. Hence, the error converges to zero order 2 in h.
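
The package function can be used to assemble (4.3.2) directly. The helper gaussQuad below is our own sketch; GaussianQuadratureWeights[n, a, b] returns the list of pairs (x_i, w_i) described above.

<< NumericalDifferentialEquationAnalysis`

(* n-point Gaussian quadrature on [a, b] from the package's points and weights *)
gaussQuad[f_, n_, a_, b_] :=
  Total[#[[2]] f[#[[1]]] & /@ GaussianQuadratureWeights[n, a, b]];

{gaussQuad[Sin, 1, 1., 4.],   (* one point: the midpoint rule on [1, 4] *)
 gaussQuad[Sin, 5, 1., 4.],
 NIntegrate[Sin[x], {x, 1, 4}]}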

One point quadrature (the midpoint method) is commonly used to compute the integrals that occur in the finite element method. Two point quadrature is used in collocation. Both are techniques for computing approximate solutions of differential equations and are developed in Part 2.

Exercises:

1. The following is a modified version of an exercise from [Gerald and Wheatley]. In this exercise we determine the reaction heat of an exothermal reaction. Suppose two substances are placed in a kiln. The first substance is active, producing an exothermal reaction. The second substance is inert. Over a 25 minute period the temperature of the kiln goes from 86.2 to 126.5 degrees (Fahrenheit). At each minute the temperature of each substance is measured and the temperature difference is recorded. The results are in the following tables.

t     0   1    2    3    4    5     6    7     8     9     10    11
δ(t)  0.  0.34 1.86 4.32 8.07 13.12 16.8 18.95 18.07 16.69 15.25 13.86

t     12    13   14    15   16   17   18   19  20   21   22   23   24   25
δ(t)  12.58 11.4 10.33 8.95 6.46 4.65 3.37 2.4 1.76 1.26 0.88 0.63 0.42 0.3

a. Plot the 26 points (t, δ(t)). Fit a cubic B-spline to the data and display both plots on the same axis. In order to ensure that the curve extends over the entire domain, duplicate the first and last points when generating the B-spline. We denote this curve σ, and the segments of σ as σ_i(s) = (σ_i1(s), σ_i2(s)).

b. The temperature difference δ is caused by the reaction. The first t for which δ(t) > 0 is the starting time of the reaction, denoted a. To find the end time of the reaction, plot the points (t, Log[δ(t)]). You will notice that after a while the points seem to lie on a line. The value of t at which the plot begins to appear linear is denoted b and is the end time of the reaction.

c. Estimate the integral ∫_a^b δ dt using the trapezoid method.

d. Estimate ∫_a^b δ dt using the midpoint rule as follows. Subdivide the interval into 10 sub-intervals with length h = (b - a)/10. On each sub-interval estimate the integral via the midpoint rule (1 point Gaussian quadrature as in (4.3.4)). Use the points on the B-spline σ to estimate values of δ. For instance, in order to get a value δ(t), you will need to
i. determine the pertinent segment of σ,
ii. determine s so that σ_i1(s) = t,
iii. evaluate σ_i2(s).

e. Evaluate |μ_max| = Max_{i,s} |[(d/ds) σ_i2(s)] / [(d/ds) σ_i1(s)]|. Estimate the absolute error for the integral.

The value calculated in c is the reaction heat.

4. An Application: First Order ODE

We consider the following first order ordinary differential equation,

(4.4.1) du/dx = f(x, u),

where u is a function defined and differentiable on an interval [a, b], and f is a function of both the independent variable x and the dependent variable u. Our purpose here is to illustrate some of the techniques developed above. Throughout this section we suppose that the boundary value u(a) = u_0 is known.

In many cases (4.4.1) is easily solved. For instance if f = f(x) is independent of u, then we may integrate both sides of (4.4.1) to get

(4.4.2) u(x) - u_0 = ∫_a^x f(s) ds.

If f has no anti-derivative, then (4.4.2) is resolved for any x by means of the techniques introduced in the preceding sections. If f(x, u) = u(x) g(x) and u is positive, then (4.4.1) reduces to u'/u = g(x), or Log(u) - Log(u_0) = ∫_a^x g(s) ds. Hence, u(x) = u_0 e^{h(x)}, where h(x) = ∫_a^x g(s) ds. Again we may need numerical techniques to resolve values of u.
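
When g has no elementary anti-derivative, h can be evaluated numerically on demand. The following sketch (with an illustrative g of our own choosing) combines the formula u(x) = u_0 e^{h(x)} with NIntegrate.

(* u' = u g(x), u(a) = u0, solved as u(x) = u0 Exp[h(x)] with h via NIntegrate *)
g[s_] := Exp[-s^2];            (* illustrative: no elementary anti-derivative *)
a = 0; u0 = 1;
u[x_?NumericQ] := u0 Exp[NIntegrate[g[s], {s, a, x}]];
u[1.5]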

In each of the preceding cases we were able to separate the terms involving u on one side of the equation from the terms not involving u on the other. Further, the expression involving u was formally integrable to yield an expression involving only u. If this is not possible, then there are a host of numerical techniques at our disposal. We will spend the remainder of this section developing three of these. Other techniques are introduced in Part 2.

We begin by setting a partition of [a, b], a = x0 < x1 < ... < xn = b. By assumption we know u(x0 ). Our goal is to
estimate u(xi ), i > 0. In each case the process will be iterative. We will use knowledge of u(x0 ) to estimate u(x1 )
and then derive the estimate of u(x2 ) from the prior estimate of u(x1 ) and so forth. The obvious flaw is that any
error introduced at the ith iteration is likely compounded at the next.

The simplest technique is called forward Euler. In this case we formally integrate (4.4.1) to get

(4.4.3) u(x_{i+1}) - u(x_i) = ∫_{x_i}^{x_{i+1}} f(x, u(x)) dx.

The question is how to estimate the integral on the right hand side of (4.4.3). At the (i+1)th step in the iteration, we know u(x_i), and hence we know f(x_i, u(x_i)). In forward Euler, we replace the integral with f(x_i, u(x_i))(x_{i+1} - x_i) and set

(4.4.4) u(x_{i+1}) = u(x_i) + f(x_i, u(x_i))(x_{i+1} - x_i).

Certainly the error introduced at each iteration of (4.4.4) is a function of (x_{i+1} - x_i). There is another problem
with (4.4.4). The left hand end point is not a particularly good estimator of the integral. The midpoint would be
better. The following technique, due to Runge, implements improvements in both directions.

We begin by setting ξ_i = (1/2)(x_{i+1} + x_i), the midpoint of the interval. We use (4.4.4) to estimate u(ξ_i) = u(x_i) + (1/2) f(x_i, u(x_i))(x_{i+1} - x_i), as if we were doing Euler's method at the midpoint. Then we estimate the slope as f(ξ_i, u(ξ_i)) and set

(4.4.5) u(x_{i+1}) = u(x_i) + f(ξ_i, u(ξ_i))(x_{i+1} - x_i).

Depending on the author, this technique is referred to as the midpoint or Runge method. Notice that the right hand sides of both (4.4.4) and (4.4.5) look like truncated Taylor expansions. It is reasonable to consider the second degree Taylor expansion for u. To do this we need the second derivative of u. Since we know the derivative of u, we can derive an expression for the second derivative of u in terms of f (provided f is differentiable). For instance, if f = x u(x)^2, then d^2u/dx^2 = (∂f/∂x) + (∂f/∂u)(du/dx) = u(x)^2 + 2x u(x) f = u(x)^2 + 2x^2 u(x)^3. Therefore we can write the second degree Taylor expansion for u at x_i and use it to estimate u(x_{i+1}).

(4.4.6) u(x_{i+1}) = u(x_i) + f(x_i, u(x_i))(x_{i+1} - x_i) + (1/2) f'(x_i, u(x_i))(x_{i+1} - x_i)^2.

Notice that (4.4.4) is just the first degree Taylor expansion. We should expect that the estimator in (4.4.6) is better. In addition, (4.4.5) varies from (4.4.6) in the coefficients of the linear and quadratic terms. Intuitively we should expect that (4.4.6) is a better estimator. Indeed this is the case. At this point we stop to look at an example.

We select a known function, express the differential equation and then use each of the three methods to derive a solution. Set u(x) = 1 - e^(-x/5), so that u' = (1/5) e^(-x/5) = (1/5)(1 - u), on the interval [0, 20] with boundary condition u(0) = 0. We take x_{i+1} - x_i = 2 for each i. We have applied each of the estimators to the differential equation. Figure 4.1 displays the output.

Figure 4.1: Three estimators applied to the ODE u'(x) = 0.2 e^(-x/5) on [0, 20] with u(0) = 0
[Figure: curve -- actual solution; dots -- forward Euler; φ -- Runge; o -- Taylor.]

The first thing to say about the results is that forward Euler is at best a crude estimator. Secondly, the Runge method outperformed Taylor. There are two contributing factors. First, the midpoint rule is an excellent estimator for the integral. Additionally, for the equation at hand, f is independent of u; therefore we did not need to use forward Euler to estimate u(ξ_i) in (4.4.5). Notice also that forward Euler gives a high estimate for the concave down function and that the error accumulates from left to right. Indeed, in all cases, the error is cumulative. If we had needed to use forward Euler to estimate u(ξ_i), these estimates would also be high and would adversely affect the accuracy of the other two methods. (See Exercise 1 below.)
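
A minimal sketch of the forward Euler step (4.4.4) and the Runge step (4.4.5) for this example follows; it is our own rendering, not the program used to produce Figure 4.1, and the exact solution 1 - e^(-x/5) is included for comparison.

(* forward Euler (4.4.4) and Runge (4.4.5) for u' = f(x, u), u(0) = 0, h = 2 *)
f[x_, u_] := 0.2 Exp[-x/5];     (* for this example f is independent of u *)
h = 2.; xs = Range[0., 20., h];

euler = FoldList[#1 + h f[#2, #1] &, 0., Most[xs]];
runge = FoldList[
    Function[{u, x}, u + h f[x + h/2, u + (h/2) f[x, u]]], 0., Most[xs]];
exact = 1 - Exp[-xs/5];

ListPlot[{Transpose[{xs, euler}], Transpose[{xs, runge}],
  Transpose[{xs, exact}]}]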

There is a problem with the Taylor expansion estimator that makes it impractical. The issue lies in the difference between research software and production software. Suppose that you have an engineering firm and want to implement an automated system using the second order Taylor expansion to estimate the solution to a first order ODE. Such a system could be written in Mathematica. But Mathematica does not produce production software. To execute software in Mathematica you must open the notebook, then press ctrl-enter to execute the program. But when the notebook is open, a careless key stroke could change the program. The resulting execution may fail, or worse, it may execute but produce unreliable results. A program written in C or C++ is compiled to produce an executable or dot-exe file. The source code is then held in a protected library. The technician who runs the software has no need to access the source code. This makes the software reliable. But these programming languages do not implement symbolic manipulation. The result is that you cannot write this application in one of these languages.

Exercises:

1. Consider the differential equation u' = xu. Suppose that x lies in the interval [0, 1] and that u(0) = 1.
a. Solve the equation.
b. Estimate the solution using forward Euler, the Runge method and a second degree Taylor series.
c. Plot the four treatments of the ODE on the same axis.

References

1. Atkinson, Kendall E., An Introduction to Numerical Analysis, 2nd Edition, J. Wiley, 1989.

2. Don, Eugene, Schaum's Outline of Theory and Problems of Mathematica, McGraw-Hill, 2008.

3. Hildebrand, F. B., Introduction to Numerical Analysis, 2nd Edition, Dover, 1974.

4. Gerald, C. F. and P. O. Wheatley, Applied Numerical Analysis, 7th Edition, Pearson Addison-Wesley, 2004.

5. Loustau, John and M. Dillon, Linear Geometry with Computer Graphics, Marcel Dekker, 1993.

6. Marsden, Jerrold E., and Anthony J. Tromba, Vector Calculus, 5th Edition, W. H. Freeman, 2003.

7. Skeel, Richard D. and Jerry B. Keiper, Elementary Numerical Computing with Mathematica, McGraw-Hill,
1999.

8. Su, Bu-Qing and Ding-Yuan Liu, Computational Geometry - Curve and Surface Modeling, Academic Press,
1989.
