Chapter 1 Mathematical Analysis

By
Oumar Gueye, Ph.D.

Department of Mathematics, University of Manitoba
This chapter covers the following sections of the textbook:
1.2 Representation of numbers on computers
1.3 Errors in numerical solutions
Math-2120-Winter-2015
Definition 1:
Numerical analysis is the part of applied mathematics that studies methods which use
numerical approximation to solve mathematical problems (differential calculus, integral
calculus, differential equations, partial differential equations, optimization problems and
so on).
Definition 2:
Numerical methods are mathematical techniques used:
in numerical analysis, to evaluate a definite integral for example,
in numerical algebra, to solve an equation or a system of equations,
and in optimization, to find a local minimum of a given function for example.
Remark:
1) Because of their power, computers and calculators play a central role in numerical
analysis.
2) MATLAB, a high-level computer language, will be used in this course.
3) Solving a problem analytically (if possible) provides an exact solution, while numerical
methods (very often) provide an approximate solution.
4) Solving a problem analytically is not always possible.
5) Approximate solutions involve errors due to the numerical method used and to how the data
are stored on the computer.
Decimal system:
The decimal system uses the ten digits 0, 1, ..., 9 to represent any number. A
number in the decimal system (base 10) is given by a sequence of these 10 digits.
Remark:
Base 10 is one of the most popular numbering systems.
A number in base 10 can be written as a sum of multiples of powers of ten.
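Example 1:
1) $3456_{10} = 3000 + 400 + 50 + 6 = 3 \times 10^3 + 4 \times 10^2 + 5 \times 10^1 + 6 \times 10^0$
2) $546.748_{10} = 5 \times 10^2 + 4 \times 10^1 + 6 \times 10^0 + 7 \times 10^{-1} + 4 \times 10^{-2} + 8 \times 10^{-3}$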
Binary system:
The binary system uses the two digits 0 and 1 to represent any number. A number in
the binary system (base 2) is given by a sequence of these 2 digits.
Remark:
The popularity of the binary system is due to the rapid development of computer science
and telecommunications.
Example 2:
1) $1110101_2$
2) $1001.011_2$
From base 10 to base 2: (For an integer)

1. Take the decimal number (integer) and divide it by 2, keeping track of the remainder
(instead of decimal places).
2. Take the result and divide it by 2 in the same way, always keeping track of the
remainder.
3. Repeat Step 2 until you reach a result of 0.
4. Read the remainders (all 0 or 1) off in reverse order, starting at the bottom with the one
you just finished. This is the answer.
Remark:
Your last step should always look like 1 / 2 = 0 r 1.
Example 3:
Convert $53_{10}$ to base 2.
53/2 = 26 R 1
26/2 = 13 R 0
13/2 = 6  R 1
6/2  = 3  R 0
3/2  = 1  R 1
1/2  = 0  R 1
Read the remainders from bottom to top.
The binary representation of 53 is $110101_2$.
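The repeated-division procedure can be sketched in a few lines of code. This is a minimal illustration, assuming Python rather than the MATLAB used in the course; the function name `int_to_binary` is hypothetical and not part of the original notes.

```python
def int_to_binary(n):
    """Convert a non-negative integer to a binary digit string by repeated division by 2."""
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, r = divmod(n, 2)        # quotient and remainder of n / 2
        remainders.append(str(r))  # remainders appear least significant first
    # Read the remainders in reverse order (bottom to top in the table).
    return "".join(reversed(remainders))


print(int_to_binary(53))
```

Calling it on 53 reproduces the digits read from bottom to top in Example 3.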
From base 10 to base 2: (For a fractional part)

1. Multiply the fractional part by 2.
2. Record the integer part of the result obtained at step 1 (that is, 0 or 1).
3. Stop if the fractional part of the result obtained at step 1 is 0; otherwise go to step 4.
4. Repeat steps 1 to 3 with the fractional part of the result obtained at step 1.
Remark:
The binary representation is given by the sequence of recorded integer parts, from the first to
the last.
Example 4: Convert $87.1875_{10}$ to base 2.
Let's first find 87 in base 2 and then convert 0.1875 to binary.
$87_{10} = 1010111_2$
Operation     Result   Fractional part   Integer part
0.1875 x 2    0.375    0.375             0
0.375 x 2     0.75     0.75              0
0.75 x 2      1.5      0.5               1
0.5 x 2       1.00     0.00              1
Read the integer parts from top to bottom:
$0.1875_{10} = 0.0011_2$
therefore $87.1875_{10} = 1010111.0011_2$
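The repeated-multiplication procedure for the fractional part can be sketched as follows. This is an illustrative Python sketch, not code from the course; the name `frac_to_binary` and the `max_bits` cut-off are assumptions, the latter added because many fractions (0.2, for example) never reach a fractional part of 0.

```python
def frac_to_binary(frac, max_bits=32):
    """Convert a fraction 0 <= frac < 1 to binary by repeated multiplication by 2."""
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        integer_part = int(frac)       # 0 or 1: the next binary digit
        bits.append(str(integer_part))
        frac -= integer_part           # keep only the fractional part
    # Digits are read in the order they were recorded (top to bottom).
    return "0." + ("".join(bits) or "0")


print(frac_to_binary(0.1875))
print(frac_to_binary(0.2, max_bits=8))
```

The first call terminates after four digits, as in Example 4; the second is cut off after 8 digits because the expansion of 0.2 repeats indefinitely.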

From base 2 to base 10:

1. Multiply by $2^0$ the first digit to the left of the binary point.
2. Multiply the next digit to the left by $2^1$, and repeat the process, increasing the exponent
of 2 until the last digit.
3. Multiply by $2^{-1}$ the first digit to the right of the binary point.
4. Multiply the next digit to the right by $2^{-2}$, and repeat the process, decreasing the
exponent of 2 until the last digit.
5. Add all the results to get the number in base 10.
Example 5:
Convert the following numbers to decimal:
1) $1011.0101_2$
2) $110101_2$
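The positional sum used to go from base 2 to base 10 can be sketched in code. This is a minimal Python illustration under the assumption that the input is a string of 0s and 1s with at most one point; the function name `binary_to_decimal` is hypothetical.

```python
def binary_to_decimal(s):
    """Convert a binary string such as '1011.0101' to a decimal number."""
    integer_digits, _, fraction_digits = s.partition(".")
    total = 0.0
    # Left of the point: exponents 0, 1, 2, ... moving leftwards.
    for k, digit in enumerate(reversed(integer_digits)):
        total += int(digit) * 2 ** k
    # Right of the point: exponents -1, -2, ... moving rightwards.
    for k, digit in enumerate(fraction_digits, start=1):
        total += int(digit) * 2 ** -k
    return total


print(binary_to_decimal("1011.0101"))
print(binary_to_decimal("110101"))
```

For the first number the sum is $8 + 2 + 1 + 0.25 + 0.0625 = 11.3125$; the second gives back the 53 of Example 3.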
Exercise:
1) Find the binary representation of $\left(\frac{1}{5}\right)_{10}$.
2) Find the decimal representation of $0.01_2$.
Definition 3:
A number in decimal floating representation (or scientific representation) is a number
written in the form $d_0.d_1 d_2 \ldots d_s \times 10^D$, where
$0 \le d_i \le 9$ for all $i = 0, \ldots, s$; $d_0 \ne 0$; $D$ is an integer called the order of magnitude
and $0.d_1 d_2 \ldots d_s$ is called the mantissa.
Example 6:
1) $-324.765 = -3.24765 \times 10^2$; 0.24765 is the mantissa.
2) $0.43572 = 4.3572 \times 10^{-1}$; 0.3572 is the mantissa.
Definition 4:
In binary floating representation, a number is written as
$1.b_1 b_2 b_3 \ldots b_k \times 2^e$, where
each $b_i \in \{0, 1\}$ and the digits of $e$ are also in $\{0, 1\}$; $0.b_1 b_2 b_3 \ldots b_k$ is the mantissa and $e$ is the exponent.
Example 7:
1) $1.1001 \times 2^{101}$
2) $1.01101 \times 2^{10}$
Exercise:
Find the binary floating representation of $1001_2$.
Remark:
1) Computers store numbers in binary floating point.
2) Each binary digit (0 or 1) is called a bit (for binary digit).
3) A byte is equal to 8 bits.
4) Since the memory of computers is limited, not all numbers can be represented on
computers.
Converting a number in binary floating representation:

Let D be a positive number in base 10. To represent it in binary floating point, one carries out the
following steps:
1. Find the largest power of 2 which is not larger than D; call it $2^d$.
2. Normalize D by multiplying and dividing D by $2^d$:
$D = \frac{D}{2^d} \times 2^d = 1.d_1 d_2 \ldots d_s \times 2^d$
3. Find respectively the binary representation of the mantissa and the exponent.
Example 8:
Find the binary floating point representation of the following numbers:
1) 40
2) 0.15625
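The normalization step can be checked with a short sketch. It is written in Python as an assumption (the course itself uses MATLAB) and relies on the standard-library function `math.frexp`, which returns a mantissa in [0.5, 1); the helper name `binary_float_form` is hypothetical.

```python
import math


def binary_float_form(D):
    """Return (mantissa, exponent) with abs(D) = mantissa * 2**exponent and 1 <= mantissa < 2."""
    if D == 0:
        raise ValueError("zero has no normalized form")
    m, e = math.frexp(abs(D))  # abs(D) = m * 2**e with 0.5 <= m < 1
    return m * 2, e - 1        # shift so the mantissa starts with 1


print(binary_float_form(40))
print(binary_float_form(0.15625))
```

Both numbers have mantissa $1.25 = 1.01_2$, so $40 = 1.01_2 \times 2^{5}$ and $0.15625 = 1.01_2 \times 2^{-3}$; writing the exponents in binary gives $101_2$ and $-11_2$.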
Remark:
When a number is in binary floating point representation, computers store the sign of the
mantissa, the mantissa and the exponent according to the IEEE-754 standard.
IEEE-754 standard (Institute of Electrical and Electronics Engineers)

A) Single precision format: 32 bits
Sign (1 bit): 1 for - and 0 for +
Exponent (8 bits)
Mantissa (23 bits)

Storage layout:
Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits)
Remark:
1. Since in binary floating point all numbers start with 1, this leading 1 is not
stored on the computer.
2. The largest number that can be stored with 8 bits (for the exponent) is $2^8 - 1 = 255$.
3. Numbers from 0 to 255 are used to represent exponents between -127 and 128 (the IEEE-754
standard reserves the stored values 0 and 255 for special cases such as zero, infinity and NaN).
4. In single precision format, a bias of 127 is added to any exponent and the result is
stored as the exponent (for example, -127 is stored as 0).
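Example 8:
Find how a computer stores the number 16.5 using the single precision format.
1. Find the binary floating point form of 16.5: $16.5 = 10000.1_2 = 1.00001_2 \times 2^4$.
2. Identify the sign, the exponent and the mantissa: the sign is + (bit 0), the exponent is 4 and the mantissa is 00001.
3. Find exponent + bias in base 2: $4 + 127 = 131 = 1000\,0011_2$.
4. Write the final result, padding the mantissa with zeros to 23 bits:
0 | 10000011 | 00001000000000000000000
Sign | Exponent | Mantissa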
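The stored bit pattern can be inspected directly. The sketch below is an illustration in Python (an assumption; the course uses MATLAB) based on the standard-library `struct` module; the function name `float32_fields` is hypothetical.

```python
import struct


def float32_fields(x):
    """Return the sign, exponent and mantissa bit strings of x in IEEE-754 single precision."""
    # Pack x as a big-endian 32-bit float, then reinterpret the 4 bytes as an unsigned integer.
    (raw,) = struct.unpack(">I", struct.pack(">f", x))
    bits = format(raw, "032b")
    return bits[0], bits[1:9], bits[9:]


sign, exponent, mantissa = float32_fields(16.5)
print(sign, exponent, mantissa)
print(int(exponent, 2) - 127)  # remove the bias to recover the true exponent
```

For 16.5 the three fields printed are the sign bit, the 8 exponent bits for 131, and the 23 mantissa bits, matching the result worked out by hand.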
IEEE-754 standard (Institute of Electrical and Electronics Engineers)

B) Double precision format: 64 bits
Sign (1 bit)
Exponent (11 bits)
Mantissa (52 bits)

Storage layout:
Sign (1 bit) | Exponent (11 bits) | Mantissa (52 bits)
Remark:
1. The largest number that can be stored with 11 bits (for the exponent) is $2^{11} - 1 = 2047$.
2. Numbers from 0 to 2047 are used to represent exponents between -1023 and
1024 (the IEEE-754 standard reserves the stored values 0 and 2047 for special cases such as zero, infinity and NaN).
3. In double precision format, a bias of 1023 is added to any exponent and the result is
stored as the exponent (for example, -1023 is stored as 0).
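Example 9:
Find how a computer stores the number -50 using the double precision format.
1. Find the binary floating point form of -50: $-50 = -110010_2 = -1.1001_2 \times 2^5$.
2. Identify the sign, the exponent and the mantissa: the sign is - (bit 1), the exponent is 5 and the mantissa is 1001.
3. Find exponent + bias in base 2: $5 + 1023 = 1028 = 100\,0000\,0100_2$.
4. Write the final result, padding the mantissa with zeros to 52 bits:
1 | 10000000100 | 1001 0000 ... 0000
Sign (1 bit) | Exponent (11 bits) | Mantissa (52 bits)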
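The double precision layout can be checked in both directions: splitting a number into its three fields, and rebuilding the value from them. This is a hedged Python sketch (the course uses MATLAB); `float64_fields` and `decode_float64` are hypothetical helper names, and the decoder only handles normalized numbers, not the reserved exponent patterns.

```python
import struct


def float64_fields(x):
    """Return the sign, exponent and mantissa bit strings of x in IEEE-754 double precision."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", x))
    bits = format(raw, "064b")
    return bits[0], bits[1:12], bits[12:]


def decode_float64(sign, exponent, mantissa):
    """Rebuild the value from its fields (normalized numbers only)."""
    e = int(exponent, 2) - 1023                 # remove the bias
    fraction = 1 + int(mantissa, 2) / 2 ** 52   # restore the hidden leading 1
    value = fraction * 2.0 ** e
    return -value if sign == "1" else value


fields = float64_fields(-50.0)
print(fields)
print(decode_float64(*fields))
```

Decoding the fields of -50 gives $-(1 + 0.5625) \times 2^5 = -50$, which confirms the bias of 1023 and the hidden leading 1.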
Two types of error occur frequently when one deals with numerical methods: truncation
error and round-off error.
A. Truncation Error:
It refers to the errors caused by the method itself. This kind of error occurs when the
numerical method uses an approximate mathematical procedure.
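Example 10:
Let's use the Taylor series
$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \frac{x^8}{8!} - \cdots$
to approximate $\cos\frac{\pi}{3}$, whose exact value is 0.5.
Example 10: (continued)
With two terms:
$\cos\frac{\pi}{3} \approx 1 - \frac{\pi^2}{18} = 0.451688$
$E_{Tr} = |0.5 - 0.451688| = 0.048312$
With three terms:
$\cos\frac{\pi}{3} \approx 1 - \frac{\pi^2}{18} + \frac{\pi^4}{1944} = 0.501796$
$E_{Tr} = |0.5 - 0.501796| = 0.001796$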
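The effect of keeping more terms of the series can be explored numerically. The following is a small Python sketch (an assumption; the course uses MATLAB) with a hypothetical helper `cos_taylor`. The last printed digit may differ slightly from the values in the example, which appear to have been chopped rather than rounded.

```python
import math


def cos_taylor(x, n_terms):
    """Sum the first n_terms terms of the Taylor series of cos x about 0."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k)
               for k in range(n_terms))


x = math.pi / 3
for n in (2, 3, 4, 5):
    approx = cos_taylor(x, n)
    # Truncation error: distance between the exact value 0.5 and the partial sum.
    print(n, approx, abs(0.5 - approx))
```

Each additional term reduces the truncation error, which is the behaviour the two approximations in the example illustrate.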
B. Round-off Error:
Since a finite number of bits is used to store a number, a mantissa longer than the number of bits
available has to be chopped or rounded. In such a case, the true value of the number is not
stored and the error made is called round-off error.
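Round-off error can be observed by storing a number in single precision and reading it back. This is an illustrative Python sketch using the standard-library `struct` module; the helper name `to_float32` is hypothetical. Note that Python's own floats are double precision, so the error printed is measured against the double precision value of 0.1, which is itself already rounded.

```python
import struct


def to_float32(x):
    """Round x to the nearest single precision number and return it as a Python float."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


stored = to_float32(0.1)
print(stored)             # the value actually kept in 32 bits
print(abs(stored - 0.1))  # round-off error made by the 23-bit mantissa
print(to_float32(16.5) == 16.5)
```

The number 0.1 has a non-terminating binary expansion, so its mantissa must be rounded; 16.5 fits in the available bits and is stored without error.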
Remark:
Total error = truncation error + round-off error

Since the true value is usually unknown, the value of the truncation error is also
unknown.
For some numerical methods the truncation error can be approximated (see next
chapters).
END OF CHAP 1