Beruflich Dokumente
Kultur Dokumente
oating point numbers with base 10 oating point numbers with base 2 IEEE oating point standard machine precision rounding error
16-1
properties a nite set of numbers unevenly spaced: distance between oating point numbers varies the smallest number greater than 1 is 1 + 10n+1 the smallest number greater than 10 is 10 + 10n+2, . . . largest positive number: +(.999 9)10 10emax = (1 10n)10emax smallest positive number: xmin = +(.100 0)10 10emin = 10emin1
16-3
0.25 0.5
1 .5
2 .5
3 .5
largest positive number: xmax = +(.111 1)2 2emax = (1 2n)2emax smallest positive number: xmin = +(.100 0)2 2emin = 2emin1 in practice, the number system includes subnormal numbers : unnormalized small numbers (d1 = 0, e = emin), and the number 0
IEEE oating point numbers 16-5
requires 32 bits: 1 sign bit, 23 bits for mantissa, 8 bits for exponent IEEE standard double precision: n = 53, emin = 1021, emax = 1024
requires 64 bits: 1 sign bit, 52 bits for mantissa, 11 bits for exponent used in almost all modern computers
IEEE oating point numbers 16-6
Machine precision
the machine precision of a binary oating point number system with mantissa length n is dened as M = 2 n example: IEEE standard double precision (n = 53) M = 253 1.1102 1016 interpretation 1 + 2M is the smallest oating point number greater than 1: (.10 01)2 21 = 1 + 21n = 1 + 2M
16-7
Rounding
a oating-point number system is a nite set of numbers; all other numbers must be rounded notation: (x) is the oating-point representation of x rounding rules used in practice numbers are rounded to the nearest oating-point number in case of a ties: round to the number with least signicant bit 0 (round to nearest even) example: numbers x (1, 1 + 2M ) are rounded to 1 or 1 + 2M : (x) = 1 if 1 < x 1 + M (x) = 1 + 2M if 1 + M < x < 1 + 2M another interpretation of M : numbers between 1 and 1 + M are indistinguishable from 1
IEEE oating point numbers 16-8
| (x) x| M | x|
machine precision gives a bound on the relative error due to rounding number of correct (decimal) digits in (x) is roughly log10 M i.e., about 15 or 16 in IEEE double precision fundamental limit on accuracy of numerical computations
16-9
Exercises
explain the following MATLAB results (MATLAB uses IEEE double precision) >> (1 + 1e-16) - 1 ans = 0 >> (1 + 2e-16) - 1 ans = 2.2204e-16 >> (1 - 1e-16) - 1 ans = -1.1102e-16 >> 1 + (1e-16 - 1) ans = 1.1102e-16
IEEE oating point numbers 16-10
run the following code in MATLAB and explain the result x = 2; for i=1:54 x = sqrt(x); end; for i=1:54 x = x^2; end
explain the following results (log(1 + x)/x 1 for small x) >> log(1+3e-16)/3e-16 ans = 0.7401 >> log(1+3e-16)/((1+3e-16)-1) ans = 1.0000
16-11
the function f (x) = 1 for x [1016, 1015], evaluated in MATLAB using ( (1+x)-1 ) / ( 1+(x-1) ) 2
0 1016
1015
16-12