Fractional Number Notations
Ver. 1.4
2010 - Claudio Fornaro

Fractional Numbers
Fractional numbers have the form:
xxxxxxxxx.yyyyyyyyy
where the xs constitute the integer part of the value and the ys the fractional part.
There are two main methods to encode fractional numbers:
fixed-point notation
floating-point notation

Fixed-point Notation
Fixed-point notation splits the available n bits into 2 portions:
one for the integer part
one for the fractional part
| integer | fractional |
The radix point is not stored (it does not use up bits): its position is just known.
The numbers of bits for the integer and the fractional part are chosen before making any calculation.

Fixed-point Notation
If needed:
the integer part must be padded with 0es on the left
the fractional part must be padded with 0es on the right
Examples (the radix point is only implied; it is shown here as "."):
5.25 in FX on 4+4 bits: 0101.0100 → 01010100
5.25 in FX on 6+2 bits: 000101.01 → 00010101
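
A minimal Python sketch of the unsigned FX encoding above. The function name `to_fixed` is hypothetical, and truncating fractional bits that do not fit is an assumption, because the slides only show exact cases.

```python
import math

def to_fixed(value: float, int_bits: int, frac_bits: int) -> str:
    """Unsigned fixed point: scale by 2**frac_bits and write the integer on
    int_bits + frac_bits bits. The radix point itself is not stored."""
    if value < 0:
        raise ValueError("unsigned fixed point cannot hold negative values")
    total = int_bits + frac_bits
    # Scaling pads the fractional part on the right; extra bits are truncated (assumption)
    scaled = math.floor(value * 2 ** frac_bits)
    if scaled >= 2 ** total:
        raise OverflowError(f"{value} does not fit in {int_bits}+{frac_bits} bits")
    return format(scaled, f"0{total}b")  # zero-filled on the left (integer part padding)

print(to_fixed(5.25, 4, 4))  # 01010100
print(to_fixed(5.25, 6, 2))  # 00010101
```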

Fixed-point Notation
For relative (signed) fractional values, both SM and 2C notations can be used.
The n bits are then divided into 3 parts:
sign (1 bit)
integer part (m bits)
fractional part ($n-m-1$ bits)
E.g. 1+7+8 means 1 bit for the sign, 7 for the integer part and 8 for the fractional part.
Operations are the same as seen for integer values, provided that the values have the same format.

Fixed-point Notation
Examples
Convert value +12.25 to FX 2C 1+4+3: 0 1100 010 → 01100010
Convert value -12.25 to FX 2C 1+4+3: 01100010 → (2C-operation) → 10011110
Note: when using the 1st 2C-operation method (invert all bits, then add 1), the 1 must be added to the LSB, not to the unity place:
01100010 → inverted: 10011101
10011101 + 1 = 10011110
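
The signed examples can be reproduced with a short sketch that follows the slide procedure: convert the magnitude, then apply the 2C operation with the 1 added to the LSB. The name `to_fx2c` is hypothetical; truncation of extra fractional bits and rejecting the most negative code are simplifying assumptions.

```python
import math

def to_fx2c(value: float, int_bits: int, frac_bits: int) -> str:
    """Fixed-point two's complement on 1 + int_bits + frac_bits bits."""
    n = 1 + int_bits + frac_bits
    mask = 2 ** n - 1
    magnitude = math.floor(abs(value) * 2 ** frac_bits)  # truncation (assumption)
    # Simplification: the most negative code -2**(n-1) is not accepted here
    if magnitude >= 2 ** (n - 1):
        raise OverflowError(f"{value} does not fit in 1+{int_bits}+{frac_bits} bits")
    if value < 0:
        inverted = ~magnitude & mask          # invert all n bits
        magnitude = (inverted + 1) & mask     # add 1 to the LSB, not to the unity place
    return format(magnitude, f"0{n}b")

print(to_fx2c(12.25, 4, 3))   # 01100010
print(to_fx2c(-12.25, 4, 3))  # 10011110
```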

Exercises
Convert the values as requested:
±151.0 → FX 2C on 16 bits (1+8+7)
±151.25 → FX 2C on 16 bits (1+8+7)
111100101010 from FX 2C (1+7+4) → decimal
100110011000 from FX 2C (1+6+5) → decimal
Calculate in FX 2C on 16 bits (1+7+8) and identify any overflow:
$(111.6 - 44.57) / 2$
$(68.22 - 71.25) \times 64$

Exercises
Solutions
+151.0 → 0 10010111 0000000
-151.0 → 1 01101001 0000000
+151.25 → 0 10010111 0100000
-151.25 → 1 01101000 1100000
Note that the integer part is not the same for -151.0 and -151.25.
111100101010 → (2C) → 0 0001101 0110 → $-13.375_{10}$
100110011000 → (2C) → 0 110011 01000 → $-51.25_{10}$
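
The two decoding solutions can be checked with the sketch below. Instead of applying the 2C operation and reading the magnitude, it subtracts $2^n$ when the sign bit is set, which gives the same value; `from_fx2c` is a hypothetical name.

```python
def from_fx2c(bits: str, int_bits: int, frac_bits: int) -> float:
    """Decode a fixed-point two's complement string (1 + int_bits + frac_bits)."""
    n = 1 + int_bits + frac_bits
    if len(bits) != n:
        raise ValueError(f"expected {n} bits, got {len(bits)}")
    raw = int(bits, 2)
    if bits[0] == "1":          # negative: the 2C value is raw - 2**n
        raw -= 2 ** n
    return raw / 2 ** frac_bits  # put the radix point back

print(from_fx2c("111100101010", 7, 4))  # -13.375
print(from_fx2c("100110011000", 6, 5))  # -51.25
```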

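Exercises
Solutions
$(111.6 - 44.57) / 2$
111.6 → $1101111.10011001_2$ → 0110111110011001 (2C)
44.57 → $101100.10010001_2$ → 0010110010010001 (2C)
-44.57 → 1101001101101111 (2C)
0110111110011001 +
1101001101101111 =
0100001100001000 (the carry out of the MSB is discarded)
/2 (arithmetic right shift): 0010000110000100 → +33.515625

Exercises
Solutions
$(68.22 - 71.25) \times 64$
68.22 → $1000100.00111000_2$ → 0100010000111000 (2C)
71.25 → $1000111.01_2$ → 0100011101000000 (2C)
-71.25 → 1011100011000000 (2C)
0100010000111000 +
1011100011000000 =
1111110011111000
×64 (left shift by 6): 0011111000000000 → OVERFLOW (the sign bit has changed)
In all the 16-bit words the radix point is implied after the 8th bit (1+7+8).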
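
The two calculations can be replayed with 16-bit integer words. This is a sketch under stated assumptions: values are truncated to 8 fractional bits as in the solutions, subtraction is done by adding the 2C-encoded negative value, and overflow is detected by checking whether the exact integer result fits in 16 bits (the slides detect it from the sign change). All function names are hypothetical.

```python
import math

N_BITS, FRAC_BITS = 16, 8                     # FX 2C on 16 bits, 1+7+8
MASK = (1 << N_BITS) - 1
MIN_INT, MAX_INT = -(1 << (N_BITS - 1)), (1 << (N_BITS - 1)) - 1

def encode(value: float) -> int:
    """Truncate |value| to FRAC_BITS fractional bits, then apply 2C if negative."""
    word = math.floor(abs(value) * (1 << FRAC_BITS))
    if word > MAX_INT:
        raise OverflowError(value)
    return (-word) & MASK if value < 0 else word

def to_signed(word: int) -> int:
    """Read a 16-bit word as a 2C integer."""
    return word - (1 << N_BITS) if word >> (N_BITS - 1) else word

def decode(word: int) -> float:
    return to_signed(word) / (1 << FRAC_BITS)

def add(a: int, b: int) -> tuple[int, bool]:
    """Integer addition: the carry out of the MSB is dropped by the mask."""
    exact = to_signed(a) + to_signed(b)
    return exact & MASK, not (MIN_INT <= exact <= MAX_INT)

def scale(word: int, k: int) -> tuple[int, bool]:
    """Multiply by 2**k (k > 0, left shift) or divide (k < 0, arithmetic right shift)."""
    v = to_signed(word)
    exact = v << k if k >= 0 else v >> -k
    return exact & MASK, not (MIN_INT <= exact <= MAX_INT)

# (111.6 - 44.57) / 2
total, ovf_add = add(encode(111.6), encode(-44.57))
r1, ovf_div = scale(total, -1)
ovf1 = ovf_add or ovf_div
print(format(r1, "016b"), decode(r1), ovf1)  # 0010000110000100 33.515625 False

# (68.22 - 71.25) * 64
total, ovf_add = add(encode(68.22), encode(-71.25))
r2, ovf_mul = scale(total, 6)
ovf2 = ovf_add or ovf_mul
print(format(r2, "016b"), ovf2)              # 0011111000000000 True
```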

Fixed-point Uses
Fixed-point notation is sometimes used by simulating it with the integer notation that microprocessors use (i.e. 2C).
This allows faster computations than operations using floating-point notation (intrinsically slower).

Fixed-point Problems
Suppose the following (unsigned) values have to be coded using Fixed-point on a total of 8 bits:
37.25 → 100101.01
12.625 → 1100.1010
5.4375 → 101.01110
1.2890625 → 1.0100101
All of them can be coded in 8 bits, but there is no single position for the radix point that is suitable for all of them.
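
The integer simulation mentioned under Fixed-point Uses can be sketched as follows: values are kept as plain integers scaled by $2^8$ (8 fractional bits is a hypothetical choice), so an addition is an ordinary integer addition and a product only needs a right shift to restore the format. Python integers are unbounded, so the overflow of a real fixed-width register is not modelled here; the function names are hypothetical.

```python
FRAC_BITS = 8  # hypothetical format: 8 fractional bits

def fx(value: float) -> int:
    """Scale a real value to an integer with FRAC_BITS fractional bits (rounded)."""
    return round(value * (1 << FRAC_BITS))

def fx_mul(a: int, b: int) -> int:
    """The raw product has 2*FRAC_BITS fractional bits: shift right to restore
    the format. Python's >> on negative integers is an arithmetic shift."""
    return (a * b) >> FRAC_BITS

def fx_to_float(a: int) -> float:
    return a / (1 << FRAC_BITS)

a, b = fx(5.25), fx(-1.5)
print(fx_to_float(a + b))         # 3.75: addition is plain integer addition
print(fx_to_float(fx_mul(a, b)))  # -7.875
```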

Fixed-point Problems
Suppose you have to represent some fractional values $0 \le x < 8$ using a Fixed-point coding on 4+4 bits:
7.2732   2.3748   5.4375   1.2890
The first bit is always 0, and the fractional part is rounded to 4 bits.
If we could move the radix point 1 position to the left, we could have 1 more bit for precision.

Fixed-point Problems
Suppose you have to represent some values with fractional part x.0, x.5 or x.25 only, using a Fixed-point coding on 4+4 bits.
The last two bits are always 00, and the integer part is limited to 15.
If we could move the radix point 2 positions to the right, we could have 2 more bits for the integer part (values up to 63).
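
A short check of the first case: rounding each value to 4 fractional bits (4+4) and to 5 fractional bits (3+5, radix point moved one position to the left) shows the gain in precision. `quantize` is a hypothetical helper, and round-to-nearest is assumed as stated on the slide.

```python
VALUES = (7.2732, 2.3748, 5.4375, 1.2890)

def quantize(value: float, frac_bits: int) -> float:
    """Round value to the nearest multiple of 2**-frac_bits."""
    step = 2.0 ** -frac_bits
    return round(value / step) * step

for v in VALUES:
    err_4_4 = abs(v - quantize(v, 4))  # 4+4: 4 fractional bits
    err_3_5 = abs(v - quantize(v, 5))  # 3+5: 5 fractional bits, still enough for x < 8
    print(f"{v}: 4+4 error {err_4_4:.5f}, 3+5 error {err_3_5:.5f}")
```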

Fixed-point Problems
The problem with Fixed-point notation is the fixed position of the radix point.
To solve this problem, the radix point must be made movable (floating); this requires that its position be stored along with each number.

Exponential Notation
Exponential notation represents a number as a value (mantissa or significand) that multiplies a whole power of the base (exponent).
Examples (in decimal; 0.xxx is the mantissa, the power of 10 carries the exponent):
$123.45678 = 0.12345678 \times 10^{3}$
$0.0087654321 = 0.87654321 \times 10^{-2}$
$87655678 = 0.87655678 \times 10^{8}$
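
The decimal examples can be normalized with Python's `decimal` module, which avoids binary rounding. This is a sketch; `normalize10` is a hypothetical name. `Decimal.adjusted()` returns the power of 10 of the leftmost digit, so the exponent of the 0.x form is one more than that.

```python
from decimal import Decimal

def normalize10(text: str) -> tuple[Decimal, int]:
    """Return (mantissa, exponent) with value = mantissa * 10**exponent and
    0.1 <= |mantissa| < 1 (the 0.x normalized form)."""
    d = Decimal(text)
    if d == 0:
        raise ValueError("zero has no normalized form")
    exponent = d.adjusted() + 1  # adjusted(): power of 10 of the leftmost digit
    return d.scaleb(-exponent), exponent

for t in ("123.45678", "0.0087654321", "87655678"):
    m, e = normalize10(t)
    print(f"{t} = {m} x 10^{e}")
```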

Exponential Notation
Very big and very small values are obtained by just varying the exponent.
The same value can be expressed in many forms:
$123.45 = 0.12345 \times 10^{3} = 12345 \times 10^{-2}$
Among these forms, the form 0.x (with the first digit of x different from 0) is chosen to have a unique representation for values; this is called the normalized form.

Exponential Notation
When the number of digits is not enough to store the whole number, only the most important (leftmost) digits are stored.
The most significant digits are thus preserved, but approximation errors are introduced because of truncation.
Example (only 4 decimal digits):
$0.001234567 \rightarrow 0.1234 \times 10^{-2} = 0.001234000$
$876543 \rightarrow 0.8765 \times 10^{6} = 876500$

Exponential Notation
The maximum representation error with n digits is $10^{-n}$ relative to the power of the whole part.
If the whole part power is m:
$\varepsilon = 10^{-n} \times 10^{m} = 10^{m-n}$
which is the power of the rightmost digit (LSD).

Exponential Notation
Example
Suppose the value has only 4 decimal digits:
$876543 \rightarrow 0.8765 \times 10^{6}$ (normalized)
The whole part is $0 \times 10^{6}$, so $m = 6$:
$\varepsilon = 10^{-4} \times 10^{6} = 10^{2}$ (maximum error)
This can also be seen by writing the value as a sum of powers:
$0 \times 10^{6} + 8 \times 10^{5} + 7 \times 10^{4} + 6 \times 10^{3} + 5 \times 10^{2}$
For this value, the error is:
$|876543 - 876500| = 43 \; (< 10^{2})$
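
Truncation to n significant digits and the bound $10^{m-n}$ can be checked for both examples (876543 here and 0.001234567 in the next example) with the sketch below. `truncate_digits` is a hypothetical name; it uses `Decimal.quantize` with `ROUND_DOWN` to drop the digits that do not fit.

```python
from decimal import Decimal, ROUND_DOWN

def truncate_digits(text: str, n: int) -> tuple[Decimal, Decimal]:
    """Keep only the n most significant decimal digits (truncation).
    Returns the stored value and the maximum error 10**(m - n)."""
    d = Decimal(text)
    m = d.adjusted() + 1                  # value = 0.xxxx * 10**m
    lsd = Decimal(1).scaleb(m - n)        # weight of the rightmost kept digit
    stored = d.quantize(lsd, rounding=ROUND_DOWN)
    return stored, lsd

for t in ("876543", "0.001234567"):
    stored, max_err = truncate_digits(t, 4)
    print(t, "->", stored, "error", abs(Decimal(t) - stored), "<", max_err)
```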

Exponential Notation
Example
Suppose the value has only 4 decimal digits:
$0.001234567 \rightarrow 0.1234 \times 10^{-2} = 0.001234000$
The whole part is $0 \times 10^{-2}$, so $m = -2$:
$\varepsilon = 10^{-4} \times 10^{-2} = 10^{-6}$ (maximum error)
Writing the value as a sum of powers:
$0 \times 10^{-2} + 1 \times 10^{-3} + 2 \times 10^{-4} + 3 \times 10^{-5} + 4 \times 10^{-6}$
For this value, the error is:
$|0.001234567 - 0.001234000| = 0.000000567 \; (< 10^{-6})$

IEEE-P754 Floating-Point
The IEEE-P754 standard describes the most common notations used by computer FPUs (Floating-Point Units) to compute floating-point values.
The two exponential binary floating-point notations described have the form $\text{mantissa} \times 2^{\text{exponent}}$ and are:
Single precision (SP)
Double precision (DP)

IEEE-P754 Single Precision
Single precision values use 32 bits divided into 3 parts:
sign: 1 bit
exponent field: 8 bits
mantissa (or significand) field: 23 bits
| s | exponent | mantissa |
The sign bit is defined as follows:
0 is used for values $\ge 0$
1 is used for values $\le 0$ (negative zero!)

IEEE-P754 Single Precision
The mantissa (or significand) is in the normalized form 1.xxxxx, where the 1 before the radix point is the leftmost 1 (MSB) in the binary representation.
Only the fractional part of the binary mantissa is stored in the mantissa field: the leftmost 1 is already known to be present (it is called the hidden bit); this allows for one more bit of precision (23 bits stored + 1 hidden = 24 bits effective).
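
The three SP fields can be inspected on a real encoding with the standard `struct` module: `">f"` packs a Python float as a big-endian IEEE single and `">I"` reads the same 4 bytes back as an unsigned 32-bit integer. A sketch; `sp_fields` is a hypothetical name.

```python
import struct

def sp_fields(x: float) -> tuple[int, int, int]:
    """Split the SP encoding of x into (sign, exponent field, mantissa field)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    exponent = (word >> 23) & 0xFF  # 8-bit biased exponent field
    mantissa = word & 0x7FFFFF      # 23 stored bits: the hidden 1 is not stored
    return sign, exponent, mantissa

s, e, m = sp_fields(13.25)
print(s, format(e, "08b"), format(m, "023b"))  # 0 10000010 10101000000000000000000
print(sp_fields(-0.0))                          # (1, 0, 0): negative zero
```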

IEEE-P754 Single Precision
The exponent is a relative (signed) integer value on 8 bits. The IEEE-P754 SP standard does not use the SM or 2C notations, but a biased notation called excess 127: the FP exponent field is computed by adding the constant value 127 (bias constant) to the exponent of the normalized value.

IEEE-P754 Single Precision
Excess notation is efficient, especially for number comparison.
The offset value is $2^{n-1} - 1$ (n is the number of bits), in order to consider the first half of the range as negative numbers.
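
A small sketch of excess notation and of why it helps comparisons: for positive values, the 32-bit SP patterns compared as unsigned integers are ordered like the numbers themselves. The helper names are hypothetical.

```python
import struct

def bias(n_bits: int) -> int:
    """Offset of an excess-(2**(n-1) - 1) exponent on n bits."""
    return 2 ** (n_bits - 1) - 1

def to_excess(e: int, n_bits: int = 8) -> str:
    return format(e + bias(n_bits), f"0{n_bits}b")

def sp_word(x: float) -> int:
    """The SP bit pattern of x read as an unsigned integer."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

print(bias(8), bias(11))               # 127 1023
print(to_excess(3), to_excess(-31))    # 10000010 01100000
values = [0.5, 3.0, 1e-10, 13.25, 2.0 ** 100]
print(sorted(values) == sorted(values, key=sp_word))  # True for positive values
```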

IEEE-P754 Single Precision
Example: $+13.25_{10} \rightarrow$ IEEE-P754 (SP)
sign is positive: sign bit = 0
convert the value to binary: $13.25 = 1101.01_2$
normalize the value (note: base 2): $1101.01_2 = 1.10101_2 \times 2^{3}$
compute the exponent by adding 127 to the real base-2 exponent: $3 + 127 = 130 = 10000010_2$
compose the pieces, adding padding 0es:
0 10000010 10101000000000000000000

IEEE-P754 Single Precision
Example: convert from IEEE-P754 (SP):
1 01100000 01000000000000000000000
sign bit = 1 (negative value)
extract the mantissa, add the hidden bit, and convert to decimal: $1.01_2 = 1.25_{10}$
compute the real exponent by subtracting 127 from the extracted exponent: $01100000_2 = 96$, $96 - 127 = -31$
compose the parts: $-1.25 \times 2^{-31} \approx -5.82 \times 10^{-10}$
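
The manual steps of the two examples can be coded directly, without relying on the hardware encoding. This is a sketch for normalized, non-zero values only: `math.frexp` gives the binary exponent, the 23 fractional bits are truncated (real FPUs round to nearest by default), and `encode_sp`/`decode_sp` are hypothetical names.

```python
import math

def encode_sp(x: float) -> str:
    """Slide steps: sign, normalize to 1.xxx * 2**e, bias the exponent, drop the hidden bit."""
    if x == 0 or math.isinf(x) or math.isnan(x):
        raise ValueError("only normalized non-zero values are handled")
    sign = "1" if x < 0 else "0"
    frac, exp = math.frexp(abs(x))        # abs(x) = frac * 2**exp, 0.5 <= frac < 1
    significand, e = frac * 2, exp - 1    # normalized form 1.xxx * 2**e
    if not -126 <= e <= 127:
        raise ValueError("outside the normalized SP range")
    mantissa = int((significand - 1) * 2 ** 23)  # 23 fractional bits, truncated
    return f"{sign} {e + 127:08b} {mantissa:023b}"

def decode_sp(bits: str) -> float:
    """(-1)**s * 1.mantissa * 2**(E - 127) for normalized values."""
    s, e_field, m_field = bits.split()
    significand = 1 + int(m_field, 2) / 2 ** 23  # add the hidden bit
    return (-1) ** int(s) * significand * 2.0 ** (int(e_field, 2) - 127)

print(encode_sp(13.25))  # 0 10000010 10101000000000000000000
print(decode_sp("1 01100000 " + "01".ljust(23, "0")))  # -1.25 * 2**-31
```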

IEEE-P754 Single Precision
The SP decimal range is: $1.4 \times 10^{-45}$ to $3.4 \times 10^{+38}$
The decimal exponent varies from -45 to +38; the lower end is reached only by denormalized values, while the normalized values, with a binary exponent from -126 to 127, span about $1.2 \times 10^{-38}$ to $3.4 \times 10^{+38}$.
Values are approximated to 7 decimal digits (corresponding to the 24 bits used by the mantissa).

IEEE-P754 Single Precision
The representation error is the absolute weight of the LSB.
This is computed by multiplying the weight of the integer part (hidden bit) by the relative weight of the mantissa LSB (i.e. the weight of the LSB with respect to the integer part).
This results in adding the exponents:
$1.10010\ldots1 \times 2^{0} \rightarrow \varepsilon = 2^{0-23} = 2^{-23}$
$1.10010\ldots1 \times 2^{5} \rightarrow \varepsilon = 2^{5-23} = 2^{-18}$
$1.10010\ldots1 \times 2^{94} \rightarrow \varepsilon = 2^{94-23} = 2^{71}$
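
The LSB weight $2^{e-23}$ can be measured as the gap between an SP value and the next representable one, obtained by adding 1 to its bit pattern. A sketch for positive, normalized values; `sp_lsb_weight` is a hypothetical name and the test values $1.5 \times 2^{e}$ stand in for the generic mantissa 1.10010...1.

```python
import struct

def sp_lsb_weight(x: float) -> float:
    """Gap between the SP value of x and the next SP value (x > 0, normalized)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    (cur,) = struct.unpack(">f", struct.pack(">I", word))
    (nxt,) = struct.unpack(">f", struct.pack(">I", word + 1))  # +1 on the mantissa LSB
    return nxt - cur

for e in (0, 5, 94):
    print(e, sp_lsb_weight(1.5 * 2.0 ** e) == 2.0 ** (e - 23))  # True each time
```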

IEEE-P754 Single Precision
The binary exponent varies from -126 to 127, corresponding to excess-127 values from 1 to 254.
Exponent values 00000000 (0) and 11111111 (255) are used for special numbers:
Zeroes
Infinities
NaNs
Denormalized values

IEEE-P754 Single Precision
Zero
Exponent = 00000000, Mantissa = 0: 0/1 00000000 000…0
by definition, not by computation, because there is no 1 for normalization
Positive and negative zero are considered equivalent
Infinity
Exponent = 11111111, Mantissa = 0: 0/1 11111111 000…0
Operations with infinities are well defined
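
The special zero and infinity patterns can be displayed in compact (hex) form with `struct`; `sp_hex` is a hypothetical helper.

```python
import math
import struct

def sp_hex(x: float) -> str:
    """SP bit pattern of x written in hexadecimal."""
    return struct.pack(">f", x).hex().upper()

print(sp_hex(0.0), sp_hex(-0.0))            # 00000000 80000000
print(0.0 == -0.0)                          # True: the two zeroes are equivalent
print(sp_hex(math.inf), sp_hex(-math.inf))  # 7F800000 FF800000
```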

IEEE-P754 Single Precision
Not a Number (NaN)
Exponent = 11111111, Mantissa ≠ 0: 0/1 11111111 <not 000…0>
NaNs are used to indicate values that do not represent real numbers
There are 2 types of NaNs:
Quiet NaNs (mantissa MSB set): denote indeterminate operations, i.e. the result of an operation is not mathematically defined
Signalling NaNs (mantissa MSB clear): denote an invalid operation
Any operation with NaN yields a NaN result

IEEE-P754 Single Precision
Special Operations
N / INF = 0 (N finite)
INF × INF = INF
N / 0 = INF (N ≠ 0)
INF + INF = INF
0 / 0 = NaN
INF - INF = NaN
INF / INF = NaN
INF × 0 = NaN
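
Most rows of the table can be tried with Python floats, which are IEEE doubles. Note that Python raises `ZeroDivisionError` for `N / 0.0` and `0.0 / 0.0` instead of returning INF or NaN, so those two rows are not shown. The last lines check that Python's NaN, packed to SP, has an all-ones exponent and a non-zero mantissa.

```python
import math
import struct

inf, nan = math.inf, math.nan
N = 42.0
print(N / inf)    # 0.0
print(inf * inf)  # inf
print(inf + inf)  # inf
print(inf - inf)  # nan
print(inf / inf)  # nan
print(inf * 0)    # nan
print(nan + 1.0)  # nan: arithmetic with NaN yields NaN

(word,) = struct.unpack(">I", struct.pack(">f", nan))
print((word >> 23) & 0xFF == 0xFF, word & 0x7FFFFF != 0)  # True True
```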

IEEE-P754 Single Precision
The IEEE-P754 standard allows values in non-normalized form too (denormalized):
Exponent = 00000000, Mantissa ≠ 0
The hidden bit is now 0 and not 1
The exponent value is considered to be -126
Value is: $0.\text{mantissa} \times 2^{-126}$

IEEE-P754 Double Precision
Double precision notation just extends the SP notation to use 64 bits.
The differences are:
exponent bits: 11
mantissa bits: 52
bias constant: 1023
exponent range: -1022 to +1023
equivalent decimal range: $4.9 \times 10^{-324}$ to $1.8 \times 10^{+308}$, with 15 decimal digits
denormalized exponent: -1022
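
A hand-written SP decoder that applies the denormalized rule (hidden bit 0, exponent -126) and is cross-checked against `struct` on a few patterns. A sketch; `decode_sp_word` is a hypothetical name, and exponent 255 (INF/NaN) is rejected rather than decoded.

```python
import struct

def decode_sp_word(word: int) -> float:
    """Decode a 32-bit SP pattern by hand (normalized and denormalized values)."""
    sign = -1.0 if word >> 31 else 1.0
    e_field = (word >> 23) & 0xFF
    m_field = word & 0x7FFFFF
    if e_field == 255:
        raise ValueError("infinity or NaN")
    if e_field == 0:  # zero or denormalized: hidden bit 0, exponent -126
        return sign * (m_field / 2 ** 23) * 2.0 ** -126
    return sign * (1 + m_field / 2 ** 23) * 2.0 ** (e_field - 127)

for word in (0x00000001, 0x00400000, 0x007FFFFF, 0x00800000):
    by_hand = decode_sp_word(word)
    (by_struct,) = struct.unpack(">f", struct.pack(">I", word))
    print(f"{word:08X}", by_hand, by_hand == by_struct)
```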

IEEE-P754 Compact Notation
For ease of writing and copying, floating-point numbers (as any other bit sequence) can be translated to base 16 as if they were (they are not!) a pure binary number:
0 10000000 0010000… → 40100000
1 01111111 1100000… → BFE00000
C3C41000 → 1100 0011 1100 0100 0001 0000 0000 0000

IEEE-P754 Exercises
Convert the following values to/from IEEE-P754:
-1324.25 to SP and DP
0.02324 to SP and DP with an absolute precision of 1/1000
0 10000000 0010000… to decimal
1 01111111 1100000… to decimal
EB141000 to decimal
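
Since the compact form is just the bit string read in groups of 4, the conversion in both directions takes a couple of lines; `to_compact` and `from_compact` are hypothetical names.

```python
def to_compact(bits: str) -> str:
    """Write a bit string (spaces allowed) in base 16, 4 bits per hex digit."""
    b = bits.replace(" ", "")
    return format(int(b, 2), f"0{len(b) // 4}X")

def from_compact(hex_text: str) -> str:
    """Expand a compact hex form back into its bit string."""
    return format(int(hex_text, 16), f"0{4 * len(hex_text)}b")

print(to_compact("0 10000000 " + "001".ljust(23, "0")))  # 40100000
print(to_compact("1 01111111 " + "11".ljust(23, "0")))   # BFE00000
print(from_compact("C3C41000"))  # 11000011110001000001000000000000
```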

IEEE-P754 Exercises
Solutions
-1324.25
$10100101100.01_2 = 1.010010110001_2 \times 2^{10}$
SP exponent: $10 + 127 = 137 = 10001001_2$
DP exponent: $10 + 1023 = 1033 = 10000001001_2$
then:
SP: 1 10001001 01001011000100…
in compact form: C4A58800
DP: 1 10000001001 01001011000100…
in compact form: C094B10000000000

IEEE-P754 Exercises
Solutions
0.02324
$2^{-n} \le 1/1000 \Rightarrow n = 10$ (fractional bits)
0.02324 truncated to 10 fractional bits: $0.0000010111_2 = 1.0111_2 \times 2^{-6}$
SP exponent: $-6 + 127 = 121 = 01111001_2$
DP exponent: $-6 + 1023 = 1017 = 01111111001_2$
then:
SP: 0 01111001 011100…
in compact form: 3CB80000
DP: 0 01111111001 011100…
in compact form: 3F97000000000000
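
The solutions can be verified against the hardware encodings: `">f"` and `">d"` in `struct` produce big-endian SP and DP bytes. For 0.02324 the sketch first truncates to 10 fractional bits, as the solution does; the helper names are hypothetical.

```python
import math
import struct

def hex_sp(x: float) -> str:
    return struct.pack(">f", x).hex().upper()

def hex_dp(x: float) -> str:
    return struct.pack(">d", x).hex().upper()

print(hex_sp(-1324.25), hex_dp(-1324.25))  # C4A58800 C094B10000000000

# 0.02324 with an absolute precision of 1/1000
n = math.ceil(math.log2(1000))                   # smallest n with 2**-n <= 1/1000
approx = math.floor(0.02324 * 2 ** n) / 2 ** n   # truncated to n fractional bits
print(n, approx, hex_sp(approx), hex_dp(approx))  # 10 0.0224609375 3CB80000 3F97000000000000
```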

IEEE-P754 Exercises
Solutions
0 10000000 0010000… → $+1.001_2 \times 2^{128-127} = 10.01_2 = +2.25$
1 01111111 1100000… → $-1.11_2 \times 2^{127-127} = -1.75$
EB141000 = 1 11010110 00101000001000000000000
$-1.00101000001_2 \times 2^{214-127} = -1.15673828125 \times 2^{87}$
rough estimate: $\approx -1 \times 2^{80} \times 2^{7} = -1024^{8} \times 128 \approx -10^{24} \times 10^{2} = -10^{26}$
the non-approximated value is: $-1.78996579166691032179933184 \times 10^{26}$

Floating-point Addition
To add two FP values, these must have the same exponent before adding their mantissas: the smaller value is converted to have the same exponent as the greater one (it is de-normalized).
As the exponent is increased (e.g. by 3), the mantissa must decrease (right shift by 3 bits) so as not to change the overall value:
$1.01000_2 \times 2^{16} + 1.101000_2 \times 2^{13}$
$= 1.01000_2 \times 2^{16} + 0.001101_2 \times 2^{16}$
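
Decoding the three compact values with `struct` confirms the solutions, including the exact value of EB141000, whose mantissa $1.00101000001_2 = 1 + 321/2048$ gives $2^{87} + 321 \times 2^{76}$ in magnitude. `sp_from_hex` is a hypothetical helper.

```python
import struct

def sp_from_hex(h: str) -> float:
    """Interpret 8 hex digits as an SP value."""
    (x,) = struct.unpack(">f", bytes.fromhex(h))
    return x

print(sp_from_hex("40100000"))  # 2.25
print(sp_from_hex("BFE00000"))  # -1.75
print(sp_from_hex("EB141000") == -(2 ** 87 + 321 * 2 ** 76))  # True
```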

Underflow
If the conversion of the smaller value shifts away all of the mantissa bits (including the hidden bit), the value is approximated to 0, so the operation result is equal to the greater value while the smaller one is just ignored.
There is an underflow condition when, adding 2 values, the result is equal to the greater of them.

Underflow
Example in SP:
$1.101_2 \times 2^{43} + 1.01_2 \times 2^{18}$
$1.01_2 \times 2^{18}$ must be converted to the form $\text{xxx} \times 2^{43}$; this causes a right shift of 25 bits on the mantissa, shifting away all the 24 mantissa bits and resulting in 0.
When adding up many small values, a partial sum may become so big as to cause underflow for each of the subsequent values (only the first part of the values is added up).
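
The alignment, addition and renormalization steps can be modelled on 24-bit integer significands. This is a simplified sketch of the slide procedure for positive values only: shifted-out bits are simply lost (real FPUs keep guard bits and round), and the names `sig` and `fp_add` are hypothetical.

```python
P = 24  # SP significand bits, hidden bit included

def sig(binary: str) -> int:
    """'1.101' -> P-bit integer significand 1101000...0 (value read as 1.xxx)."""
    int_part, frac_part = binary.split(".")
    return int((int_part + frac_part).ljust(P, "0"), 2)

def fp_add(m1: int, e1: int, m2: int, e2: int) -> tuple[int, int, bool]:
    """Add two positive values m * 2**e. Returns (significand, exponent, underflow)."""
    if e1 < e2:                      # make operand 1 the greater one
        m1, e1, m2, e2 = m2, e2, m1, e1
    aligned = m2 >> (e1 - e2)        # de-normalize the smaller value (bits are lost)
    total, e = m1 + aligned, e1
    if total >> P:                   # 1x.xxx: renormalize with one more right shift
        total >>= 1
        e += 1
    underflow = aligned == 0 and m2 != 0  # the result equals the greater operand
    return total, e, underflow

print(fp_add(sig("1.01"), 16, sig("1.101"), 13) == (sig("1.011101"), 16, False))
print(fp_add(sig("1.101"), 43, sig("1.01"), 18) == (sig("1.101"), 43, True))
```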

IEEE-P754 Exercises
Calculate the following operations (IEEE-P754) and express the result in the same compact form; identify any Overflow/Underflow:
2B1A5F20 + 4F1A3BB0
C4A58000 + C2B80000
63AB102F - 709B1BC2
7F600000 + 7F100000

IEEE-P754 Exercises
Solution N.1:
2B1A5F20 → 0 01010110 00110100101111100100000, E = 01010110 = 86
4F1A3BB0 → 0 10011110 00110100011101110110000, E = 10011110 = 158
Difference of exponents = 72
72 > 24 → UNDERFLOW
Result: 4F1A3BB0

IEEE-P754 Exercises
Solution N.2:
C4A58000 → 1 10001001 01001011000000000000000
E = 10001001 = 137 (non-biased: 10)
M = 1.01001011
C2B80000 → 1 10000101 01110000000000000000000
E = 10000101 = 133 (non-biased: 6)
M = 1.0111
Difference of exponents: 137 - 133 = 4

IEEE-P754 Exercises
Solution N.2 (continuation):
De-normalized mantissa of the 2nd value, to have exponent = 10 (4 right shifts):
0.00010111
Addition (both values are negative, so the mantissas are added and the sign stays 1):
$1.01001011_2 \times 2^{10} + 0.00010111_2 \times 2^{10} = 1.01100010_2 \times 2^{10}$
Result:
1 10001001 01100010000000000000000
C4B18000

IEEE-P754 Exercises
Solution N.3:
63AB102F → 0 11000111 01010110001000000101111
E = 199
709B1BC2 → 0 11100001 00110110001101111000010
E = 225
Difference of exponents: 225 - 199 = 26
26 > 24 → UNDERFLOW
Result: the second operand with its sign inverted, F09B1BC2 (SIGN CHANGED!)

IEEE-P754 Exercises
Solution N.4:
7F600000 → 0 11111110 11000000000000000000000
E = 254 (non-biased = 127)
7F100000 → 0 11111110 00100000000000000000000
E = 254 (non-biased = 127)
Difference of exponents: 0

IEEE-P754 Exercises
Solution N.4 (continuation):
$1.110_2 \times 2^{127} + 1.001_2 \times 2^{127} = 10.111_2 \times 2^{127}$
Renormalization: $1.0111_2 \times 2^{128}$
Max exponent is 127 → OVERFLOW
Result: (+Infinity)
0 11111111 00000000000000000000000
7F800000

IEEE-P754 Exercises
Calculate in the IEEE-P754 SP format the following operation with DECIMAL numbers, and identify any Overflow/Underflow:
$920000000_{10} - 920000001_{10}$

IEEE-P754 Puzzles
Solution:
The values differ in the LSB.
The two numbers have 9 decimal digits, corresponding to about $9 \times 3.3 \approx 30$ bits (both lie between $2^{29}$ and $2^{30}$).
After normalization ($1.xxx \times 2^{29}$), the relative weight of the LSB is $2^{-29}$.
Having only 24 bits (relative LSB weight $2^{-23}$), power $2^{-29}$ is discarded.
The two values are considered equal.
Result is 0.

IEEE-P754 SP Ranges
Maximum normalized positive number is $1.111\ldots111_2 \times 2^{127}$ with 23 fractional bits.
If there were all the bits, the value would be $1.111\ldots111_2 \times 2^{127}$ with 127 fractional bits:
$1.111\ldots111_2 \times 2^{127} = 2^{128} - 1$
Having just 23 fractional bits, the value is approximated to $1.11\ldots1\,00\ldots0_2 \times 2^{127}$, with 23 fractional bits set to 1 and the rightmost $127 - 23 = 104$ bits set to 0.
104 bits set to 1 are the value $2^{104} - 1$.

IEEE-P754 SP Ranges
Maximum normalized positive value:
$1.11\ldots1\,00\ldots0_2 \times 2^{127} = (2^{128} - 1) - (2^{104} - 1) = 2^{128} - 2^{104}$
$\approx 3.4028234663852885981170418348452 \times 10^{+38}$
Minimum normalized positive number:
$1.000\ldots000_2 \times 2^{-126} \approx 1.1754943508222875079687365372222 \times 10^{-38}$

IEEE-P754 SP Ranges
Maximum denormalized positive number is $0.111\ldots111_2 \times 2^{-126}$ with 23 fractional bits
the rightmost bit power is: $-126 - 23 = -149$
$(2^{-126} - 1) - (2^{-149} - 1) = 2^{-126} - 2^{-149}$
Minimum denormalized positive number is $0.000\ldots001_2 \times 2^{-126}$ with 23 fractional bits
the rightmost bit power is: $-126 - 23 = -149$
$2^{-149}$
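
The four range limits correspond to fixed bit patterns, so they can be checked exactly with `struct` (Python compares floats and integers exactly). `sp_from_word` is a hypothetical helper.

```python
import struct

def sp_from_word(word: int) -> float:
    """Interpret a 32-bit pattern as an SP value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

max_norm = sp_from_word(0x7F7FFFFF)  # 0 11111110 111...1
min_norm = sp_from_word(0x00800000)  # 0 00000001 000...0
max_den = sp_from_word(0x007FFFFF)   # 0 00000000 111...1
min_den = sp_from_word(0x00000001)   # 0 00000000 000...1
print(max_norm == 2 ** 128 - 2 ** 104, min_norm == 2.0 ** -126)      # True True
print(max_den == 2.0 ** -126 - 2.0 ** -149, min_den == 2.0 ** -149)  # True True
```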

IEEE-P754 Puzzles
Determine the difference between value 44A58800 and the next one (44A58801).
Value in binary is:
0 10001001 01001011000100… = $1.01001011000100\ldots_2 \times 2^{10}$
The next one differs in just the LSB:
$1.01001011000100\ldots01_2 \times 2^{10}$
Difference is $1 \times$ LSB weight $= 2^{10-23} = 2^{-13}$

IEEE-P754 Puzzles
Determine the range of the consecutive integer values in SP.
Values are in the form 1.xxxx with 23 fractional bits (denormals are not integers).
24 bits (hidden bit included) result in $2^{24}$ combinations of bits (0 to $2^{24} - 1$); each corresponds to a value, and an appropriate exponent makes it an integer value.
$2^{24}$ is represented too.
Range: $-2^{24}$ to $+2^{24}$
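
Both puzzles can be checked numerically: the gap between the two adjacent patterns, and the fact that integers survive the SP conversion up to $2^{24}$ but $2^{24} + 1$ does not. The helper names are hypothetical, and the integer check is a spot check near the limit rather than an exhaustive one.

```python
import struct

def to_sp(x: float) -> float:
    """Round a Python float to the nearest SP value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def sp_from_hex(h: str) -> float:
    return struct.unpack(">f", bytes.fromhex(h))[0]

print(sp_from_hex("44A58801") - sp_from_hex("44A58800") == 2.0 ** -13)  # True

limit = 2 ** 24
print(all(to_sp(float(k)) == k for k in range(limit - 1000, limit + 1)))  # True
print(to_sp(float(limit + 1)) == limit + 1)  # False: 2**24 + 1 is not representable
```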

IEEE-P754 Puzzles
Determine the (absolute) representation error for value $N = 6 \times 10^{18}$ in IEEE-P754 SP.
$N = 6 \times 10^{18} \approx 6 \times 2^{60}$, which requires 63 bits
$N = 1.xxx \times 2^{62}$
In SP there are only 23 bits for the mantissa
The weight of the LSB is $2^{62-23} = 2^{39}$
The representation error is (at most) $2^{39}$
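
A numerical check of this puzzle. The value $6 \times 10^{18}$ is exactly representable as a Python (DP) float, so the SP conversion error can be measured exactly. With the default round-to-nearest, the actual error is also within half the LSB weight.

```python
import struct

def to_sp(x: float) -> float:
    """Round a Python float to the nearest SP value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

N = 6e18
stored = to_sp(N)
(word,) = struct.unpack(">I", struct.pack(">f", N))
exponent = ((word >> 23) & 0xFF) - 127  # unbiased exponent: 62
lsb = 2.0 ** (exponent - 23)            # absolute weight of the LSB: 2**39
print(exponent, lsb == 2.0 ** 39, abs(stored - N) <= lsb)  # 62 True True
```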
