The Conversion Procedure (Decimal To Floating Point)

The rules for converting a decimal number into floating point are as follows:
A. Convert the absolute value of the number to binary, possibly with a fractional part after
the binary point. Convert the integral and fractional parts separately. The integral part is
converted with the techniques examined previously. The fractional part is converted by
multiplication, which is essentially the inverse of the division method: repeatedly multiply
the fraction by 2, and harvest each bit (0 or 1) as it appears to the left of the point.
B. Append × 2^0 to the end of the binary number (which does not change its value).
C. Normalize the number. Move the binary point so that it is one bit from the left. Adjust the
exponent of two so that the value does not change. For example:
111001.1 × 2^0
1.110011 × 2^5
D. Place the mantissa into the mantissa field of the number. Omit the leading one, and fill
with zeros on the right.
E. Add the bias to the exponent of two, and place it in the exponent field. The bias is
2^(k−1) − 1, where k is the number of bits in the exponent field. For the eight-bit format,
k = 3, so the bias is 2^(3−1) − 1 = 3. For IEEE 32-bit, k = 8, so the bias is 2^(8−1) − 1 = 127.
F. Set the sign bit, 1 for negative, 0 for positive, according to the sign of the original
number.
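Steps A through F can be sketched in Python as follows. This is a minimal illustration, not a full implementation: the function name `encode` and its parameters are hypothetical, extra mantissa bits are simply truncated (the rules above do not fix a rounding rule), and zero and the special values reserved by IEEE formats are not handled.

```python
from fractions import Fraction

def encode(value, exp_bits, man_bits):
    """Encode a nonzero number as sign + exponent + mantissa bits (steps A-F)."""
    number = Fraction(value)
    x = abs(number)                      # step A works on the absolute value
    if x == 0:
        raise ValueError("zero has no leading one to normalize")
    # Steps B and C: start at x * 2^0 and shift until 1 <= x < 2.
    e = 0
    while x >= 2:
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    # Step D: omit the leading one, then harvest man_bits bits by doubling.
    frac = x - 1
    mantissa = ""
    for _ in range(man_bits):
        frac *= 2
        bit = int(frac)                  # 1 if a one appeared left of the point
        mantissa += str(bit)
        frac -= bit                      # leftover bits are truncated (assumption)
    # Step E: add the bias 2^(k-1) - 1 to the exponent.
    bias = 2 ** (exp_bits - 1) - 1
    biased = e + bias
    if not 0 <= biased < 2 ** exp_bits:
        raise ValueError("exponent does not fit the exponent field")
    exponent = format(biased, "0{}b".format(exp_bits))
    # Step F: sign bit, 1 for negative and 0 for positive.
    sign = "1" if number < 0 else "0"
    return sign + exponent + mantissa

print(encode(2.625, 3, 4))   # 01000101
```

Calling `encode(x, 3, 4)` gives the 8-bit format used in this document, and `encode(x, 8, 23)` gives the IEEE 32-bit layout for ordinary normalized values.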
Using The Conversion Procedure
• Convert 2.625 to our 8-bit floating point format.
A. The integral part is easy: 2_10 = 10_2. For the fractional part:
0.625 × 2 = 1.25   1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5     0   Generate 0 and continue.
0.5 × 2 = 1.0      1   Generate 1 and nothing remains.
B. So 0.625_10 = 0.101_2, and 2.625_10 = 10.101_2.

C. Add an exponent part: 10.101_2 = 10.101_2 × 2^0.
D. Normalize: 10.101_2 × 2^0 = 1.0101_2 × 2^1.
E. Mantissa: 0101
F. Exponent: 1 + 3 = 4 = 100_2.
G. Sign bit is 0.
The result is 0100 0101. Represented as hex, that is 45_16.
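The multiplication table in step A can be reproduced with a short sketch. The helper name `doubling_steps` is hypothetical, and exact fractions are used so that the rows match the hand computation.

```python
from fractions import Fraction

def doubling_steps(fraction, max_steps=32):
    """Return (before, doubled, bit) rows for a fraction with 0 <= fraction < 1."""
    remaining = Fraction(fraction)
    rows = []
    # Stop when nothing remains, or after max_steps rows for repeating fractions.
    while remaining != 0 and len(rows) < max_steps:
        doubled = remaining * 2
        bit = 1 if doubled >= 1 else 0   # the digit that appears left of the point
        rows.append((remaining, doubled, bit))
        remaining = doubled - bit        # continue with the rest
    return rows

rows = doubling_steps("0.625")
for before, doubled, bit in rows:
    print(float(before), "* 2 =", float(doubled), " generate", bit)
print("0." + "".join(str(bit) for _, _, bit in rows))   # 0.101
```

The `max_steps` limit is there because some fractions never reach zero.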
• Convert -4.75 to our 8-bit floating point format.
A. The integral part is 4_10 = 100_2. The fractional part:
0.75 × 2 = 1.5   1   Generate 1 and continue with the rest.
0.5 × 2 = 1.0    1   Generate 1 and nothing remains.
B. So 4.75_10 = 100.11_2.
C. Normalize: 100.11_2 = 1.0011_2 × 2^2.
D. Mantissa is 0011, exponent is 2 + 3 = 5 = 101_2, sign bit is 1.
So -4.75 is 11010011 = d3_16.
b. Convert 0.40625 to our 8-bit floating point format.

A. Converting:
0.40625 × 2 = 0.8125   0   Generate 0 and continue.
0.8125 × 2 = 1.625     1   Generate 1 and continue with the rest.
0.625 × 2 = 1.25       1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5         0   Generate 0 and continue.
0.5 × 2 = 1.0          1   Generate 1 and nothing remains.
B. So 0.40625_10 = 0.01101_2.
C. Normalize: 0.01101_2 = 1.101_2 × 2^-2.
D. Mantissa is 1010, exponent is -2 + 3 = 1 = 001_2, sign bit is 0.
So 0.40625 is 00011010 = 1a_16.
c. Convert -12.0 to our 8-bit floating point format.

A. 12_10 = 1100_2.
B. Normalize: 1100.0_2 = 1.1_2 × 2^3.
C. Mantissa is 1000, exponent is 3 + 3 = 6 = 110_2, sign bit is 1.
So -12.0 is 11101000 = e8_16.
d. Convert decimal 1.7 to our 8-bit floating point format.

A. The integral part is easy: 1_10 = 1_2. For the fractional part:
0.7 × 2 = 1.4   1   Generate 1 and continue with the rest.
…
B. The reason why the process seems to continue endlessly is that it does. The
number 7/10, which makes a perfectly reasonable decimal fraction, is a repeating
fraction in binary, just as the fraction 1/3 is a repeating fraction in decimal. (1/3
repeats in binary as well.) We cannot represent 7/10 exactly as a floating point
number. The closest we can come in four bits is .1011. Since we already have a
leading 1, the best eight-bit number we can make is 1.1011.
C. Already normalized: 1.1011_2 = 1.1011_2 × 2^0.
D. Mantissa is 1011, exponent is 0 + 3 = 3 = 011_2, sign bit is 0.
The result is 00111011 = 3b_16. This is not exact, of course. If you convert it back to
decimal, you get 1.6875.
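To see why the doubling never terminates for 7/10, one can track the remainders and stop as soon as one comes back. The sketch below does that. The name `repeating_bits` is hypothetical, and the approach relies only on the fact that a rational fraction has finitely many possible remainders.

```python
from fractions import Fraction

def repeating_bits(fraction):
    """Split the binary expansion of 0 <= fraction < 1 into (prefix, cycle)."""
    remaining = Fraction(fraction)
    seen = {}              # remainder -> index of the bit it produced
    bits = []
    while remaining != 0 and remaining not in seen:
        seen[remaining] = len(bits)
        remaining *= 2
        bit = int(remaining)
        bits.append(str(bit))
        remaining -= bit
    if remaining == 0:     # the expansion terminates, so there is no cycle
        return "".join(bits), ""
    start = seen[remaining]
    return "".join(bits[:start]), "".join(bits[start:])

print(repeating_bits("0.7"))     # ('1', '0110')
print(repeating_bits("0.625"))   # ('101', '')
```

The first result reads as 0.1 followed by 0110 repeated forever, that is 0.1011 0011 0011…, whose first four bits are the .1011 kept in the mantissa.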
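e. Convert -1313.3125 to IEEE 32-bit floating point format.

A. The integral part is 1313_10 = 10100100001_2. For the fractional part:
0.3125 × 2 = 0.625   0   Generate 0 and continue.
0.625 × 2 = 1.25     1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5       0   Generate 0 and continue.
0.5 × 2 = 1.0        1   Generate 1 and nothing remains.
B. So 1313.3125_10 = 10100100001.0101_2.
C. Normalize: 10100100001.0101_2 = 1.01001000010101_2 × 2^10.
D. Mantissa is 01001000010101000000000, exponent is 10 + 127 = 137 =
10001001_2, sign bit is 1.
So -1313.3125 is 11000100101001000010101000000000 = c4a42a00_16.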
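f. Convert 0.1015625 to IEEE 32-bit floating point format.

A. Converting:
0.1015625 × 2 = 0.203125   0   Generate 0 and continue.
0.203125 × 2 = 0.40625     0   Generate 0 and continue.
0.40625 × 2 = 0.8125       0   Generate 0 and continue.
0.8125 × 2 = 1.625         1   Generate 1 and continue with the rest.
0.625 × 2 = 1.25           1   Generate 1 and continue with the rest.
0.25 × 2 = 0.5             0   Generate 0 and continue.
0.5 × 2 = 1.0              1   Generate 1 and nothing remains.
B. So 0.1015625_10 = 0.0001101_2.
C. Normalize: 0.0001101_2 = 1.101_2 × 2^-4.
D. Mantissa is 10100000000000000000000, exponent is -4 + 127 = 123 =
01111011_2, sign bit is 0.
So 0.1015625 is 00111101110100000000000000000000 = 3dd00000_16.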
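g. Convert 39887.5625 to IEEE 32-bit floating point format.

A. The integral part is 39887_10 = 1001101111001111_2. For the fractional part:
0.5625 × 2 = 1.125   1   Generate 1 and continue with the rest.
0.125 × 2 = 0.25     0   Generate 0 and continue.
0.25 × 2 = 0.5       0   Generate 0 and continue.
0.5 × 2 = 1.0        1   Generate 1 and nothing remains.
B. So 39887.5625_10 = 1001101111001111.1001_2.
C. Normalize: 1001101111001111.1001_2 = 1.0011011110011111001_2 × 2^15.
D. Mantissa is 00110111100111110010000, exponent is 15 + 127 = 142 =
10001110_2, sign bit is 0.
So 39887.5625 is 01000111000110111100111110010000 = 471bcf90_16.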
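The IEEE 32-bit results can be cross-checked against the machine's own encoding. The sketch below uses Python's standard `struct` module; the helper name `float32_hex` is hypothetical. All three example values fit in 32 bits exactly, so no rounding is involved.

```python
import struct

def float32_hex(value):
    """Return the IEEE 754 single-precision encoding of value as 8 hex digits."""
    # ">f" packs a 4-byte float in big-endian order, so the sign bit comes first.
    return struct.pack(">f", value).hex()

for value in (-1313.3125, 0.1015625, 39887.5625):
    print(value, float32_hex(value))
```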
The Conversion Procedure (Floating Point To Decimal)

The rules for converting a floating point number into decimal simply reverse the
decimal to floating point conversion:
A. If the original number is in hex, convert it to binary.
B. Separate into the sign, exponent, and mantissa fields.
C. Extract the mantissa from the mantissa field, and restore the leading one. You may also
omit the trailing zeros.
D. Extract the exponent from the exponent field, and subtract the bias to recover the actual
exponent of two. As before, the bias is 2^(k−1) − 1, where k is the number of bits in the
exponent field, giving 3 for the 8-bit format and 127 for the 32-bit.
E. De-normalize the number: move the binary point so the exponent is 0, and the value of
the number remains unchanged.
F. Convert the binary value to decimal. This is done just as with binary integers, but the
place values right of the binary point are fractions.
G. Set the sign of the decimal number according to the sign bit of the original floating point
number: make it negative for 1; leave positive for 0.
If the binary exponent is very large or small, you can convert the mantissa directly to decimal
without de-normalizing. Then use a calculator to raise two to the exponent, and perform the
multiplication. This will give an approximate answer, but is sufficient in most cases.
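The reverse procedure can be sketched the same way. The names `hex_to_bits` and `decode` are hypothetical, and the sketch assumes an ordinary normalized number (it always restores the leading one, so zero and the special values of IEEE formats are not handled).

```python
from fractions import Fraction

def hex_to_bits(hex_digits):
    """Step A: expand each hex digit into four bits."""
    return format(int(hex_digits, 16), "0{}b".format(4 * len(hex_digits)))

def decode(bits, exp_bits):
    """Steps B-G: turn a sign/exponent/mantissa bit string into an exact value."""
    # Step B: separate the fields.
    sign = bits[0]
    exponent_field = bits[1:1 + exp_bits]
    mantissa_field = bits[1 + exp_bits:]
    # Step C: restore the leading one in front of the stored fraction bits.
    mantissa = 1 + Fraction(int(mantissa_field, 2), 2 ** len(mantissa_field))
    # Step D: subtract the bias 2^(k-1) - 1.
    bias = 2 ** (exp_bits - 1) - 1
    exponent = int(exponent_field, 2) - bias
    # Steps E and F: scaling by 2^exponent moves the binary point.
    value = mantissa * Fraction(2) ** exponent
    # Step G: apply the sign.
    return -value if sign == "1" else value

print(float(decode(hex_to_bits("e7"), 3)))   # -11.5
```

Using exact fractions avoids any extra rounding in the check itself. Pass `exp_bits=3` for the 8-bit format and `exp_bits=8` for IEEE 32-bit.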
Examples Using The Conversion Procedure

• Convert the 8-bit floating point number e7 (in hex) to decimal.
A. Convert: e7_16 = 11100111_2.
B. Separate: 1 110 0111
C. Mantissa: 1.0111
D. Exponent: 110_2 = 6_10; 6 − 3 = 3.
E. De-normalize: 1.0111_2 × 2^3 = 1011.1_2.
F. Convert:
Exponents      2^3   2^2   2^1   2^0   2^-1
Place Values   8     4     2     1     0.5
Bits           1     0     1     1     .1
Value          8 + 2 + 1 + 0.5 = 11.5
G. Sign: negative.
Result: e7 is -11.5.
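The place-value table in step F amounts to a weighted sum, which can be sketched as below. The function name `binary_to_decimal` is hypothetical, and it expects a plain binary numeral with an optional point.

```python
def binary_to_decimal(text):
    """Sum the place values of a binary numeral such as '1011.1'."""
    whole, _, frac = text.partition(".")
    total = 0.0
    # Left of the point the place values are 1, 2, 4, 8, ...
    for i, bit in enumerate(reversed(whole)):
        total += int(bit) * 2 ** i
    # Right of the point they are the fractions 0.5, 0.25, 0.125, ...
    for i, bit in enumerate(frac, start=1):
        total += int(bit) * 2 ** -i
    return total

print(binary_to_decimal("1011.1"))   # 11.5
```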
• Convert the 8-bit floating point number 26 (in hex) to decimal.
A. Convert and separate: 26_16 = 0 010 0110_2
B. Exponent: 010_2 = 2_10; 2 − 3 = -1.
C. Denormalize: 1.011_2 × 2^-1 = 0.1011_2.
D. Convert:
Exponents      2^0   2^-1   2^-2   2^-3    2^-4
Place Values   1     0.5    0.25   0.125   0.0625
Bits           0     .1     0      1       1
Value          0.5 + 0.125 + 0.0625 = 0.6875
E. Sign: positive
Result: 26 is 0.6875.
b. Convert the 8-bit floating point number d3 (in hex) to decimal.

A. Convert and separate: d3_16 = 1 101 0011_2
B. Exponent: 101_2 = 5_10; 5 − 3 = 2.
C. Denormalize: 1.0011_2 × 2^2 = 100.11_2.
D. Convert:
Exponents      2^2   2^1   2^0   2^-1   2^-2
Place Values   4     2     1     0.5    0.25
Bits           1     0     0     .1     1
Value          4 + 0.5 + 0.25 = 4.75
E. Sign: negative
Result: d3 is -4.75.
c. Convert the 32-bit floating point number 44361000 (in hex) to decimal.
A. Convert and separate: 44361000_16 = 0 10001000 01101100001000000000000_2
B. Exponent: 10001000_2 = 136_10; 136 − 127 = 9.
C. Denormalize: 1.01101100001_2 × 2^9 = 1011011000.01_2.
D. Convert:
Exponents      2^9   2^8   2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0   2^-1   2^-2
Place Values   512   256   128   64    32    16    8     4     2     1     0.5    0.25
Bits           1     0     1     1     0     1     1     0     0     0     .0     1
Value          512 + 128 + 64 + 16 + 8 + 0.25 = 728.25
E. Sign: positive
Result: 44361000 is 728.25.

d. Convert the 32-bit floating point number be580000 (in hex) to decimal.
A. Convert and separate: be580000_16 = 1 01111100 10110000000000000000000_2
B. Exponent: 01111100_2 = 124_10; 124 − 127 = -3.
C. Denormalize: 1.1011_2 × 2^-3 = 0.0011011_2.
D. Convert:
Exponents      2^0   2^-1   2^-2   2^-3    2^-4     2^-5      2^-6       2^-7
Place Values   1     0.5    0.25   0.125   0.0625   0.03125   0.015625   0.0078125
Bits           0     .0     0      1       1        0         1          1
Value          0.125 + 0.0625 + 0.015625 + 0.0078125 = 0.2109375
E. Sign: negative
Result: be580000 is -0.2109375.
e. Convert the 32-bit floating point number a3358000 (in hex) to decimal.
A. Convert and separate: a3358000_16 = 1 01000110 01101011000000000000000_2
B. Exponent: 01000110_2 = 70_10; 70 − 127 = -57.
C. Since the exponent is far from zero, convert the original (normalized) mantissa:
Exponents      2^0   2^-1   2^-2   2^-3    2^-4     2^-5      2^-6       2^-7        2^-8
Place Values   1     0.5    0.25   0.125   0.0625   0.03125   0.015625   0.0078125   0.00390625
Bits           1     .0     1      1       0        1         0          1           1
Value          1 + 0.25 + 0.125 + 0.03125 + 0.0078125 + 0.00390625 = 1.41796875
D. Use a calculator to find 1.41796875 × 2^-57. You should get something like
9.83913471531 × 10^-18.
E. Sign: negative
Result: a3358000 is about -9.83913471531 × 10^-18.
f. Convert the 32-bit floating point number 76650000 (in hex) to decimal.
A. Convert and separate: 76650000_16 = 0 11101100 11001010000000000000000_2
B. Exponent: 11101100_2 = 236_10; 236 − 127 = 109.
C. Since the exponent is far from zero, convert the original (normalized) mantissa:
Exponents      2^0   2^-1   2^-2   2^-3    2^-4     2^-5      2^-6       2^-7
Place Values   1     0.5    0.25   0.125   0.0625   0.03125   0.015625   0.0078125
Bits           1     .1     1      0       0        1         0          1
Value          1 + 0.5 + 0.25 + 0.03125 + 0.0078125 = 1.7890625
D. Use a calculator to find 1.7890625 × 2^109. You should get something like
1.16116794981 × 10^33.
E. Sign: positive
Result: 76650000 is about 1.16116794981 × 10^33.
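For the two examples with extreme exponents, the calculator step can be replaced by a short check. This sketch uses the standard `math.ldexp` (which computes m × 2^e) and `struct.unpack`; the helper name `float32_from_hex` is hypothetical.

```python
import math
import struct

def float32_from_hex(hex_digits):
    """Interpret 8 hex digits as a big-endian IEEE 754 single-precision float."""
    return struct.unpack(">f", bytes.fromhex(hex_digits))[0]

# Mantissa times two to the exponent, as in step D of the last two examples.
print(math.ldexp(1.41796875, -57))   # magnitude of a3358000
print(math.ldexp(1.7890625, 109))    # value of 76650000

# The same numbers recovered directly from the bit patterns.
print(float32_from_hex("a3358000"))
print(float32_from_hex("76650000"))
```

The printed digits may extend a little beyond the calculator values quoted above, since Python displays more significant digits.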
