Sie sind auf Seite 1von 12

Lecture 10

Real Number Representation and Floating Point Arithmetic


4.1

Number Representation
Integer: The bit pattern represents the numeric value exactly. Real: The bit pattern is an approximation of the numeric value.
m bits integer part n bits fractional part

Fixed point: Floating point:

.
m+n bits to use wherever they are needed

.
4.2

The point moves left or right according to an exponent value.

Real Numbers in Binary


Fractions with powers of two in the denominator are easy:
1 = 1 2 1 = 0.1 2 1 = 1 2 2 = 0.01 4 1 = 1 2 3 = 0.001 8 1 = 1 2 4 = 0.0001 16 3 1 1 = + = 0.11 4 2 4

In fact, to represent a real number with a fractional part in binary, we need to approximate the fractional part by a sum of fractions with powers of two in the denominator.

4.3

Reals: From Decimal to Binary


Example : Convert 0.0937510 to binary.
Integer conversion called for successive divisions by 2; reals with a fractional part will call for a method based on multiplications by 2.

.09375 2 = 0.1875 .1875 2 = 0.375 .375 2 = 0.75

most significant bit (right after the binary point)

0.0937510 = 0.000112

.75 2 = 1.5
.5 2 = 1.0
least significant bit
4.4

First Problem
A number accurately represented in base 10 by a fixed number of digits may lead to a binary representation that requires an infinite number of digits (non-terminating fraction).

.1 2 .2 2 .4 2 .8 2 .6 2 .2 2 .4 2 .8 2

= = = = = = = =

0.2 0.4 0.8 1.6 1.2 0.4 0.8 1.6

0.110 = 0.00011...2
Since the space to store the number will always be limited, what we end up with is an approximation.
4.5

Second Problem
A finite number of digits leads to finite precision. The error in the representation is given by the distance between two representable numbers. Moreover, there is an infinite number of real values between, say, 0 and 1, while there is a finite number of binary values representable with n bits. gap

1 2

1 1 4 8

1 16

1 16

1 8

1 4

1 2

1
4.6

Floating Point Representation


1 bit X= S

X = (1) S M 2 E
E (exponent) M (mantissa)

The sign of the exponent goes into its representation. The exponent defines the range of the number.

The mantissa defines the precision of the number. The exponent can be adjusted so that all the information contained in X ends up in the mantissa bits.

Normalization: 0.0000010111 = 1.0111 2 6


Why use a bit to represent a value which is always known?
4.7

The IEEE 754 Standard (1985)


single = S
1 bit

E (exponent)
8 bits

M (mantissa)
23 bits

double = S E (exponent)
1 bit 11 bits

M (mantissa)
52 bits

The Exponent is represented in Excess B=127 for single and B=1023 for double precision.

X = (1)S (1+ M)2E


4.8

Special Cases
Zero: 0 00000000 1 00000000 Infinity: 0 11111111 1 11111111 NaN: 0 11111111 1 11111111 00000000000000000000000 00000000000000000000000 00000000000000000000000 00000000000000000000000 non-zero mantissa non-zero mantissa
4.9

Div by 0, sqrt(-1)

Start

1. Compare the exponents of the two numbers. Shift the smaller number to the right until its exponent would match the larger exponent

Floating Point Addition

2. Add the significands

3. Normalize the sum, either shifting right and incrementing the exponent or shifting left and decrementing the exponent

Overflow or underflow? No

Yes

Exception

4. Round the significand to the appropriate number of bits

No

Still normalized? Yes Done

4.10

10

Sign

Exponent

Significand

Sign

Exponent

Significand

Small ALU

Compare exponents

Exponent difference 0 1 0 1 0 1 Shift smaller number right

Control

Shift right

Big ALU

Add

1 Normalize

Increment or decrement

Shift left or right

Rounding hardware

Round

Sign

Exponent

Significand

4.11

11

Start

1. Add the biased exponents of the two numbers, subtracting the bias from the sum to get the new biased exponent

Floating Point Multiplication

2. Multiply the significands

3. Normalize the product if necessary, shifting it right and incrementing the exponent

Overflow or underflow? No

Yes

Exception

4. Round the significand to the appropriate number of bits

No

Still normalized? Yes

5. Set the sign of the product to positive if the signs of the original operands are the same; if they differ make the sign negative

4.12
Done

12

Das könnte Ihnen auch gefallen