
Computer Arithmetic

Adder Performance, Multiply, Shift & Floating Point
(App. C5, C6, 4th ed.)
[Figure: ALU block with 32-bit inputs A and B, 32-bit output S, a 4-bit input m, and signals c and ovf]

3/20/2018 arithmetic.1

1-bit adder Review (Appendix B.5, B.6)
[Figure: 1-bit full adder with inputs a, b, CarryIn (Cin) and outputs Sum, CarryOut (Co)]
• 1 unit of delay from Cin to Sum
• 2 units of delay from A/B to Sum
• CarryOut: 2 gate delays

(x! is the complement of x; c is the carry in)
Sum = a!bc! + ab!c! + a!b!c + abc
    = a XOR b XOR c
CarryOut = a!bc + ab!c + abc! + abc

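The two sum-of-products equations on this slide can be checked exhaustively. The following is a minimal sketch in Python; the function name `full_adder` is a hypothetical helper, not something from the slides.

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry_out) using the sum-of-products forms from the slide."""
    na, nb, nc = 1 - a, 1 - b, 1 - c  # complements a!, b!, c!
    s = (na & b & nc) | (a & nb & nc) | (na & nb & c) | (a & b & c)
    cout = (na & b & c) | (a & nb & c) | (a & b & nc) | (a & b & c)
    return s, cout

# Exhaustive check against the XOR form and ordinary integer addition.
for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    assert s == a ^ b ^ c
    assert 2 * cout + s == a + b + c
```

All eight input combinations agree with a XOR b XOR c, which is why the sum output is usually built from XOR gates.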
1-bit ALU: AND, OR, a+b, a+b!
[Figure a: 1-bit ALU with inputs a, b, Less, CarryIn and controls Binvert, Operation; a 4-input mux (inputs 0 to 3, with Less on input 3) drives Result; CarryOut leaves the adder]
[Figure b: most-significant-bit ALU, which adds the Set output and overflow detection]

ALU Delays
• Result = 1 gate delay
• From a to Result = 2
• From b to Result = 2 (ignore b invert)

32-bit ALU + zero detect + SLT support
[Figure: 32 1-bit ALUs (ALU0 to ALU31) chained CarryOut to Cin and sharing the Bnegate and Operation controls. The Less input is 0 on ALU1 to ALU31; ALU31 produces Set (for SLT) and Overflow; Result0 to Result31 feed the Zero detector.]

Overflow? A 4-bit example
Decimal  Binary    Decimal  2's Complement
   0      0000        0        0000
   1      0001       -1        1111
   2      0010       -2        1110
   3      0011       -3        1101
   4      0100       -4        1100
   5      0101       -5        1011
   6      0110       -6        1010
   7      0111       -7        1001
                     -8        1000
• Examples: 7 + 3 = 10 but ...
•           -4 - 5 = -9 but ...

carry:  0 1 1 1               carry:  1
          0 1 1 1    7                  1 1 0 0   –4
        + 0 0 1 1    3                + 1 0 1 1   –5
          1 0 1 0   –6                  0 1 1 1    7

Overflow Detection
• Overflow: result too large (or too small) to represent properly
– Example: –8 ≤ 4-bit binary number ≤ 7
• When adding operands with different signs, overflow cannot occur!
• Overflow occurs when adding:
– 2 positive numbers and sum is negative
– 2 negative numbers and the sum is positive
• On your own: Prove you can detect overflow by:
– Carry into MSB ≠ Carry out of MSB

carry:  0 1 1 1               carry:  1 0
          0 1 1 1    7                  1 1 0 0   –4
        + 0 0 1 1    3                + 1 0 1 1   –5
          1 0 1 0   –6                  0 1 1 1    7

Overflow Detection Logic
• Carry into MSB ≠ Carry out of MSB
– For an N-bit ALU: Overflow = CarryIn[N - 1] XOR CarryOut[N - 1]

X  Y  X XOR Y
0  0     0
0  1     1
1  0     1
1  1     0

[Figure: four 1-bit ALUs (A0/B0 to A3/B3, Result0 to Result3) chained CarryOut0 to CarryIn1, CarryOut1 to CarryIn2, and so on; Overflow is the XOR of CarryIn3 and CarryOut3]

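The rule Overflow = CarryIn[N - 1] XOR CarryOut[N - 1] can be tried on the 4-bit examples from the earlier slides. This is a minimal sketch in Python; `add_with_overflow` is a hypothetical name, and operands are passed as unsigned bit patterns.

```python
def add_with_overflow(a: int, b: int, n: int = 4) -> tuple[int, bool]:
    """Ripple-add two n-bit two's-complement patterns.

    Returns (n-bit result pattern, overflow flag).
    """
    carry = 0
    carry_into_msb = 0
    result = 0
    for i in range(n):
        if i == n - 1:
            carry_into_msb = carry            # CarryIn[N - 1]
        ai, bi = (a >> i) & 1, (b >> i) & 1
        result |= (ai ^ bi ^ carry) << i      # sum bit
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    # After the loop, carry is CarryOut[N - 1].
    return result, bool(carry_into_msb ^ carry)

print(add_with_overflow(0b0111, 0b0011))  # 7 + 3
print(add_with_overflow(0b1100, 0b1011))  # -4 + -5
```

Both slide examples set the overflow flag, while a mixed-sign sum such as 3 + (-1) does not.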
Ripple Adder Performance?
• Critical path of an n-bit adder is n * CP, where CP is the carry path through one bit
[Figure: 4-bit ripple-carry adder built from 1-bit ALUs (A0/B0 to A3/B3), with CarryOut0 feeding CarryIn1, CarryOut1 feeding CarryIn2, CarryOut2 feeding CarryIn3, ending in CarryOut3]
• Per bit: 1 unit of delay from Cin to sum, 2 units of delay from A/B to sum
• Very slow: must improve
• Assume t = carry delay / bit
– 32-bit ALU needs 32 * t units of delay
– 64-bit ALU needs 64 * t units of delay

Fast Add - Carry Select - review

• 4-bit Carry Select Adder
• Uses two 4-bit ripple adders
• One adder assumes Cin = 0
• The second adder assumes Cin = 1
• Cin selects Sum & Cout
• Delay = 4 FA delays + 1 mux delay
• Fast but expensive

16-bit Carry Select - review

• The first 4 bits use a plain ripple adder; carry select is not needed there
• Delay = 4 FA delays + 3 mux delays
• Fast & expensive

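A behavioral sketch of the 16-bit carry-select idea, in Python. Both function names are hypothetical, and `ripple_add` uses integer addition as a stand-in for a 4-bit ripple adder, so only the select structure is modeled, not gate delays.

```python
def ripple_add(a: int, b: int, cin: int, n: int = 4) -> tuple[int, int]:
    """Stand-in for an n-bit ripple adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << n) - 1), total >> n

def carry_select_add16(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """16-bit add from 4-bit blocks; returns (sum, carry_out)."""
    mask = 0xF
    # Lowest block: plain ripple, since its carry in is known at the start.
    result, carry = ripple_add(a & mask, b & mask, cin)
    for k in range(1, 4):
        ak, bk = (a >> 4 * k) & mask, (b >> 4 * k) & mask
        s0, c0 = ripple_add(ak, bk, 0)   # adder that assumes Cin = 0
        s1, c1 = ripple_add(ak, bk, 1)   # adder that assumes Cin = 1
        # The mux: the real carry from the block below picks one result.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << (4 * k)
    return result, carry
```

The three upper blocks compute both candidates at the same time as the lowest block, so only three mux delays are added after the first 4 FA delays.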
Fast Addition : Carry Lookahead
• Carry inputs can be precomputed by logic (+ is OR, · is AND)
p0 = a0 + b0,  g0 = a0 · b0                 (1 unit of delay for each p, g)
c1 = g0 + c0 · p0
   = a0 · b0 + c0 · (a0 + b0)
p1 = a1 + b1,  g1 = a1 · b1                 (1 unit of delay)
c2 = g1 + p1 · c1
   = a1 · b1 + c1 · (a1 + b1)
   = g1 + p1 · g0 + p1 · p0 · c0            (3 units of delay)
c3 = g2 + p2 · g1 + p2 · p1 · g0 + p2 · p1 · p0 · c0
                                            (3 units of delay)
c4 = g3 + p3 · g2 + p3 · p2 · g1 + p3 · p2 · p1 · g0 + p3 · p2 · p1 · p0 · c0
                                            (3 units of delay)
c4 = func(a3, b3, a2, b2, a1, b1, a0, b0, c0)

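The flattened carry equations can be checked against ordinary addition. A minimal sketch in Python follows; `cla_carries` and `cla_add4` are hypothetical names.

```python
def cla_carries(a: int, b: int, c0: int = 0) -> list[int]:
    """Return [c0, c1, c2, c3, c4] for 4-bit a, b from the lookahead equations."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]     # generate: a AND b
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]   # propagate: a OR b
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c0, c1, c2, c3, c4]

def cla_add4(a: int, b: int, c0: int = 0) -> tuple[int, int]:
    """4-bit add using the lookahead carries; returns (sum, c4)."""
    c = cla_carries(a, b, c0)
    s = 0
    for i in range(4):
        s |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i
    return s, c[4]
```

Every carry depends only on the p, g signals and c0, so no carry waits for the one before it.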
Fast Addition: Carry Look Ahead – 4 bits
C0 = Cin

A  B  C-out
0  0  0      "kill"
0  1  C-in   "propagate"
1  0  C-in   "propagate"
1  1  1      "generate"

g = a and b,  p = a or b                    (1 unit of delay)
c1 = g0 + c0 · p0
c2 = g1 + g0 · p1 + c0 · p0 · p1
c3 = g2 + g1 · p2 + g0 · p1 · p2 + c0 · p0 · p1 · p2
C4 = . . .
• 3 units of delay for c1, c2, c3, (c4)
• 4 units of delay for S1, S2, S3
G0 = g3 + p3 · g2 + p3 · p2 · g1 + p3 · p2 · p1 · g0     (3 units of delay for G0)
P0 = p3 · p2 · p1 · p0
[Figure: four bit slices (a0/b0 to a3/b3), each producing g, p and a sum bit S, connected to the carry-lookahead logic]

Carry Lookahead – 2nd level – 16 bits
Add a 2nd level of abstraction built from more practical 4-bit units.
Each Pi, Gi handles 4 bits at a time (bits 0-3, 4-7, 8-11, ...).

P03 = p3 · p2 · p1 · p0
G03 = g3 + p3 · g2 + p3 · p2 · g1 + p3 · p2 · p1 · g0
P47 = p7 · p6 · p5 · p4
G47 = g7 + p7 · g6 + p7 · p6 · g5 + p7 · p6 · p5 · g4
P811 = p11 · p10 · p9 · p8
G811 = g11 + p11 · g10 + p11 · p10 · g9 + p11 · p10 · p9 · g8
P1215 = p15 · p14 · p13 · p12
G1215 = g15 + p15 · g14 + p15 · p14 · g13 + p15 · p14 · p13 · g12
• 2 units of delay for P03, P47, P811, P1215
• 3 units of delay for G03, G47, G811, G1215

C4 = G03 + C0 · P03
C8 = G47 + G03 · P47 + C0 · P03 · P47
C12 = ….

Fast Addition: Cascaded Carry Look-ahead (16-bit):
can be combined with CSA
[Figure: four 4-bit adders, each sending block signals G and P to a second-level carry-lookahead (CLA) unit that also receives C0]
c4 = G03 + C0 · P03                          (c4 has 4 units of delay)
c8 = G47 + G03 · P47 + C0 · P03 · P47
c12 = G811 + G47 · P811 + G03 · P47 · P811 + C0 · P03 · P47 · P811
c16 = G1215 + G811 · P1215 + G47 · P811 · P1215 + G03 · P47 · P811 · P1215 + C0 · P03 · P47 · P811 · P1215
• 5 units of delay for c8, c12, c16

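A sketch of the second-level lookahead in Python, assuming the block naming used on these slides (G03, P47, and so on). The function names `block_gp` and `block_carries16` are hypothetical.

```python
def block_gp(a4: int, b4: int) -> tuple[int, int]:
    """Return block generate G and block propagate P for one 4-bit block."""
    g = [(a4 >> i) & (b4 >> i) & 1 for i in range(4)]
    p = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P

def block_carries16(a: int, b: int, c0: int = 0) -> list[int]:
    """Return [C0, C4, C8, C12, C16] for 16-bit operands."""
    blocks = [block_gp((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF) for k in range(4)]
    (G03, P03), (G47, P47), (G811, P811), (G1215, P1215) = blocks
    c4 = G03 | (c0 & P03)
    c8 = G47 | (G03 & P47) | (c0 & P03 & P47)
    c12 = G811 | (G47 & P811) | (G03 & P47 & P811) | (c0 & P03 & P47 & P811)
    c16 = (G1215 | (G811 & P1215) | (G47 & P811 & P1215)
           | (G03 & P47 & P811 & P1215) | (c0 & P03 & P47 & P811 & P1215))
    return [c0, c4, c8, c12, c16]
```

The block carries have the same shape as the bit-level equations, with each 4-bit block standing in for one bit.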
8-bit carry lookahead adder (4-bit block is also CLA)
[Figure: two 4-bit CLA blocks under a 2nd-level carry lookahead. Bits 0-3 (a0/b0 to a3/b3, sums S0 to S3) produce G0 and P0; bits 4-7 (a4/b4 to a7/b7, sums S4 to S7) produce G1 and P1. Delay annotations in the figure: 3 on the low block, 5 and 6 on the high block.]
• 4 units of delay for c4 = G0 + c0 · P0
• c5 = g4 + c4 · p4
• c6 = . . .
• c7 = . . .
• Delays: 1, 4, 1

Multiply, Divide & Shift

MIPS arithmetic instructions
Instruction         Example           Meaning                         Comments
add                 add $1,$2,$3      $1 = $2 + $3                    3 operands; exception possible
subtract            sub $1,$2,$3      $1 = $2 – $3                    3 operands; exception possible
add immediate       addi $1,$2,100    $1 = $2 + 100                   + constant; exception possible
add unsigned        addu $1,$2,$3     $1 = $2 + $3                    3 operands; no exceptions
subtract unsigned   subu $1,$2,$3     $1 = $2 – $3                    3 operands; no exceptions
add imm. unsign.    addiu $1,$2,100   $1 = $2 + 100                   + constant; no exceptions
multiply            mult $2,$3        Hi, Lo = $2 x $3                64-bit signed product
multiply unsigned   multu $2,$3       Hi, Lo = $2 x $3                64-bit unsigned product
divide              div $2,$3         Lo = $2 ÷ $3, Hi = $2 mod $3    Lo = quotient, Hi = remainder
divide unsigned     divu $2,$3        Lo = $2 ÷ $3, Hi = $2 mod $3    Unsigned quotient & remainder
move from Hi        mfhi $1           $1 = Hi                         Used to get copy of Hi
move from Lo        mflo $1           $1 = Lo                         Used to get copy of Lo

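To illustrate how the 64-bit product is split across Hi and Lo, here is a small behavioral sketch in Python. The function names mirror the instruction mnemonics but are hypothetical models, and register values are passed as 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def mult(rs: int, rt: int) -> tuple[int, int]:
    """Model of signed multiply: returns (Hi, Lo) as 32-bit patterns."""
    product = (to_signed32(rs) * to_signed32(rt)) & ((1 << 64) - 1)
    return product >> 32, product & MASK32

def multu(rs: int, rt: int) -> tuple[int, int]:
    """Model of unsigned multiply: returns (Hi, Lo) as 32-bit patterns."""
    product = (rs & MASK32) * (rt & MASK32)
    return product >> 32, product & MASK32

# The same bit patterns give different Hi words:
print(mult(0xFFFFFFFF, 2))   # operands read as -1 and 2
print(multu(0xFFFFFFFF, 2))  # operands read as 4294967295 and 2
```

The Lo words match in both cases; only Hi depends on whether the operands are treated as signed.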
MULTIPLY (unsigned)
• Paper and pencil example:
Multiplicand          1 0 0 0   A
Multiplier            1 0 0 1   B
                      1 0 0 0   a3b0 a2b0 a1b0 a0b0
                    0 0 0 0     a3b1 a2b1 a1b1 a0b1
                  0 0 0 0       a3b2 a2b2 a1b2 a0b2
                1 0 0 0         a3b3 a2b3 a1b3 a0b3
Product       0 1 0 0 1 0 0 0
• m bits x n bits = m+n bit product
• Binary makes it easy:
– 0 => place 0 (0 x multiplicand)
– 1 => place a copy (1 x multiplicand)
• 2 architectures: fast array multiplier (MPY) and slow shift & add

Fast unsigned Multiply == Array Multiplier
[Figure: 4 x 4 array multiplier. Each cell combines multiplicand bit Ai and multiplier bit Bj in a full adder (FA) with sum in, carry in, sum out and carry out; the sum inputs of the top row are 0; rows B0 to B3 produce product bits P7 to P0. Cell delays?]
• Can be adapted to accommodate signed MPY
• Q: How much hardware for a 32-bit multiplier? Critical path?

Signed Multiply – Baugh-Wooley- 2’s complement
Multiplicand          1 0 0 0   A = -8
Multiplier            1 0 0 1   B = -7
                    1 0 0 0 0   1  ~a3b0  a2b0  a1b0  a0b0
                    1 0 0 0        ~a3b1  a2b1  a1b1  a0b1
                  1 0 0 0          ~a3b2  a2b2  a1b2  a0b2
              1 1 1 1 1         1  a3b3  ~a2b3  ~a1b3  ~a0b3
Product       0 0 1 1 1 0 0 0   = +56

• ~ means the bit is complemented
• Extra 1s are added to compensate where shown
• Complementing and adding the 1s saves complete sign-extension hardware in every row

Fast signed Array Multiplier - Baugh-Wooley alg
[Figure: 4 x 4 Baugh-Wooley array multiplier. The array matches the unsigned multiplier (each cell combines Ai and Bj in an FA with sum in, carry in, sum out and carry out; top-row sum inputs are 0), with a final row of FAs that takes the compensating 1 inputs and produces P7 to P4; P3 to P0 leave the array on the multiplier side.]
• 2 cell types used, FA adders added

Array Multiplier - Baugh-Wooley Equations

• a(n-1) and b(n-1) are the sign bits; the equation on this slide is for the 4-bit example

Baugh-Wooley MPY
Example II: 1011 * 0011
Multiplicand          1 0 1 1   A = -5
Multiplier            0 0 1 1   B = +3
                    1 0 0 1 1   1  ~a3b0  a2b0  a1b0  a0b0
                    0 0 1 1        ~a3b1  a2b1  a1b1  a0b1
                  1 0 0 0          ~a3b2  a2b2  a1b2  a0b2
              1 0 1 1 1         1  a3b3  ~a2b3  ~a1b3  ~a0b3
Product       1 1 1 1 0 0 0 1   = -15

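The Baugh-Wooley partial-product scheme can be sketched in a few lines of Python. The function name `baugh_wooley` is hypothetical; the sketch sums the partial-product bits directly rather than modeling the adder array.

```python
def baugh_wooley(a: int, b: int, n: int = 4) -> int:
    """Multiply two n-bit two's-complement patterns.

    Returns the 2n-bit two's-complement product pattern.
    """
    total = 0
    for j in range(n):            # multiplier bit index
        for i in range(n):        # multiplicand bit index
            bit = ((a >> i) & 1) & ((b >> j) & 1)
            # Complement the terms that involve exactly one sign bit.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            total += bit << (i + j)
    # The two compensating 1s: at bit n and at bit 2n - 1.
    total += (1 << n) + (1 << (2 * n - 1))
    return total & ((1 << (2 * n)) - 1)   # discard the carry out of bit 2n - 1

print(format(baugh_wooley(0b1000, 0b1001), "08b"))  # -8 * -7
print(format(baugh_wooley(0b1011, 0b0011), "08b"))  # -5 * 3
```

No row is sign extended: every partial-product bit is a plain AND or its complement, plus two constant 1s.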
Multiplication, using shift & Add
[Figure: the multiplicand row A3 A2 A1 A0 appears once per multiplier bit B0 to B3, shifted one position further left each time, and is summed into P7 to P0; the initial partial product is 0]
• At each stage, shift the multiplicand left (x 2)
• Multiplier bit Bi determines whether the shifted multiplicand is added in
• Accumulate the 2n-bit partial product at each stage

Multiplication, using shift & Add
• Long-multiplication approach
multiplicand        1000
multiplier        × 1001
                    1000
                   0000
                  0000
                 1000
product          1001000
• Length of product is the sum of operand lengths

Multiplication Hardware using shift & Add
[Figure: shift-and-add multiplier datapath; one register is marked "Initially 0"]

Optimized Multiplier using shift & Add
• Perform steps in parallel: add/shift
[Figure: optimized datapath with a 32-bit ALU and the multiplicand register]
• One cycle per partial-product addition
• OK if the frequency of multiplications is low

Shift-add Multiply Algorithm
Start
1. Test Product0
   – Product0 = 1: 1a. Add the multiplicand to the left half of the Product register and place the result in the left half of the Product register
   – Product0 = 0: no addition
2. Shift the Product register right 1 bit
3. 32nd repetition?
   – No (< 32 repetitions): go back to step 1
   – Yes (32 repetitions): Done

Example with 4-bit operands (2 x 3):
Step    Product      Multiplicand
        0000 0011    0010
1:      0010 0011    0010
2:      0001 0001    0010
1:      0011 0001    0010
2:      0001 1000    0010
1:      0001 1000    0010
2:      0000 1100    0010
1:      0000 1100    0010
2:      0000 0110    0010
Done    0000 0110    0010

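The flowchart translates into a short loop. This is a minimal sketch in Python, assuming the multiplier starts in the right half of the Product register as in the trace above; `shift_add_multiply` is a hypothetical name.

```python
def shift_add_multiply(multiplicand: int, multiplier: int, n: int = 32) -> int:
    """Unsigned n-bit x n-bit multiply with a shared 2n-bit Product register."""
    mask = (1 << n) - 1
    product = multiplier & mask               # left half 0, right half = multiplier
    for _ in range(n):                        # n repetitions
        if product & 1:                       # 1. test Product0
            product += (multiplicand & mask) << n   # 1a. add to the left half
        product >>= 1                         # 2. shift the Product register right
    return product

print(shift_add_multiply(0b0010, 0b0011, n=4))  # the 4-bit example above
```

Python integers are unbounded, so the carry out of the left-half addition is kept automatically; hardware needs one extra bit for it before the shift.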
MIPS logical instructions
Instruction           Example           Meaning             Comment
and                   and $1,$2,$3      $1 = $2 & $3        3 reg. operands; logical AND
or                    or $1,$2,$3       $1 = $2 | $3        3 reg. operands; logical OR
xor                   xor $1,$2,$3      $1 = $2 ^ $3        3 reg. operands; logical XOR
nor                   nor $1,$2,$3      $1 = ~($2 | $3)     3 reg. operands; logical NOR
and immediate         andi $1,$2,10     $1 = $2 & 10        Logical AND reg, constant
or immediate          ori $1,$2,10      $1 = $2 | 10        Logical OR reg, constant
xor immediate         xori $1,$2,10     $1 = $2 ^ 10        Logical XOR reg, constant
shift left logical    sll $1,$2,10      $1 = $2 << 10       Shift left by constant
shift right logical   srl $1,$2,10      $1 = $2 >> 10       Shift right by constant
shift right arithm.   sra $1,$2,10      $1 = $2 >> 10       Shift right (sign extend)
shift left logical    sllv $1,$2,$3     $1 = $2 << $3       Shift left by variable
shift right logical   srlv $1,$2,$3     $1 = $2 >> $3       Shift right by variable
shift right arithm.   srav $1,$2,$3     $1 = $2 >> $3       Shift right arith. by variable

How shift instructions are implemented
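Two kinds:
• logical: the value shifted in is always "0"
  [Figure: register holding 1100 1011 with "0" entering at the msb end and at the lsb end; labelled shift right logical by 2]
• arithmetic: on right shifts, sign extend; left shifts still shift in "0" at the lsb
  – Example: shift right arithmetic by 2: 1011 → 1110
• An instruction can request 0 to 31 bits to be shifted (the MIPS shift amount is a 5-bit field)
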
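The difference between the two right shifts is easy to see in code. A minimal sketch in Python, with hypothetical helper names borrowed from the MIPS mnemonics and a width parameter `n` so the 4-bit example can be reproduced:

```python
def sll(x: int, sh: int, n: int = 32) -> int:
    """Shift left logical: zeros enter at the lsb; bits above n are dropped."""
    return (x << sh) & ((1 << n) - 1)

def srl(x: int, sh: int, n: int = 32) -> int:
    """Shift right logical: zeros enter at the msb."""
    return (x & ((1 << n) - 1)) >> sh

def sra(x: int, sh: int, n: int = 32) -> int:
    """Shift right arithmetic: copies of the sign bit enter at the msb."""
    mask = (1 << n) - 1
    x &= mask
    if x >> (n - 1):          # sign bit set: reinterpret as a negative integer
        x -= 1 << n
    return (x >> sh) & mask   # Python's >> on a negative int sign-extends

print(format(sra(0b1011, 2, n=4), "04b"))  # arithmetic: sign bit replicated
print(format(srl(0b1011, 2, n=4), "04b"))  # logical: zeros shifted in
```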
ARM: Barrel Shifter
[Figure: Operand 1 goes straight to the ALU; Operand 2 passes through the barrel shifter first; the ALU produces Result]
• Shift value can be either:
– a 5-bit unsigned integer, or
– specified in the bottom byte of another register
• Example: ADD r0, r1, r2, LSL #7
• Semantics: r2 is shifted left by 7 and then added to r1; the sum is written to r0

Barrel Shifter, used in ICs
Shift Right using one transistor per switch
[Figure: switch matrix with control lines SR0 to SR3, inputs A0 to A6 and outputs D0 to D3]

Barrel Shifter, used in ICs
Shift Left & Right
[Figure: switch matrix with control lines SL1 to SL3 and SR0 to SR2, inputs A0 to A5 and outputs D0 to D3]

Summary: Multiply & Shift
• Multiply: successive refinement to reach the final design
– 32-bit adder, 64-bit shift register, 32-bit multiplicand register
• Fast multiply → array multiplier
• Barrel shifter → fast, efficient shifting
• Can be multi-level to save transistors, e.g.
– Shift by 16 or 0
– Shift by 8 or 0
– Shift by 4 or 0
– Shift by 2 or 0
– Shift by 1 or 0

Multilevel shifting – Shift right logical – 5 shift levels for 32-bit ALU
Levels: Shift 16 or 0 → Shift 8 or 0 → Shift 4 or 0 → Shift 2 or 0 → Shift 1 or 0
[Figure: inputs A31 to A0 pass through five columns of 2-to-1 muxes producing X, Y, Z, M and finally D31 to D0. For bit 0: X0 selects A0 or A16, Y0 selects X0 or X8, Z0 selects Y0 or Y4, M0 selects Z0 or Z2, D0 selects M0 or M1. Positions with no higher bit to take from receive "0".]
• Each mux is 2 CMOS transistors
• Total transistor count = 5 * 32 * 2 = 320

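The five shift-or-pass levels map directly onto the five bits of the shift amount. A minimal sketch in Python; `multilevel_srl` is a hypothetical name and each `if` plays the role of one column of muxes.

```python
def multilevel_srl(x: int, amount: int) -> int:
    """Shift a 32-bit pattern right logically using five shift-or-pass levels."""
    if not 0 <= amount < 32:
        raise ValueError("shift amount must fit in 5 bits")
    x &= 0xFFFFFFFF
    for k in (16, 8, 4, 2, 1):   # one level per bit of the shift amount
        if amount & k:           # mux select: shift by k, otherwise pass through
            x >>= k
    return x

print(hex(multilevel_srl(0xDEADBEEF, 20)))  # 20 = 16 + 4, so two levels shift
```

Any shift from 0 to 31 is a sum of the level sizes, so five levels of 32 muxes replace a full 32 x 32 switch matrix.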
Floating Point Arithmetic

• How to represent
– numbers with fractions, e.g., 3.1416
– very small numbers, e.g., .000000001
– very large numbers, e.g., 3.15576 × 10^9
• Fixed point
• Floating point: a number system with a floating decimal point
• Normalized numbers: no leading 0s, a single digit before the decimal point
– normalized: 1.0 x 10^9, 3.1557 x 10^9
– not normalized: 35, 0.03

Floating Point Notation – IEEE 754 FP
6.02 x 10^23        1.673 x 10^-24
[Figure labels: mantissa (sign, magnitude), decimal point, radix (base), exponent (sign, magnitude)]
IEEE F.P.: ± 1.M x 2^(e - 127)
• Notes:
– Can represent very large & very small numbers
– Arithmetic (+, -, *, /)
– Representation, normal form
– Range and precision, single, double
– Rounding
– Exceptions (e.g., divide by zero, overflow, underflow)

Floating-Point Arithmetic
Floating point numbers in IEEE 754 standard, single precision:
| S: sign, 1 bit | E: exponent, 8 bits | M: mantissa, 23 bits |
• exponent: excess 127 binary integer; the actual exponent is e = E - 127, with 0 < E < 255
• mantissa: sign + magnitude, normalized binary significand with hidden integer bit: 1.M
N = (-1)^S x 2^(E-127) x (1.M)
0 = 0 00000000 0 . . . 0
-1.5 = 1 01111111 10 . . . 0

Magnitudes that can be represented are in the range:
2^-126 x (1.0) to 2^127 x (2 - 2^-23)

Double Precision IEEE 754 [64 bits]
Exponent = 11 bits, Bias = 1023, Mantissa = 52 bits, Sign = 1 bit

Floating-Point Example
x = (–1)^S × (1 + fraction) × 2^(exponent – bias)
• bias = 127 for a 32-bit word
• S = 1: negative; S = 0: positive or zero
• Example: –0.75
– –0.75 = (–1)^1 × 1.1₂ × 2^–1
– S = 1
– Fraction = 1000…00₂
– Exponent = –1 + Bias
• Single: –1 + 127 = 126 = 01111110₂
• Double: –1 + 1023 = 1022 = 01111111110₂

• Single: 1 01111110 1000…00
• Double: 1 01111111110 1000…00

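The field values in this example can be read back from a real encoding. The sketch below uses Python's standard `struct` module; the helper names `float32_fields` and `float64_fields` are hypothetical.

```python
import struct

def float32_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x encoded as IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def float64_fields(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x encoded as IEEE 754 double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

s, e, f = float32_fields(-0.75)
print(s, format(e, "08b"), format(f, "023b"))
s, e, f = float64_fields(-0.75)
print(s, format(e, "011b"), format(f, "052b")[:8] + "...")
```

For -0.75 the biased exponent comes out as 126 in single precision and 1022 in double precision, with only the top fraction bit set.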
Exponent Bias used to simplify comparisons

• +127 is added to each exponent, e.g. for 2^–1 the exponent is written as –1 + 127 = 126
• Biased exponents are unsigned, so they sort in the same order as the true exponents:
0000 0000 (most negative exponent)  ...  1111 1111 (most positive exponent)

Floating-Point Addition – Decimal
Key point – Make exponents equal

• 4-digit example
9.999 × 10^1 + 1.610 × 10^–1
1. Align decimal points
   Shift the number with the smaller exponent
   9.999 × 10^1 + 0.016 × 10^1
2. Add significands
   9.999 × 10^1 + 0.016 × 10^1 = 10.015 × 10^1
3. Normalize result & check for over/underflow
   1.0015 × 10^2
4. Round and renormalize if necessary
   1.002 × 10^2

Floating-Point Addition – Binary
Key point – Make exponents equal

• 4-bit example
1.000₂ × 2^–1 + –1.110₂ × 2^–2 (0.5 + –0.4375)
1. Align binary points
   Shift the number with the smaller exponent
   1.000₂ × 2^–1 + –0.111₂ × 2^–1
2. Add significands
   1.000₂ × 2^–1 + –0.111₂ × 2^–1 = 0.001₂ × 2^–1
3. Normalize result & check for over/underflow
   1.000₂ × 2^–4, with no over/underflow
4. Round and renormalize if necessary
   1.000₂ × 2^–4 (no change) = 0.0625

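The four steps can be sketched on a toy format with a p-bit significand. This Python sketch makes several assumptions that are not in the slides: a value is stored as a signed integer m and an exponent e meaning m × 2^(e - (p - 1)), exponents are unbounded (so the over/underflow check is omitted), and rounding is to nearest with ties to even. The names `fp_add` and `value` are hypothetical.

```python
from fractions import Fraction

def fp_add(m1: int, e1: int, m2: int, e2: int, p: int = 4) -> tuple[int, int]:
    """Add two toy floats; returns a normalized (m, e) pair."""
    # Step 1: align on the smaller exponent. Widening the other operand to the
    # left keeps every bit, which stands in for hardware guard bits.
    e = min(e1, e2)
    # Step 2: add the aligned significands.
    total = (m1 << (e1 - e)) + (m2 << (e2 - e))
    if total == 0:
        return 0, 0
    sign, mag = (-1, -total) if total < 0 else (1, total)
    # Step 3: normalize so the magnitude has exactly p bits.
    shift = mag.bit_length() - p
    if shift <= 0:
        mag <<= -shift
    else:
        # Step 4: round to nearest (ties to even), renormalize if it carries out.
        mag, rem = divmod(mag, 1 << shift)
        half = 1 << (shift - 1)
        if rem > half or (rem == half and mag & 1):
            mag += 1
        if mag.bit_length() > p:
            mag >>= 1
            shift += 1
    return sign * mag, e + shift

def value(m: int, e: int, p: int = 4) -> Fraction:
    """Exact value of a toy float."""
    return Fraction(m) * Fraction(2) ** (e - (p - 1))

m, e = fp_add(0b1000, -1, -0b1110, -2)   # 1.000 x 2^-1 + -1.110 x 2^-2
print(format(m, "04b"), e, float(value(m, e)))
```

For the slide's operands the sum normalizes by shifting left three places, giving 1.000₂ × 2^–4.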
Floating pt. ADD Algorithm
Example: 9.75 + 0.5625
1. Compare exponents; shift the smaller number right until the exponents are the same
   X2 after shift: 0.00010010000000000000000
2. Add mantissas
   sum: 1.01001010000000000000000 (= 10.3125)
3. Normalize the sum: either shift right and increment the exponent, or shift left and decrement the exponent (here the sum is already normalized)
   – Overflow / underflow? If so, raise an exception
4. Round the mantissa
   – Still normalized? N: go back to step 3; Y: DONE

Floating Point Addition Summary

• Step 1: align, round
• Step 2: add
• Step 3: normalize, check overflow or underflow
• Step 4: round
• Example: 9.999ten × 10^1 + 1.610ten × 10^–1

Floating Point Multiplication

• Step 1: add exponents, subtract bias, multiply mantissas
• Step 2: normalize and check over/underflow
• Step 3: round
• Step 4: check sign
• Example: 0.5 × (–0.4375)
• Adding two biased exponents in Step 1 counts the bias twice, so the extra bias must be subtracted

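The multiplication steps can be sketched on IEEE 754 single-precision fields. This Python sketch handles normalized operands only, rounds to nearest with ties to even, and raises an error instead of producing denormals, infinities, or NaNs; those choices and the names `fields` and `fp_mul` are assumptions of the sketch.

```python
import struct

BIAS = 127

def fields(x: float) -> tuple[int, int, int]:
    """(sign, biased exponent, fraction) of x as an IEEE 754 single."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF

def fp_mul(x: float, y: float) -> float:
    s1, e1, f1 = fields(x)
    s2, e2, f2 = fields(y)
    if not (0 < e1 < 255 and 0 < e2 < 255):
        raise ValueError("sketch handles normalized operands only")
    # Step 1: add biased exponents, subtract the extra bias, multiply 1.M values.
    e = e1 + e2 - BIAS
    prod = ((1 << 23) | f1) * ((1 << 23) | f2)   # 48-bit product, scaled by 2^46
    # Step 2: normalize. The product of two values in [1, 2) lies in [1, 4).
    shift = 23
    if prod >> 47:
        shift, e = 24, e + 1
    # Step 3: round to nearest, ties to even; renormalize if rounding carries out.
    q, rem = divmod(prod, 1 << shift)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and q & 1):
        q += 1
        if q >> 24:
            q, e = q >> 1, e + 1
    if not 0 < e < 255:
        raise OverflowError("exponent overflow or underflow")
    # Step 4: the sign is the XOR of the operand signs.
    bits = ((s1 ^ s2) << 31) | (e << 23) | (q & 0x7FFFFF)
    return struct.unpack(">f", struct.pack(">I", bits))[0]

print(fp_mul(0.5, -0.4375))
```

In the slide's example the biased exponents are 126 and 125; their sum minus 127 is 124, which encodes 2^–3.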
FP Adder Hardware

• More complex than an integer adder
• Doing it in one clock cycle would take too long
– Much longer than integer operations
– A slower clock would penalize all instructions
• FP adder usually takes several cycles
– Can be pipelined

FP Adder Hardware
[Figure: FP adder datapath with stages marked Step 1 to Step 4; annotations: exponents compared, smaller number shifted right, result iterated until normalized]

Floating Point: Overflow & Underflow

• Overflow: exponent too large to be represented in the exponent field
• Underflow: negative exponent too small to fit in the exponent field

Summary of Floating Point Arithmetic

• IEEE floating point standard: 32 bit and 64 bit
• Converting decimal numbers to floating point and vice versa
• Overflow and underflow
• Floating point add and multiply
