Sie sind auf Seite 1von 13

CMU 18-447 S09 L2-1 2009 J. C.

Hoe

18-447 Lecture 2: Computer Arithmetic: Adders

James C. Hoe Dept of ECE, CMU January 14, 2009
Announcements: No class on Monday Verilog Refresher next Wednesday Review P&H Ch 3 Handouts: Lab 1 and HW1 will be posted on Blackboard this weekend

CMU 18-447 S09 L2-2 2009 J. C. Hoe

Let bn-1bn-2b2b1b0 represent an n-bit unsigned integer n 1 weight of the ith digit
its value is

2 b
i i =0

value of the ith digit

a finite representation between 0 and 2n-1 e.g., 1011two = 8ten + 2ten + 1ten = 11ten (more commonly rewritten as b1011=11)

Often written in Hex for easier human consumption

to convert, starting from the LSB, map 4 binary digits at a time into a corresponding hex digit; and vice versa e.g., 1010_1011two=ABhex

For converting between binary and decimal, memorize decimal values of 20 ~ 210, and remember 210 is about 1000.

Let bn-1bn-2b2b1b0 represent an n-bit signed integer

its value is

2 n 1 bn 1 + 2i bi
i =0

n2

a finite representation between -2n-1 and 2n-1 -1 e.g., assume 4-bit 2s-complement b1011 = -8 + 2 + 1 = -5 b1111 = -8 + 4 + 2 + 1 = -1

To negate a 2s-complement number

add 1 to the bit-wise complement assume 4-bit 2s-complement (- b1011) = b0100 + 1 = b0101 = 5 (- b0101) = b1010 + 1 = b1011 = -5 (- b1111) = b0000 + 1 = b0001 = 1 (- b0000) = b1111 + 1 = b0000 = 0

Intuition: a 4-bit example

b1111=15 b1111=-1,15 b1110=14 b1110=-2,14 b1101=13 b1101=-3,13 b1100=12 b1100=-4,12 b1011=11 b1011=-5,11 b1010=10 b1010=-6,10 b1001=9 b1001=-7,9

CMU 18-447 S09 L2-4 2009 J. C. Hoe

b0000=0 (as 2s),0 unsigned) (as unsigned) b0001=1 b0001=1,1 b0010=2 b0010=2,2

b0111=7 b0111=7,7 b1000=8 b1000=-8,8

how to add two numbers what it means to overflow the number representation how to negate a number Yes, 0 is a positive number in CS

Unsigned numbers

CMU 18-447 S09 L2-5 2009 J. C. Hoe

pad the left with as many 0s as you need (aka 0-extension) e.g. 4b1111 8b0000_1111

2s-complement numbers
positive: pad the left with as many 0s as you need negative: pad the left with as many 1s as you need e.g. 4b1111 8b1111_1111 4b1110 8b1111_1110 or generically, pad the left with the same value as the original sign-bit as many times as necessary (aka signedextension) What about converting from larger to smaller representation?

CMU 18-447 S09 L2-6 2009 J. C. Hoe

Long Hand b3 a3 b2 a2 b1 a1 b0 a0

c4 carry?

c3

c2

c1

s3

s2

s1

s0

a b
cin a 0 0 1 1 0 0 1 1 b 0 1 0 1 0 1 0 1 0 cout 0 0 0 1 0 1 1 1 s 0 1 1 0 1 0 0 1

cout

FA
s

cin

0 0 0 1 1 1 1

a, b, cin are functionally indistinguishable as inputs

b3 a3 b2 a2 b1 a1 b0 a0

a b co s c2 i

a b co s c1 i

a b co s ci 0

s3

s2

s1

s0

Could use a half-adder, but lets wait

b3 a3 b2 a2 b1 a1 b0 a0

CMU 18-447 S09 L2-9 2009 J. C. Hoe

a b c4 carry? co s ci

a b co s ci

a b co s ci

a b co s ci 0

s3

s2

s1

s0

overflow?= (a3b3) ? 0 : (a3s3) - cant overflow when adding a pos. and a neg. number - if 2 pos. numbers yield a neg. number V; vice

2s-Complement Subtraction

Subtracting is like adding the negative Negation is easy in a 2s-complement representation A B

bit-wise complement

c v S

cin

sub?

CMU 18-447 S09 L2-11 2009 J. C. Hoe

Size/Complexity:
n x SizeOf( Full Adder )

O(n)

Critical Path Delay: O(n)

n x DelayOf( Full Adder ) n x 2 gate delays (assuming 2-level SOP is used)

bn-1an-1 cout ab co ci s sn-1

b1 a1 ab co ci s s1

b0 a0 ab co ci cin s s0

18-447 Simplified Timing

clock period chosen to be greater than worst-case Tpd

CMU 18-447 S09 L2-12 2009 J. C. Hoe

Global Clock

Sync Sync

comb comb

comb comb

Sync Sync

Sync
registers latch new value

comb

comb

Sync
No more change to Q

CMU 18-447 S09 L2-13 2009 J. C. Hoe

Intel P4 is designed around a clock period that is twice the 16-bit adder latency Using a rough estimation gate delay 0.5 ns-per-micron x feature-size a 90nm process has gate delay = 45ps If Intel used a ripple-carry adder then P4 should be running ~ 1/ (2x2x16x45ps) = 347MHz Alternatively speaking, 3GHz P4 would have to add 2 16-bit numbers in ~4 gate delays

CMU 18-447 S09 L2-14 2009 J. C. Hoe

How to reduce the carry-propagation delay? Remember, long-hand is how most of us add, but not the only way Can we compute an intermediate carry signal without first computing the earlier ones
e.g., let cm (or sm) be a function of am....a0 and bm....b0 c2 =(a1a0b0)+(a1a0c0)+(a1b0c0)+(b1a0b0)+(b1a0c0) +(b1b0c0)+(a1b1) Complexity grows exponentially in n exponential isnt too bad for small ns b1 a1 b0a0 gate delay is 2, independent of n bn-1an-1 true for small ns

cn-1 sn-1 s1 s0

A15:8 B15:8 A7:0 B7:0

cin

S15:8

Multi-Stage CSA
c

CMU 18-447 S09 L2-16 2009 J. C. Hoe

c

c

c s15:12

s11:8

s7:4

s3:0

cost=(2k-1)/knFAsize + muxs --- for k-stage delay=nFAdelay/k +(k-1)mux-delay k=4,n=16 ~8 gate-delay + 3 mux-delay k=8,n=16 ~4 gate-delay + 7 mux-delay k=16,n=16 ~2 gate-delay + 15 mux-delay

Variable-Length CSA
c

CMU 18-447 S09 L2-17 2009 J. C. Hoe

c

a0b0 FA
c

FA

c s15:10

s9:5

s4:2

s1

s0

-doubles the cost -delay set by the longest adder stage, grows by O(n1/2) with careful critical path tuning Can we have cut-down the carries without 2x cost?

Carry Generate and Propagate

cin 0 0 0 0 1 1 1 1

CMU 18-447 S09 L2-18 2009 J. C. Hoe

a 0 0 1 1 0 0 1 1

b 0 1 0 1 0 1 0 1

cout 0 0 0 1 0 1 1 1

a 0 0 1 1

b 0 1 0 1

cout 0 cin cin 1

If ab then cout is 1 regardless of cin (carry generate) if ab then cout is the same as cin (carry propagate) gi=ai bi pi=aibi local decisions based on ai and bi only

Given gi=ai bi pi=aibi ci+1= gi + (pi ci) Thus c1=g0+(p0c0) c2=g1+(p1c1)=g1+(p1(g0+(p0c0)))=g1+p1g0+p1p0c0 c3=g2+p2g1+p2p1g0+p2p1p0c0 c4=g3+p3g2+p3p2g1+p3p2p1g0+p3p2p1p0c0 and so on -We can compute cn in O(log n) gate delay and O(n2) size, only manageable for small n -Given cn we can compute sn for a constant additional delay

CMU 18-447 S09 L2-19 2009 J. C. Hoe

Given c1=g0+(p0c0) c2=g1+(p1c1)=g1+(p1(g0+(p0c0)))=g1+p1g0+p1p0c0 c3=g2+p2g1+p2p1g0+p2p1p0c0 c4=g3+p3g2+p3p2g1+p3p2p1g0+p3p2p1p0c0 G As a 4-arity group G4=g3+p3g2+p3p2g1+p3p2p1g0 P4=p3p2p1p0 g
n-1:0

CMU 18-447 S09 L2-20 2009 J. C. Hoe

pn-1:0

CLAn

Pn Gn

p[15:12] g[15:12] c[15:12] p[11:8] g[11:8] c[11:8] p[7:4] g[7:4] c[7:4] p[3:0] g[3:0] c[3:0]

CLA4
G3 P3 Cin16 G2

CLA4
P2 Cin8 G1

CLA4

CLA4
P0 Cin0

P1 Cin4 G0

CLA16

CLA4

G16

P16

Cin0

Computing Individual Carries

Example: 8-bit, 2-ary CLA
g0 p0 g1 p1

=generate for bits 1~0

g2 p2 g3 p3 g4 p4 g5 p5

cin0 = Cin cin1 = g0 + p0Cin cin2 = G10 + P10Cin cin4 = G30 + P30Cin Cout= G70 + P70Cin
G70

P70

g6 p6 g7 p7

cin3 = g2 + p2cin2 cin5 = g4 + p4cin4 cin6 = G54 + P54cin4 cin7 = g6+p6(G54 + P54cin4)

AIV
P
G+Pc

AIV
P
G+Pc

G P
G+Pc

G P
G+Pc

AI BI CLAn/4
G cin

CLAn/4
G

cout

SIV

SIII

SII

SI

CMU 18-447 S09 L2-24 2009 J. C. Hoe

O(n) size, O(n) delay

O(n) size, O(n0.5) delay