Subsystem Design
[Figure: processor block diagram with Datapath, Memory, Control and Input-Output blocks.]
Datapath: The core -- all other components are support units that either store
the results of the datapath or determine what happens in the next cycle.
Control:
An FSM (sequential circuit) implemented using random logic, PLAs or
memories.
[Figure: bit-sliced datapath (Bit 0 to Bit 4) between Data-In and Data-Out, with Registers, Adder, Shifter and Multiplexer blocks.]
However, as we will see, the task is non-trivial since there are multiple
equivalent logic and circuit topologies to choose from, each with advantages
and disadvantages in terms of speed, power and area.
Full-adder truth table:

A  B  Ci | G(A.B)  P(A+B)  P(A XOR B) | Sum  Co | Carry status
0  0  0  |   0       0         0      |  0   0  | delete
0  0  1  |   0       0         0      |  1   0  | delete
0  1  0  |   0       1         1      |  1   0  | propagate
0  1  1  |   0       1         1      |  0   1  | propagate
1  0  0  |   0       1         1      |  1   0  | propagate
1  0  1  |   0       1         1      |  0   1  | propagate
1  1  0  |   1       1         0      |  0   1  | generate
1  1  1  |   1       1         0      |  1   1  | generate
G(A.B): (generate)
Ensures that a carry bit will be generated at Co (independent of Ci).
D(NOT A . NOT B): (delete)
Ensures that a carry bit will be deleted at Co.
P(A+B): (propagate)
Indicates that Ci is propagated (passed) to Co.
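The generate, propagate and delete definitions can be checked with a short script. This is a minimal sketch, not part of the original slides: the function names `carry_status` and `full_adder` are made up for illustration, and the carry is formed with the standard relation Co = G + P.Ci.

```python
from itertools import product

def carry_status(a: int, b: int) -> str:
    """Classify one bit position from A and B alone (independent of Ci)."""
    if a & b:
        return "generate"    # G = A.B: Co is 1 whatever Ci is
    if a | b:
        return "propagate"   # exactly one input is 1: Co follows Ci
    return "delete"          # D = (NOT A).(NOT B): Co is 0 whatever Ci is

def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """Return (S, Co) with Co = G + P.Ci and S = (A XOR B) XOR Ci."""
    g = a & b
    p = a | b                # the OR form of P is sufficient for the carry
    co = g | (p & ci)
    s = (a ^ b) ^ ci         # the sum needs the XOR form of P
    return s, co

# Print the full truth table with the carry status of each row.
for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    print(a, b, ci, "->", s, co, carry_status(a, b))
```

The rows it prints can be compared directly with the truth table in the slides.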
Ripple-carry adder:
[Figure: 4-bit ripple-carry adder: four full adders (FA) with inputs A0..A3, B0..B3 and Ci,0, sum outputs S0..S3, and carries Co,0 (= Ci,1) through Co,3.]
The critical path (worst case delay over all possible inputs) is a ripple from
lsb to msb.
S(G, P) = P XOR Ci
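As an illustration of the ripple structure, the sketch below chains full-adder stages so that each carry-out feeds the next carry-in. It assumes little-endian bit lists and the XOR form of P (so that S = P XOR Ci holds); the helper names `to_bits` and `from_bits` are hypothetical.

```python
def ripple_carry_add(a_bits, b_bits, ci=0):
    """Add two little-endian bit lists; return (sum_bits, carry_out)."""
    s_bits = []
    for a, b in zip(a_bits, b_bits):
        p = a ^ b               # propagate (XOR form)
        g = a & b               # generate
        s_bits.append(p ^ ci)   # S = P XOR Ci
        ci = g | (p & ci)       # Co of this stage becomes Ci of the next
    return s_bits, ci

def to_bits(x, n):
    """Little-endian list of the n low-order bits of x."""
    return [(x >> k) & 1 for k in range(n)]

def from_bits(bits):
    """Inverse of to_bits."""
    return sum(b << k for k, b in enumerate(bits))

s, co = ripple_carry_add(to_bits(0b1011, 4), to_bits(0b0110, 4))
print(from_bits(s), co)  # 11 + 6 = 17: prints 1 1
```

The loop runs the stages in order from lsb to msb, which mirrors why the carry chain sets the critical path.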
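The worst-case delay of an N-bit ripple-carry adder is approximately
tadder = (N-1)tcarry + tsum
where tcarry and tsum are the propagation delays from Ci to Co & S.
One possible worst case bit pattern (the carry ripples from lsb to msb) is:
A: 00000001; B: 01111111
Convince yourself that this is true.
Note that when optimizing this structure, it is far more important to optimize
tcarry than tsum.
The inverting property of a full adder can be used to achieve this goal:
[Figure: two equivalent full adders (FA) with inputs A, B, Ci and outputs S, Co; inverting all inputs of a full adder inverts both of its outputs.]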
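The inverting property (inverting A, B and Ci inverts both S and Co) can be confirmed exhaustively. The following is a small sketch under the usual full-adder equations; `full_adder` is an illustrative name, not something defined in the slides.

```python
from itertools import product

def full_adder(a, b, ci):
    """Return (S, Co) for one full adder."""
    s = a ^ b ^ ci
    co = (a & b) | (ci & (a | b))
    return s, co

def inverting_property_holds() -> bool:
    """Check S and Co against the outputs obtained from inverted inputs."""
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        s_inv, co_inv = full_adder(1 - a, 1 - b, 1 - ci)
        if (s_inv, co_inv) != (1 - s, 1 - co):
            return False
    return True

print(inverting_property_holds())
```

In a ripple chain this is what allows alternate stages to work on inverted carries, so that the inverter in the carry path can be dropped.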
[Figure: full-adder schematics with inputs A, B and Ci; the carry combines G(A.B) with Ci.P(A + B) to form Co, and the sum is P XOR Ci.]
Symmetrical design eliminates diffusion caps and reduces series R.
[Figure: 4-bit adder/subtractor with inputs A<0>.., B<0>.., Cin and Subtract, and outputs S<0> to S<3> and C<3>.]
This version increases Co's load to 4 diffusion caps, 2 internal (sum) gate caps
In this case, you want equal Sum and Carry delays in order to minimize clock
cycle time.
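[Figure: serial adder: Cout is fed back to Cin through a 1-bit register (Reg) with Clk, Set and Clr inputs; the operand input is labelled augend and the output result.]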
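A bit-serial adder of this kind can be modelled as one full adder plus a 1-bit carry register. The sketch below is an assumption-laden illustration (lsb-first bit lists, register cleared before the first cycle; `serial_add` is a hypothetical name). Each loop iteration stands for one clock cycle, in which both the sum bit and the next carry must be ready before the clock edge, which is why equal Sum and Carry delays minimize the cycle time.

```python
def serial_add(a_bits, b_bits):
    """Bit-serial addition, one bit per clock cycle, lsb first."""
    carry_reg = 0                      # 1-bit register, cleared before the first cycle
    result = []
    for a, b in zip(a_bits, b_bits):   # one iteration = one clock cycle
        s = a ^ b ^ carry_reg
        cout = (a & b) | (carry_reg & (a | b))
        result.append(s)
        carry_reg = cout               # Cout is latched and becomes Cin next cycle
    return result, carry_reg

# 6 + 3 = 9 with 4-bit operands, lsb first
print(serial_add([0, 1, 1, 0], [1, 1, 0, 0]))
```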
[Figure: full-adder schematic using XOR and XNOR signals, with carry input Ci and carry output Co.]
Note: S and Co delay times are approximately equal -- good for multipliers.
Total transistor count is 26.
[Figure: two-bit adder circuit with inputs A0, B0, A1, B1 and Ci0, carries Ci1 and Ci2, and sum outputs S0 and S1.]
[Figure: carry chain from Ci,0 to Co,4 with propagate signals P0..P4, generate signals G0..G4 and intermediate carries Co,0..Co,3; relative transistor sizes range from 3.5 down to 1.]
carry, Ci,0 and the previous propagate signals are high, P0 to Pk-1.
Transistor sizes largest here since worst case is to discharge all nodes Co,k.
Only 4 diffusion capacitances are present per node, but the distributed RC nature of the chain results in a delay that is quadratic in the number of bits.
Buffers and/or transistor sizing can be used to improve performance.
[Figure: RC chain model with series resistors R1..R6, node capacitors C1..C6 and output Out.]
$$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right)$$
Note that reducing R by a factor, e.g. k, at each stage increases the capacitance
by a factor k and increases area.
A k-factor of 1.5 reduces delay by 40% and increases area by 3.5X.
contribution.
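The delay expression for the RC chain can be evaluated numerically. The sketch below assumes the usual reading of the formula, in which each node capacitance C_i is multiplied by the total resistance between the source and that node; `elmore_delay` is an illustrative name and the unit values of R and C are arbitrary, not taken from the slides.

```python
def elmore_delay(r, c):
    """Return 0.69 * sum_i C_i * (R_1 + ... + R_i) for an RC chain."""
    total = 0.0
    r_upstream = 0.0
    for r_i, c_i in zip(r, c):
        r_upstream += r_i           # resistance between the source and node i
        total += c_i * r_upstream
    return 0.69 * total

# With identical stages the sum is N(N+1)/2, so delay grows quadratically with N.
for n in (2, 4, 8):
    print(n, elmore_delay([1.0] * n, [1.0] * n))
```

Because R1 appears in every term of the sum, the stage nearest the input contributes most to the delay.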
[Figure: 4-bit adder block with bypass: four full adders (FA) with signals P0..P3, G0..G3 and carries Co,0..Co,3; a multiplexer (Mux) controlled by BP = P0P1P2P3 selects between Ci,0 and the rippled Co,3.]
Assume Ak and Bk (for k = 0...3) are set such that all Pk (propagate) are
high.
In this case, an incoming carry Ci,0 = 1, propagates along the com-
In other words:
if (P0P1P2P3 == 1) then Co,3 = Ci,0 else either DELETE or GENERATE
occurred.
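The bypass rule can be sketched as a 4-bit block whose output carry is taken straight from Ci,0 whenever BP = P0P1P2P3 is high. This is an illustrative model only: it assumes the XOR form of P (with the OR form, a block could have all Pk high and still generate a carry), and `bypass_block` is a hypothetical name.

```python
def bypass_block(a_bits, b_bits, ci):
    """One 4-bit block, little-endian bits: returns (sum_bits, carry_out)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # propagate, XOR form
    g = [a & b for a, b in zip(a_bits, b_bits)]   # generate
    s_bits, c = [], ci
    for pk, gk in zip(p, g):
        s_bits.append(pk ^ c)
        c = gk | (pk & c)                         # rippled carry
    bp = all(p)                                   # BP = P0 P1 P2 P3
    co = ci if bp else c                          # the Mux: bypass or rippled carry
    return s_bits, co

print(bypass_block([1, 0, 1, 0], [0, 1, 0, 1], 1))  # all Pk high: Co,3 = Ci,0
```

Logically the two Mux inputs agree whenever BP is high; the benefit is in timing, since the incoming carry no longer has to ripple through the block.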
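[Figure: 4-bit carry-select block: a Setup stage produces P,G; a 0-carry propagation path and a 1-carry propagation path run in parallel; a Mux controlled by Co,k-1 selects the carry vector and Co,k+3, which feed Sum Generation.]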
Note that the low-order terms, e.g., P0 and G0, appear in the expression for
Pk = Ak + Bk
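To see how the low-order terms enter every higher-order carry, the carry recurrence can be unrolled. The expansion below is a worked illustration that assumes the standard relation between carry, generate and propagate, with G_k = A_k B_k as in the full-adder definitions:

```latex
\begin{align*}
C_{o,k} &= G_k + P_k\,C_{o,k-1} \\
C_{o,0} &= G_0 + P_0\,C_{i,0} \\
C_{o,1} &= G_1 + P_1 G_0 + P_1 P_0\,C_{i,0} \\
C_{o,2} &= G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0\,C_{i,0} \\
C_{o,3} &= G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 G_0 + P_3 P_2 P_1 P_0\,C_{i,0}
\end{align*}
```

Each additional bit adds one product term and widens the largest term by one input, so gate fan-in grows with the number of bits covered.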
[Figure: gate-level circuit computing Co,3 from G0..G3, P0..P3 and Ci,0.]
Size and fan-in of the gates limit the lookahead block to about four bits.
[Figure: clocked (Clk) carry circuit with inputs Ci,0, G<0> to G<3> and P<0> to P<3>, and output C<3>.]
The number of logic levels is proportional to log2 N, fan-in is limited and the
The dot operator is defined as: $(g, p) \cdot (g', p') = (g + p\,g',\; p\,p')$
[Figure: carry computation arranged as a forward binary tree followed by an inverse binary tree over the input pairs (Gk, Pk), with group terms such as (C4-7, P4-7) and carry outputs Co,1..Co,7.]
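The dot operator can be exercised with a short prefix computation. The sketch below uses a recursive-doubling (Kogge-Stone style) arrangement because it is the shortest to write; it is not the forward/inverse binary tree of the figure, although it also needs about log2 N levels. The names `dot` and `prefix_carries` are illustrative, and P is taken in its XOR form.

```python
def dot(hi, lo):
    """(g, p) . (g', p') = (g + p g', p p'); hi is the more significant group."""
    g, p = hi
    g2, p2 = lo
    return (g | (p & g2), p & p2)

def prefix_carries(a: int, b: int, n: int, ci: int = 0):
    """Return [Co,0 .. Co,n-1] for the n-bit addition a + b + ci."""
    gp = [(((a >> k) & (b >> k)) & 1, ((a >> k) ^ (b >> k)) & 1) for k in range(n)]
    gp[0] = (gp[0][0] | (gp[0][1] & ci), gp[0][1])   # fold the carry-in into bit 0
    dist = 1
    while dist < n:                                   # one pass per logic level
        gp = [dot(gp[k], gp[k - dist]) if k >= dist else gp[k] for k in range(n)]
        dist *= 2
    return [g for g, _ in gp]                         # group generate = carry out

print(prefix_carries(0b01111111, 0b00000001, 8))
```

Because the operator is associative, the same pairs can be combined in any tree shape, which is what allows the logarithmic depth.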
[Figure: 4-bit comparator with inputs A<3:0> and B<3:0> and output B=A.]
[Figure: 4-bit clocked circuit with input Clk and outputs Q<0> to Q<3>.]
[Figure: four 1-bit registers (Reg) with Clk, Clear, D and Q pins, each D input driven through a two-way (0/1) selector, producing Q<0> to Q<3>.]
Multipliers may be classified by the format in which data words are accessed:
Serial
Serial/parallel
Parallel

Example (12 x 5 = 60):

       1100
     x 0101
     ------
       1100
      0000
     1100
    0000
    -------
    0111100
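The worked example corresponds to the familiar shift-and-add procedure: each multiplier bit selects either the multiplicand or zero, shifted to its bit weight. A minimal sketch, assuming unsigned operands (`shift_add_multiply` is an illustrative name):

```python
def shift_add_multiply(x: int, y: int, n: int) -> int:
    """Multiply x by the n-bit unsigned multiplier y via shifted partial products."""
    product = 0
    for j in range(n):
        if (y >> j) & 1:          # multiplier bit Yj selects the multiplicand
            product += x << j     # partial product aligned at weight 2^j
    return product

print(format(shift_add_multiply(0b1100, 0b0101, 4), "07b"))  # prints 0111100
```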
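Multiplicand and multiplier:
$$X = \sum_{i=0}^{m-1} X_i 2^i \qquad Y = \sum_{j=0}^{n-1} Y_j 2^j$$
Product:
$$P = X \times Y = \left( \sum_{i=0}^{m-1} X_i 2^i \right) \left( \sum_{j=0}^{n-1} Y_j 2^j \right) = \sum_{k=0}^{m+n-1} P_k 2^k$$
[Figure: 4x4 partial-product array formed from X3..X0 and Y3..Y0 (terms such as X3Y2, X3Y3, X2Y3), summed into product bits P7..P0.]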
[Figure: multiplication cell with inputs X, B and Ci and output Co.]
and Yi.
[Figure: 4x4 array multiplier built from half adders (HA) and full adders (FA), with inputs X0..X3 and Y0..Y3 and product outputs P0..P7.]
tmult = [(M-1)+(N-2)]tcarry + (N-1)tsum + tand
Two's complement operands:
$$A = -a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i$$
$$B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$$
Product:
$$P = AB = a_{m-1} b_{n-1} 2^{m+n-2} + \sum_{i=0}^{m-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - \sum_{i=0}^{m-2} a_i b_{n-1} 2^{n-1+i} - \sum_{i=0}^{n-2} a_{m-1} b_i 2^{m-1+i}$$
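The two's complement product expansion can be checked numerically against ordinary signed multiplication. The sketch below assumes little-endian bit lists with the sign bit last; `twos_value` and `product_expansion` are hypothetical helper names, and the four terms follow the expansion term by term (one positive sign-bit product, the positive magnitude products, and two negative cross terms).

```python
def twos_value(bits):
    """Value of little-endian two's complement bits (sign bit last)."""
    m = len(bits)
    return -bits[m - 1] * 2 ** (m - 1) + sum(bits[i] << i for i in range(m - 1))

def product_expansion(a, b):
    """Evaluate the four-term expansion of P = A * B."""
    m, n = len(a), len(b)
    p = a[m - 1] * b[n - 1] * 2 ** (m + n - 2)
    p += sum(a[i] * b[j] << (i + j) for i in range(m - 1) for j in range(n - 1))
    p -= sum(a[i] * b[n - 1] << (n - 1 + i) for i in range(m - 1))
    p -= sum(a[m - 1] * b[i] << (m - 1 + i) for i in range(n - 1))
    return p

a = [1, 0, 1, 1]   # 1101 in two's complement = -3
b = [0, 1, 1]      # 110 in two's complement = -2
print(twos_value(a), twos_value(b), product_expansion(a, b))
```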
This is in contrast with the adder where minimizing tcarry was key.
[Figure: 8x8 array multiplier: a grid of AND/ADD cells forms the partial products (ai bj), i, j = 0..7, from inputs a0..a7 and b0..b7 and sums them into product bits P0 to P15.]
4x4 version
[Figure: 4x4 carry-save multiplier: rows of half adders (HA) and full adders (FA) followed by a vector-merging adder.]
Here the carry bits are not immediately added but rather saved for the
next adder stage.
Advantage:
Critical path is uniquely defined:
tmult = (N-1)tcarry + tand + tmerge
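The carry-save idea can be illustrated with integers standing in for bit vectors: each row compresses the running sum vector, the saved carry vector and the new partial product into a fresh sum/carry pair without rippling, and a single final addition plays the role of the vector-merging adder. This is a behavioural sketch under those assumptions, not a gate-level model of the figure; `carry_save_multiply` is an illustrative name.

```python
def carry_save_multiply(x: int, y: int, n: int) -> int:
    """Unsigned multiply of x by n-bit y: carry-save rows, then one merging add."""
    sum_vec, carry_vec = 0, 0
    for j in range(n):
        pp = (x << j) if (y >> j) & 1 else 0     # AND row for multiplier bit j
        # One full adder per column; carries are saved, not propagated sideways.
        new_sum = sum_vec ^ carry_vec ^ pp
        new_carry = ((sum_vec & carry_vec) | (sum_vec & pp) | (carry_vec & pp)) << 1
        sum_vec, carry_vec = new_sum, new_carry
    return sum_vec + carry_vec                    # vector-merging adder

print(carry_save_multiply(0b1100, 0b0101, 4))
```

Only the last addition involves carry propagation across the word, which is why the critical path is well defined.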
If area is a concern.
[Figure: serial multiplier circuit with inputs X and Y, gates G1 and G2, reset, Cin and Clk signals, 1-bit registers (Reg) and a serial register holding P7..P0.]
Radix-4 scheme:
$$Y = \sum_{j=0}^{(N-1)/2} Y_j 4^j \quad \text{with } Y_j \in \{-2, -1, 0, 1, 2\}$$
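One standard way to obtain radix-4 digits in {-2, -1, 0, 1, 2} is modified Booth recoding, which examines overlapping groups of three multiplier bits. The slides do not show the recoding rule, so the sketch below is an assumption: it takes an n-bit two's complement multiplier with n even, and the function names are hypothetical.

```python
def radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's complement value (n even) into radix-4 digits."""
    bits = [(y >> k) & 1 for k in range(n)]   # >> sign-extends negative Python ints
    digits = []
    prev = 0                                   # implicit bit below the lsb
    for j in range(0, n, 2):
        digits.append(-2 * bits[j + 1] + bits[j] + prev)
        prev = bits[j + 1]
    return digits

def radix4_value(digits) -> int:
    """Evaluate sum_j Yj * 4^j."""
    return sum(d * 4 ** j for j, d in enumerate(digits))

print(radix4_digits(0b0101, 4), radix4_value(radix4_digits(0b0101, 4)))
```

With this recoding an n-bit multiplier needs n/2 partial products, each of which is 0, plus or minus X, or plus or minus 2X.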
[Figure: slice of a 6-bit carry-save multiplier: a column of full adders (FA) with multiplier bits Y0..Y5, carry inputs Ci-1, carry outputs Ci and a Sum output.]
[Figure: 4-bit shifter: one multiplexer (Mux) per bit produces outputs H0..H3 from inputs A0..A3 under a Right/Left control, with end inputs IR and IL.]
[Figure: shifter in which select inputs s<1> and s<0> choose among l<3:0>, l<4:1>, l<5:2> and l<6:3> to form the result; other labels: shift (1, 2, 4, 8) and r<0> to r<3>.]