Chap8 1

Principles of VLSI Design
Subsystem Design
CMPE 413/CMSC 711
UMBC (December 11, 2000 3:44 pm)

Digital Device Components

A simple processor illustrates many of the basic components used in any digital system:
Datapath
Memory
Control
Input-Output

Datapath: The core. All other components are support units that either store the results of the datapath or determine what happens in the next cycle.

Memory:
A broad range of classes exists, determined by the way data is accessed:
Read-Only vs. Read-Write
Sequential vs. Random access
Single-ported vs. Multi-ported access
Or by their data retention characteristics:
Dynamic vs. Static
Stay tuned for a more extensive treatment of memories.

Control:
An FSM (sequential circuit) implemented using random logic, PLAs or memories.

Interconnect and Input-Output:
Parasitic resistance, capacitance and inductance affect the performance of wires both on and off the chip.
Growing die size increases the length of the on-chip interconnect, increasing the value of the parasitics.

Datapath elements include adders, multipliers, shifters, BFUs, etc.
The speed of these elements often dominates the overall system performance, so optimization techniques are important.
However, as we will see, the task is non-trivial since there are multiple equivalent logic and circuit topologies to choose from, each with advantages and disadvantages in terms of speed, power and area.
Also, optimizations focused on a single design level, e.g., sizing transistors, lead to inferior designs.

Bit-sliced organization is common for datapaths.
[Figure: bit-sliced datapath with Data-In, Registers, Adder, Shifter, Multiplexer, Data-Out and Control; the slices are labeled Bit 0 to Bit 4.]

Datapath Operators: Addition/Subtraction

Let's start with addition, since it is a very common datapath element and often a speed-limiting element.
Optimizations can be applied at the logic or circuit level.
Logic-level optimizations try to rearrange the Boolean equations to produce a faster or smaller circuit, e.g. carry look-ahead adder.
Circuit-level optimizations manipulate transistor sizes and circuit topology to optimize speed.

Let's start with some basic definitions before considering optimizations:

A  B  Ci | G(A.B)  P(A+B)  P(A XOR B) | Sum  Co | Carry status
0  0  0  |   0       0         0      |  0   0  | delete
0  0  1  |   0       0         0      |  1   0  | delete
0  1  0  |   0       1         1      |  1   0  | propagate
0  1  1  |   0       1         1      |  0   1  | propagate
1  0  0  |   0       1         1      |  1   0  | propagate
1  0  1  |   0       1         1      |  0   1  | propagate
1  1  0  |   1       1         0      |  0   1  | generate
1  1  1  |   1       1         0      |  1   1  | generate
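
The definitions above can be checked by enumerating all eight input combinations. The following is a minimal sketch, not part of the original slides; the function and key names are illustrative assumptions.

```python
from itertools import product

def full_adder_signals(a, b, ci):
    """Return the per-bit signals for inputs a, b, ci (each 0 or 1)."""
    g = a & b                # generate: Co is 1 regardless of Ci
    p_or = a | b             # propagate, OR form
    p_xor = a ^ b            # propagate, XOR form (reusable for the sum)
    d = (1 - a) & (1 - b)    # delete: Co is 0 regardless of Ci
    s = p_xor ^ ci
    co = g | (p_or & ci)
    status = "generate" if g else ("delete" if d else "propagate")
    return {"G": g, "P_or": p_or, "P_xor": p_xor, "D": d,
            "Sum": s, "Co": co, "status": status}

def truth_table():
    """All rows in the order A, B, Ci = 000 .. 111."""
    return [((a, b, ci), full_adder_signals(a, b, ci))
            for a, b, ci in product((0, 1), repeat=3)]

if __name__ == "__main__":
    for (a, b, ci), sig in truth_table():
        print(a, b, ci, sig["G"], sig["P_or"], sig["P_xor"],
              sig["Sum"], sig["Co"], sig["status"])
```

Each row can be compared against the arithmetic sum a + b + ci: Sum is its low bit and Co its high bit.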

The Boolean expressions for S and Co are:
$$\mathrm{Sum} = A\,\bar{B}\,\bar{C_i} + \bar{A}\,B\,\bar{C_i} + \bar{A}\,\bar{B}\,C_i + A\,B\,C_i = A \oplus B \oplus C_i$$
$$\mathrm{Carry} = A\,B + A\,C_i + B\,C_i$$

G(A.B): (generate)
Occurs when a Co is internally generated within the adder (occurs independent of Ci).
P(A+B): (propagate)
Indicates that Ci is propagated (passed) to Co.
P(A XOR B): (propagate)
Used in some adders for the P term since it can be reused to generate the sum term.
D(A'.B'): (delete), where ' denotes the complement
Ensures that a carry bit will be deleted at Co.

But S and Co can be written in terms of G and P:
Co(G, P) = G + P.Ci (either form of P, A+B or A XOR B, gives the same Co)
S(G, P) = P XOR Ci (with P = A XOR B)
Note that G and P are independent of Ci.
(Also, Co and S can be expressed in terms of delete (D)).

Ripple-carry adder:
[Figure: 4-bit ripple-carry adder. Four full adders (FA) take A0..A3 and B0..B3 and produce S0..S3; Ci,0 enters the first stage and each carry-out feeds the next carry-in (Co,0 = Ci,1), ending at Co,3.]
The critical path (worst case delay over all possible inputs) is a ripple from lsb to msb.
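
A behavioral sketch of the ripple-carry structure, written with the G and P signals, may help make the carry chain concrete. It is an illustration only; the helper names `to_bits` and `from_bits` are assumptions, and bit lists are stored lsb first.

```python
def to_bits(value, n):
    """n-bit list of value, index 0 = lsb."""
    return [(value >> k) & 1 for k in range(n)]

def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))

def ripple_carry_add(a_bits, b_bits, ci=0):
    """Add two equal-length bit lists. Returns (sum_bits, carry_out)."""
    sum_bits = []
    carry = ci
    for a, b in zip(a_bits, b_bits):
        g = a & b                  # generate
        p = a ^ b                  # propagate (XOR form, reused for the sum)
        sum_bits.append(p ^ carry)
        carry = g | (p & carry)    # Co = G + P.Ci feeds the next stage
    return sum_bits, carry

if __name__ == "__main__":
    s, co = ripple_carry_add(to_bits(9, 4), to_bits(5, 4))
    print(from_bits(s), co)
```

The loop body is one full adder; the loop-carried variable `carry` is the ripple from lsb to msb.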

The delay in this case is proportional to the number of bits, N, in the input words:
tadder = (N - 1)tcarry + tsum
where tcarry and tsum are the propagation delays from Ci to Co and S, respectively.

One possible worst case bit pattern (written msb first; the carry generated at the lsb ripples to the msb) is:
A: 00000001; B: 01111111
Convince yourself that this is true.

Note that when optimizing this structure, it is far more important to optimize tcarry than tsum.
The inverting property of a full adder can be used to achieve this goal:
[Figure: a full adder with inputs A, B, Ci and outputs S, Co, shown next to the same full adder with all inputs and outputs inverted.]
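
The delay expression can be explored with a simple timing sketch. This is a simplified model under stated assumptions, not a circuit simulation: all inputs including Ci,0 are valid at t = 0, a propagating stage adds tcarry to the arrival time of its carry-in, and a generate or delete stage settles its carry-out at tcarry regardless of the carry-in. The function names are illustrative.

```python
def to_bits(value, n):
    """n-bit list of value, index 0 = lsb."""
    return [(value >> k) & 1 for k in range(n)]

def settle_times(a_bits, b_bits, t_carry=1.0, t_sum=1.0):
    """Return (time at which the last sum bit settles, carry-out settle times)."""
    carry_ready = 0.0          # Ci,0 assumed valid at t = 0
    carry_times = []
    sum_times = []
    for a, b in zip(a_bits, b_bits):
        sum_times.append(carry_ready + t_sum)   # S waits for the carry-in
        if a ^ b:
            carry_ready = carry_ready + t_carry  # propagate: wait for carry-in
        else:
            carry_ready = t_carry                # generate/delete: independent of Ci
        carry_times.append(carry_ready)
    return max(sum_times), carry_times

if __name__ == "__main__":
    a = to_bits(0b00000001, 8)
    b = to_bits(0b01111111, 8)
    print(settle_times(a, b, t_carry=2.0, t_sum=3.0)[0])
```

Under this model the pattern A = 00000001, B = 01111111 reaches the (N - 1)tcarry + tsum bound: bit 0 generates a carry, bits 1 to 6 propagate it, and the sum of bit 7 has to wait for it.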

Thus,
$$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C_i})$$
$$\bar{C_o}(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C_i})$$

One possible (un-optimized) implementation:
[Figure: gate-level full adder with inputs A, B, Ci; internal signals G(A.B), Ci.P(A + B) and P XOR Ci; output Co.]
Transistor level diagram uses 32 transistors (see Weste and Eshraghian).
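
The inverting property is easy to confirm exhaustively. A minimal sketch (function names are illustrative assumptions):

```python
from itertools import product

def full_adder(a, b, ci):
    """Return (S, Co) for one-bit inputs."""
    s = a ^ b ^ ci
    co = (a & b) | (a & ci) | (b & ci)
    return s, co

def inverting_property_holds():
    """Check that inverting all inputs inverts both S and Co."""
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        s_inv, co_inv = full_adder(1 - a, 1 - b, 1 - ci)
        if s_inv != 1 - s or co_inv != 1 - co:
            return False
    return True

if __name__ == "__main__":
    print(inverting_property_holds())
```

This is what allows alternate stages of a ripple chain to work on inverted carries without an inverter in the carry path.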

Co is reused in the S term as:
$$\mathrm{Sum} = A\,B\,C_i + (A + B + C_i)\,\bar{C_o}$$
[Figure: 28-transistor CMOS full adder with inputs A, B, Ci and output Co. Annotation: symmetrical design eliminates diffusion caps and reduces series R.]
Are the n and p trees duals of each other?
Even with some design tricks, e.g., the critical-path transistors (Ci) placed closest to the output and a symmetrical design, this implementation is slow.

The load capacitance on Co in the previous version consists of 2 diffusion capacitances (inverter) and 6 (next bit) gate capacitances.
Eliminates the inverter delay per bit for carry!
This version increases Co's load to 4 diffusion caps, 2 internal (sum) gate caps plus the 6 (next bit) gate caps.
[Figure: 4-bit ripple adder with inputs A<0>..A<3>, B<0>..B<3>, Cin and a Subtract control, producing S<0>..S<3> and C<3>; an inset for bit n shows A<n>, B<n>, S<n>, C<n> and C<n+1> with Overflow and the sign of the result.]

Serial addition can be used if area is a concern:
[Figure: bit-serial adder. Two n bit shift registers (addend and augend) feed a full adder; a 1-bit register with Set, Clr and Clk sits between Cout and Cin; the sum bit forms the result.]
In this case, you want equal Sum and Carry delays in order to minimize clock cycle time.

Bit-level pipelining can be used to break the dependency between addition time and the number of bits by inserting FAs between each register bit.

Transmission-gate Adder:
[Figure: transmission-gate full adder with XOR and XNOR stages, carry input Ci and output Co.]
Total transistor count is 26.
Note: S and Co delay times are approximately equal, which is good for multipliers.
See Weste and Eshraghian for an 18 transistor implementation.

Dynamic Adder Design: np-CMOS adder
[Figure: two-bit np-CMOS adder with inputs A0, B0, A1, B1 and carry-in Ci0, internal carries Ci1 and Ci2, and sum outputs S0 and S1.]

Dynamic Adder Design: Manchester Carry-Chain adder.
A chain of pass-transistors is used to implement the carry chain.
[Figure: Manchester carry chain from Ci,0 to Co,4 with propagate inputs P0..P4, generate inputs G0..G4 and intermediate nodes Co,0..Co,3, annotated with relative transistor sizes between 1 and 3.5. Annotation on the largest devices: transistor sizes are largest here since the worst case is to discharge all nodes Co,k.]
Precharge: All intermediate nodes, e.g. Co,0, are charged to VDD.
Evaluate: Node Co,k is discharged, for example, if there is an incoming carry, Ci,0, and the previous propagate signals, P0 to Pk-1, are high.
Only 4 diffusion capacitances are present per node, but the distributed RC nature of the chain results in a delay that is quadratic in the number of bits.
Buffers and/or transistor sizing can be used to improve performance.

Consider the worst case delay of the carry chain:
[Figure: RC ladder model of the chain with series resistances R1..R6, node capacitances C1..C6 and output node Out.]
Elmore delay is given by:
$$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right)$$
The delay of the RC network is then:
$$t_p = 0.69\,\bigl( C_1 R_1 + C_2 (R_1 + R_2) + C_3 (R_1 + R_2 + R_3) + C_4 (R_1 + R_2 + R_3 + R_4) + C_5 (R_1 + R_2 + R_3 + R_4 + R_5) + C_6 (R_1 + R_2 + R_3 + R_4 + R_5 + R_6) \bigr)$$
Since R1 appears 6 times in the expression, it makes sense to minimize its contribution.
Note that reducing R by a factor, e.g. k, at each stage increases the capacitance by a factor k and increases area.
A k-factor of 1.5 reduces delay by 40% and increases area by 3.5X.
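
The Elmore expression is straightforward to evaluate numerically. The sketch below is illustrative only; the function name and the unit R and C values are assumptions, chosen to show the quadratic growth with chain length.

```python
def elmore_delay(resistances, capacitances):
    """0.69 * sum_i C_i * (R_1 + ... + R_i) for an RC ladder driven from the R_1 end."""
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r            # resistance between the source and node i
        total += c * upstream_r
    return 0.69 * total

if __name__ == "__main__":
    for n in (3, 6, 12):
        # Identical unit stages: the sum is n(n+1)/2, i.e. quadratic in n.
        print(n, elmore_delay([1.0] * n, [1.0] * n))
```

With identical stages the double sum reduces to RC.N(N+1)/2, which is the quadratic dependence on the number of bits noted for the Manchester chain.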

Carry-Bypass adder:
[Figure: four full adders (FA) with inputs P0, G0 .. P3, G3 rippling Ci,0 through Co,0, Co,1, Co,2 to Co,3; a Mux controlled by BP = P0P1P2P3 selects between the rippled Co,3 and Ci,0.]
Assume Ak and Bk (for k = 0...3) are set such that all Pk (propagate) are high.
In this case, an incoming carry Ci,0 = 1 propagates along the complete chain and Co,3 = 1.
In other words:
if (P0P1P2P3 == 1) then Co,3 = Ci,0 else either DELETE or GENERATE occurred.
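
A behavioral sketch of one bypass block is given below. It assumes the XOR form of P, since with the OR form an all-ones BP could also arise from a stage that generates. Names are illustrative, and the model captures only the logic, not the timing advantage.

```python
def to_bits(value, n):
    """n-bit list of value, index 0 = lsb."""
    return [(value >> k) & 1 for k in range(n)]

def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))

def bypass_block(a_bits, b_bits, ci):
    """One carry-bypass block. Returns (sum_bits, carry_out, bypassed)."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]   # XOR-form propagate
    g = [a & b for a, b in zip(a_bits, b_bits)]
    carry = ci
    sums = []
    for pk, gk in zip(p, g):
        sums.append(pk ^ carry)
        carry = gk | (pk & carry)                 # slow ripple path
    bp = all(p)                                   # BP = P0.P1.P2.P3
    co = ci if bp else carry                      # mux: bypass or rippled carry
    return sums, co, bp

if __name__ == "__main__":
    print(bypass_block(to_bits(0b0101, 4), to_bits(0b1010, 4), 1))
```

In hardware the benefit is that the mux output is available as soon as BP and Ci,0 are known, without waiting for the ripple.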

Linear Carry-Select adder:
One way around waiting for the incoming carry is to compute the result of both possible values in advance and let the incoming carry select the correct result.
[Figure: carry-select block. Setup produces P,G; 0-carry propagation and 1-carry propagation run in parallel; a Mux controlled by Co,k-1 selects the carry vector, which drives Sum Generation and Co,k+3.]
This block adds bits k to k+3.
The select operation is much faster than the time to compute either of the two possible carry vectors.

A Square-Root Carry-Select Adder (delay = O(N^(1/2))) is constructed by increasing the number of input bits in each block from lsb to msb.
For Square-Root Carry-Select, higher order blocks take more operand bits than lower order blocks.
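
The select idea can be sketched behaviorally as follows. The block sizes are passed in as a list, so the same function covers equal blocks (linear) and growing blocks (square-root style). The names and the example block sizes are assumptions for illustration.

```python
def to_bits(value, n):
    """n-bit list of value, index 0 = lsb."""
    return [(value >> k) & 1 for k in range(n)]

def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))

def ripple(a_bits, b_bits, ci):
    """Plain ripple addition of one block. Returns (sum_bits, carry_out)."""
    carry, sums = ci, []
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry

def carry_select_add(a_bits, b_bits, block_sizes, ci=0):
    """Each block precomputes results for carry-in 0 and 1; the real carry selects."""
    sums, carry, start = [], ci, 0
    for size in block_sizes:
        a_blk = a_bits[start:start + size]
        b_blk = b_bits[start:start + size]
        s0, c0 = ripple(a_blk, b_blk, 0)    # 0-carry propagation
        s1, c1 = ripple(a_blk, b_blk, 1)    # 1-carry propagation
        sums += s1 if carry else s0         # mux on the incoming carry
        carry = c1 if carry else c0
        start += size
    return sums, carry

if __name__ == "__main__":
    s, co = carry_select_add(to_bits(200, 8), to_bits(100, 8), [2, 2, 4])
    print(from_bits(s), co)
```

In hardware the two ripple computations of every block run in parallel, so only the mux delay accumulates from block to block.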

Carry look-ahead adder (avoiding the ripple altogether):
Compute the carries to each stage in parallel.
The carry out of the kth stage is computed as:
Co,k = Gk + Pk . Co,k-1 where Gk = Ak . Bk and Pk = Ak + Bk
The dependency between Co,k and Co,k-1 can be eliminated by expanding Co,k-1:
Co,k = Gk + Pk . (Gk-1 + Pk-1 . Co,k-2)
For example, for 4 stages of look-ahead:
C0 = G0 + P0Ci
C1 = G1 + P1G0 + P1P0Ci
C2 = G2 + P2G1 + P2P1G0 + P2P1P0Ci
C3 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0Ci
Note that the low-order terms, e.g., P0 and G0, appear in the expression for every bit, making the fanout load large.
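
The four expanded equations can be written out directly and compared against the recursive form. A minimal sketch with illustrative names:

```python
def lookahead_carries(a_bits, b_bits, ci):
    """C0..C3 of a 4-bit block from the expanded look-ahead equations."""
    g = [a & b for a, b in zip(a_bits, b_bits)]   # Gk = Ak.Bk
    p = [a | b for a, b in zip(a_bits, b_bits)]   # Pk = Ak + Bk
    c0 = g[0] | (p[0] & ci)
    c1 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & ci)
    c2 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & ci))
    c3 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & ci))
    return [c0, c1, c2, c3]

def ripple_carries(a_bits, b_bits, ci):
    """Reference: Co,k = Gk + Pk.Co,k-1 evaluated stage by stage."""
    carries, carry = [], ci
    for a, b in zip(a_bits, b_bits):
        carry = (a & b) | ((a | b) & carry)
        carries.append(carry)
    return carries

if __name__ == "__main__":
    print(lookahead_carries([1, 1, 0, 1], [1, 0, 1, 0], 0))
```

Every carry depends only on the G, P and Ci inputs, so all four can be evaluated in parallel; the cost is the growing number and width of product terms.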

Carry look-ahead adder:
One possible implementation without using simple logic gates.
[Figure: 4-bit look-ahead carry circuit with inputs Ci,0, P0..P3 and G0..G3 and output Co,3.]
Size and fan-in of the gates limit the look-ahead span to about four bits.

Carry look-ahead adder:
Factoring term C3 yields:
C3 = G3 + P3(G2 + P2(G1 + P1(G0 + P0Ci,0)))
Domino CMOS implementation:
[Figure: domino CMOS gate clocked by Clk with inputs Ci,0, P<0>..P<3> and G<0>..G<3> and output C<3>.]
Worst case is pull-down through 6 series n-channel transistors.
Other high speed versions are given in Weste and Eshraghian.

The Logarithmic look-ahead adder: O(log2N) delay:
[Figure: 8-bit tree of dot operators. Inputs (G0, P0) .. (G7, P7) pass through a forward binary tree and an inverse binary tree to produce Co,0 .. Co,7; one intermediate group term is labeled (C4-7, P4-7).]
The dot operator is defined as:
$$(g, p) \cdot (g', p') = (g + p\,g',\; p\,p')$$
The number of logic levels is proportional to log2N, fan-in is limited and the layout is compact (jigsaw puzzle) (see Rabaey for details).
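
The dot operator lends itself to a short sketch. Note the assumption: the code below uses a simple recursive-doubling arrangement (every position combines with the group a fixed distance below it), which is not the forward/inverse binary tree of the figure but uses the same operator and also needs about log2(N) levels. Names are illustrative.

```python
def dot(high, low):
    """(g, p) . (g', p') = (g + p.g', p.p'); `high` is the more significant group."""
    g, p = high
    g2, p2 = low
    return (g | (p & g2), p & p2)

def prefix_carries(a_bits, b_bits, ci=0):
    """Carry-out of every bit position, computed in log2(N) levels of dot operators."""
    n = len(a_bits)
    gp = [(a & b, a | b) for a, b in zip(a_bits, b_bits)]
    # Fold the carry-in into bit 0 so that group generates equal carry-outs.
    gp[0] = (gp[0][0] | (gp[0][1] & ci), gp[0][1])
    dist = 1
    while dist < n:
        gp = [dot(gp[k], gp[k - dist]) if k >= dist else gp[k]
              for k in range(n)]
        dist *= 2
    return [g for g, _ in gp]

if __name__ == "__main__":
    a = [1, 0, 1, 1, 0, 1, 0, 0]   # lsb first
    b = [1, 1, 0, 1, 1, 0, 1, 0]
    print(prefix_carries(a, b))
```

Because the dot operator is associative, the groups can be combined in any tree shape, which is what makes the logarithmic depth possible.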

Datapath Operators: Comparison

Magnitude Comparators:
May be built from an adder, complementer (XOR gates) and a zero detect unit.
[Figure: 4-bit comparator with inputs A<0>..A<3> and B<0>..B<3> and outputs B >= A and B = A, the latter from a zero detect NOR gate.]
Think about the modifications necessary to make it a signed comparator (Hint: A couple of XOR gates).
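
One way to read the comparator structure is as a subtraction B - A: complement A, add it to B with a carry-in of 1, and inspect the carry-out and the zero-detect output. The sketch below follows that reading for unsigned operands; it is an assumption about the figure, and the function name is illustrative.

```python
def compare_unsigned(a, b, n=4):
    """Return (B >= A, B == A) for n-bit unsigned a and b."""
    mask = (1 << n) - 1
    a_inv = a ^ mask                 # complementer: XOR every bit of A with 1
    total = b + a_inv + 1            # adder with carry-in 1 computes B - A
    diff = total & mask
    carry_out = (total >> n) & 1
    b_ge_a = carry_out == 1          # carry-out set means no borrow occurred
    b_eq_a = diff == 0               # zero detect: NOR of all difference bits
    return b_ge_a, b_eq_a

if __name__ == "__main__":
    print(compare_unsigned(5, 9), compare_unsigned(9, 5), compare_unsigned(7, 7))
```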

Datapath Operators: Binary Counters

Asynchronous: Based on the Toggle register.
[Figure: "Ripple Carry" binary counter. Toggle registers driven from Clk produce Q<0>..Q<3>.]
Not a good choice for performance and testability (with no reset).

Synchronous counter.
[Figure: 4-bit synchronous counter. Each stage is a 1-bit register (D, Q, Clk, Clear) fed by a two-input (0/1) selector, producing Q<0>..Q<3>.]
Replace AND gate with an adder for up/down counting capability.
Weste and Eshraghian also show a version that can be initialized.
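
A behavioral sketch of a synchronous up-counter is shown below. It assumes the usual arrangement in which a bit toggles when all lower-order bits are 1 (the AND chain) and all registers update on the same clock edge; this is an assumption about the figure, and the names are illustrative.

```python
def next_count(q_bits, clear=False):
    """Next state of a synchronous binary up-counter (q_bits[0] = lsb)."""
    if clear:
        return [0] * len(q_bits)
    enable = 1                  # bit 0 toggles on every clock
    nxt = []
    for q in q_bits:
        nxt.append(q ^ enable)  # toggle when all lower bits are 1
        enable &= q             # AND chain towards the higher bits
    return nxt

if __name__ == "__main__":
    state = [0, 0, 0, 0]
    for _ in range(5):
        print(state)
        state = next_count(state)
```

All next-state bits are computed from the current state only, so every register can be clocked together, unlike the ripple counter.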

Datapath Operators: Multiplication

Multiplication can be broken down into two steps:
Computation of partial products.
Accumulation of the shifted partial products.

       1100
   X   0101
   --------
       1100
      0000
     1100
    0000
   --------
    0111100

Binary multiplication is equivalent to an AND operation.

Multipliers may be classified by the format in which data words are accessed:
Serial
Serial/parallel
Parallel
The parallel form computes the partial products in parallel.
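
The two steps can be sketched in a few lines, using the 1100 X 0101 example from the slide. Function names are illustrative assumptions.

```python
def partial_products(x, y, n):
    """Row j is (x AND y_j), shifted left by j positions."""
    return [(x if (y >> j) & 1 else 0) << j for j in range(n)]

def multiply(x, y, n):
    """Accumulate the shifted partial products."""
    return sum(partial_products(x, y, n))

if __name__ == "__main__":
    rows = partial_products(0b1100, 0b0101, 4)
    print([format(r, "b") for r in rows])
    print(format(multiply(0b1100, 0b0101, 4), "07b"))
```

Each bit of a row is the AND of one multiplicand bit with one multiplier bit, which is why binary multiplication reduces to AND gates plus adders.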

Parallel Unsigned Multiplication:
Multiplying 2 unsigned binary integers results in:
$$X = \sum_{i=0}^{m-1} X_i 2^i \qquad Y = \sum_{j=0}^{n-1} Y_j 2^j$$
$$P = XY = \sum_{i=0}^{m-1} X_i 2^i \sum_{j=0}^{n-1} Y_j 2^j = \sum_{k=0}^{m+n-1} P_k 2^k$$

                        X3    X2    X1    X0      Multiplicand
                        Y3    Y2    Y1    Y0      Multiplier
                        X3Y0  X2Y0  X1Y0  X0Y0
                  X3Y1  X2Y1  X1Y1  X0Y1
            X3Y2  X2Y2  X1Y2  X0Y2
      X3Y3  X2Y3  X1Y3  X0Y3
P7    P6    P5    P4    P3    P2    P1    P0

There are m*n summands produced by a set of m*n AND gates in parallel.

Parallel Multiplication:
Multiplication is carried out using a bitwise AND of the operands, Xi and Yj.
Most of the work (and delay) is in summing the partial products.
[Figure: multiplier cell with inputs X, B and Ci and output Co; the AND performs the multiplication and the adder sums the partial products.]
An NxN multiplier requires:
N(N-2) full adders
N half adders
N^2 AND gates

Array multiplier:
[Figure: 4x4 array multiplier. Rows of half adders (HA) and full adders (FA) combine the products of X0..X3 with Y0..Y3 to form P0..P7.]
tmult = [(M-1) + (N-2)]tcarry + (N-1)tsum + tand
There are a large number of nearly identical critical paths in this circuit.

From the delay expression and the fact that all critical paths have the same length, minimizing tmult requires minimizing both tcarry and tsum.
This is in contrast with the adder where minimizing tcarry was key.
The transmission gate adder is a good choice here.

Parallel Signed Multiplication:
Baugh-Wooley algorithm: Only 3 additional adders required over the unsigned version.
Let A and B represent signed integers.
$$A = -a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i \qquad B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$$
$$P = AB = \left( -a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i \right) \left( -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i \right)$$
$$= a_{m-1} b_{n-1} 2^{m+n-2} + \sum_{i=0}^{m-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - \sum_{i=0}^{m-2} a_i b_{n-1} 2^{n-1+i} - \sum_{i=0}^{n-2} a_{m-1} b_i 2^{m-1+i}$$
Expanding shows that the last two rows of summands are all negative so the algorithm simply adds in their negations.
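
The expansion of the signed product can be verified numerically. The sketch below checks only the algebraic identity (one positive sign-bit term, the positive cross terms, and the two negative rows), not the hardware arrangement of the Baugh-Wooley array. Names are illustrative assumptions.

```python
def to_bits(value, n):
    """n-bit two's-complement pattern of value, index 0 = lsb."""
    return [(value >> k) & 1 for k in range(n)]

def signed_value(bits):
    """Two's-complement value: the msb carries weight -2^(n-1)."""
    n = len(bits)
    return -bits[-1] * 2 ** (n - 1) + sum(b << i for i, b in enumerate(bits[:-1]))

def expanded_product(a, b):
    """Evaluate the four groups of summands of the signed product."""
    m, n = len(a), len(b)
    p = a[m - 1] * b[n - 1] * 2 ** (m + n - 2)
    p += sum(a[i] * b[j] * 2 ** (i + j)
             for i in range(m - 1) for j in range(n - 1))
    p -= sum(a[i] * b[n - 1] * 2 ** (n - 1 + i) for i in range(m - 1))
    p -= sum(a[m - 1] * b[j] * 2 ** (m - 1 + j) for j in range(n - 1))
    return p

if __name__ == "__main__":
    a, b = to_bits(-3, 4), to_bits(6, 4)
    print(signed_value(a), signed_value(b), expanded_product(a, b))
```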

Parallel Signed Multiplication:
[Figure: 8x8 array multiplier for signed operands, built from AND and ADD cells. The products (ai bj) for i, j = 0..7 are summed row by row to produce P0 to P15.]

Carry-Save Multiplier:
Carry bits can be passed diagonally downwards instead of to the left.
Here the carry bits are not immediately added but rather saved for the next adder stage.
[Figure: 4x4 version built from HA and FA cells, ending in a vector-merging adder.]
Cost: A little extra area (the vector-merging adder).
Advantage: Critical path is uniquely defined:
tmult = (N-1)tcarry + tand + tmerge
(Assuming tadd = tcarry).
Minimizing tmerge is useful, e.g. use carry-select or lookahead.

Serial Unsigned Multiplication:
If area is a concern.
[Figure: serial multiplier with inputs X and Y, gates G1 and G2, a 1-bit register (Cin, Clk, reset) and a serial register holding P7..P0.]
Xi and Yi are delivered serially to the inputs of G1 at different rates.
Computes the summands row-wise from right to left.
Disadv: Quadratic delay: tmult = M x N x tcarry
Serial/Parallel Unsigned Multiplier shown in Weste and Eshraghian.

Booth Encoding:
A special encoding of the multiplier word reduces the number of required addition stages and speeds up multiplication substantially.
Radix-4 scheme:
$$Y = \sum_{j=0}^{(N-1)/2} Y_j 4^j \quad \text{with } Y_j \in \{-2, -1, 0, 1, 2\}$$
The number of partial products (and additions) is halved, resulting in area and speed advantage.
AND operation replaced with inversion and shift logic.
The disadvantage is a somewhat more involved multiplier cell.
Virtually every multiplier in use employs the Booth scheme.
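
A sketch of radix-4 recoding is given below. It assumes the common recoding rule in which each digit is formed from an overlapping group of three multiplier bits, Yj = y(2j-1) + y(2j) - 2.y(2j+1) with y(-1) = 0, an even word length, and a two's-complement multiplier. The slides do not spell out the rule, so treat it as an assumption; names are illustrative.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement pattern y (n even) into n/2 digits in {-2..2}."""
    digits = []
    prev = 0                              # implicit bit y(-1) = 0
    for j in range(0, n, 2):
        b0 = (y >> j) & 1
        b1 = (y >> (j + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits

def booth_multiply(x, y, n):
    """Sum of n/2 partial products; each is 0, +-x or +-2x, shifted by 2j bits."""
    return sum(d * x * 4 ** j for j, d in enumerate(booth_radix4_digits(y, n)))

if __name__ == "__main__":
    print(booth_radix4_digits(0b0111, 4))
    print(booth_multiply(6, 0b0111, 4))
```

Multiplying by a digit needs only a shift (for 2) and an inversion (for the sign), which matches the remark that the AND operation is replaced with inversion and shift logic.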

Wallace Multiplier:
Trees can be used to replace the linear partial-sum adders:
[Figure: slice of a 6-bit carry-save multiplier, in which inputs Y0..Y5 pass through a linear chain of FAs (# of ripple stages is N-2), next to the tree version in which the same inputs are combined by a tree of FAs with carries Ci-1 in and Ci out.]
Adv: O(log2N) mult time.
Disadv: Very irregular, difficult to lay out.
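
The tree idea can be sketched at word level with 3:2 compressors (rows of full adders without carry propagation). This is a behavioral illustration under assumptions: it groups words three at a time per level, which is one simple way to build the tree and not necessarily the exact wiring of the figure. Names are illustrative.

```python
def compress_3_to_2(x, y, z):
    """A row of full adders: three words in, a sum word and a carry word out."""
    partial_sum = x ^ y ^ z
    carries = ((x & y) | (x & z) | (y & z)) << 1
    return partial_sum, carries

def wallace_sum(words):
    """Reduce the words to two with levels of 3:2 compressors, then add once.

    Returns (total, number of compressor levels used).
    """
    words = list(words)
    levels = 0
    while len(words) > 2:
        nxt = []
        for k in range(0, len(words) - 2, 3):
            nxt.extend(compress_3_to_2(*words[k:k + 3]))
        nxt.extend(words[len(words) - len(words) % 3:])   # leftover words pass through
        words = nxt
        levels += 1
    return sum(words), levels      # final add plays the role of the merging adder

if __name__ == "__main__":
    x, y = 45, 51
    rows = [(x << j) if (y >> j) & 1 else 0 for j in range(6)]
    print(wallace_sum(rows))
```

For six partial products this arrangement needs three compressor levels, compared with the N-2 = 4 stages of the linear chain.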

Datapath Operators: Shifters

Right/Left 1-bit shifter:
[Figure: 4-bit shifter. One Mux per bit, controlled by Right/Left, takes inputs A0..A3 plus the end inputs IR and IL and produces H0..H3.]

Barrel shifter:
[Figure: 4-bit barrel shifter. Select lines s<0>..s<3> (shift, labeled 1, 2, 4, 8) choose one of the windows l<3:0>, l<4:1>, l<5:2> or l<6:3> of the 7-bit word l<6:0> as the result r<0>..r<3>.]
Arithmetic and logical shifts and rotates are possible by muxing l<6:0> to the appropriate values.
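
The windowing idea can be sketched as follows: the result is a 4-bit window of the 7-bit word l<6:0>, and the kind of shift is decided by what is placed in l<6:4>. The encodings below (a plain shift amount 0..3, and the particular fill values for rotate, logical and arithmetic right shifts) are assumptions for illustration, as are the function names.

```python
def barrel_window(l, shift):
    """Select the 4-bit window l<shift+3 : shift> of the 7-bit word l (shift in 0..3)."""
    return (l >> shift) & 0xF

def rotate_right(a, shift):
    l = ((a & 0x7) << 4) | (a & 0xF)      # l<6:4> = a<2:0>, l<3:0> = a<3:0>
    return barrel_window(l, shift)

def logical_shift_right(a, shift):
    return barrel_window(a & 0xF, shift)  # l<6:4> = 000

def arithmetic_shift_right(a, shift):
    sign = (a >> 3) & 1
    l = ((0x7 if sign else 0x0) << 4) | (a & 0xF)   # l<6:4> = sign bit repeated
    return barrel_window(l, shift)

if __name__ == "__main__":
    a = 0b1001
    print(format(rotate_right(a, 1), "04b"),
          format(logical_shift_right(a, 1), "04b"),
          format(arithmetic_shift_right(a, 1), "04b"))
```

The shifter array itself is identical in all three cases; only the multiplexing that forms l<6:0> changes.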
