
Principles of VLSI Design
Subsystem Design
CMPE 413/CMSC 711
UMBC
(December 11, 2000 3:44 pm)

Digital Device Components

A simple processor illustrates many of the basic components used in any digital system:

[Figure: block diagram of a simple processor with Datapath, Memory, Control and Input-Output blocks]

Datapath: The core. All other components are support units that either store the results of the datapath or determine what happens in the next cycle.

Digital Device Components

Memory:
A broad range of classes exists, determined by the way data is accessed:
  Read-Only vs. Read-Write
  Sequential vs. Random access
  Single-ported vs. Multi-ported access
or by their data retention characteristics:
  Dynamic vs. Static
Stay tuned for a more extensive treatment of memories.

Control:
An FSM (sequential circuit) implemented using random logic, PLAs or memories.

Interconnect and Input-Output:
Parasitic resistance, capacitance and inductance affect the performance of wires both on and off the chip.
Growing die size increases the length of the on-chip interconnect, increasing the value of the parasitics.

Digital Device Components

Datapath elements include adders, multipliers, shifters, BFUs, etc.
The speed of these elements often dominates the overall system performance, so optimization techniques are important.

However, as we will see, the task is non-trivial since there are multiple equivalent logic and circuit topologies to choose from, each with advantages and disadvantages in terms of speed, power and area.

Also, optimizations focused on one design level, e.g., sizing transistors, lead to inferior designs.

Bit-sliced organization is common for datapaths.

[Figure: bit-sliced datapath. Data-In passes through Registers, Adder, Shifter and Multiplexer to Data-Out under Control, with one slice per bit (Bit 0 to Bit 4)]

Datapath Operators: Addition/Subtraction

Let's start with addition, since it is a very common datapath element and often a speed-limiting element.

Optimizations can be applied at the logic or circuit level.
Logic-level optimizations try to rearrange the Boolean equations to produce a faster or smaller circuit, e.g. carry look-ahead adder.
Circuit-level optimizations manipulate transistor sizes and circuit topology to optimize speed.

Let's start with some basic definitions before considering optimizations:

A  B  Ci | G(A.B)  P(A+B)  P(A XOR B) | Sum  Co | Carry status
0  0  0  |   0       0         0      |  0   0  | delete
0  0  1  |   0       0         0      |  1   0  | delete
0  1  0  |   0       1         1      |  1   0  | propagate
0  1  1  |   0       1         1      |  0   1  | propagate
1  0  0  |   0       1         1      |  1   0  | propagate
1  0  1  |   0       1         1      |  0   1  | propagate
1  1  0  |   1       1         0      |  0   1  | generate
1  1  1  |   1       1         0      |  1   1  | generate
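The truth table above can be regenerated mechanically. The following is a minimal Python sketch, not part of the original slides; the function names `full_adder` and `carry_status` are hypothetical and chosen only for illustration.

```python
from itertools import product


def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """Return (sum, carry-out) of a 1-bit full adder."""
    s = a ^ b ^ ci
    co = (a & b) | (a & ci) | (b & ci)
    return s, co


def carry_status(a: int, b: int) -> str:
    """Classify a bit position by what it does to an incoming carry."""
    if a & b:
        return "generate"   # G = A.B
    if a ^ b:
        return "propagate"  # P = A XOR B
    return "delete"         # neither input is set


if __name__ == "__main__":
    # Print one row per input combination, in the same order as the table.
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        print(a, b, ci, a & b, a | b, a ^ b, s, co, carry_status(a, b))
```

The status depends only on A and B, which is why each status appears on two consecutive rows (Ci = 0 and Ci = 1).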

Datapath Operators: Addition/Subtraction

G(A.B): (generate)
Occurs when a Co is internally generated within the adder (occurs independent of Ci).

P(A+B): (propagate)
Indicates that Ci is propagated (passed) to Co.

P(A XOR B): (propagate)
Used in some adders for the P term since it can be reused to generate the sum term.

D($\overline{A}.\overline{B}$): (delete)
Ensures that a carry bit will be deleted at Co.

The Boolean expressions for S and Co are:

$\mathrm{Sum} = A\,\overline{B}\,\overline{C_i} + \overline{A}\,B\,\overline{C_i} + \overline{A}\,\overline{B}\,C_i + A\,B\,C_i = A \oplus B \oplus C_i$

$\mathrm{Carry} = A B + A C_i + B C_i$

Datapath Operators: Addition/Subtraction

But S and Co can be written in terms of G and P:

Co(G, P) = G + P.Ci (either form of P, A + B or A XOR B, works in this case)

S(G, P) = P XOR Ci (here P must be the A XOR B form)

Note that G and P are independent of Ci.

(Also, Co and S can be expressed in terms of delete (D)).

Ripple-carry adder:

[Figure: four full adders (FA) in a chain. Stage k takes Ak, Bk and Ci,k and produces Sk and Co,k, with Co,k = Ci,k+1; the chain runs from Ci,0 to Co,3]

The critical path (worst case delay over all possible inputs) is a ripple from lsb to msb.

Datapath Operators: Addition/Subtraction

The delay in this case is proportional to the number of bits, N, in the input words:
tadder = (N - 1)tcarry + tsum
where tcarry and tsum are the propagation delays from Ci to Co and S.

One possible worst case bit pattern (from lsb to msb) is:
A: 00000001; B: 01111111
Convince yourself that this is true.

Note that when optimizing this structure, it is far more important to optimize tcarry than tsum.

The inverting property of a full adder can be used to achieve this goal:

[Figure: a full adder (FA) with inputs A, B, Ci and outputs S, Co is equivalent to an FA with all inputs and both outputs inverted]
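One way to convince yourself of the worst case pattern is to simulate the ripple. The sketch below is illustrative only and not from the slides: `ripple_add` and `adder_delay` are hypothetical names, and the "chain length" is simply the number of consecutive stages a single carry passes through.

```python
def ripple_add(a: int, b: int, n: int, ci: int = 0) -> tuple[int, int, int]:
    """Add two n-bit words one bit at a time.

    Returns (sum, carry_out, longest_chain), where longest_chain is the
    longest run of consecutive stages that one carry travels through.
    """
    total, carry, chain, longest = 0, ci, 0, 0
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        total |= (ak ^ bk ^ carry) << k
        if ak & bk:                   # generate: a new carry starts here
            chain = 1
        elif (ak ^ bk) and carry:     # propagate: the carry moves one stage on
            chain += 1
        else:                         # delete, or nothing to propagate
            chain = 0
        longest = max(longest, chain)
        carry = (ak & bk) | (ak & carry) | (bk & carry)
    return total, carry, longest


def adder_delay(n: int, t_carry: float, t_sum: float) -> float:
    """tadder = (N - 1) * tcarry + tsum, as stated on the slide."""
    return (n - 1) * t_carry + t_sum


if __name__ == "__main__":
    # The slide's pattern: a carry generated at bit 0 ripples through bits 1..6.
    print(ripple_add(0b00000001, 0b01111111, 8))
    print(adder_delay(8, t_carry=1.0, t_sum=1.5))
```

With the slide's pattern the carry crosses N - 1 = 7 stages before the final sum bit is produced, which matches the (N - 1)tcarry + tsum expression.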

Datapath Operators: Addition/Subtraction

Thus,
$\overline{S}(A, B, C_i) = S(\overline{A}, \overline{B}, \overline{C_i})$

$\overline{C_o}(A, B, C_i) = C_o(\overline{A}, \overline{B}, \overline{C_i})$

One possible (un-optimized) implementation:

[Figure: gate-level full adder. G(A.B) and Ci.P(A + B) are combined to form Co; P XOR Ci forms the sum]

Transistor level diagram uses 32 transistors.
(see Weste and Eshraghian).
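The inverting property is small enough to check exhaustively. A minimal sketch, assuming the standard full-adder equations given earlier in these slides (the function names are hypothetical):

```python
from itertools import product


def full_adder(a: int, b: int, ci: int) -> tuple[int, int]:
    """Return (sum, carry-out) of a 1-bit full adder."""
    return a ^ b ^ ci, (a & b) | (a & ci) | (b & ci)


def inverting_property_holds() -> bool:
    """Check that inverting all inputs inverts both outputs."""
    for a, b, ci in product((0, 1), repeat=3):
        s, co = full_adder(a, b, ci)
        s_inv, co_inv = full_adder(1 - a, 1 - b, 1 - ci)
        if s_inv != 1 - s or co_inv != 1 - co:
            return False
    return True


if __name__ == "__main__":
    print(inverting_property_holds())
```

This is what allows alternate stages of a ripple chain to work on inverted signals, so that no inverter is needed in the carry path.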

Datapath Operators: Addition/Subtraction

Co is reused in the S term as:

$\mathrm{Sum} = A\,B\,C_i + (A + B + C_i)\,\overline{C_o}$

[Figure: transistor-level full adder with inputs A, B, Ci and output Co, 28 transistors. Symmetrical design eliminates diffusion caps and reduces series R.]

Are the n and p trees duals of each other?

Even with some design tricks, e.g., transistors on the critical path (Ci) placed closest to the output and symmetrical design, this implementation is slow.

Datapath Operators: Addition/Subtraction

The load capacitance on Co in the previous version consists of 2 diffusion capacitances (inverter) and 6 (next bit) gate capacitances.

This version increases Co's load to 4 diffusion caps, 2 internal (sum) gate caps plus the 6 (next bit) gate caps.

[Figure: 4-bit adder/subtractor with inputs A<0> to A<3>, B<0> to B<3>, Cin and Subtract, and outputs S<0> to S<3> and C<3>. At the msb, Overflow is derived from C<n+1> and C<n>, and S<n> gives the sign of the result]

Eliminates the inverter delay per bit for carry!

Datapath Operators: Addition/Subtraction

Serial addition can be used if area is a concern:

[Figure: two n-bit shift registers (addend and augend) feed a full adder; Cout is held in a 1-bit Reg (Clk, Set, Clr) and fed back as Cin; the sum is shifted into an n-bit shift register (result)]

In this case, you want equal Sum and Carry delays in order to minimize clock cycle time.

Bit-level pipelining can be used to break the dependency between addition time and the number of bits by inserting FAs between each register bit.
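The serial adder can be modelled as one full adder reused once per clock cycle, with the carry held in a 1-bit register. This is a behavioural sketch only; `serial_add` is a hypothetical name and the loop index stands in for the clock.

```python
def serial_add(augend: int, addend: int, n: int) -> tuple[int, int]:
    """Add two n-bit words one bit per cycle, lsb first.

    Returns (result, final_carry). The carry register is assumed to be
    cleared before the first cycle.
    """
    carry_reg = 0  # the 1-bit register between cycles
    result = 0
    for cycle in range(n):
        a = (augend >> cycle) & 1   # bit shifted out of the augend register
        b = (addend >> cycle) & 1   # bit shifted out of the addend register
        s = a ^ b ^ carry_reg
        carry_reg = (a & b) | (a & carry_reg) | (b & carry_reg)
        result |= s << cycle        # bit shifted into the result register
    return result, carry_reg


if __name__ == "__main__":
    print(serial_add(0b0101, 0b0011, 4))
```

Each cycle needs both the sum bit and the next carry before the clock edge, which is why equal Sum and Carry delays minimize the cycle time.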

Datapath Operators: Addition/Subtraction

Transmission-gate Adder:

[Figure: transmission-gate full adder built around XOR and XNOR stages, with input Ci and output Co]

Total transistors is 26.

Note: S and Co delay times are approximately equal, which is good for multipliers.

See Weste and Eshraghian for an 18 transistor implementation.

Datapath Operators: Addition/Subtraction

Dynamic Adder Design: np-CMOS adder

[Figure: two-bit np-CMOS adder with inputs A0, B0, A1, B1 and Ci0, internal carries Ci1 and Ci2, and outputs S0 and S1]

Datapath Operators: Addition/Subtraction

Dynamic Adder Design: Manchester Carry-Chain adder.
A chain of pass-transistors is used to implement the carry chain.

[Figure: Manchester carry chain from Ci,0 to Co,4. Pass transistors are gated by P0 to P4, pull-downs by G0 to G4, with nodes Co,0 to Co,4; relative transistor sizes between 3.5 and 1 are annotated]

Precharge: All intermediate nodes, e.g. Co,0, are charged to VDD.

Evaluate: Node Co,k is discharged, for example, if there is an incoming carry, Ci,0, and the previous propagate signals, P0 to Pk-1, are high.

Transistor sizes are largest at the Ci,0 end of the chain since the worst case is to discharge all nodes Co,k.

Only 4 diffusion capacitances are present per node, but the distributed RC nature of the chain results in a delay that is quadratic in the number of bits.
Buffers and/or transistor sizing can be used to improve performance.

Datapath Operators: Addition/Subtraction

Consider the worst case delay of the carry chain:

[Figure: RC ladder with series resistors R1 to R6, a capacitor C1 to C6 to ground after each resistor, and output Out at the last node]

Elmore delay is given by:

$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right)$

The delay of the RC network is then:

tp = 0.69(C1R1 + C2(R1 + R2) + C3(R1 + R2 + R3) + C4(R1 + R2 + R3 + R4) +
C5(R1 + R2 + R3 + R4 + R5) + C6(R1 + R2 + R3 + R4 + R5 + R6))

Since R1 appears 6 times in the expression, it makes sense to minimize its contribution.

Note that reducing R by a factor, e.g. k, at each stage increases the capacitance by a factor k and increases area.
A k-factor of 1.5 reduces delay by 40% and increases area by 3.5X.
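The Elmore expression above translates directly into a running sum. The sketch below is an illustration, not part of the slides; `elmore_delay` is a hypothetical name, and the unit resistor and capacitor values are placeholders rather than data from the lecture.

```python
def elmore_delay(resistances: list[float], capacitances: list[float]) -> float:
    """tp = 0.69 * sum_i C_i * (R_1 + ... + R_i) for an RC ladder."""
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitor per resistor")
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r          # resistance between the source and node i
        total += c * upstream_r  # node i charges through all upstream R
    return 0.69 * total


if __name__ == "__main__":
    # With equal R and C the sum is N(N+1)/2, i.e. quadratic in chain length.
    for n in (2, 4, 6, 8):
        print(n, elmore_delay([1.0] * n, [1.0] * n))
```

With identical stages the bracketed sum is 1 + 2 + ... + N = N(N+1)/2, which is the quadratic growth mentioned for the Manchester carry chain.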

Datapath Operators: Addition/Subtraction

Carry-Bypass adder:

[Figure: four full adders (FA) with signals P0 to P3 and G0 to G3 rippling from Ci,0 through Co,0, Co,1, Co,2 to Co,3. A Mux controlled by BP = P0P1P2P3 selects between the rippled Co,3 and Ci,0 to form the final Co,3]

Assume Ak and Bk (for k = 0...3) are set such that all Pk (propagate) are high.
In this case, an incoming carry Ci,0 = 1 propagates along the complete chain and Co,3 = 1.

In other words:
if (P0P1P2P3 == 1) then Co,3 = Ci,0 else either DELETE or GENERATE occurred.

Datapath Operators: Addition/Subtraction

Linear Carry-Select adder:
One way around waiting for the incoming carry is to compute the result for both possible values in advance and let the incoming carry select the correct result.

[Figure: carry-select block that adds bits k to k+3. A Setup stage produces P,G; a 0-carry propagation chain and a 1-carry propagation chain feed a Mux selected by Co,k-1, which delivers the carry vector and Co,k+3 to Sum Generation]

The select operation is much faster than the time to compute either of the two possible carry vectors.

A Square-Root Carry-Select Adder (delay = O(N^(1/2))) is constructed by increasing the number of input bits in each block from lsb to msb: higher order blocks take more operand bits than lower order blocks.
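The carry-select idea can be sketched behaviourally: every block computes two candidate results and the incoming carry only drives a selection. The code below is a sketch under stated assumptions, not the slide's circuit; `add_block` and `carry_select_add` are hypothetical names and the fixed block width of 4 corresponds to the linear variant.

```python
def add_block(a: int, b: int, ci: int, width: int) -> tuple[int, int]:
    """Return (sum, carry_out) of one width-bit block."""
    total = a + b + ci
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a: int, b: int, n: int, block: int = 4, ci: int = 0) -> tuple[int, int]:
    """n-bit addition using equal-size carry-select blocks."""
    result, carry = 0, ci
    for lo in range(0, n, block):
        width = min(block, n - lo)
        mask = (1 << width) - 1
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        # Both candidates depend only on the operands, so in hardware they
        # are ready before the incoming carry arrives.
        s0, c0 = add_block(a_blk, b_blk, 0, width)
        s1, c1 = add_block(a_blk, b_blk, 1, width)
        # The incoming carry acts only as the mux select.
        s_blk, carry = (s1, c1) if carry else (s0, c0)
        result |= s_blk << lo
    return result, carry


if __name__ == "__main__":
    print(carry_select_add(0xABCD, 0x1234, 16))
```

In this model the only sequential dependency between blocks is the mux select, which is the reason the critical path shrinks compared with a full ripple.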

Datapath Operators: Addition/Subtraction

Carry look-ahead adder (avoiding the ripple altogether):
Compute the carries to each stage in parallel.

The carry out of the kth stage is computed as:
Co,k = Gk + Pk . Co,k-1    where Gk = Ak . Bk and Pk = Ak + Bk

The dependency between Co,k and Co,k-1 can be eliminated by expanding Co,k-1:
Co,k = Gk + Pk . (Gk-1 + Pk-1 . Co,k-2)

For example, for 4 stages of look-ahead:
C0 = G0 + P0Ci
C1 = G1 + P1G0 + P1P0Ci
C2 = G2 + P2G1 + P2P1G0 + P2P1P0Ci
C3 = G3 + P3G2 + P3P2G1 + P3P2P1G0 + P3P2P1P0Ci

Note that the low-order terms, e.g., P0 and G0, appear in the expression for every bit, making the fanout load large.
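The expanded look-ahead equations can be checked against the recursive definition. A minimal sketch follows; `lookahead_carries` and `ripple_carries` are hypothetical names, and the 4-bit default width follows the example above.

```python
def lookahead_carries(a: int, b: int, ci: int, n: int = 4) -> list[int]:
    """Evaluate C_k = G_k + P_k G_(k-1) + ... + P_k...P_0 Ci for every k."""
    g = [((a >> k) & (b >> k)) & 1 for k in range(n)]   # Gk = Ak . Bk
    p = [((a >> k) | (b >> k)) & 1 for k in range(n)]   # Pk = Ak + Bk
    carries = []
    for k in range(n):
        c, prod = 0, 1
        for j in range(k, -1, -1):
            c |= prod & g[j]   # term P_k ... P_(j+1) . G_j
            prod &= p[j]
        c |= prod & ci         # term P_k ... P_0 . Ci
        carries.append(c)
    return carries


def ripple_carries(a: int, b: int, ci: int, n: int = 4) -> list[int]:
    """Reference: Co,k = Gk + Pk . Co,k-1 evaluated stage by stage."""
    carries, c = [], ci
    for k in range(n):
        ak, bk = (a >> k) & 1, (b >> k) & 1
        c = (ak & bk) | ((ak | bk) & c)
        carries.append(c)
    return carries


if __name__ == "__main__":
    print(lookahead_carries(0b1011, 0b0110, 0))
```

In `lookahead_carries` each carry depends only on the G, P and Ci inputs, never on another carry, which mirrors the parallel evaluation in hardware.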

Datapath Operators: Addition/Subtraction

Carry look-ahead adder:
One possible implementation without using simple logic gates.

[Figure: transistor-level look-ahead carry circuit with inputs Ci,0, P0 to P3 and G0 to G3 and output Co,3]

Size and fan-in of the gates limit the size to about four.

Datapath Operators: Addition/Subtraction

Carry look-ahead adder:

Factoring term C3 yields:
C3 = G3 + P3(G2 + P2(G1 + P1(G0 + P0Ci,0)))

Domino CMOS implementation:

[Figure: domino CMOS carry gate clocked by Clk, with inputs Ci,0, P<0> to P<3> and G<0> to G<3> and output C<3>]

Worst case is pull-down through 6 series n-channel transistors.

Other high speed versions are given in Weste and Eshraghian.

Datapath Operators: Addition/Subtraction

The Logarithmic look-ahead adder: O(log2 N) delay:

[Figure: tree of dot operators over the inputs (G0, P0) to (G7, P7), built from a forward binary tree and an inverse binary tree, producing Co,0 to Co,7]

The dot operator (•) is defined as: (g, p) • (g', p') = (g + pg', pp')

The number of logic levels is proportional to log2 N, fan-in is limited and the layout is compact (jigsaw puzzle) (see Rabaey for details).
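The dot operator is associative, so group (G, P) pairs can be combined in a logarithmic number of levels. The sketch below uses a simple recursive-doubling scan, which is an assumption made for brevity: it is not the forward/inverse binary tree drawn on the slide, although it computes the same carries. The names `dot` and `prefix_carries` are hypothetical.

```python
def dot(hi: tuple[int, int], lo: tuple[int, int]) -> tuple[int, int]:
    """(g, p) . (g', p') = (g + p.g', p.p'); hi is the more significant group."""
    g, p = hi
    g2, p2 = lo
    return g | (p & g2), p & p2


def prefix_carries(a: int, b: int, n: int, ci: int = 0) -> list[int]:
    """Return [Co,0, ..., Co,n-1] using about log2(n) levels of dot operators."""
    gp = [(((a >> k) & (b >> k)) & 1, ((a >> k) | (b >> k)) & 1) for k in range(n)]
    gp[0] = dot(gp[0], (ci, 0))  # treat the carry-in as a generate below bit 0
    span = 1
    while span < n:
        nxt = list(gp)
        for k in range(span, n):
            # Combine the group ending at bit k with the group just below it.
            nxt[k] = dot(gp[k], gp[k - span])
        gp = nxt
        span *= 2
    return [g for g, _ in gp]


if __name__ == "__main__":
    print(prefix_carries(0b10110101, 0b01001011, 8))
```

After the level with span s, entry k covers up to 2s bits, so 8 bits need 3 levels.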

Datapath Operators: Comparison

Magnitude Comparators:
May be built from an adder, complementer (XOR gates) and a zero detect unit.

[Figure: 4-bit comparator. Inputs A<0> to A<3> and B<0> to B<3> feed an adder whose carry out gives B >= A; a zero detect NOR gate on the adder outputs gives B = A]

Think about the modifications necessary to make it a signed comparator (Hint: A couple of XOR gates).
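The comparator can be modelled as B - A computed with an adder, using the carry out and a zero detect on the difference. This is a behavioural sketch for unsigned operands only; `compare_unsigned` is a hypothetical name and the signed variant asked about on the slide is deliberately left out.

```python
def compare_unsigned(a: int, b: int, n: int = 4) -> tuple[bool, bool]:
    """Return (B >= A, B == A) for n-bit unsigned operands."""
    mask = (1 << n) - 1
    # B - A is formed as B + NOT(A) + 1, with the +1 entering as carry-in.
    total = (b & mask) + (~a & mask) + 1
    carry_out = total >> n        # 1 exactly when B >= A
    difference = total & mask
    is_zero = difference == 0     # NOR of all difference bits
    return bool(carry_out), is_zero


if __name__ == "__main__":
    print(compare_unsigned(a=3, b=9))
    print(compare_unsigned(a=9, b=9))
    print(compare_unsigned(a=9, b=3))
```

The carry out works as the B >= A flag because B + (2^n - A) reaches 2^n exactly when B is at least A.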

Datapath Operators: Binary Counters

Asynchronous: Based on the Toggle register.

[Figure: "Ripple Carry" binary counter. Clk drives the first toggle register and each Q output clocks the next stage, giving outputs Q<0> to Q<3>]

Not a good choice for performance and testability (with no reset).

Datapath Operators: Binary Counters

Synchronous counter.

[Figure: synchronous counter built from four 1-bit registers (D, Q, Clk, Clear), each fed through a 0/1 multiplexer, with common Clk and Clear and outputs Q<0> to Q<3>]

Replace AND gate with an adder for up/down counting capability.

Weste and Eshraghian also show a version that can be initialized.

Datapath Operators: Multiplication

Multiplication can be broken down into two steps:
  Computation of partial products.
  Accumulation of the shifted partial products.

      1100
  X   0101
  --------
      1100
     0000
    1100
   0000
  --------
   0111100

Binary multiplication is equivalent to an AND operation.

Multipliers may be classified by the format in which data words are accessed:
  Serial
  Serial/parallel
  Parallel

The parallel form computes the partial products in parallel.
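The two steps can be written out for the 1100 x 0101 example above. A minimal sketch, with `partial_products` and `multiply` as hypothetical names:

```python
def partial_products(x: int, y: int, n: int) -> list[int]:
    """Return the n shifted partial products of x * y (y is n bits wide)."""
    rows = []
    for j in range(n):
        yj = (y >> j) & 1
        row = x if yj else 0   # AND every bit of X with the single bit Yj
        rows.append(row << j)  # weight the row by 2^j
    return rows


def multiply(x: int, y: int, n: int) -> int:
    """Accumulate the shifted partial products."""
    return sum(partial_products(x, y, n))


if __name__ == "__main__":
    for row in partial_products(0b1100, 0b0101, 4):
        print(format(row, "07b"))
    print(format(multiply(0b1100, 0b0101, 4), "07b"))
```

Generating the rows needs only AND gates; all the additions sit in the accumulation step, where most of the delay in a hardware multiplier arises.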

Datapath Operators: Multiplication

Parallel Unsigned Multiplication:

Multiplying 2 unsigned binary integers,

$X = \sum_{i=0}^{m-1} X_i 2^i \qquad Y = \sum_{j=0}^{n-1} Y_j 2^j$

results in:

$P = XY = \left( \sum_{i=0}^{m-1} X_i 2^i \right) \left( \sum_{j=0}^{n-1} Y_j 2^j \right) = \sum_{k=0}^{m+n-1} P_k 2^k$

                    X3   X2   X1   X0     Multiplicand
                    Y3   Y2   Y1   Y0     Multiplier
                  ---------------------
                  X3Y0 X2Y0 X1Y0 X0Y0
             X3Y1 X2Y1 X1Y1 X0Y1
        X3Y2 X2Y2 X1Y2 X0Y2
   X3Y3 X2Y3 X1Y3 X0Y3
  -------------------------------------
 P7  P6   P5   P4   P3   P2   P1   P0

There are m*n summands produced by a set of m*n AND gates in parallel.

Datapath Operators: Multiplication

Parallel Multiplication:
Multiplication is carried out using a bitwise AND of the operands, Xi and Yi.

Most of the work (and delay) is in summing the partial products.

[Figure: multiplier cell. An AND gate forms the product bit (Multiplication) and a full adder with inputs B, Ci and output Co sums the partial products]

An NxN multiplier requires:
  N(N-2) full adders
  N half adders
  N^2 AND gates

Datapath Operators: Multiplication

Array multiplier:

[Figure: 4x4 array multiplier. Rows of half adders (HA) and full adders (FA) combine the products of X0 to X3 with Y0 to Y3 and produce P0 to P7]

tmult = [(M-1) + (N-2)]tcarry + (N-1)tsum + tand

There are a large number of nearly identical critical paths in this circuit.

Datapath Operators: Multiplication

From the delay expression and the fact that all critical paths have the same length, minimizing tmult requires minimizing both tcarry and tsum.

This is in contrast with the adder where minimizing tcarry was key.

The transmission gate adder is a good choice here.

Parallel Signed Multiplication:

Baugh-Wooley algorithm: Only 3 additional adders required over the unsigned version.

Let A and B represent signed integers.

$A = -a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i$

$B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i$

$P = \left( -a_{m-1} 2^{m-1} + \sum_{i=0}^{m-2} a_i 2^i \right) \left( -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i \right)$

$\;\; = a_{m-1} b_{n-1} 2^{m+n-2} + \sum_{i=0}^{m-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} - \sum_{i=0}^{m-2} a_i b_{n-1} 2^{n-1+i} - \sum_{i=0}^{n-2} a_{m-1} b_i 2^{m-1+i}$

Expanding shows that the last two rows of summands are all negative, so the algorithm simply adds in their negations.
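The four-term expansion of the signed product can be verified numerically. The sketch below checks only the algebraic identity, not the hardware step of adding the negated rows; `to_signed` and `signed_product_terms` are hypothetical names.

```python
def to_signed(pattern: int, width: int) -> int:
    """Interpret a raw bit pattern as a two's-complement number."""
    sign = (pattern >> (width - 1)) & 1
    return pattern - (sign << width)


def signed_product_terms(a: int, b: int, m: int, n: int) -> int:
    """Evaluate the four groups of summands for m-bit a and n-bit b."""
    abit = [(a >> i) & 1 for i in range(m)]
    bbit = [(b >> j) & 1 for j in range(n)]
    # Product of the two sign bits, weight 2^(m+n-2).
    sign_term = (abit[m - 1] * bbit[n - 1]) << (m + n - 2)
    # Products of the magnitude bits, all positive.
    positive = sum((abit[i] * bbit[j]) << (i + j)
                   for i in range(m - 1) for j in range(n - 1))
    # The two rows that involve exactly one sign bit are negative.
    neg_row_a = sum((abit[i] * bbit[n - 1]) << (n - 1 + i) for i in range(m - 1))
    neg_row_b = sum((abit[m - 1] * bbit[j]) << (m - 1 + j) for j in range(n - 1))
    return sign_term + positive - neg_row_a - neg_row_b


if __name__ == "__main__":
    a, b = 0b1011, 0b0110  # -5 and 6 as 4-bit two's complement
    print(to_signed(a, 4) * to_signed(b, 4), signed_product_terms(a, b, 4, 4))
```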

Datapath Operators: Multiplication

Parallel Signed Multiplication:

[Figure: 8x8 signed array multiplier. A grid of AND/ADD cells forms the products (ai bj) for i, j = 0 to 7 from inputs a0 to a7 and b0 to b7; the cells in the a7 column and the b7 row are plain AND cells followed by a final row of ADD cells. Outputs are P0 to P15]

Datapath Operators: Multiplication

Carry-Save Multiplier:
Carry bits can be passed diagonally downwards instead of to the left.
Here the carry bits are not immediately added but rather saved for the next adder stage.

[Figure: 4x4 version. Rows of half adders (HA) and full adders (FA) pass their carries diagonally down; a final vector-merging adder produces the upper product bits]

Cost: A little extra area.

Advantage: Critical path is uniquely defined:
tmult = (N-1)tcarry + tand + tmerge
(Assuming tadd = tcarry).

Minimizing tmerge is useful, e.g. use carry-select or lookahead.
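Saving the carries instead of rippling them can be sketched at word level: each stage reduces three vectors to a sum vector and a carry vector, and a single conventional addition merges them at the end. This is a behavioural illustration, not the slide's cell array; `carry_save_add` and `carry_save_multiply` are hypothetical names.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three vectors to (sum_vector, carry_vector) without rippling.

    The two outputs always satisfy x + y + z == sum_vector + carry_vector.
    """
    sum_vec = x ^ y ^ z                              # per-bit sum
    carry_vec = ((x & y) | (x & z) | (y & z)) << 1   # per-bit carry, saved
    return sum_vec, carry_vec


def carry_save_multiply(x: int, y: int, n: int) -> int:
    """Unsigned multiply: accumulate partial products in carry-save form."""
    sum_vec, carry_vec = 0, 0
    for j in range(n):
        pp = (x if (y >> j) & 1 else 0) << j
        sum_vec, carry_vec = carry_save_add(sum_vec, carry_vec, pp)
    # The only carry-propagating addition: the vector-merging adder.
    return sum_vec + carry_vec


if __name__ == "__main__":
    print(carry_save_multiply(0b1100, 0b0101, 4))
```

Because no stage waits for a carry from its neighbour, only the final merge contains a carry chain, which is why a fast adder pays off there.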

Datapath Operators: Multiplication

Serial Unsigned Multiplication:
Used if area is a concern.

[Figure: serial multiplier with inputs X and Y, gates G1 and G2, an adder with Cin, a 1-bit Reg with Clk and reset, and a serial register holding P0 to P7]

Xi and Yi are delivered serially to the inputs of G1 at different rates.

Computes the summands row-wise from right to left.

Disadv: Quadratic delay: tmult = M x N x tcarry

Serial/Parallel Unsigned Multiplier shown in Weste and Eshraghian.

Datapath Operators: Multiplication

Booth Encoding:
A special encoding of the multiplier word reduces the number of required addition stages and speeds up multiplication substantially.

Radix-4 scheme:

$Y = \sum_{j=0}^{(N-1)/2} Y_j 4^j$ with $Y_j \in \{-2, -1, 0, 1, 2\}$

The number of partial products (and additions) is halved, resulting in an area and speed advantage.

The disadvantage is a somewhat more involved multiplier cell.
AND operation replaced with inversion and shift logic.

Virtually every multiplier in use employs the Booth scheme.
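One common way to obtain the radix-4 digits is to scan overlapping groups of three multiplier bits. The sketch below assumes an N-bit two's-complement multiplier with N even, which yields N/2 digits; this recoding rule and the names `booth_radix4_digits` and `booth_multiply` are illustrative assumptions, not taken from the slide.

```python
def booth_radix4_digits(y: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement pattern into digits in {-2..2}.

    Digits are returned least significant first, so the signed value of y
    equals sum(d * 4**j for j, d in enumerate(digits)).
    """
    if n % 2:
        raise ValueError("this sketch assumes an even word length")

    def bit(k: int) -> int:
        return 0 if k < 0 else (y >> k) & 1  # implicit 0 below the lsb

    # digit_j = y[2j-1] + y[2j] - 2*y[2j+1]
    return [bit(2 * j - 1) + bit(2 * j) - 2 * bit(2 * j + 1) for j in range(n // 2)]


def booth_multiply(x: int, y: int, n: int) -> int:
    """Multiply integer x by the n-bit two's-complement pattern y."""
    # Each digit selects 0, +/-x or +/-2x: a shift plus an optional negation.
    return sum(d * x * 4 ** j for j, d in enumerate(booth_radix4_digits(y, n)))


if __name__ == "__main__":
    print(booth_radix4_digits(0b0110, 4))
    print(booth_multiply(3, 0b0110, 4))
```

A 4-bit multiplier produces two partial products here instead of four, which is the halving mentioned above.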

Datapath Operators: Multiplication

Wallace Multiplier:
Trees can be used to replace the linear partial-sum adders:

[Figure, left: slice of a 6-bit carry-save multiplier. Inputs Y0 to Y5 pass through a linear chain of full adders (FA) with carries Ci-1 in and Ci out, producing Sum; the number of ripple stages is N-2]

[Figure, right: Wallace tree. The same inputs Y0 to Y5 are combined by full adders arranged as a tree, producing Sum]

Adv: O(log2 N) mult time.
Disadv: Very irregular and difficult to lay out.

Datapath Operators: Shifters

Right/Left 1-bit shifter:

[Figure: four multiplexers (Mux) controlled by Right/Left. Inputs A0 to A3 plus the serial inputs IR and IL produce outputs H0 to H3]

Datapath Operators: Shifters

Barrel shifter:

[Figure: barrel shifter with data inputs l<6:0>, select lines s<0> to s<3> and outputs r<0> to r<3>]

shift   result
  1     l<3:0>
  2     l<4:1>
  4     l<5:2>
  8     l<6:3>

Arithmetic and logical shifts and rotates are possible by muxing l<6:0> to the appropriate values.
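The shift table can be modelled as a window selection over l<6:0>. In the sketch below the one-hot interpretation of s<3:0> is an assumption read off the table's shift column (1, 2, 4, 8), and the names `barrel_shift` and `rotate_right4` are hypothetical.

```python
def barrel_shift(l: int, s: int) -> int:
    """Select a 4-bit window r<3:0> from the 7-bit input l<6:0>.

    s is assumed to be one-hot: 1 -> l<3:0>, 2 -> l<4:1>,
    4 -> l<5:2>, 8 -> l<6:3>.
    """
    if s not in (1, 2, 4, 8):
        raise ValueError("select must be one-hot")
    offset = s.bit_length() - 1       # 0, 1, 2 or 3
    return ((l & 0x7F) >> offset) & 0xF


def rotate_right4(a: int, k: int) -> int:
    """Rotate a 4-bit word right by k (0..3) by feeding l = {a<2:0>, a<3:0>}."""
    l = ((a << 4) | a) & 0x7F         # the word repeated across l<6:0>
    return barrel_shift(l, 1 << k)


if __name__ == "__main__":
    print(format(barrel_shift(0b1010110, 4), "04b"))
    print(format(rotate_right4(0b1001, 1), "04b"))
```

Feeding zeros or copies of the sign bit into the upper bits of l instead of the repeated word would give logical or arithmetic right shifts with the same array.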
