Lecture 12 - CpE 690 Introduction To VLSI Design

CpE 690: Digital System Design
Fall 2013
Lecture 12 Design of Arithmetic Circuits

Bryan Ackland Department of Electrical and Computer Engineering Stevens Institute of Technology Hoboken, NJ 07030
Adapted from lecture notes by David Money Harris, CMOS VLSI Design
Generic Digital Processor

[Figure: generic digital processor]
Datapath: arithmetic unit (bit-sliced datapath: adder, multiplier, shifter, comparator, etc.)
Memory: RAM, ROM, registers, FIFO, etc.
Control: finite state machine (PLA, random logic)
Interconnect: switches, arbiters, bus
Input/Output
Bit-Sliced Datapath
A datapath (or ALU) may consist of a number of arithmetic components that operate on data words of uniform width (e.g. 32 bits).
Arithmetic components often apply the same operation to each bit of the data word.
Bit-slicing is an efficient physical layout style in which an n-bit datapath is built by stacking together n 1-bit datapaths.
Data buses run (mostly) in the horizontal direction; control runs (mostly) in the vertical direction.
[Figure: 4-bit datapath with bits 0 to 3 stacked vertically; data-in and data-out run horizontally through registers, adder, shifter, multiplier, logical unit and multiplexer; control lines run vertically]
Single Bit Addition: Full Adder

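$S = A \oplus B \oplus C$, $C_{out} = \mathrm{MAJ}(A,B,C)$

A B C | Cout S
0 0 0 |  0   0
0 0 1 |  0   1
0 1 0 |  0   1
0 1 1 |  1   0
1 0 0 |  0   1
1 0 1 |  1   0
1 1 0 |  1   0
1 1 1 |  1   1

[Figure: full adder symbol with inputs A, B, C and outputs Cout, S]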
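To make the truth table concrete, here is a minimal Python sketch of the full adder equations. The name `full_adder` is hypothetical, chosen for illustration; it models logic values only, not any particular transistor-level design.

```python
from itertools import product


def full_adder(a, b, c):
    """Return (S, Cout) for single-bit inputs a, b, c, each 0 or 1."""
    s = a ^ b ^ c                        # S = A xor B xor C
    cout = (a & b) | (a & c) | (b & c)   # Cout = MAJ(A, B, C)
    return s, cout


# Regenerate the truth table row by row: A B C | Cout S
for a, b, c in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, c)
    print(a, b, c, "|", cout, s)
```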
Full Adder Design I

Direct implementation of the Boolean equations: $S = A \oplus B \oplus C$, $C_{out} = \mathrm{MAJ}(A,B,C)$
36 transistors
Ripple Carry Adder

Simplest design: cascade full adders
[Figure: 4-bit ripple-carry adder: full adder i adds Ai, Bi and its carry-in (Cin for bit 0) to produce Si; internal carries C1, C2, C3 ripple through to Cout]
Critical path goes from Cin to Cout; worst-case delay is linear in the number of bits:
$t_d(N\text{-bit adder}) = (N-1)\,t_{carry} + t_{sum}$
Need to minimize $t_{carry}$, the delay from C to Cout in each full adder.
$t_{sum}$ (delay from A, B, C to S) is negligible for large N.
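The cascade can be sketched behaviourally as below. This is a minimal model assuming bits are 0/1 integers with bit 0 least significant; `ripple_carry_add` and `ripple_delay` are hypothetical names, and the delay function simply evaluates the worst-case expression for given per-stage delays.

```python
def full_adder(a, b, c):
    """Return (S, Cout) for single-bit inputs."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def ripple_carry_add(a, b, n, cin=0):
    """Add two n-bit unsigned integers by cascading n full adders, bit 0 first."""
    carry, total = cin, 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)  # carry ripples to bit i+1
        total |= s << i
    return total, carry  # (n-bit sum, Cout)


def ripple_delay(n, t_carry, t_sum):
    """Worst-case delay: (N-1) carry stages followed by one sum stage."""
    return (n - 1) * t_carry + t_sum


print(ripple_carry_add(0b1011, 0b0110, 4))  # 11 + 6 = 17 -> (1, 1)
```

With unit delays, `ripple_delay(16, 1, 1)` gives 16, illustrating the linear growth with N.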
Full Adder Design II

A more compact design can be realized by generating S as a function of Cout: $S = ABC + (A + B + C)\,\overline{C_{out}}$
[Figure: 28-transistor full adder schematic with inputs A, B, Ci, internal node X, and outputs Co and S driven through output inverters]
28 transistors
If we could eliminate the output inverters, we would simplify the design and reduce the C-to-Cout delay.
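Design II reuses the carry to form the sum. An exhaustive check confirms that the compact expression equals the three-input XOR; this is a sketch with hypothetical helper names `maj` and `sum_from_carry`.

```python
from itertools import product


def maj(a, b, c):
    """Majority function: the full adder carry."""
    return (a & b) | (a & c) | (b & c)


def sum_from_carry(a, b, c):
    """Design II sum: S = ABC + (A + B + C) . NOT(Cout), reusing the carry output."""
    cout = maj(a, b, c)
    return (a & b & c) | ((a | b | c) & (1 - cout))


# The compact form agrees with S = A xor B xor C on all eight input combinations
assert all(sum_from_carry(a, b, c) == a ^ b ^ c for a, b, c in product((0, 1), repeat=3))
```

Intuitively, NOT(Cout) is 1 only when at most one input is 1, so the second term picks out the single-one cases and ABC covers the all-ones case.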
Full Adder Design II - Layout

Standard cell style (not bit-slice) layout:
Full Adder Inversion

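Full adder is symmetric with respect to signal inversion: inverting all three inputs inverts both outputs,
$\overline{S}(A,B,C) = S(\overline{A},\overline{B},\overline{C})$, $\overline{C_{out}}(A,B,C) = C_{out}(\overline{A},\overline{B},\overline{C})$
[Figure: full adder with inputs A, B and outputs Cout, S, shown beside the same adder with all inputs and outputs inverted]
Eliminate the inverters and use an inverting full adder:
[Figure: 4-bit ripple-carry adder built from inverting full adders, with inputs A3..A0, B3..B0, carries Cin, C1, C2, C3, Cout and sums S3..S0]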
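One way to see why the inverters can be dropped is the behavioural sketch below. It assumes an arrangement (not recoverable from the figure) in which even-numbered bits receive true A, B and carry and produce complemented outputs, while odd-numbered bits receive complemented A and B; by the inversion symmetry their outputs come out in true polarity. Only the even-bit sums then need an inverter, and none sits in the carry path. All function names are hypothetical.

```python
def full_adder(a, b, c):
    """Return (S, Cout) for single-bit inputs."""
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def inverting_full_adder(a, b, c):
    """Full adder with its output inverters removed: returns (NOT S, NOT Cout)."""
    s, cout = full_adder(a, b, c)
    return 1 - s, 1 - cout


def ripple_inverting(a, b, n, cin=0):
    """n-bit ripple adder with no inverters in the carry path (alternating polarity)."""
    carry, total = cin, 0                 # carry enters bit 0 in true polarity
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        if i % 2 == 0:
            # True inputs in, complemented sum and carry out
            s_bar, carry = inverting_full_adder(ai, bi, carry)
            s = 1 - s_bar                 # sum inverter is off the carry path
        else:
            # Carry arrives complemented, so feed complemented A and B too;
            # by inversion symmetry both outputs come out in true polarity
            s, carry = inverting_full_adder(1 - ai, 1 - bi, carry)
        total |= s << i
    cout = carry if n % 2 == 0 else 1 - carry   # odd n leaves the final carry complemented
    return total, cout
```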
Full Adder: Design III (Mirror Adder)
24 transistors
Output inverters removed; the pMOS and nMOS networks are mirror images of each other rather than complementary. This simplifies the layout and is enabled by the symmetry of the add operation.
Transistors are placed and sized to minimize carry propagation delay, at the expense of sum generation.
Mirror Adder Layout

Bit-slice cell style (not standard cell) layout:
Transistors now run vertically with horizontal poly; data travels from left to right.
Carry propagates vertically from one bit to the next.
Can build wide transistors without affecting bit pitch.
GPK Representation
Introduce new intermediate signals that describe full adder operation in terms of carry propagation:
A B C | G P K | Cout S
0 0 0 | 0 0 1 |  0   0
0 0 1 | 0 0 1 |  0   1
0 1 0 | 0 1 0 |  0   1
0 1 1 | 0 1 0 |  1   0
1 0 0 | 0 1 0 |  0   1
1 0 1 | 0 1 0 |  1   0
1 1 0 | 1 0 0 |  1   0
1 1 1 | 1 0 0 |  1   1
$G = A \cdot B$ (generate: Cout = 1 independent of C)
$P = A \oplus B$ (propagate: Cout = C)
$K = \overline{A} \cdot \overline{B}$ (kill: Cout = 0 independent of C)
Note that G, P and K are functions of A and B only, so they don't need to wait for C.
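A minimal sketch of the GPK signals and of a full adder rewritten in terms of them; `gpk` and `full_adder_gpk` are illustrative names, not from the lecture.

```python
def gpk(a, b):
    """Generate, propagate and kill for one bit position; depends only on A and B."""
    g = a & b                 # G = A.B      : carry out is 1 regardless of C
    p = a ^ b                 # P = A xor B  : carry out equals C
    k = (1 - a) & (1 - b)     # K = A'.B'    : carry out is 0 regardless of C
    return g, p, k


def full_adder_gpk(a, b, c):
    """Full adder expressed with G and P: returns (S, Cout)."""
    g, p, _ = gpk(a, b)
    cout = g | (p & c)        # carry generated here, or propagated from C
    s = p ^ c                 # sum is the propagate signal xor the carry in
    return s, cout
```

Exactly one of G, P and K is 1 for any A, B, which is why the mirror adder can be partitioned into generate, propagate and kill transistor groups.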
GPK Representation
Can see the action of generate, propagate and kill operators in mirror adder:
[Figure: mirror adder schematic annotated with its "0"-propagate, "1"-propagate, generate and kill transistor groups; inputs A, B, Ci; outputs Co and S]
Using GPK to Speed up Carry Propagation

Divide the words to be added into bit groups, or blocks, e.g. think about adding 4 bits at a time.
[Figure: 4-bit adder block with 4-bit inputs A and B, carry-in Cin, carry-out Cout and a 4-bit sum]
Addition of each block is a three-step process:
1. Compute bitwise generate, propagate (and kill) signals: $G_i = A_i B_i$, $P_i = A_i \oplus B_i$, $K_i = \overline{A_i}\,\overline{B_i}$
2. Use the PG(K) signals and Cin to determine $C_i$ for each bit (and Cout)
3. Calculate the sums using $S_i = P_i \oplus C_i$
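The three steps can be sketched for a single block as follows, assuming bit lists ordered LSB first. Step 2 is written here as the serial recurrence $C_{i+1} = G_i + P_i C_i$; the hardware schemes in this lecture (Manchester chain, bypass, select, tree) are different ways of evaluating that same step faster. `block_add` is a hypothetical name.

```python
def block_add(a_bits, b_bits, cin):
    """Add one block (lists of 0/1 bits, LSB first) using the three PG steps."""
    # Step 1: bitwise generate / propagate, independent of any carry
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    # Step 2: carries from PG and Cin, C[i+1] = G[i] + P[i].C[i]
    c = [cin]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))
    # Step 3: sums S[i] = P[i] xor C[i]
    s = [pi ^ ci for pi, ci in zip(p, c)]
    return s, c[-1]   # (sum bits, Cout)
```

For example, with A = 1011 and B = 1101, `block_add([1, 1, 0, 1], [1, 0, 1, 1], 0)` returns `([0, 0, 0, 1], 1)`, i.e. 11 + 13 = 24.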
Group Addition with PG Logic
Manchester Carry
Use transmission gates to provide carry propagation.
[Figure: dynamic and static Manchester carry cells]
4-bit Manchester Carry Logic
[Figure: 4-bit Manchester carry chain; each stage is modeled as a series resistance R/2 and a node capacitance 9C]
Delays in Manchester Chain

[Figure: RC model of the chain, four stages of R/2 and 9C]
Using the Elmore delay model, the delay after n stages is $t_d(n) = \frac{9}{4}\,n(n+1)\,RC$, so delay increases quadratically with n:
n | total delay | delay of extra stage
1 |   4.5 RC    |    -
2 |  13.5 RC    |   9 RC
3 |  27 RC      |  13.5 RC
4 |  45 RC      |  18 RC
Better to add a couple of inverters after 3-4 bits; this makes the overall delay linear in n.
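The table values can be reproduced with a short Elmore-delay calculation. This is a sketch under stated assumptions: each stage contributes series resistance R/2 and node capacitance 9C (as annotated in the figure), delays are in units of RC, and the buffered variant assumes each buffer fully isolates the next segment with a hypothetical fixed delay `t_buf`. Both function names are hypothetical.

```python
def manchester_elmore(n, r=0.5, c=9.0):
    """Elmore delay (units of RC) at the end of an n-stage chain.

    Node i is charged through the i series resistances upstream of it,
    so it contributes (i * r) * c; summing gives (9/4) n (n+1) RC.
    """
    return sum(i * r * c for i in range(1, n + 1))


def buffered_chain_delay(n, k, t_buf):
    """n stages with a buffer after every k stages (t_buf is an assumed buffer delay)."""
    full, rem = divmod(n, k)
    segments = [k] * full + ([rem] if rem else [])
    return sum(manchester_elmore(s) for s in segments) + t_buf * (len(segments) - 1)


for n in range(1, 5):
    print(n, manchester_elmore(n))   # 4.5, 13.5, 27.0, 45.0 as in the table
```

With 16 stages the unbuffered chain gives 612 RC, whereas buffering every 4 stages with an assumed `t_buf = 2` RC gives 4 x 45 + 3 x 2 = 186 RC.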
Manchester Carry Stick Layout

[Figure: Manchester carry stick diagram: a propagate/generate row between VDD and GND (signals Pi, Gi, Pi+1, Gi+1 and carries Ci-1, Ci, Ci+1) above an inverter/sum row]
Carry-Bypass Adder
[Figure: 4-bit carry-bypass block with carry-in Cin and carry-out Cout, controlled by the bypass signal BP = P0·P1·P2·P3]
If P0, P1, P2 and P3 are all 1, then Cout = Cin; otherwise use PG logic within the block.
In a large adder with many blocks, BP is set up well before Cin arrives.
Also known as a carry-skip adder.
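A behavioural sketch of carry-bypass blocks, assuming the block size `m` divides the word length `n` and bit lists are LSB first; `bypass_block` and `bypass_add` are hypothetical names. Note that logically the bypass multiplexer never changes the result: when all $P_i = 1$ the rippled carry equals Cin anyway. Its benefit is purely in timing, which a functional model like this does not capture.

```python
def bypass_block(a_bits, b_bits, cin):
    """One carry-bypass (carry-skip) block; bit lists are LSB first."""
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    bp = int(all(p))             # BP = P0.P1...P(m-1), ready before Cin arrives
    c = [cin]
    for gi, pi in zip(g, p):     # normal ripple through the block
        c.append(gi | (pi & c[-1]))
    cout = cin if bp else c[-1]  # bypass multiplexer
    s = [pi ^ ci for pi, ci in zip(p, c)]
    return s, cout


def bypass_add(a, b, n, m, cin=0):
    """n-bit carry-bypass adder built from n/m blocks of m bits (m assumed to divide n)."""
    total, carry = 0, cin
    for base in range(0, n, m):
        a_bits = [(a >> (base + i)) & 1 for i in range(m)]
        b_bits = [(b >> (base + i)) & 1 for i in range(m)]
        s, carry = bypass_block(a_bits, b_bits, carry)
        total |= sum(bit << (base + i) for i, bit in enumerate(s))
    return total, carry
```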
Carry-Bypass Manchester Block
[Figure: carry-bypass Manchester block with internal carries C1, C2, C3]
Carry-Bypass Critical Path

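[Figure: critical path of a 16-bit carry-bypass adder with blocks Bit 0-3, Bit 4-7, Bit 8-11 and Bit 12-15, each with setup, carry propagation and sum stages. The worst-case path is setup (tsetup), carry propagation (ripple) through the first block, bypass (tbypass) through the middle blocks, then carry propagation and sum (tsum) in the last block. An inset plots delay td against N for ripple and by-pass adders, crossing over at around 4-8 bits.]
If we have N bits and M bits per block (N/M blocks), the worst-case delay is:
$t_{adder} = t_{setup} + M\,t_{carry} + (N/M - 1)\,t_{bypass} + (M-1)\,t_{carry} + t_{sum}$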
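The worst-case expression can be evaluated directly. The sketch below assumes unit delays for every component purely for illustration; real crossover points depend on the technology, so the printed numbers are not the lecture's measured values. Function names are hypothetical.

```python
def bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """Worst-case delay of an n-bit carry-bypass adder with m-bit blocks."""
    if n % m:
        raise ValueError("this sketch assumes m divides n")
    # ripple out of the first block, bypass the middle blocks, ripple into the last block
    return t_setup + m * t_carry + (n // m - 1) * t_bypass + (m - 1) * t_carry + t_sum


def ripple_delay(n, t_carry, t_sum):
    """Worst-case delay of an n-bit ripple-carry adder."""
    return (n - 1) * t_carry + t_sum


# Unit delays for every component: an illustrative assumption, not measured data
for n in (4, 8, 16, 32, 64):
    print(n, ripple_delay(n, 1, 1), bypass_delay(n, 4, 1, 1, 1, 1))
```

Under these assumptions with M = 4, a 16-bit bypass adder takes 12 units against 16 for ripple carry, and the gap widens with N because the bypass term grows as N/M rather than N.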
Carry-Select Adder
For each M-bit block, calculate the block carries for both Cin = 0 and Cin = 1; then, when Cin finally arrives, use a multiplexer to select the correct result.
[Figure: carry-select block k: PG setup feeds a "0" carry-propagation chain and a "1" carry-propagation chain; a multiplexer driven by Co,k selects the carry vector used for sum generation and produces Co,k+M]
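A behavioural sketch of the carry-select scheme, assuming `m` divides `n` and bit lists are ordered LSB first; `ripple_block` and `carry_select_add` are hypothetical names. Each block computes both speculative results and the incoming carry only drives the multiplexer.

```python
def ripple_block(a_bits, b_bits, cin):
    """Ripple-carry addition of one block (bit lists, LSB first); returns (sum bits, carry out)."""
    s, c = [], cin
    for a, b in zip(a_bits, b_bits):
        s.append(a ^ b ^ c)
        c = (a & b) | (c & (a ^ b))
    return s, c


def carry_select_add(a, b, n, m, cin=0):
    """n-bit carry-select adder built from m-bit blocks (m assumed to divide n)."""
    total, carry = 0, cin
    for base in range(0, n, m):
        a_bits = [(a >> (base + i)) & 1 for i in range(m)]
        b_bits = [(b >> (base + i)) & 1 for i in range(m)]
        # Both speculative results depend only on A and B, so they can be
        # computed before this block's carry-in is known
        if_zero = ripple_block(a_bits, b_bits, 0)
        if_one = ripple_block(a_bits, b_bits, 1)
        s, carry = if_one if carry else if_zero   # multiplexer driven by the incoming carry
        total |= sum(bit << (base + i) for i, bit in enumerate(s))
    return total, carry
```

As a functional model this shows that the selected result is correct, not how fast it arrives; the timing is what the critical-path expression describes.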
Carry-Select Adder Critical Path
Worst-case delay is:
$t_{adder} = t_{setup} + M\,t_{carry} + (N/M)\,t_{mux} + t_{sum}$
Linear Carry-Select Adder

Let's look at worst-case delays in an adder with N = 16, M = 4, assuming $t_{full\text{-}adder} = t_{multiplexer} = 1$.
For the last block, the outputs of the "0" and "1" carry sections arrive well before the multiplexer select signal from the previous block.
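Under a simple timing model (an assumption: setup, one bit of carry, a multiplexer and the sum each take one unit, in line with $t_{full\text{-}adder} = t_{multiplexer} = 1$), the arrival times can be computed block by block. `carry_select_timing` is a hypothetical helper; each block's carry-out leaves its multiplexer once both its speculative carries and its select signal have arrived.

```python
def carry_select_timing(block_sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Unit-delay arrival times for a carry-select adder (LSB block first)."""
    select = 0                                       # Cin assumed available at time 0
    mux_out = []
    for m in block_sizes:
        carries_ready = t_setup + m * t_carry        # "0" and "1" chains of this block
        select = max(carries_ready, select) + t_mux  # this block's carry-out
        mux_out.append(select)
    return mux_out, mux_out[-1] + t_sum              # (carry-out times, total delay)


print(carry_select_timing([4, 4, 4, 4]))   # ([6, 7, 8, 9], 10)
```

Every block has its carries ready at time 5, but the last block's select only arrives at 8, so its carry chains sit idle for three units; the total of 10 agrees with $t_{setup} + M\,t_{carry} + (N/M)\,t_{mux} + t_{sum} = 1 + 4 + 4 + 1$.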
Square Root Carry-Select Adder

By making blocks of increasing length, we can perform more carry calculations while waiting for the multiplexer select signal.
[Figure: 20-bit square-root carry-select adder with blocks Bit 0-1, 2-4, 5-8, 9-13 and 14-19; each block has setup, "0" and "1" carry chains, a multiplexer and sum generation (S0-1, S2-4, S5-8, S9-13, S14-19). Arrival times in unit delays are annotated: the final multiplexer output is at (8) and the last sum at (9).]
$t_{adder} = t_{setup} + M\,t_{carry} + \sqrt{2N}\,t_{mux} + t_{sum}$ (M is the size of the first block)
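Under a unit-delay timing model (an assumption: setup, one bit of carry, a multiplexer and the sum each take one unit), the sketch below compares the 20-bit square-root adder in the figure (blocks of 2, 3, 4, 5 and 6 bits) with a linear adder built from five 4-bit blocks. `carry_select_timing` and `sqrt_blocks` are hypothetical helpers; `sqrt_blocks` grows each block by one bit.

```python
def carry_select_timing(block_sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Unit-delay arrival times for a carry-select adder (LSB block first)."""
    select = 0                                       # Cin assumed available at time 0
    mux_out = []
    for m in block_sizes:
        carries_ready = t_setup + m * t_carry        # "0" and "1" chains of this block
        select = max(carries_ready, select) + t_mux  # this block's carry-out
        mux_out.append(select)
    return mux_out, mux_out[-1] + t_sum


def sqrt_blocks(n, first):
    """Block sizes first, first+1, first+2, ..., trimmed so they total n bits."""
    sizes, m = [], first
    while sum(sizes) < n:
        sizes.append(min(m, n - sum(sizes)))
        m += 1
    return sizes


print(carry_select_timing(sqrt_blocks(20, 2)))   # ([4, 5, 6, 7, 8], 9)
print(carry_select_timing([4] * 5))              # ([6, 7, 8, 9, 10], 11)
```

The square-root arrangement finishes at 9 units, matching the (8) and (9) annotations in the figure, versus 11 for equal 4-bit blocks: each longer block's carries become ready exactly when its select signal arrives.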
Carry-Select Adder: Delay Comparisons

Square-root select is particularly effective for large N (e.g. 64 bits).
Tree Adders
For wide adders (N > 32 bits), the delay of carry-lookahead (bypass or select) adders is dominated by the delay of passing the carry through the lookahead stages (multiplexers). This delay can be reduced by recursively looking ahead across lookahead blocks, e.g.
lookahead across 2-bit blocks to generate Cin to 4-bit blocks, lookahead across 4-bit blocks to generate Cin to 8-bit blocks, etc.
Delay can be O(log N)
(at the expense of area and power!)

e.g. Brent-Kung adder:
[Figure: 16-bit Brent-Kung adder. PG generation for bits 15..0; PG logic combines adjacent groups (15:14, 13:12, ..., 1:0, then 15:12, 11:8, 7:4, 3:0, then 15:8 and 7:0); G logic and buffers then form the remaining prefixes (11:0, 13:0, 9:0, 5:0, ...) so that every group 15:0 ... 0:0 is available for the sum calculation. Legend: PG logic, G logic, buffer.]
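The recursive lookahead relies on combining group generate/propagate pairs with the associative operator $(G_{hi}, P_{hi}) \circ (G_{lo}, P_{lo}) = (G_{hi} + P_{hi} G_{lo},\; P_{hi} P_{lo})$. The sketch below applies that operator in a Kogge-Stone arrangement, a different and simpler prefix structure than the Brent-Kung tree in the figure (Kogge-Stone uses more combine cells, but both have O(log N) depth), so it illustrates the principle rather than the figure's exact wiring. The name `prefix_add` and the trick of treating Cin as an extra generate at position 0 are assumptions of the sketch.

```python
def prefix_add(a, b, n, cin=0):
    """n-bit adder whose carries come from a Kogge-Stone parallel prefix of (G, P).

    Position 0 holds Cin as a pure generate (P = 0); operand bit i sits at position i+1.
    Returns (n-bit sum, Cout, number of prefix levels).
    """
    g = [cin] + [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [0] + [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(n)]
    G, P = g[:], p[:]
    span, levels = 1, 0
    while span < n + 1:
        # Combine each group with the group 'span' positions below it:
        # (G_hi, P_hi) o (G_lo, P_lo) = (G_hi + P_hi.G_lo, P_hi.P_lo)
        G, P = (
            [G[i] | (P[i] & G[i - span]) if i >= span else G[i] for i in range(n + 1)],
            [P[i] & P[i - span] if i >= span else P[i] for i in range(n + 1)],
        )
        span *= 2
        levels += 1
    # G[i] is now the generate of positions i..0, i.e. the carry into operand bit i
    s = sum((p[i + 1] ^ G[i]) << i for i in range(n))
    return s, G[n], levels
```

For a 16-bit adder the loop runs 5 levels (spans 1, 2, 4, 8, 16), so the carry depth grows with log N rather than N.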
Subtraction
$A - B = A + (-B)$, where $-B = \overline{B} + 1$ is the two's complement of B
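The subtraction identity maps directly onto an adder: invert B bitwise and set the carry-in to 1. A minimal sketch on n-bit words follows; `subtract` is a hypothetical name, and the carry-out convention (1 means no borrow, i.e. A >= B as unsigned numbers) is the usual one for this construction.

```python
def subtract(a, b, n):
    """n-bit A - B computed with an adder: A + NOT(B) + 1 (carry-in set to 1)."""
    mask = (1 << n) - 1
    not_b = ~b & mask                  # bitwise inversion of B, kept to n bits
    total = a + not_b + 1              # the +1 enters as the adder's carry-in
    return total & mask, total >> n    # (difference mod 2^n, carry out)


print(subtract(5, 3, 4))   # (2, 1)
print(subtract(3, 5, 4))   # (14, 0): 14 is -2 in 4-bit two's complement
```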
Unsigned Multiplication
Example:
       1100   : 12 (multiplicand)
     x 0101   : 5  (multiplier)
       1100
      0000
     1100
    0000      (partial products)
   00111100   : 60 (product)
M x N-bit multiplication: produce N M-bit partial products, then sum these to produce an (M+N)-bit product.
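The example can be reproduced with a shift-and-add sketch: row j of the partial products is the multiplicand ANDed with multiplier bit j and shifted j places. Function names are hypothetical and the model is behavioural only.

```python
def partial_products(x, y, n):
    """The N partial products of multiplicand x and N-bit multiplier y."""
    # Row j is (x AND y_j), shifted j places to the left
    return [(x if (y >> j) & 1 else 0) << j for j in range(n)]


def unsigned_multiply(x, y, m, n):
    """M x N-bit unsigned multiply: sum the partial products."""
    product = sum(partial_products(x, y, n))
    assert product < 1 << (m + n)   # the result always fits in M+N bits
    return product


print(unsigned_multiply(0b1100, 0b0101, 4, 4))   # 60
```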
Array Multiplier
[Figure: array multiplier]
Array Multiplier Critical Path
$t_{mult} = (M+N-3)\,t_{carry} + N\,t_{sum} + t_{AND}$
Carry Save Multiplier
[Figure: carry-save multiplier]
Carry Save Multiplier Critical Path
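[Figure: carry-save multiplier critical path, ending in a fast final adder]
$t_{mult} = (M+N-2)\,t_{carry} + 2\,t_{sum} + t_{AND}$
or, with a fast adder for the final stage:
$t_{mult} = (N-1)\,t_{carry} + t_{fast\_adder} + t_{sum} + t_{AND}$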
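The key idea behind the carry-save multiplier is that each row of full adders passes its carries down to the next row instead of rippling them sideways, so only the final adder propagates carries. The sketch below models this with a word-wide 3:2 compressor (`csa`, a hypothetical name) that keeps the running total as a separate sum word and carry word; it is a behavioural model, not the array's exact cell-by-cell wiring.

```python
def csa(x, y, z):
    """Bitwise 3:2 carry-save addition: x + y + z == sum_word + carry_word."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1   # carries move one column left, not rippled
    return sum_word, carry_word


def carry_save_multiply(x, y, n):
    """Accumulate N partial products in carry-save form, then do one carry-propagate add."""
    s, c = 0, 0
    for j in range(n):
        pp = (x if (y >> j) & 1 else 0) << j   # partial product row j
        s, c = csa(s, c, pp)                   # no carry propagation inside the array
    return s + c                               # final (ideally fast) adder resolves the carries
```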
CSA Multiplier Compact Layout
Two's Complement (Signed) Multiplication

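In two's complement representation, an M-bit multiplicand X and an N-bit multiplier Y have values
$$X = -x_{M-1}2^{M-1} + \sum_{i=0}^{M-2} x_i 2^i, \qquad Y = -y_{N-1}2^{N-1} + \sum_{j=0}^{N-2} y_j 2^j$$
so the product is
$$P = X \cdot Y = \left(-x_{M-1}2^{M-1} + \sum_{i=0}^{M-2} x_i 2^i\right)\left(-y_{N-1}2^{N-1} + \sum_{j=0}^{N-2} y_j 2^j\right)$$
$$P = \underbrace{\sum_{i=0}^{M-2}\sum_{j=0}^{N-2} x_i y_j 2^{i+j}}_{\text{unsigned }(N-1)\times(M-1)\text{ multiply}} + \underbrace{x_{M-1}\,y_{N-1}\,2^{M+N-2}}_{\text{product of MSBs}} - \underbrace{\left(\sum_{i=0}^{M-2} x_i\,y_{N-1}\,2^{i+N-1} + \sum_{j=0}^{N-2} x_{M-1}\,y_j\,2^{j+M-1}\right)}_{\text{two terms to be subtracted}}$$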
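The decomposition (an unsigned product of the low-order bits, plus the product of the MSBs, minus two cross terms) can be checked exhaustively for small word lengths. In this sketch x and y are raw M-bit and N-bit patterns, and `to_signed` and `signed_product_terms` are hypothetical helpers.

```python
def to_signed(v, bits):
    """Interpret the low `bits` bits of v as a two's complement number."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def signed_product_terms(x, y, m, n):
    """Evaluate the signed product from its four terms, using the bits of x and y."""
    xb = [(x >> i) & 1 for i in range(m)]
    yb = [(y >> j) & 1 for j in range(n)]
    unsigned_part = sum(xb[i] * yb[j] << (i + j) for i in range(m - 1) for j in range(n - 1))
    msb_product = xb[m - 1] * yb[n - 1] << (m + n - 2)
    subtracted = (sum(xb[i] * yb[n - 1] << (i + n - 1) for i in range(m - 1))
                  + sum(xb[m - 1] * yb[j] << (j + m - 1) for j in range(n - 1)))
    return unsigned_part + msb_product - subtracted
```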
Baugh-Wooley Partial Products

Subtraction of these terms is accomplished by adding their two's complement, i.e. by adding $\overline{\text{term}} + 1$.
Baugh-Wooley Multiplication Array
Modified Baugh-Wooley Multiplier

Simply replace the AND gate in these cells with a NAND gate and set two of the carry-in constants to 1.
[Figure: multiplier cell with AND gate vs. multiplier cell with NAND gate, each feeding a full adder]
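Writing each subtracted bit as its NAND plus a correction gives, after collecting the constants, a form that uses only additions: NAND terms in place of the subtracted AND terms, plus constant ones at weights $2^{M-1}$ and $2^{N-1}$ (the two carry-in constants) and at $2^{M+N-1}$. The placement of the third constant comes from the standard derivation rather than from the slide; adding it modulo $2^{M+N}$ is equivalent to inverting the product's MSB. A sketch that checks this against ordinary signed multiplication, with hypothetical helper names:

```python
def to_signed(v, bits):
    """Interpret the low `bits` bits of v as a two's complement number."""
    v &= (1 << bits) - 1
    return v - (1 << bits) if v >> (bits - 1) else v


def nand(u, v):
    return 1 - (u & v)


def modified_baugh_wooley(x, y, m, n):
    """Signed M x N product built only from additions (no subtraction)."""
    xb = [(x >> i) & 1 for i in range(m)]
    yb = [(y >> j) & 1 for j in range(n)]
    total = sum((xb[i] & yb[j]) << (i + j) for i in range(m - 1) for j in range(n - 1))
    total += (xb[m - 1] & yb[n - 1]) << (m + n - 2)                        # product of MSBs
    total += sum(nand(xb[i], yb[n - 1]) << (i + n - 1) for i in range(m - 1))
    total += sum(nand(xb[m - 1], yb[j]) << (j + m - 1) for j in range(n - 1))
    total += (1 << (m - 1)) + (1 << (n - 1))   # the two carry-in constants set to 1
    total += 1 << (m + n - 1)                  # assumed MSB constant (equivalently, invert the MSB)
    return to_signed(total, m + n)             # keep M+N bits and read as two's complement
```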
Faster Multipliers
Multiplication is a key element in many DSP applications: digital filters, transforms, modulation and correlation.
Many architectures have been proposed to speed up multiplication: radix-4 Booth encoding, Wallace trees, compressor trees, pipelining, and various combinations of the above.
Each starts by examining the critical path and looking for ways to short-circuit the computation; each provides improved speed at a cost in area and power.
