
CpE 690: Digital System Design

Fall 2013

Lecture 12 Design of Arithmetic Circuits


Bryan Ackland
Department of Electrical and Computer Engineering
Stevens Institute of Technology
Hoboken, NJ 07030

Adapted from Lecture Notes, David Money Harris, CMOS VLSI Design

Generic Digital Processor


Arithmetic Unit (Datapath)
bit-sliced datapath: adder, multiplier, shifter, comparator, etc.

Memory
RAM, ROM, registers, FIFO, etc.

Control
finite state machine, PLA, random logic

Interconnect
switches, arbiters, bus

[Figure: block diagram of a generic digital processor with Memory, I/O, Control and Datapath blocks]

Bit-Sliced Datapath
Datapath (or ALU) may consist of a number of arithmetic components that operate on uniform-width data words (e.g. 32-bit)
Arithmetic components often apply the same operation to each bit in the data word

Bit-sliced is an efficient physical layout style in which an n-bit datapath is built by stacking together n 1-bit datapaths
Data buses run (mostly) in the horizontal direction
Control runs (mostly) in the vertical direction

[Figure: 4-bit bit-sliced datapath (Bit 3 to Bit 0) from Data-in to Data-out; blocks labelled Registers, Adder, Shifter, Logical, Multiplier and Multiplexer; Control enters across the slices]

Single Bit Addition: Full Adder


S = A ⊕ B ⊕ C
Cout = MAJ(A,B,C)

A  B  C  |  Cout  S
0  0  0  |  0     0
0  0  1  |  0     1
0  1  0  |  0     1
0  1  1  |  1     0
1  0  0  |  0     1
1  0  1  |  1     0
1  1  0  |  1     0
1  1  1  |  1     1

[Figure: full adder symbol with inputs A, B, C and outputs Cout, S]
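A minimal Python sketch of the two full adder equations, checked against the arithmetic meaning of the truth table (2.Cout + S = A + B + C). The function name `full_adder` is an illustrative choice, not something taken from the lecture.

```python
def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (s, cout) for one-bit inputs a, b and c."""
    s = a ^ b ^ c                          # S = A xor B xor C
    cout = (a & b) | (a & c) | (b & c)     # Cout = MAJ(A, B, C)
    return s, cout


# Every row of the truth table must satisfy 2*Cout + S = A + B + C.
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            s, cout = full_adder(a, b, c)
            assert 2 * cout + s == a + b + c
```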

Full Adder Design I


Direct implementation of the Boolean equations:
S = A ⊕ B ⊕ C
Cout = MAJ(A,B,C)

36 transistors

Ripple Carry Adder


Simplest design: cascade full adders
[Figure: four full adders in cascade; inputs A3..A0, B3..B0 and Cin; internal carries C1, C2, C3; outputs S3..S0 and Cout]

Critical path goes from Cin to Cout
Worst case delay is linear in the number of bits:
td (N-bit adder) = (N-1).tcarry + tsum
Need to minimize tcarry = delay from C to Cout in each full adder
tsum (delay from A,B,C to S) is negligible for large N
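The cascade and its delay formula can be sketched in Python as follows. This is a behavioural model only; the helper names (`ripple_carry_add`, `to_bits`, `from_bits`, `ripple_delay`) and the little-endian bit lists are assumptions made for the example.

```python
def to_bits(x, n):
    """Little-endian list of the n low bits of x."""
    return [(x >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Cascade one full adder per bit; the carry ripples from bit 0 upward."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return sum_bits, carry


def ripple_delay(n, t_carry, t_sum):
    """td = (N-1).tcarry + tsum, as given above."""
    return (n - 1) * t_carry + t_sum


s, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s), cout)   # 11 + 6 = 17, i.e. sum bits 1 with carry out 1
```

With tcarry = 1.0 and tsum = 1.5 (arbitrary units chosen for illustration), a 32-bit adder has a delay of 31 x 1.0 + 1.5 = 32.5, almost all of it carry propagation.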

Full Adder Design II


A more compact design can be realized by generating S as a function of Cout:
S = A.B.C + (A + B + C).NOT(Cout)

[Figure: static CMOS full adder schematic with inputs A, B, Ci, internal node X and outputs Co, S]

28 transistors

If we could eliminate the output inverters, we would simplify the design and reduce the C to Cout delay
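A quick exhaustive check of the sum-from-carry expression, assuming the second term uses the complement of Cout, i.e. S = A.B.C + (A + B + C).NOT(Cout). The function name `sum_from_cout` is hypothetical.

```python
from itertools import product


def sum_from_cout(a, b, c):
    """Compute Cout first, then derive S from it."""
    cout = (a & b) | (a & c) | (b & c)
    not_cout = cout ^ 1
    s = (a & b & c) | ((a | b | c) & not_cout)
    return s, cout


# Collect any input combination where the compact form disagrees with A xor B xor C.
mismatches = [
    (a, b, c)
    for a, b, c in product((0, 1), repeat=3)
    if sum_from_cout(a, b, c)[0] != a ^ b ^ c
]
print(mismatches)
```

The list is empty: when exactly one input is 1 the carry is 0 and the second term gives S = 1; when all three are 1 the first term gives S = 1; in the remaining cases both terms are 0.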

Full Adder Design II - Layout


Standard cell style (not bit-slice) layout:

Full Adder Inversion


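Full adder is symmetric with respect to signal inversion: inverting all three inputs (A, B, C) inverts both outputs (S, Cout)

[Figure: full adder with true inputs and outputs shown equivalent to a full adder with all inputs and outputs inverted]

Eliminate inverters and use inverting full adders

[Figure: 4-bit ripple carry adder built from inverting full adders; inputs A3..A0, B3..B0 and Cin; internal carries C1, C2, C3; outputs S3..S0 and Cout]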
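The inversion property, and one way to exploit it in a ripple chain, can be sketched as below. The arrangement is an assumption for illustration: every cell produces inverted outputs, even stages take true inputs and get an inverter on the sum only, and odd stages take inverted A and B so that their outputs come out true. The carry therefore alternates polarity and never passes through an inverter.

```python
from itertools import product


def full_adder(a, b, c):
    return a ^ b ^ c, (a & b) | (a & c) | (b & c)


def inversion_holds():
    """FA(NOT A, NOT B, NOT C) must equal (NOT S, NOT Cout) for all inputs."""
    for a, b, c in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, c)
        if full_adder(1 - a, 1 - b, 1 - c) != (1 - s, 1 - cout):
            return False
    return True


def inverting_full_adder(a, b, c):
    """Cell without output inverters: returns (NOT S, NOT Cout)."""
    s, cout = full_adder(a, b, c)
    return 1 - s, 1 - cout


def ripple_with_inverting_cells(a_bits, b_bits, cin=0):
    sums, carry = [], cin                 # carry is true-polarity at stage 0
    for i, (a, b) in enumerate(zip(a_bits, b_bits)):
        if i % 2 == 0:                    # true inputs, inverted outputs
            not_s, carry = inverting_full_adder(a, b, carry)
            sums.append(1 - not_s)        # inverter on the sum, off the carry path
        else:                             # inverted inputs, true outputs
            s, carry = inverting_full_adder(1 - a, 1 - b, carry)
            sums.append(s)
    # After an odd number of stages the carry is still inverted.
    cout = carry if len(a_bits) % 2 == 0 else 1 - carry
    return sums, cout
```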

Full Adder: Design III (Mirror Adder)

24 transistors

Output inverters removed
pMOS and nMOS networks are mirrors of each other rather than complementary
simplifies layout
enabled by symmetry of the add operation

Transistors placed & sized to minimize carry propagation delay, at the expense of sum generation

Mirror Adder Layout


Bit-slice cell style (not standard cell) layout:

Transistors now run vertically with horizontal poly
Data travels from left to right
Carry propagates vertically from one bit to the next

Can build wide transistors without affecting bit pitch

GPK Representation
Introduce new intermediate signals that describe full adder operation in terms of carry propagation

A  B  C  |  G  P  K  |  Cout  S
0  0  0  |  0  0  1  |  0     0
0  0  1  |  0  0  1  |  0     1
0  1  0  |  0  1  0  |  0     1
0  1  1  |  0  1  0  |  1     0
1  0  0  |  0  1  0  |  0     1
1  0  1  |  0  1  0  |  1     0
1  1  0  |  1  0  0  |  1     0
1  1  1  |  1  0  0  |  1     1

G = A.B (i.e. generate carry: Cout = 1 independent of C)
P = A ⊕ B (i.e. propagate carry: Cout = C)
K = NOT(A).NOT(B) (i.e. kill carry: Cout = 0 independent of C)
Note that G, P and K are functions of A and B only: don't need to wait for C
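A small Python sketch of the three signals. It checks that exactly one of G, P, K is active for any A, B, and that Cout = G + P.C reproduces the majority function. The names `gpk` and `carry_out` are illustrative.

```python
def gpk(a, b):
    """Generate, propagate and kill depend on A and B only."""
    g = a & b                  # G = A.B
    p = a ^ b                  # P = A xor B
    k = (1 - a) & (1 - b)      # K = NOT(A).NOT(B)
    return g, p, k


def carry_out(a, b, c):
    """Cout = G + P.C; K is implied (Cout = 0 when neither G nor P is set)."""
    g, p, _ = gpk(a, b)
    return g | (p & c)


for a in (0, 1):
    for b in (0, 1):
        assert sum(gpk(a, b)) == 1          # G, P, K are mutually exclusive
        for c in (0, 1):
            assert carry_out(a, b, c) == (a & b) | (a & c) | (b & c)
```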

GPK Representation
Can see the action of generate, propagate and kill operators in mirror adder:
[Figure: mirror adder schematic with its transistor paths annotated Generate, Kill, "0"-Propagate and "1"-Propagate]

Using GPK to Speed up Carry Propagation


Divide the words to be added into bit groups or blocks
e.g. think about adding 4 bits at a time

[Figure: 4-bit adder block with 4-bit inputs A and B, carry-in Cin, carry-out Cout and a 4-bit sum output]

Addition of each block is a three-step process:
1. Compute bit-wise generate, propagate (& kill) signals: Gi = Ai.Bi, Pi = Ai ⊕ Bi, Ki = NOT(Ai).NOT(Bi)
2. Use PG(K) signals and Cin to determine Ci for each bit (and Cout)
3. Calculate sums using Si = Pi ⊕ Ci
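The three steps can be sketched directly in Python. In this sketch step 2 uses the simple recurrence C(i+1) = Gi + Pi.Ci, which is only one way to resolve the carries; the faster schemes in the following slides change step 2 and leave steps 1 and 3 alone. The function name and the little-endian bit lists are assumptions.

```python
def pg_block_add(a_bits, b_bits, cin):
    """Add one block of bits using PG signals; returns (sum_bits, cout)."""
    # Step 1: bit-wise generate and propagate (no carry needed yet).
    g = [a & b for a, b in zip(a_bits, b_bits)]
    p = [a ^ b for a, b in zip(a_bits, b_bits)]

    # Step 2: carries from PG and Cin, C[i+1] = G[i] + P[i].C[i].
    c = [cin]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))

    # Step 3: sums, S[i] = P[i] xor C[i].
    s = [pi ^ ci for pi, ci in zip(p, c)]
    return s, c[-1]


# 4-bit example: 0b1011 + 0b0110 with Cin = 0 (bits listed LSB first).
print(pg_block_add([1, 1, 0, 1], [0, 1, 1, 0], 0))
```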

Group Addition with PG Logic


Manchester Carry
Use transmission gates to provide carry propagation

[Figure: Manchester carry gate in dynamic and static implementations]

4-bit Manchester Carry Logic

[Figure: 4-bit Manchester carry chain; each stage is annotated with resistance R/2 and capacitance 9C]

Delays in Manchester Chain


[Figure: RC model of the 4-stage chain, R/2 and 9C per stage]

Using the Elmore delay model, delay (after n stages) = (9/4).n(n+1).RC
Delay increases quadratically with n

n    total delay    delay of extra stage
1    4.5 RC         -
2    13.5 RC        9 RC
3    27 RC          13.5 RC
4    45 RC          18 RC

Better to add a couple of inverters after every 3-4 bits: makes overall delay linear in n
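The table can be reproduced with a few lines of Python, assuming the standard Elmore sum for an RC ladder (each node's capacitance times the total resistance between the source and that node) with R/2 and 9C per stage. The buffered variant and its buffer delay `t_buf` are hypothetical additions to illustrate the last point.

```python
def manchester_delay(n, r_stage=0.5, c_stage=9.0):
    """Elmore delay of an n-stage chain, in units of RC."""
    return sum(c_stage * (i * r_stage) for i in range(1, n + 1))


def buffered_delay(n, k, t_buf):
    """Chain of n stages split into groups of k, with a buffer after each group.

    t_buf is an assumed buffer delay in the same RC units; n must be a
    multiple of k in this simple model.
    """
    return (n // k) * (manchester_delay(k) + t_buf)


for n in range(1, 5):
    closed_form = 9 / 4 * n * (n + 1)
    print(n, manchester_delay(n), closed_form)

print(manchester_delay(16), buffered_delay(16, 4, 5.0))
```

For 16 stages the unbuffered chain costs 612 RC, while four buffered groups of 4 cost 4 x (45 + 5) = 200 RC with the assumed buffer delay of 5 RC, which grows linearly as more groups are added.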

Manchester Carry Stick Layout


[Figure: stick diagram with a Propagate/Generate row and an Inverter/Sum row between VDD and GND; signals Pi, Gi, Ci-1, Ci, Pi+1, Gi+1, Ci+1]

Carry-Bypass Adder
[Figure: 4-bit adder block with a bypass path from Cin to Cout controlled by BP]

BP = P0.P1.P2.P3

If (P0 and P1 and P2 and P3) then Cout = Cin
Otherwise use PG within the block
In a large adder with many blocks, BP is set up well before Cin arrives
Also known as Carry-Skip Adder
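A behavioural sketch of the bypass rule in Python. The block still ripples internally; the bypass only changes where Cout comes from when every bit propagates. Function names, the 16-bit width and the 4-bit block size are assumptions for the example.

```python
def bypass_block(a_bits, b_bits, cin):
    """One block: little-endian bit lists in, (sum_bits, cout, bp) out."""
    p = [a ^ b for a, b in zip(a_bits, b_bits)]
    g = [a & b for a, b in zip(a_bits, b_bits)]
    sum_bits, carry = [], cin
    for gi, pi in zip(g, p):
        sum_bits.append(pi ^ carry)
        carry = gi | (pi & carry)       # ripple path inside the block
    bp = int(all(p))                    # BP = P0.P1.P2.P3
    cout = cin if bp else carry         # bypass: Cout = Cin when BP = 1
    return sum_bits, cout, bp


def carry_bypass_add(a, b, n=16, m=4):
    """Chain n/m bypass blocks; returns (sum, carry_out)."""
    result, carry = 0, 0
    for k in range(0, n, m):
        a_bits = [(a >> (k + i)) & 1 for i in range(m)]
        b_bits = [(b >> (k + i)) & 1 for i in range(m)]
        sum_bits, carry, _ = bypass_block(a_bits, b_bits, carry)
        result |= sum(bit << i for i, bit in enumerate(sum_bits)) << k
    return result, carry
```

The bypass never changes the logical result: when all Pi are 1 the rippled carry equals Cin anyway. Its benefit is purely in timing, since Cin can skip the block instead of rippling through it.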

Carry-Bypass Manchester Block

[Figure: Manchester carry chain block with bypass; internal carries C1, C2, C3]

Carry-Bypass Critical Path


[Figure: 16-bit carry-bypass adder as four blocks of M bits (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15), each with Setup, Carry propagation and Sum stages; the critical path is annotated with tsetup, tbypass and tsum]

If we have N bits, M bits/block, N/M blocks, worst case delay is

tadder = tsetup + M.tcarry + (N/M-1).tbypass + (M-1).tcarry + tsum

[Figure: td versus N for ripple and bypass adders; the curves cross at N of about 4-8]

Carry-Select Adder
For each M-bit block:
Calculate block carries for both Cin=0 and Cin=1
Then when Cin finally arrives, use multiplexer to select correct result

[Figure: carry-select block; PG Setup feeds a "0" Carry Propagation path and a "1" Carry Propagation path; a Multiplexer controlled by Co,k selects the carry vector and Co,k+M; Sum Generation follows]
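A behavioural Python sketch of the idea: each block is evaluated twice, once per assumed carry-in, and the real carry only drives the final selection. The names and the default sizes (N = 16, M = 4) are assumptions for the example.

```python
def ripple_block(a_bits, b_bits, cin):
    """Plain ripple addition of one block; returns (sum_bits, cout)."""
    sum_bits, carry = [], cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (a & carry) | (b & carry)
    return sum_bits, carry


def carry_select_block(a_bits, b_bits, cin):
    """Precompute both outcomes, then let Cin act as the multiplexer select."""
    result_if_0 = ripple_block(a_bits, b_bits, 0)
    result_if_1 = ripple_block(a_bits, b_bits, 1)
    return result_if_1 if cin else result_if_0


def carry_select_add(a, b, n=16, m=4):
    """Chain n/m carry-select blocks; returns (sum, carry_out)."""
    result, carry = 0, 0
    for k in range(0, n, m):
        a_bits = [(a >> (k + i)) & 1 for i in range(m)]
        b_bits = [(b >> (k + i)) & 1 for i in range(m)]
        sum_bits, carry = carry_select_block(a_bits, b_bits, carry)
        result |= sum(bit << i for i, bit in enumerate(sum_bits)) << k
    return result, carry
```

In hardware the two `ripple_block` evaluations run in parallel in every block while the carry from earlier blocks is still in flight, which is where the speed-up comes from; the cost is the duplicated carry logic.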

Carry-Select Adder Critical Path

worst case delay is:

tadder = tsetup + M.tcarry + (N/M).tmux + tsum


Linear Carry-Select Adder


Let's look at worst case delays in an adder with N=16, M=4
Assume tfull-adder = tmultiplexer = 1

For the last block, the outputs of the "0" and "1" carry sections arrive well before the multiplexer select signal from the previous block

Square Root Carry-Select Adder


By making blocks of increasing length, we can perform more carry calculations while waiting for the multiplexer select signal
[Figure: carry-select adder with blocks of increasing length: Bit 0-1, Bit 2-4, Bit 5-8, Bit 9-13, Bit 14-19. Each block has Setup, "0" Carry and "1" Carry paths, a Multiplexer and Sum Generation. Arrival times with unit delays: Setup (1); block carries (3), (4), (5), (6), (7); multiplexers (4), (5), (6), (7), (8); Sum S14-19 (9)]

tadder = tsetup + M.tcarry + sqrt(2N).tmux + tsum
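A small timing model makes the benefit of increasing block lengths concrete. It assumes unit delays for setup, per-bit carry, multiplexer and sum, with Cin available at time 0; the function name and the model itself are a simplification for illustration.

```python
def carry_select_arrival(block_sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Time at which the last sum bits are valid.

    Each block's two carry chains are ready at t_setup + size * t_carry.
    Its multiplexer fires once both the chains and the select signal
    (the previous block's multiplexer output) have arrived.
    """
    mux_out = 0                                   # Cin available at time 0
    for size in block_sizes:
        carries_ready = t_setup + size * t_carry
        mux_out = max(carries_ready, mux_out) + t_mux
    return mux_out + t_sum


print(carry_select_arrival([4, 4, 4, 4]))         # linear, N = 16
print(carry_select_arrival([2, 3, 4, 5, 6]))      # increasing blocks, N = 20
```

Under these assumptions the linear 16-bit adder finishes at time 10 and the 20-bit adder with blocks of 2, 3, 4, 5 and 6 bits finishes at time 9: each longer block completes its carry chains just as its select signal arrives, so no multiplexer sits idle.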

Carry-Select Adder: Delay Comparisons


Square root select particularly effective for large N (e.g. 64-bit)


Tree Adders
For wide adders (N>32 bits) delay of carry lookahead (bypass or select) adders is dominated by delay of passing carry through the lookahead stages (multiplexers). This delay can be reduced by recursively looking ahead across lookahead blocks, e.g.
lookahead across 2-bit blocks to generate Cin to 4-bit blocks
lookahead across 4-bit blocks to generate Cin to 8-bit blocks, etc.

Delay can be O(log N) (at expense of area and power!)

e.g. Brent-Kung Adder

[Figure: 16-bit Brent-Kung adder. PG generation for bits 15..0; group PG nodes 15:14, 13:12, 11:10, 9:8, 7:6, 5:4, 3:2, 1:0, then 15:12, 11:8, 7:4, 3:0, then 15:8, 7:0; later nodes 11:0, 13:0, 9:0, 5:0; final prefixes 15:0 down to 0:0 feed the sum calculation. Legend: PG logic, G logic, Buffer]
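The recursive lookahead can be sketched as a parallel-prefix computation over (G, P) pairs. The Python below follows the Brent-Kung pattern (pairs, then groups of 4, 8, 16, followed by a fill-in pass), but it is a behavioural sketch only: the function names are hypothetical, N must be a power of two, and Cin is folded in after the prefix tree rather than treated as an extra input bit.

```python
def combine(hi, lo):
    """Merge adjacent groups: G = G_hi + P_hi.G_lo, P = P_hi.P_lo."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo


def brent_kung_add(a, b, cin=0, n=16):
    """Return (sum, cout) for n-bit a and b; n must be a power of two."""
    assert n > 0 and n & (n - 1) == 0
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    x = [((a >> i) & (b >> i) & 1, p[i]) for i in range(n)]   # bit-wise (G, P)
    levels = n.bit_length() - 1                               # log2(n)

    # Forward tree: 1:0, 3:2, ... then 3:0, 7:4, ... up to (n-1):0.
    for d in range(levels):
        stride = 2 << d
        for i in range(stride - 1, n, stride):
            x[i] = combine(x[i], x[i - stride // 2])

    # Reverse tree: fill in the remaining prefixes (11:0, then 5:0, 9:0, ...).
    for d in range(levels - 1, 0, -1):
        stride = 1 << d
        for i in range(stride + stride // 2 - 1, n, stride):
            x[i] = combine(x[i], x[i - stride // 2])

    # x[i] now holds (G, P) for bits i:0. Carry into bit i+1 is G + P.Cin.
    carries = [cin] + [g | (pp & cin) for g, pp in x]
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n]
```

Both passes have log2(N) levels or fewer, so the carry depth grows logarithmically with N, at the cost of the extra group PG cells.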

Subtraction
A - B = A + (-B)
-B = NOT(B) + 1
(where -B is the two's complement of B)
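A minimal Python sketch of subtraction on an adder, assuming a fixed word width `n` (8 bits here, chosen for illustration). The +1 is supplied as the adder's carry-in, so no extra increment stage is needed.

```python
def subtract(a, b, n=8):
    """A - B computed as A + NOT(B) + 1, truncated to n bits."""
    mask = (1 << n) - 1
    not_b = ~b & mask            # bit-wise inversion of B within the word
    carry_in = 1                 # supplies the "+1" of the two's complement
    return (a + not_b + carry_in) & mask


print(subtract(12, 5))           # 7
print(subtract(5, 12))           # 249, the 8-bit two's complement pattern for -7
```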

Unsigned Multiplication
Example:

      1100   : 12 (base 10)   multiplicand
  x   0101   :  5 (base 10)   multiplier
  --------
      1100
     0000      partial products
    1100
   0000
  --------
  00111100   : 60 (base 10)   product

M x N-bit multiplication
Produce N M-bit partial products
Sum these to produce (M+N)-bit product
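The same procedure in Python: one partial product per multiplier bit, each shifted to its bit position, then summed. The function names are illustrative.

```python
def partial_products(multiplicand, multiplier, n_bits):
    """One shifted partial product per multiplier bit (LSB first)."""
    return [
        (multiplicand << j) if (multiplier >> j) & 1 else 0
        for j in range(n_bits)
    ]


def multiply(multiplicand, multiplier, n_bits):
    return sum(partial_products(multiplicand, multiplier, n_bits))


pps = partial_products(0b1100, 0b0101, 4)
print([format(pp, "08b") for pp in pps])
print(format(multiply(0b1100, 0b0101, 4), "08b"))
```

For the example above the partial products are 12, 0, 48 and 0, which sum to 60.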

Array Multiplier


Array Multiplier Critical Path

tmult = (M+N-3).tcarry + N.tsum + tAND



Carry Save Multiplier


Carry Save Multiplier Critical Path

[Figure: carry-save multiplier critical path, with a Fast Adder as the final stage]

tmult = (M+N-2).tcarry + 2.tsum + tAND
OR
tmult = (N-1).tcarry + tfast_adder + tsum + tAND

CSA Multiplier Compact Layout


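Two's Complement (Signed) Multiplication

In two's complement representation, with an M-bit multiplicand X and an N-bit multiplier Y:

$$X = -x_{M-1}\,2^{M-1} + \sum_{i=0}^{M-2} x_i\,2^{i} \qquad Y = -y_{N-1}\,2^{N-1} + \sum_{j=0}^{N-2} y_j\,2^{j}$$

$$P = X \cdot Y = \left(-x_{M-1}\,2^{M-1} + \sum_{i=0}^{M-2} x_i\,2^{i}\right)\left(-y_{N-1}\,2^{N-1} + \sum_{j=0}^{N-2} y_j\,2^{j}\right)$$

$$P = \sum_{i=0}^{M-2}\sum_{j=0}^{N-2} x_i\,y_j\,2^{i+j} \;+\; x_{M-1}\,y_{N-1}\,2^{M+N-2} \;-\; \left(\sum_{i=0}^{M-2} x_i\,y_{N-1}\,2^{i+N-1} + \sum_{j=0}^{N-2} x_{M-1}\,y_j\,2^{j+M-1}\right)$$

First term: unsigned (M-1)x(N-1) multiply
Second term: product of MSBs
Bracketed terms: two terms to be subtracted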
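The decomposition can be checked numerically. The sketch below assumes an M-bit multiplicand with bits x_i and an N-bit multiplier with bits y_j (these symbol names are chosen for the example) and rebuilds the product from the four groups of terms: the unsigned core, the product of the MSBs, and the two sums that are subtracted.

```python
def decomposed_product(x, y, m, n):
    """Signed product of m-bit x and n-bit y, built from bit-level terms."""
    xb = [(x >> i) & 1 for i in range(m)]    # two's complement bits of x
    yb = [(y >> j) & 1 for j in range(n)]    # two's complement bits of y

    # Unsigned (M-1) x (N-1) multiply of the non-sign bits.
    unsigned_part = sum(
        (xb[i] & yb[j]) << (i + j)
        for i in range(m - 1)
        for j in range(n - 1)
    )
    # Product of the two MSBs (negative times negative is positive).
    msb_part = (xb[m - 1] & yb[n - 1]) << (m + n - 2)
    # Terms involving exactly one sign bit carry a negative weight.
    sub_x = sum((xb[i] & yb[n - 1]) << (i + n - 1) for i in range(m - 1))
    sub_y = sum((xb[m - 1] & yb[j]) << (j + m - 1) for j in range(n - 1))

    return unsigned_part + msb_part - sub_x - sub_y


print(decomposed_product(-3, 5, 4, 4))
print(decomposed_product(-8, -8, 4, 4))
```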

Baugh-Wooley Partial Products


Subtraction of these terms is accomplished by adding the two's complement, i.e. by adding (NOT(term) + 1)

Baugh-Wooley Multiplication Array


Modified Baugh-Wooley Multiplier


Simply replace AND gate in these cells with NAND gate
and set two of the carry-in constants to 1

[Figure legend: multiplier cell with AND gate; multiplier cell with NAND gate; full adder]
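A behavioural sketch of the NAND-plus-constants scheme in Python. The constant positions are derived here from the two's complement identity -T = NOT(T) + 1 applied to both subtracted terms over an (M+N)-bit word, not read off the slide: 2^(M-1) + 2^(N-1) + 2^(M+N-1). For a square multiplier (M = N) the first two merge into a single 1 in column N, leaving two constant 1s, in columns N and 2N-1. Function and variable names are hypothetical.

```python
def baugh_wooley_multiply(x, y, m, n):
    """Signed product of m-bit x and n-bit y using only additions."""
    xb = [(x >> i) & 1 for i in range(m)]
    yb = [(y >> j) & 1 for j in range(n)]

    total = 0
    for i in range(m):
        for j in range(n):
            bit = xb[i] & yb[j]                  # ordinary AND cell
            if (i == m - 1) != (j == n - 1):     # exactly one sign bit involved
                bit ^= 1                         # NAND cell instead of AND
            total += bit << (i + j)

    # Constant 1s that complete the two's complement of the subtracted terms.
    total += (1 << (m - 1)) + (1 << (n - 1)) + (1 << (m + n - 1))

    total &= (1 << (m + n)) - 1                  # keep the (M+N)-bit product
    if total >> (m + n - 1):                     # reinterpret as signed
        total -= 1 << (m + n)
    return total


print(baugh_wooley_multiply(-1, -1, 4, 4))
print(baugh_wooley_multiply(-8, 7, 4, 4))
```

Every partial product bit now has positive weight, so the array needs no subtractor cells: it is the unsigned array with a few gates swapped and a few inputs tied to 1.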

Faster Multipliers
Multiplication is a key element in many DSP applications
Digital Filters
Transforms
Modulation & Correlation

Many architectures have been proposed to speed up multiplication
radix-4 Booth encoding
Wallace tree
Compressor trees
Pipelining
Various combinations of the above

Each starts by examining the critical path and looking for ways to short-circuit computation
Each provides improved speed at the cost of area & power
