



(Design of Low Power High Speed 32 × 32 Multiplier)

Prepared in partial fulfillment of the course BITS C314

(Lab Oriented Project)



II semester 2005-2006

Table of Contents

1. Introduction


2. The ESA in 2C representation

3. Reducing the switching activity

4. The algorithm and architecture

4.1 Conversion from 2C to SM notation

4.2 Speeding up the PP accumulation


4.3 Converting the RB number into 2C number


4.4 The algorithm and its VLSI architecture




6.1 Multiplier Interface
6.2 Partial Product generator


6.3 First stage of the adder


6.4 Schematic of final stage of adder


6.5 Schematic of one bit adder(sum and carry)


6.6 Layout (in tsmc018)









I would like to express my sincere gratitude to Dr. D. Sriram, Instructor In-charge, Lab
Oriented Project BITS C314, for providing me an opportunity to work in the methodology
of research, for cultivating logical and creative thinking, and for making me express my
findings in the form of a scientific report.
I would also like to express my gratitude to Dr. (Mrs.) Anu Gupta, Assistant
Professor, EEE Group, for giving me an opportunity to work under her guidance. The
work under her supervision gave me an opportunity to comprehend my subject
knowledge and apply it to the given problem.
Last but not least, I would like to thank Mr. Pawan Sharma for allowing me
to use the various tools in OYSTER LAB.


A low power multiplication algorithm and its VLSI architecture using a mixed number
representation is proposed. The reduced switching activity and low power dissipation are
achieved through the Sign-Magnitude (SM) notation for the multiplicand and through a
novel design of the Redundant Binary (RB) adder and Booth decoder. The high speed
operation is achieved through the Carry-Propagation-Free (CPF) accumulation of the
Partial Products (PP) by using the RB notation. Analysis showed that the switching
activity in the PP generation process can be reduced on average by 90%. Compared to
multipliers of the same type, the proposed design dissipates much less power and is 18%
faster on average.

1: Introduction

It has been shown that by the use of the SM notation for the multiplicand, the use of
Two's Complement (2C) representation for the multiplier, and the use of RB
representation for the PP accumulation, the Expected Switching Activity (ESA), and
therefore the power dissipation, can be significantly reduced. The ESA reduction occurs
any time the negation of the multiplicand is needed in order to generate the PPs under the
radix-4 Booth's algorithm. High speed operation is sustained through the RB notation
for accumulating the PPs, since a CPF addition can be executed with RB numbers. The
inputs and outputs of the multiplication unit are assumed to be in 2C notation. It is
interesting to point out that although the proposed algorithm and its VLSI
architecture are complex in terms of the number conversions, they are more energy efficient
and have an operating speed close to that of the Wallace tree architecture and faster than
the other proposed multipliers.

2: The ESA in 2C representation

2C numbers and the radix-4 Booth's algorithm are predominantly used for multiplier
design, since the arithmetic operations can be easily carried out with 2C numbers and
Booth's algorithm can largely reduce the number of PPs. However, Booth's algorithm
often requires the negation of the multiplicand, and the negation of a 2C number requires
many bits to be switched, which results in high switching activity. Without loss of
generality, the radix-4 Booth's algorithm can be used to demonstrate how likely it is that
the negation of the multiplicand has to be generated and how many bits on average have
to be switched. This gives the ESA during the PP generation.
As shown in Table I, the radix-4 Booth's algorithm requires the multiples $\pm Y$ and
$\pm 2Y$, where $Y$ is the multiplicand. For 2C representation $-Y = \bar{Y} + 1$, so to
generate $-Y$ given $Y$, all the bits of $Y$ have to be switched and then 1 has to be added
to get the correct 2C result. The same operations are needed to generate $-2Y$, except
that a left shift is needed before the bit complementation takes place. The negation
process is highly energy consuming, as it requires the charging and discharging of all the
nodes associated with the PP. Indeed, let an n-bit multiplier be
$X = x_{n-1} x_{n-2} \ldots x_1 x_0$. According to Table I, the algorithm scans the triplets
$x_{2k+1}, x_{2k}, x_{2k-1}$, where $k = 0, 1, \ldots, \lfloor (n-1)/2 \rfloor$ and $x_{-1} = 0$.
So, it scans 3 bits for one PP with one bit overlap between two adjacent triplets. If n is
odd, then the largest index is $2\lfloor (n-1)/2 \rfloor + 1 = n$. Therefore, an extra bit
$x_n = x_{n-1}$ (sign extension) must be appended to the left of $x_{n-1}$ to make the triplet
$x_n, x_{n-1}, x_{n-2}$. If n is even, then the largest index is
$2\lfloor (n-1)/2 \rfloor + 1 = n-1$. Therefore, multiplier X can be exactly grouped into
n/2 triplets and no sign extension is needed. For parallel multiplication, all triplets can
be scanned at the same time.
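
The triplet scanning described above can be sketched in a few lines of Python. This is only an illustrative model, not the report's Verilog: the function names `to_signed` and `booth_radix4_digits` are hypothetical, and the digit value of each triplet is taken from the standard radix-4 Booth encoding, $-2x_{2k+1} + x_{2k} + x_{2k-1}$, since Table I is not reproduced in this text.

```python
def to_signed(x: int, n: int) -> int:
    """Interpret the low n bits of x as a two's-complement value."""
    return x - (1 << n) if (x >> (n - 1)) & 1 else x


def booth_radix4_digits(x: int, n: int) -> list[int]:
    """Recode an n-bit 2C multiplier into radix-4 Booth digits (LSB first).

    Each digit is in {-2, -1, 0, 1, 2} and selects the multiple of Y
    used for one partial product.
    """
    bits = [(x >> i) & 1 for i in range(n)]   # x_0 ... x_{n-1}
    if n % 2 == 1:
        bits.append(bits[-1])                 # sign extension: x_n = x_{n-1}
    bits = [0] + bits                         # prepend x_{-1} = 0
    digits = []
    for k in range((len(bits) - 1) // 2):
        lo, mid, hi = bits[2 * k], bits[2 * k + 1], bits[2 * k + 2]
        digits.append(-2 * hi + mid + lo)     # triplet x_{2k+1} x_{2k} x_{2k-1}
    return digits


if __name__ == "__main__":
    # 6 = 0110 is recoded as -2 * 4**0 + 2 * 4**1
    print(booth_radix4_digits(0b0110, 4))
```

An even n yields n/2 digits and an odd n yields (n+1)/2 digits, matching the triplet counts given in the text.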

From Table I, when the radix-4 Booth's algorithm encounters the multiplier patterns 110,
101 and 100, it has to generate $-Y$ or $-2Y$. These patterns, which will be referred to as
NEG (negation) patterns hereafter, are directly related to the ESA in the Booth
PP generator. The average probability of a NEG pattern occurring in any given triplet
$x_{2k+1}, x_{2k}, x_{2k-1}$ of the multiplier can be analyzed as follows.
Assume an n-bit 2C number $X = x_{n-1} x_{n-2} \ldots x_1 x_0$ and that the probability of
being 1 for each bit of the multiplier is 0.5.
Case 1: n is even, $\lfloor (n-1)/2 \rfloor = (n-2)/2$. Therefore n/2 triplets are needed to
cover all the bits of the multiplier and the sign extension is not needed. For $x_1 x_0$,
since Booth's algorithm assumes bit $x_{-1}$ to be always zero, there are only four choices
for the triplet $x_1 x_0 x_{-1}$: 000, 010, 100 and 110. Two of them are NEGs. Hence, the
probability of a NEG appearing in the $x_1$, $x_0$ and $x_{-1}$ positions is 1/2. For the
remaining (n-2) bits, which form (n-2)/2 triplets, each triplet
$(x_{2k+1}, x_{2k}, x_{2k-1})$ has 8 possible patterns and 3 of them are NEGs. So, the
probability of a NEG appearing in each of these triplets is 3/8. Therefore, the average
probability of a NEG appearing in a triplet $x_{2k+1}, x_{2k}, x_{2k-1}$ is

$$P_{even} = \frac{\frac{1}{2} + \frac{n-2}{2} \cdot \frac{3}{8}}{\frac{n}{2}} = \frac{3n+2}{8n}$$

Case 2: n is odd, $\lfloor (n-1)/2 \rfloor = (n-1)/2$. Therefore, the number must be sign
extended and (n+1)/2 triplets are needed to cover all the bits of the multiplier. Based on
the sign extension rule, the triplet $x_n x_{n-1} x_{n-2}$ has four possible patterns: 000,
001, 110, 111. Among them there is just one NEG. So, the probability of a NEG occurring in
the triplet $x_n, x_{n-1}, x_{n-2}$ is 1/4. For $x_1 x_0$, as in the case when n is even, the
probability of a NEG occurring in the triplet $x_1 x_0 x_{-1}$ is 1/2. For the remaining
(n-3) bits, which form (n-3)/2 triplets, the probability of a NEG occurring in a triplet
$x_{2k+1} x_{2k} x_{2k-1}$ is 3/8. Therefore, the average probability of a NEG appearing in
a triplet $x_{2k+1} x_{2k} x_{2k-1}$ is:

$$P_{odd} = \frac{\frac{1}{4} + \frac{1}{2} + \frac{n-3}{2} \cdot \frac{3}{8}}{\frac{n+1}{2}} = \frac{3}{8}$$

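Combining cases 1 and 2, the average probability for a NEG to appear in a triplet
$x_{2k+1} x_{2k} x_{2k-1}$ is

$$P_{NEG} = \begin{cases} \dfrac{3n+2}{8n}, & n \text{ even} \\ \dfrac{3}{8}, & n \text{ odd} \end{cases} \qquad (4)$$
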
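Since, for 2C numbers, $-Y = \bar{Y} + 1$ and the generation of $\bar{Y}$ requires the
complementation of every bit of Y, all n bits of the PP switch whenever a NEG pattern
occurs. With $P_{NEG}$ denoting the average NEG probability derived above, the ESA per bit
in the PP generation process is:

$$ESA_{2C} = P_{NEG} \cdot \frac{n}{n} = P_{NEG}$$

On average, the ESA in the partial product generation process is about 0.40. This
results in large power dissipation.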
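
As a cross-check of the two cases, the NEG probability can be found by brute-force enumeration of all n-bit multipliers. The sketch below is an illustration only; `neg_probability` is a hypothetical helper, and the closed forms it is compared against, $(3n+2)/(8n)$ for even n and $3/8$ for odd n (n ≥ 3), are the values that follow from the triplet counts and per-triplet probabilities stated in the text.

```python
from fractions import Fraction
from itertools import product

# Triplets (x_{2k+1}, x_{2k}, x_{2k-1}) that require -Y or -2Y.
NEG_PATTERNS = {(1, 0, 0), (1, 0, 1), (1, 1, 0)}


def neg_probability(n: int) -> Fraction:
    """Average fraction of Booth triplets that are NEG patterns,
    taken over all equally likely n-bit multipliers."""
    neg_count = 0
    triplet_count = 0
    for bits in product((0, 1), repeat=n):    # bits[i] = x_i
        ext = list(bits)
        if n % 2 == 1:
            ext.append(ext[-1])               # sign extension for odd n
        ext = [0] + ext                       # x_{-1} = 0
        triplets = (len(ext) - 1) // 2
        for k in range(triplets):
            if (ext[2 * k + 2], ext[2 * k + 1], ext[2 * k]) in NEG_PATTERNS:
                neg_count += 1
        triplet_count += triplets
    return Fraction(neg_count, triplet_count)


if __name__ == "__main__":
    for n in (4, 5, 8):
        print(n, neg_probability(n), float(neg_probability(n)))
```

For n = 8 the enumeration gives 13/32, roughly 0.41, in line with the figure of about 0.40 quoted above.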

3: Reducing the switching activity

Clearly, the high switching activity in the Booth PP generator is caused by the generation
of $-Y$ and $-2Y$ and by the fact that the 2C representation is chosen for the multiplicand Y.
The latter holds because the negation of a given 2C number is equivalent to the
complementation of all its bits followed by adding 1. On the other hand, the negation of an
SM number is simple: just complement the sign bit. Hence, if one uses the SM
representation instead of 2C for the multiplicand Y, a significant reduction of the ESA
during the Booth PP generation process should be expected. Consequently, the SM
representation is used for the multiplicand Y, while the multiplier X is kept in 2C
form. The correctness of the radix-4 Booth's algorithm applied to this mixed number
representation can be shown as follows: the radix-4 Booth's algorithm gives correct
results when applied to 2C numbers, and the validity of the Booth coding results depends
exclusively on the pattern of the multiplier. Since the multiplier is kept in 2C notation,
the radix-4 Booth's algorithm remains valid for the mixed number representation.
Now, let us evaluate the ESA of SM numbers. Since the multiplier is
in its 2C form, the average probability of a NEG pattern appearing in any triplet
$x_{2k+1} x_{2k} x_{2k-1}$ of an n-bit multiplier is the same as in (4). Also, negating an SM
number only complements the sign bit, that is, one bit out of n. Therefore, with $P_{NEG}$
denoting that probability, the ESA for SM numbers in the Booth PP generation process is:

$$ESA_{SM} = \frac{P_{NEG}}{n}$$

A comparison of the ESA for SM and 2C numbers is reported in Table III. The reduction
of the ESA is significant, ranging from 87.5% for 8 bit operands to 98.4% for 64 bit
operands. As the operand length increases, the ESA for even-length 2C numbers
decreases towards the asymptotic value of 3/8, and the ESA for odd-length 2C numbers is a
constant 3/8. For SM numbers, the ESA decreases at the rate of $O(1/n)$ and
asymptotically reaches zero. Thus, for longer operands the ESA reduction, and therefore
the power saving, is more pronounced.
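
The quoted reductions can be reproduced with a short calculation. The sketch below assumes, as an interpretation of the text rather than a copy of Table III, that the 2C ESA equals the NEG probability ($(3n+2)/(8n)$ for even n, $3/8$ for odd n) and that the SM ESA is that probability divided by n, because only the sign bit toggles. The function names are hypothetical.

```python
from fractions import Fraction


def esa_2c(n: int) -> Fraction:
    """Assumed per-bit ESA for a 2C multiplicand: every bit toggles on a NEG."""
    return Fraction(3 * n + 2, 8 * n) if n % 2 == 0 else Fraction(3, 8)


def esa_sm(n: int) -> Fraction:
    """Assumed per-bit ESA for an SM multiplicand: only the sign bit toggles."""
    return esa_2c(n) / n


def reduction(n: int) -> Fraction:
    """Relative ESA reduction obtained by switching from 2C to SM."""
    return 1 - esa_sm(n) / esa_2c(n)


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(f"n={n:2d}  2C={float(esa_2c(n)):.4f}  "
              f"SM={float(esa_sm(n)):.4f}  reduction={float(reduction(n)):.1%}")
```

Under these assumptions the reduction is simply 1 - 1/n, which gives 87.5% for n = 8 and about 98.4% for n = 64, the two endpoints mentioned in the text.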

4: The algorithm and architecture

4.1: Conversion from 2C to SM notation
An SM number can be expressed as

$$Y = (-1)^{y_{n-1}} \sum_{i=0}^{n-2} y_i 2^i$$

and a 2C number can be expressed as

$$Y = -y_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} y_i 2^i$$

For positive numbers, the 2C and SM notations are identical and no conversion is needed.
For negative numbers, the conversion from 2C to SM can be implemented by
complementing all the bits except the sign bit $y_{n-1}$ and adding 1 to the result. If
one assumes a uniform distribution of positive and negative numbers, then the
probability that the number has to be converted is 0.5. Although the conversion adds
some delay, it does not offset the power dissipation gain due to the SM representation
for the multiplicand. Indeed, if the multiplicand is in 2C notation one has to execute the
negation process for about 40% of all the PPs needed, and the number of negation
processes increases as the operand length increases, while the conversion from 2C to SM
takes place only once for any operand length. For the "add 1" operation, instead of using
an n-bit adder, which introduces delay and power overhead, we generate a correction term
associated with each PP and then add this correction term to all PPs through the binary
addition tree as shown in Figure 3. In this manner, only one more input for the addition
tree is added while the whole n-bit addition operation is avoided. The correction term can
be generated according to Table IV.
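
The idea of deferring the "add 1" can be illustrated with a small model. This is a sketch under assumptions, not the circuit of Figure 1: `twos_to_sm_deferred` and `partial_product` are hypothetical names, the Booth digit d in {-2, ..., 2} stands in for the decoder outputs, and the correction term is modelled as an ordinary signed integer instead of the encoding of Table IV.

```python
def to_signed(v: int, n: int) -> int:
    """Interpret the low n bits of v as a two's-complement value."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def twos_to_sm_deferred(y: int, n: int) -> tuple[int, int, int]:
    """Return (sign, inverted_low_bits, pending_one) for an n-bit 2C word.

    For a negative input only the bit inversion is performed here; the
    pending "+1" is handed on so it can be folded into the PP summation.
    """
    mask = (1 << (n - 1)) - 1
    sign = (y >> (n - 1)) & 1
    low = y & mask                   # bits y_{n-2} ... y_0
    if sign:
        low ^= mask                  # complement all bits except the sign bit
    return sign, low, sign           # a "+1" is pending only for negative inputs


def partial_product(y: int, n: int, d: int) -> tuple[int, int]:
    """Split d * Y into a PP built from the inverted bits and a correction term."""
    sign, low, pending = twos_to_sm_deferred(y, n)
    s = -1 if sign else 1
    pp = s * d * low                 # negating the PP only flips its sign
    correction = s * d * pending     # magnitude 1 for +-1Y, 2 for +-2Y, 0 otherwise
    return pp, correction


if __name__ == "__main__":
    pp, corr = partial_product(0b1101, 4, -2)   # Y = -3, Booth digit -2
    print(pp, corr, pp + corr)                  # pp + corr equals (-2) * (-3)
```

The correction is non-zero only when the multiplicand is negative and the Booth digit is non-zero, and its magnitude follows the selected multiple, which is consistent with the two correction signals described next.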

The logic for $C_1$ and $C_2$ is trivial: $C_1 = y_{n-1} \cdot 1Y$ and $C_2 = y_{n-1} \cdot 2Y$.
The block diagram, as shown in Figure 1, indicates that the 2C-to-SM conversion adds only
one inverter delay, or about 0.5 gate delay, which comes from the complementation operation
of the 2C number. The correction term does not introduce extra power overhead compared to the
traditional 2C implementation, since in the traditional 2C implementation one also
needs a similar correction term generator (adding 1) to generate the negation of the

4.2: Speeding up the PP accumulation

We have substantially reduced the ESA in the PP generation, but SM numbers
are hard to manipulate in arithmetic operations, since the signs of the operands have to
be identified separately through a sequence of decisions, costing extra control logic,
execution time and power dissipation. On the other hand, RB numbers, which are represented
in the form

$$X = \sum_{i=0}^{n-1} r_i 2^i$$

with digit $r_i \in \{1, 0, -1\}$, are more suitable for high speed parallel arithmetic
computations [1, 6]. Due to the redundancy in RB numbers, one can perform CPF addition by
selecting among different representations of the same value. Hence, we further convert the
PPs into the RB representation. We adopt the selection rule proposed by Takagi in [1]
to perform CPF addition for the PP accumulation. The rule is shown in Table VI. An example
of the CPF addition of two RB operands is shown in Figure 2. One can see that the carry is
limited to adjacent digits and there is no global carry propagation.
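
A minimal software model of a carry-propagation-free RB addition is sketched below. It uses the commonly cited form of Takagi's rule, in which the intermediate sum and carry at position i are chosen by looking at the digits at positions i and i-1 only; since Table VI is not reproduced in this text, the exact table entries used here are an assumption. Digits are stored least significant first, and `rb_value` and `rb_add` are hypothetical names.

```python
def rb_value(digits: list[int]) -> int:
    """Integer value of an RB number given as digits in {-1, 0, 1}, LSB first."""
    return sum(d * 2 ** i for i, d in enumerate(digits))


def rb_add(x: list[int], y: list[int]) -> list[int]:
    """Carry-propagation-free addition of two equal-length RB numbers.

    Position i looks only at digits i and i-1, so every output digit can be
    computed in parallel with a delay independent of the word length.
    """
    n = len(x)
    assert len(y) == n
    carry = [0] * (n + 1)        # carry[i + 1] is produced by position i
    interim = [0] * n
    for i in range(n):
        total = x[i] + y[i]
        # True when neither lower digit is negative, so the incoming
        # carry can only be 0 or +1.
        lower_nonneg = i == 0 or (x[i - 1] >= 0 and y[i - 1] >= 0)
        if total == 2:
            carry[i + 1], interim[i] = 1, 0
        elif total == 1:
            carry[i + 1], interim[i] = (1, -1) if lower_nonneg else (0, 1)
        elif total == -1:
            carry[i + 1], interim[i] = (0, -1) if lower_nonneg else (-1, 1)
        elif total == -2:
            carry[i + 1], interim[i] = -1, 0
        # total == 0 leaves both the carry and the interim sum at 0
    # Final digit = interim sum + carry from the next lower position.
    return [interim[i] + carry[i] for i in range(n)] + [carry[n]]


if __name__ == "__main__":
    a = [1, -1, 0, 1]            # 1 - 2 + 8 = 7
    b = [1, 1, -1, 0]            # 1 + 2 - 4 = -1
    s = rb_add(a, b)
    print(s, rb_value(s))        # the value is 7 + (-1) = 6
```

The choice between the two encodings of a sum of +1 or -1 guarantees that adding the incoming carry never pushes a digit outside {-1, 0, 1}, which is why no carry travels further than one position.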

The conversion from SM to RB can be carried out as follows. As the RB representation
uses the digit set $\{-1, 0, 1\}$, one needs two bits $r_i^l r_i^r$ to represent one digit
$r_i$. If we use an SM coding to represent an RB digit, that is, $r_i^l$ to represent the
sign and $r_i^r$ to represent the magnitude, we can easily convert an SM number into an RB
number. For an SM number $X = x_{n-1} x_{n-2} \ldots x_i \ldots x_1 x_0$, the sign of the
number is decided by the sign bit $x_{n-1}$. Therefore, we can group the sign bit $x_{n-1}$
with each of the remaining bits in a pair-by-pair fashion,
$(x_{n-1}, x_{n-2}), (x_{n-1}, x_{n-3}), \ldots, (x_{n-1}, x_1), (x_{n-1}, x_0)$, and
interpret the pairs according to the SM coding rule shown in Table V. Clearly, we do not
need any operations except some wiring.
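
The wiring-only nature of this conversion can be seen in a short sketch. The digit coding used here, (sign, magnitude) with (0, 1) = +1, (1, 1) = -1 and magnitude 0 meaning the digit 0 regardless of sign, is an assumption standing in for Table V, which is not reproduced in this text. `sm_to_rb` and `rb_digit_value` are hypothetical names.

```python
def sm_to_rb(sign: int, magnitude_bits: list[int]) -> list[tuple[int, int]]:
    """Pair the sign bit with every magnitude bit (LSB first).

    No logic gates are modelled here: the sign wire is simply fanned out
    to each digit position.
    """
    return [(sign, bit) for bit in magnitude_bits]


def rb_digit_value(pair: tuple[int, int]) -> int:
    """Decode one (sign, magnitude) pair into a digit in {-1, 0, 1}."""
    sign, magnitude = pair
    if magnitude == 0:
        return 0                 # assumed: (1, 0) is read as zero as well
    return -1 if sign else 1


if __name__ == "__main__":
    # SM number with sign = 1 and magnitude 0101 (LSB first: 1, 0, 1, 0) is -5
    rb = sm_to_rb(1, [1, 0, 1, 0])
    print(rb, sum(rb_digit_value(p) * 2 ** i for i, p in enumerate(rb)))
```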

4.3: Converting the RB number into 2C number

The summation of the PPs is in RB form and has to be converted back into 2C
form. This conversion is carried out easily in the following manner: from Table V, every
digit $x_{RB,i} = (r_i^l r_i^r)$ of the RB number $X_{RB}$ is composed of two bits. The left
bit $r_i^l$ represents the sign and the right bit $r_i^r$ represents the magnitude. One can
easily form a number $X_{RB}^+$ from the positive digits of $X_{RB}$, and form another number
$X_{RB}^-$ from the negative digits of $X_{RB}$. Then, subtracting $X_{RB}^-$ from
$X_{RB}^+$, one gets the result in 2C form. The process can be implemented using a fast
adder. Since a fast adder is essential for all multiplication algorithms to produce the
final result, the RB-to-2C conversion does not introduce any extra overhead.
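
The RB-to-2C step amounts to one ordinary subtraction, as the following sketch shows. It collects the positive digits into one binary word and the negative digits into another, then computes the positive word minus the negative word, truncated to the output width. The function name and the `width` parameter are illustrative assumptions.

```python
def rb_to_twos_complement(digits: list[int], width: int) -> int:
    """Convert an RB number (digits in {-1, 0, 1}, LSB first) to a
    width-bit two's-complement word."""
    plus = sum(1 << i for i, d in enumerate(digits) if d == 1)    # X+
    minus = sum(1 << i for i, d in enumerate(digits) if d == -1)  # X-
    # One conventional subtraction (a fast adder in hardware) finishes the job.
    return (plus - minus) & ((1 << width) - 1)


if __name__ == "__main__":
    # digits (LSB first) 1, 0, -1, 1 represent 1 - 4 + 8 = 5
    print(bin(rb_to_twos_complement([1, 0, -1, 1], 8)))
```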

4.4: The algorithm and its VLSI architecture

Step 1: Convert the multiplicand from 2C into the SM representation and keep the
multiplier in 2C form.
Step 2: Apply the radix-4 Booth's algorithm to generate all the PPs represented in SM
notation.
Step 3: Convert all the partial products from SM into RB representation.
Step 4: Sum up all the PPs through an RB adder tree.
Step 5: Convert the final result from RB into 2C notation.

The corresponding VLSI architecture for the algorithm is shown in Figure 3. It is
composed of two major parts: the PP generator and the redundant binary addition tree.
The key components in this architecture are: the RB adder in the
addition tree and the Booth decoder in the PP generator.
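
The five steps can be tied together in a behavioural sketch. This is a software model written for illustration, not the Verilog implementation of this project: all function names are hypothetical, the SM magnitude in Step 1 is computed arithmetically instead of with the deferred correction term, the PPs are accumulated one after another instead of through a tree, and the RB addition rule is the commonly cited form of Takagi's rule.

```python
def to_signed(v: int, n: int) -> int:
    """Interpret the low n bits of v as a two's-complement value."""
    return v - (1 << n) if (v >> (n - 1)) & 1 else v


def booth_digits(x: int, n: int) -> list[int]:
    """Radix-4 Booth digits of an n-bit 2C multiplier, least significant first."""
    bits = [(x >> i) & 1 for i in range(n)]
    if n % 2:
        bits.append(bits[-1])                  # sign extension for odd n
    bits = [0] + bits                          # x_{-1} = 0
    return [-2 * bits[2 * k + 2] + bits[2 * k + 1] + bits[2 * k]
            for k in range((len(bits) - 1) // 2)]


def rb_add(a: list[int], b: list[int]) -> list[int]:
    """Carry-propagation-free RB addition (LSB first, top carry dropped)."""
    n = len(a)
    carry, interim = [0] * (n + 1), [0] * n
    for i in range(n):
        t = a[i] + b[i]
        lower_ok = i == 0 or (a[i - 1] >= 0 and b[i - 1] >= 0)
        if abs(t) == 2:
            carry[i + 1], interim[i] = t // 2, 0
        elif t == 1:
            carry[i + 1], interim[i] = (1, -1) if lower_ok else (0, 1)
        elif t == -1:
            carry[i + 1], interim[i] = (0, -1) if lower_ok else (-1, 1)
    return [interim[i] + carry[i] for i in range(n)]


def multiply_mixed(x: int, y: int, n: int) -> int:
    """Multiply two n-bit 2C words; returns the 2n-bit 2C product."""
    width = 2 * n + 2                          # working width with guard digits
    # Step 1: multiplicand to sign-magnitude, multiplier stays in 2C.
    y_val = to_signed(y, n)
    sign, mag = (1, -y_val) if y_val < 0 else (0, y_val)
    acc = [0] * width
    for k, d in enumerate(booth_digits(x, n)):
        # Step 2: PP in SM form; a negative Booth digit only flips the sign bit.
        pp_sign = sign ^ (d < 0)
        pp_mag = (mag * abs(d)) << (2 * k)
        # Step 3: SM to RB by pairing the sign with every magnitude bit.
        digit = -1 if pp_sign else 1
        pp_rb = [digit * ((pp_mag >> i) & 1) for i in range(width)]
        # Step 4: RB accumulation (sequential here, a tree in hardware).
        acc = rb_add(acc, pp_rb)
    # Step 5: RB to 2C with a single subtraction X+ - X-.
    plus = sum(1 << i for i, r in enumerate(acc) if r == 1)
    minus = sum(1 << i for i, r in enumerate(acc) if r == -1)
    return (plus - minus) & ((1 << (2 * n)) - 1)


if __name__ == "__main__":
    product = multiply_mixed(0xFFFFFFFE, 3, 32)     # (-2) * 3
    print(to_signed(product, 64))
```

Dropping the top carry in `rb_add` is harmless in this model because the working width exceeds the 2n bits of the final product, so the discarded weight vanishes when the result is truncated in Step 5.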


RESULTS (Snapshots of the RTL schematic)

1) Multiplier Interface

2) Partial Product generator

3) First stage of the adder

4) Schematic of final stage of adder

5) Schematic of one bit adder (sum and carry)

6) Layout (in tsmc018)




This architecture was chosen with low power as the main objective. All the stages of the
above architecture have been coded in Verilog HDL, and special care was taken in the
implementation to meet this objective. All the modules involved were verified functionally.
After testing, logic synthesis was carried out, and the delay of each stage was calculated
from the synthesis results. The whole design has been synthesized in tsmc018
technology and the delay obtained is around 16 ns. Further, a semi-custom layout of the
design has been done in Autocell.

Power versus delay optimization is the main aim of any design, and various techniques can
be applied to achieve it. Since the whole design is modular, with a single one-bit adder
repeated throughout the adder tree, optimizing this adder can increase the
speed. To this end, transmission gate designs can be exploited further to design a fast RB
adder, which could not be done here because of technology library constraints.
Various circuit-level power reduction techniques can also be applied to further reduce
the power consumption.



[1] N. Takagi, et al., "High-Speed VLSI Multiplication Algorithm with a Redundant
Binary Addition Tree," IEEE Trans. on Computers, Vol. C-34, No. 9, pp. 789-796,
September 1985.
[2] H. Makino, et al., "An 8.8-ns 54x54-bit Multiplier Using New Redundant Binary
Architecture," Proceedings of 1993 International Conference on Computer Design,
Cambridge, MA, USA, pp. 202-205, October 3-6, 1993.
[3] X. Huang, et al., "A High-Performance CMOS Redundant Binary Multiplication-and-
Accumulation (MAC) Unit," IEEE Trans. on Circuits and Systems-I: Fundamental Theory
and Applications, Vol. 41, No. 1, pp. 33-39, January 1994.
[4] C. Wallace, "A Suggestion for a Fast Multiplier," IEEE Trans. on Electronic
Computers, Vol. EC-13, pp. 14-17, February 1964.
[5] L. P. Rubinfield, "A Proof of the Modified Booth's Algorithm for Multiplication,"
IEEE Trans. on Computers, Vol. C-24, No. 10, pp. 1014-1015, October 1975.
[6] A. Avizienis, "Signed-Digit Number Representations for Fast Parallel Arithmetic,"
IRE Trans. on Electronic Computers, Vol. EC-10, pp. 389-400, September 1961.
[7] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems
Perspective, 2nd Edition, pp. 555, Addison-Wesley Publishing Company, 1993.