International Journal of Advanced Technology & Engineering Research (IJATER)

ISSN NO: 2250-3536 VOLUME 2, ISSUE 2, MAY2012 59

Sarita Chouhan1, Rajasthan Technical University, Kota, Rajasthan India
Yogesh Kumar2, Rajasthan Technical University, Kota, Rajasthan India


There are different entities that one would like to optimize
when designing a VLSI circuit. Designing an integrated circuit
that is simultaneously efficient in power, area, and speed
has become a very challenging problem.
Power dissipation is a critical parameter in modern VLSI
design. Multiplication occurs frequently in finite
impulse response (FIR) filters, fast Fourier transforms,
discrete cosine transforms, and convolution. To save
significant power in a VLSI design, a good direction is
to reduce its dynamic power, which is the
major part of the total power dissipation. Here, we propose
the design of an FIR filter using a high-speed, low-power
multiplier that adopts a new implementation approach. The
multiplier is designed using a modified Booth algorithm,
which is controlled by a detection unit using an AND gate,
and a carry save adder. The modified Booth algorithm
reduces the number of partial products generated by a
factor of 2. The carry save adder avoids unwanted
additions and thus minimizes the switching power
dissipation. The proposed high-speed, low-power
multiplier can attain a 30 percent speed improvement and
a 22 percent power reduction with the modified Booth
algorithm when compared with the conventional array
multiplier.
Index key: Multipliers, Modified Booth's algorithm,
Spartan-3E FPGA, VHDL
1. Introduction
Multipliers play an important part in today's digital signal
processing (DSP) systems. Examples of their use occur in
implementations of recursive and transversal filters,
discrete Fourier transforms, correlation, and range
measurement; in most of these cases a multiplier unit
designed for the specific purpose is sufficient. Multipliers
have large area, long latency and consume considerable
power. Therefore, low-power multiplier design has been
an important part of low-power VLSI system design. The
main research hypothesis of this work is that high-level
optimization of multiplier designs produces more
power-efficient solutions than optimization only at low levels.
Specifically, we consider how to optimize the internal
algorithm and architecture of multipliers and how to
control active multiplier resources to match external data
characteristics. The primary objective is power reduction
with small area and delay overhead. By using new
algorithms or architectures, it is even possible to achieve
both power reduction and area/delay reduction, which is
another strength of high-level optimization. To meet these
requirements of smaller area, lower power
consumption and faster operation, Booth's algorithm is
used in practice.
This encoding algorithm is suitable for 2's complement
signed number multiplication.
Booth's algorithm also requires redundant partial product
generation, the so-called sign extension. In any
multiplication algorithm, the operation is decomposed into a
summation of partial products. Each partial product
represents a multiple of the multiplicand to be added to
the final result. Nowadays almost all high-speed
multipliers apply a radix-4 recoding multiplication
algorithm. In a radix-2 algorithm, we first form a series
of products between the multiplicand, Y, and every bit of
the multiplier, X, generating in this way a set of words
called partial products. Next, all the partial products are
added. We use some kind of redundant arithmetic to perform
the additions as fast as possible.
Usually the speed is increased with a Wallace reduction
tree. In the conventional Wallace tree, multi-input partial
product bits at the same bit position are successively
compressed to a final sum and carry signal pair by using a
series of single-bit full adders (also called 3-2
compressors). At the output, we have two words (sum and
carry) which have to be added as fast as possible by a
carry-propagate adder (CPA). The Wallace tree structure
is an arrangement of carry-save adders (CSA). Radix-4
multiplication improves the
multiplication algorithm because fewer partial
products enter the Wallace tree to be reduced. This is
achieved by recoding the multiplier,
changing from a 2's-complement format to a signed-digit
representation from the set {-2, -1, 0, 1, 2}.

2. Related Work

A substantial amount of research work has been put into
developing efficient architectures for multipliers, given
their widespread use and complexity. Schemes such as
bisection, Baugh-Wooley and Hwang propose the
implementation of a 2's complement architecture using
repetitive modules with uniform interconnection patterns.
However, they do not permit an efficient VLSI realization
due to the irregular tree array form used.
More regular and suitable multiplier designs based on the
Booth recoding technique have been proposed. The main
purpose of these designs is to increase the performance of
the circuit by reducing the number of partial
products. Although the Booth algorithm provides
simplicity, it is sometimes difficult to design for higher
radices due to the complexity of pre-computing an
increasing number of multiples of the multiplicand within
the multiplier unit. In the Modified Booth algorithm,
only approximately half of the partial products need to be
added. In our work, the improvement in delay and
power has the same principal source as for the Booth
architecture, the reduction of the partial product terms,
while keeping the regularity of an array multiplier. We
show that our architecture can be more naturally
extended to higher radices, using fewer logic levels and
hence presenting far fewer spurious transitions.
3. Booth Algorithm

Booth's algorithm involves repeatedly adding one of two
predetermined values A and S to a product P, then
performing a rightward arithmetic shift on P. Let m and r
be the multiplicand and multiplier, respectively; and let x
and y represent the number of bits in m and r. [4]
1. Determine the values of A and S, and the initial value
of P. All of these numbers should have a length equal to
(x + y + 1).

(a) A: Fill the most significant (leftmost) bits with the
value of m. Fill the remaining (y + 1) bits with zeros.
(b) S: Fill the most significant bits with the value of (-m)
in two's complement notation. Fill the remaining (y + 1)
bits with zeros.
(c) P: Fill the most significant x bits with zeros. To the
right of this, append the value of r. Fill the least
significant (rightmost) bit with a zero.

2. Determine the two least significant (rightmost) bits of
P.

(a) If they are 01, find the value of P + A. Ignore any
overflow.
(b) If they are 10, find the value of P + S. Ignore any
overflow.
(c) If they are 00, do nothing. Use P directly in the next
step.
(d) If they are 11, do nothing. Use P directly in the next
step.
3. Arithmetically shift the value obtained in the 2nd step
by a single place to the right. Let P now equal this new
value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is
the product of m and r.
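
The five steps above can be sketched in software as follows. This is only an illustrative Python model, not the VHDL used in this work; the function name booth_multiply and its argument order are assumptions made for the example. It also assumes that -m is representable in x bits, so the most negative x-bit value of m is excluded.

```python
def booth_multiply(m: int, r: int, x: int, y: int) -> int:
    """Multiply signed m (x bits) by signed r (y bits) using the A/S/P procedure."""
    total = x + y + 1                      # length of A, S and P
    mask = (1 << total) - 1
    # Step 1: A holds m and S holds -m in the top x bits; P holds r followed by a 0.
    A = ((m & ((1 << x) - 1)) << (y + 1)) & mask
    S = ((-m & ((1 << x) - 1)) << (y + 1)) & mask
    P = (r & ((1 << y) - 1)) << 1
    for _ in range(y):                     # Step 4: repeat y times
        low = P & 0b11                     # Step 2: two least significant bits
        if low == 0b01:
            P = (P + A) & mask             # overflow is ignored by masking
        elif low == 0b10:
            P = (P + S) & mask
        # Step 3: arithmetic right shift by one place within `total` bits
        sign = P >> (total - 1)
        P = (P >> 1) | (sign << (total - 1))
    P >>= 1                                # Step 5: drop the rightmost bit
    width = x + y
    # Interpret the remaining x + y bits as a two's complement number.
    return P - (1 << width) if P >> (width - 1) else P


print(booth_multiply(3, -4, 4, 4))   # -12
```

For m = 3 and r = -4 with x = y = 4, the model passes through P = 0000 1100 0 initially and ends with 1111 0100 after the last bit is dropped, which is -12.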

The Booth algorithm itself can be of two types: (1) the radix-2
algorithm and (2) the radix-4 algorithm. These are described
in the later sections.

4. Hardware Architecture
Field Programmable Gate Arrays (FPGAs) can be
reprogrammed as many times as needed in order to achieve the
desired result. The major design benefit lies in the
ability to test designs that might work. Prior to the
development of the FPGA, fabricating a design in order to
test it could be quite expensive and very time consuming. The use of
FPGAs in the design process allows more design
flexibility and reduces cost and development time. If the
design fails after being tested on an FPGA, the designer
can simply rework the design and download it again to the
FPGA. Use of an FPGA thus eliminates the loss in
development time caused by a faulty initial design, and
tells the designer whether or not
the design works.

4.1 Adders

4.1.1 Ripple Carry Adder

This is the simplest type of adder, but it is not very efficient
when a large number of bits is used. The delay increases
linearly with the bit length.
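
As a minimal sketch of why the delay grows linearly, the following Python model chains one full adder per bit, so each stage has to wait for the carry of the previous one. The names full_adder and ripple_carry_add and the default width of 16 bits are assumptions made for illustration.

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a: int, b: int, cin: int = 0, width: int = 16):
    """Add two unsigned `width`-bit numbers; the carry ripples from bit 0 upward."""
    carry = cin
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


print(ripple_carry_add(1234, 4321))    # (5555, 0)
print(ripple_carry_add(0xFFFF, 1))     # (0, 1)
```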

Figure 1: Block diagram of Ripple carry adder

Figure 2: Simulation Output

4.1.2 Carry Select Adder

In this scheme, blocks of bits are added in two ways:
(1) One assuming a carry-in of 0.
(2) The other assuming a carry-in of 1.
The correct result of each block is selected once its actual
carry-in is known.
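
The selection idea can be sketched as follows. This is an illustrative Python model in which the function name carry_select_add and the block size of 4 bits are assumptions; the width is assumed to be a multiple of the block size.

```python
def carry_select_add(a: int, b: int, cin: int = 0, width: int = 16, block: int = 4):
    """Each block computes both candidate sums; the incoming carry selects one."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, width, block):
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        sum0 = xa + xb          # candidate assuming a carry-in of 0
        sum1 = xa + xb + 1      # candidate assuming a carry-in of 1
        chosen = sum1 if carry else sum0   # models the output multiplexer
        result |= (chosen & mask) << lo
        carry = chosen >> block            # carry-out selects the next block
    return result, carry


print(carry_select_add(0x0FFF, 0x0001))   # (4096, 0)
```

In hardware the two candidate sums of every block are computed at the same time, so only the multiplexer delay is added per block once the carry arrives.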

Figure 3: Block diagram of Carry select adder

Figure 4: Simulation Output

4.1.3 Carry Look Ahead Adder

It can produce carries faster because the carry bits are generated in
parallel by additional circuitry whenever the inputs
change. This technique uses carry bypass logic to speed up
carry propagation.

Figure 5: Block diagram of Carry Look ahead adder

$P_i = a_i + b_i$
$G_i = a_i b_i$
$S_i = (a_i \oplus b_i) \oplus C_i$
$C_{i+1} = G_i + P_i C_i$
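
The propagate, generate, sum and carry relations above can be checked with a small Python model. The function name carry_lookahead_add is an assumption; the recurrence is evaluated in a loop here for brevity, whereas the hardware expands it into two-level logic so that all carries are available in parallel.

```python
def carry_lookahead_add(a: int, b: int, cin: int = 0, width: int = 16):
    """Add two unsigned numbers using P_i = a_i OR b_i and G_i = a_i AND b_i."""
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]   # propagate
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # generate
    c = [cin]
    for i in range(width):
        c.append(g[i] | (p[i] & c[i]))                      # C_{i+1} = G_i + P_i C_i
    s = 0
    for i in range(width):
        s |= (((a >> i) ^ (b >> i) ^ c[i]) & 1) << i        # S_i = a_i xor b_i xor C_i
    return s, c[width]


print(carry_lookahead_add(0x00FF, 0x0001))   # (256, 0)
```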

Figure 6: Simulation Output

4.1.4 Sixteen Bit Full Adder

This is simply a 16-bit full adder with two
16-bit inputs and one carry-in, and a 16-bit sum output with
a single-bit carry-out.

Figure 7: Simulation Output

4.1.5 Carry Save Adder

First we compute the sum ignoring any carries, and
separately we compute the carry on a column-by-column
basis. The final sum is then computed by adding 's' and 'c' in
the final stage of addition.
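
A minimal Python sketch of this idea for three operands is given below. The function name carry_save_add is an assumption; 's' and 'c' correspond to the sum and carry words described above.

```python
def carry_save_add(x: int, y: int, z: int):
    """Reduce three numbers to a sum word and a carry word without carry propagation."""
    s = x ^ y ^ z                                # per-column sum, carries ignored
    c = ((x & y) | (y & z) | (x & z)) << 1       # per-column carry, moved one place left
    return s, c


s, c = carry_save_add(5, 6, 7)
print(s, c, s + c)   # 4 14 18
```

Only the last step, s + c, needs a conventional carry-propagate addition.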

Figure 8: Block Diagram of Carry Save adder

Figure 9: Simulation Output

4.1.6 Carry Skip Adder

It is used to speed up operation: carry propagation is skipped to
position i without waiting for rippling. The operation time
varies according to the operands, as in carry-complete
addition. To implement a carry-skip adder, the stages are divided
into blocks.
Carry-skip logic is added to each block to detect when the
carry-in to the block can be passed directly to the next block.
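
The block-level skip condition can be sketched as follows. This Python model is illustrative only; the function name carry_skip_add and the 4-bit block size are assumptions. A block forwards its carry-in directly when every bit position in it propagates, that is, when a_i xor b_i = 1 for all bits of the block.

```python
def carry_skip_add(a: int, b: int, cin: int = 0, width: int = 16, block: int = 4):
    """Add two unsigned numbers; a block whose bits all propagate bypasses its carry-in."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, width, block):
        xa, xb = (a >> lo) & mask, (b >> lo) & mask
        propagate_all = (xa ^ xb) == mask       # skip condition for this block
        block_sum = xa + xb + carry
        result |= (block_sum & mask) << lo
        if not propagate_all:
            carry = block_sum >> block          # carry produced by rippling inside the block
        # otherwise the carry-in is passed unchanged to the next block
    return result, carry


print(carry_skip_add(0x0F0F, 0x00F1))   # (4096, 0)
```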

Figure 10: Block Diagram of Carry Skip Adder

Figure 11: Simulation Output

4.2 Multipliers

4.2.1 Array Multiplier

A binary multiplier is an electronic hardware device used
in digital electronics, computers and other electronic
devices to perform rapid multiplication of two numbers in
binary representation. It is built using binary adders.
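
As a rough illustration, an array multiplier forms one partial product per multiplier bit (an AND of that bit with every multiplicand bit) and sums the rows with adders. The Python model below follows that structure for unsigned operands; the function name array_multiply and the default width of 8 bits are assumptions.

```python
def array_multiply(a: int, b: int, width: int = 8) -> int:
    """Unsigned multiply: one AND-generated partial product row per bit of b."""
    a &= (1 << width) - 1
    product = 0
    for j in range(width):
        bj = (b >> j) & 1
        row = (a if bj else 0) << j    # a_i AND b_j for all i, weighted by 2^j
        product += row                 # one adder row per partial product
    return product


print(array_multiply(13, 11))   # 143
```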

Figure 12: Block Diagram of Array Multiplier

Figure 13: Simulation Output

4.2.2 Booth Multiplier

Booth multiplication is a technique that allows for
smaller, faster multiplication circuits by recoding the
numbers that are multiplied. It is the standard technique
used in chip design, and provides significant
improvements over the long multiplication technique.

The Booth multiplier is widely used in ASIC-oriented
products due to its higher computing speed and smaller
area. This encoding technique has two advantages: (a)
only about half of the partial products are needed during the
computation, that is, the number of partial products is
reduced by a factor of 2; (b) the delay on the critical path is less
than that of the Baugh-Wooley multiplier.

Radix 2 Multiplier

This technique allows for smaller, faster
multiplication circuits by recoding the numbers that
are multiplied.
Only about half of the partial products are needed during the
computation, that is, the number of partial products is reduced by a
factor of 2. It has two major drawbacks:

1) The number of add/subtract operations becomes variable,
which is inconvenient for a parallel multiplier.

2) It is inefficient when there are isolated 1's.

Figure 14: Block diagram of Radix 2 multiplier

Figure 15: Simulation output

Radix 4 Multiplier

These multiplication schemes handle more than one bit of
the multiplier in each cycle. A higher representation radix
leads to fewer digits. Thus, a digit-at-a-time multiplication
algorithm requires fewer cycles as we move to higher
radices, which means fewer partial products. The
reduction in the number of cycles, along with the use of
recoding and carry-save adders, leads to significant gains
in speed over basic multipliers. [6]

Booth's algorithm was proposed decades ago, and MacSorley
proposed a modification of it a decade later. The modified Booth's
algorithm (radix-4 recoding) starts by appending a zero to
the right of $x_0$ (the multiplier LSB). Triplets are taken
beginning at position $x_{-1}$ and continuing to the MSB, with
one bit overlapping between adjacent triplets. If the
number of bits in X (excluding $x_{-1}$) is odd, the sign
(MSB) is extended one position to ensure that the last
triplet contains 3 bits. In every step we get a signed
digit that multiplies the multiplicand to generate a
partial product entering the Wallace reduction tree. The
meaning of each triplet can be seen in Table 1:

Table 1: Radix 4
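
The triplet scan described above can be sketched in Python as follows. The function name booth_radix4_digits is an assumption made for this example, and the digit value x_{i-1} + x_i - 2 x_{i+1} is the standard radix-4 Booth encoding of each overlapping triplet.

```python
def booth_radix4_digits(x: int, n: int):
    """Radix-4 Booth digits of the n-bit two's complement value x, least significant first."""
    if n % 2:
        n += 1        # odd length: extend the sign by one position
    # bits[0] is the appended zero x_{-1}; bits[i + 1] is x_i.
    # Python's >> on negative integers sign-extends, which supplies the extended MSB.
    bits = [0] + [(x >> i) & 1 for i in range(n)]
    digits = []
    for i in range(0, n, 2):          # triplets overlap by one bit
        lo, mid, hi = bits[i], bits[i + 1], bits[i + 2]
        digits.append(lo + mid - 2 * hi)
    return digits


print(booth_radix4_digits(6, 4))    # [-2, 2]   since -2 + 2*4 = 6
print(booth_radix4_digits(-3, 4))   # [1, -1]   since 1 - 1*4 = -3
```

Each digit lies in {-2, -1, 0, 1, 2}, and an n-bit multiplier yields n/2 digits, that is, half as many partial products.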

This recoding scheme, applied to a parallel multiplier,
halves the number of partial products, so the multiplication
time and the hardware requirements decrease. [2]

Modified Radix 4 Multiplier using Carry Save Adder

One of the major speed enhancement techniques used in
modern digital circuits is the ability to add numbers with
minimal carry propagation. Carry save adders are one of
the most widely used techniques for fast arithmetic.
In this work, we use carry save adders in the partial
product lines of the new multiplier in order to
speed up the carry propagation along the array. In this
multiplier, a new approach is used to handle operands in
2's-complement with exactly the same structure as an
array multiplier, with the same unsigned bit products for
all the bits except those that involve a sign bit. The
regularity of this multiplier makes it suitable for the
application of carry save adders, since their ability to
combine three or four numbers into two, in a time that is
independent of the width of the numbers, is a much more
efficient alternative than using traditional ripple carry
adders.
Carry-save adders (CSA) can be used to reduce the
number of addition cycles as well as to make each cycle
faster. A row of binary full adders (FA) is used as a mechanism to
reduce three numbers to two numbers, rather than finding
a single sum. A carry save adder is very fast because it
simply outputs the carry bits instead of propagating them
to the left. As will be presented in the next section, we
apply carry save adders in the partial product lines of an
array multiplier circuit in order to speed up the carry
propagation along the array. [3]
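
To illustrate how radix-4 recoding and carry-save accumulation fit together, the following Python model accumulates the recoded partial products in sum/carry form and performs a single carry-propagate addition at the end. It is a behavioural sketch under stated assumptions, not the circuit of Figure 16: the function name radix4_csa_multiply is hypothetical, n is assumed even, and all words are kept to 2n bits.

```python
def radix4_csa_multiply(a: int, b: int, n: int = 8) -> int:
    """Signed n-bit multiply: radix-4 Booth digits, partial products kept in carry-save form."""
    assert n % 2 == 0
    mask = (1 << (2 * n)) - 1
    s, c = 0, 0            # running sum word and carry word
    prev = 0               # b_{-1} = 0
    for i in range(0, n, 2):
        b0, b1 = (b >> i) & 1, (b >> (i + 1)) & 1
        digit = prev + b0 - 2 * b1            # recoded digit in {-2, -1, 0, 1, 2}
        prev = b1
        pp = ((digit * a) << i) & mask        # partial product in 2n-bit two's complement
        # 3:2 compression of (s, c, pp); no carry is propagated here
        s, c = s ^ c ^ pp, (((s & c) | (c & pp) | (s & pp)) << 1) & mask
    total = (s + c) & mask                    # the only carry-propagate addition
    return total - (1 << (2 * n)) if total >> (2 * n - 1) else total


print(radix4_csa_multiply(-7, 5, 4))   # -35
```

Only n/2 compression steps are needed, each with a delay independent of the word width.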

Figure 16: Radix 4 Using Carry Save Adder

Figure 17: Simulation Output
Consider the multiplication of two 2's-complement
integers with an n-bit multiplicand A and an n-bit multiplier B.

$P_i$ denotes the i-th output product bit. Note that $a_i$
and $b_i$ indicate the data bits of the multiplicand and multiplier,
respectively. Assume n is even and the n-bit multiplier B
can be rewritten as

where $b_{-1} = 0$. Note that the terms in the bracket in Eq. (4)
have values of (-2, -1, 0, 1, 2). Each recoded value
performs a certain operation on the multiplicand A, and
multiple additions at each stage are then
required in order to generate the correct partial product. It
is worth mentioning that the operation -A can be
realized by inverting the multiplicand and adding
1 at the least significant bit. Substituting Eq. (4) into
Eq. (1), we can obtain Eq. (5) as

and it is known that the scanning of triplets begins from
$b_{-1}$ to the MSB with one-bit overlapping. Thus, only
n/2 partial-product rows need to be computed.
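
For reference, the standard radix-4 Booth recoding identity is consistent with the description of the bracketed terms and of $b_{-1} = 0$ given above. It is stated here under the assumption that it matches the intended Eq. (4) and Eq. (5); the index ranges are the usual ones for even n and are not taken from the original equations.

```latex
\[
B = -b_{n-1} 2^{n-1} + \sum_{i=0}^{n-2} b_i 2^i
  = \sum_{i=0}^{n/2-1} \left( b_{2i-1} + b_{2i} - 2 b_{2i+1} \right) 2^{2i},
  \qquad b_{-1} = 0
\]
\[
P = A \times B
  = \sum_{i=0}^{n/2-1} \left( b_{2i-1} + b_{2i} - 2 b_{2i+1} \right) 2^{2i} A
\]
```

For n = 4 the first sum expands to $b_0 - 2b_1 + 4(b_1 + b_2 - 2b_3) = b_0 + 2b_1 + 4b_2 - 8b_3$, which is the 2's-complement value of B.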

4.2.3 Wallace Tree Multiplier

A Wallace tree is an efficient hardware implementation of
a digital circuit that multiplies two integers. The Wallace tree (WT)
multiplier sums all the bits of the same weight in a
tree rather than completely adding the partial products in
pairs. Full adder (FA) and half adder (HA) cells are used
to add three or two equally weighted bits, respectively, to
produce two bits: the sum bit, with a weight equal to that
of the operands, and the carry bit, with a weight one position
higher than that of the operands. The height of the WT is
reduced by a factor of 3:2 whenever an FA is used. The
final tree is composed of as many levels of FA and HA
cells as are necessary to reduce the height of the tree to 2.
The hardware synthesis process for a WT multiplier
mainly consists of two steps. The first step is to arrange
the partial product bits as the initial WT structure, as
shown in Fig. 2 for the case of a 4x4 multiplier with
operands (a3; a2; a1; a0) and (b3; b2; b1; b0). Secondly, a
series of FA and HA transformations is applied to the
WT structure until the tree height is reduced to 2. At this
point, any n-bit conventional adder may be used to add
the remaining two n-bit rows of the tree to get the final
multiplication result.
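
The two synthesis steps can be sketched with a column-wise Python model for unsigned operands. The function name wallace_multiply and the list-of-columns representation are assumptions made for illustration; the grouping of bits into FA and HA cells is one simple choice among several possible ones.

```python
def wallace_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n x n multiply: FA/HA layers reduce every column to at most two bits."""
    # Step 1: columns[w] holds the partial product bits a_i AND b_j of weight 2^w.
    columns = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    # Step 2: apply layers of FA and HA cells until the tree height is 2.
    while max(len(col) for col in columns) > 2:
        nxt = [[] for _ in range(2 * n + 1)]
        for w, col in enumerate(columns):
            k = 0
            while len(col) - k >= 3:              # full adder: three bits in, sum and carry out
                x, y, z = col[k:k + 3]
                nxt[w].append(x ^ y ^ z)
                nxt[w + 1].append((x & y) | (y & z) | (x & z))
                k += 3
            if len(col) - k == 2:                 # half adder: two bits in
                x, y = col[k:k + 2]
                nxt[w].append(x ^ y)
                nxt[w + 1].append(x & y)
            elif len(col) - k == 1:               # a single bit passes through
                nxt[w].append(col[k])
        columns = nxt[:2 * n]                     # the product fits in 2n bits
    # Final stage: a conventional adder sums the two remaining rows.
    row0 = sum(col[0] << w for w, col in enumerate(columns) if len(col) > 0)
    row1 = sum(col[1] << w for w, col in enumerate(columns) if len(col) > 1)
    return row0 + row1


print(wallace_multiply(13, 11))   # 143
```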

Figure 18: Block Diagram of Wallace Tree Multiplier

Figure 19: Simulation Output

4.3 FIR Filter

FIR filters with variable-length taps have been
widely used in many application fields. The filter uses a memory
together with an address generation unit and a modulo unit to
access the memory in a circular manner. A simple FIR filter
is described by a convolution operation.

Figure 20: Block Diagram of FIR Filter

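Difference equation

$y(n) = \sum_{k=0}^{N-1} B_k \, x(n-k)$

where $B_k$ is the set of filter coefficients, N is the number of
taps, x(n) is the input sample and y(n) is the output sample.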
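
The convolution that defines a direct-form FIR filter, y[n] = sum over k of b[k] x[n-k], can be sketched in Python as below. The function name fir_filter is an assumption, and input samples before n = 0 are taken as zero.

```python
def fir_filter(x, b):
    """Direct-form FIR: y[n] = sum_k b[k] * x[n - k], with x[n] = 0 for n < 0."""
    y = []
    for n in range(len(x)):
        acc = 0
        for k, bk in enumerate(b):
            if n - k >= 0:
                acc += bk * x[n - k]     # one multiply-accumulate per tap
        y.append(acc)
    return y


print(fir_filter([1, 0, 0, 0], [1, 2, 3]))   # [1, 2, 3, 0]  (impulse response = coefficients)
print(fir_filter([1, 2, 3, 4], [1, 1]))      # [1, 3, 5, 7]
```

Every output sample costs one multiplication per tap, which is why the multiplier dominates the power and delay of the filter.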

5. Simulations and Result

In this work we evaluate the performance of the
proposed FIR filter using a low-power
multiplier by comparing the Radix 4 multiplier with the
other multipliers. These multipliers are
implemented in VHDL. To obtain the
power and delay reports, we synthesize these
multipliers using Xilinx and ModelSim. The simulation result
for the FIR filter using the array multiplier is given in Figure 21.

Figure 21: Simulation Output Of FIR Filter

The comparison of the synthesis reports for the different adders
is given in Table 2 below.

Table 2: Analysis of adders

16 bit   e carry
No. of 18 18 16 18 16 16
No. of 4 input 31 32 28 32 32 32
No. of 49 50 42 81 32 32
No. of 49 50 42 81 32 32
10.448 12.9
y (in
( in W) 0.144 0.140 0.144 0.151 0.13
4.742 4.838 4.095 1.583 1.79

The comparison of the synthesis reports for the different
multipliers is given in Table 3 below.

Table 3: Analysis of multipliers

Radix 2   Radix 4
No. of slices 76 55 14 192
No. of 4 input LUTs 133 99 24 384
No. of 32 34 27 90
No. of 32 34 27 90
Delay 32.237 18.726 13.798 286.487
Memory (in
Power (in Watt) 0.143 0.140 0.143 0.155
Power delay product 4.609 2.621 1.973 44.405
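
The power delay product row of Table 3 can be checked by simple arithmetic. The sketch below assumes that the row of values 32.237, 18.726, 13.798 and 286.487 is the delay of each multiplier and that the row marked "(in Watt)" is its power; the variable names are chosen for this example only.

```python
power_w = [0.143, 0.140, 0.143, 0.155]            # row marked "(in Watt)"
delay = [32.237, 18.726, 13.798, 286.487]         # row assumed to be the delay
reported_pdp = [4.609, 2.621, 1.973, 44.405]      # row marked "Power delay"

# Power delay product = power x delay, one value per multiplier column
pdp = [p * d for p, d in zip(power_w, delay)]

for computed, reported in zip(pdp, reported_pdp):
    print(f"{computed:.3f} (reported {reported})")

best = pdp.index(min(pdp))
print("column with the smallest power delay product:", best + 1)
```

Under this reading each product agrees with the reported value to within rounding, and the third column has the smallest power delay product.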

6. Conclusion

Here, we proposed a novel design for an FIR filter using a
parallel multiplier with a carry save adder (Modified Booth
Algorithm). The implementation of the algorithm is presented
with its architecture and logic design, and the
speed, power and complexity of the design are compared
with those of other designs. The proposed design is also an
area-efficient multiplier; the decrease in area consequently
reduces the cost. The delay is reduced and
the processing speed is increased relative to
other conventional techniques. The proposed activity
evaluation method leads to low power consumption
estimates and very fast estimation times.
Finally, we calculated and compared the power delay
product of various adders and multipliers and concluded
that, for a minimal-power FIR filter design, the Radix-4
multiplier using a carry save adder should be used, as the
power delay product of the carry save adder and the Radix-4
multiplier is the minimum compared to the other adders and
multipliers.

[1] A. Tisserand, "Automatic generation of low-power
circuits for the evaluation of polynomials," in Proc. 40th
Asilomar Conference on Signals, Systems and
Computers. Pacific Grove, California, U.S.A.: IEEE, Oct.
2006, pp. 2053-2057.

[2] Zhijun Huang, "High level optimization techniques for
low power multiplier design," University of California, Los
Angeles, 2003.

[3] L. Ciminiera, P. Montuschi, "Carry-Save
Multiplication Schemes Without Final Addition," IEEE
Transactions on Computers, vol. 45, no. 9, Sep. 1996.

[4] A. D. Booth, "A signed binary multiplication
technique," Quart. J. Mechanical and Applied Math.,
vol. 4, pp. 235-240, 1951.

[5] W.-C. Yeh and C.-W. Jen, "High-speed Booth encoded
parallel multiplier design," IEEE Transactions on
Computers, vol. 49, pp. 692-701, 2000.

[6] B. S. Cherkauer, E. G. Friedman, "A Hybrid
Radix-4/Radix-8 Low Power Signed Multiplier Architecture,"
IEEE Transactions on Circuits and Systems, vol. 44, no. 8,
Aug. 1997.

[7] H. Lee, "A power-aware scalable pipelined Booth
multiplier," in Proc. IEEE Int. SOC Conf., 2004, pp. 123-

[8] Tisserand, "Low-power arithmetic operators," in Low
Power Electronics Design, C. Piguet, Ed. CRC Press,
Nov. 2004, ch. 9.

[9] Nagendra, M. J. Irwin, and R. M. Owens,
"Area-time-power tradeoffs in parallel adders," IEEE
Transactions on Circuits and Systems-II: Analog and
Digital Signal Processing, vol. 43, no. 10, pp. 689-702,
Oct. 1996.

[10] Z. Huang and M. D. Ercegovac, "High-performance
low-power left-to-right array multiplier design," IEEE
Transactions on Computers, vol. 54, no. 3, pp. 272-283,
Mar. 2005.

[11] P. F. Stelling, C. U. Mattel, V. G. Oklobdzija, and R.
Ravi, "Optimal Circuits for Parallel Multipliers," IEEE
Transactions on Computers, vol. 47, no. 3, pp. 273-285, 1998.

[12] W. C. Yeh and C. W. Jen, "High-Speed Booth
Encoded Parallel Multiplier Design," IEEE Transactions
on Computers, vol. 49, no. 7, pp. 692-701, 2000.

[13] J. A. Gibson and R. W. Gibbard, "Synthesis and
Comparison of Two's Complement Parallel Multipliers,"
IEEE Transactions on Computers, vol. C-24,
pp. 1020-1027, October 1975.

[14] C. S. Wallace, "Suggestion for a Fast Multiplier,"
IEEE Transactions on Electronic Computers, vol. EC-13,
pp. 14-17, 1964.

[15] W. Gallagher and E. Swartzlander, "High Radix
Booth Multipliers Using Reduced Area Adder Trees," in
Twenty-Eighth Asilomar Conference on Signals, Systems
and Computers, vol. 1, pp. 545-549, 1994.

[16] E. Costa, J. Monteiro, and S. Bampi, "A New
Architecture for Signed Radix 2 Pure Array Multipliers,"
in IEEE International Conference on Computer Design,
pp. 112-117, 2002.

[17] K.H. Tsoi, P.H.W. Leong, "Mullet - a parallel
multiplier generator," in International
Conference on Field Programmable Logic and
Applications, 2005, pp. 691-694.

[18] S. Tahmasbi Oskuii, P. G. Kjeldsberg, and O.
Gustafsson, "Transition activity aware design of
reduction-stages for parallel multipliers," in Proc. 17th
Great Lakes Symp. on VLSI, March 2007, pp. 120-125.

[19] Ayman A. Fayed, Magdy A. Bayoumi, "A Novel
Architecture for Low-Power Design of Parallel
Multipliers," in IEEE Computer Society
Workshop on VLSI, 2001, pp. 0149.


SARITA CHOUHAN received her B. Eng. degree in
Electronics & Communication from Rajeev Gandhi
Proudyogiki Vishwavidyalaya, Bhopal, India in 2002 and her
M.Tech degree in VLSI design from Mewar University,
Chittorgarh, Rajasthan, India. Currently she is an
Assistant Professor in the Electronics and Communication
department at Manikya Lal Verma Govt. Textile & Engg.
College, Bhilwara, Rajasthan, India. Her areas of interest
are applied electronics, microelectronics, VLSI, VHDL,
Verilog, EDA, analog CMOS design and low-power
optimization. She has authored the textbook EDA and
Logic Synthesis, S.K. Kataria and Sons, Delhi, India,
2009, and books on VLSI Design and VHDL Design are
in the process of publication. She can be reached at

YOGESH KUMAR is a final-year B.Tech. student
pursuing his degree in Electronics & Communication
from Rajasthan Technical University, Kota, Rajasthan,
India. His areas of interest are VLSI design and
MATLAB. He has presented papers at national and
international conferences on analog CMOS design using
signal processing and is currently working on image
processing. He can be reached at