
DESIGN OF LOW POWER COMPLEX MULTIPLIER

USING COMPRESSORS

THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE


REQUIREMENT FOR DEGREE OF
MASTER OF TECHNOLOGY IN VLSI DESIGN

BY
Nilay Chandrakant Ghumre
UNDER GUIDANCE OF
PROF. DR. R. B. Deshmukh

Department of Electronics and Computer Science Engineering


Visvesvaraya National Institute of Technology
Nagpur, May 2010
DEPARTMENT OF ELECTRONICS AND COMPUTER SCIENCE
VISVESVARAYA NATIONAL INSTITUTE OF TECHNOLOGY
NAGPUR

Date:

CERTIFICATE

This is to certify that the thesis entitled “Design of Low Power Complex
Multiplier using Compressors” is bona fide work done at Visvesvaraya
National Institute of Technology, Nagpur, India by Nilay Chandrakant Ghumre and is
submitted to Visvesvaraya National Institute of Technology, Nagpur, India in partial
fulfillment of the degree of Master of Technology in VLSI Design.

(Dr. R. B. Deshmukh) (Dr. R. M. Patrikar)


Guide                    Head of Department
Department of Electronics and Computer Science Engineering
VNIT, Nagpur, India 440011. May 2010

DECLARATION

I herewith submit the thesis “Design of Low Power Complex Multiplier using
Compressors” to Visvesvaraya National Institute of Technology, Nagpur for the degree of
Master of Technology in VLSI Design. I carried it out under the guidance of
Prof. R. B. Deshmukh (Department of Electronics and Computer Science Engineering).

This thesis has not been submitted to any other University/Institute for the award of
any degree or diploma.

Date: Nilay Chandrakant Ghumre

M. Tech, VLSI Design

VNIT, Nagpur, India.


Acknowledgement

I express my sincere gratitude to the many people who have helped and supported me
during the project work. Without them I could not have completed the project on time.
I am thankful to my guide, Prof. Dr. R. B. Deshmukh, for his encouragement, patience and
valuable guidance throughout the entire project, to Prof. Dr. R. M. Patrikar for valuable
suggestions, and to all the members of the VLSI design lab for their cooperation and
coordination.

I also want to thank my colleagues and friends for their encouragement while I was
completing this project work. I want to thank my parents, without whose emotional and
moral support nothing would have been possible. Their love and support always encouraged
me. Last but not least, I am very thankful to God, who provided me with good health and
good people around me.

Nilay Chandrakant Ghumre


ABSTRACT

In high-performance VLSI circuits, on-chip power density plays a dominant role in both
static and dynamic conditions because of shrinking device features. The consumed power
is usually dissipated as heat, affecting the performance and reliability of the chip.
The complex multiplier is an arithmetic circuit that is used extensively in DSP and
communication applications such as FFTs and digital filters. For fast circuit
implementation, a parallel multiplier is preferred. For large bit-width multiplications,
a large number of adders is required to perform the partial product addition.

Compressors are used to compress the partial product addition stages. Higher order
compressors permit the reduction of the vertical critical paths in a parallel multiplier,
resulting in a better speed-power product for the multiplier circuit. This thesis presents
a novel scheme for a 16*16 bit multiplier using thirteen different types of compressors.
The scheme is optimized for low power as well as high speed implementation over reported
schemes. It presents a low power multiplier design methodology that counts only the
number of 1’s in the partial products.

CONTENTS

1. INTRODUCTION
1.1 Introduction
1.2 Complex Number
1.2.1 Operation of Complex Numbers
1.3 Organization of Thesis

2. SURVEY OF COMPLEX MULTIPLICATION


2.1 General rule of Complex Multiplication
2.2 Cases of Multiplication
2.3 Types of Complex Multiplication
2.3.1 Complex Multiplication for Area Efficient
2.3.2 Multiplication of Complex Number using a low power parallel multiplier
2.4 Related Research
2.4.1 Braun Multiplier
2.4.2 Baugh-Wooley Multiplier
2.4.3 Multiplier using Bypassing circuitry
2.4.4 Multiplier using Adder-Subtractor Unit (ASU)
2.5 Signed Number Multiplication
2.5.1 Representation of Negative Numbers
2.5.2 Booth’s Recoding Algorithm
2.5.3 Basic Technique of Booth’s Recoding Algorithm for Radix-2
and Radix-4

3. MULTIPLIER UNIT
3.1 Partial Product Generator
3.2 Different Order Compressors
3.2.1 Adder as Counter
3.2.2 Compressor Logic
3.3 Parallel Adders
3.4 Architecture of Multiplier using Compressors

4. PROPOSED COMPLEX MULTIPLIER


4.1 Unsigned Multiplication
4.2 Signed Multiplication
4.2.1 Modified Technique Recoding Algorithm for Radix-2 and Radix-4
4.2.2 Modified Booth’s Recoding Unit
4.3 Compressors and Adders

5. RESULTS AND DISCUSSION


5.1 Behavioral Simulation
5.2 Synthesis Report
5.3 Power Calculation
5.4 Layout

6. CONCLUSION AND FUTURE WORK


6.1 Conclusion
6.2 Future work

7. REFERENCES
LIST OF FIGURES
Figure 2.1. OBC-DA based Complex Multiplier structure
Figure 2.2. 4x4 Braun Multiplier
Figure 2.3. 4*4 Bypass Multiplier
Figure 2.4 4*4 ASU Multiplier
Figure 2.5 Adder Subtractor Unit
Figure 2.6: - Smart Adder (SA)
Figure 3.1. Internal Block Diagram of 16*16 Basic Multiplier
Figure 3.2. Partial Product Generator (4 Bit)
Figure 3.3. Half Adder
Figure 3.4. Full Adder
Figure 3.5. Block Diagram of 4:3 Compressor
Figure 3.6. Block Diagram of 5:3 Compressor
Figure 3.7. Block Diagram of 6:3 Compressor
Figure 3.8. Block Diagram of 7:3 Compressor
Figure 3.9. Block Diagram of 8:4 Compressor
Figure 3.10. Block Diagram of 9:4 Compressor
Figure 3.11. Block Diagram of 10:4 Compressor
Figure 3.12. Block Diagram of 11:4 Compressor
Figure 3.13. Block Diagram of 12:4 Compressor
Figure 3.14. Block Diagram of 13:4 Compressor
Figure 3.15. Block Diagram of 14:4 Compressor
Figure 3.16. Block Diagram of 15:4 Compressor
Figure 3.17. Block Diagram of 16:5 Compressor
Figure 3.18. Block Diagram of Parallel Adder
Figure 3.19. Architecture of 8*8 Multiplier using Compressors
Figure 4.1. Block Diagram of Unsigned Complex Multiplier
Figure 4.2. Combinational Logic for intermediate sign
Figure 4.3. Combinational Circuit for output sign Real Part and Imaginary Part
Figure 4.4. Modified Complex Multiplier Block Diagram
Figure 4.5 Block Diagram of Modified Booth’s Recoding unit Multiplier
Figure 4.6 Addition scheme for Radix-2
Figure 4.7 Architecture of 8*8 Signed Multiplier for Radix-2
Figure 4.8 Addition scheme for Radix-4
Figure 4.9 Architecture of 8*8 Signed Multiplier for Radix-4

LIST OF TABLES

Table 3.1. Half Adder as a Counter


Table 3.2 Full Adder as a Counter
Table 4.1. Booth’s Recoding algorithm Radix-2
Table 4.2. Booth’s Recoding algorithm Radix-4
Table 4.3 Modified Booth’s Recoding Algorithm Radix-2
Table 4.4 Modified Booth’s Recoding Algorithm Radix-4
Chapter 1.

Introduction
The electronics industry has achieved a phenomenal growth over the last two
decades, mainly due to the rapid advances in integration technologies, large-scale
systems design - in short, due to the advent of VLSI. The number of applications of
integrated circuits in high-performance computing, telecommunications, and consumer
electronics has been rising steadily, and at a very fast pace. Increasing demand for
portable electronics for computing and communication, as well as other applications, has
necessitated longer battery life, lower weight, and lower power consumption. In order to
satisfy these requirements, research activities focusing on low power/low voltage design
techniques are underway. Since 'power' is now one of the design decision variables, the
expanded design space required for low power has further increased the complexity of an
already non-trivial task. Low power design basically involves two concomitant tasks:
power estimation and analysis and power minimization. These tasks need to be carried
out at each of the levels in the design hierarchy, namely, the behavioral, architectural,
logic, circuit and physical levels.[1]
In the survey of the current state of the field, many of the salient power
estimation and minimization techniques proposed for low power VLSI design are
reviewed. For each of the design levels, we provide an overview of several power
estimation and minimization approaches and the CAD tools that support them. Finally,
future research issues are discussed that will be necessary in order to make the low power
design endeavor a successful one. In the majority of digital signal processing (DSP)
applications the critical operations are multiplication and accumulation. Real-time
signal processing requires a high speed, high throughput multiplier unit that consumes
low power, which is always key to achieving a high performance digital signal processing
system. The purpose of this work is the design and implementation of a low power
multiplier unit with a block enabling technique to save power [2].
1.1 Introduction
Device sizes continue to scale down in line with Moore's law. The sources of energy
consumption on a CMOS chip can be classified as static and dynamic power dissipation.
The dominant component of energy consumption in CMOS is dynamic power consumption,
caused by the actual switching of the circuit. A first-order approximation of the
dynamic power consumption of CMOS circuitry is given by the formula:

$P = C V^{2} f$

where P is the power, C is the effective switched capacitance, V is the supply voltage,
and f is the frequency of operation. The power dissipation arises from the charging and
discharging of the circuit node capacitances found on the output of every logic gate.
Power management is the careful planning of the power budget for every subsystem of a
VLSI chip. This is an especially important issue for today’s complex systems. The most
important and successful use of power management is to deactivate a portion of the
circuit when its computation is not required [3].
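The first-order relation above can be illustrated with a short numerical sketch. The function name and the operating point (10 pF, 1.8 V, 100 MHz) are hypothetical values chosen only for illustration; they are not taken from this work.

```python
def dynamic_power(c_eff_farads: float, vdd_volts: float, freq_hz: float) -> float:
    """First-order CMOS dynamic power estimate, P = C * V^2 * f, in watts."""
    return c_eff_farads * vdd_volts ** 2 * freq_hz


# Hypothetical operating point, for illustration only.
base = dynamic_power(10e-12, 1.8, 100e6)
# Halving the supply voltage at the same C and f.
scaled = dynamic_power(10e-12, 0.9, 100e6)

print("base power   :", base, "W")
print("scaled power :", scaled, "W")
print("ratio        :", scaled / base)
```

Because the supply voltage enters quadratically, halving V reduces the estimate to one quarter, whereas reducing C or f (for example by lowering switching activity or gate count, as pursued in this work) reduces it only linearly.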
Every low-to-high logic transition in a digital circuit incurs a change of voltage,
drawing energy from the power supply. A designer at the technological and architectural
level can try to minimize the variables in this equation to minimize the overall energy
consumption. However, power minimization is often a complex process of trade-offs
between speed, area, and power consumption. The current work proposes a reduction of
dynamic switching power in a 16*16 complex multiplier by using higher order compressors
to reduce both the switching activity and the gate count.
Multipliers consume a large amount of power and incur most of their delay during
partial product addition. At this stage, most multipliers are designed with adders that
can add two or three bits, or at most four bits by using 4-2 compressors. For higher
order multiplications, a huge number of adders or compressors is used to perform the
partial product addition. The binary counter property has been merged with the
compressor property to develop higher order compressors [3] [5].

1.2 Complex Number:-

A complex number is a number comprising a real and an imaginary part. It can be
written in the form a + bi, where a and b are real numbers, and i is the standard
imaginary unit with the property $i^{2} = -1$. To construct a complex number, we
associate with each real number a second real number. A complex number is then an
ordered pair of real numbers (a, b).
Complex numbers were first conceived and defined to find solutions to cubic
equations. The solution of a general cubic equation in radicals (without trigonometric
functions) may require intermediate calculations containing the square roots of negative
numbers, even when the final solutions are real numbers. This ultimately led to the
fundamental theorem of algebra, which shows that with complex numbers, a solution
exists to every polynomial equation of degree one or higher. Complex numbers thus form
an algebraically closed field, where any non-constant polynomial equation has a root.
In the form a + bi, the real number a is called the real part of the complex number,
and the real number b is the imaginary part. For example, 3 + 2i is a complex number
with real part 3 and imaginary part 2. If Z = a + bi, the real part a is denoted by
Re(Z) and the imaginary part b is denoted by Im(Z).
The complex numbers (C) are regarded as an extension of the real numbers (R) by
considering every real number as a complex number with an imaginary part of zero. The
real number a is identified with the complex number a + 0i. Complex numbers with a real
part of zero (Re(Z) = 0) are called imaginary numbers. Instead of writing 0 + bi, that
imaginary number is usually denoted as just bi. If b equals 1, instead of using 0 + 1i
or 1i, the number is denoted as i.
Two complex numbers are said to be equal if and only if their real parts are
equal and their imaginary parts are equal. In other words, if the two complex numbers are
written as a + bi and c + di with a, b, c, and d real, then they are equal if and only if
a = c and b = d [4] [5].

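1.2.1 Operations of Complex Numbers:-

Complex numbers are added, subtracted, multiplied, and divided by formally applying
the associative, commutative and distributive laws of algebra, together with the equation
$i^{2} = -1$. Here, i is the abbreviation of $\sqrt{-1}$ (the square root of -1). In
other words, i is something whose square is -1.

i) Addition:-
$(a+bi) + (c+di) = (a+c) + (b+d)i$

ii) Subtraction:-
$(a+bi) - (c+di) = (a-c) + (b-d)i$

iii) Multiplication:-
$(a+bi)(c+di) = (ac-bd) + (ad+bc)i$

iv) Division:-
$\dfrac{a+bi}{c+di} = \dfrac{(ac+bd) + (bc-ad)i}{c^{2}+d^{2}}, \quad c+di \neq 0$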
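
The four operations can be sketched on ordered pairs (real part, imaginary part), which mirrors the definition of a complex number as an ordered pair of real numbers. The helper names below are illustrative assumptions, and integer inputs with exact fractions are used only so that the division result can be checked exactly.

```python
from fractions import Fraction


def c_add(x, y):
    (a, b), (c, d) = x, y
    return (a + c, b + d)


def c_sub(x, y):
    (a, b), (c, d) = x, y
    return (a - c, b - d)


def c_mul(x, y):
    (a, b), (c, d) = x, y
    # Uses i*i = -1: the b*d term moves into the real part with a minus sign.
    return (a * c - b * d, a * d + b * c)


def c_div(x, y):
    (a, b), (c, d) = x, y
    den = c * c + d * d  # zero only when the divisor is 0 + 0i
    if den == 0:
        raise ZeroDivisionError("division by the complex number zero")
    # Multiply numerator and denominator by the conjugate c - di.
    return (Fraction(a * c + b * d, den), Fraction(b * c - a * d, den))


product = c_mul((1, 2), (3, 4))
print(product)                 # (1+2i)(3+4i)
print(c_div(product, (3, 4)))  # dividing back by (3+4i) recovers (1+2i)
```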

1.3 Organization of Thesis:-


Chapter 2, “Survey of Complex Multiplication”, explains the general rules, cases and
types of complex multiplication.
Chapter 3, “Multiplier Unit”, explains the basic multiplier unit using the compressor
technique: how the partial products are generated, how the compressors work, and how the
parallel adder produces the final product.
Chapter 4, “Proposed Complex Multiplier”, explains the types of multiplication, covering
both unsigned and signed number multiplication.
Chapter 5, “Results and Discussion”, presents the behavioral simulation results,
synthesis results and power calculation results for every multiplier.
Chapter 6, “Conclusion and Future Work”, gives the conclusion of the thesis and
outlines future work.

-:References:-
[1] Virage Logic Corporation, “Power Reduction Techniques for Ultra-Low-Power Solutions”.
[2] R. M. Badghare, S. K. Mangal, R. B. Deshmukh, R. M. Patrikar (2009), “Design of Low
Power Parallel Multiplier”, Journal of Low Power Electronics, Volume 5, Number 1,
April 2009, 31-39.
[3] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), “A 1.2-ns 16×16-Bit Binary
Multiplier Using High Speed Compressors”, International Journal of Electrical, Computer,
and Systems Engineering, 2009, 234-239.
[4] Conway, John B. (1986), Functions of One Complex Variable I, Springer,
ISBN 0-387-90328-3.
[5] K. Z. Pekmestzi, “Complex Number Multipliers”, IEE Proceedings (Computers and
Digital Technology), Vol. 136, No. 1, 1989, pp. 70-75.
Chapter 2.

Survey of Complex Multiplication

In many real-time DSP applications, high performance is a prime target.
However, achieving this may come at the expense of area, power dissipation and
accuracy. Attempts have been made to use alternative number systems to optimize the
realization of arithmetic blocks, maintaining high performance without incurring
prohibitive area and power increases [1].
Fourier transforms play an important role in many digital signal processing
applications including speech, signal and image processing. However, direct computation
of the Discrete Fourier Transform (DFT) requires on the order of $N^{2}$ operations,
where N is the transform size. Parallel-pipelined FFTs are preferred for both high
throughput and low power consumption.
2.1 General rule of Complex Multiplication:-
Consider two complex numbers (a+bi) and (c+di). Then

(a+bi).(c+di) = (ac-bd) + (ad+bc)i

(ac-bd) is the Real Part of the Complex Multiplication and (ad+bc) is the Imaginary
Part of the Complex Multiplication.
Note that (ac-bd), the real part of the product, is the product of the real parts
minus the product of the imaginary parts, whereas (ad+bc), the imaginary part of the
product, is the sum of the two cross products of one real part with the other
imaginary part.
For Z = a+bi, the non-negative value

$|Z| = \sqrt{a^{2} + b^{2}}$

is called the modulus of Z.

2.2 Cases of Multiplication:-

i) Multiplication of Complex Number with Real Number:-


In the above formula for multiplication, if d is zero, we get a formula for
multiplying a complex number a+bi and a real number c together:
(a+bi).c = ac + bc i.
In other words, we just multiply both parts of the complex number by the real
number. For example, multiplying the two numbers (1+2i) and 3 gives:
(1+2i).3 = 3+6i
Geometrically, doubling a complex number doubles its distance from the origin, 0.
Similarly, when a complex number z is multiplied by 1/2, the result lies halfway
between 0 and z. Multiplication by 2 can be thought of as a transformation which
stretches the complex plane C by a factor of 2 away from 0, and multiplication by 1/2
as a transformation which squeezes C toward 0.

ii) Multiplication of Complex Number with Imaginary Number:-

In the above formula for multiplication, if c is zero, we get a formula for
multiplying a complex number a+bi and an imaginary number di together:
(a+bi).di = -bd + ad i.
In other words, both parts of the complex number are multiplied by d and then swapped,
with a sign change on the new real part. For example, multiplying the two numbers
(1+2i) and 3i gives:
(1+2i).3i = -6+3i
2.3 Types of Complex Multiplication
2.3.1 Complex Multiplication for Area Efficient:-
i) Complex Multiplication using LNS [2]:-
Complex multiplication for lower area, i.e. with a reduced hardware cost of realizing
the complex multiplier, is explained below using the Logarithmic Number System (LNS).
The LNS based complex multiplier employs a correction algorithm. A conventional complex
multiplier is composed of four real multipliers, one adder and one subtractor. Attempts
have been made to optimize the realization of the complex multiplier by reducing the
number of multipliers and accumulating the partial products; however, the wider the
input, the more partial product layers must be added in order to compute the result. To
solve this problem, one can use the LNS to realize the multiplication as shown in the
equations

$X_o = AC - BD = \log^{-1}(\log A + \log C) - \log^{-1}(\log B + \log D)$

$Y_o = BC + AD = \log^{-1}(\log B + \log C) + \log^{-1}(\log A + \log D)$

The complex multiplier block diagram is composed of logarithmic and anti-logarithmic
converters and N-bit adders. This method can significantly reduce the hardware needed
to build a multiplier.
LNS provides a simple technique to compute multiplication at the cost of reduced
precision, so this approach has limited accuracy.
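
The log and anti-log equations above can be sketched in software as follows. This is only an illustrative model under stated assumptions: base-2 logarithms, signs handled outside the logarithm, and an optional `frac_bits` parameter (a hypothetical name) that rounds each logarithm to a fixed number of fractional bits to mimic the reduced precision of a hardware converter. It does not model the correction algorithm of [2].

```python
import math


def _quantized_log2(v, frac_bits):
    """log2(v), optionally rounded to frac_bits fractional bits."""
    value = math.log2(v)
    if frac_bits is None:
        return value
    scale = 1 << frac_bits
    return round(value * scale) / scale


def lns_mul(x, y, frac_bits=None):
    """Multiply two reals as antilog(log|x| + log|y|) with separate sign handling."""
    if x == 0 or y == 0:
        return 0.0
    sign = -1.0 if (x < 0) != (y < 0) else 1.0
    log_sum = _quantized_log2(abs(x), frac_bits) + _quantized_log2(abs(y), frac_bits)
    return sign * 2.0 ** log_sum


def lns_complex_mul(a, b, c, d, frac_bits=None):
    """(A + Bi)(C + Di): four log-domain products, then one subtract and one add."""
    xo = lns_mul(a, c, frac_bits) - lns_mul(b, d, frac_bits)
    yo = lns_mul(b, c, frac_bits) + lns_mul(a, d, frac_bits)
    return xo, yo


print(lns_complex_mul(3, 2, 5, 7))               # close to the exact (1, 31)
print(lns_complex_mul(3, 2, 5, 7, frac_bits=4))  # visibly less accurate
```

With only a few fractional bits in the logarithm, the result drifts away from the exact product, which is the accuracy limitation noted above.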

ii) Complex Multiplier using OBC and DA [3] :-

A well known area-efficient method of implementing a complex multiplier is Offset
Binary Coding (OBC) combined with Distributed Arithmetic (DA). The structure of the
complex multiplier using OBC-DA is shown below:-
Figure 2.1. OBC-DA based Complex Multiplier structure[3]

It is formed by the following modules:

a) Two registers that each store a W-bit word (-(cR-cI) and -(cR+cI)), whose outputs are
connected to two multiplexers that are controlled by an XOR of the input bits.
b) Two shift-accumulators (SA) that add and shift the multiplexer output.
In this structure a subtraction can happen in each cycle of the computation, unlike
the previous structure, where it happens only during the last cycle. The extra-bit
slide is a bit-serial adder which is needed to complete the two’s complement in any
cycle. Another difference is that SA2 includes hardware for loading the offset
value (Ao) in the carry registers.

2.3.2 Multiplication of Complex Number using a low power parallel multiplier:-

The conventional technique of complex multiplication is given as
(A + Bj) . (C + Dj) = (AC - BD) + (AD + BC)j

It requires four multiplications and two adders. This technique offers a different way
of realizing complex multiplication that reduces the complexity of the circuit. The
canonical form of the resulting circuits makes them well suited to VLSI realization.
Besides circuit reduction, the hardware or software for the control in the realization
of the algorithms is simplified, especially when these include only complex operations,
as in an FFT. Each complex bit takes four possible values. Consequently, it must be
represented by two bits. This representation allows the development of algorithms for
operations with complex numbers and makes it possible to describe these algorithms at
the bit level. Naturally, these algorithms and the corresponding circuits have great
similarities to those for real numbers in two’s complement form.
Complex parallel multiplication is the most critical operation to realize. The parallel
multiplier includes specialized hardware circuitry designed to perform complex
multiplication operations at high speed. The parallel multiplier requires significantly
less die area than conventionally required, which results in reduced manufacturing costs
and reduced power consumption [4].

2.4 Related Research:-

In FPGA designs, power reduction is possible only by reducing switching activity,
that is, dynamic power. In general, dynamic power consumption is defined as the power
consumed while the clock is running and the external inputs are switching. Switching
activity can be reduced at various levels of the design flow. Architectural decisions
in the early design phases have the greatest impact. For high switching signals, delay
balancing and reduction of the number of logic levels are among the most efficient
techniques to tackle the power penalty. An obvious method of reducing the switching
activity is to shut down the idle part of the circuit, which is not in operation.
A general M x N parallel multiplier operates by computing the partial products in
parallel and by shifting and accumulating the partial products. In such a multiplier,
switching activity is poorly correlated with the input coefficient. Reducing the
switching activity of the components used in the design can minimize the power
dissipation: if the kth bit of the coefficient is zero, the kth row of adders need not
be activated. However, this type of multiplier does not reduce switching, since the
adders switch unnecessarily even if the kth bit is zero.
2.4.1 Braun Multiplier[4][5] :-
[Figure: array of partial products a3b0 ... a0b3 added row by row to produce the product bits P7 ... P0]
Figure 2.2 4x4 Braun Multiplier
The above figure shows the structure of a 4*4 Braun Multiplier. An n*n bit Braun
Multiplier requires n(n-1) adders and $n^{2}$ AND gates. In this technique each partial
product is added to the previous sum of partial products by using a row of adders. The
carry-out signals are shifted one bit to the left and then added to the sum of the first
adder, which is the addition of the partial product bits. The shifting of carry-out bits
to the left is done by a carry-save adder. As carry bits are passed diagonally downward
to the next adder stage, there is no horizontal carry propagation for the first four
rows. Instead, the respective carry bit is “saved” for the subsequent adder stage.
The drawback of the Braun Multiplier is that the number of components required to build
it increases quadratically with the number of bits. This makes the Braun Multiplier
inefficient for large operands. The delay of the Braun Multiplier depends on the full
adder cell and also on the final adder in the last row. In this multiplier array, a full
adder with balanced carry and sum delays is desirable because sum and carry are both in
the critical path.
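
A behavioural sketch of the array described above is given here. It models only the arithmetic (AND-gate partial products accumulated row by row) and the component counts quoted in the text; it does not model the carry-save wiring or timing. The function names are illustrative assumptions.

```python
def braun_multiply(a: int, b: int, n: int = 4) -> int:
    """Unsigned n x n product built from AND-gate partial products."""
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    product = 0
    for j in range(n):                # one row per multiplier bit b_j
        row = 0
        for i in range(n):            # one AND gate per (a_i, b_j) pair
            row |= (((a >> i) & 1) & ((b >> j) & 1)) << i
        product += row << j           # each row is shifted by its bit weight
    return product


def braun_cost(n: int) -> dict:
    """Component counts for an n x n Braun multiplier, as stated in the text."""
    return {"and_gates": n * n, "adders": n * (n - 1)}


print(braun_multiply(13, 11))  # 13 * 11
print(braun_cost(4))
print(braun_cost(16))          # quadratic growth for a 16*16 array
```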

2.4.2 Baugh-Wooley Multiplier[6]:-

The Baugh-Wooley Multiplier is used for both unsigned and signed number
multiplication. Signed operands are represented in 2’s complement form. The partial
products are adjusted such that the negative signs move to the last step, which in
turn maximizes the regularity of the multiplication array. The Baugh-Wooley Multiplier
operates on signed operands in 2’s complement representation and ensures that the
signs of all partial products are positive.
The product of two 2’s complement numbers X and Y can be obtained from the following
product terms, each formed by one AND gate.

Variables with bars denote prior inversions. Inverters are connected before the inputs
of the full adders or the AND gates as required by the algorithm. Each column represents
the addition in accordance with the respective weight of the product term.
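
As a hedged illustration of the idea, the sketch below follows the commonly described modified Baugh-Wooley arrangement, which is an assumption here since the exact product-term equation is not reproduced above. Product terms that involve exactly one sign bit are inverted, and constant 1s are added at bit positions n and 2n-1 so that every term entering the array is non-negative.

```python
def baugh_wooley(a: int, b: int, n: int = 4) -> int:
    """Signed n x n product using only non-negative, AND-based product terms."""
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    assert lo <= a <= hi and lo <= b <= hi
    ua, ub = a & ((1 << n) - 1), b & ((1 << n) - 1)   # 2's complement bit patterns
    abit = [(ua >> i) & 1 for i in range(n)]
    bbit = [(ub >> j) & 1 for j in range(n)]

    total = 0
    for i in range(n):
        for j in range(n):
            term = abit[i] & bbit[j]                   # one AND gate per term
            if (i == n - 1) != (j == n - 1):           # exactly one sign bit involved
                term ^= 1                              # "barred" (inverted) term
            total += term << (i + j)
    total += (1 << n) + (1 << (2 * n - 1))             # correction constants
    total &= (1 << (2 * n)) - 1                        # keep 2n result bits

    # Interpret the 2n-bit pattern as a signed value for comparison.
    if total >> (2 * n - 1):
        total -= 1 << (2 * n)
    return total


print(baugh_wooley(-3, 5))
print(baugh_wooley(-8, -8))
```

Checking the function against ordinary signed multiplication over all 4-bit operand pairs is a quick way to confirm that the inversions and constants are placed correctly.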

2.4.3 Multiplier using Bypassing circuitry:-

The main idea of this technique is based on the observation that most modern
multipliers produce a large number of signal transitions while adding zero partial
products. If any bit of the multiplier is zero, the corresponding row of adders need
not be activated, since the corresponding partial product is zero. The adders of these
multipliers, however, perform summation of the zero partial products and, as a result,
exhibit redundant signal switching. The increased activity of the internal nodes results
in unnecessary power dissipation [7] [8].
To disable such an adder row, the partial sum of the previous adder row is bypassed
to the next adder row. This removes the unnecessary transitions by passing inputs
straight to outputs when the corresponding partial product is zero. Multiplexers are
used at the output of each full adder to pass the partial sum directly to the next stage
when the partial product is zero.
Figure 2.3 4*4 Bypass Multiplier
The tri-state buffers, placed at the inputs of the adder cell, disable signal
transitions in those adder cells which are bypassed. The output carry bits c are passed
downwards, instead of to the right [9].
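
The row-bypassing idea can be sketched behaviourally as follows. The model only counts how many adder rows would be active for a given multiplier operand; the multiplexers, tri-state buffers and carry routing are not modelled, and the function name is an illustrative assumption.

```python
def bypass_multiply(a: int, b: int, n: int = 4):
    """Return (product, active_rows) for an unsigned n-bit multiplication.

    A row is 'active' only when the corresponding multiplier bit b_j is 1;
    otherwise the running sum is passed through unchanged (bypassed).
    """
    assert 0 <= a < (1 << n) and 0 <= b < (1 << n)
    acc = 0
    active_rows = 0
    for j in range(n):
        if (b >> j) & 1:
            acc += a << j      # adder row j performs a real addition
            active_rows += 1
        # else: zero partial product, the previous sum is bypassed to the next row
    return acc, active_rows


print(bypass_multiply(13, 0b0101))  # two of the four rows are active
print(bypass_multiply(13, 0b1111))  # all four rows are active
```

The saving therefore depends on the operand: the more zero bits in the multiplier, the more rows are bypassed.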

2.4.4 Multiplier using Adder-Subtractor Unit(ASU)[4] :-

In this technique, higher power reduction can be achieved if the operand contains
more 0’s than 1’s. The approach proposes a Binary / Booth Recoding Unit which forces the
operand to have more zeros. The advantage is that if the operand contains a run of
successive ones, the Binary / Booth Recoding Unit converts these ones into zeros. The
Adder-Subtractor Unit also removes the extra 2’s complement addition circuitry that
would otherwise be needed. The use of a look-up table is a further advantage of this
design.

The switching activity of the components used in the design depends on the input
bit coefficient. This means that if the input bit coefficient is zero, the corresponding
row or column of adders need not be activated. If the operand contains more zeros,
higher power reduction can be achieved, which is why the Binary / Booth Recoding Unit is
used to force the operand to have more zeros.
[Figure: 4x4 array of ASU (+/-) cells with XOR gates and multiplexers, followed by a row of AND gates and smart adders (SA), producing the product bits P6 ... P0]
Figure 2.4 4*4 ASU Multiplier [4]

The figure shows the 4x4 low power ASU multiplier structure. This technique is
especially useful for wider multiplicands, in particular when they contain runs of
successive ones. Each ASU works as an adder or a subtractor depending on the sign bit
of the sign register. For multiplication by -1 the ASU works as a subtractor, and for
0 and 1 it works as an adder. The great advantage of this technique is that no extra
addition circuitry is needed to add sign extension bits when the multiplicand bit is
-1. In the upper row of the architecture the sign bits must be ANDed with b0, since
otherwise the case sj=1 and b0=0 produces wrong outputs. At the bottom, the ASU works
as a half adder or subtractor depending on the sign bits. For wider multiplicands the
smart adder chain continues.

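[Figure: ASU cell with inputs bi, aj, S(i-1)j+1, C(i-1)j and sign bit Sj, and outputs Cij and Sij]
Figure 2.5 Adder Subtractor Unit[1]

[Figure: smart adder with inputs aibj, c(i-1)j, S(i-1)j and C(i-2)j, an XOR stage, and outputs Ci+j+1 and Si+j]
Figure 2.6: - Smart Adder (SA)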

The Modified Full Adder-Subtractor Unit is constructed as shown in the figure. If aj is
zero, the FA is disabled. Here sj is a sign bit of the operand. The structure of the
smart adder is also shown in the figure.

2.5 Signed Number Multiplication:-

As we saw for unsigned multiplication, the user has to input each number together with
its sign, so the multiplier as a whole requires more hardware and more switching
operations; hence the switching power, i.e. dynamic power, is higher for unsigned
multiplication.
In signed multiplication, the user enters the signed number directly, so there is no
need to enter a separate sign bit for each of the four numbers. The only difference
between a signed number and an unsigned number is the range of the number. As we saw
earlier in section 3.1, the range of an n-bit unsigned number is from 0 to $2^{n}-1$.
The range of an n-bit signed number is from $-2^{n-1}$ to $+(2^{n-1}-1)$.

2.5.1 Representation of Negative Numbers:-

For fixed-point numbers in a radix r system, we have to decide how negative numbers
are represented. Two different forms are commonly used:-

1. Sign and Magnitude Representation.
2. Complement Representation.

1. Sign and Magnitude Representation:-
In this form of representation the sign and the magnitude are represented separately.
The first digit is the sign bit and the remaining (n-1) bits are the magnitude. In the
binary case, ‘0’ represents positive and ‘1’ represents negative. In the non-binary
case, the values 0 and (r-1) are assigned to the sign digit of positive and negative
numbers, respectively. In the binary case all $2^{n}$ sequences are utilized. The
$2^{n-1}$ sequences from 00----0 to 01----1 represent positive numbers, while the
remaining $2^{n-1}$ sequences from 10----0 to 11----1 represent negative numbers. A
major disadvantage of the signed-magnitude representation is that the operation to be
performed may depend on the signs of the operands. For example, when adding a positive
number X and a negative number -Y, we need to perform the calculation X+(-Y). If Y>X,
then the final result should be -(Y-X). For that we have to compute (Y-X), i.e., switch
the order of the operands and perform subtraction rather than addition, and then attach
a minus sign to the result.
Example:- +7 is 111 with a 0 in front, so 00000111 in an 8-bit representation.
-9 is 1001 (+9) with a 1 in front, so 10001001 in an 8-bit representation.

2. Complement Representation:-
In complement representation, binary numbers are represented in two’s complement
form. In this method, a positive number is represented in the same way as in the
signed-magnitude method. It is the most widely used method of representation. Positive
numbers are simply represented as a binary number with ‘0’ as the sign bit. To get a
negative number, convert all 0’s to 1’s and all 1’s to 0’s, and then add ‘1’. To find
the value of a number given in 2’s complement form, look at its leading bit: if the
number starts with ‘0’ it is positive, and if it starts with ‘1’ it is negative.
If the number is negative, taking its 2’s complement gives its magnitude in ordinary
binary. Take 1101 as an example. Its 2’s complement is 0011. As the number starts with
‘1’ it is negative, and 0011 is the binary representation of positive 3. So, the number
is -3. Other negative numbers are represented in 2’s complement form in the same way.
Suppose we add +5 and -5; in decimal we get 0. Representing these numbers in 2’s
complement form gives +5 as 0101 and -5 as 1011. Adding these two numbers gives 10000.
After discarding the carry, the result is 0000, i.e. ‘0’.
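
The two representations and the worked examples above can be reproduced with the following sketch. The helper names are illustrative assumptions; bit strings are used so that the patterns can be compared directly with the examples in the text.

```python
def sign_magnitude(value: int, n: int) -> str:
    """n-bit sign-and-magnitude pattern: sign bit followed by n-1 magnitude bits."""
    magnitude = abs(value)
    assert magnitude < (1 << (n - 1)), "magnitude does not fit in n-1 bits"
    sign = "1" if value < 0 else "0"
    return sign + format(magnitude, "0{}b".format(n - 1))


def twos_complement(value: int, n: int) -> str:
    """n-bit 2's complement pattern; valid range is -2^(n-1) .. 2^(n-1)-1."""
    assert -(1 << (n - 1)) <= value <= (1 << (n - 1)) - 1
    return format(value & ((1 << n) - 1), "0{}b".format(n))


def from_twos_complement(bits: str) -> int:
    """Value of a 2's complement bit string; a leading '1' means negative."""
    raw = int(bits, 2)
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


def add_discard_carry(x: str, y: str) -> str:
    """Add two equal-width patterns and drop the carry out of the top bit."""
    n = len(x)
    total = (int(x, 2) + int(y, 2)) & ((1 << n) - 1)
    return format(total, "0{}b".format(n))


print(sign_magnitude(7, 8), sign_magnitude(-9, 8))
print(twos_complement(5, 4), twos_complement(-5, 4))
print(from_twos_complement("1101"))
print(add_discard_carry("0101", "1011"))
```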
For signed multiplication we have modified the Complex Multiplication
strategy. Normally we have four Multipliers and three adder/subtractor blocks,
but the modified strategy requires three Multipliers and five Adders.

For Complex Multiplication of two numbers:-

(a+jb).(c+jd) we get
Real Part:- (c-d).b + c.(a-b)
Imaginary Part:- (c+d).a – c.(a-b)
So, we require only three multiplication terms, as c.(a-b) is common to
both results. Hence, we save power compared with the previous method of
Complex Multiplication.
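The three-multiplication identity can be checked numerically. The following is a minimal Python sketch (the function name is hypothetical) that evaluates the real and imaginary parts exactly as written above and compares them with the ordinary four-multiplication result.

```python
def complex_multiply_three(a, b, c, d):
    """(a + jb)(c + jd) using three multiplications and five additions/subtractions."""
    shared = c * (a - b)          # multiplication 1, common to both parts
    real = (c - d) * b + shared   # multiplication 2
    imag = (c + d) * a - shared   # multiplication 3
    return real, imag


# Compare against the four-multiplication form (ac - bd) + j(ad + bc).
for a, b, c, d in [(3, 4, 5, 6), (-7, 2, 9, -1), (0, 5, -3, 8)]:
    assert complex_multiply_three(a, b, c, d) == (a * c - b * d, a * d + b * c)
```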

2.5.2 Booth’s Recoding Algorithm:-


Parallel Multiplication using the basic Booth’s Recoding technique
is based on the fact that a partial product can be generated for a
group of consecutive 0’s and 1’s, which is called Booth’s Recoding.
The Booth’s Recoding algorithm is used to generate the partial
products efficiently. These partial products always have more bits than
the inputs. The width of a partial product depends on the radix
scheme used for recoding. The generated partial products are added
by compressors as explained in section 3.2. Because this scheme
generates fewer partial products, it reduces power and area.
There are two variants of the algorithm, Radix-2 and Radix-4, for
generating partial products efficiently. First we explain the basic
technique of Booth’s Recoding algorithm and then the Modified
Booth’s Recoding technique, for both the Radix-2 and Radix-4
algorithms.
2.5.3 Basic Technique of Booth’s Recoding Algorithm for Radix-2 and Radix-4:-
Booth proposed the Radix algorithm for high speed multiplication,
which reduces the number of partial products. Booth’s algorithm for
multiplication is based on the observation about strings of 0’s and 1’s
described above. To do a multiplication A*B, where
A= an, an-1, ….., a0 is the multiplier
B= bn, bn-1, ….., b0 is the multiplicand
we check every two consecutive bits in A at a time:-
Ai   Ai-1   Y      Comments           Explanation
0    0      0      Middle of 0’s      String of 0’s, shift only
0    1      1.B    End of 1’s         Add and shift
1    0      -1.B   Beginning of 1’s   Add and shift
1    1      0      Middle of 1’s      String of 1’s, shift only
Table 2.1. Booth’s Recoding algorithm Radix-2
Ai+1   Ai   Ai-1   Y      Comments           Explanation
0      0    0      0      String of zeros    Two bit shift only
0      0    1      1.B    End of 1’s         Add and two bit shift
0      1    0      1.B    A single 1         Add and two bit shift
0      1    1      2.B    End of 1’s         Add and two bit shift
1      0    0      -2.B   Beginning of 1’s   Add and two bit shift
1      0    1      -1.B   A single 0         Add and two bit shift
1      1    0      -1.B   Beginning of 1’s   Add and two bit shift
1      1    1      0      String of ones     Two bit shift only

Table 2.2. Booth’s Recoding algorithm Radix-4


Let us take an example:-
Radix-2:-
Suppose A is the Multiplier with value -5 and B is the Multiplicand with value +2. Then,

B=> 0010 (+2)
A=> 1011 (-5)
Looking up the above table for the multiplier, we first examine the LSB of A together with
an assumed ‘0’ to its right, and then each following pair of adjacent bits in A. We get the
partial products as:-
i) For 10 we have to perform -1.B, i.e., the 2’s complement of B, 1110.
ii) For 11 we have to put all 0’s, i.e., 0000.
iii) For 01 we have to perform 1.B, i.e., the value of B, 0010.
iv) For 10 again -1.B, i.e., 1110.
Here, some extra bits, called correction bits, are appended to match the width of the
partial products.
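The Radix-2 recoding of Table 2.1 can be sketched in Python as follows. This is only a behavioural illustration with hypothetical function names; it models the table lookup and the shift-and-add, not the hardware of the thesis.

```python
# Table 2.1 as a lookup: (Ai, Ai-1) -> multiple of B
RADIX2_TABLE = {(0, 0): 0, (0, 1): +1, (1, 0): -1, (1, 1): 0}


def booth_radix2_digits(multiplier, bits):
    """Recode an n-bit two's complement multiplier; digits are returned LSB first."""
    pattern = multiplier & ((1 << bits) - 1)
    digits = []
    previous = 0  # assumed '0' to the right of the LSB
    for i in range(bits):
        current = (pattern >> i) & 1
        digits.append(RADIX2_TABLE[(current, previous)])
        previous = current
    return digits


def booth_radix2_multiply(multiplier, multiplicand, bits):
    """Sum the recoded partial products, each shifted by its bit position."""
    digits = booth_radix2_digits(multiplier, bits)
    return sum((digit * multiplicand) << i for i, digit in enumerate(digits))


print(booth_radix2_digits(-5, 4))       # digits for A = 1011
print(booth_radix2_multiply(-5, 2, 4))  # A = -5, B = +2
```

For A = 1011 the digits are -1, 0, +1, -1 from the LSB upward, which matches steps i) to iv) of the example.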

Radix 4:-
A=> -5 => 1 1 1 1 1 0 1 1
B=> 46 => 0 0 1 0 1 1 1 0, then the following Partial Products are
generated:-
In the above technique of Booth’s Algorithm the vertical length of the
partial products is larger, hence more adders are required, so
power and area will be higher.
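The Radix-4 recoding of Table 2.2 can be sketched in the same way. The Python below is an illustrative model with hypothetical names; it applies the table to overlapping three-bit groups of the multiplier and does not reproduce the bit layout of the partial products in the thesis figure.

```python
# Table 2.2 as a lookup: (Ai+1, Ai, Ai-1) -> multiple of B
RADIX4_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}


def booth_radix4_digits(multiplier, bits):
    """Recode an n-bit two's complement multiplier (n even); digits are LSB first."""
    if bits % 2:
        raise ValueError("the bit width must be even for radix-4 recoding")
    # Append the assumed '0' to the right of the LSB.
    pattern = (multiplier & ((1 << bits) - 1)) << 1
    digits = []
    for i in range(0, bits, 2):
        group = ((pattern >> (i + 2)) & 1, (pattern >> (i + 1)) & 1, (pattern >> i) & 1)
        digits.append(RADIX4_TABLE[group])
    return digits


def booth_radix4_multiply(multiplier, multiplicand, bits):
    """Each digit selects a multiple of B, weighted by 4**k (a two bit shift per row)."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(digit * multiplicand * 4 ** k for k, digit in enumerate(digits))


print(booth_radix4_digits(-5, 8), booth_radix4_multiply(-5, 46, 8))
```

An 8-bit multiplier yields four recoded digits, i.e., half as many partial product rows as the Radix-2 scheme.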

-:References:-

[1] Solomentsev, E.D. (2001), "Complex number", in Hazewinkel, Michiel,
Encyclopaedia of Mathematics, Springer, ISBN 978-1556080104.
[2] Man Yan Kong; Langlois, J.M.P.; Al-Khalili, D. (2008), “Efficient FPGA
implementation of complex multipliers using the logarithmic number system”, Circuits
and Systems, 2008. ISCAS 2008. IEEE International Symposium on,
Page(s): 3154 – 3157.
[3] Pascual, A.P.; Valls, J.; Peiro, M.M. (1999), “Efficient complex-number multipliers
mapped on FPGA”, Electronics, Circuits and Systems, 1999. Proceedings of ICECS '99.
The 6th IEEE International Conference on.
[4] R.M. Badghare, S.K. Mangal, R.B. Deshmukh, R.M. Patrikar (2009), “Design
of Low Power Parallel Multiplier”, Journal of Low Power Electronics, Volume 5,
Number 1, April 2009, 31-39.
[5] Jones, C.M.; Dlay, S.S.; Naguib, R.G. (Oct 1996), “Berger check prediction for
concurrent error detection in the Braun array multiplier”, Electronics, Circuits, and
Systems, 1996. ICECS '96., Proceedings of the Third IEEE International Conference,
Pages 81 - 84 vol.1.
[6] C. R. Baugh and B. A. Wooley, “A two's complement parallel array multiplication
algorithm”, IEEE Trans. Comput., Dec. 1973, vol. C-22, pp. 1045-1047.
[7] Ko-Chi Kuo; Chi-Wen Chou (2006), “Low Power Multiplier with Bypassing
and Tree Structure”, Circuits and Systems, 2006. APCCAS 2006. IEEE
Asia Pacific Conference, 4-7 Dec. 2006, 602 – 605.
[8] J. Ohban, V.G. Moshnyaga, and K. Inoue, “Multiplier energy reduction through
bypassing of partial products”, Asia-Pacific Conf. on Circuits and Systems, 2002, vol. 2,
pp. 13-17.
[9] Ming-Chen Wen, Sying-Jyan Wang, and Yen-Nan Lin, “Low Power Parallel
Multiplier with Column Bypassing”, Electronics Letters, Volume 41, Issue 10,
12 May 2005, Page(s): 581 – 583.

Chapter 3.

Multiplier Unit

As explained in the previous chapters on the various techniques for Complex
Multipliers, a Complex Multiplier is implemented using more than one Basic
Multiplier; i.e., the normal way of implementing Complex Multiplication requires
four Basic Multipliers. To make the Complex Multiplier a low power unit, these Basic
Multipliers are designed using the Compressor technique. If the Basic Multiplier is
designed for low power, the Complex Multiplier also becomes a low power unit.
Figure 3.1 Internal Block Diagram of 16*16 Basic Multiplier[2]

The above figure shows Internal Block Diagram of Basic Multiplier. It consists of three
stages:-
i) Partial Product Generator
ii) Different Order Compressors
iii) Parallel Adder
Below is the description of all three blocks that are used for multiplication.

3.1 Partial Product Generator:-


In an Unsigned Multiplier, we normally generate partial products and add
them to produce the result. Let ‘A’ and ‘B’ be two n-bit unsigned numbers
whose product ‘Z’ is 2n bits wide. First we generate the partial
products using the ‘AND’ operation. For an n-bit multiplication, n*n
partial products are generated.
Let us take two 16-bit numbers, A15-A0 called the Multiplicand and B15-B0 called
the Multiplier, as the inputs of the multiplier. Partial products are generated by ANDing
each bit of ‘A’ with each bit of ‘B’, so 16*16=256 partial products are generated. Each
bit of the multiplicand is ANDed with every bit of the multiplier. a0 is ANDed
with b0-b15, producing the sixteen partial products m00-m015 of the first row.
Similarly, for the other 15 rows we AND a1-a15 with
b0-b15, producing the remaining 240 partial products, i.e. from
m10-m1515.
Figure 3.2. Partial Product Generator (4 Bit)

The above diagram explains the Partial Product Generator.
Each multiplicand bit a0-a3 is ANDed with the multiplier bits b0-b3,
producing sixteen partial products m00-m33. These Partial Products
go to the inputs of the Compressors, which compress the partial product
stages. The Compressors are used to reduce the rows of partial
products to only two rows.
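The AND-based generation of partial products can be sketched as follows. This is a minimal Python model with hypothetical names; rows[i][j] plays the role of the partial product formed from bit ai and bit bj, and its weight in the final product is 2**(i+j).

```python
def partial_products(a, b, bits):
    """Return rows[i][j] = a_i AND b_j for two unsigned n-bit operands."""
    return [[((a >> i) & 1) & ((b >> j) & 1) for j in range(bits)]
            for i in range(bits)]


def product_from_rows(rows):
    """Add every partial product bit with its weight 2**(i + j)."""
    return sum(bit << (i + j)
               for i, row in enumerate(rows)
               for j, bit in enumerate(row))


rows = partial_products(13, 11, 4)           # 4-bit case, as in Figure 3.2
print(sum(len(row) for row in rows))         # number of partial product bits
print(product_from_rows(rows))               # equals 13 * 11
```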

3.2 Different Order Compressors[1][3][4]:-


After generation, the partial products go to the inputs of the
compressors. Compressors are used to reduce the partial product
stages of the multiplier. The main operation of a compressor is to
count the number of 1’s. After generating the partial products we
form vertical groups. Each vertical group counts its number of 1’s,
and the count value of that group is passed on to the second
stage.

3.2.1 Adder as Counter:-


Adder circuit whether it is a full adder or half adder can be used as a
counter which counts number of 1’s.
Figure 3.3. Half Adder
Figure 3.4. Full Adder

A   B   Carry   Sum
0   0   0       0
0   1   0       1
1   0   0       1
1   1   1       0

Table 3.1. Half Adder as a Counter

A   B   C   Carry   Sum
0   0   0   0       0
0   0   1   0       1
0   1   0   0       1
0   1   1   1       0
1   0   0   0       1
1   0   1   1       0
1   1   0   1       0
1   1   1   1       1

Table 3.2. Full Adder as a Counter[2]

The above tables show the half adder and full adder as
counters. Each counts the number of 1’s: if the inputs are A, B and C,
then carry and sum together give the number of 1’s in binary
form. Carry is the Most Significant Bit and Sum is the Least Significant Bit.
The full adder takes three inputs and generates two outputs, so
it compresses three bits into two bits and is called a 3:2
compressor.
Similarly, on the basis of this logic we can make other
types of compressors with more inputs, called higher
order compressors. These compressors count the number of 1’s among
a larger number of inputs. So, as the vertical length of the partial products
increases we can use these higher order compressors.
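The counting behaviour of Tables 3.1 and 3.2 can be reproduced with a short Python sketch. The gate-level decomposition used here (a full adder built from two half adders and an OR gate) is a standard textbook construction and is an assumption, not necessarily the circuit of Figures 3.3 and 3.4.

```python
from itertools import product


def half_adder(a, b):
    """Return (carry, sum) for two input bits."""
    return a & b, a ^ b


def full_adder(a, b, c):
    """Return (carry, sum) for three input bits, built from two half adders."""
    carry1, partial = half_adder(a, b)
    carry2, total = half_adder(partial, c)
    return carry1 | carry2, total


# Carry (MSB) and sum (LSB) together encode the number of 1's at the inputs.
for a, b, c in product((0, 1), repeat=3):
    carry, total = full_adder(a, b, c)
    assert 2 * carry + total == a + b + c
```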

3.2.2 Compressor Logic:-
The different Compressor logics are based upon the concept of the
full adder as a counter. A compressor can be defined as a single bit
adder circuit that has more inputs than the three of a full adder and
fewer outputs than inputs. Note that a full adder has two outputs, so it
counts up to three (11). Similarly, with three output bits a circuit counts
up to a maximum value of seven (111).
Compressors with four, five, six and seven
inputs produce three outputs, which count up to a maximum of
seven (111). Compressors with eight to fifteen
inputs produce four outputs, which count up to a
maximum of fifteen (1111). So, these compressors are built
according to the number of inputs they have and the count value
they have to generate. Following is the description of the different
compressor logics with their block diagrams:-

1) 4:3 Compressor:-
Figure 3.5. Block Diagram of 4:3 Compressor

The above figure shows the block diagram of the 4:3 Compressor. It consists of
four inputs and three outputs. The 4:3 Compressor has two Half Adders and
one Parallel Adder. If all four inputs are 1, it gives the maximum
count value of 100. Consider the output bits represented as j, (j+1),
and (j+2). The (j+2)th bit is the MSB and the jth bit is the LSB.

2) 5:3 Compressor:-

Figure 3.6. Block Diagram of 5:3 Compressor

The above figure shows the block diagram of the 5:3 Compressor. It consists of
five inputs and three outputs. The 5:3 Compressor has one Half Adder,
one Full Adder and a Parallel Adder. So, the maximum count value
will be 101. Consider the output bits represented as j, (j+1), and
(j+2). The (j+2)th bit is the MSB and the jth bit is the LSB.

3) 6:3 Compressor:-

Figure 3.7. Block Diagram of 6:3 Compressor

The above figure shows the block diagram of the 6:3 Compressor. It consists of
six inputs and three outputs. The 6:3 Compressor has two Full Adders
and one Parallel Adder. So, the maximum count value of the 6:3
Compressor will be 110. Consider the output bits represented as j,
(j+1), and (j+2). The (j+2)th bit is the MSB and the jth bit is the LSB.

4) 7:3 Compressor:-
Figure 3.8. Block Diagram of 7:3 Compressor

The above figure shows the block diagram of the 7:3 Compressor. It consists of
seven inputs and three outputs. The 7:3 Compressor has one 4:3
Compressor, one Full Adder and one Parallel Adder. So, the
maximum count value of the 7:3 Compressor is 111. Consider the
output bits represented as j, (j+1), and (j+2). The (j+2)th bit is the MSB and
the jth bit is the LSB.

5) 8:4 Compressor:-
Figure 3.9. Block Diagram of 8:4 Compressor

The above figure shows the block diagram of the 8:4 Compressor. It consists of
eight inputs and four outputs. The 8:4 Compressor has one 5:3
Compressor, one Full Adder and one Parallel Adder. The maximum
count value of the 8:4 Compressor is 1000. Consider the output bits
represented as j, (j+1), (j+2), (j+3). The (j+3)th bit is the MSB and the jth bit is
the LSB.

6) 9:4 Compressor:-
Figure 3.10. Block Diagram of 9:4 Compressor

The above figure shows the block diagram of the 9:4 Compressor. It consists of
nine inputs and four outputs. The 9:4 Compressor has one 6:3 Compressor,
one Full Adder and one Parallel Adder. The maximum count value of the
9:4 Compressor is 1001. Consider the output bits represented as j,
(j+1), (j+2), (j+3). The (j+3)th bit is the MSB and the jth bit is the LSB.

7) 10:4 Compressor:-

Figure 3.11. Block Diagram of 10:4 Compressor

The above figure shows the block diagram of the 10:4 Compressor. It consists
of ten inputs and four outputs. The 10:4 Compressor has one 7:3
Compressor, one Full Adder and one Parallel Adder. The maximum
count value of the 10:4 Compressor is 1010. Consider the output bits
represented as j, (j+1), (j+2), (j+3). The (j+3)th bit is the MSB and the jth bit is
the LSB.

8) 11:4 Compressor:-
Figure 3.12. Block Diagram of 11:4 Compressor

The above figure shows the block diagram of the 11:4 Compressor. It consists
of eleven inputs and four outputs. The 11:4 Compressor has one 7:3
Compressor, one 4:3 Compressor and one Parallel Adder. The
maximum count value of the 11:4 Compressor is 1011. Consider the
output bits represented as j, (j+1), (j+2) and (j+3). The (j+3)th bit is the MSB
and the jth bit is the LSB.

9) 12:4 Compressor:-

Figure 3.13. Block Diagram of 12:4 Compressor

The above figure shows the block diagram of the 12:4 Compressor. It consists
of twelve inputs and four outputs. The 12:4 Compressor has one 7:3
Compressor, one 5:3 Compressor and one three-bit Parallel Adder.
The maximum count value of the 12:4 Compressor is 1100. Consider the
output bits represented as j, (j+1), (j+2) and (j+3). The (j+3)th bit is
the MSB and the jth bit is the LSB.

10) 13:4 Compressor:-

Figure 3.14. Block Diagram of 13:4 Compressor

The above figure shows the block diagram of the 13:4 Compressor. It consists
of thirteen inputs and four outputs. The 13:4 Compressor has one 7:3
Compressor, one 6:3 Compressor and one three-bit Parallel
Adder. The maximum count value of the 13:4 Compressor is 1101.
Consider the output bits represented as j, (j+1), (j+2) and (j+3).
The (j+3)th bit is the MSB and the jth bit is the LSB.

11) 14:4 Compressor:-

Figure 3.15. Block Diagram of 14:4 Compressor

The above figure shows the block diagram of the 14:4 Compressor. It consists
of fourteen inputs and four outputs. The 14:4 Compressor has two 7:3
Compressors and one three-bit Parallel Adder. The maximum count
value of the 14:4 Compressor is 1110. Consider the output bits
represented as j, (j+1), (j+2) and (j+3). The (j+3)th bit is the MSB and the jth bit
is the LSB.

12) 15:4 Compressor:-

Figure 3.16. Block Diagram of 15:4 Compressor

The above figure shows the block diagram of the 15:4 Compressor. It consists
of fifteen inputs and four outputs. The 15:4 Compressor has one 8:4
Compressor, one 7:3 Compressor and one three-bit Parallel
Adder. The maximum count value of the 15:4 Compressor is 1111.
Consider the output bits represented as j, (j+1), (j+2) and (j+3).
The (j+3)th bit is the MSB and the jth bit is the LSB.

13) 16:5 Compressor:-

Figure 3.17. Block Diagram of 16:5 Compressor

The above figure shows the block diagram of the 16:5 Compressor. It consists
of sixteen inputs and five outputs. The 16:5 Compressor has two 8:4
Compressors and one four-bit Parallel Adder. The maximum count
value of the 16:5 Compressor is 10000. Consider the output bits
represented as j, (j+1), (j+2), (j+3) and (j+4). The (j+4)th bit is the MSB and
the jth bit is the LSB.

These different order Compressors are used to reduce the partial
product stages. Compressors also reduce the switching
operations, as they only count the number of 1’s. The
partial products generated are divided vertically among the different
order compressors.
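The input and output widths of the compressors described in this section can be summarised by a behavioural Python sketch. It treats every compressor as a counter of 1's and is an assumption for illustration only; it does not model the internal half adder, full adder and parallel adder structure shown in the block diagrams. The function name is hypothetical.

```python
def compressor(inputs):
    """Behavioural n:m compressor: count the 1's and return the count bits.

    The returned list is LSB first, i.e. positions j, (j+1), (j+2), ...
    The output width is the number of bits needed to represent n.
    """
    count = sum(inputs)
    width = len(inputs).bit_length()
    return [(count >> k) & 1 for k in range(width)]


# Output widths: 4..7 inputs -> 3 bits, 8..15 inputs -> 4 bits, 16 inputs -> 5 bits.
for n in (4, 7, 8, 15, 16):
    print(n, len(compressor([1] * n)))
```

With all inputs at 1, the 4:3 model returns the count 100 and the 16:5 model returns 10000, in agreement with the maximum count values listed above.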
3.3 Parallel Adders:-

Figure 3.18. Block Diagram of Parallel Adder

The above figure shows the block diagram of the Parallel Adder. It consists of
cascaded Full Adders. The number of adders used depends on the
length of the output. For N*N multiplication, 2N full adders are
used. Here, the Cout of the first full adder is connected to the Cin of the next
adjacent full adder. The main concept of this parallel adder comes from the
Carry Look-ahead Adder. The output of the Parallel Adder is the final
output of the Multiplier.
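The cascade described above, in which the Cout of each full adder feeds the Cin of the next, can be sketched in Python as follows. This is a bit-level behavioural model with hypothetical helper names; it shows only the carry chaining and makes no claim about the carry logic actually used in the thesis.

```python
def full_adder(a, b, carry_in):
    """Return (carry_out, sum) for one bit position."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return carry_out, total


def to_bits(value, width):
    """Unsigned integer to a list of bits, LSB first."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """List of bits (LSB first) back to an unsigned integer."""
    return sum(bit << k for k, bit in enumerate(bits))


def parallel_adder(x_bits, y_bits):
    """Add two equal-length bit lists; Cout of each stage drives Cin of the next."""
    carry = 0
    result = []
    for a, b in zip(x_bits, y_bits):
        carry, total = full_adder(a, b, carry)
        result.append(total)
    return result + [carry]  # the final carry becomes the top output bit


print(from_bits(parallel_adder(to_bits(200, 8), to_bits(100, 8))))
```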

3.4 Architecture of Multiplier Using Compressor:-


The following figure shows the Architecture of the 8*8 Multiplier using different
order Compressors.
Figure 3.19. Architecture of 8*8 Multiplier using Compressors[2]

As shown in the above figure, the Partial Products are added in four
stages. Adders and different compressors are used to minimize the
number of stages. Compressors are chosen carefully so that the minimum
number of outputs is generated. Consider column number eight,
where eight bits are added at the first stage. These eight bits are
added using an 8:4 Compressor, which generates four outputs and
thereby decreases the number of bits for the next stage.
Note that the outputs of each compressor from 4:3 to
7:3 have bit positions j, (j+1) and (j+2), where the jth bit is the LSB
and the (j+2)th bit is the MSB. The compressors from 8:4 to 15:4 have bit
positions j, (j+1), (j+2) and (j+3), where the jth bit is the LSB and the
(j+3)th is the MSB. The 16:5 Compressor has bit positions j, (j+1),
(j+2), (j+3) and (j+4), where the jth bit is the LSB and the (j+4)th is the MSB.
For the compressor in column number four, i.e., a 4:3 Compressor,
the jth output goes to column number four, the next adjacent output,
i.e., the (j+1)th output, goes to column number five, and the (j+2)th output
goes to column number six. Similarly, for column eight, i.e., for the 8:4
Compressor, the jth output goes to column number eight, the
(j+1)th output goes to column number nine, the (j+2)th output goes to
column number ten and the last, (j+3)th, output goes to column number
eleven. Thus, these compressors reduce the vertical critical path more rapidly.
Similarly, in the next stage, if a vertical path has more than
two bits, we use a compressor of that many inputs to reduce the
vertical critical path again. Finally, we use compressors up to the stage
where only two bits remain vertically, and those two rows of bits are added
in parallel as explained in section 3.3.
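The column-wise reduction described above can be sketched behaviourally in Python. The sketch below is an illustration under simplifying assumptions: every column with more than two bits is replaced by a counter of its 1's whose outputs go to columns j, (j+1), (j+2), and so on; columns with one or two bits are passed through unchanged; and ordinary integer addition stands in for the final parallel adder. The number of stages it reports is a property of this model and is not claimed to match Figure 3.19. All names are hypothetical.

```python
from collections import defaultdict


def compress(bits):
    """Count the 1's in one column; return the count bits, LSB first."""
    count = sum(bits)
    return [(count >> k) & 1 for k in range(len(bits).bit_length())]


def multiply_with_compressors(a, b, bits):
    """Unsigned multiply by column compression; returns (product, stages)."""
    # Partial product a_i AND b_j belongs to column i + j.
    columns = defaultdict(list)
    for i in range(bits):
        for j in range(bits):
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    stages = 0
    while any(len(col) > 2 for col in columns.values()):
        next_columns = defaultdict(list)
        for j, col in columns.items():
            if len(col) <= 2:
                next_columns[j].extend(col)          # nothing to compress
            else:
                for k, bit in enumerate(compress(col)):
                    next_columns[j + k].append(bit)  # outputs j, j+1, j+2, ...
        columns = next_columns
        stages += 1

    # At most two bits per column remain: add the two rows.
    row0 = sum(col[0] << j for j, col in columns.items() if len(col) > 0)
    row1 = sum(col[1] << j for j, col in columns.items() if len(col) > 1)
    return row0 + row1, stages


print(multiply_with_compressors(255, 255, 8))
```

Each stage preserves the weighted sum of all bits while strictly reducing their number, so the loop always ends with at most two bits in every column.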

-:References:-
[1] C.H. Chang, J. Gu, M. Zhang (2004), “Ultra low-voltage low-power CMOS
4-2 and 5-2 compressors for fast arithmetic circuits”, Circuits and
Systems Regular Papers, IEEE Transactions, Volume 51, Issue 10,
Oct. 2004, Page(s): 1985 – 1997.
[2] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), “A 1.2-ns
16×16-Bit Binary Multiplier Using High Speed Compressors”,
International Journal of Electrical, Computer, and Systems Engineering,
2009, 234-239.
[3] J. Gu, C.H. Chang (2003), “Ultra low voltage low power 4-2 compressor
for high speed multiplications”, Circuits and Systems, 2003. ISCAS ’03.
Proceedings of the International Symposium, vol. 5, May 2003, 321-324.
[4] K. Prasad and K. K. Parhi (2001), “Low power 4-2 and 5-2 compressor”,
Proc. of the 35th Asilomar Conf. on Signals, Systems and Computers, vol.
1, 2001, 129-133.
Chapter 4.

Proposed Complex Multiplier

In this chapter we propose a new Complex Multiplier for both unsigned and signed
Complex Multiplication.

4.1 Unsigned Multiplication:-

As we saw in the general rule of Complex Multiplication, multiplying two
complex numbers requires four multipliers and three
adders/subtractors. The range of an n-bit unsigned number is 0 to 2ⁿ-1. Since the
numbers are unsigned, we have to enter a separate sign for each of the four real
numbers. We therefore obtain the real and imaginary parts of the result as magnitudes,
and the signs of the real and imaginary parts are produced from the input signs by
combinational logic.
Figure 4.1. Block Diagram of Unsigned Complex Multiplier

As shown in Figure 4.1, we enter four real numbers ‘a’, ‘b’, ‘c’ and ‘d’ and the sign
of each number as ‘sa’, ‘sb’, ‘sc’, ‘sd’. After multiplying the real numbers using four
Multipliers and using the 32-bit Add/Sub Block, we get the output “rr”, which is the
Real part, and “ri”, which is the Imaginary part of the result of Complex Multiplication.
Similarly, to get the sign of the result for both the Real and Imaginary parts we apply
combinational logic to the sign inputs, and we get the output sign “ssr” for the Real part
and “ssi” for the Imaginary part.

As explained in Chapter 2, the multiplication of two complex numbers is

(a+bi).(c+di)=(ac-bd) + (ad+bc)i
As we enter the sign of each number separately, we have to use a combinational
circuit to produce the sign of the result for the Real part (sr) as well as the Imaginary part (si).
Let the first term “ac” be represented as ‘e’, “bd” as ‘f’, “ad” as ‘g’ and
“bc” as ‘h’. The signs of these terms are represented as se, sf, sg and sh. These
signs are generated using XOR operations on the input signs.
se= sa xor sc.

sf= sb xnor sd.

sg= sa xor sd.

sh= sb xor sc.


Figure 4.2. Combinational Logic for intermediate sign

Now, by applying conditions on se, sf, sg and sh, we generate the final sign result,
i.e. “sr” for the real part and “si” for the imaginary part. We use a 2:1 Mux to generate
the output sign value. ‘0’ represents a Positive Value and ‘1’ represents a
Negative Value.
Figure 4.3. Combinational Circuit for output sign Real Part and Imaginary Part

4.2 Signed Multiplication:-

Figure 4.4. Modified Complex Multiplier Block Diagram.


The above Block Diagram shows the Modified Complex Multiplier, which consists of three
multipliers and three adder/subtractor units. This design requires one multiplier less
than the previous technique, so it consumes less power. To perform signed
multiplication we use Booth’s Radix algorithm. Booth’s Radix algorithm reduces the
number of partial products compared with the normal multiplier algorithm. So, it reduces
the switching operations of the multiplier and hence reduces power. It is based on the
fact that a partial product can be generated for a group of consecutive zeros and ones,
which is called Booth’s recoding.

4.2.1 Modified Technique Recoding Algorithm for Radix-2 and Radix-4[1][2]:-
Parallel Multiplication using the basic Booth Recoding Technique is
explained in the previous section. Since that technique requires a lot of adders, it
requires more power and area. In the proposed multiplier design, we have reduced the
number of adders required for partial product addition by reducing the vertical length
of the Partial Products. In this technique, mainly the correction bits are reduced. This is
done without compromising the correctness of multiplication of 2’s complement numbers.
We have used a Multiplexer based Booth Recoding scheme to reduce the length and width
of the partial products.
In this technique, the partial products after recoding are always longer than the
input bit length by one bit in the Radix-2 scheme. Similarly, in the Radix-4
scheme the recoded partial products are always longer than the input bit length by two
bits. These additional bits act as correction bits to get the correct value of the
product. Also, in the hardware realization of Booth’s recoding scheme, we can remove the
extra select line which is used at the time of recoding. Because of this extra select line
the multiplexer size becomes large. We have observed that if we do not consider this extra
bit at the time of hardware realization we can reduce the size of one multiplexer. So, in
radix 2 the LSB alone decides the first partial product, and in radix 4 the first two LSBs
decide the first partial product. These partial products are then added using the proposed
array of adders to achieve the correct multiplication output. The working of this novel
design is explained in the following sections.
Figure 4.5 Block Diagram of Modified Booth’s Recoding unit Multiplier[1]

In order to achieve signed number multiplication, Partial Products are generated
using the Modified Booth’s Recoding Unit Multiplication block. After generation, the new
Partial Products are added using Compressors and a Parallel Adder. Below is the
explanation of the Modified Booth’s Recoding Unit for the Multiplier.

4.2.2 Modified Booth’s Recoding Unit[3]:-


Partial Products are generated using the Modified Booth’s Recoding Unit block. As
we saw in the earlier section on the generation of Partial Products for the basic Booth’s
Recoding algorithm, we use the same concept to generate partial products for the Modified
Booth’s Recoding Algorithm, with the length of each partial product exceeding the input
bit sequence by one for the Radix-2 scheme and by two for the Radix-4 scheme.
This modified technique is explained below:-
Radix-2 Method:-
As we saw in Table 2.1, output partial products are added and shifted according to the
input sequence. Here, we use multiplexers to build the recoding unit. The select lines of
the multiplexers are the input bits of the multiplier, and the outputs follow the modified
table shown below:-

Ai   Ai-1   Y      Explanation
0    0      0      All 0’s
0    1      1.B    [ B(n-1), B ]
1    0      -1.B   [ ~B(n-1), (-B) ]
1    1      0      All 0’s
(~ denotes the complement of the bit)

Table 4.3 Modified Booth’s Recoding Algorithm Radix 2


This can be explained with a simple example:-
Suppose B => 1100 (-4)
A => 1010 (-6)
So, according to the table shown above we obtain the recoded bits as partial products:-

PP0 => 0 0 0 0 0
PP1 => 0 0 1 0 0
PP2 => 1 1 1 0 0
PP3 => 0 0 1 0 0

Here, in the Modified Booth’s Recoding algorithm one extra bit is added at the MSB
of the input bit sequence as shown in the Table. The hardware realization of this recoding
unit is based on multiplexers and includes a 2’s complement unit. At the time of recoding
we assume one extra bit ‘0’ before the LSB of the input bit sequence, and this extra bit
‘0’ decides the Partial Product according to the sequence given in the Table above. We have
observed that at the time of hardware realization the LSB alone is sufficient to get the first
partial product; because of this, that multiplexer becomes 2x1 rather than 4x1, while the other
multiplexers remain the same, with their input select lines depending upon the recoding
scheme. So, multiplexers are important hardware for the Booth’s Recoding unit.
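The (n+1)-bit partial product rows of the Radix-2 example can be reproduced with a short Python sketch. It is a behavioural model only: each row is computed arithmetically as the (n+1)-bit 2's complement encoding of 0, +B or -B, which is an assumption standing in for the multiplexer and 2's complement hardware described above. The function names are hypothetical.

```python
def modified_radix2_rows(multiplier, multiplicand, bits):
    """Return the partial product rows as (bits + 1)-wide strings, PP0 first."""
    width = bits + 1
    mask = (1 << width) - 1
    pattern = multiplier & ((1 << bits) - 1)
    rows = []
    previous = 0  # assumed '0' before the LSB
    for i in range(bits):
        current = (pattern >> i) & 1
        digit = previous - current   # 01 -> +1, 10 -> -1, 00 and 11 -> 0
        rows.append(format((digit * multiplicand) & mask, f"0{width}b"))
        previous = current
    return rows


def product_from_rows(rows):
    """Decode each row as 2's complement and add it with weight 2**i."""
    total = 0
    for i, row in enumerate(rows):
        value = int(row, 2)
        if row[0] == "1":
            value -= 1 << len(row)
        total += value << i
    return total


rows = modified_radix2_rows(-6, -4, 4)   # A = 1010, B = 1100
print(rows, product_from_rows(rows))
```

For A = 1010 and B = 1100 the sketch yields the rows 00000, 00100, 11100 and 00100, the same as PP0 to PP3 in the example, and their weighted sum is 24.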

Radix-4 Method:-
The Radix-4 scheme works like the Radix-2 scheme above and is also used to reduce
the number of partial products, so it is very useful for fast multiplication of long input
bit sequences. Here, each partial product obtained from the recoding unit is always 2 bits
longer than the input. So, if the inputs are n bits wide, the partial product length is (n+2) bits.

Ai+1   Ai   Ai-1   Y      Explanation
0      0    0      0      All 0’s
0      0    1      1.B    [ B(n-1), B(n-1), B ]
0      1    0      1.B    [ B(n-1), B(n-1), B ]
0      1    1      2.B    [ B(n-1), B, 0 ]
1      0    0      -2.B   [ ~B(n-1), -B, 0 ]
1      0    1      -1.B   [ ~B(n-1), ~B(n-1), -B ]
1      1    0      -1.B   [ ~B(n-1), ~B(n-1), -B ]
1      1    1      0      All 0’s
(B is the multiplicand, B(n-1) is its MSB and ~ denotes the complement of the bit)

Table 4.4 Modified Booth’s Recoding Algorithm Radix-4

The above Table shows how partial products are generated according to the input bit sequence.
Here, we generate two extra bits according to the input bits. These two bits are
correction bits needed to get the correct output of the multiplication. The MSBs of the partial
products need to be added carefully. For that, a new structure of adder array is introduced.
This modification removes the problem of a large number of correction bits, which would
require more adders and hence more higher order compressors.

4.3 Compressors and Adders:-


Recoding and Addition scheme for Radix-2 and Radix-4 for four bit input sequence [4][5]:-

Figure 4.6 Addition scheme for Radix-2

The above figure shows the addition scheme for Radix-2, which has five-bit partial
products. These partial products are added using the compressor scheme explained
previously. Here, the value of m(0)(4) is added diagonally, i.e., it is added to the diagonal bit,
which is the MSB of the second partial product and also a correction bit. So, we add m(0)(4)
to m(1)(4) and put the result in place of m(1)(4). Similarly, the new value of the
MSB of the second partial product row is added to the old MSB of the third partial product to
get the new value of the MSB of the third partial product, as shown in the above figure. After
getting the new values of the correction bits we add these bits using compressors.
Figure 4.7 Architecture of 8*8 Signed Multiplier for Radix-2 [5]

The above figure shows the Architecture of the 8*8 Signed Multiplier for the Radix-2 scheme,
where partial products are generated using the Modified Booth’s Recoding Unit. Here, we
generate partial products of 9 bits per row. In the first stage, these partial products are divided
into vertical blocks; these vertical blocks are half adders, full adders and different order
compressors. Vertical blocks of 2 bits are half adders and vertical blocks of 3 bits are full
adders. The outputs of these adders and compressors are arranged as explained in chapter 3.
The horizontal blocks are parallel adders, which perform the addition that generates the final
multiplication result.
Figure 4.8 Addition scheme for Radix-4

The above figure shows the addition scheme for Radix-4, which has six partial product
bits per row: the four LSBs come from the input sequence and the two MSBs are correction bits.
Here, the MSB of the first row of partial products is added to both MSBs of the second row. In
the Modified Radix-4 scheme the total number of partial product rows is half that of the normal
partial product scheme. For example, if the multiplier is 4*4 bits then the total number of
partial product rows, including correction bits, is two, i.e. half of the rows of the original
scheme, as shown in the above figure. Similarly, for wider multipliers using the radix-4 scheme
the total number of partial product rows is half of the original, which results in fewer switching
operations and hence less power.
Figure 4.9 Architecture of 8*8 Signed Multiplier for Radix-4

The above figure shows the Architecture of the 8*8 Signed Multiplier for the Radix-4 scheme,
where Partial Products are generated using the Modified Booth’s Recoding Unit. In this
scheme we generate partial products of 10 bits each, i.e. two extra bits for each row, as
explained in the table of the Radix-4 scheme. The main advantage of the Radix-4 scheme is that
the number of partial product rows is half that of the Radix-2 method, i.e., here in the
8*8 multiplier the number of partial product rows is four, so fewer compressors are
required and hence there are fewer switching operations, which lowers power.
-:References:-
[1] D. A. Pucknell, K. Eshraghian, Basic VLSI Design, Prentice-Hall, ISBN
81-203-0986-3.
[2] Israel Koren, Computer Arithmetic Algorithms, A.K. Peters Ltd., ISBN 1568811608.
[3] A.D. Booth, “A signed binary multiplication technique”, Quarterly Journal of Mechanics
and Applied Mathematics, vol. IV, pt. 2, 1951.
[4] C.H. Chang, J. Gu, M. Zhang (2004), “Ultra low-voltage low-power CMOS
4-2 and 5-2 compressors for fast arithmetic circuits”, Circuits and
Systems Regular Papers, IEEE Transactions, Volume 51, Issue 10,
Oct. 2004, Page(s): 1985 – 1997.
[5] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), “A 1.2-ns
16×16-Bit Binary Multiplier Using High Speed Compressors”,
International Journal of Electrical, Computer, and Systems Engineering,
2009, 234-239.
Chapter 5.

Results and Discussion

5.1 Behavioral Simulation


5.2 Synthesis Report
5.3 Power Calculation
5.4 Layout

This section shows the results of the different blocks which are used for the
implementation of the Complex Multiplier. It consists of the Simulation Results of the different
blocks, the Synthesis Report and the Power Calculation of the different blocks. The power of the
design is calculated by applying 100 random inputs. The Test Bench is written in VHDL. The textio
format is used, where the input is read from an input file called infile and the output is written
to an output file called outfile. All of the designs below are simulated using ModelSim XE III
6.2g and synthesized using Xilinx ISE Project Navigator 9.1i, with power calculated using the
Xilinx XPower tool. Power is also calculated using the ASIC Encounter synthesis
tool.
5.1 Behavioral Simulation:-
i) Unsigned Basic Multiplier 16*16:-

Figure 5.1 Behavioral Simulation of Unsigned 16*16 Basic Multiplier

The above Figure shows the simulation of the 16*16 unsigned multiplier. The inputs ‘a’ and ‘b’ are each
16 bits wide, while ‘z’ is the 32 bit output. As this is an unsigned multiplier, the range of the input
numbers is from 0 to 65535. In this type of multiplier no negative number is considered; all numbers are
positive. As shown in the simulation diagram, if both inputs ‘a’ and ‘b’ are set to unsigned 7, i.e.
“0000000000000111” in binary, we get the output ‘z’ as 49 in unsigned format. Consider the
maximum value, i.e. 65535, which is the highest value in 16 bit unsigned format. It consists of all 1’s, i.e.
“1111111111111111” in binary, and we get the output ‘z’ as 4294836225, which is the maximum value for the
16*16 unsigned multiplier.
ii) Unsigned Complex Multiplier 16*16:-

Figure 5.2 Behavioral Simulation of 16*16 Unsigned Complex Multiplier.

The figure above shows the waveform of the 16*16 Complex Multiplier for unsigned numbers. There
are four inputs, ‘a’, ‘b’, ‘c’ and ‘d’, of 16 bits each. As the inputs are unsigned numbers, the sign of
each number has to be entered separately: the sign bits are ‘sa’ for input ‘a’, ‘sb’ for input ‘b’, ‘sc’
for input ‘c’ and ‘sd’ for input ‘d’.
The output of the complex multiplier shown in the figure above follows the block diagram of
unsigned complex multiplication explained in Section 4.1, and the simulation waveform illustrates its
operation.
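A minimal sketch of how separate sign bits and unsigned magnitudes could be combined in a reference model is given below. It assumes that a sign bit of 1 marks a negative value and that the operands form (a + bi) and (c + di); neither convention is confirmed by the text here, and the function names are hypothetical.

```python
def signed_value(sign_bit, magnitude):
    """Combine a sign bit (assumed: 1 = negative) with a 16-bit magnitude."""
    if sign_bit not in (0, 1) or not 0 <= magnitude <= 0xFFFF:
        raise ValueError("invalid sign bit or magnitude")
    return -magnitude if sign_bit else magnitude


def complex_multiply_sign_magnitude(sa, a, sb, b, sc, c, sd, d):
    """Return (real, imag) of (a + bi)(c + di) for sign-magnitude inputs."""
    ar, ai = signed_value(sa, a), signed_value(sb, b)
    br, bi = signed_value(sc, c), signed_value(sd, d)
    real = ar * br - ai * bi   # real part: ac - bd
    imag = ar * bi + ai * br   # imaginary part: ad + bc
    return real, imag


# Example: (1 - 2i)(3 + 4i), with 'sb' set to mark 'b' as negative.
print(complex_multiply_sign_magnitude(0, 1, 1, 2, 0, 3, 0, 4))
```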

iii) Signed Multiplier 16*16:-


a) Radix-2:-
Figure 5.3 Behavioral Simulation of 16*16 Basic Signed Multiplier
The figure above shows the behavioral simulation of the 16*16 Basic Signed Multiplier. In this
scheme signed values are entered for the inputs ‘a’ and ‘b’. The inputs are 16 bits each, while the
output ‘x’ is 32 bits. The range of the inputs is -32768 to +32767. As this is a signed multiplier, both
positive and negative numbers are considered.
As the figure shows, the sign of each input does not have to be entered separately, as is
required in the unsigned scheme. Negative numbers are entered in 2’s complement form. Suppose ‘a’
and ‘b’ are 7 and -7 respectively. As ‘a’ is positive, it is entered as “0000000000000111”; as ‘b’ is
negative, it is entered as “1111111111111001”, which is -7 in 2’s complement form. The result in this
case is “11111111111111111111111111001111” in binary, which is -49 in 32-bit 2’s complement form.
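The 2’s complement bit patterns used in such examples can be produced and decoded with two short helpers. This is a generic sketch of the encoding, with hypothetical function names; it is not part of the thesis test bench.

```python
def to_twos_complement(value, width):
    """Return the `width`-bit 2's complement pattern of `value` as a string."""
    low, high = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if not low <= value <= high:
        raise ValueError("value does not fit in the given width")
    return format(value & ((1 << width) - 1), f"0{width}b")


def from_twos_complement(bits):
    """Decode a 2's complement bit string back to a signed integer."""
    raw = int(bits, 2)
    # A leading 1 marks a negative value: subtract 2**width.
    return raw - (1 << len(bits)) if bits[0] == "1" else raw


print(to_twos_complement(7, 16))     # 0000000000000111
print(to_twos_complement(-7, 16))    # 1111111111111001
print(to_twos_complement(-49, 32))   # 26 ones followed by 001111
```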

b) Radix-4
The simulation result of the Radix-4 design is the same as that of the Radix-2 scheme. The only
difference between the two schemes is in their synthesis reports.

iv) Signed 16*16 Complex Multiplier:-


a) Radix-2:-
Figure 5.4 Behavioral Simulation of 16*16 Complex Signed Multiplier
The figure above shows the behavioral simulation of the 16*16 Complex Signed Multiplier. In this
scheme the inputs ‘a’, ‘b’, ‘c’ and ‘d’ may be positive or negative, so there is no need to enter
separate sign bits for the inputs.
With the number range and format discussed in the previous section, consider the first
example, where a=1, b=2, c=3 and d=4. All of these numbers are positive, so they are entered as
ordinary binary-weighted values. Calculating (1+2i).(3+4i) = (1*3 - 2*4) + (1*4 + 2*3)i gives
-5+10i: the real part is -5 and the imaginary part is +10. In binary, -5 is
“11111111111111111111111111111011” in 2’s complement form and +10 is
“00000000000000000000000000001010”.
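The worked example can be cross-checked with a short reference model. The sketch below uses the textbook form with four real products, (ac - bd) and (ad + bc), and compares the result with Python's built-in complex arithmetic; the function names are hypothetical and the code does not model the compressor-based datapath.

```python
def complex_multiply(a, b, c, d):
    """Return (real, imag) of (a + bi)(c + di) using four real products."""
    real = a * c - b * d
    imag = a * d + b * c
    return real, imag


def as_bits(value, width=32):
    """2's complement bit pattern of `value` in `width` bits."""
    return format(value & ((1 << width) - 1), f"0{width}b")


real, imag = complex_multiply(1, 2, 3, 4)
# Cross-check against Python's built-in complex type.
assert complex(real, imag) == complex(1, 2) * complex(3, 4)
print(real, as_bits(real))
print(imag, as_bits(imag))
```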

b) Radix-4:-
The behavioral simulation of the Radix-4 Complex Multiplier is the same as that of the Radix-2 scheme.
5.2 Synthesis Report:-
i) Unsigned Basic Multiplier 16*16:-

Design Summary:-
a) Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 714 out of 18,560 3%
Logic Distribution:
Number of occupied Slices: 405 out of 9,280 4%
Number of Slices containing only related logic: 405 out of 405 100%
Number of Slices containing unrelated logic: 0 out of 405 0%
Total Number of 4 input LUTs: 714 out of 18,560 3%

Number of bonded IOBs: 64 out of 564 11%


Total equivalent gate count for design: 4,287
Combinational Path Delay:- 34.009 ns

ii) Unsigned Complex Multiplier 16*16:-


Design Summary:-
Logic Utilization:
Number of Slice Latches: 2 out of 18,560 1%
Number of 4 input LUTs: 3,422 out of 18,560 18%
Logic Distribution:
Number of occupied Slices: 1,891 out of 9,280 20%
Number of Slices containing only related logic: 1,891 out of 1,891 100%
Number of Slices containing unrelated logic: 0 out of 1,891 0%
Total Number of 4 input LUTs: 3,422 out of 18,560 18%
Number of bonded IOBs: 136 out of 564 24%
IOB Latches: 66
Total equivalent gate count for design: 21,760
Combinational Path Delay:- 41.271 ns

iii) Signed Basic Multiplier 16*16 radix 2:-


Design Summary
Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 811 out of 18,560 4%
Logic Distribution:
Number of occupied Slices: 468 out of 9,280 5%
Number of Slices containing only related logic: 468 out of 468 100%
Number of Slices containing unrelated logic: 0 out of 468 0%
Total Number of 4 input LUTs: 812 out of 18,560 4%
Number used as logic: 811
Number used as a route-thru: 1
Number of bonded IOBs: 64 out of 564 11%
Total equivalent gate count for design: 4,980
Combinational Path Delay:- 35.432 ns

iv) Signed Basic Multiplier 16*16 radix-4:-


Design Summary
Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 705 out of 18,560 3%
Logic Distribution:
Number of occupied Slices: 392 out of 9,280 4%
Number of Slices containing only related logic: 392 out of 392 100%
Number of Slices containing unrelated logic: 0 out of 392 0%
Total Number of 4 input LUTs: 707 out of 18,560 3%
Number used as logic: 705
Number used as a route-thru: 2
Number of bonded IOBs: 63 out of 564 11%
Total equivalent gate count for design: 4,422
Combinational Path Delay:- 35.858 ns

v) Signed 16*16 Complex Multiplier Radix-2:-


Design Summary:-
Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 3,903 out of 18,560 21%
Logic Distribution:
Number of occupied Slices: 2,238 out of 9,280 24%
Number of Slices containing only related logic: 2,238 out of 2,238 100%
Number of Slices containing unrelated logic: 0 out of 2,238 0%
Total Number of 4 input LUTs: 3,908 out of 18,560 21%
Number used as logic: 3,903
Number used as a route-thru: 5
Number of bonded IOBs: 126 out of 564 22%
Total equivalent gate count for design: 24,231

Combinational Path Delay:- 58.181 ns

vi) Signed 16*16 Complex Multiplier Radix-4:-


Design Summary:-
Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 3,195 out of 18,560 17%
Logic Distribution:
Number of occupied Slices: 1,758 out of 9,280 18%
Number of Slices containing only related logic: 1,758 out of 1,758 100%
Number of Slices containing unrelated logic: 0 out of 1,758 0%
Total Number of 4 input LUTs: 3,200 out of 18,560 17%
Number used as logic: 3,195
Number used as a route-thru: 5
Number of bonded IOBs: 126 out of 564 22%
Total equivalent gate count for design: 20,301

Combinational Path Delay:- 57.847 ns

5.3 Power Calculation:-


i) Unsigned Basic Multiplier 16*16:-

a) Xilinx FPGA xc2vp20-5ff1152:-


Dynamic Power:- 52.68 mW
Static Power:- 540.72 mW
Power-Delay Product:- 1.79 nJ

b) ASIC Encounter Synthesis:-


Number of Cells:- 668 out of 549815
Dynamic Power:- 18.97 mW

ii) Unsigned Complex Multiplier 16*16:-

a) Xilinx FPGA xc2vp20-5ff1152:-


Dynamic Power:- 6486.61 mW
Static Power:- 7248.75 mW
Power-Delay Product:- 267.7 nJ

iii) Signed Basic Multiplier 16*16 radix 2:-

a) Xilinx FPGA xc2vp20-5ff1152:-


Dynamic Power:- 87.34 mW
Static Power:- 554.68 mW
Power-Delay Product:- 3.09 nJ
b) ASIC Encounter Synthesis:-
Number of Cells:- 2818 out of 75981
Dynamic Power:- 3.84 mW

iv) Signed Basic Multiplier 16*16 radix-4:-

a) Xilinx FPGA xc2vp20-5ff1152:-


Dynamic Power:- 81.21 mW
Static Power:- 464.07 mW
Power-Delay Product:- 2.9 nJ
b) ASIC Encounter Synthesis:-
Number of Cells:- 653 out of 17774
Dynamic Power:- 2.83 mW

v) Signed Complex Multiplier 16*16 radix-2:-

a) Xilinx FPGA xc2vp20-5ff1152:-


Dynamic Power:- 80.78 mW
Static Power:- 951.67 mW
Power-Delay Product:- 4.69 nJ
b) ASIC Encounter Synthesis:-
Number of Cells:- 3509 out of 115564
Dynamic Power:- 25.63 mW

vi) Signed Complex Multiplier 16*16 radix-4:-

a) Xilinx FPGA xc2vp20-5ff1152:-


Dynamic Power:- 80.78 mW
Static Power:- 951.67 mW
Power-Delay Product:- 4.69 nJ
b) ASIC Encounter Synthesis:-
Number of Cells:- 1621 out of 46147
Dynamic Power:- 10.48 mW
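The power-delay products listed for the FPGA implementations can be related to the other reported figures with a short calculation. The relation used below, dynamic power multiplied by combinational path delay, is not stated explicitly in the text; it is assumed here because it reproduces the reported values closely. The sketch covers a subset of the designs, with the numbers copied from Sections 5.2 and 5.3 and hypothetical names.

```python
# (design, dynamic power in mW, combinational path delay in ns, reported PDP in nJ)
REPORTED = [
    ("Unsigned basic", 52.68, 34.009, 1.79),
    ("Unsigned complex", 6486.61, 41.271, 267.7),
    ("Signed basic radix-2", 87.34, 35.432, 3.09),
    ("Signed basic radix-4", 81.21, 35.858, 2.9),
    ("Signed complex radix-2", 80.78, 58.181, 4.69),
]


def power_delay_product_nj(power_mw, delay_ns):
    """Assumed relation: mW x ns gives pJ; divide by 1000 to obtain nJ."""
    return power_mw * delay_ns / 1000.0


for name, power_mw, delay_ns, reported_nj in REPORTED:
    pdp = power_delay_product_nj(power_mw, delay_ns)
    print(f"{name}: {pdp:.2f} nJ (reported {reported_nj} nJ)")
```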

5.4 Layout:-
Signed Complex Multiplier 16*16:-
Chapter 6.

Conclusion and Future Work

6.1 Conclusion
6.2 Future Work

This chapter summarizes the conclusions drawn from the design and outlines future work.
6.1 Conclusion:-
A parallel Complex Multiplier using compressors of different orders has been presented.
Compressors are used to reduce the switching activity and propagation delay of the multipliers. They
also shorten the vertical critical path, and hence reduce the number of partial-product stages. Optimal
use of all thirteen different compressors improves both the speed and the power performance of the
multiplier. As both the delay and the power are reduced, the power-delay product is also reduced.
Results are obtained for both FPGA and ASIC. The FPGA used in our design is the xc2vp20-
5ff1152, for which all synthesis reports and power figures of the multipliers are calculated. For the
ASIC design we used the Encounter Synthesis Tool to obtain hardware information and power for all
multipliers. It is found that the signed multipliers have less area and lower power than the unsigned
multiplier.

6.2 Future Work:-


Complex Multipliers of higher width can be implemented using these compressors. Higher-order
compressors can be designed to reduce the vertical height for higher-width multipliers, and hence to
achieve lower power. These Complex Multipliers can be used to implement FFT/IFFT designs, which
are used in DSP applications.
SUMMARY:-
In order to evaluate the performance of the low power Complex Multiplier using the Compressor
technique, we implemented all these designs on the Xilinx xc2vp20-5ff1152 FPGA. We compare the
performance of the proposed Complex Multiplier using Compressors with the research multipliers
explained in Chapter 2. The table below highlights the performance of all the multipliers in terms of
dynamic power, combinational path delay and speed-power product.
