USING COMPRESSORS
BY
Nilay Chandrakant Ghumre
UNDER GUIDANCE OF
PROF. DR. R. B. Deshmukh
Date:
CERTIFICATE
DECLARATION
Nagpur for degree of Master of Technology in VLSI Design. I carried it out under the
Engineering ).
This thesis has not been submitted to any other University/ Institute for
supported during the project work. Without them I could not have completed the project
patience and valuable guidance throughout entire project, Prof. Dr. R. M. Patrikar for
their valuable suggestions and the whole VLSI design lab members for their cooperation
and coordination.
while completing this project work, I want to thank my parents, without their emotional
and moral support nothing was possible. Their love and support always encouraged me,
and last but not least I am very thankful to God, who provided me good health and good
Compressors are used to compress the partial product addition stages. Higher order
compressors permit the reduction of the vertical critical paths in a parallel multiplier,
resulting in a better speed-power product for the multiplier circuit. This thesis presents a
novel scheme for a 16*16 bit multiplier using thirteen different types of compressors. The
scheme is optimized for low power as well as high speed implementation over reported
schemes. It presents a low power multiplier design methodology that counts only the
number of 1’s in the partial products.
CONTENTS
1. INTRODUCTION
1.1 Introduction
1.2 Complex Number
1.2.1 Operation of Complex Numbers
1.3 Organization of Thesis
3. MULTIPLIER UNIT
3.1 Partial Product Generator
3.2 Different Order Compressors
3.2.1 Adder as Counter
3.2.2 Compressor Logic
3.3 Parallel Adders
3.4 Architecture of Multiplier using Compressors
7. REFERENCES
LIST OF FIGURES
Figure 2.1. OBC-DA based Complex Multiplier structure
Figure 2.2. 4x4 Braun Multiplier
Figure 2.3. 4*4 Bypass Multiplier
Figure 2.4 4*4 ASU Multiplier
Figure 2.5 Adder Subtractor Unit
Figure 2.6: - Smart Adder (SA)
Figure 3.1. Internal Block Diagram of 16*16 Basic Multiplier
Figure 3.2. Partial Product Generator (4 Bit)
Figure 3.3. Half Adder
Figure 3.4. Full Adder
Figure 3.5. Block Diagram of 4:3 Compressor
Figure 3.6. Block Diagram of 5:3 Compressor
Figure 3.7. Block Diagram of 6:3 Compressor
Figure 3.8. Block Diagram of 7:3 Compressor
Figure 3.9. Block Diagram of 8:4 Compressor
Figure 3.10. Block Diagram of 9:4 Compressor
Figure 3.11. Block Diagram of 10:4 Compressor
Figure 3.12. Block Diagram of 11:4 Compressor
Figure 3.13. Block Diagram of 12:4 Compressor
Figure 3.14. Block Diagram of 13:4 Compressor
Figure 3.15. Block Diagram of 14:4 Compressor
Figure 3.16. Block Diagram of 15:4 Compressor
Figure 3.17. Block Diagram of 16:5 Compressor
Figure 3.18. Block Diagram of Parallel Adder
Figure 3.19. Architecture of 8*8 Multiplier using Compressors
Figure 4.1. Block Diagram of Unsigned Complex Multiplier
Figure 4.2. Combinational Logic for intermediate sign
Figure 4.3. Combinational Circuit for output sign Real Part and Imaginary Part
Figure 4.4. Modified Complex Multiplier Block Diagram
Figure 4.5 Block Diagram of Modified Booth’s Recoding unit Multiplier
Figure 4.6 Addition scheme for Radix-2
Figure 4.7 Architecture of 8*8 Signed Multiplier for Radix-2
Figure 4.8 Addition scheme for Radix-4
Figure 4.9 Architecture of 8*8 Signed Multiplier for Radix-4
LIST OF TABLES
Introduction
The electronics industry has achieved a phenomenal growth over the last two
decades, mainly due to the rapid advances in integration technologies, large-scale
systems design - in short, due to the advent of VLSI. The number of applications of
integrated circuits in high-performance computing, telecommunications, and consumer
electronics has been rising steadily, and at a very fast pace. Increasing demand for
portable electronics for computing and communication, as well as other applications, has
necessitated longer battery life, lower weight, and lower power consumption. In order to
satisfy these requirements, research activities focusing on low power/low voltage design
techniques are underway. Since 'power' is now one of the design decision variables, the
expanded design space required for low power has further increased the complexity of an
already non-trivial task. Low power design basically involves two concomitant tasks:
power estimation and analysis and power minimization. These tasks need to be carried
out at each of the levels in the design hierarchy, namely, the behavioral, architectural,
logic, circuit and physical levels.[1]
In the survey of the current state of the field, many of the salient power
estimation and minimization techniques proposed for low power VLSI design are
reviewed. For each of the design levels, we provide an overview of several power
estimation and minimization approaches and the CAD tools that support them. Finally,
future research issues are discussed that will be necessary in order to make the low power
design endeavor a successful one. In the majority of digital signal processing (DSP)
applications the critical operations are multiplication and accumulation. Real-time
signal processing requires a high speed, high throughput multiplier unit that consumes
low power, which is always key to achieving a high performance digital signal processing
system. The purpose of this work is the design and implementation of a low power
multiplier unit with a block enabling technique to save power [2].
1.1 Introduction
Device sizes continue to scale down in line with Moore's Law. The sources of energy
consumption on a CMOS chip can be classified as static and dynamic
power dissipation. The dominant component of energy consumption in
CMOS is dynamic power consumption, caused by the actual effort of the
circuit to switch. A first order approximation of the dynamic power
consumption of CMOS circuitry is given by the formula:
$P = C \cdot V^{2} \cdot f$
where C is the switched capacitance, V is the supply voltage and f is the switching frequency.
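As a rough numerical illustration of the first order formula, the sketch below evaluates it at two supply voltages. The function name `dynamic_power` and the operating point (10 pF, 1.8 V, 100 MHz) are hypothetical values chosen only for illustration; they are not taken from the thesis.

```python
def dynamic_power(c_farads: float, v_volts: float, f_hertz: float) -> float:
    """First order CMOS dynamic power estimate: P = C * V^2 * f, in watts."""
    return c_farads * v_volts ** 2 * f_hertz


# Hypothetical operating point: 10 pF switched capacitance, 1.8 V, 100 MHz.
p_nominal = dynamic_power(10e-12, 1.8, 100e6)
# Same circuit and frequency with the supply voltage halved to 0.9 V.
p_scaled = dynamic_power(10e-12, 0.9, 100e6)

print(f"nominal supply: {p_nominal * 1e3:.3f} mW")
print(f"halved supply : {p_scaled * 1e3:.3f} mW")
print(f"power ratio   : {p_scaled / p_nominal:.2f}")
```

Because the voltage term is squared, halving the supply cuts the estimated dynamic power to a quarter, which is why supply scaling is usually the first lever in low power design.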
i) Addition:-
ii) Subtraction:-
iii)Multiplication:-
iv) Division:-
-:References:-
[1] Virage Logic Corporation, “Power Reduction Techniques for Ultra-Low-Power
Solutions”.
[2] R. M. Badghare, S. K. Mangal, R. B. Deshmukh, R. M. Patrikar (2009), “Design
of Low Power Parallel Multiplier”, Journal of Low Power Electronics, Volume 5,
Number 1, April 2009, 31-39.
[3] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), “A 1.2-ns
16×16-Bit Binary Multiplier Using High Speed Compressors”,
International Journal of Electrical, Computer, and Systems Engineering,
2009, 234-239.
[4] John B. Conway (1986), Functions of One Complex Variable I, Springer,
ISBN 0-387-90328-3.
[5] K. Z. Pekmestzi (1989), “Complex Number Multipliers”, IEE Proceedings (Computers
and Digital Technology), Vol. 136, No. 1, 1989, 70-75.
Chapter 2.
(a+bi).(c+di)=(ac-bd) + (ad+bc)i
(ac-bd) is the Real Part of Complex Multiplication and (ad+bc) is the Imaginary Part of
Complex Multiplication.
Remember that (ac–bd), the real part of the product, is the product of the real
parts minus the product of the imaginary parts, but (ad + bc), the imaginary part of the
product, is the sum of the two products of one real part and the other imaginary part.
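The rule above can be checked with a short sketch that uses four real multiplications, one subtraction and one addition. The function name `complex_multiply` is a hypothetical name used only for this illustration.

```python
from typing import Tuple


def complex_multiply(a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """Return (real, imaginary) of (a + bi) * (c + di)."""
    real = a * c - b * d  # product of real parts minus product of imaginary parts
    imag = a * d + b * c  # sum of the two cross products
    return real, imag


# (3 + 2i) * (1 + 4i) = 3 + 12i + 2i + 8i^2 = -5 + 14i
print(complex_multiply(3, 2, 1, 4))  # (-5, 14)
```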
The positive value is called the modulus of Z and is denoted as |Z|.
Figure shows the complex multiplier block diagram that is composed from
logarithmic and anti-logarithmic converters and N-Bit Adders. This method can
significantly reduce the hardware to build a multiplier.
LNS provides a simple technique to compute multiplication at the cost of reduced
precision. This approach has limited accuracy.
Direct complex multiplication requires four multiplications and two additions. This
technique is a different realization of complex multiplication that reduces the complexity
of the circuit. The canonical form of the obtained circuits makes them well suited for VLSI
realizations. Besides circuit reduction, the hardware or software for the control in the
realization of the algorithms is simplified, especially when either of these includes only
complex operations, as in an FFT. Each complex bit takes four possible values.
Consequently, it must be represented by two bits. This representation allows the
development of algorithms for operations with complex numbers and the ability to describe
these algorithms at the bit level. It is natural that these algorithms and the corresponding
circuits have great similarities to those for real numbers in two’s complement form.
Complex parallel multiplication is the most critical operation to realize. The parallel
multiplier includes specialized hardware circuitry designed to perform complex
multiplication operations at high speeds. The parallel multiplier requires significantly
less die area than conventionally required, which results in reduced manufacturing costs
and reduced power consumption.[4]
Figure 2.2. 4x4 Braun Multiplier (product outputs P7 to P0)
The figure above shows the structure of a 4*4 Braun multiplier. An n*n bit Braun
multiplier requires n(n-1) adders and n^2 AND gates. In this technique each partial
product is added to the previous sum of partial products by a row of adders. The
carry-out signals are shifted one bit to the left and then added to the sum of the first
adder, which is the addition of the partial product bits. The shifting of carry-out bits to
the left is done by the carry-save adder. As carry bits are passed diagonally downward to
the next adder stage, there is no horizontal carry propagation for the first four rows.
Instead, the respective carry bit is “saved” for the subsequent adder stage.
The drawback of the Braun multiplier is that the number of components required to
build it increases quadratically with the number of bits, which makes it inefficient for
wide operands. The delay of the Braun multiplier depends on the full adder cell
and on the final adder in the last row. In this multiplier array, a full adder with balanced
carry and sum delays is desirable because both sum and carry are on the critical path.
Variables with bars denote prior inversions. Inverters are connected before the inputs of
the full adders or the AND gates as required by the algorithm. Each column represents the
addition in accordance with the respective weight of the product term.
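The carry-save structure described above can be modeled at bit level. The following is a minimal behavioral sketch, not the thesis implementation: the names `full_adder` and `braun_multiply` are hypothetical, every adder cell is modeled as a full adder (a real array may use half adders where an input is constant 0), and the final row is modeled as a ripple-carry adder.

```python
def full_adder(x: int, y: int, cin: int):
    """One-bit full adder; returns (sum, carry_out)."""
    return x ^ y ^ cin, (x & y) | (cin & (x ^ y))


def braun_multiply(a: int, b: int, n: int = 4):
    """Unsigned n*n Braun array model; returns (product, adders_used)."""
    # n*n AND gates generate the partial product bits pp[i][j] = a_j AND b_i.
    pp = [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)] for i in range(n)]
    product = [0] * (2 * n)
    sums, carries, adders = pp[0][:], [0] * (n - 1), 0
    product[0] = sums[0]
    for i in range(1, n):  # n-1 carry-save rows
        new_sums, new_carries = [0] * n, [0] * (n - 1)
        for j in range(n - 1):
            # The carry from the row above arrives diagonally at the same weight.
            new_sums[j], new_carries[j] = full_adder(pp[i][j], sums[j + 1], carries[j])
            adders += 1
        new_sums[n - 1] = pp[i][n - 1]  # leftmost bit passes straight down
        sums, carries = new_sums, new_carries
        product[i] = sums[0]
    ripple = 0
    for k in range(n - 1):  # final row: carries now propagate horizontally
        product[n + k], ripple = full_adder(sums[k + 1], carries[k], ripple)
        adders += 1
    product[2 * n - 1] = ripple
    return sum(bit << w for w, bit in enumerate(product)), adders


print(braun_multiply(13, 11))  # (143, 12): 12 adders = n(n-1) for n = 4
```

The adder count returned by the model matches the n(n-1) figure quoted in the text, and grows quadratically with the operand width.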
The switching activity of the components used in the design depends on the input
bit coefficients. If an input bit coefficient is zero, the corresponding row or
column of adders need not be activated. The more zeros the operand contains, the higher
the power reduction that can be achieved. We propose a Binary / Booth Recoding Unit
which forces the operand to have a larger number of zeros.
Figure 2.4. 4*4 ASU Multiplier (inputs a3b0 to a0b0 and sign bits sj; XOR gates, +/- adder
subtractor units, multiplexers, AND gates and smart adders SA; product outputs P6 to P0)
The figure shows the 4x4 low power ASU multiplier structure. This technique is
very useful for wider multiplicands, especially when they contain
successive runs of ones. Each ASU works as an adder or a subtractor depending
upon the sign bit of the sign register. For multiplication by -1 the ASU works as a
subtractor, and for multiplication by 0 and 1 it works as an adder. The great advantage
of this technique is that no extra addition circuitry is needed to add sign extension bits
when the multiplicand bit is -1. In the upper row of the architecture the sign bits must be
ANDed with b0, since the case sj=1 and b0=0 otherwise produces wrong outputs. At the
bottom, the ASU works as a half adder or subtractor depending upon the sign bits. For
wider multiplicands the smart adder chain continues.
Figure 2.5. Adder Subtractor Unit [1]
2. Complement Representation:-
In complement representation, numbers are represented in two’s complement
form. A positive number is represented in the same way as in the
signed-magnitude method, that is, as a binary number with ‘0’ as the sign bit. It is the
most widely used method of representation. To get a negative
number, convert all 0’s to 1’s and all 1’s to 0’s, and then add ‘1’. Given a number
in 2’s complement form, if it starts with ‘0’ it is a positive number and if it starts
with ‘1’ it is a negative number.
If the number is negative, take its 2’s complement to get the magnitude
in ordinary binary. Take 1101 as an example. Its 2’s complement is 0011.
As the number starts with ‘1’ it is negative, and 0011 is the binary representation of
positive 3, so the number is -3. Other negative numbers are represented in the same way.
Adding +5 and -5 in decimal gives 0. In 4-bit 2’s complement form, +5 is 0101
and -5 is 1011. Adding these two numbers gives 10000. Discarding the carry leaves
0000, which represents 0.
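The worked examples above can be reproduced with a few lines of code. This is a small sketch; the helper names `to_twos_complement` and `from_twos_complement` are hypothetical and the 4-bit width is taken from the examples in the text.

```python
def to_twos_complement(value: int, width: int) -> str:
    """Encode a signed integer as a two's complement bit string of the given width."""
    return format(value & ((1 << width) - 1), f"0{width}b")


def from_twos_complement(bits: str) -> int:
    """Decode a two's complement bit string into a signed integer."""
    value = int(bits, 2)
    if bits[0] == "1":  # leading 1 marks a negative number
        value -= 1 << len(bits)
    return value


print(to_twos_complement(5, 4))      # 0101
print(to_twos_complement(-5, 4))     # 1011
print(from_twos_complement("1101"))  # -3

total = int("0101", 2) + int("1011", 2)
print(format(total, "b"))            # 10000, carry out of the 4-bit field
print(format(total & 0b1111, "04b")) # 0000 after discarding the carry
```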
In this signed multiplication we have modified the complex multiplication
strategy. Normally it uses four multipliers and three adder/subtractor blocks,
but the modified strategy requires three multipliers and five adders.
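One well-known way to trade a multiplier for extra adders is sketched below. It uses three multiplications and five additions or subtractions, which matches the counts quoted above, but it is an assumption that the thesis uses this exact arrangement; the intermediate names `k1`, `k2`, `k3` and the function name are hypothetical.

```python
def complex_multiply_3mult(a: int, b: int, c: int, d: int):
    """Return (real, imaginary) of (a + bi) * (c + di) using three multiplications."""
    k1 = c * (a + b)  # addition 1, multiplication 1
    k2 = a * (d - c)  # subtraction 2, multiplication 2
    k3 = b * (c + d)  # addition 3, multiplication 3
    real = k1 - k3    # subtraction 4: ca + cb - bc - bd = ac - bd
    imag = k1 + k2    # addition 5:    ca + cb + ad - ac = ad + bc
    return real, imag


print(complex_multiply_3mult(3, 2, 1, 4))  # (-5, 14)
```

The saving is worthwhile when a multiplier costs much more area and power than an adder, which is normally the case for wide operands.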
Radix 4:-
A=> -5 => 1 1 1 1 1 0 1 1
B=> 46 => 0 0 1 0 1 1 1 0, then the following Partial Products are
generated:-
In the above Booth’s algorithm technique the vertical length of the
partial products is greater, hence more adders are required, so
power and area are higher.
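For reference, the textbook radix-4 Booth recoding of the operands above can be sketched as follows. This is the standard algorithm, not necessarily the exact scheme used in the thesis, and it assumes that B is the operand being recoded and A is the multiplicand; the function name `booth_radix4_digits` is hypothetical.

```python
def booth_radix4_digits(multiplier: int, width: int):
    """Recode a two's complement multiplier into radix-4 Booth digits, LSB first.

    Each digit lies in {-2, -1, 0, 1, 2} and has weight 4**k.
    """
    assert width % 2 == 0, "width must be even for radix-4 recoding"
    bits = [(multiplier >> i) & 1 for i in range(width)]
    digits, prev = [], 0  # prev is the implicit 0 to the right of the LSB
    for i in range(0, width, 2):
        digits.append(bits[i] + prev - 2 * bits[i + 1])
        prev = bits[i + 1]
    return digits


A, B = -5, 46  # operands from the example above
digits = booth_radix4_digits(B, 8)
partial_products = [d * A * 4 ** k for k, d in enumerate(digits)]

print(digits)                 # [-2, 0, -1, 1]
print(partial_products)       # [10, 0, 80, -320]
print(sum(partial_products))  # -230
```

Only four partial product rows are needed for an 8-bit operand, and one of them is zero for this particular value of B.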
-:References:-
Chapter 3.
Multiplier Unit
The above figure shows Internal Block Diagram of Basic Multiplier. It consists of three
stages:-
i) Partial Product Generator
ii) Different Order Compressors
iii) Parallel Adder
Below is the description of all three blocks that are used for multiplication.
A  B  Carry  Sum
0  0  0      0
0  1  0      1
1  0  0      1
1  1  1      0
A  B  C  Carry  Sum
0  0  0  0      0
0  0  1  0      1
0  1  0  0      1
0  1  1  1      0
1  0  0  0      1
1  0  1  1      0
1  1  0  1      0
1  1  1  1      1
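The two truth tables above correspond to the usual gate-level equations for the half adder and full adder. A minimal sketch follows; the function names are hypothetical, and the outputs are returned in the same (Carry, Sum) order as the tables.

```python
def half_adder(a: int, b: int):
    """Return (carry, sum) for two input bits."""
    return a & b, a ^ b


def full_adder(a: int, b: int, c: int):
    """Return (carry, sum) for three input bits."""
    carry = (a & b) | (b & c) | (a & c)  # majority of the three inputs
    total = a ^ b ^ c                    # odd parity of the three inputs
    return carry, total


# Reproduce the full adder truth table row by row.
print("A B C Carry Sum")
for a in (0, 1):
    for b in (0, 1):
        for c in (0, 1):
            carry, total = full_adder(a, b, c)
            print(a, b, c, carry, "   ", total)
```

Read as a two-bit number, (Carry, Sum) is simply the count of 1's among the inputs, which is the counter view that the compressors build on.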
3.2.2 Compressor Logic:-
The different compressor logics are based upon the concept of
the full adder as a counter. A compressor can be defined as a single bit
adder circuit that has more than three inputs, unlike a full adder, and
fewer outputs than inputs. A full adder has three inputs and two
outputs, so it counts up to three (11). Similarly, a three bit output
counts up to a maximum value of seven (111).
Compressors with four, five, six and seven
inputs produce three outputs, which count up to a maximum of
seven (111). Compressors with eight to fifteen
inputs produce four outputs, which count up to a
maximum of fifteen (1111). So these compressors are built
according to the number of inputs they have and the count value
they have to generate. Following is the description of the different
compressor logics with their block diagrams:-
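The counter view of a compressor can be captured in a short behavioral model. This sketch only models the input/output behavior (the binary count of 1's); it says nothing about the gate-level structure of the thesis compressors, and the function name `compressor` is hypothetical.

```python
def compressor(bits):
    """Behavioral N:M compressor: return the count of 1's as M output bits, LSB first.

    M is the number of bits needed to represent N, e.g. 3 for N = 4..7,
    4 for N = 8..15 and 5 for N = 16.
    """
    count = sum(bits)
    width = len(bits).bit_length()
    return [(count >> k) & 1 for k in range(width)]


# Output widths for the thirteen compressor sizes listed in the figures.
for n in range(4, 17):
    print(f"{n}:{len(compressor([1] * n))}")

# A 7:3 compressor with five inputs high counts 5 = 101 (LSB first: 1, 0, 1).
print(compressor([1, 0, 1, 1, 0, 1, 1]))  # [1, 0, 1]
```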
1) 4:3 Compressor:-
Figure 3.5. Block Diagram of 4:3 Compressor
2) 5:3 Compressor:-
3) 6:3 Compressor:-
4) 7:3 Compressor:-
Figure 3.8. Block Diagram of 7:3 Compressor
5) 8:4 Compressor:-
Figure 3.9. Block Diagram of 8:4 Compressor
6) 9:4 Compressor:-
Figure 3.10. Block Diagram of 9:4 Compressor
7) 10:4 Compressor:-
8) 11:4 Compressor:-
Figure 3.12. Block Diagram of 11:4 Compressor
9) 12:4 Compressor:-
10) 13:4 Compressor:-
-:References:-
[1] C. H. Chang, J. Gu, M. Zhang (2004), “Ultra low-voltage low-power CMOS
4-2 and 5-2 compressors for fast arithmetic circuits”, IEEE Transactions on
Circuits and Systems I: Regular Papers, Volume 51, Issue 10, Oct. 2004,
1985-1997.
[2] A. Dandapat, S. Ghosal, P. Sarkar, D. Mukhopadhyay (2009), “A 1.2-ns
16×16-Bit Binary Multiplier Using High Speed Compressors”,
International Journal of Electrical, Computer, and Systems Engineering,
2009, 234-239.
[3] J. Gu, C. H. Chang (2003), “Ultra low voltage low power 4-2 compressor
for high speed multiplications”, Circuits and Systems, 2003, ISCAS ’03,
Proceedings of the International Symposium, vol. 5, May 2003, 321-324.
[4] K. Prasad and K. K. Parhi (2001), “Low power 4-2 and 5-2 compressor”,
Proc. of the 35th Asilomar Conf. on Signals, Systems and Computers, vol. 1,
2001, 129-133.
Chapter 4.
In this chapter we propose a new complex multiplier for both unsigned and signed
complex multiplication.
As shown in Figure 4.1, we enter four real numbers ‘a’, ‘b’, ‘c’ and ‘d’ and the sign
of each number as ‘sa’, ‘sb’, ‘sc’ and ‘sd’. After multiplying the real numbers using four
multipliers and passing the products through the 32 bit Add/Sub block, we get the output
“rr”, which is the real part, and “ri”, which is the imaginary part of the result of the
complex multiplication. Similarly, to get the sign of the result for both the real and
imaginary parts, we apply combinational logic to the sign inputs and get the output sign
“ssr” for the real part and “ssi” for the imaginary part.
Now, by applying conditions on se, sf, sg and sh, we generate the final sign result,
i.e. “sr” for the real part and “si” for the imaginary part. A 2:1 mux is used to generate
the output sign value. ‘0’ represents a positive value and ‘1’ represents a
negative value.
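The data flow described above can be sketched behaviorally with magnitudes and separate sign bits. This is an illustration under stated assumptions, not the thesis circuit: it assumes the intermediate signs se, sf, sg and sh are the XOR of the operand signs for the products ac, bd, ad and bc respectively, and the helper `sign_mag_add` is a hypothetical stand-in for the Add/Sub block together with its sign logic.

```python
def sign_mag_add(m1: int, s1: int, m2: int, s2: int):
    """Add two sign-magnitude numbers (sign 0 = positive, 1 = negative)."""
    if s1 == s2:
        mag, sign = m1 + m2, s1   # same sign: add the magnitudes
    elif m1 >= m2:
        mag, sign = m1 - m2, s1   # different signs: subtract the smaller magnitude
    else:
        mag, sign = m2 - m1, s2
    return mag, (sign if mag else 0)  # report zero as positive


def complex_multiply_sign_mag(a, sa, b, sb, c, sc, d, sd):
    """Multiply (a + bi)(c + di) given magnitudes and sign bits; returns (rr, sr, ri, si)."""
    ac, se = a * c, sa ^ sc  # assumed meaning of the intermediate signs
    bd, sf = b * d, sb ^ sd
    ad, sg = a * d, sa ^ sd
    bc, sh = b * c, sb ^ sc
    rr, sr = sign_mag_add(ac, se, bd, sf ^ 1)  # real part: ac - bd
    ri, si = sign_mag_add(ad, sg, bc, sh)      # imaginary part: ad + bc
    return rr, sr, ri, si


# (3 - 2i)(1 + 4i) = 11 + 10i
print(complex_multiply_sign_mag(3, 0, 2, 1, 1, 0, 4, 0))  # (11, 0, 10, 0)
```

The four multipliers only ever see unsigned magnitudes, so the unsigned 16*16 multiplier can be reused unchanged and the sign handling stays in a small block of combinational logic.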
Figure 4.3. Combinational Circuit for output sign Real Part and Imaginary Part
Ai  Ai-1  Y     Explanation
0   0     0     All 0’s
0   1     1.B   [ B(n-1) , B ]
1   0     -1.B  [ B(n-1) , (-B) ]
1   1     0     All 0’s
PP0 => 0 0 0 0 0
PP1 => 0 0 1 0 0
PP2 => 1 1 1 0 0
PP3 => 0 0 1 0 0
Here, in the Modified Booth’s Recoding algorithm, one extra bit is added to the MSB
of the input bit sequence as shown in the table. The hardware realization of this recoding
unit is based on multiplexers and includes a 2’s complement unit. At the time of recoding
we assume one extra bit ‘0’ before the LSB of the input bit sequence, and this extra bit
decides the partial product according to the sequence explained in the table above. We have
observed that in the hardware realization the LSB alone is sufficient to get the partial
products; because of this the multiplexer becomes 2x1 rather than 4x1, and the other
multiplexers remain the same, with their select lines depending upon the recoding
scheme. So multiplexers are the important hardware of the Booth’s recoding unit.
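The recoding table above follows the standard radix-2 Booth rule, which can be sketched in a few lines. This is the textbook algorithm with an implicit ‘0’ to the right of the LSB; the extra MSB bit and the multiplexer structure of the thesis hardware are not modeled, and the function names are hypothetical.

```python
def booth_radix2_digits(multiplier: int, width: int):
    """Recode a two's complement multiplier A into radix-2 Booth digits, LSB first.

    For each pair (Ai, Ai-1): 00 -> 0, 01 -> +1, 10 -> -1, 11 -> 0.
    """
    digits, prev = [], 0  # prev starts as the implicit 0 before the LSB
    for i in range(width):
        cur = (multiplier >> i) & 1
        digits.append(prev - cur)
        prev = cur
    return digits


def booth_multiply(a: int, b: int, width: int) -> int:
    """Multiply by summing the partial products Y = digit * B, shifted by the bit position."""
    return sum(d * b * (1 << i) for i, d in enumerate(booth_radix2_digits(a, width)))


print(booth_radix2_digits(6, 4))  # [0, -1, 0, 1], i.e. 6 = 8 - 2
print(booth_multiply(6, -3, 4))   # -18
```

A run of consecutive 1's in the multiplier produces only two non-zero digits, which is the source of the reduced switching activity.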
Radix-4 Method:-
The Radix-4 scheme is similar to the Radix-2 scheme above and is also used to reduce
the partial products, so it is very useful for fast multiplication of long input bit sequences.
Here, the partial product obtained from the recoding unit is always 2 bits longer than the
input. So, if the input has n bits, the partial product length is (n+2) bits.
The table above shows how partial products are generated according to the input bit sequence.
Here, two extra bits are generated according to the input bits. These two bits are
correction bits needed to get the correct multiplication output. The MSBs of the partial
products need to be added carefully. For that, a new adder array structure is introduced.
This modification removes the problem of a large number of correction bits, which would
require more adders and hence more higher order compressors.
The figure above shows the addition scheme for Radix-2, which has five bit partial
products. These partial products are added using the compressor scheme explained
previously. Here, the value of m(0)(4) is added diagonally, i.e. it is added to the diagonal
bit, which is the MSB of the second partial product and also a correction bit. So m(0)(4) is
added to m(1)(4) and the result is put in place of m(1)(4). Similarly, the new value of the
MSB of the second partial product row is added to the old MSB of the third partial product
to get the new value of the MSB of the third partial product, as shown in the figure above.
After getting the new values of the correction bits, these bits are added using compressors.
Figure 4.7 Architecture of 8*8 Signed Multiplier for Radix-2 [5]
The figure above shows the architecture of the 8*8 signed multiplier for the Radix-2 scheme,
where partial products are generated by the Modified Booth’s Recoding Unit. Here,
partial products of 9 bits per row are generated. In the first stage, these partial products
are divided into vertical blocks; these vertical blocks are half adders, full adders and
different order compressors. Vertical blocks of 2 bits are half adders and vertical blocks
of 3 bits are full adders. The outputs of these adders and compressors are arranged as
explained in chapter 3. Horizontal blocks are parallel adders, which perform the addition
that generates the final multiplication result.
Figure 4.8 Addition scheme for Radix-4
The figure above shows the addition scheme for Radix-4, which has six partial product
bits: the four LSBs are the input sequence and the two MSBs are correction bits. Here, the
MSB of the first row of partial products is added to both MSBs of the second row. In the
Modified Radix-4 scheme the total number of partial product rows is half that of the normal
partial product scheme. For example, if the multiplier is 4*4 bit then the total number of
partial product rows including correction bits is two, i.e. half of the rows of the original
scheme, as shown in the figure above. Similarly, for wider multipliers using the radix-4
scheme the total number of partial product rows is half of the original, which results in
less switching activity and hence less power.
Figure 4.9 Architecture of 8*8 Signed Multiplier for Radix-4
This section shows the results of the different blocks used for the
implementation of the complex multiplier. It consists of the simulation results,
synthesis report and power calculation of the different blocks. The power of the design
is calculated by applying 100 random inputs. The test bench is written in VHDL. The textio
format is used, where input is read from an input file called infile and output is written
to an output file called outfile. All of the designs below are simulated using ModelSim XE III
6.2g and synthesized using Xilinx ISE Project Navigator 9.1i, with power calculated using the
Xilinx XPower tool. Power is also calculated in the ASIC Encounter synthesis
tool.
5.1 Behavioral Simulation:-
i) Unsigned Basic Multiplier16*16:-
The figure above shows the simulation of the 16*16 unsigned multiplier. The inputs ‘a’ and ‘b’ are 16 bits each,
while ‘z’ is the 32 bit output. As this is an unsigned multiplier, the range of each input is 0 to 65535;
no negative numbers are considered. As
shown in the simulation diagram, if both inputs ‘a’ and ‘b’ are entered as unsigned 7, i.e.
“0000000000000111” in binary, the output ‘z’ is 49 in unsigned format. Consider the
maximum value, 65535, which is the highest value in 16 bit unsigned format. It consists of all 1’s, i.e.
“1111111111111111” in binary, and gives the output ‘z’ as 4294836225, which is the maximum value for the
16*16 unsigned multiplier.
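The two simulation values quoted above can be reproduced with a behavioral model of the three stage multiplier (partial product generator, compressors, parallel adder). This is a simplified sketch rather than the VHDL design: each column is reduced by a single ones counter of whatever size the column needs, and the parallel adder stage is modeled with ordinary integer addition. The function name `compressor_multiply` is hypothetical.

```python
def compressor_multiply(a: int, b: int, n: int = 16) -> int:
    """Unsigned n*n multiply by counting the 1's in each partial product column."""
    # Stage 1: partial product generator. columns[w] holds the bits of weight 2**w.
    columns = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for j in range(n):
            columns[i + j].append(((a >> j) & 1) & ((b >> i) & 1))

    # Stage 2: each column goes through a compressor, which outputs the count of 1's.
    counts = [sum(column) for column in columns]

    # Stage 3: parallel adder. Each count is added in at the weight of its column.
    product = 0
    for weight, count in enumerate(counts):
        product += count << weight
    return product


print(compressor_multiply(7, 7))          # 49
print(compressor_multiply(65535, 65535))  # 4294836225
```

For n = 16 the tallest column holds sixteen bits, which is why the largest block needed is the 16:5 compressor.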
ii) Unsigned Complex Multiplier 16*16:-
The figure above shows the waveform of the 16*16 complex multiplier for unsigned numbers. There are four
inputs, ‘a’, ‘b’, ‘c’ and ‘d’, of 16 bits each. As the inputs are unsigned numbers, the
sign of each number has to be entered separately. So, for the four inputs, the sign bits are ‘sa’ for input
‘a’, ‘sb’ for input ‘b’, ‘sc’ for input ‘c’ and ‘sd’ for input ‘d’.
As explained with the block diagram of unsigned complex multiplication in section 4.1, the
output of the complex multiplier is obtained as shown in the figure above. The operation of the complex multiplier
is illustrated by the simulation waveform above.
b) Radix-4
In the Radix-4 design the simulation result is the same as for the Radix-2 scheme. The only difference between the two
schemes is in the synthesis report.
v) Radix-4:-
Behavioral Simulation of Radix-4 Complex Multiplier is same as Radix-2 scheme.
5.2 Synthesis Report:-
i) Unsigned Basic Multiplier16*16:-
Design Summary:-
a) Xilinx FPGA xc2vp20-5ff1152:-
Logic Utilization:
Number of 4 input LUTs: 714 out of 18,560 3%
Logic Distribution:
Number of occupied Slices: 405 out of 9,280 4%
Number of Slices containing only related logic: 405 out of 405 100%
Number of Slices containing unrelated logic: 0 out of 405 0%
Total Number of 4 input LUTs: 714 out of 18,560 3%
5.4 Layout:-
Signed Complex Multiplier 16*16:-
Chapter 6.
6.1 Conclusion
6.2 Future Work
This chapter summarizes the conclusions for the design and outlines future work.
6.1 Conclusion:-
A parallel complex multiplier using different order compressors is explained.
Compressors are used to reduce the switching activity and propagation delay of the multipliers. They
also reduce the vertical critical path delay and hence the number of partial product stages. Optimal use of all
thirteen different compressors improves the speed as well as the power performance of the
multiplier. As both the delay and the power are reduced, the power-delay product is also reduced.
Results are calculated for both FPGA and ASIC. The FPGA used in our design is the xc2vp20-
5ff1152, on which the synthesis reports and power for all multipliers are calculated. For the ASIC design we used
the Encounter Synthesis Tool to calculate hardware information and power for all multipliers. It is found
that the signed multipliers have less area and lower power than the unsigned multiplier.