Beruflich Dokumente
Kultur Dokumente
Department of Electronics And Telecommunication Engineering, Sipnas College Of Engineering And Technology, Amravati University, Amravati444606, Maharashtra, India. vinachaudhary@rediffmail.com
ABSTRACT : Designing multipliers that are of high-speed, low power, and/or regular in layout are of substantial research interest. Many attempts have been made to reduce the number of partial products generated in a multiplication process one of them is Wallace tree multiplier. Wallace Tree CSA structures have been used to sum the partial products in reduced time. This paper proposes a 8*8 bit modified Wallace tree construction is investigated and evaluated. Wallace high-speed multiplier uses full adders and half adders, 4:2 compressor, 3:2 compressor in their reduction phase. Therefore, minimizing the number of half adders used in a multiplier reduction will reduce the complexity. A modification to the Wallace reduction is presented that the delay is the same as for the conventional Wallace reduction. KEY WORDS : CSA, Compressor, Wallace 1. INTRODUCTION In the majority of digital signal processing ( DSP ) applications the critical operations usually involve many multiplications and/or accumulations. For realtime signal processing, a high speed and high throughput Multipliers-Accumulator (MAC) always a key to achieve a high performance digital signal processing system. In the last few years, the main consideration of MAC design is to enhance its speeds. This is because, battery energy available for these portable products limits the power consumption of the system. To achieve high speed MAC it is necessary to design high speed multiplier. To achieve high speed, the well-known Wallace highspeed multiplier uses carry save adders to reduce an N-row bit product matrix to an equivalent two row matrix that is then summed with a carry propagating adder to give the product. The carry save adders are conventional full adders whose carries are not connected, so that three words are taken in and two words are output. The Wallace multiplier also uses full adders and half adders, 4:2 compressor, 3:2 compressor in their reduction phase This paper presents a modified design that greatly reduces the number of half adders in Wallace multipliers. Wallace Tree Construction There exist a handful of ways to construct the Wallace Tree. One method Considers all the bits in each column at a time and compresses them into two bits. (a sum and a carry). A second method, which is used in this project, considers all the bits in each fours rows at a time and compresses them in an appropriate manner (see Figure 1).
1, 2
Page 45
Fig 1. Traditional Wallace Tree construction In Figure 1, the Wallace tree uses 4:2 compressor, 3:2 compressors, full adders and half adders to compress a 12 partial products tree. Each 4:2 compressor takes in four bits from the same position j, one bit from the previous position j-1 (which is the carry our of the compressor in the previous position) and outputs one sum bit in the position j and two carry outs into the next position j+1. 2. BACKGROUND Any multiplier can be divided into three stages: Partial products generation stage, partial products addition stage, and the final addition stage. In the first stage, the multiplicand and the multiplier are multiplied bit by bit to generate the partial products product The second stage is the most important, as it is the most complicated and determines the speed of the overall multiplier. This project will be focused on the optimization of this stage, which consists of the addition of all the partial products. If speed is not an issue, the partial products can be added serially, reducing the design complexity. However, in highspeed design, the Wallace tree construction method is usually used to add the partial products in a tree-like fashion in order to produce two rows of partial products that can be added in the last stage. Although fast, since its critical path delay is proportional to the logarithm of the number of bits in the multiplier, the Wallace tree introduces other problems such as wasted layout area and increased complexity. In the last stage, the two-row outputs of the tree are added using any high-speed adder such as carry save adder to generate the output result.
Page 46
3.
inputand1
P o77 P n76 P n67
P m 75
P m 66
P m 57
P k73
P k64
P k55
P k46
P k37
P h70
P h61
P h52
P h43
P h34
P h25
P h16
P h07
P g60
P g51
P g42
P g33
P g24
P g15
P g06
P l 74
P l 65
P l 56
P l 47
P j 72
P j 63
P j 54
P j 45
P j 36
P j 27
P i 71
P i 62
P i 53
P i 44
P i 35
P i 26
P i 17
P f 50
P f 41
P f 32
P f 23
P f 14
P f 05
full_adder
U 20
new 32
U 49
c m42
U 63
full_adder
U 62
U 13 full_adder
new 32
U 11
full_adder
U9
U 19 cm42
U5 full_adder
new 32
U6
U7 full_adder
Cin
i n1
i n2
i n3
i n4
ci n
Cin
Cin
i n1
i n2
i n3
i n4
Cin
ci n
Cin
i n1
i n2
i n3
i n4
Cin
Cin
carryout
carryout
car ryout
carry
sum
sum
carry
sum
sum
carry
sum
U 10
CO
CO
CO
CO
CO
CO
CO
c1
c2
c1
c2
8*8 bits multiplier is designed in a 0.25m process with 2.8V supply. The critical path of the Wallace tree is simulated to be less than 1ns. The Wallace tree multiplier block diagram is shown in Figure 2.It takes a 8 bits multiplicand and a 8 bits multiplier and generates result. The partial products are then added in a Wallace tree fashion as shown in Figure 1. It consists of Following numbers of components:
U 15
a b ca s
O u tp u t1 4 O u u t1 5 O u tp t1 3 O u tp u t1 1 O tp u t1 0 4:2t p compressoru takest pfive inputs and produces utwo O u u t1 2 outputs with carry given to a next stage. 3:2 compressor takes four inputs and produces two
U 35
full_adder
A CO
B S
Cin
U 21
c m42
U 60
cm42
U 26
c m42
U 17
ci n
ci n
ci n
U 14
c m42
U 28
new 32
U 31
new 32
ci n
sum
sum
sum
i n1
i n2
i n3
i n4
i n1
i n2
i n3
i n4
c1
c2
c1
c2
c1
c2
car ryout
carr yout
sum
car ry
sum
carry
sum
c1
c2
U 37
full_adder
U 30
full_adder
full_adder
U 33
Cin
Cin
Cin
full_adder
Cin
CO
CO
CO
U 66
U 16
full_adder
U 32
full_adder
CO
S CO S
A CO
B S
Cin
outputs with carry given to a next stage. Full adders takes three inputs to produce two outputs which contains sum and carry while half adders takes two inputs and produces two outputs containing sum and carry. As half adder does not contribute much more in reduction phase , it is necessary to reduce the number of half adders. The modified Wallace tree construction reduces the number of half adders and hence complexity reduces. 8*8 bit multiplication provides 64 partial which are reduced using Wallace tree by using above components. 4. OPTIMIZATION O u tp u t7 The circuits produced by contemporary VHDL synthesis tools are, unfortunately, highly sensitive to the manner in which the original behavioral or O u tp u t8 O u tp u t9 structural description is expressed. When designing the Wallace tree multiplier, using different VHDL
U 47
a b ca s ca s
A CO
B S a ca
Cin
U 42
half_adder
U 41
half_adder
U 40
half_adder
a ca
b s
a ca
b s
a ca
b s
U 43
b s
half_adder
U 51
half_adder
U 50
half_adder
U 48
half_adder
U 52
half_adder
a ca
b s
a ca
b s
ca
ca
s a ca
U 53
half_adder
O u tp u t6
b s
U 58
full_adder
U 55
full_adder
U 65
full_adder
A CO
B S
Cin
A CO
B S
Cin
A CO
B S
Cin
U 56
full_adder
U 61
full_adder
A CO
B S
Cin
A CO
B S
Cin
Page 47
6. VERIFICATION OF SIMULATION Quartus II is powerful simulation tool developed by Altera Technologies for altera devices. VHDL code can be compiled and special libraries for altera devices can be used to simulate any hardware efficiently. All the basic modules designed for Wallace tree multiplier are compiled and tested vigorously for functional correctness using waveforms. The complete circuit of 8*8 bit Wallace tree multiplier is described in VHDL, as a structural component. The hierartial structure is created and the complete simulation can be observed. Fig.3 shows the waveform simulation result for the Wallace tree multiplier
Fig.3 Simulation waveform for Wallace Tree Multiplier 7. RTL VIEW The RTL view of the 8*8 bit Wallace tree multiplier is shown in fig.4.The Wallace tree multiplier is designed and implemented using QuartusII by Device:CycloneII:EP2C20F484C7
Page 48
9. CONCLUSION The Wallace tree multipliers can be solved & analyzed using a new modified method of Wallace tree construction. The modified tree has a slightly smaller critical path ,a slightly larger wiring overhead but requires low power and high speed. A 8x8 bits Wallace tree multiplier was designed in 0.25m.As per analysis the power core dynamic thermal dissipation is 0.15mW which is much less. Total logic elements and total combinational functions required are only less than1% from the available. There is no requirement of logic registers, registers, pins memory bits and PLLs. This modified design of multiplier which consist of 4:2 compressor, 3:2 compressor, full adders and reduced no. of half adder and reduces the complexity and save the power. 10. REFERENCES [1] Niichi Itoh, Yuka Naemura, Hiroshi Makino, Yasunobu Nakase, Tsutomu Yoshihara, and Yasutaka Horiba, A 600-MHz 54 x 54-bit Multiplier with Rectangular-Styled Wallace Tree, JSSC,vol.36, no. 2, February 2001.
Total Thermal Power Dissipation: 73.15 mW Power Core Dynamic Thermal Dissipation : 0.15mW Power Core Static Thermal Dissipation : 47.36 mW I/O Thermal Power Dissipation : 25.63 mW Total Logic Elements : 155 / 18,752( <1%) Total Combinational Function : 155 / 18,752( < 1% ) Dedicated Logic register : 0 / 18,752( < 0% ) Total Registers : 0 Total Pins : 32 / 315 (< 10% ) Total Virtual Pins : 0 Embedded Multiplier 9-bit Elements : 0 / 52 ( 0 % ) Total PLLs : 0 / 4 ( 0 %) [2] Gensuke Goto, Tomio Sato, Masao Nakajima, and Takao Sukemura, A 54x54-b RegularlyStructured Tree Multiplier, JSSC, col. 27, no. 9, September 1992. [3] Siraram Vangal et.al. A 5GHz Floating Point Multiply-Accumulator in 90nm Dual Vt CMOS,ISSCC 2003. [4] Robert Montoye, et.al. A Double Precision Floating Point Multiply, ISSCC 2003. [5] Niichi Itoh, Yuka Naemura, Hiroshi Makino, Yasunobu Nakase, A Compact 54x54 Multiplier with Improved Wallace-Tree Structure, 1999 Symposium on VLSI Circuits Digest of Technical Papers. [6] Pascal C.H. Meier, Rob A. Rutenbar, L. Richard Carley, Exploring Multiplier Architecture and Layout for Low Power, IEEE 1996 Custom Integrated Circuits Conference. [7] S. Shah, A.J. Al-Khalili, D. Al-Khalili, Comparison of 32-bit Multipliers for Various Performance Measures, The 12th International Conference on Microelectronics, Tehran, Oct. 31-
Page 49