2017 4th International Conference on Signal Processing, Communications and Networking (ICSCN-2017), March 16 – 18, 2017, Chennai, INDIA

A Survey Paper on Modern Technologies in Fixed-Width Multiplier

¹Jency Rubia J, ²Sathish Kumar G A
¹Research Scholar, ²Professor
Department of Electronics and Communication Engineering,
Sri Venkateswara College of Engineering, Chennai-25, India.
¹jencyrubia@gmail.com, ²sathish@svce.ac.in

Abstract: The multiplier unit is a vital element of a DSP processor. The main design objectives of a DSP processor are speed, power, delay and area. These goals can be met with a fixed-width multiplier, whose output has the same number of bits as its inputs. DSP processors built on fixed-width multipliers can be applied to audio signal processing, video signal processing, VLSI signal processing, speech recognition, digital communication, medical imaging, MRI, MP3 and so on. Many researchers are working to optimize the performance of the multiplication process. This review paper studies the technologies used to achieve the objectives of the DSP processor and discusses the most recent developments in multiplier circuits. First, the background of the fixed-width multiplier is briefly outlined. Then, several multiplier architectures proposed for the MAC (multiplier-accumulator) are presented, describing their operating principles and key features. To give some insight into future research directions, open research issues are discussed at the end of the paper.

Index Terms: Array multiplier, Parallel multiplier, Fixed-width multiplier, Exhaustive simulation, Multiply-accumulate unit, Mean square error, Mean absolute error.

I. INTRODUCTION
The multiplier circuit is essential for DSP operations such as convolution, filtering and the FFT. A multiplication involves a multiplicand and a multiplier. Multiplying an n-bit multiplicand by an n-bit multiplier forms a 2n-bit product. The partial products comprise n × n bits, so the computation takes a long time. To improve performance, parallel or array multipliers have been employed, but they consume more area to hold the n² partial product bits (where n is the number of input bits). To reduce the number of partial products, rounded or truncated schemes were introduced. Since truncation discards part of the partial products, the result can differ considerably from the exact product.

In a fixed-width multiplier, the number of output bits is the same as the number of input bits. Fig. 1 illustrates the general architecture of a fixed-width multiplier. Its partial product matrix is divided into several parts. The leftmost n columns of the partial product matrix form the most significant part (MSP). The rest of the matrix forms the least significant part (LSP) array. The LSP is further divided into LSP major and LSP minor. LSP major comprises the h most significant columns of the LSP, while the remaining n_eq = n − h columns form LSP minor, where h is a design parameter ranging from 0 to n. The leftmost column of LSP minor is called Input Correction (IC), as shown in Fig. 1. Another term, Not Formed (NF), denotes the subset of partial products that are not formed in order to save area. The NF part is shown in gray in Fig. 1. To obtain as many output bits as input bits, the NF part does not contribute to the computation, so the output contains some error.

Fig. 1. General architecture of a fixed-width multiplier for an 8×8 matrix

To correct these errors, various correction (bias) circuits can be implemented. Designing them is one of the challenging tasks in multiplier circuit design. The following sections discuss various methods for addressing the governing factors, such as area and power consumption.

This paper is organized as follows. Section II covers the different methodologies for achieving the dominant parameters of the multiplier, along with design issues. Section III discusses future research issues. Finally, Section IV concludes the paper.

II. LITERATURE SURVEY
This survey covers various approaches to improving the performance of the multiplier and the Multiply-Accumulate (MAC) unit. Implementing the multiplier circuit
978-1-5090-4740-6/17/$31.00 ©2017 IEEE
or MAC for VLSI signal processing applications is the toughest job, because Moore's law predicts that the number of transistors on a chip doubles roughly every two years, so the area available for a given function keeps shrinking. However, the challenging factors of the multiplier circuit are area and power consumption. To reduce chip area and power consumption and to achieve a small delay, researchers have tried many methodologies from the early days until now. The methods, their problems, key features and advantages are discussed below.

In 1993, M. J. Schulte and E. E. Swartzlander [1] proposed a method that reduces the area requirement by computing the final result from only the most significant partial products, together with a correction constant. The correction constant is chosen to reduce the average and mean square error that truncation introduces. Since a truncated multiplier involves both reduction error and rounding error, Lin [2] analyzed the rounding error and the reduction error separately to determine the correction constant. However, this approach leaves room for a poorly chosen correction constant. Here, the correction constant is selected as the additive inverse of the sum of the errors: the expected values of the two errors are computed separately and then combined to give the correction constant. The computed product P' is P' = P + E_reduct + E_round + C, where P is the true product, E_reduct is the reduction error, E_round is the rounding error and C is the correction constant. The paper gives equations for the average error, mean square error and maximum error of the rounded product. With this technique, the hardware requirement is reduced by 25 to 35 percent.

Kidambi and others introduced an idea for efficiently reducing the area of the multiplier circuit in 1996 [3]. They designed a multiplier circuit with two N-bit inputs and an N-bit output. The output is produced in quantized form, realized by excluding half of the adder cells that are responsible for propagating the partial products. Figure 2 explains the generation of the four terms of the product in the parallel multiplier. Ph and Pl represent the most and least significant segments of the product, Ah and Al are the most and least significant parts of A, and Bh and Bl are those of B. The shaded region represents the cells that generate the results discarded by truncation. Because of this elimination, the least significant part of the product (Pl) is not formed, which gives rise to a quantization error. To reduce or compensate for the error, a probabilistic analysis is carried out and the resulting probabilistic bias values are fed to the remaining adder cells. The proposed concept saves 50% of the area. Finally, they implemented the truncated multiplier in a digital filter and observed a better signal-to-noise ratio with the proposed multiplier than with the standard multiplier.

Jou and others extended the work of Kidambi and published a paper in 1997 [4]. Jou proposed a fixed-width multiplier for DSP applications whose inputs and outputs have N bits. The aim was to design a low-error multiplier without sacrificing performance. The previous architecture does not take the generation of carries into account, whereas Jou and others feed these carry inputs to a carry-generating circuit to minimize the error effectively. In terms of area, Jou's result was better, consuming 50% of the chip area. This type of fixed-width multiplier is well suited to many DSP applications such as arithmetic coding, wavelet transforms for audio and video signal processing, and digital filtering.

Fig. 2. Sections of the parallel multiplier and generation of the four terms of P (product)

In [1], E. E. Swartzlander and others suggested a method for effective truncated multiplication with a correction constant. The same researcher investigated the truncated multiplier with approximate rounding (variable correction) and published the article in 1999 [5]. The constant correction scheme forms only the most significant columns; to compensate for the least significant columns, a constant equal to the average value of the omitted least significant columns is added. In the variable correction scheme, the correction terms are added by replacing the half adders positioned at the upper right edge of the truncated partial product matrix with full adders. Figure 3 shows that five gates are added to compensate for the least significant columns. The resulting values are added to the rightmost columns. In terms of circuit complexity, the variable correction method compares unfavorably with the constant correction method. The error analysis can be done by exhaustive simulation. It gives inconclusive results for the maximum error but a smaller mean and variance. The average power dissipation is also reduced by 40%.

Fig. 3. Variable correction array multiplier
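The constant-correction idea and its evaluation by exhaustive simulation can be illustrated with a small software model. The following is only a sketch under stated assumptions, not the circuit of [1] or [5]: the inputs are unsigned, the function names and the parameter k (the number of extra columns formed below column n) are hypothetical, each partial-product bit is assumed to be 1 with probability 1/4, and each discarded sum bit is assumed to be 1 with probability 1/2.

```python
def truncated_product(a, b, n, k, correction=0):
    """Fixed-width product of two unsigned n-bit integers.

    Only partial-product bits a_i*b_j in columns i + j >= n - k are formed;
    the correction constant is added before the low n bits are dropped.
    """
    total = correction
    for i in range(n):
        for j in range(n):
            if i + j >= n - k and (a >> i) & 1 and (b >> j) & 1:
                total += 1 << (i + j)
    return total >> n


def correction_constant(n, k):
    """Constant offsetting the expected reduction and rounding errors (assumed model)."""
    # Reduction error: omitted column c holds c + 1 bits, each 1 with probability 1/4.
    expected_reduction = sum((c + 1) * (1 << c) for c in range(n - k)) / 4
    # Rounding error: the k formed columns below column n are dropped;
    # each of those sum bits is assumed to be 1 with probability 1/2.
    expected_rounding = ((1 << n) - (1 << (n - k))) / 2
    lsb = 1 << (n - k)  # weight of the lowest column that is actually formed
    return int((expected_reduction + expected_rounding) / lsb + 0.5) * lsb


def error_stats(n, k, correction):
    """Exhaustive simulation: mean, mean square and maximum absolute error,
    measured in units of one output LSB."""
    errors = [
        truncated_product(a, b, n, k, correction) - (a * b) / (1 << n)
        for a in range(1 << n)
        for b in range(1 << n)
    ]
    mean = sum(errors) / len(errors)
    mse = sum(e * e for e in errors) / len(errors)
    return mean, mse, max(abs(e) for e in errors)


if __name__ == "__main__":
    n, k = 6, 2
    c = correction_constant(n, k)
    print("correction constant:", c)
    print("no correction  :", error_stats(n, k, 0))
    print("with correction:", error_stats(n, k, c))
```

Because the loop visits every input pair, this kind of simulation is practical only for small n, which is why analytical error expressions matter for wider multipliers.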
L. D. Van and others built on [3] and [4] and identified two problems that had not previously been discussed: how to select the appropriate indices, and whether other minimum-error multipliers exist [6]. They analyzed these problems and published a paper in 2000. They derived expressions for choosing a generalized index and a better error-reduction function to lessen the truncation error. The resulting two's complement fixed-width multiplier was successfully implemented in a digital FIR filter for speech signal processing.
E. E. Swartzlander and others considered only the unsigned and two's complement number systems in [1], [5]. They went on to investigate the negative two's complement number system and published the work in 2006 [8]. They found that, in the conventional two's complement number system, both the truncated multiplier with constant correction and the truncated multiplier with variable correction produce a natural extreme error, whereas the negative two's complement number system overcomes this error by using a modified product matrix structure. However, truncation with constant correction has some restrictions. When both input values are ones, the result in the negative two's complement system has maximal error. If the absolute value of the partial product bits not formed is greater than the correction constant, the truncated result is slightly greater than -1 instead of +1. Multiplication with the variable correction method produces no extreme error. The negative two's complement number system is more appropriate for truncated multiplication of signed inputs than the conventional two's complement number system.

E. E. Swartzlander and others proposed another technique that gives smaller errors than the earlier methods [7], in the paper titled "Truncated Multiplication with Symmetric Correction", published in 2006. The technique can be applied to different number systems, such as unsigned and two's complement numbers. It offers not only smaller errors but also lower hardware requirements, through a specialized counter. Although the earlier correction methods compensate for errors successfully, their negative and positive maximum errors are asymmetrical. The proposed technique reduces several types of error: maximum error, mean error and variance.

Figure 4 illustrates the concept of symmetric correction. The method works by adding the partial product bits in column n−k−1 and a designed logic to column n−k, to compensate for the unformed partial product bits in the n−k least significant columns. Logical ones at the top of the matrix are used to compensate for the rounding error. Here k denotes the number of columns kept for more accurate results. The final product is taken as P_{2n−1} … P_n. The method uses a specialized counter that reduces the partial product bits of the additional logic, and so has much lower complexity than the preceding methods.

Fig. 4. Proposed truncated multiplication in the unsigned number system. (a) The bit product matrix (b) The proposed logic

N. Petra and others published the article "Design of Fixed-Width Multipliers with Minimum Mean Square Error" at an IEEE conference in 2007 [9]. Previously, exhaustive search was used to approximate the error that occurs in a fixed-width multiplier, which limits it to small values of n. The distinctive contribution of this paper is that the error compensation function is computed analytically for the first time. These analytical expressions are optimal for all values of n and are well suited to the implementation of fast tree-based multipliers. The authors compared the existing results with the proposed method in 0.18 μm technology and implemented it efficiently in hardware. They confirmed the performance of the technique with a test chip of a fixed-width multiplier designed using UMC 0.18 μm technology. The multiplier's performance improved, with a 13% reduction in propagation delay and a 50% reduction in power dissipation.

In [10], N. Petra and others presented an in-depth analysis of truncated multipliers employing variable correction, published in 2010. They derived the optimal compensation function, which can be expressed as a quadratic form of the Input Correction (IC) terms. The optimal compensation function minimizes the mean square error [23-25]. They also determined a sub-optimal compensation function as a linear combination of the IC terms. The choice of function determines the complexity of the hardware implementation. The performance of the proposed system was evaluated in 0.18 μm technology. To verify the proposed multiplier's efficiency, they designed an FIR filter using a Multiply-Accumulate unit. Figure 5 shows the general architecture of the MAC unit. The input and taps of the filter are 16-bit fractional two's complement signed binary values. To guard against overflow, four guard bits are added to the accumulator. The final output is then rounded to 16 bits. They tested MAC performance with several types of multiplier: full-width, full-round and truncated (with h = 0, 1, 2). According to their investigation, the truncated multiplier gives better results in terms of area, power and relative power.
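The role of the guard bits in such a MAC can be sketched in software. This is a minimal illustration, not the datapath of [10]: the function names are hypothetical, the fixed-width multiplier is modeled by simply keeping the upper 16 bits of the exact product (no compensation function), and the final rounding is assumed to drop the four least significant bits of the 20-bit accumulator.

```python
WORD_BITS = 16                      # width of samples, taps and products
GUARD_BITS = 4                      # extra accumulator bits against overflow
ACC_BITS = WORD_BITS + GUARD_BITS   # 20-bit accumulator


def wrap(value, bits):
    """Reduce value modulo 2**bits and read it as a signed two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def to_q15(x):
    """Quantize a real number in [-1, 1) to a 16-bit fractional (Q1.15) integer."""
    return max(-(1 << 15), min((1 << 15) - 1, round(x * (1 << 15))))


def fixed_width_multiply(a, b):
    """Model of a 16x16 fixed-width multiplier: keep the upper 16 bits of the product."""
    return wrap((a * b) >> WORD_BITS, WORD_BITS)


def mac_fir_output(samples, taps):
    """One FIR output: accumulate 16-bit products in a 20-bit register, then round to 16 bits."""
    acc = 0
    for x, h in zip(samples, taps):
        acc = wrap(acc + fixed_width_multiply(x, h), ACC_BITS)
    # Assumed rounding step: drop the GUARD_BITS lowest bits with round-half-up,
    # clamping the single value that would not fit in 16 bits.
    rounded = (acc + (1 << (GUARD_BITS - 1))) >> GUARD_BITS
    return min(rounded, (1 << (WORD_BITS - 1)) - 1)


if __name__ == "__main__":
    half = to_q15(0.5)
    # Eight products of 0.5 * 0.5 sum to 2.0, which needs more than 16 bits
    # while accumulating but fits in the 20-bit accumulator.
    print(mac_fir_output([half] * 8, [half] * 8))
```

In this model each product is in Q2.14 format, so the rounded output carries a scale of 2^-10; without the guard bits the same sum would wrap around in a 16-bit register.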

Fig. 5. General architecture of the MAC unit

N. Petra and others extended the previous work and published a paper in 2011 [11]. To improve on the earlier results, the authors proposed two new topologies. The first topology (2-bit fixed-width multiplier) relies on uniform coefficient quantization. The second topology (1.5-bit fixed-width multiplier) is based on non-uniform quantization, in which certain coefficients are quantized with two bits while the rest are quantized with a single bit. They designed a multiply-accumulate unit in 90 nm CMOS technology and evaluated the results. The experimental results show that the proposed fixed-width multiplier performs better in terms of area, power and approximation error than the full-width and full-round multipliers.

Davide De Caro and others [12] designed a fixed-width multiplier using some new techniques and published the paper in 2013. They generate the partial products matrix (PPM) using an auxiliary tree (AT) to improve performance. However, this leads to a longer expression. To reduce its length, the expression can be further simplified by Signed Digit (SD) recoding. The technique is called Auxiliary Tree and Signed Digit recoding (AT & SD). The authors' main aim was to find and reduce the Maximum Absolute Error (MAE). To reduce the MAE, a linear compensation function has to be derived, and determining the compensation function requires the values of its coefficients. The coefficients are given by q_i = w_i 2^(−B), where w_i is a coefficient found through direct search and B is the negative power of two. Larger values of B give a better compensation function but higher circuit complexity. These coefficients should therefore be represented with a small number of bits to improve the performance of the practical multiplier. The authors also investigated the performance of the proposed multiplier variants in a MAC and in a filter.

III. FUTURE SCOPE AND OPEN ISSUE
Table 1 outlines the basic properties of the MAC designs surveyed in this paper. The performance of the MAC is characterized by its electrical performance and its error performance. The error performance covers the mean square error (MSE), mean absolute error (MAE) and mean error. The electrical performance is assessed by the area, delay and power parameters [19-22]. It would also be useful to assess the MAC by time complexity analysis. Time complexity is defined as the time taken to execute an algorithm. Such an analysis would help to reduce the delay further and to identify which architecture is faster than the others. Pietro and others [13] showed that computational complexity can be evaluated using time complexity analysis. Hence, a time complexity analysis of the existing methodologies is an open topic.

Table 1. Comparison Results of Different Architectures

Architecture                  Technology  Error performance    Electrical performance  Implementation &
                                          (MAE, MSE, mean)     (area, delay, power)    application
Swartzlander et al. [1], [5]  -           Typical              Typical                 -
Kidambi et al. [3]            -           Better               Typical                 Simulation / Butterworth filter
Jou et al. [4]                0.8 μm      Better               Average                 -
Van et al. [6]                -           Moderate             Moderate                Simulation / FIR filter
Petra et al. [9]              0.18 μm     Moderate             Good                    HW / fixed-width multiplier
Petra et al. [10]             0.18 μm     Good                 Better                  HW / FIR filter
Petra et al. [11]             90 nm       Better               Far better              HW / FIR filter
De Caro et al. [12]           65 nm       Far more efficient   Better                  HW / MAC

Of the architectures presented in this paper, [3] and [6] have only been evaluated through simulation; no real-time or practical implementation was carried out. The feasibility of these methods for practical implementation is an open issue, because implementation in real hardware can face various challenges, such as area, power and delay [17,18]. Likewise, none of the methodologies has been implemented with the proposed MAC in a DSP processor for practical applications such as audio signal processing, video signal processing, image compression and enhancement. Hence this is an open research problem.

It would also be valuable to carry out a space complexity analysis. Space complexity is defined as the amount of memory required by an algorithm. If the required amount of memory is known, the area usage can be estimated accurately. A detailed space complexity analysis is therefore essential for effective performance analysis. A detailed treatment of space complexity can be found in [14]. Hence this is one direction for future work.

The clock signal drives the accumulator operation in the MAC. Since the clock is a major source of power consumption, it would be valuable to carry out a clock-skew analysis. For an effective investigation of the power consumption of the existing methods, clock-skew analysis must be performed. Such an analysis can be carried out as in [15]; this is also an open issue.

As briefly explained above, the performance of the MAC is improved by fast computation. Furthermore, there is a need for worst-case analysis to find the

longest computation path. It can be concluded that critical path delay analysis [16] is required to reduce the delay through the longest delay path of the circuit, thereby optimizing the speed of computation and increasing the performance of the MAC.

IV. CONCLUSION
This paper has summarized recent trends in the development of the Multiply-Accumulate (MAC) unit. A concise introduction to the fixed-width multiplier has been given to explain the basics of the multiplier. Many multiplier methods proposed for the MAC have been discussed along with their salient features. Open research issues and future scope have also been discussed to point to possible new research directions. This paper should help the research community explore and examine new architecture designs in the future.

REFERENCES

[1] Michael J. Schulte and Earl E. Swartzlander, "Truncated Multiplication with Correction Constant," VLSI Signal Processing, VI, New York, IEEE Press, 1993, pp. 338-396.
[2] Y. C. Lin, "Single precision multiplier with reduced circuit complexity for signal processing application," IEEE Transactions on Electronic Computers, 1992, pp. 1333-1336.
[3] S. S. Kidambi, F. E. Guibaly, and A. Antoniou, "Area-Efficient Multipliers for Digital Signal Processing Applications," IEEE Transactions on Circuits and Systems-II, vol. 43, no. 2, February 1996.
[4] J. M. Jou and S. R. Kuang, "Design of a low-error fixed-width multiplier for DSP applications," Electronics Letters, vol. 33, no. 19, September 1997, pp. 1597-1598.
[5] E. E. Swartzlander, "Truncated multiplication with approximate rounding," Conference Record of the Thirty-Third Asilomar Conference on Signals, Systems and Computers, 1999, pp. 1480-1483.
[6] L. D. Van, S. S. Wang, and W. S. Feng, "Design of the lower error fixed-width multiplier and its application," IEEE Transactions on Circuits and Systems-II, vol. 47, no. 10, Oct. 2000.
[7] H. Park and E. E. Swartzlander Jr., "Truncated multiplication with symmetric correction," Proc. Asilomar Conf. on Signals, Systems and Computers (ACSSC), Oct. 2006, pp. 931-934.
[8] H. Park and E. E. Swartzlander Jr., "Truncated multiplication for the negative two's complement number system," 49th IEEE International Midwest Symposium on Circuits and Systems, Aug. 2006, pp. 428-432.
[9] N. Petra, D. De Caro, and A. G. M. Strollo, "Design of fixed-width multipliers with minimum mean square error," Proc. IEEE Eur. Conf. on Circuit Theory and Design (ECCTD 2007), Sevilla, Spain, Aug. 2007, pp. 464-467.
[10] N. Petra, D. De Caro et al., "Truncated binary multipliers with variable correction and minimum mean square error," IEEE Transactions on Circuits and Systems-I, vol. 57, no. 6, June 2010, pp. 1313-1325.
[11] N. Petra, D. De Caro et al., "Design of fixed-width multipliers with linear compensation function," IEEE Transactions on Circuits and Systems-I, vol. 58, no. 5, May 2011, pp. 947-960.
[12] Davide De Caro et al., "Fixed-Width Multipliers and Multipliers-Accumulators With Min-Max Approximation Error," IEEE Transactions on Circuits and Systems-I: Regular Papers, vol. 60, no. 9, September 2013, pp. 2375-2388.
[13] Pietro et al., "Time complexity of evolutionary algorithms for combinatorial optimization: A decade of results," International Journal of Automation and Computing, July 2007, pp. 281-293.
[14] Kevin et al., "The space complexity of pass-efficient algorithms for clustering," SODA '06: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2006, pp. 1157-1166.
[15] D. W. Bailey et al., "Clocking design and analysis for a 600-MHz Alpha microprocessor," IEEE Journal of Solid-State Circuits, vol. 33, issue 11, Nov. 1998, pp. 1627-1633.
[16] Stefan et al., "Critical path analysis in the network with fuzzy activity times," published in Elsevier, vol. 122, issue 2, Sep. 2001, pp. 195-204.
[17] I. S. Chang, Y. Tsujimura, M. Gen, and T. Tozawa, "An efficient approach for large scale project planning based on fuzzy Delphi method," Fuzzy Sets and Systems, vol. 76, 1995, pp. 277-288.
[18] D. Dobberpuhl et al., "A 200-MHz 64-bit dual-issue CMOS microprocessor," Digital Tech. J., vol. 4, pp. 35-50, 1992.
[19] L. D. Van, S. S. Wang, S. Tengchen, W. S. Feng, and B. S. Jeng, "Design of a lower error fixed-width multiplier for speech processing application," Proc. IEEE Int. Symp. Circuits and Systems, vol. 3, pp. 130-133, May 1999.
[20] M. J. Schulte, J. E. Stine, and J. G. Jansen, "Reduced power dissipation through truncated multiplication," in Proc. IEEE Alessandro Volta Memorial Workshop Low-Power Des., 1999, pp. 61-69.
[21] D. De Caro and A. G. M. Strollo, "High-performance direct digital frequency synthesizers using piecewise-polynomial approximation," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52, no. 2, pp. 324-337, Feb. 2005.
[22] V. G. Oklobdzija, D. Villeger, and S. S. Liu, "A method for speed optimized partial product reduction and generation of fast parallel multipliers using an algorithmic approach," IEEE Trans. Comput., vol. 45, no. 3, pp. 294-306, Mar. 1996.
[23] J. Um and T. Kim, "Optimal bit-level arithmetic optimisation for high-speed circuits," Electron. Lett., vol. 36, no. 5, pp. 405-406, Mar. 2000.
[24] P. F. Stelling, C. U. Martel, V. G. Oklobdzija, and R. Ravi, "Optimal circuits for parallel multipliers," IEEE Trans. Comput., vol. 47, no. 3, pp. 273-285, Mar. 1998.
[25] Y. C. Lim, "Single-precision multiplier with reduced circuit complexity for signal processing applications," IEEE Trans. Comput., vol. 41, no. 10, pp. 1333-1336, Oct. 1992.

Jency Rubia J received the B.Tech degree from the Department of Electronics and Communication Engineering, Karunya University, Tamil Nadu, India, in 2013. She received the ME degree from Vel Tech Multi Tech Engineering College, India, in 2015. She is a full-time Research Scholar at Sri Venkateswara College of Engineering, Anna University, Chennai, India. She is an Associate Member of IETE. Her research interests are VLSI circuit design, low-power VLSI, FinFET technology, device modeling, MAC and VLSI signal processing.

Email: jencyrubia@gmail.com

Sathish Kumar G.A received the BE degree from the Department of Electronics and Communication Engineering, Bharathidasan University, Tamil Nadu, India. He received the ME degree, with specialization in Applied Electronics, from the PSG College of Technology, Coimbatore, India. He received the PhD degree from Anna University, Chennai. He is currently working as a Professor in the Department of Electronics and Communication, Sri Venkateswara College of Engineering, Anna University, Chennai. He has sixteen years of working experience. He is a recognized supervisor of Anna University for guiding PhD scholars and a Doctoral Committee member. He is a reviewer for Elsevier and the British Journal of Applied Science and Technology. His research interests are network security and cryptography, networking, VLSI design and VLSI signal processing algorithms.

Email: sathish@svce.ac.in
