Scope and Objective

CHAPTER 2
LITERATURE REVIEW ON VEDIC MULTIPLIERS
This chapter presents a detailed investigation of multiplication using Vedic Mathematics sutras. The theoretical formulations involved in the computation are also discussed.
2.1 SCOPE AND OBJECTIVE
Arithmetic operations using Vedic Mathematics are a growing trend in the FPGA community, and it has become critical to optimize multiplication units for standard FPGA technology. However, the FPGA design space is very different from the VLSI design space, so optimizations for FPGAs differ significantly from optimizations for VLSI. In particular, the FPGA environment constrains the design space so that only limited parallelism can be exploited effectively to reduce latency. Striking the right balance among clock speed, latency and area in FPGAs is particularly challenging. This work presents the implementation details of binary multipliers and Vedic multipliers for FPGAs and the implementation of a MAC unit using FPGA technology. The area requirement is approximately 500 slices for the adder and below 750 slices for the multiplier. In this chapter, various research works related to binary multiplication using Vedic sutras are reviewed.
2.2 RELATED WORKS
2.2.1 Simulation level
Mark Santoro & Mark Hurwitz (1989) presented a 64 × 64-bit iterative multiplier, the Stanford Pipelined Iterative Multiplier (SPIM). The pipelined array consists of a small tree of 4:2 adders. The 4:2 tree is better suited than a Wallace tree for a VLSI implementation because it has a more regular structure. A 4:2 carry-save accumulator at the bottom of the array iteratively accumulates the partial products, which allows a partial array to be used and thus reduces the area. In pipelined mode, SPIM can initiate a multiply every four cycles (47 ns), for a throughput in excess of 20 million multiplies per second. SPIM required an average of 72 mA at 85 MHz and only 10 mA in standby mode. SPIM contains 41 000 transistors, with a core size of 3.8 × 6.5 mm and an array size of 2.9 × 5.3 mm.
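The 4:2 adder (compressor) at the heart of SPIM can be illustrated with a short behavioural model. The Python sketch below assumes the common construction of a 4:2 compressor from two cascaded full adders; it is not taken from the SPIM paper, and the function names are hypothetical. It also reduces four operands to a carry-save (sum, carry) pair, which is the job a row of 4:2 adders performs in the tree.

```python
def full_adder(a, b, c):
    """Return (sum, carry) for three input bits of equal weight."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """4:2 compressor built from two cascaded full adders.

    Five bits of equal weight are reduced to one sum bit of the same weight
    and two bits (carry, cout) of double weight. cout depends only on
    x1..x3, not on cin, so a row of compressors does not ripple.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def reduce_four_operands(a, b, c, d, width):
    """Reduce four width-bit operands to a carry-save pair with one row of 4:2 compressors."""
    sum_word, carry_word, cin = 0, 0, 0
    for i in range(width + 1):  # one extra position absorbs the last cout
        bits = [(v >> i) & 1 for v in (a, b, c, d)]
        s, carry, cout = compressor_4_2(*bits, cin)
        sum_word |= s << i
        carry_word |= carry << (i + 1)
        cin = cout  # cout of position i is the cin of position i + 1
    return sum_word, carry_word  # sum_word + carry_word == a + b + c + d
```

Because cout never depends on cin, every position of the row can be evaluated at the same time in hardware; the only horizontal connection is one bit position deep, which keeps the 4:2 tree regular.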
Jalil Fadavi & Ardekani (1992) proposed an architecture for an M × N Booth-encoded parallel multiplier and explained an algorithm for reducing the delay inside the branches of the Wallace tree section. The final addition of two (N + M − 1)-bit numbers is performed by an optimal carry-select adder stage, and an algorithm for the optimal partitioning of this (N + M − 1)-bit adder is also presented. Gary W Bewick (1994) analyzed different multiplication algorithms and compared their performance in terms of speed, area and power. Because the summation network and the partial product generation logic consume most of the power and area of a multiplier, optimizing the summation network may offer the greatest opportunities for improvement. Reducing the number of partial products and finding efficient ways of driving the long wires that control the partial product generators and supply them with multiples are areas where further work may prove fruitful.
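A carry-select adder of the kind used in the final stage can be modelled behaviourally as below. The block partition passed in is a hypothetical example (choosing it optimally is the problem addressed in the cited work), and each block adder is modelled with integer addition; the sketch only shows how the precomputed block sums are selected by the incoming carry.

```python
def block_add(a, b, cin, width):
    """Behavioural width-bit block adder: returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width


def carry_select_add(a, b, block_sizes):
    """Add unsigned a and b with a carry-select adder.

    block_sizes lists the block widths from LSB to MSB; a and b must fit in
    sum(block_sizes) bits. Each block precomputes its sum for carry-in 0
    and carry-in 1 (in parallel in hardware); the carry arriving from the
    block below only drives a multiplexer that picks one of the two.
    """
    result, carry, offset = 0, 0, 0
    for width in block_sizes:
        mask = (1 << width) - 1
        a_blk, b_blk = (a >> offset) & mask, (b >> offset) & mask
        sum0, cout0 = block_add(a_blk, b_blk, 0, width)
        sum1, cout1 = block_add(a_blk, b_blk, 1, width)
        blk_sum, carry = (sum1, cout1) if carry else (sum0, cout0)  # the "select"
        result |= blk_sum << offset
        offset += width
    return result, carry  # (sum modulo 2**total_width, carry out)
```

With block_sizes = [4, 5, 6, 7] (a 22-bit adder), the later blocks are wider because their carry-in arrives later, which is the usual motivation for non-uniform partitions.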
Oscal T.C. Chen et al. (2003) presented a low-power 2's complement multiplier that minimizes the switching activity of the partial products using the radix-4 Booth algorithm. A smaller effective dynamic range is processed to generate the Booth codes, thereby increasing the probability that partial products become zero. A dynamic-range determination unit controls the input data paths, and a multiplier with a column-based adder tree of compressors or counters is designed. To reduce power consumption further, two multipliers based on row-based and hybrid-based adder trees are realized that operate on the effective dynamic ranges of the input data. The functional blocks of these two multipliers preserve their previous input states for non-effective dynamic data ranges and thus reduce the number of their switching operations.
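Radix-4 Booth recoding, the starting point of this design, can be illustrated with a short Python model. It assumes an even bit width and two's-complement operands and uses hypothetical function names; it shows how each digit in {−2, −1, 0, 1, 2} is taken from an overlapping triple of multiplier bits, so only half as many partial products are formed and zero digits contribute no partial product at all.

```python
def booth_radix4_digits(y, width):
    """Recode a width-bit two's-complement multiplier y into radix-4 Booth digits.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1], with y[-1] = 0.
    Digits are returned least significant first.
    """
    assert width % 2 == 0
    bits = y & ((1 << width) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # implicit bit y[-1]
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_multiply(x, y, width):
    """Multiply signed integers x and y using radix-4 Booth partial products."""
    product = 0
    for i, d in enumerate(booth_radix4_digits(y, width)):
        # each partial product is 0, +-x or +-2x, weighted by 4**i
        product += (d * x) << (2 * i)
    return product
```

For example, the 8-bit value 60 (00111100) recodes to the digits [0, −1, 0, 1], that is 64 − 4, so two of its four partial products are zero.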
Emil Axelsson (2003) explored the use of the hardware description language (HDL) Lava for the description and analysis of binary multiplier circuits. Ron Waters & Swartzlander (2010) designed high-speed Wallace multipliers that use full adders and half adders in their reduction phase. Half adders do not reduce the number of partial product bits, so reducing the number of half adders in a multiplier reduces its complexity. A modification to the Wallace reduction is presented that keeps the delay the same as that of the conventional Wallace reduction.
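The reduction phase discussed above can be illustrated with a simplified, column-wise model of the conventional Wallace rule: in each stage, groups of three bits in a column go through a full adder, a leftover pair goes through a half adder, and carries move to the next column, until at most two rows remain for a final carry-propagate adder. This Python sketch is an illustrative assumption, not the reduced-complexity scheme of Waters & Swartzlander; it counts the half adders it uses, which are the cells that scheme seeks to eliminate.

```python
def wallace_multiply(x, y, n):
    """Unsigned n x n multiply with a column-wise, Wallace-style reduction.

    Returns (product, reduction_stages, half_adders_used).
    """
    # partial product bit x[i] & y[j] has weight 2**(i + j)
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append((x >> i) & (y >> j) & 1)
    stages = 0
    half_adders = 0
    while max(len(bits) for bits in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for col, bits in enumerate(cols):
            k = 0
            while len(bits) - k >= 3:  # full adder: 3 bits in, 2 bits out
                a, b, c = bits[k:k + 3]
                nxt[col].append(a ^ b ^ c)
                nxt[col + 1].append((a & b) | (a & c) | (b & c))
                k += 3
            if len(bits) - k == 2:  # half adder: 2 bits in, 2 bits out
                a, b = bits[k:k + 2]
                nxt[col].append(a ^ b)
                nxt[col + 1].append(a & b)
                half_adders += 1
                k += 2
            nxt[col].extend(bits[k:])  # a single leftover bit passes through
        cols = nxt
        stages += 1
    # final carry-propagate addition of the (at most) two remaining rows
    product = sum(bit << col for col, bits in enumerate(cols) for bit in bits)
    return product, stages, half_adders
```

A half adder turns two bits into two bits, so it never lowers the total bit count; only full adders do. This is why removing half adders reduces complexity without changing the result.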
Sumit Vaidhya & Deepak Dandekar (2010) presented a comparative study of different multipliers for low power and high speed. The comparison is shown in Table 2.1. The paper describes the "Urdhva Tiryakbhyam" algorithm of ancient Indian Vedic Mathematics, which is used for multiplication to improve the speed and area of multipliers. Vedic Mathematics suggests another formula for multiplying large numbers, the "Nikhilam Sutra", which can increase the speed of the multiplier by reducing the number of iterations. The delay of the Nikhilam Sutra Vedic multiplier is compared with that of conventional multipliers in Table 2.2. Chitralekha Mehera (2012) explained the use of the Sutra "Nikhilam Navatascaramam Dasatah" of Vedic Mathematics for computing 10's complements, multiplication tables, addition, subtraction, multiplication and division, with various examples.
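The Nikhilam rule for 10's complements ("all from 9 and the last from 10"), and its use for multiplying numbers near a power of ten, can be sketched in Python as follows. This is a minimal illustration assuming positive integers and a base of 10**digits; the function names are hypothetical and the example is not taken from the cited papers. For 97 × 96 with base 100, the deviations are −3 and −4, the left part is 97 − 4 = 93 and the right part is (−3)(−4) = 12, giving 9312.

```python
def tens_complement(n, digits):
    """10's complement of n (0 <= n < 10**digits) by "all from 9 and the last from 10".

    The last non-zero digit is subtracted from 10, every digit before it
    from 9, and trailing zeros are left unchanged.
    """
    ds = [int(d) for d in str(n).zfill(digits)]
    nonzero = [i for i, d in enumerate(ds) if d]
    if not nonzero:
        return 0
    last = nonzero[-1]
    out = [9 - d for d in ds[:last]] + [10 - ds[last]] + ds[last + 1:]
    return int("".join(str(d) for d in out))


def nikhilam_multiply(a, b, digits):
    """Multiply positive integers a and b using the base 10**digits (Nikhilam method)."""
    base = 10 ** digits
    # deviation from the base: minus the 10's complement below it, the excess above it
    dev_a = -tens_complement(a, digits) if a < base else a - base
    dev_b = -tens_complement(b, digits) if b < base else b - base
    left = a + dev_b        # cross-subtraction; equal to b + dev_a
    right = dev_a * dev_b   # product of the small deviations
    return left * base + right
```

When the deviation product exceeds the base (or is negative, for one number above and one below the base), the manual method carries or borrows into the left part; the final integer expression does this implicitly.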
Table 2.1 Comparison of conventional binary multipliers
Table 2.2 Delay comparison of Nikhilam Sutra with conventional multipliers
Name of the multiplier             Delay (ns), 8 × 8 bits    Delay (ns), 16 × 16 bits
Array multiplier                   47                        92
Booth multiplier                   117                       232
Nikhilam Sutra Vedic multiplier    27                        39
Jayashree Taralabenchi et al. (2012) implemented a prototype binary multiplier using the Booth algorithm (for signed numbers) and the systolic array multiplication algorithm (for unsigned numbers). Jasbir Kaur & Kavitha (2013) presented the design and implementation of a Wallace tree multiplier. The speed of the Wallace tree multiplier is enhanced by using compressor techniques, and its complexity is minimized by reducing the number of half adders and full adders used.
Energy recovery is proving to be a promising approach for low-power circuit design. Its advantage results from its inherent nature of drawing a constant current from the power clock and of keeping the voltage between the source and drain of the FETs at a minimum at any instant. Belgudri Ritesh Appasaheb & Kanchana Bhaaskaran (2013) implemented an efficient multiplier based on ancient computational techniques using charge recovery logic. The circuit is compared against existing Vedic multiplier circuits designed in conventional CMOS logic to validate its performance. A 4 × 4 Vedic multiplier using the 2N-2P charge recovery logic structure is implemented using industry-standard SPICE tools. Manchal Ahuja (2013) presented multiplier designs using different bypassing techniques and compared them with different adders. Vishal Singla (2013) explained the divide and conquer method, which is used in the design of a floating point multiplier.
Radheshyam Gupta et al. (2014) designed a high-speed, low-power digital multiplier using the Vedic multiplication algorithm with a very efficient leakage control technique called Multiple-channel CMOS (McCMOS). The 16-bit Vedic multiplier was designed in McCMOS technology, and the 65 nm and 45 nm technology nodes were used for comparative analysis.
Mogre & Bhalke (2015) presented an unsigned 2 × 2 high-speed 16-bit matrix multiplier on a Virtex 5 FPGA. Hierarchical structuring is used to optimize the multipliers using the "Urdhva Tiryakbhyam" sutra (vertically and crosswise), which is one of the sutras of Vedic mathematics. Pratyusha et al. (2016) compared three fast multipliers, the Dadda, Wallace and Booth multipliers, and implemented them using Cadence. Yogita Bansal & Charu Madhu (2016) designed a novel Vedic multiplier architecture with the "Urdhva Tiryakbhyam" methodology for a 16-bit multiplier and multiplicand using compressor adders; the results are given in Table 2.3. The equations for each bit of the 32-bit result are calculated separately, and compressor adders are used to implement these equations. Compressors are chosen because they reduce the vertical critical-path delay compared with conventional compressor architectures built only from half and full adders, which makes the multiplier faster.
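The vertically-and-crosswise rule behind these designs can be modelled at the bit level: result column k collects every product x_i · y_(k−i), and the columns are then resolved with carries. The Python sketch below is a behavioural illustration for unsigned operands under that assumption; it is not the compressor-based circuit of the cited work, which implements equations of this kind in hardware.

```python
def urdhva_multiply(x, y, n):
    """Unsigned n x n multiply following the Urdhva Tiryakbhyam pattern."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    result, carry = 0, 0
    for k in range(2 * n - 1):
        # vertical and crosswise bit products that share the weight 2**k;
        # all columns are independent, so hardware generates them in parallel
        lo, hi = max(0, k - n + 1), min(k, n - 1)
        column = sum(xb[i] & yb[k - i] for i in range(lo, hi + 1))
        total = column + carry
        result |= (total & 1) << k  # one result bit per column
        carry = total >> 1          # may be several bits wide; goes to the next column
    return result + (carry << (2 * n - 1))
```

The column sums grow to n bit products in the middle column, which is why wide designs add them with compressors rather than chains of full adders.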
Table 2.3 Comparison of combinational delay for 16-bit Vedic multipliers
16-bit multiplier          Combinational delay (ns)    Percentage improvement (%)
Array multiplier           43.946                      27
Wallace tree multiplier    46.046                      30.5
Booth multiplier           37.041                      13.6
Yogita Bansal              32                          -
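The percentage column of Table 2.3 is consistent with the delay reduction of the Bansal design relative to each reference multiplier, 100 · (t_ref − t_proposed) / t_ref. The short Python check below recomputes it from the tabulated delays; this reading of the column is inferred from the numbers rather than stated in the source.

```python
def percentage_improvement(reference_delay_ns, proposed_delay_ns):
    """Delay reduction of the proposed design relative to a reference, in percent."""
    return 100.0 * (reference_delay_ns - proposed_delay_ns) / reference_delay_ns


delays_ns = {
    "Array multiplier": 43.946,
    "Wallace tree multiplier": 46.046,
    "Booth multiplier": 37.041,
}
proposed_ns = 32.0  # Yogita Bansal design, Table 2.3

improvements = {
    name: round(percentage_improvement(d, proposed_ns), 1)
    for name, d in delays_ns.items()
}
```

The array multiplier entry evaluates to 27.2 %, which the table rounds to 27.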
2.2.2 Implementation Level
Niichi Itoh et al. (2001) presented an efficient layout method for a high-speed multiplier. The Wallace-tree method is generally used for high-speed multipliers. In the conventional Wallace tree, however, every partial product is added in a single direction, from top to bottom. The number of adders therefore increases as the addition proceeds through the stages, which creates dead area when the multiplier is laid out as a rectangle. To solve this problem, the authors proposed a rectangular Wallace-tree construction method in which the partial products are divided into two groups that are added in opposite directions: the first group is added downward and the second group upward. This eliminates the dead area. In addition, the carry propagation between the two groups is optimized to achieve high speed and a simple layout.
Virendra Magar (2013) presented a high-speed, low-combinational-delay multiplier, the fundamental block of a MAC unit, based on ancient Vedic mathematics. It enables parallel generation of partial products and eliminates unwanted multiplication steps; the architecture generates all partial products and their sums in one step. The proposed algorithm is modeled in VHDL (Very High Speed Integrated Circuit Hardware Description Language), and the propagation delay of the proposed architecture is found to be quite low. The design runs on a Xilinx Spartan 3 family FPGA development board. A Xilinx ChipScope VIO core supplies user-selected random inputs at run time, on which the proposed Vedic multiplication is performed, and the Xilinx ChipScope tool is used to observe the results inside the FPGA while the logic is running.
Premanandha et al. (2013) proposed an 8-bit multiplier using Vedic Mathematics (the Urdhva Tiryagbhyam sutra) for generating the partial products. The partial product addition in the Vedic multiplier is realized using the carry-skip technique, and the 8-bit multiplier is built from 4-bit multipliers and modified ripple carry adders. In the proposed design, the number of logic levels is reduced, thus reducing the logic delay. Pavan Kumat et al. (2013) described the implementation of an 8-bit Vedic multiplier with improved propagation delay compared with conventional multipliers. An 8-bit barrel shifter that requires only one clock cycle for 'n' shifts is used. The design is implemented and verified using an FPGA and the ISE Simulator; the core was implemented on a Xilinx Spartan-6 family FPGA.
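Building an 8-bit multiplier from 4-bit blocks follows the identity x · y = (xh · yh) · 2^8 + (xh · yl + xl · yh) · 2^4 + xl · yl for x = xh · 2^4 + xl and y = yh · 2^4 + yl. The recursive Python sketch below applies this split down to a 4 × 4 base block, which is modelled behaviourally with `*` as a hypothetical stand-in; it illustrates the hierarchical (divide and conquer) structure and is not the cited authors' design.

```python
def mul4(a, b):
    """Hypothetical stand-in for a 4 x 4 Vedic (Urdhva Tiryakbhyam) block."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def vedic_mul(x, y, n):
    """Unsigned n x n multiply built hierarchically from 4 x 4 blocks (n = 4, 8, 16, ...)."""
    assert n >= 4 and (n & (n - 1)) == 0, "n must be a power of two >= 4"
    if n == 4:
        return mul4(x, y)
    h = n // 2
    mask = (1 << h) - 1
    xh, xl = x >> h, x & mask
    yh, yl = y >> h, y & mask
    # vertical products (xh*yh, xl*yl) and crosswise products (xh*yl, xl*yh)
    high = vedic_mul(xh, yh, h)
    cross = vedic_mul(xh, yl, h) + vedic_mul(xl, yh, h)
    low = vedic_mul(xl, yl, h)
    # in hardware, these additions use carry-skip or modified ripple-carry adders
    return (high << (2 * h)) + (cross << h) + low
```

An 8 × 8 multiply uses four 4 × 4 blocks, a 16 × 16 multiply uses sixteen, and the adders that combine the block outputs set most of the remaining delay.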
Kayal et al. (2013) presented a high-speed, low-power digital multiplier that takes advantage of Vedic multiplication algorithms together with a very efficient leakage control technique, Multiple-channel CMOS (McCMOS). An 8-bit Vedic multiplier was designed in McCMOS technology at the 130 nm, 90 nm, 65 nm and 45 nm nodes, and comparative simulation results indicating the performance of the circuit are also presented. Khaldoon & Abhulmughni (2014) presented a new, efficient reduction scheme for implementing tree multipliers on field programmable gate arrays (FPGAs) in a way that better suits the lookup table (LUT) structure of FPGAs. The scheme is based on a library of m:n counters, and its aim is to minimize the number of reduction steps by maximizing the reduction ratio, which in turn reduces area and delay.
Panwit Tuwanuti & Nopphagaw Thongbai (2014) described a parallel mathematical processing scheme based on Vedic multiplier techniques. The Urdhva Tiryakbhyam Sutra and the Nikhilam Sutra are implemented on a multi-core processor with MPICH2 (an implementation of MPI). The Urdhva Tiryakbhyam Sutra is used to split long numbers into sub-blocks that are distributed to other cores, and the Nikhilam Sutra is used to reduce the size of the values so that some of the computations are performed more efficiently.
Jagannatha et al. (2014) demonstrated the efficiency of the Urdhva Tiryakbhyam Vedic method of multiplication, which differs in the actual process of multiplication itself and enables parallel generation of the intermediate products. The multiplier is realized on a Field Programmable Gate Array (FPGA) and then as a standard cell-based ASIC design.
Sowmiya et al. (2013) designed a branch-based pass transistor logic (ULPFA) full adder and a high-performance, low-power full adder, and compared them with existing logic styles such as CPL and DPL. The adders are implemented in an 8 × 8 Vedic multiplier, and parameters such as area, transistor count and power dissipation are compared using the Tanner EDA tool. Poornima et al. (2013) presented a high-speed 8 × 8-bit Vedic multiplier architecture that is quite different from conventional multiplication methods such as add-and-shift. The Urdhva Tiryakbhyam Sutra for 8 × 8-bit multiplication was coded in Verilog HDL and implemented on a Spartan 3 kit using the Xilinx Synthesis Tool, with the output displayed on the LEDs of the kit. Pohokar et al. (2015) implemented the basic building block, a 16 × 16 Vedic multiplier based on the Urdhva Tiryagbhyam Sutra, on an FPGA with minimum propagation delay.
Jinesh et al. (2015) proposed a new architecture for high-end processors that gives better performance than existing architectures. The design is coded in Verilog HDL and synthesized using Xilinx ISE 14.7 and the Cadence Encounter RTL Compiler, and the digital system is analyzed using the Cadence Encounter tool.
Supriya Srimani et al. (2015) designed multiplier circuits for DSP operations based on the sutras of ancient Vedic mathematics. Multipliers based on the Vedic multiplication sutra "Urdhva Tiryakbhyam" are designed, and the 4 × 4 design is drawn in DSCH2. The layouts of these circuits are generated with Microwind, and the noise power is calculated with T-Spice 13 in 45 nm technology. The algorithm is also implemented in MATLAB and compared with the built-in MATLAB functions.
Zain Shabbir (2015) designed a low-power, high-speed 4 × 4 multiplier using the Dadda algorithm. The full adder blocks in this multiplier are designed using reduced-split precharge data-driven dynamic sum logic. The flip-flops in the pipeline registers are designed to increase the input signal noise margin, which minimizes output signal glitches. The multiplier circuit is implemented in a 1P-9M low-k UMC 90 nm CMOS process, and post-layout simulations are carried out using Cadence Virtuoso.
Josmin Thomas et al. (2015) compared different adders, namely the carry look-ahead adder (CLA), carry select adder (CSLA), Ladner-Fischer adder (LFA), Brent-Kung adder (BKA) and Kogge-Stone adder (KSA), as well as compressors, in a Vedic multiplier. The number of adders can be minimized by using special adders called compressors, which add more bits at a time. The paper describes the Urdhva Tiryakbhyam algorithm of Vedic Mathematics, which is used for multiplication to improve the speed and area of multipliers. Since the power consumption of a Vedic multiplier depends on the type of adder used, a comparison previously carried out with the Cadence RTL Compiler is used for the comparative study.
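Of the adders listed, the Kogge-Stone adder is a representative parallel-prefix design. The Python sketch below models its carry network behaviourally; it assumes unsigned operands and a zero carry-in, and the function name is hypothetical.

```python
def kogge_stone_add(a, b, width):
    """Add two width-bit unsigned integers with a Kogge-Stone prefix carry network.

    g_i = a_i & b_i (generate), p_i = a_i ^ b_i (propagate). Each level
    combines (G, P) pairs at distance 1, 2, 4, ..., so all carries are
    ready after ceil(log2(width)) levels. Returns (sum, carry_out).
    """
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    p = [x ^ y for x, y in zip(a_bits, b_bits)]
    G = [x & y for x, y in zip(a_bits, b_bits)]
    P = p[:]
    dist = 1
    while dist < width:
        # every position combines with the one `dist` places lower, in parallel
        G_next, P_next = G[:], P[:]
        for i in range(dist, width):
            G_next[i] = G[i] | (P[i] & G[i - dist])
            P_next[i] = P[i] & P[i - dist]
        G, P = G_next, P_next
        dist *= 2
    # G[i] is now the carry out of bit i, i.e. the carry into bit i + 1
    carries = [0] + G[:-1]
    s = sum((p[i] ^ carries[i]) << i for i in range(width))
    return s, G[width - 1]
```

For 32-bit operands the loop runs five levels (distances 1, 2, 4, 8 and 16), which is the logarithmic depth that makes the adder fast at the cost of extra wiring.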
2.2.3 Application
Nowadays, Vedic multipliers are increasingly applied in different fields. Ashwanth & Premanandha (2013) proposed an Urdhva Tiryakbhyam Q-format multiplier using the Verilog hardware description language with structural coding. The Q15 and Q31 format multipliers are built from 8 × 8 and 16 × 16 Urdhva Tiryakbhyam integer multipliers, which in turn are made up of 4 × 4 multiplier blocks. Amina Naaz et al. (2014) applied a Vedic multiplier using a carry select adder (CSLA) in an FIR filter architecture.
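A Q15 multiplication reduces to an integer multiplication followed by a scaling step, which is why an integer Urdhva Tiryakbhyam core can serve as its building block. The Python sketch below is a generic illustration, not the cited Verilog design; it assumes signed 16-bit operands, round-to-nearest and saturation, and models the integer multiplier with `*` (a Vedic core is typically unsigned, so a real design would also need sign handling).

```python
Q15_ONE = 1 << 15  # the value 1.0 in Q15 (not itself representable)


def to_q15(x):
    """Quantize a real number in [-1, 1) to a signed 16-bit Q15 integer, saturating."""
    return max(-Q15_ONE, min(Q15_ONE - 1, int(round(x * Q15_ONE))))


def q15_mul(a, b):
    """Multiply two Q15 values: Q15 x Q15 gives Q30, which is rounded back to Q15."""
    prod = a * b                        # integer product from the core multiplier (Q30)
    prod = (prod + (1 << 14)) >> 15     # round to nearest and rescale to Q15
    return max(-Q15_ONE, min(Q15_ONE - 1, prod))  # only (-1) * (-1) overflows
```

For example, q15_mul(to_q15(0.5), to_q15(0.5)) returns 8192, the Q15 code for 0.25. A Q31 multiplier follows the same pattern with a 32 × 32 core and a shift of 31.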
Nithu Mangalath et al. (2014) designed an efficient universal multi-mode floating point multiplier using Vedic Mathematics. In the proposed technique, single, double and quadruple precision are supported within a single architecture, and the multiplier is designed using Vedic mathematics. Laxman et al. (2014) presented the design of an efficient complex number multiplier using the Vedic sutra "Urdhva Tiryakbhyam" from ancient Indian Vedic mathematics. Convolution is a mathematical way of combining two signals to form a third signal, and multipliers are the most important component in convolving two signals.
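The connection between convolution and Urdhva Tiryakbhyam multiplication can be made concrete: the column sums of the vertically-and-crosswise pattern are exactly the linear convolution of the two digit sequences. The Python sketch below is an illustrative model, not taken from the cited work; the function name is hypothetical.

```python
def convolve(x, h):
    """Linear convolution y[k] = sum over i of x[i] * h[k - i].

    Output k pairs every x[i] with h[k - i]: the same index pattern as the
    crosswise column sums of Urdhva Tiryakbhyam multiplication.
    """
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj  # one multiplication per (i, j) pair
    return y
```

For example, convolving the digit lists [3, 2, 1] and [6, 5, 4] (123 and 456, least significant digit first) gives the column sums [18, 27, 28, 13, 4]; resolving the carries yields 56088 = 123 × 456.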
Saji et al. (2015) proposed a high-speed Vedic multiplier using a multiplexer-based adder. The design is simulated using ModelSim and synthesized using Xilinx ISE 14.7, and it shows a significant improvement in speed over existing Vedic multipliers. Richa Sharma et al. (2015) presented high-speed multiplier and squaring architectures based on ancient Indian Vedic mathematics sutras. In existing Vedic multiplier architectures, the partial product terms are computed in parallel and then added at the end to obtain the final result. In this design, all the partial products are aligned using concatenation and added using a single carry-save adder instead of two adders at different stages. The high-speed Vedic multiplier architecture is then used in the squaring modules.
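One possible reading of the concatenation idea is sketched below for a 2 × 2 split of the operands: the vertical products xh · yh and xl · yl occupy disjoint bit ranges, so they can be joined by concatenation, and the two crosswise products are merged with them by a single carry-save adder before one final addition. This is an assumption-based Python illustration with hypothetical names; the half-size products are modelled with `*`, and the cited architecture may arrange its terms differently.

```python
def carry_save_add(a, b, c):
    """3:2 carry-save adder on whole words: returns (s, carry) with s + carry == a + b + c."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def multiply_concat_csa(x, y, n):
    """Unsigned n x n multiply (n even) from four half-size products."""
    h = n // 2
    mask = (1 << h) - 1
    xh, xl, yh, yl = x >> h, x & mask, y >> h, y & mask
    # xl*yl < 2**(2h), so it never overlaps xh*yh shifted by 2h: concatenate
    vertical = ((xh * yh) << (2 * h)) | (xl * yl)
    # the two crosswise products share the weight 2**h
    s, c = carry_save_add(vertical, (xh * yl) << h, (xl * yh) << h)
    return s + c  # single final carry-propagate addition
```

The concatenation costs no logic at all, so three operands reach the carry-save adder instead of four, and only one carry-propagate adder remains on the critical path.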
Balwir et al. (2015) designed a multiplier using the Peasant algorithm that is faster and consumes less area and power than other multipliers. Pratyusha Chowdari & Beatrice Seventline (2016) presented an architectural approach to designing an adaptive filter (AF) with a Vedic multiplier (VM), called the Low Power Adaptive Filter with Vedic Multiplier (LPAFVM), which achieves lower power consumption without altering the filter performance. The least mean square (LMS) algorithm is used to design the FIR filter: in the adaptation process, the output computed by the filter is made to converge to the desired output.
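The LMS adaptation mentioned above can be sketched as a system-identification loop: the filter output is compared with the desired output, and each weight is nudged by mu * e * x. This Python model uses a hypothetical 4-tap reference system, noiseless data and a step size of 0.05, all of which are illustrative assumptions rather than values from the cited paper; every product in the loop is a multiplication that a Vedic multiplier would perform in hardware.

```python
import random


def lms_identify(x, d, taps, mu):
    """Adapt FIR weights w so that y[n] = sum_k w[k] * x[n - k] tracks d[n] (LMS)."""
    w = [0.0] * taps
    history = [0.0] * taps  # x[n], x[n-1], ..., x[n-taps+1]
    for xn, dn in zip(x, d):
        history = [xn] + history[:-1]
        y = sum(wk * hk for wk, hk in zip(w, history))  # filter output: taps multiplies
        e = dn - y                                      # error against the desired output
        w = [wk + mu * e * hk for wk, hk in zip(w, history)]  # weight update
    return w


# Usage: identify a hypothetical 4-tap FIR system from noiseless data
rng = random.Random(0)
true_w = [0.5, -0.3, 0.2, 0.1]
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
d = [sum(true_w[k] * x[n - k] for k in range(4) if n - k >= 0) for n in range(len(x))]
w_est = lms_identify(x, d, taps=4, mu=0.05)
```

Each iteration needs 2 × taps multiplications, which is why the multiplier dominates the power of an adaptive filter and why a low-power multiplier helps.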
These papers show that various multipliers have been designed, simulated using hardware description languages and implemented in hardware, and some papers review multiplication algorithms. Among the conventional multipliers, the Wallace tree multiplier produces the output with minimum delay, so modified adder tree structures using compressors have been designed and tested. The Wallace tree multiplier with Booth encoding is the fastest of all the binary multipliers, but it does not have a linear structure.
Urdhva Tiryakbhyam is the Vedic multiplication method most commonly used because it is suitable for any input, and it is widely used in digital signal processing applications. Very few designs, however, are based on the Nikhilam sutra, because this sutra is efficient only for a limited range of inputs; as a result, multipliers based on the Nikhilam sutra are not used in practice. Vedic multipliers are normally used to reduce computational complexity. Earlier, they were not implemented in practice, but research is now focused on implementing different Vedic sutras, and some works compare conventional binary multipliers with Vedic multipliers. The critical path delay of the Urdhva multiplier increases with the number of bits, so in this respect the method behaves like an array multiplier. In the Nikhilam sutra, the complements of the numbers are multiplied. If a number is close to the base value, its complement is small and fewer bits are needed; if it is not close, the complement has a larger magnitude and the number of bits cannot be reduced.
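The argument about the size of the complements can be checked with a small binary model. The sketch below assumes n-bit unsigned operands and the base 2**n, so each complement is the n-bit 2's complement of the operand; the function name is hypothetical and this is not the algorithm proposed in this work.

```python
def nikhilam_binary(a, b, n):
    """Multiply n-bit unsigned a and b using the base 2**n (Nikhilam).

    With complements ca = 2**n - a and cb = 2**n - b:
        a * b = (a - cb) * 2**n + ca * cb
    Only ca * cb needs a real multiplier; the rest is a subtraction and a
    shift. Returns (product, width of the complement multiplier needed).
    """
    base = 1 << n
    ca, cb = base - a, base - b
    width = max(ca.bit_length(), cb.bit_length())
    return ((a - cb) << n) + ca * cb, width
```

For 250 × 245 in 8 bits, the complements 6 and 11 fit in 4 bits, whereas for 20 × 30 they are 236 and 226 and a full 8-bit multiplier is still needed, which is the limitation described above.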
The Nikhilam Sutra is not used in practice because of its limitations on the range of inputs: the algorithm is efficient only when the numbers are close to the base value. The remainders are calculated by finding the 2's complements of the numbers, and a multiplier is required to multiply the 2's complements of the multiplicand and the multiplier, which again requires an N-bit multiplier. In this work, however, a new multiplication algorithm based on the Nikhilam Sutra and the Karatsuba algorithm is designed. Using the Nikhilam sutra, the remainder value is fixed to N-2 bits, so only an (N-2)-bit multiplication is required. Using the Karatsuba algorithm, the multiplier structure is designed by reducing the number of multiplications at the cost of additions and shift operations. Combining the two algorithms reduces the computation time. No squaring algorithm based on the Nikhilam sutra has been derived so far; in this work, a squaring module using the Nikhilam sutra is designed and implemented in an FPGA. In the next chapter, the multiplier based on the Nikhilam sutra and the Karatsuba algorithm is designed.
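For reference, the standard Karatsuba recursion, which trades one of the four half-size multiplications for extra additions and shifts, can be sketched as below. It is a generic Python illustration under the assumption of unsigned operands and a small base multiplier (modelled with `*`); the way this work combines Karatsuba with the Nikhilam sutra is developed in the next chapter and is not reproduced here.

```python
def karatsuba(x, y, n, base_bits=4):
    """Multiply n-bit unsigned x and y with Karatsuba's algorithm.

    Splitting x = xh*2**h + xl and y = yh*2**h + yl, the middle term uses
        xh*yl + xl*yh = (xh + xl)*(yh + yl) - xh*yh - xl*yl,
    so three half-size products replace four.
    """
    assert base_bits >= 3, "smaller bases can recurse forever on 3-bit operands"
    if n <= base_bits:
        return x * y  # stand-in for a small base multiplier
    h = n // 2
    mask = (1 << h) - 1
    xh, xl = x >> h, x & mask
    yh, yl = y >> h, y & mask
    high = karatsuba(xh, yh, n - h, base_bits)
    low = karatsuba(xl, yl, h, base_bits)
    # (xh + xl) and (yh + yl) can carry into one extra bit
    mid = karatsuba(xh + xl, yh + yl, max(n - h, h) + 1, base_bits) - high - low
    return (high << (2 * h)) + (mid << h) + low
```

The saving compounds with recursion depth: each level replaces four sub-multiplications with three, at the price of a few extra additions, subtractions and shifts.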
