ASIC Design of A High Speed Low Power Circuit For Factorial Calculation Using Ancient Vedic Mathematics

Microelectronics Journal 42 (2011) 1343–1352
P. Saha^a, A. Banerjee^b, A. Dandapat^c, P. Bhattacharyya^d,*
a School of VLSI Technology, Bengal Engineering and Science University, Shibpur, Howrah 711103, West Bengal, India
b Department of Electronics and Communication Engineering, JIS College of Engineering, Kalyani 741235, India
c Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata 700032, India
d Department of Electronics and Telecommunication Engineering, Bengal Engineering and Science University, Shibpur, Howrah 711103, West Bengal, India
Article info
Article history: Received 28 January 2011; received in revised form 2 September 2011; accepted 5 September 2011; available online 29 September 2011.

Abstract
An ASIC design of a high speed, low power circuit for calculating the factorial of a number is reported in this paper. The factorial of a number can be calculated by iterative multiplication combined with an incrementing or decrementing process, and the iterative multiplication can be computed through a parallel implementation methodology. Parallel implementation, together with the Vedic multiplication methodology, ensures a significant reduction in propagation delay and switching power consumption because the number of stages in the multiplication process is reduced, in comparison with the conventionally used Vedic multiplication methodologies, namely Urdhva-tiryakbyham (UT) and Nikhilam Navatascaramam Dasatah (NND) based implementations. Transistor level implementation was carried out using Spice Spectre with standard 90 nm CMOS technology, and the results were compared with the above mentioned conventional methodologies. The propagation delay for the calculation of the 4-bit factorial of a number was only 42.13 ns, while the power consumption was 58.82 mW for a layout area of 6 mm². The improvement in speed was found to be 33% and 24%, while the corresponding reduction in power consumption was 34.48% and 24%, for the factorial calculation circuitry in comparison with UT and NND based implementations, respectively.
© 2011 Elsevier Ltd. All rights reserved.
Keywords: Vedic multiplier; Incrementer; Zero detectors; Decrementer; Factorial design; High speed
1. Introduction
ASIC implementation of logarithmic, exponential, trigonometric and other arithmetic circuits plays a pivotal role in the field of general and special purpose computers [1,2]. Generally, such computations are implemented through software programs using Newton–Raphson iteration, Taylor–Maclaurin series, or polynomial approximations. Factorial computation circuitry is therefore of immense importance for ASIC implementation of such series (Newton–Raphson, Taylor–Maclaurin series, or polynomial approximations).
The principal components required for hardware implementation of factorial calculation circuitry are an incrementer/decrementer and a multiplier for successive multiplication. Therefore, the successive multiplication and the incrementer/decrementer limit the overall speed of the factorial implementation. A substantial amount of work has so far been reported on multipliers [3–10], such as the shift and add multiplier, tree multiplier, array multiplier, signed digit multiplier, etc., to improve the operating speed and power consumption. The common multiplication is done using shift and add operations [3], where a sequential mechanism is used and produces a large propagation delay. In parallel multipliers, the partial products are generated through Booth's encoding [4] techniques and added with the help of parallel adders; therefore the generation and addition stages limit the overall speed of the parallel multiplier [5]. To reduce the number of partial products, the modified Booth's algorithm [6] is one of the most popular mechanisms, while to achieve speed improvements the Wallace tree algorithm [7–9], which reduces the number of sequential addition stages, can be incorporated. Another solution for partial product addition was reported by Wang in 1995, where compressors [10] are used for the partial product addition stages, which reduces the carry propagation significantly. The Canonical Signed Digit (CSD) multiplier [11–13] is an alternative solution for fast multiplication, but the procedure for generating the partial products is the same and a large number of pre- and post-processing units are required for binary to CSD conversion. However, with increasing parallelism, the number of shifts between the partial products and the intermediate sums to be added will increase, which may result in a reduction of speed, an increase in silicon area due to irregularity of structure, and increased power consumption due to the increase in interconnections resulting from complex routing.

* Corresponding author. Tel.: +91 3326684561; fax: +91 3326682916.
E-mail addresses: pb_etc_besu@yahoo.com (P. Bhattacharyya), sahaprabir1@gmail.com (P. Saha), banerjee.arindam1@gmail.com (A. Banerjee), anup.dandapat@gmail.com (A. Dandapat).
0026-2692/$ - see front matter © 2011 Elsevier Ltd. All rights reserved.
doi:10.1016/j.mejo.2011.09.001

A significant amount of research work has so far been published on incrementers/decrementers with the aim of improving the speed of operation using a program counter [14] and a frequency divider [15]. The bottleneck of these methods is the incorporation of a sequential mechanism to implement the design. An incrementer/decrementer can also be implemented using an adder/subtractor [16], but its major drawback is the low operating speed due to long carry propagation from LSB to MSB [17]. In this work, with the aim of circumventing the above drawbacks, a multiple look-ahead and dynamic circuitry based incrementer/decrementer [17] has been adopted, and circuit level implementation using multiplexers [18] has been carried out for the computation of high speed factorial circuitry.
At the algorithmic and structural levels, many multiplication techniques have been developed to enhance the performance of multipliers by reducing the partial products and the subsequent addition stages. Vedic Mathematics [19] is the ancient system of Indian mathematics, which has a unique technique of calculation based on 16 Sutras (formulae). The Urdhva-tiryakbyham formula (a Sanskrit term meaning "vertically and crosswise") is used for multiplication of smaller numbers. A few research papers [20–23] have so far been published using the Urdhva-tiryakbyham formula aiming at fast multiplication. However, Mehta and Gwali [24] explored multiplication using the Urdhva-tiryakbyham sutra and pointed out its carry propagation issues.
Likewise, a multiplier design using the "all from 9 and last from 10" formula (Nikhilam Navatascaramam Dasatah sutra) was reported by Tiwari et al. [25] in 2009, but without any hardware module implementation at the circuit level. Very recently, we [26] reported a general multiplier based on the same principle of the NND sutra, but specific applications such as the parallel multiplication methodology were not explored. In this work, two main formulae of ancient Vedic mathematics, Urdhva-tiryakbyham (UT) and Nikhilam Navatascaramam Dasatah (NND), were used for the multiplication process facilitating the factorial circuitry implementation. In this approach, an (N × N) bit multiplier implementation was transformed into just one small multiplication (bit length ≪ N) and one adder/subtractor implementation. The reported multiplication methodology was compared with previously reported Vedic mathematical architectures [20,26]. It was observed that the proposed design not only produced fewer partial products but also exhibited a regular structure, thereby leading to an optimized layout area.
As the factorial of a number is the product of all the positive integers up to it, the factorial can be calculated by iterative multiplication of the given number with its decremented values down to 1, or by iterative multiplication up to the number starting from 1. On account of this recursive multiplication, the overall factorial circuitry exhibits an irregular array architecture, leading towards a large layout area and higher propagation delay. In this work, this problem was resolved by the parallel implementation methodology. The resulting multiplication procedure was applied to high speed factorial computation circuitry and the results were compared with other Vedic multiplication [20,26] based implementations. The overall factorial circuitry and all the individual circuit modules, such as the incrementer/decrementer, zero detector, multipliers of different bit lengths, etc., were simulated, and performance parameters such as propagation delay and dynamic switching power consumption were calculated through the Spice simulator in 90 nm CMOS technology. It was found that the circuitry for computing the 4-bit factorial of a number consumed 58.82 mW of power with a delay of 42.13 ns for a layout area of 6 mm².
2. Algorithm for factorial computation

The factorial of a number is the product of all the positive integers less than or equal to it. It can be calculated by iterative multiplication of the given number with its decremented values, or by iterative multiplication up to the number starting from 1. This does not include zero; factorial zero is defined as being 1. Mathematically, the factorial of the input number can be written as:
\[ n! = \begin{cases} 1, & n = 0 \\ \prod_{k=1}^{n} k, & n \ge 1 \end{cases} \tag{1} \]
2.1. Mathematical representation

Consider an n-bit number X. Then X can be represented as
\[ X = 2^k + z \tag{2} \]
where k is the exponent of X and z is the residue. The next input to the multiplier is either the incremented or the decremented value of X. Assume the next input is equal to $X \pm 1$:
\[ X \pm 1 = 2^k + z \pm 1 \tag{3} \]
The factorial of a number can then be computed either by decrementing the value down to 1 with iterative multiplication, or by incrementing a value starting from 1 up to the same number with iterative multiplication. Mathematically, the formula can be represented as
\[ \text{Result} = P_I = \begin{cases} 1, & \text{for } n = 0 \\ \prod_{k=1}^{n} k, & \text{for } n \ge 1 \end{cases} \tag{4} \]
From Eq. (4), the general expression of the product term after the nth iteration is equal to $P_n$.
Mathematically, $P_I$ can be formulated as:
\[ P_I = 2^k P_{I-1} + z P_{I-1} \pm I P_{I-1} \tag{5} \]
Here I is the number of iterations to be executed to calculate the factorial of a number. Proof of Eq. (5):
Assume $I = 1$:
\[ P_1 = X(X \pm 1) = (2^k + z)(2^k + z \pm 1) = 2^{2k} + 2^k \cdot 2z + z^2 \pm (2^k + z) \tag{6} \]
\[ P_1 = 2^k (X + z) + z \cdot z \pm X \tag{7} \]
Assume $I = 2$. The operand is again either incremented or decremented by one, so it is replaced by its new value:
\[ P_2 = P_1 (X \pm 2) \tag{8} \]
\[ P_2 = 2^k P_1 + z P_1 \pm 2 P_1 \tag{9} \]
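The recurrence of Eq. (5) can be checked numerically. The following is a minimal Python sketch, not the authors' hardware: the function names `split_base` and `product_chain` are hypothetical, the base is assumed to be the largest power of two not exceeding X, and the ± sign is passed as `sign` (+1 for incrementing, -1 for decrementing).

```python
def split_base(x: int) -> tuple[int, int]:
    """Write x (x >= 1) as 2**k + z, with 2**k the largest power of two <= x."""
    k = x.bit_length() - 1
    return k, x - (1 << k)


def product_chain(x: int, steps: int, sign: int = -1) -> int:
    """Iterate P_I = 2^k * P_{I-1} + z * P_{I-1} + sign * I * P_{I-1}, with P_0 = x.

    Each step multiplies the running product by (x + sign * I), written in the
    shift-and-add form of Eq. (5).
    """
    k, z = split_base(x)
    p = x
    for i in range(1, steps + 1):
        p = (p << k) + z * p + sign * i * p
    return p
```

With `sign=-1` and `steps = x - 1` the chain multiplies x by x-1, x-2, ..., 1, so `product_chain(5, 4, -1)` evaluates 5!; with `sign=+1` and `x = 1` it builds the same product upwards from 1.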
The flow chart of the algorithm for computing the factorial of a number is shown in Fig. 1. In this diagram the result is initialized to 1, because factorial 0 is defined as 1. First the input number is checked by a zero detector circuit. If the input number is equal to 0, the result is directly displayed as 1 (without entering the actual factorial calculation circuit, which leads to less power consumption); otherwise the input number is fed to the inner loop. Basically, there are two ways of computing the factorial of a number: (i) decrementing from that number with recursive multiplication; and (ii) incrementing up to that number starting from 1 with recursive multiplication. An Up-Down signal has been introduced to support both ways of computation. If the Up-Down signal is low, the circuit follows the decrement and iterative multiplication process; otherwise it follows the increment and iterative multiplication process. The iterative counter (i) is initialized to 1 for the incrementing process and to the input number for the decrementing process. A comparator circuit has been introduced to check the iterative counter: whether it is equal to the input number for the incrementing process, or whether it is greater than 0 for the decrementing process.
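The control flow just described can be summarised in software form. This is a behavioural sketch in Python under stated assumptions, not a description of the circuit: `factorial_flow` and the Boolean parameter `up_down` are hypothetical names, and the zero detector and comparator are modelled as plain comparisons.

```python
def factorial_flow(num: int, up_down: bool) -> int:
    """Behavioural model of the flow chart: up_down=True increments, False decrements."""
    result = 1                  # result register is initialised to 1 (0! = 1)
    if num == 0:                # zero detector: bypass the multiplication loop
        return result
    if up_down:
        i = 1                   # counter starts at 1 for the incrementing process
        while True:
            result *= i
            if i == num:        # comparator: counter has reached the input number
                break
            i += 1
    else:
        i = num                 # counter starts at the input for the decrementing process
        while i > 0:            # comparator: counter is still greater than 0
            result *= i
            i -= 1
    return result
```

Both settings of `up_down` return the same value; only the order in which the factors are multiplied differs.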
To compute the factorial of a number, the hardware consists of four parts: (i) zero detector, (ii) decrementer/incrementer, (iii) comparator, and (iv) multiplier. The main function of the zero detector is to check whether the input value is zero or not. If the input number is zero, the output of the zero detector is promoted to the final result, i.e. equal to 1; otherwise the input passes through the recursive multiplication procedure. The iterative multiplication can be implemented in two ways, by a decrementing process or by an incrementing process. Assume the input is a 4-bit number. The first time, the input is either incremented or decremented, and the input number and its incremented/decremented value are multiplied to produce an 8-bit output. The incremented or decremented result is then incremented/decremented a second time and multiplied with the 8-bit output produced by the multiplier block. Thus an 8 × 4 bit multiplier is required, which produces a 12-bit output. The procedure continues until the last iteration.
2.2. Drawback of existing algorithm
The bottleneck of the existing algorithm is that, due to the recursive multiplication, the length of the multiplier increases with every iteration, leading to excessive hardware complexity, which in turn results in excessive delay and large power consumption. Moreover, only one multiplication can take place at a time, so the overall propagation delay increases further. To solve these problems, a parallel implementation methodology has been adopted in the proposed circuit implementation, which reduces the number of multiplication stages.
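The two effects named above, growing operand width and serial stage count, can be illustrated with a short Python sketch. The helper names `sequential_widths` and `stage_counts` are hypothetical, and the width model simply assumes that each multiplication of an accumulated product by an n-bit operand adds n bits to the output, as in the 8-bit and 12-bit example of Section 2.1.

```python
def sequential_widths(n_bits: int, iterations: int) -> list[int]:
    """Output width after each sequential multiplication by an n_bits-wide operand."""
    widths = []
    acc = n_bits
    for _ in range(iterations):
        acc += n_bits            # (acc-bit) x (n_bits-bit) product
        widths.append(acc)
    return widths


def stage_counts(operands: int) -> tuple[int, int]:
    """Return (sequential stages, pairwise-tree stages) for multiplying `operands` values."""
    sequential = operands - 1    # one multiplication after another
    tree = 0
    remaining = operands
    while remaining > 1:         # each tree stage halves the number of values
        remaining = (remaining + 1) // 2
        tree += 1
    return sequential, tree
```

For a 4-bit input, `sequential_widths(4, 3)` gives the widths 8, 12 and 16 bits, and `stage_counts(16)` contrasts 15 serial multiplications with 4 pairwise stages.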
Fig. 1. Flow chart representation of the algorithm for calculating the factorial of an integer number.

2.3. Modified factorial calculation algorithm
In this section, the factorial calculation algorithm is computed in a parallel manner, leading towards high speed operation. The pseudo-code for the modified algorithm is given below, where the input number is denoted Num. Arr1[i] is used for storing the initial values, and Arr2[j], Arr3[k], Arr4[l] and Arr5[m] are used for storing the multiplication values of the second and higher stages, respectively.
Fact(Num)
    for each i from 0 to Num-1
        Arr1[i] = i + 1
    end for
    for each i from Num to 15          // unused registers hold 1
        Arr1[i] = 1
    end for
    for each j from 0 to 7             // stage 1: 8 products
        Arr2[j] = Arr1[2*j] * Arr1[2*j+1]
    end for
    for each k from 0 to 3             // stage 2: 4 products
        Arr3[k] = Arr2[2*k] * Arr2[2*k+1]
    end for
    for each l from 0 to 1             // stage 3: 2 products
        Arr4[l] = Arr3[2*l] * Arr3[2*l+1]
    end for
    for each m from 0 to 0             // stage 4: final product
        Arr5[m] = Arr4[2*m] * Arr4[2*m+1]
    end for
    return Arr5[0]

Fig. 2. Multiplication methodology for modified factorial computation.
Fig. 2 shows the schematic representation of the above mentioned hardware implementation methodology, where all the input registers are initially filled with 0001. The parallel implementation methodology has been adopted for computation; as a result, the 4-bit factorial of a number can be calculated within only 4 stages, so a significant reduction in propagation delay takes place. Vedic multiplication methodologies have been used for the multiplications.
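The pairwise reduction of the pseudo-code can be written as a short runnable sketch. This Python version is an illustration only: the function name `fact_parallel` and the `slots` parameter (16 registers for a 4-bit input) are assumptions, and ordinary integer multiplication stands in for the Vedic multiplier blocks.

```python
def fact_parallel(num: int, slots: int = 16) -> tuple[int, int]:
    """Return (num!, number of multiplication stages) using a pairwise product tree."""
    if not 0 <= num <= slots:
        raise ValueError("num must fit in the available registers")
    # Registers 0..num-1 hold 1..num; the remaining registers hold 1.
    level = [i + 1 if i < num else 1 for i in range(slots)]
    stages = 0
    while len(level) > 1:
        # All products of one stage are independent and could run in parallel.
        level = [level[2 * j] * level[2 * j + 1] for j in range(len(level) // 2)]
        stages += 1
    return level[0], stages
```

Because unused registers hold 1, an input of 0 passes through the same four stages and yields 1 without any special case in the tree itself.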
3. Vedic mathematics for multiplication

The potential of Vedic Mathematics, especially for calculations involving multiplication, was reported by Sri Bharati Krsna Thirthaji Maharaja in the form of Vedic Sutras (formulae) [19]. He explored the mathematical potential of the Vedic primers and showed that mathematical operations can be carried out mentally to produce fast answers using these Sutras. In this paper, only the NND and UT formulae are used.
3.1. Nikhilam Navatascaramam Dasatah (NND) sutra
NND means "all from 9 and last from 10". The same formula is applicable to the implementation of a multiplier. Using this methodology, an (N × N) multiplier is transformed into an addition/subtraction and a small (≪ N) multiplication, thereby reducing carry propagation and leading towards high speed operation. A simple example will suffice to clarify the operations.
As shown in Fig. 3, the multiplier and the multiplicand are written in two rows, followed by the differences of each of them from the chosen base, such that there now exist two columns of numbers, one consisting of the numbers to be multiplied (Column 1) and the other consisting of their complements (Column 2). The product also consists of two parts, which are divided by a vertical line for the purpose of illustration. The right hand side (RHS) of the product is obtained by simply multiplying the numbers of Column 2 (2 × 3 = 6). The left hand side (LHS) of the product is found by cross subtraction of the second number of Column 2 from the first number of Column 1 or vice versa, i.e., 998 − 003 = 995 or 997 − 002 = 995. The final result is obtained by concatenating the LHS and the RHS (Answer: 995006). The mathematical description of this sutra can be formulated as follows.
Assume A and B are two numbers to be multiplied and their product is equal to P. Mathematically, A and B can be expressed as
\[ A = \sum_{i=0}^{n-1} A_i 10^i \quad \text{and} \quad B = \sum_{i=0}^{n-1} B_i 10^i, \quad \text{where } A_i, B_i \in \{0, 1, \ldots, 9\} \tag{10} \]
The multiplication rule can be written as
\[ P = AB \tag{11} \]
Eq. (11) can be reformulated by adding and subtracting the term $10^{2n} - 10^n (A + B)$ on the right hand side:
\[ P = AB + \{10^{2n} - 10^n (A + B)\} - \{10^{2n} - 10^n (A + B)\} \tag{12} \]
\[ P = \{10^n (A + B) - 10^{2n}\} + \{10^{2n} - 10^n (A + B) + AB\} \tag{13} \]
Eq. (13) can be written for both cases, i.e. when the numbers are greater than the base or less than the base:
\[ P = \begin{cases} 10^n (-10^n + A + B) + (10^n - A)(10^n - B), & \text{if } A, B > 10^n \\ 10^n (A - \bar{B}) + \bar{A}\bar{B}, & \text{if } A, B < 10^n \end{cases} \tag{14} \]
where n is any positive integer and $\bar{A}$ and $\bar{B}$ are the $10^n$'s complements of A and B. The mathematical expression of the Nikhilam Navatascaramam Dasatah sutra for the binary number system is given hereunder.
Consider two n-bit numbers X and Y, where k is the exponent and $z_1$, $z_2$ are the residues of X and Y, respectively. Mathematically, X and Y can be represented as $X = 2^k \pm z_1$ and $Y = 2^k \pm z_2$.
The product of X and Y is denoted P and can be represented as:
\[ P = X \cdot Y = (2^k \pm z_1)(2^k \pm z_2) \tag{15} \]
For fast multiplication using the extended rule of the sutra, the bases of the multiplicand and the multiplier are assumed to be the same; thus Eq. (15) can be rewritten as
\[ P = XY = 2^k (X \pm z_2) \pm z_1 z_2 \tag{16} \]
From Eq. (16) it is observed that a large number multiplication can easily be decomposed into a small number multiplication, an addition/subtraction and a shift, leading towards a reduction of hardware cost, propagation delay and power consumption. The small multiplication can easily be implemented using the Urdhva-tiryakbyham sutra (formula).
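Both forms of the sutra can be tried out numerically. The Python sketch below is illustrative only: the names `nikhilam_decimal` and `nikhilam_binary` are hypothetical, the decimal version covers only the case of both operands below the base, and the binary version folds the ± signs of Eq. (16) into signed residues z1 = X − 2^k and z2 = Y − 2^k, which is an assumption made here for compactness.

```python
def nikhilam_decimal(a: int, b: int, n: int) -> int:
    """Decimal NND product for a, b below the base 10**n (second case of Eq. (14))."""
    base = 10 ** n
    a_bar, b_bar = base - a, base - b   # complements of a and b (Column 2)
    lhs = a - b_bar                     # cross subtraction
    rhs = a_bar * b_bar                 # small multiplication of the complements
    return lhs * base + rhs             # "concatenate" LHS and RHS


def nikhilam_binary(x: int, y: int, k: int) -> int:
    """Binary NND product per Eq. (16), using signed residues around the base 2**k."""
    z1 = x - (1 << k)
    z2 = y - (1 << k)
    # One add/subtract, one shift by k, and one small multiplication z1 * z2.
    return ((x + z2) << k) + z1 * z2
```

For the worked example of Fig. 3, `nikhilam_decimal(998, 997, 3)` forms 995 on the left and 006 on the right, giving 995006. In the binary version the only true multiplication is `z1 * z2`, whose operands are short when the base is chosen close to X and Y.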
Fig. 3. Implementation of multiplication using NND sutra.

3.2. Urdhva-tiryakbyham (UT) sutra
The meaning of the term UT is "vertically and crosswise", and it is applicable to all multiplication operations. This procedure is simply known as the array multiplication technique [24]. The mathematical expression of UT for the binary number system is given below.
Assume two N-bit words X and Y are described as
\[ X = \sum_{i=0}^{N-1} x_i 2^i \tag{17} \]
and
\[ Y = \sum_{j=0}^{N-1} y_j 2^j \tag{18} \]
where $x_i, y_j \in \{0, 1\}$.
Multiplication can be described as
\[ P = XY = \sum_{i=0}^{N-1} x_i 2^i \sum_{j=0}^{N-1} y_j 2^j \tag{19} \]
\[ P = \sum_{i} \sum_{j} x_i y_j 2^{i+j} \tag{20} \]
Let $k = i + j$:
\[ P = \sum_{k=0}^{2N-1} \sum_{i=0}^{N-1} x_i y_{k-i} 2^k \tag{21} \]
\[ P = \sum_{k=0}^{2N-1} p_k 2^k \tag{22} \]
where
\[ p_k = \sum_{i=0}^{N-1} x_i y_{k-i} \tag{23} \]
and $y_{k-i}$ is taken as 0 whenever $k - i$ lies outside the range 0 to N − 1.

4. Circuit modules and complete factorial design circuit

The advantages of CMOS transmission gate (TG) logic over conventional CMOS and CPL [26,27] logic are well established. As the CMOS transmission gate consists of one PMOS and one NMOS transistor connected in parallel, its ON resistance is smaller than that of even a single NMOS transistor. The circuit modules required for computing the factorial of a number are described in the following subsections. All the circuit modules have been implemented using TG. Sections 4.1–4.8 describe the operation of the individual modules, and the complete design of the factorial calculation circuit is described in Section 4.9.

4.1. Zero detectors
Consider an n-bit number given as $X = x_{n-1}, x_{n-2}, \ldots, x_2, x_1, x_0$. The zero detector circuit identifies whether the input number is zero or not, and if the input number is zero it sets the LS bit of the number to logic 1. This implementation has been adopted for the computation of factorial 0. The hardware implementation of the zero detector is shown in Fig. 4, where X is the input and the output is represented as Y. The Boolean equations for the zero detector, derived from Fig. 4, are:
\[ \text{Control} = x_3 + x_2 + x_1 + x_0 = \overline{\text{Ctrl}} + x_0, \quad \text{where } \text{Ctrl} = \overline{x_3 + x_2 + x_1} \tag{24} \]
\[ Y_3 = \text{buffered } x_3 \tag{25} \]
\[ Y_2 = \text{buffered } x_2 \tag{26} \]
\[ Y_1 = \text{buffered } x_1 \tag{27} \]
\[ Y_0 = \text{Ctrl} + x_0 \tag{28} \]
In the zero detector circuit, one control signal is generated, which indicates whether the input number is zero or not. This signal is fed to the incrementer circuit and determines whether an incrementing operation is required. When all the input bits are zero, Control = 0 and no incrementing operation is done. When the input bits are non-zero, an incrementing operation is required and Control = 1. Another signal, Ctrl, is generated to set (logic 1) the LS bit of the input number when all the input bits are zero. Buffers are used to pass all the other input bits (except the LS bit) unchanged.

4.2. Comparator
A comparator circuit [28] is required to compare values in the incrementer based design for computing the factorial of a number. The incrementer block increments the value, starting from 0, and the comparator block compares the incremented value with the given number. If the incremented value is less than the given number, it is incremented again and the same comparison stage follows. The iterative process continues until the incremented value is equal to the given number. To compare two numbers, we have used two parallel adder stages that check whether one number is equal to, greater than or less than the other. Let us assume the two 4-bit numbers to be compared are A and B, defined as $A = a_3, a_2, a_1, a_0$ and $B = b_3, b_2, b_1, b_0$. We want to compare the value of A with the value of B. (Here B is taken as a fixed number and A is user defined. To simplify the calculation the numbers are taken as 4-bit arrays; longer arrays can be handled in a similar manner.) The first stage adder basically performs a subtraction: the complement of A and the number B are the inputs of the parallel adder, and the carry-in bit is set high. After the first stage of addition, if the resultant carry bit (first stage) is high then B > A or B = A. If B < A, the resultant carry bit is low. Now consider the case B > A or B < A, which gives a non-zero second stage XOR output, and the second stage resultant carry is high. If A = B, the XOR output is equal to zero and the second stage resultant carry bit is low. Finally, the first stage resultant carry and the second stage resultant carry are passed through an AND operation, producing the control signal. The hardware implementation of the comparator is shown in Fig. 5.
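The two-stage comparison can be mimicked at bit level. The following Python sketch is a behavioural assumption rather than the circuit of Fig. 5: the name `compare_greater` is hypothetical, the first stage is modelled as B plus the complement of A with carry-in 1, and the second stage carry is modelled by adding an all-ones word to A XOR B, which produces a carry exactly when the XOR is non-zero.

```python
def compare_greater(a: int, b: int, n_bits: int = 4) -> int:
    """Return 1 when b > a, else 0, using two adder carries and an AND."""
    mask = (1 << n_bits) - 1
    # Stage 1: b + (~a) + 1 is b - a in two's complement; carry-out is 1 when b >= a.
    stage1 = b + (~a & mask) + 1
    carry1 = (stage1 >> n_bits) & 1
    # Stage 2: the XOR is non-zero when a != b; adding all ones then overflows.
    diff = (a ^ b) & mask
    carry2 = ((diff + mask) >> n_bits) & 1
    return carry1 & carry2      # control signal: (b >= a) AND (b != a)
```

The AND of the two carries is high only for B > A; equality and B < A both drive the control signal low.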
4.3. Incrementer/decrementer
In this section, a multiplexer based incrementer [17,18] has been adopted for the computation of high speed factorial circuitry. The mathematical explanation of the reported design is shown below.
\[ Y = X + 1 \tag{29} \]
\[ Y_j = \begin{cases} X_j \oplus (X_0 X_1 \cdots X_{j-1}), & 1 \le j \le n-1 \\ \overline{X_0}, & j = 0 \end{cases} \tag{30} \]
Circuit level implementation has been carried out using CMOS transmission gates (TG) to make the circuit faster. An n-bit MUX-based incrementer is designed as shown in Fig. 6. It is composed of a data-out MUX array and a selection module (SM) used to find the first "one" bit. The output of the SM is $D_0, D_1, \ldots, D_{n-1}$. It can be noted that each bit of the incremented result Y can be derived by a MUX operation.
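The bitwise rule behind such an incrementer (each output bit is the input bit inverted when all lower bits are 1, and passed through otherwise) can be checked with a short Python sketch. The function name `mux_increment` is hypothetical, bits are listed LSB first, and the running AND of the lower bits is only a software stand-in for the selection module.

```python
def mux_increment(x_bits: list[int]) -> list[int]:
    """Increment an LSB-first bit list modulo 2**n using per-bit selection."""
    y_bits = []
    all_lower_ones = 1                    # empty AND: bit 0 is always inverted
    for xj in x_bits:
        # MUX view: select NOT xj when all lower bits are 1, else select xj.
        y_bits.append(xj ^ all_lower_ones)
        all_lower_ones &= xj              # extend the AND chain to include this bit
    return y_bits


def to_bits(value: int, n_bits: int) -> list[int]:
    """LSB-first bit list of value."""
    return [(value >> i) & 1 for i in range(n_bits)]


def from_bits(bits: list[int]) -> int:
    """Integer value of an LSB-first bit list."""
    return sum(bit << i for i, bit in enumerate(bits))
```

For example, `from_bits(mux_increment(to_bits(7, 4)))` flips the three trailing ones and sets bit 3, and an all-ones input wraps around to zero.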
Fig. 4. Circuitry for checking zero value at the input bit stream.

4.4. Adder/subtractor
The conventional adder/subtractor block has been implemented [27] to perform addition as well as subtraction in a single block, and its performance parameters have been checked using standard 90 nm CMOS technology. Here the control (add/sub) signal selects between addition and subtraction: for addition the add/sub signal is low, and for subtraction it is high. The circuit level diagram of the reported design is shown in Fig. 7.
Fig. 5. Hardware implementation of comparator.

Fig. 6. Hardware implementation of multiplexer based incrementer.

Fig. 7. Hardware implementation of adder/subtractor.

4.5. Logical shifter (LS)

The logical shifter is a barrel shifter, which can shift a number by more than one position at a time, as given by the select inputs. The shifting operation executed by the barrel shifter is shown in Table 1. Here we have designed the left shift operation. Fig. 8 shows the architecture of the 8-bit logical shifter. The LS consists of several multiplexers. The number of multiplexers required is determined by the number of outputs. In general, for n-bit inputs, the number of select lines needed is log2 n. So for an eight-bit input, the number of select inputs is log2 8 = 3. 00000001 is initially loaded to the input of the multiplexers. For example, if the select inputs are S2S1S0 = 111, then the shifted output is 10000000.

Table 1
Combination of shifting operation.

Output (MSB to LSB)        Select inputs
A7 A6 A5 A4 A3 A2 A1 A0    When S2S1S0 = 000
A6 A5 A4 A3 A2 A1 A0 0     When S2S1S0 = 001
A5 A4 A3 A2 A1 A0 0  0     When S2S1S0 = 010
A4 A3 A2 A1 A0 0  0  0     When S2S1S0 = 011
A3 A2 A1 A0 0  0  0  0     When S2S1S0 = 100
A2 A1 A0 0  0  0  0  0     When S2S1S0 = 101
A1 A0 0  0  0  0  0  0     When S2S1S0 = 110
A0 0  0  0  0  0  0  0     When S2S1S0 = 111

4.6. Multiplier using UT sutra
Eq. (23) shows that the coefficients of the product can be obtained by the convolution sum of the two finite number sequences. Considering the long-hand sequences of Eq. (22), the 4-bit multiplier algorithm is shown in Fig. 9. The hardware implementation of the same principle can be realized using the standard array multiplication technique [26]. For the sake of simplicity a 4-bit multiplier is considered; higher order multipliers can be realized in a similar manner. Partial products are added in two stages. Adders and 4 to 3 compressors are used to minimize the stage operations. Compressors and adders are used carefully so that a minimum number of outputs is generated. Thus, using a minimum number of adders/compressors, the partial products are added without compromising the number of bits generated for the next stage operation.
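The convolution view of the UT multiplier in Eqs. (21)–(23) can be illustrated in software. The Python sketch below is an assumption-laden model, not the compressor based circuit: `ut_multiply` is a hypothetical name, the column sums p_k are formed first as in Eq. (23), and carries are then resolved by a simple ripple pass instead of adders and compressors.

```python
def ut_multiply(x: int, y: int, n_bits: int = 4) -> int:
    """Multiply two n_bits-wide unsigned numbers via column-wise (crosswise) sums."""
    xb = [(x >> i) & 1 for i in range(n_bits)]
    yb = [(y >> j) & 1 for j in range(n_bits)]
    # p[k] = sum over i of x_i * y_(k-i): the vertical and crosswise products of column k.
    p = [0] * (2 * n_bits)
    for k in range(2 * n_bits - 1):
        for i in range(n_bits):
            j = k - i
            if 0 <= j < n_bits:
                p[k] += xb[i] & yb[j]
    # Resolve carries column by column to obtain the binary product.
    result, carry = 0, 0
    for k in range(2 * n_bits):
        total = p[k] + carry
        result |= (total & 1) << k
        carry = total >> 1
    return result
```

All column sums depend only on the input bits, so they can be generated in parallel; the serial part is confined to the carry resolution, which is the step the hardware shortens with adders and compressors.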
Fig. 8. Hardware implementation of logical shifter.

Fig. 9. Multiplication procedure using UT sutra.

4.7. Radix extraction unit (REU)

Generally, the binary radix can be defined as
\[ \text{Radix} = 2^{Ex} = \sum_{i=0}^{n-1} r_i 2^i \]
where the term Ex is the exponent, which can be expressed as
\[ Ex = \sum_{i=0}^{\log_2 n - 1} ex_i 2^i \]
The architecture of the radix extraction unit is shown in Fig. 10. The output of the priority encoder (PE) is the radix, which is in turn fed to the binary encoder, which ultimately generates the exponent [29]. The function of the REU is illustrated by the following example: if the binary input is 1110 (14 in decimal), the PE generates 00010000 (16 in decimal), which is eight bits wide. The corresponding encoder output, that is the exponent, is 100 (4 in decimal).

Fig. 10. Architecture for radix extraction unit.

4.8. Complete design of Vedic multiplier
The hardware implementation of the NND sutra is shown in Fig. 11. The architecture can be decomposed into four main subsections: (i) radix extraction unit (REU), (ii) adder/subtractor, (iii) logical shifter and (iv) multiplier using the UT sutra. The REU is required to select the proper radices and the exponent values corresponding to the input numbers. The radix is chosen close to the given number, which results in an easier multiplication. The subtractor blocks are required to extract the residual parts, i.e. $z_1$ and $z_2$. The product of $z_1$ and $z_2$ is easily determined by the UT sutra. The first adder/subtractor block is used to calculate the value of $X \pm z_2$. The output of this adder/subtractor is logically left shifted by k positions to produce the value of $2^k (X \pm z_2)$. The final result is obtained by adding or subtracting the shifter output and the output of the multiplication implemented with the UT sutra.

Fig. 11. Block diagram of complete Vedic multiplier.

4.9. Complete factorial design circuitry
The factorial of a number is computed in a parallel manner, leading towards high speed operation. As shown in Fig. 2, all the input registers are initially filled with 0001. At the starting point of the flowchart, the circuit first checks whether the input number is zero or not. If the input is zero, all the register values are set to 1 and multiplication starts. If the input value is not equal to zero, the incrementer starts incrementing the values up to the number, and the register values are updated with the incremented values for multiplication. The parallel implementation methodology has been adopted for computation; as a result, the 4-bit factorial of a number can be calculated within 4 stages, which significantly reduces the propagation delay. The block diagram for factorial calculation is shown in Fig. 12.

Fig. 12. Block diagram for hardware implementation of factorial calculation.

Table 2
Performance parameters, i.e. propagation delay (ps), average dynamic power consumption (μW) and energy delay product (10^-27 J s), of different components such as zero detector, incrementer/decrementer, comparator, adder/subtractor and REU.

Circuit module              Delay (ps)   Power (μW)   EDP (10^-27 J s)
Zero detector               120          1.02         14.16
Incrementer/decrementer     180          3.14         101.74
Comparator                  148          2.15         47.09
Adder/subtractor (4-bit)    140          0.856        16.3
REU                         376          0.678        95.85
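The relation between the three columns of Table 2 can be examined with a few lines of Python. This sketch rests on two assumptions that are not stated in the text: that the energy delay product is computed as power × delay², and that the power column is in microwatts. The names `TABLE_2` and `edp_1e27` are hypothetical.

```python
# (delay in ps, power in uW, reported EDP in 1e-27 J*s), copied from Table 2.
TABLE_2 = {
    "Zero detector": (120, 1.02, 14.16),
    "Incrementer/decrementer": (180, 3.14, 101.74),
    "Comparator": (148, 2.15, 47.09),
    "Adder/subtractor (4-bit)": (140, 0.856, 16.3),
    "REU": (376, 0.678, 95.85),
}


def edp_1e27(delay_ps: float, power_uw: float) -> float:
    """EDP = P * t**2 in units of 1e-27 J*s (1e-6 W * (1e-12 s)**2 = 1e-30 J*s)."""
    return power_uw * delay_ps ** 2 * 1e-3


for name, (delay_ps, power_uw, reported) in TABLE_2.items():
    print(f"{name}: computed {edp_1e27(delay_ps, power_uw):.2f}, reported {reported}")
```

Under these assumptions the incrementer/decrementer, comparator and REU rows are reproduced to two decimal places, while the zero detector and adder/subtractor rows come out a few percent higher than the tabulated values.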
Propagation Delay (nS)
Transistor level simulation for factorial calculation circuit was

performed through Spice Specter simulator using 90 nm CMOS
technology with 1 V power supply, operated at 10 MHz. Dual
threshold voltage (VT) operating mode was considered for simulation to determine the performance parameters. In designing
calculation of factorial of a number like 3-bit, 4-bit number and
5-bit, all the individual modules such as zero detector, incrementer/decrementer, Vedic multiplier was implemented using TG to
make the circuit faster. Lowering supply voltage reduces the
power dissipation in quadratic fashion and becomes attractive.
Though low supply voltage affects delay but it is compensated by
the lower RC delay of TG circuit and the dual threshold CMOS
technology. It is also to be noted that each TG circuit requires less
number of transistor than conventional CMOS implemented
circuits thus reduces the layout area. The individual performance
parameters such as propagation delay, dynamic switching power
consumption and Energy Delay Product (EDP) for different circuit
modules, i.e. zero detector, incrementer/decrementer, comparator is
shown in Table 2. We focused our main concentration for reducing
the propagation delay, dynamic average switching power consumption and energy delay product.
It is worth mentioning that we took the implementation methodologies
from Refs. [20,26], implemented them in the same technological
environment (spice specter with standard 90 nm CMOS technology) and then
compared the performance parameters. Fig. 13 shows the comparison
results for the different multipliers designed in this common
environment. For the purpose of comparison, the ideas were taken from
the references and simulated, and the performance parameters were
computed using the same MOSFET technology file. Input data were applied
in a regular fashion for the experiments. The delay and the power were
measured using the worst-case input pattern, at the output where the
delay is maximum. From Fig. 13 it is observed that, for higher bit-length
multiplication, the Vedic multiplier offers a substantial reduction in
propagation delay and dynamic switching power consumption.

Fig. 13. Comparison of results of different types of Vedic multipliers (VM), implemented in the same environment, in terms of propagation delay (ns) and dynamic switching power (µW), as a function of the number of input bits (4x4, 8x8, 16x16 and 32x32), for Refs. [20], [26] and the proposed design.

Fig. 14 shows the propagation delay and dynamic switching power
consumption for factorial computation at different input bit lengths
(3-bit, 4-bit and 5-bit numbers). Fig. 14 also compares the same
circuitry implemented with different architectures, namely UT and NND.
From Fig. 14 it is observed that the proposed design offers 33% and 24%
improvement in propagation delay, with corresponding reductions in power
consumption of 34.48% and 24%, for the factorial calculation circuitry in
comparison with the UT and NND based implementations, respectively.
Fig. 15 shows the layout of the proposed factorial circuitry for a 4-bit
number using the parallel Vedic multiplication methodology, which
occupies a layout area of only 6 mm². It can be seen from the above
discussion that the Vedic multiplier is the most critical element in
improving the speed of the factorial computation circuit.
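To give a feel for why the multiplier dominates the cost as the input bit length grows from 3 to 5 bits, the following sketch computes how many output bits are needed to hold the factorial of the largest n-bit input. It is only an arithmetic illustration: it assumes that the full range of an n-bit input (0 to 2^n - 1) must be supported, which the text does not state explicitly, and the helper name `factorial_result_bits` is hypothetical.

```python
import math


def factorial_result_bits(input_bits: int) -> int:
    """Bits needed to represent the factorial of the largest n-bit input.

    Assumption: every value from 0 to 2**input_bits - 1 is a valid input.
    """
    largest_input = (1 << input_bits) - 1
    return math.factorial(largest_input).bit_length()


for n in (3, 4, 5):
    largest = (1 << n) - 1
    print(f"{n}-bit input: {largest}! needs {factorial_result_bits(n)} bits")
```

Under this assumption the result width grows from 13 bits (7!) to 41 bits (15!) and 113 bits (31!), so the operand width of the iterative multiplications grows much faster than the input width.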
Fig. 14. Comparison of results for computation of the factorial of a number using the parallel Vedic implementation methodology, in terms of propagation delay (ns) and dynamic switching power (mW), as a function of input bit length (3-bit, 4-bit and 5-bit), for Refs. [20], [26] and the proposed design.

Fig. 15. Layout of the factorial design circuit using the parallel Vedic multiplication methodology. The layout occupies only 6 mm² of area and was implemented using L-Edit V-13 of the T-Spice simulator.

6. Conclusion

In this paper, based on ancient Vedic mathematics, we report on
a novel circuitry for computation of the factorial of a 4-bit number
through a parallel implementation methodology. This novel architecture
combines the advantages of the ancient Vedic formulae and parallel
implementation techniques, leading to a significant reduction in the
number of stages and hence high speed operation. In the circuit
realization, an (N × N)-bit multiplier implementation was transformed
into just one small multiplication (bit length ≪ N) and one
adder/subtractor implementation, thereby achieving high speed operation
for factorial computation. The propagation delay for the calculation of
the factorial of a 4-bit number was only 42.13 ns, while the power
consumption was 58.82 mW, for a layout area of 6 mm². The improvement in
speed was found to be 33% and 24%, with corresponding reductions in power
consumption of 34.48% and 24%, for the factorial calculation circuitry in
comparison with the UT and NND based implementations, respectively. The
speed improvement of the factorial computation circuit is attributed
largely to the incorporation of the Vedic multiplier.
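The reduction of a large multiplication to one small multiplication plus an addition/subtraction, as summarized in the conclusion, can be illustrated in software. The sketch below is a behavioural model only, not the authors' circuit: it assumes a power-of-two base 2^n chosen just above both operands, uses the identity A·B = 2^n·(A - b) + a·b with residues a = 2^n - A and b = 2^n - B, and does not model the parallel hardware implementation. The function names `nikhilam_multiply` and `factorial_iterative` are hypothetical.

```python
import math


def nikhilam_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers via residues from a base 2**n.

    Assumption: base = 2**n with n the larger operand bit length, so
    A*B = base*(A - rb) + ra*rb, where ra = base - A and rb = base - B.
    """
    if a == 0 or b == 0:              # software analogue of a zero detector
        return 0
    n = max(a.bit_length(), b.bit_length())
    base = 1 << n
    ra, rb = base - a, base - b       # residues (deficits from the base)
    # One subtraction, one shift, one small multiplication of the residues.
    return ((a - rb) << n) + ra * rb


def factorial_iterative(x: int) -> int:
    """Factorial by iterative multiplication with a decrementing operand."""
    result = 1
    while x > 1:
        result = nikhilam_multiply(result, x)
        x -= 1
    return result


# Self-check against the standard library for all 4-bit inputs.
for value in range(16):
    assert factorial_iterative(value) == math.factorial(value)
print(nikhilam_multiply(13, 14))
```

For 13 × 14 with base 16 the residues are 3 and 2, so the product is 16 × (13 - 2) + 3 × 2 = 182. The residue product is only small when the operands lie close to the base, which is the case the Nikhilam approach is suited to.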
References
[1] J.P. Deschamps, G.J.A. Bioul, G.D. Sutter, Synthesis of Arithmetic Circuits: FPGA,
ASIC and Embedded Systems, Wiley Interscience Publications, 2006, pp. 180–198.
[2] J.F. Hart, E.W. Cheney, C.L. Lawson, H.J. Maehly, C.K. Mesztenyi, J.R. Rice,
H.G. Thacher Jr., C. Witzgall, Computer Approximations, Wiley, 1968.
[3] M. M.-Dastjerdi, A. A.-Kusha, M. Pedram, BZ-FAD: a low-power low-area
multiplier based on shift-and-add architecture, IEEE Trans. Very Large Scale
Integr. (VLSI) Syst. 17 (2) (2009) 302–306.
[4] A.D. Booth, A signed binary multiplication technique, Q. J. Mech. Appl. Math.
IV (1952) 236–240.
[5] Y.-H. Seo, D.-W. Kim, A new VLSI architecture of parallel multiplier-accumulator
based on radix-2 modified Booth algorithm, IEEE Trans. Very
Large Scale Integr. (VLSI) Syst. 18 (2) (2010) 201–208.
[6] J. Hu, L. Wang, T. Xu, A low-power adiabatic multiplier based on modified
Booth algorithm, in: Proceedings of the IEEE International Symposium on
Integrated Circuits, Singapore, September 2007, pp. 489–492.
[7] C.S. Wallace, A suggestion for a fast multiplier, IEEE Trans. Electron. Comput.
EC-13 (1) (1964) 14–17.
[8] M. Young, The Technical Writer's Handbook, University Science, Mill
Valley, CA, 1989.
[9] F. Carbognani, F. Buergin, N. Felber, H. Kaeslin, W. Fichtner, A 2.7-µW/MHz
transmission-gate-based 16-bit multiplier for digital hearing aids, in:
Proceedings of the IEEE 48th Midwest Symposium on Circuits and Systems,
Covington, KY, August 2005, pp. 1406–1409.
[10] Z. Wang, G.A. Jullien, W.C. Miller, A new design technique for column
compression multipliers, IEEE Trans. Comput. 44 (8) (1995) 962–970.
[11] K.-J. Cho, S. Jo, Y.-E. Kim, Y.-N. Xu, J.-G. Chung, Constant multiplier design
using specialized bit pattern adders, in: Proceedings of the IEEE Fifteenth
International Conference on Electronics, Circuits and Systems, St. Juliens,
August 2008, pp. 41–44.
[12] S.L. Chen, X.-Y. Tian, X.-J. Zhao, Improved multiplier of CSD used in digital
signal processing, in: Proceedings of the IEEE International Conference on
Machine Learning and Cybernetics, Kunming, July 2008, pp. 2905–2908.
[13] A. Avizienis, Signed-digit number representations for fast parallel arithmetic,
IRE Trans. Electron. Comput. EC-10 (1961) 389–400.
[14] M.R. Stan, A.F. Tenca, M.D. Ercegovac, Long and fast up/down counters, IEEE
Trans. Comput. 47 (7) (1998) 722–735.
[15] D.R. Lutz, D.N. Jayasimha, Programmable modulo-K counters, IEEE Trans.
Circuits Syst.: Fund. Theory Appl. 43 (11) (1996) 939–941.
[16] R. Hashemian, Highly parallel increment/decrement using CMOS technology,
in: Proceedings of the 33rd IEEE Midwest Symposium on Circuits and Systems,
Calgary, Alberta, Canada, August 1990, vol. 2, pp. 866–869.
[17] C.-H. Huang, J.-S. Wang, Y.-C. Huang, A high-speed CMOS incrementer/
decrementer, in: Proceedings of the IEEE International Symposium on Circuits
and Systems, Sydney, Australia, May 2001, vol. 4, pp. 88–91.
[18] S. Bi, W.J. Gross, W. Wang, A. Al-Khalili, M.N.S. Swamy, An area-reduced
scheme for modulo 2^n − 1 addition/subtraction, in: Proceedings of the IEEE
Ninth International Database Engineering and Application Symposium, July
2005, pp. 396–399.
[19] J.S.S.B.K.T. Maharaja, Vedic Mathematics, Motilal Banarsidass Publishers Pvt
Ltd, Delhi, 2001.
[20] P. Mehta, D. Gawali, Conventional versus Vedic mathematical method for
hardware implementation of a multiplier, in: Proceedings of the IEEE
International Conference on Advances in Computing, Control, and
Telecommunication Technologies, Trivandrum, Kerala, December 2009, pp. 640–642.
[21] M. Ramalatha, K. Thanushkodi, K.D. Dayalan, P. Dharani, A novel time and
energy efficient cubing circuit using Vedic mathematics for finite field
arithmetic, in: Proceedings of the IEEE International Conference on Advances
in Recent Technologies in Communication and Computing, Kerala, October
2009, pp. 873–875.
[22] M. Ramalatha, K.D. Dayalan, P. Dharani, S.D. Priya, High speed energy
efficient ALU design using Vedic multiplication techniques, in: Proceedings
of the IEEE International Conference on Advances in Computational Tools for
Engineering Applications, Zouk Mosbeh, July 2009, pp. 600–603.
[23] S. Akhter, VHDL implementation of fast N × N multiplier based on Vedic
mathematic, in: Proceedings of the IEEE Eighteenth European Conference on
Circuit Theory and Design, Seville, August 2007, pp. 472–475.
[24] P. Mehta, D. Gawali, Conventional versus Vedic mathematical method for
hardware implementation of a multiplier, in: Proceedings of the IEEE
International Conference on Advances in Computing, Control, and
Telecommunication, Trivandrum, Kerala, December 2009, pp. 640–642.
[25] H.D. Tiwari, G. Gankhuyag, C.M. Kim, Y.B. Cho, Multiplier design based on
ancient Indian Vedic Mathematics, in: Proceedings of the IEEE International
SoC Design Conference, Busan, November 2008, pp. 65–68.
[26] P. Saha, A. Banerjee, P. Bhattacharyya, A. Dandapat, High speed ASIC design
of complex multiplier using Vedic mathematics, in: Proceedings of the IEEE
Student Technology Symposium, Kharagpur, January 2011, pp. 237–241.
[27] P.K. Saha, A. Banerjee, A. Dandapat, High speed low power complex
multiplier design using parallel adders and subtractors, Int. J. Electron. Elect. Eng.
(IJEEE) 07 (11) (2009) 38–46.
[28] S. Veeramachaneni, M.K. Krishna, L. Avinash, P.S. Reddy, M.B. Srinivas,
Efficient design of 32-bit comparator using carry look-ahead logic, in:
Proceedings of the IEEE Northeast Workshop on Circuits and Systems,
Montreal, August 2007, pp. 867–870.
[29] R. Hashemian, A high speed compact priority encoder, in: Proceedings of the
IEEE 32nd Midwest Symposium on Circuits and Systems, Champaign, August
1989, pp. 197–200.
