
A Project Report on
EFFICIENT AND POWER OPTIMISED MAC DESIGN USING

RADIX 8 MODIFIED BOOTH
A Dissertation submitted to
JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY, Kakinada.
In the partial fulfillment of the requirements for the award of the degree of
MASTER OF TECHNOLOGY
IN
VLSI & EMBEDDED SYSTEMS
Submitted by
K.VENKATA GOPI
(14HU1D6814)
Under the esteemed Guidance of
Ms. P.SUREKHA, M.Tech.
Assistant Professor
CHEBROLU ENGINEERING COLLEGE

(Approved by AICTE, NEW DELHI & Affiliated to JNTUK, Kakinada, AP)
CHEBROLU (Village & Mandal), GUNTUR-522212
CERTIFICATE
This is to certify that the main project report entitled EFFICIENT AND
POWER OPTIMISED MAC DESIGN USING RADIX 8 MODIFIED BOOTH is a piece of

project work done by K.VENKATA GOPI (14HU1D6814) under the guidance
and supervision of P.SUREKHA at the Department of Electronics and
Communication Engineering, for the degree of Master of Technology in
VLSI & Embedded Systems of JNTU-KAKINADA, AP, INDIA.
The project viva-voce exam is held on______of_________, 2013.
HEAD OF DEPARTMENT INTERNAL GUIDE
EXTERNAL EXAMINER
DECLARATION BY THE CANDIDATE
I hereby declare that this thesis entitled EFFICIENT AND POWER OPTIMISED
MAC DESIGN USING RADIX 8 MODIFIED BOOTH submitted to Jawaharlal Nehru
Technological University, Kakinada for the award of degree in Master of
Technology is based on the original work carried out in the laboratories of
CHEBROLU ENGINEERING COLLEGE, CHEBROLU (Village & Mandal),
GUNTUR-522212 A.P., INDIA and has not been submitted earlier in part or in
full for any Postgraduate degree or degree of any university.
K.VENKATA GOPI
(14HU1D6814 )
Place:
Date:
ACKNOWLEDGEMENT
We write this acknowledgement with great honour, pride and pleasure to pay our
respects to all who enabled us, directly or indirectly, to reach this stage.
We express our sincere thanks to our beloved Secretary & Correspondent,
Sri R.V.KRISHNAIAH, for providing support and a stimulating environment for completing
the project.
We would especially like to thank Dr. K. HARI BABU, Principal, for providing
us the resources for carrying out the project.
We are grateful to N.VIJAYA SHANKER, Head of the Department of Electronics
and Communication Engineering, for his inspiration and, above all, the moral support and
constant encouragement in carrying out this project work.
It is our privilege to work under the able guidance of Ms. P.SUREKHA,
Asst. Professor, Department of Electronics & Communication Engineering, as our guide. Our
vocabulary is not as rich as her experience to express our deep sense of indebtedness and
wholehearted thanks to our beloved guide for her valuable suggestions, support and guidance
in carrying out this project work.
We also thank all the teaching staff of the ECE Department. Last but not least, we would
like to thank the non-teaching staff members of the Department of Electronics and Communication
Engineering for their kind help.
We also thank all our friends for the invaluable help rendered by them during the
course of the project. Finally, we would like to dedicate the whole work to our parents for their
everlasting love and constant encouragement during this period, even when
miles apart.
Place:
Date:
INDEX
CERTIFICATE II
DECLARATION III
ACKNOWLEDGMENT IV
LIST OF FIGURES VIII
ABSTRACT XI
CHAPTER NO CHAPTER NAME PAGE NO
1 INTRODUCTION 02
1.1 MULTIPLICATION ALGORITHM 03
1.2 SERIAL MULTIPLIER 05
1.3 SERIAL/PARALLEL MULTIPLIER 06
1.4 SHIFT AND ADD MULTIPLIER 07
1.5 ARRAY MULTIPLIER 08
2 EXISTING SYSTEM 17
3 PROPOSED SYSTEM
3.1 BLOCK DIAGRAM OF PROPOSED SYSTEM 18
3.2 INTRODUCTION 19
3.3 BOOTH MULTIPLIERS 19
3.4 COMPARISON OF BOOTH AND SHIFT ADD METHODS 23
3.5 HARDWARE IMPLEMENTATION OF BOOTH 24
3.6 MODIFIED BOOTH ALGORITHM 28
3.7 MODIFIED BOOTH ALGORITHM ENCODER 28
3.8 PARTIAL PRODUCT GENERATOR 30
3.9 NON REDUNDANT RADIX 4 SIGNED DIGIT ALGORITHM 32
3.10 RADIX 8 MODIFIED BOOTH ALGORITHM 34
3.11 SIGNED EXTENSION CORRECTOR 37

4 CHAPTER -III
4.1 INTRODUCTION TO VHDL 39
4.2 POWER AND FLEXIBILITY 39
4.3 DEVICE INDEPENDENT DESIGN 39
4.4 PORTABILITY 39
4.5 BENCHMARK CAPABILITIES 39
4.6 ASIC MIGRATION 39
4.7 VHDL DESCRIPTION 40
4.8 RESULT 51
4.9 APPLICATIONS 56
4.10 FUTURE SCOPE 56
4.11 CONCLUSION 57
4.12 REFERENCES 58
LIST OF FIGURES
FIGURE NO. FIGURE NAME PAGE NO
1.3 SERIAL/PARALLEL MULTIPLIER 06
1.3.1 PIPELINED VERSION OF AN 8 BIT MULTIPLIER 07
1.4 SHIFT AND ADD MULTIPLIER 08
1.5 CARRY RIPPLE ADDER 10
1.5.1 FUNCTIONAL OPERATION OF CLA 11
1.5.2 2'S COMPLEMENT 14
1.5.3 ARRAY MULTIPLIER FOR A 32BIT NUMBER 17
3.1 ABSTRACT FORM OF THE FLEXIBLE DATA PATH 19
3.2 FLEXIBLE COMPUTATIONAL UNIT 19
3.5 HARDWARE IMPLEMENTATION FOR MODIFIED BOOTH 25
3.6 BOOTHS ENCODER 30
3.8 PARTIAL PRODUCT ENCODER 30
3.9 BLOCK DIAGRAM OF NR4SD ENCODING 34
3.10 BLOCK DIAGRAM OF RADIX 8 MODIFIED BOOTH 38

ABSTRACT:
In this report, we introduce an architecture of pre-encoded multipliers for Digital Signal Processing
applications based on off-line encoding of coefficients. To this end, the Non-Redundant
radix-4 Signed-Digit (NR4SD) encoding technique, which uses the digit values {-1, 0, +1,
+2} or {-2, -1, 0, +1}, is proposed, leading to a multiplier design with a less complex partial
products implementation. Extensive experimental analysis verifies that the proposed pre-encoded
NR4SD multipliers, including the coefficients memory, are more area and power
efficient than the conventional Modified Booth scheme. This project presents a design
methodology for a high-speed Booth encoded parallel multiplier. For partial product generation,
we propose a new modified Booth encoding (MBE) scheme to improve the performance of
traditional MBE schemes. For the final addition, a new algorithm is developed to construct a
multiple-level conditional-sum adder (MLCSMA). The proposed algorithm can optimize the final
adder according to the given cell properties and input delay profile. Since a carry select adder is
used in the final accumulation, the delay is reduced further. The radix-8 modified Booth
encoding algorithm is used for further area reduction.

INTRODUCTION:
Multipliers play an important role in today's digital signal processing and various other
applications. With advances in technology, many researchers have tried, and are still trying, to design
multipliers that meet one or more of the following design targets: high speed, low power
consumption, and regularity of layout (and hence less area), making them suitable for
high-speed, low-power and compact VLSI implementations.
The most common multiplication method is the add-and-shift algorithm. In parallel multipliers, the
number of partial products to be added is the main parameter that determines the performance of
the multiplier. The Modified Booth algorithm is one of the most popular ways to reduce the number
of partial products to be added. To improve speed, the Wallace Tree algorithm can
be used to reduce the number of sequential adding stages. Combining the Modified
Booth algorithm with the Wallace Tree technique gives the advantages of both algorithms in one
multiplier. However, with increasing parallelism, the number of shifts between the partial products
and the intermediate sums to be added increases, which may result in reduced speed, an increase in
silicon area due to irregularity of structure, and increased power consumption due to the extra
interconnect resulting from complex routing. Serial-parallel multipliers, on the other hand,
compromise speed to achieve better area and power consumption. The selection
of a parallel or serial multiplier depends on the nature of the application. In this chapter we
introduce the multiplication algorithms and architectures and compare them in terms of speed, area,
power and combinations of these metrics.
Multimedia and Digital Signal Processing (DSP)
applications (e.g., Fast Fourier Transform (FFT), audio/video
CoDecs) carry out a large number of multiplications with coefficients that do not change during
the execution of the application. Since the multiplier is a basic component for implementing
computationally intensive applications, its architecture seriously affects their performance.
Constant coefficients can be encoded to contain the fewest non-zero digits using the Canonic Signed
Digit (CSD) representation [1]. CSD multipliers comprise the fewest non-zero partial products,
which in turn decreases their switching activity. However, CSD encoding involves serious
limitations. The folding technique [2], which reduces silicon area by time-multiplexing many
operations onto single functional units (e.g., adders, multipliers), is not feasible because CSD-based
multipliers are hard-wired to specific coefficients. In [3], a CSD-based programmable multiplier
design was proposed for groups of pre-determined coefficients that share certain features. The size
of the ROM used to store the groups of coefficients is significantly reduced, as are the area and
power consumption of the circuit. However, this multiplier design lacks flexibility, since the partial
products generation unit is designed specifically for one group of coefficients and cannot be reused
for another group. Also, this method cannot easily be extended to large groups of pre-determined
coefficients while retaining high efficiency.
Modified Booth (MB) encoding tackles the
aforementioned limitations and halves the number of partial products, resulting in reduced
area, critical delay and power consumption. However, a dedicated encoding circuit is required
and the partial products generation is more complex. Kim et al. proposed a technique similar to [3]
for designing efficient MB multipliers for groups of pre-determined coefficients, with the same
limitations as described above.
1.1 MULTIPLICATION ALGORITHM::
The multiplication algorithm for an N-bit multiplicand by an N-bit multiplier is shown below:
Y = Yn-1 Yn-2 ... Y2 Y1 Y0    (Multiplicand)
X = Xn-1 Xn-2 ... X2 X1 X0    (Multiplier)
Generally, each multiplier bit Xi produces one row of partial-product bits, and row i is shifted i positions to the left:
Row 0:    Yn-1X0    Yn-2X0    Yn-3X0    ...  Y1X0    Y0X0
Row 1:    Yn-1X1    Yn-2X1    Yn-3X1    ...  Y1X1    Y0X1
Row 2:    Yn-1X2    Yn-2X2    Yn-3X2    ...  Y1X2    Y0X2
  ...
Row n-2:  Yn-1Xn-2  Yn-2Xn-2  Yn-3Xn-2  ...  Y1Xn-2  Y0Xn-2
Row n-1:  Yn-1Xn-1  Yn-2Xn-1  Yn-3Xn-1  ...  Y1Xn-1  Y0Xn-1
-------------------------------------------------------------
Product:  P2n-1  P2n-2  P2n-3  ...  P2  P1  P0

Example (4 bits by 4 bits):
        1101      multiplicand (13)
      x 1101      multiplier (13)
    --------
        1101
       0000
      1101
     1101
    --------
    10101001      product (169)
AND gates are used to generate the partial products (PPs). If the multiplicand is N bits and the
multiplier is M bits, then there are N*M partial-product bits. The various multiplier architectures
differ in the way the partial products are generated and summed.
Multiplication of binary numbers can be decomposed into additions. Consider the
multiplication of two 8-bit numbers A and B to generate the 16-bit product P.
           A7    A6    A5    A4    A3    A2    A1    A0
       x   B7    B6    B5    B4    B3    B2    B1    B0
-----------------------------------------------------------
Partial products to be added (row j is shifted j bits to the left):
Row 0:     A7.B0 A6.B0 A5.B0 A4.B0 A3.B0 A2.B0 A1.B0 A0.B0
Row 1:  +  A7.B1 A6.B1 A5.B1 A4.B1 A3.B1 A2.B1 A1.B1 A0.B1
Row 2:  +  A7.B2 A6.B2 A5.B2 A4.B2 A3.B2 A2.B2 A1.B2 A0.B2
Row 3:  +  A7.B3 A6.B3 A5.B3 A4.B3 A3.B3 A2.B3 A1.B3 A0.B3
  ...
Row 7:  +  A7.B7 A6.B7 A5.B7 A4.B7 A3.B7 A2.B7 A1.B7 A0.B7
-----------------------------------------------------------
P15 P14 P13 P12 P11 P10 P9 P8 P7 P6 P5 P4 P3 P2 P1 P0

The equation for the addition is:
P(m+n) = A(m) \cdot B(n) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} a_i b_j 2^{i+j}
The add-and-shift procedure is:
1. If the LSB of the multiplier is 1, add the multiplicand into an accumulator.
2. Shift the multiplier one bit to the right and the multiplicand one bit to the left.
3. Stop when all bits of the multiplier are zero.
From the above it is clear that multiplication has been reduced to the addition of numbers. If the
partial products are added serially, a serial adder is used with the least hardware. It is also
possible to add all the partial products with one combinational circuit using a parallel
multiplier. Alternatively, a compression technique can be used to reduce the number of
partial products before the addition is performed.
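The add-and-shift steps of this section can be sketched in Python as follows. This is only an illustrative model under stated assumptions: the function name shift_add_multiply is made up for this example, the operands are assumed to be unsigned, and Python integers stand in for the hardware registers.

```python
def shift_add_multiply(multiplicand: int, multiplier: int) -> int:
    """Multiply two unsigned integers using the add-and-shift steps."""
    if multiplicand < 0 or multiplier < 0:
        raise ValueError("this sketch handles unsigned operands only")
    accumulator = 0
    while multiplier != 0:            # stop when all multiplier bits are zero
        if multiplier & 1:            # LSB of the multiplier is 1
            accumulator += multiplicand
        multiplier >>= 1              # multiplier one bit to the right
        multiplicand <<= 1            # multiplicand one bit to the left
    return accumulator


if __name__ == "__main__":
    # The 4-bit example of this section: 1101 x 1101.
    print(format(shift_add_multiply(0b1101, 0b1101), "08b"))
```

For the 4-bit example 1101 x 1101 the loop performs three additions, one for each 1 bit of the multiplier.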
1.2 Serial Multiplier::
Where area and power are of utmost importance and delay can be tolerated, the serial multiplier is
used. This circuit uses one adder to add the m * n partial products. The circuit is shown in the
figure below for m = n = 4. The multiplicand and multiplier inputs have to be arranged in a special
manner, synchronized with the circuit behavior, as shown in the figure. The inputs could be presented
at different rates depending on the lengths of the multiplicand and the multiplier. Two clocks are
used, one to clock the data and one for the reset. A first-order approximation of the delay is
O(m*n). With this circuit arrangement the delay is given as D = [(m+1)n + 1] tfa, where tfa is the
delay of one full adder.
As shown, each PP is formed individually. The intermediate values of the PP addition are
stored in the DFF, circulated, and added together with the
newly formed PP. This approach is not suitable for large values of M or N.

1.3 Serial/Parallel Multiplier::

The general architecture of the serial/parallel multiplier is shown in the figure below. One
operand is fed to the circuit in parallel while the other is fed serially. N partial products are formed
each cycle. On successive cycles, each cycle adds one column of the
multiplication table of M*N PPs. The final results are stored in the output register after N+M
cycles, while the area required is N-1 for M=N.
[Figure: operand bits x3 x2 x1 x0 and y0 y1 y2 y3 feeding a chain of adders, initialised with zeros, that produces the output S0]
Fig:: 1.3 Serial/parallel multiplier

A pipelined version of an 8-bit multiplier is shown below.
Fig:: 1.3.1 Pipelined version of an 8-bit multiplier
1.4 Shift and Add Multiplier::

The general architecture of the shift and add multiplier is shown in the figure below for a 32-bit
multiplication. Depending on the value of the multiplier LSB, the multiplicand is
added and accumulated. At each clock cycle the multiplier is shifted one bit to the right and its
LSB is tested. If it is a 0, then only a shift operation is performed. If it is a 1, then the
multiplicand is added to the accumulator, which is then shifted by one bit to the right. After all the
multiplier bits have been tested, the product is in the accumulator. The accumulator is 2N (M+N) bits
in size and initially its N LSBs contain the multiplier. The delay is N cycles maximum. This
circuit has several advantages in asynchronous circuits.

FIG:: 1.4 Shift and add multiplier
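The accumulator behaviour described in this section can be modelled at register level as below. This is a hedged sketch rather than the circuit of Fig. 1.4: the name shift_add_register_model is hypothetical, the operands are assumed unsigned, and the carry out of the upper half is simply kept in the Python integer before the right shift.

```python
def shift_add_register_model(multiplicand: int, multiplier: int, n: int = 32) -> int:
    """Model a 2N-bit accumulator whose N LSBs initially hold the multiplier."""
    mask = (1 << n) - 1
    if not (0 <= multiplicand <= mask and 0 <= multiplier <= mask):
        raise ValueError("operands must be unsigned n-bit values")
    accumulator = multiplier                  # N LSBs contain the multiplier
    for _ in range(n):                        # N cycles maximum
        if accumulator & 1:                   # test the current multiplier LSB
            accumulator += multiplicand << n  # add into the upper N bits
        accumulator >>= 1                     # shift one bit to the right
    return accumulator                        # the product is in the accumulator


if __name__ == "__main__":
    print(shift_add_register_model(13, 13, n=4))
```

Each cycle consumes one multiplier bit from the low half while the partial sum grows in the high half, which is why a single 2N-bit register is sufficient.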
1.5 Array Multipliers::
The array multiplier is well known for its regular structure. The multiplier circuit is based on the add
and shift algorithm. Each partial product is generated by multiplying the multiplicand by one
multiplier bit. The partial products are shifted according to their bit orders and then added.
The addition can be performed with a normal carry propagate adder. N-1 adders are required, where
N is the multiplier length.
                                 A3     A2     A1     A0   Inputs
                           x     B3     B2     B1     B0
                           C  B0xA3  B0xA2  B0xA1  B0xA0
                    +  B1xA3  B1xA2  B1xA1  B1xA0
                    C    sum    sum    sum    sum
             +  B2xA3  B2xA2  B2xA1  B2xA0                 Internal signals
             C    sum    sum    sum    sum
      +  B3xA3  B3xA2  B3xA1  B3xA0
      C    sum    sum    sum    sum
     Y7     Y6     Y5     Y4     Y3     Y2     Y1     Y0   Outputs

An example of the 4-bit multiplication method, for A = a3a2a1a0 and B = b3b2b1b0, is shown below:
[Figure: the bits a3 a2 a1 a0 are gated by b0, b1, b2 and b3 in turn; three cascaded four-bit adders (Cin = 0, with Cout) add the gated rows to form Product (A*B)]
Fig:: 1.5 Carry ripple adder
Although the method is simple, as can be seen from this example, the addition is done serially
as well as in parallel. To improve on the delay and area, the CRAs are replaced with Carry Save
Adders, in which every carry and sum signal is passed to the adders of the next stage. The final
product is obtained in a final adder (usually a carry ripple adder, although any fast adder can be
used). In array multiplication we need to add as many partial products as there are multiplier bits.
This arrangement is shown in the figure below.
[Figure: carry-save array for inputs A3 A2 A1 A0 and B0 to B3. The partial-product bits Pij = Ai Bj (0 <= i <= 3, 0 <= j <= 3) are formed by a total of 16 AND gates; rows of full adders (F.A) pass their carry (Ci) and sum (Si) outputs to the next row, and the final row produces the result bits R7 to R0]
Fig:: 1.5.1 Functional arrangement of CLA

Total Area = (N-1) * M * Area FA
Delay= 2(M-1) FA
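The carry-save arrangement and the two cost expressions above can be illustrated with a short behavioural model. This is a sketch under assumptions, not the circuit from the figure: the names carry_save_add, array_multiply_csa, full_adder_count and full_adder_delay are made up here, the operands are assumed unsigned, and whole Python integers stand in for the rows of full adders.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Reduce three operands to a sum word and a carry word without
    propagating carries (one row of independent full adders)."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (a & c) | (b & c)) << 1
    return sum_word, carry_word


def array_multiply_csa(a: int, b: int, n: int = 4) -> int:
    """Add the n partial products row by row in carry-save form,
    then resolve the carries in one final adder."""
    sum_word, carry_word = 0, 0
    for j in range(n):
        partial = (a << j) if (b >> j) & 1 else 0    # row j: A AND Bj, shifted j bits
        sum_word, carry_word = carry_save_add(sum_word, carry_word, partial)
    return sum_word + carry_word                     # final carry-propagate adder


def full_adder_count(n: int, m: int) -> int:
    """Total Area = (N-1) * M, in units of one full-adder area."""
    return (n - 1) * m


def full_adder_delay(m: int) -> int:
    """Delay = 2(M-1), in units of one full-adder delay."""
    return 2 * (m - 1)


if __name__ == "__main__":
    print(array_multiply_csa(0b1101, 0b1011), full_adder_count(4, 4), full_adder_delay(4))
```

The invariant is that the sum word plus the carry word always equals the total of the rows added so far, so only the last step needs a carry-propagating adder.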
Since both the multiplicand and the multiplier may be positive or negative, the 2's complement number
system is used to represent them. If the multiplier operand is positive, then essentially the same
technique can be used, but care must be taken with sign bit extension.
This multiplier handles signed numbers incorrectly because it lacks sign bit
extension.
Wrong (no sign extension):
              a1    a0
          x   b1    b0
            a1b0  a0b0
      a1b1  a0b1

Correct (sign bits extended):
                    a1    a0
                x   b1    b0
      a1b0  a1b0  a1b0  a0b0
      a1b1  a1b1  a0b1
There is a way to correct this fault that does not need to extend all of the bits in the partial
product addition.
When 2's complement partial products are added in carry save arithmetic, all numbers to be
added in one adder stage have to be of equal bit length. Therefore, the sign bits of the partial
product(s) in the first row and the sum and carry signals of each adder row are extended up to the
most significant sign bit of the number with the largest absolute value to be added in this stage.
The sign bit extension results in a higher capacitive load (fan out) on the sign bit signals
compared to the load on other signals, and accordingly slows down the circuit.
Algorithms exist for adding two partial products (A+B) that eliminate the need for sign
bit extension:
1. Extend the sign bit of A by one bit and invert this extended bit.
2. Invert the sign bit of B.
3. Add A and B. Add 1 to one position left of the MSB of B.

Here is an example of 6-bit signed addition (~ marks an inverted bit).
With sign extension:
      a5   a5   a5   a4  ...  a1   a0
 +    b5   b5   b4  ...  b1   b0
Without sign extension:
          ~a5   a5   a4  ...  a1   a0
 +     1  ~b5   b4  ...  b1   b0
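The three steps above can be checked numerically with the sketch below. It rests on assumptions read off the 6-bit layout: B is taken to sit one bit position to the left of A (as for two adjacent partial-product rows), the result is taken modulo 2^(n+2), and the names to_signed and add_without_sign_extension are hypothetical.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a 2's complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def add_without_sign_extension(a: int, b: int, n: int = 6) -> int:
    """Return a + 2*b for n-bit 2's complement a and b, using the three
    steps instead of full sign extension."""
    mask = (1 << n) - 1
    a_bits = a & mask
    b_bits = b & mask
    a_sign = a_bits >> (n - 1)
    # Step 1: extend the sign bit of A by one bit and invert the extended bit.
    a_row = a_bits | ((a_sign ^ 1) << n)
    # Step 2: invert the sign bit of B (B sits one position to the left of A).
    b_row = (b_bits ^ (1 << (n - 1))) << 1
    # Step 3: add A and B, plus a 1 one position left of the MSB of B.
    total = a_row + b_row + (1 << (n + 1))
    return to_signed(total, n + 2)


if __name__ == "__main__":
    print(add_without_sign_extension(-3, 5), -3 + 2 * 5)
```

The inverted bits and the extra 1 together add exactly 2^(n+2) to the true sum, which vanishes when only the low n+2 bits are kept.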
In general we can invert all the sign bits and add a 1 to column n, as shown in the diagram
below (S = sign bit, X = data bit, * = position that no longer needs sign extension):
Column:  10   9   8   7   6   5   4   3   2   1   0
Add 1:                        1
Row 1:    *   *   *   *   *  S1   X   X   X   X   X
Row 2:    *   *   *   *  S2   X   X   X   X   X
Row 3:    *   *   *  S3   X   X   X   X   X
(invert all sign bits)
FIG:: 1.5.2 2's complement

It is possible, however, to simplify this further and use the following template. Extend the sign of
the first partial product row by 1 bit and invert this bit. Invert all other sign bits of all partial
products, as shown below (~ marks an inverted bit):
Column:  10   9   8   7   6   5   4   3   2   1   0
Row 1:    *   *   *   * ~S1  S1   X   X   X   X   X     (extend sign bit and invert)
Row 2:    *   *   *   * ~S2   X   X   X   X   X
Row 3:    *   *   * ~S3   X   X   X   X   X
Row 4:    *   * ~S4   X   X   X   X   X
(invert all other sign bits)
Below are some examples of this method.
Example 1
   -2 (decimal) = 110 (binary)
 x  3 (decimal) = 011 (binary)
   -6 (decimal) = 11010 (binary), the 2's complement of 6

By the sign extension method:
      110       (-2)
    x 011       (3)
    -----
    11110       sign bits extended
    1110        sign bits extended
    000
    -----
    11010       This is the 2's complement of 6

Now, according to the algorithm:
      110
    x 011
    -----
     0110       extended sign bit, inverted
     010        inverted sign bit
    100         inverted sign bit
    -----
    11010       This is the 2's complement of 6
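The simplified template (extend and invert the sign of the first row, invert the sign bit of every other row) can be checked with the following sketch. The assumptions are: the multiplier is non-negative with its sign bit equal to 0, the result is read from the low 2n-1 bits, and the function name template_multiply is made up for illustration.

```python
def template_multiply(a: int, b: int, n: int = 3) -> int:
    """Multiply an n-bit 2's complement multiplicand a by a non-negative
    n-bit multiplier b (sign bit 0) using the sign template."""
    if not (-(1 << (n - 1)) <= a < (1 << (n - 1))) or not (0 <= b < (1 << (n - 1))):
        raise ValueError("a must be n-bit signed, b must be non-negative with sign bit 0")
    mask = (1 << n) - 1
    sign_bit = 1 << (n - 1)
    total = 0
    for j in range(n):
        row = (a & mask) if (b >> j) & 1 else 0     # partial product row j
        if j == 0:
            sign = (row >> (n - 1)) & 1
            row |= (sign ^ 1) << n                  # extend by one bit, invert that bit
        else:
            row ^= sign_bit                         # invert the sign bit
        total += row << j
    width = 2 * n - 1
    total &= (1 << width) - 1                       # keep the low 2n-1 bits
    return total - (1 << width) if total >> (width - 1) else total


if __name__ == "__main__":
    # Example 1 of this section: -2 x 3, rows 0110, 010 and 100.
    print(template_multiply(-2, 3, n=3))
```

For -2 x 3 the three rows are 0110, 010 and 100, exactly as in Example 1, and their shifted sum is 11010.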

The diagram below shows the architecture of a 32-bit array multiplier. (Please note that the design is
modified to take care of 2's complement numbers.)
FIG:: 1.5.3 Array multiplier for a 32-bit number (2's complement numbers)
2. EXISTING SYSTEM
Modern embedded systems target high-end application domains requiring efficient
implementations of computationally intensive digital signal processing (DSP) functions. The
incorporation of heterogeneity through specialized hardware accelerators improves performance
and reduces energy consumption. Although application-specific integrated circuits (ASICs) form
the ideal acceleration solution in terms of performance and power, their inflexibility leads to
increased silicon complexity, as multiple instantiated ASICs are needed to accelerate various
kernels. Many researchers have proposed the use of domain-specific coarse-grained
reconfigurable accelerators in order to increase the flexibility of ASICs without significantly
compromising their performance.
The aforementioned reconfigurable architectures exclude arithmetic optimizations during
architectural synthesis and consider them only at the internal circuit structure of primitive
components (e.g., adders) during logic synthesis. However, research has shown
that arithmetic optimizations at abstraction levels higher than the structural circuit level
significantly affect data path performance.
Timing-driven optimizations based on carry-save (CS) arithmetic have been performed at the
post-Register Transfer Level (RTL) design stage. Common subexpression elimination in CS
computations has been used to optimize linear DSP circuits, and transformation techniques on the
application's DFG have been developed to maximize the use of CS arithmetic prior to the actual data
path synthesis. These CS optimization approaches target inflexible data path (i.e., ASIC)
implementations.
However, all of the aforementioned solutions share an inherent limitation: CS optimization
is restricted to merging only additions/subtractions. A CS-to-binary conversion is inserted before
each operation that differs from addition/subtraction (e.g., multiplication), so
multiple CS-to-binary conversions are allocated, which heavily degrades performance due to
time-consuming carry propagations.
Disadvantages
High area
High power consumption
3. PROPOSED SYSTEM
FIG::3.1 Abstract form of the flexible data path
FIG::3.2 Flexible computational unit

3.2 INTRODUCTION:
The multiplication processes described in the previous chapters are clearly heavy and
time-consuming. Those techniques are suitable when the two operands are short. When the
operands are long, they take a long time to complete, and both the area occupied and the
power consumed by the hardware are very high.
The earlier process was too slow, and its power dissipation was too high to be
commercially acceptable. Today, both time and power are precious:
we need to do more work using less power.
This simple idea inspired the present work. With a modified multiplication process,
namely the modified Booth algorithm, less power is used, and the area occupied by the
hardware is also reduced.
The modified Booth algorithm reduces the number of partial products that take part in the
final addition, which reduces the computation time.
3.3 Booth Multipliers:

Booth's algorithm is a powerful algorithm for signed-number multiplication, which treats both
positive and negative numbers uniformly.
For the standard add-shift operation, each multiplier bit generates one multiple of the
multiplicand to be added to the partial product. If the multiplier is very large, then a large
number of multiplicands have to be added. In this case the delay of the multiplier is determined
mainly by the number of additions to be performed. If there is a way to reduce the number of
additions, the performance will improve.
The Booth algorithm is a method that reduces the number of multiplicand multiples. For a given
range of numbers to be represented, a higher representation radix leads to fewer digits. Since a
k-bit binary number can be interpreted as a k/2-digit radix-4 number, a k/3-digit radix-8 number,
and so on, high-radix multiplication can deal with more than one bit of the multiplier in each
cycle.

This is shown for radix-4 in the example below.
[Figure: radix-4 multiplication in dot notation. The multiplicand A and the multiplier B = (B3 B2)(B1 B0) give the partial products (B1B0)_2 A 4^0 and (B3B2)_2 A 4^1, which are added to form the product P.]
As shown in the figure above, if multiplication is done in radix 4, in each step the partial
product term (Bi+1Bi)_2 A needs to be formed and added to the cumulative partial product,
whereas in radix-2 multiplication each row of dots in the partial products matrix represents 0 or
a shifted version of A that must be included and added.
Table 1 below is used to convert a binary number to a radix-4 number.
Initially, a 0 is placed to the right of the least significant bit of the multiplier. Then groups of
3 bits of the multiplier are recoded according to the table below, or according to the following equation:
Z_i = -2x_{i+1} + x_i + x_{i-1}
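Example:
The multiplier is equal to 0 1 0 1 1 1 0. A 0 is added on the left to make the number of bits even: 0 0 1 0 1 1 1 0.
Then a 0 is placed to the right of the least significant bit, which gives 0 0 1 0 1 1 1 0 0.
The bits are selected 3 at a time, with the leftmost bit of each group overlapping the next group, as follows:
0 0 1  ->  +1
1 0 1  ->  -1
1 1 1  ->   0
1 0 0  ->  -2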

Table 1 Radix-4 Booth recoding
x_{i+1}   x_i   x_{i-1}   Z_{i/2}
   0       0       0         0
   0       0       1         1
   0       1       0         1
   0       1       1         2
   1       0       0        -2
   1       0       1        -1
   1       1       0        -1
   1       1       1         0
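The recoding rule Z_i = -2x_{i+1} + x_i + x_{i-1} and Table 1 can be written as a small function. This is a sketch under assumptions: the name booth_radix4_digits is hypothetical, the input is taken to be an n-bit 2's complement number with n even, and the digits are returned least significant first.

```python
def booth_radix4_digits(x: int, n: int) -> list[int]:
    """Recode an n-bit 2's complement number (n even) into radix-4 digits
    in {-2, -1, 0, +1, +2}, least significant digit first."""
    if n % 2 != 0:
        raise ValueError("extend the sign bit by one position so that n is even")
    bits = x & ((1 << n) - 1)
    padded = bits << 1                        # place a 0 to the right of the LSB
    digits = []
    for i in range(0, n, 2):
        group = (padded >> i) & 0b111         # bits x(i+1), x(i), x(i-1)
        x_im1 = group & 1
        x_i = (group >> 1) & 1
        x_ip1 = (group >> 2) & 1
        digits.append(-2 * x_ip1 + x_i + x_im1)
    return digits


if __name__ == "__main__":
    # Multiplier 0 0 1 0 1 1 1 0 from the example above.
    print(booth_radix4_digits(0b00101110, 8))
```

Read most significant digit first, the result for 00101110 is +1, -1, 0, -2, matching the groups 001, 101, 111 and 100 of the example.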
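For example, a binary number can be converted into a signed-digit radix-4 number:
(10 01 11 01 10 10 11 10)2 = (-2 2 -1 2 -1 -1 0 -2)4
Here the 16 bits are read as a 2's complement number; if they are read as an unsigned number, one
extra leading digit of 1 is needed. The multiplier bit-pair recoding is shown in Table 2.
Table 2 Multiplier recoding
x_{i+1}   x_i   x_{i-1}   Multiple added
   0       0       0      +0*multiplicand
   0       0       1      +1*multiplicand
   0       1       0      +1*multiplicand
   0       1       1      +2*multiplicand
   1       0       0      -2*multiplicand
   1       0       1      -1*multiplicand
   1       1       0      -1*multiplicand
   1       1       1      -0*multiplicand
Here -2*multiplicand is actually the 2's complement of the multiplicand with an equivalent
left shift of one bit position. Also, +2*multiplicand is the multiplicand shifted left one bit
position, which is equivalent to multiplying by 2.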

To enter 2*multiplicand into the adder, an (n+1)-bit adder is required. In this case, the
multiplicand is offset one bit to the left to enter the adder, while a 0 is added in the low-order
multiplicand position. Each time, the partial product is shifted two bit positions to the
right and the sign is extended to the left.
During each add-shift cycle, a different version of the multiplicand is added to the new partial
product, depending on the digit obtained from the bit-pair recoding table above.
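Let's see some examples:
Example 1:
          000011       multiplicand (+3)
        x 011101 0     multiplier (+29), with a 0 appended on the right
          +2 -1 +1     recoded digits
    ------------
    000000000011       +1 x multiplicand
    1111111101         -1 x multiplicand, shifted 2 bits
    00000110           +2 x multiplicand, shifted 4 bits
    ------------
  1 000001010111       (+87), the carry out is discarded
Example 2:
          111101       multiplicand (-3)
        x 011101 0     multiplier (+29), with a 0 appended on the right
          +2 -1 +1     recoded digits
    ------------
    111111111101       +1 x multiplicand
    0000000011         2's complement of multiplicand, shifted 2 bits
    11111010           +2 x multiplicand, shifted 4 bits
    ------------
  1 111110101001       (-87), the carry out is discarded
Example 3:
          111101       multiplicand (-3)
        x 100011 0     multiplier (-29), with a 0 appended on the right
          -2 +1 -1     recoded digits
    ------------
    000000000011       2's complement of multiplicand
    1111111101         +1 x multiplicand, shifted 2 bits
    00000110           shifted 2's complement of multiplicand, shifted 4 bits
    ------------
  1 000001010111       (+87), the carry out is discarded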
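The three examples can be reproduced with the sketch below, which recodes the multiplier and adds one shifted multiple of the multiplicand per digit. The function names radix4_digits and booth_radix4_multiply are hypothetical, and Python's unbounded integers replace the explicit sign extension, so the 12-bit pattern is obtained by masking at the end.

```python
def radix4_digits(x: int, n: int) -> list[int]:
    """Radix-4 Booth digits of an n-bit 2's complement number, LSB digit first."""
    padded = (x & ((1 << n) - 1)) << 1              # append a 0 on the right
    digits = []
    for i in range(0, n, 2):
        group = (padded >> i) & 0b111
        digits.append(-2 * (group >> 2) + ((group >> 1) & 1) + (group & 1))
    return digits


def booth_radix4_multiply(multiplicand: int, multiplier: int, n: int = 6) -> int:
    """Add one multiple (0, +/-1 or +/-2 times the multiplicand) per digit,
    shifted two bit positions per digit."""
    product = 0
    for k, digit in enumerate(radix4_digits(multiplier, n)):
        product += (digit * multiplicand) << (2 * k)
    return product


if __name__ == "__main__":
    for a, b in [(3, 29), (-3, 29), (-3, -29)]:
        p = booth_radix4_multiply(a, b)
        print(a, b, p, format(p & 0xFFF, "012b"))   # 12-bit product pattern
```

Only three additions are needed for a 6-bit multiplier, against up to six for the plain add-shift method.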
3.4 Comparison of Booth and shift and add methods::

3.5 Hardware implementation of Booth::
Once the partial products are generated, the addition process is very
similar to that of the array multiplier. Usually carry save adders are used, with the final sum
added using a CRA. Since the Booth method applies to 2's complement arithmetic, care must be taken to
ensure that sign extensions are in place, as shown by the red dots in the following diagram.
Fig:: 3.5 H/W implementation of Booth

Several techniques exist that reduce this task with ready-made templates.
Once the table of partial products is drawn, all the rows of the partial products have to be
arithmetically extended to 2*N, where N is the length of the multiplicand. This is necessary to
obtain correct results, but it increases the capacitive load, the area and the computation time.
Instead, the template below can be used (taken from the book Advanced Computer Arithmetic
Design by M. J. Flynn and S. F. Oberman, Wiley) to reduce the calculation. In this template,
the numbers are 16 bits wide and the 17th bit is the sign bit. Also, the partial products on each row
are entered as 1's complement numbers. If 2's complement numbers are used, then the S entries on
the right side can be removed. (Please note that the S bit is the sign bit of the Booth encoding of
that row.)
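[Figure: two layouts for the product P = A x B, side by side. Left, the sign template. Right, full sign extension, in which row 1 repeats S1 across all its extended positions, row 2 repeats S2, row 3 repeats S3, and row 4 has a single S4.]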
Example of using the template:
Let us multiply 25 * -35.
A = +25   00011001
B = -35   11011101
Now decode the multiplier, with a 0 appended on the right: 1 1 0 1 1 1 0 1 0
Recoded digits (most significant first): -1  +2  -1  +1

Check these values:
B = -1 * 4^3 + 2 * 4^2 - 1 * 4^1 + 1 * 4^0 = -35

          00011001
       x 110111010
    --------------
    00000000011001     * 1
    111111100111       * -1
    0000110010         * 2
    11100111           * -1
    --------------
    11110010010101
This is a negative number. Convert it.
Now, in order to reduce the computation, the extra computing units and the capacitive load, use the
provided template as below.

Using the template for 25 * -35 (S = sign bit of the row, ~S = inverted sign bit):
          00011001
       x 110111010
    --------------
       10000011001     * 1     Add ~S S S
       1011100111      * -1    Add inverted sign bit and add 1
       100110010       * 2     Add inverted sign bit
       1100111         * -1    No sign bit
    --------------
    11110010010101
This is a negative number. Convert it:
00001101101011
512 + 256 + 64 + 32 + 8 + 2 + 1 = 875
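The arithmetic of this template example can be verified by summing the four rows with their shifts. The sketch assumes that each row is shifted two bit positions further left than the previous one, as in radix-4 Booth multiplication, and that the result is 14 bits wide; the name sum_template_rows is made up for the check.

```python
def sum_template_rows(rows: list[tuple[str, int]], width: int) -> int:
    """Add binary row strings, each shifted left by the given amount, and
    read the low `width` bits as a 2's complement number."""
    total = sum(int(bits, 2) << shift for bits, shift in rows)
    pattern = total & ((1 << width) - 1)
    return pattern - (1 << width) if pattern >> (width - 1) else pattern


# Rows of the 25 * -35 template example, with shifts of 0, 2, 4 and 6 bits.
TEMPLATE_ROWS = [
    ("10000011001", 0),   # * 1
    ("1011100111", 2),    # * -1
    ("100110010", 4),     # * 2
    ("1100111", 6),       # * -1
]

if __name__ == "__main__":
    value = sum_template_rows(TEMPLATE_ROWS, 14)
    print(value, format(value & 0x3FFF, "014b"))
```

The rows add up to the same 14-bit pattern as the fully sign-extended version, so the template changes only the amount of hardware, not the result.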

3.6 Modified Booth Algorithm::
The Booth multiplication algorithm consists of three major steps, as shown in the structure of
the Booth algorithm figure: generation of the partial products (called recoding), reduction of the
partial products to two rows, and the addition that gives the final product.
For a better understanding of the modified Booth algorithm for multiplication, we must know
about each block of the Booth algorithm.
3.7 Modified Booth Algorithm Encoder::
The modified Booth multiplier is used to perform high-speed multiplications using the modified
Booth algorithm. Its computation time is proportional to the logarithm of the
word length of the operands. The number of partial products can be reduced by half.
The radix-4 Booth algorithm used here increases the speed of the multiplier and reduces the
area of the multiplier circuit. In this algorithm, every second column is taken and multiplied by 0,
+1, +2, -1 or -2, instead of multiplying by 0 or 1 after shifting and adding every column
of the Booth multiplier. Thus, the number of partial products is halved using this Booth
algorithm. The radix-4 Booth encoder selects the multiple of the multiplicand according to the
multiplier bits.
Overlapping groups are used to examine three bits at a time. The grouping starts from the least
significant bit (LSB): the first block uses only two bits of the Booth multiplier,
and a zero is assumed as the third bit, as shown in the figure.
The figure shows the functional operation of the radix-4 Booth encoder, which consists of eight
different states. The multiples of the multiplicand by 0, ±1 and ±2 are
obtained during these eight states.

Booth Recoding Table for Radix-4
The steps given below represent the radix-4 Booth algorithm:
1. Extend the sign bit by 1 position if necessary to ensure that n is even.
2. Append a 0 to the right of the least significant bit of the Booth multiplier.
3. According to the value of each vector, each partial product will be 0, +y, -y, +2y or -2y.
Fig:: 3.6 Booth's Encoder
The digits Z of the modified Booth multiplier can be defined with the following equation:
Z_j = q_{2j} + q_{2j-1} - 2q_{2j+1}, with q_{-1} = 0
The figure shows the modified Booth algorithm encoder circuit. The product of any digit of
Z with the multiplicand y may be -2y, -y, 0, y or 2y. The multiple 2y is generated by a left shift
operation at the partial products generation stage. Negation is done by taking the 1's complement
of y or 2y and then adding one in the appropriate 4-2 compressor. One Booth encoder, shown in the
figure, generates three output signals from three consecutive input bits so as to represent all
five possibilities -2y, -y, 0, y and 2y.
3.8 Partial Product Generator::
Fig:: 3.8 Partial Product Generator
If the partial products are -2y, -y, 0, y, and 2y, the general partial product generator has
to be modified. Each partial product point now takes two inputs (consecutive bits) from the
multiplicand and generates the required output, together with its complement if needed. The
figure shows the partial product generator circuit.
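A behavioural sketch of such a generator is given below. It is not the gate-level circuit of the figure: the entity name pp_gen_r4_sketch and its ports are assumptions, the recoded digit is passed as an integer, and negation is done with a full two's complement instead of the 1's complement plus a late carry-in described above. The output is assumed to be 10 bits wide so that -2 times -128 still fits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: forms one radix-4 partial product (digit * y).
entity pp_gen_r4_sketch is
  port (
    y     : in  std_logic_vector(7 downto 0);  -- multiplicand, 2's complement
    digit : in  integer range -2 to 2;         -- recoded Booth digit
    pp    : out std_logic_vector(9 downto 0)   -- 10 bits: -2*(-128) = +256 fits
  );
end entity pp_gen_r4_sketch;

architecture behavioral of pp_gen_r4_sketch is
begin
  process (y, digit)
    variable ys : signed(9 downto 0);
  begin
    ys := resize(signed(y), 10);               -- sign-extend the multiplicand
    case digit is
      when  1     => pp <= std_logic_vector(ys);
      when  2     => pp <= std_logic_vector(shift_left(ys, 1));   -- 2y
      when -1     => pp <= std_logic_vector(-ys);                 -- -y
      when -2     => pp <= std_logic_vector(-shift_left(ys, 1));  -- -2y
      when others => pp <= (others => '0');                       -- 0
    end case;
  end process;
end architecture behavioral;
```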
The 2's complement is taken for negative multiples of y. There are different types of adders,
such as conventional adders, ripple-carry adders, carry-look-ahead adders, and carry-select
adders. The carry-select adders (CSLA) and carry-look-ahead adders are considered the fastest
adders and are frequently used. Multiplication of y by 2 is done by shifting y to the left by
one bit.
Hence, an n-bit parallel multiplier designed with the Booth algorithm generates only n/2
partial products. Thus, the propagation delay, the complexity of the circuit, and the power
consumption can be reduced. A simple practical example of the modified Booth algorithm is
shown in the figure below.

Practical Multiplication Example using Modified Booth Algorithm

3.9 NON-REDUNDANT RADIX-4 SIGNED DIGIT ALGORITHM:

In this section, we present the Non-Redundant radix-4 Signed-Digit (NR4SD) encoding
technique. As in MB form, the number of partial products is reduced to half.
A redundant binary representation (RBR) is a numeral system that uses more bits than needed
to represent a single binary digit so that most numbers have several representations. An RBR is
unlike usual binary numeral systems, including two's complement, which use a single bit for
each digit. Many of an RBR's properties differ from those of regular binary representation
systems. Most importantly, an RBR allows addition without using a typical carry.[1] When
compared to non-redundant representation, an RBR makes bitwise logical operation slower, but
arithmetic operations are faster when a greater bit width is used.[2] Usually, each digit has its own
sign that is not necessarily the same as the sign of the number represented. When digits have
signs, that RBR is also a signed-digit representation. An RBR is a place-value notation system.
In an RBR, digits are pairs of bits, that is, for every place, an RBR uses a pair of bits. The value
represented by a redundant digit can be found using a translation table. This table indicates the
mathematical value of each possible pair of bits.
As in conventional binary representation, the integer value of a given representation is a

weighted sum of the values of the digits. The weight starts at 1 for the rightmost position and
goes up by a factor of 2 for each next position. Usually, an RBR allows negative values. There is
no single sign bit that tells if a redundantly represented number is positive or negative. Most
integers have several possible representations in an RBR.
Often one of the several possible representations of an integer is chosen as the "canonical" form,
so each integer has only one possible "canonical" representation; non-adjacent form and two's
complement are popular choices for that canonical form.
An integer value can be converted back from an RBR using the following formula, where n is the
number of digits and $d_k$ is the interpreted value of the k-th digit, where k starts at 0 at the
rightmost position:
$\sum_{k=0}^{n-1} d_k 2^k$
The proposed system aims to reduce the delay and to provide an efficient architecture for the
pre-encoded multiplier design. When encoding the 2's complement number B, digits $b_j^{NR-}$
take one of four values: $\{-2, -1, 0, +1\}$ or $b_j^{NR+} \in \{-1, 0, +1, +2\}$ in the
NR4SD$^-$ or NR4SD$^+$ algorithm, respectively. Only four different values are used, and not
five as in the MB algorithm, for $0 \le j \le k-2$. As we need to cover the dynamic range of
the 2's complement form, the most significant digit is MB encoded (i.e.,
$b_{k-1}^{MB} \in \{-2, -1, 0, +1, +2\}$). The coefficients are used in non-redundant radix-4
signed-digit form.
This encoding technique gives a less complex partial product implementation and a more area-
and power-efficient design. Analysis verifies that the proposed system is more efficient than
the existing system. The proposed NR4SD encoding scheme uses one of the following sets of digit
values: $\{-1, 0, +1, +2\}$ or $\{-2, -1, 0, +1\}$. In order to cover the dynamic range of the
2's complement form, all digits of the proposed representation are encoded according to NR4SD
except the most significant one, which is MB encoded. Using the proposed encoding formula, we
pre-encode the standard coefficients and store them into a ROM in a condensed form (i.e., 2
bits per digit). Compared to the pre-encoded MB multiplier, in which the encoded coefficients
need 3 bits per digit, the proposed NR4SD scheme reduces the memory size. Also, compared to the
MB form, which uses five digit values $\{-2, -1, 0, +1, +2\}$, the proposed NR4SD encoding uses
four digit values. Thus, the NR4SD-based pre-encoded multipliers include a less complex partial
products generation circuit. We explore the efficiency of the aforementioned pre-encoded
multipliers taking into account the size of the coefficients ROM.

Fig.3.9 Block Diagram of the NR4SD Encoding Scheme at the (a) Digit and (b) Word Level.
The NR4SD$^-$ and NR4SD$^+$ encoding algorithms are illustrated in detail in the above figures,
respectively.
3.10 RADIX-8 MODIFIED BOOTH ALGORITHM:
Booth's algorithm involves repeatedly adding one of two predetermined values to a product P,
and then performing a rightward arithmetic shift on P.
The multiplier architecture combines two architectures, i.e., Modified Booth and Wallace tree.
Based on the study of various multiplier architectures, we find that Modified Booth increases
the speed because it reduces the number of partial products to half. Further, the delay in the
multiplier can be reduced by using a Wallace tree. The power consumption of a Wallace tree
multiplier is also less than that of Booth and array multipliers. Features of both multipliers
can be combined to produce a high-speed, low-power multiplier. The Modified Booth multiplier
contains a Modified Booth Recoder (MBR). The MBR has two parts, i.e., the Booth Encoder (BE)
and the Booth Selector (BS). The basic operation of the BE is to decode the multiplier signal;
its output is used by the BS to generate the partial products. The partial products are then
added with the Wallace tree adders, similar to the carry-save adder approach. The last row of
carry and sum outputs is added by a carry-look-ahead adder, with the carry skewed to the left
by one position.
Radix-8 Booth encoding is most often used to avoid variable-size partial product arrays. Before
designing the radix-8 BE, the multiplier has to be converted into a radix-8 number by dividing
it into overlapping groups of four bits according to the Booth encoder table given afterwards.
Before the multiplier is converted, a zero is appended to the right of its least significant
bit (LSB).
fig::Radix-8 booth recoding
Radix-8 Booth recoding applies the same algorithm as radix-4, but now we take quartets of bits
instead of triplets. Each quartet is codified as a signed digit using the table below. The
radix-8 algorithm reduces the number of partial products to n/3, where n is the number of
multiplier bits. Thus, it allows a time gain in the partial product summation.
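The quartet recoding can be sketched as a VHDL function. This is a minimal illustration under stated assumptions: the package name booth_r8_sketch and the function names are introduced here, and the digit is computed arithmetically from the standard radix-8 weighting (-4, +2, +1, +1) of the four bits instead of being read from the recoding table.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package (name is an assumption): radix-8 Booth digit of one
-- overlapping 4-bit group of the multiplier.
package booth_r8_sketch is
  -- quartet = (x(3j+2), x(3j+1), x(3j), x(3j-1)); x(-1) is taken as '0'.
  function booth_r8_digit(quartet : std_logic_vector(3 downto 0))
    return integer;
end package booth_r8_sketch;

package body booth_r8_sketch is
  -- Helper: numeric value of a single bit (anything other than '1' counts as 0).
  function bit_val(b : std_logic) return integer is
  begin
    if b = '1' then
      return 1;
    else
      return 0;
    end if;
  end function bit_val;

  function booth_r8_digit(quartet : std_logic_vector(3 downto 0))
    return integer is
  begin
    -- digit = x(3j-1) + x(3j) + 2*x(3j+1) - 4*x(3j+2), in the range -4 .. +4
    return bit_val(quartet(0)) + bit_val(quartet(1))
           + 2 * bit_val(quartet(2)) - 4 * bit_val(quartet(3));
  end function booth_r8_digit;
end package body booth_r8_sketch;
```

For example, the 6-bit multiplier 011010 (decimal 26) gives the quartets "0100" and "0110", that is, the digits 2 and 3, and 3*8 + 2 = 26.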

Here we have an odd multiple of the multiplicand, 3Y, which is not immediately available. To
generate it we need to perform a prior addition: 2Y+Y=3Y. But we are designing a multiplier
for a specific purpose, and the multiplicand therefore belongs to a previously known set of
numbers which are stored in a memory chip. We have tried to take advantage of this fact to
ease the bottleneck of the radix-8 architecture, that is, the generation of 3Y. In this manner
we try to attain a better overall multiplication time, or at least one comparable to the time
we could obtain using a radix-4 architecture (with the additional advantage of using fewer
transistors). To generate 3Y with 21-bit words we only have to add 2Y+Y, that is, to add the
number to the same number shifted one position to the left.
A partial product is a product formed by multiplying the multiplicand by one digit of the
multiplier when the multiplier has more than one digit. Partial products are used as
intermediate steps in calculating larger products.

The partial product generator is designed to produce the product of the multiplicand A and 0,
1, -1, 2, -2, 3, -3, 4, or -4. Multiply by 0 means the product is zero. Multiply by 1 means the
product is the same as the multiplicand value. Multiply by -1 means that the product is the
two's complement of the multiplicand. Multiply by 2 means shifting the multiplicand left by
one place, and multiply by -2 means shifting the two's complement of the multiplicand left by
one bit. Multiply by 4 means shifting the multiplicand left by two places, and multiply by -4
means shifting the two's complement of the multiplicand left by two bits. The odd multiple 3Y
is not immediately available; with 8-bit words it is generated by adding 2Y+Y, that is, by
adding the number to the same number shifted one position to the left.
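The multiples described above can be sketched behaviourally as follows. The entity name pp_gen_r8_sketch and its ports are assumptions, not the PARTIAL_PRODUCT component of the listing; the recoded digit is passed as an integer, 3Y is formed as 2Y + Y, and the output is assumed to be 11 bits wide so that -4 times -128 still fits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: forms one radix-8 partial product (digit * y).
entity pp_gen_r8_sketch is
  port (
    y     : in  std_logic_vector(7 downto 0);   -- multiplicand, 2's complement
    digit : in  integer range -4 to 4;          -- recoded radix-8 Booth digit
    pp    : out std_logic_vector(10 downto 0)   -- 11 bits: -4*(-128) = +512 fits
  );
end entity pp_gen_r8_sketch;

architecture behavioral of pp_gen_r8_sketch is
begin
  process (y, digit)
    variable y1, y2, y3, y4, mag : signed(10 downto 0);
  begin
    y1 := resize(signed(y), 11);   -- Y, sign-extended
    y2 := shift_left(y1, 1);       -- 2Y: shift left by one place
    y3 := y2 + y1;                 -- 3Y: the extra addition 2Y + Y
    y4 := shift_left(y1, 2);       -- 4Y: shift left by two places
    case abs(digit) is
      when 1      => mag := y1;
      when 2      => mag := y2;
      when 3      => mag := y3;
      when 4      => mag := y4;
      when others => mag := (others => '0');
    end case;
    if digit < 0 then
      pp <= std_logic_vector(-mag);  -- two's complement for negative digits
    else
      pp <= std_logic_vector(mag);
    end if;
  end process;
end architecture behavioral;
```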
3.11 SIGN EXTENSION CORRECTOR:

The Sign Extension Corrector is designed to enable the Booth multiplier to multiply signed as
well as unsigned numbers. Its working principle is as follows. A one-bit control signal called
the signed-unsigned (s_u) bit indicates whether the multiplication operates on signed or
unsigned numbers. When s_u=0, it indicates unsigned number multiplication, and when s_u=1, it
indicates signed number multiplication.
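One common way to realise such a control bit is to extend each operand by one bit before Booth recoding. The sketch below assumes this approach; the entity name sign_ext_sketch and the port names a and a_ext are introduced here, and the report's own corrector circuit may differ.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: extends an 8-bit operand to 9 bits under control of s_u.
entity sign_ext_sketch is
  port (
    s_u   : in  std_logic;                     -- '0' = unsigned, '1' = signed
    a     : in  std_logic_vector(7 downto 0);  -- operand
    a_ext : out std_logic_vector(8 downto 0)   -- extended operand
  );
end entity sign_ext_sketch;

architecture rtl of sign_ext_sketch is
begin
  -- The extra MSB copies the sign bit for signed operands and is '0' for
  -- unsigned ones, so the 9-bit value can always be treated as 2's complement.
  a_ext <= (s_u and a(7)) & a;
end architecture rtl;
```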

An example of the radix-8 modified Booth encoding algorithm is shown in the above figure.
FIG:: BLOCK DIAGRAM OF RADIX-8 MODIFIED BOOTH

CHAPTER IV
4.1 Why VHDL?
A design engineer in the electronics industry uses a hardware description language to keep
pace with the productivity of competitors. With VHDL we can quickly describe and synthesize
circuits of several thousand gates. In addition, VHDL provides the capabilities described as
follows:
4.2 Power and flexibility
VHDL has powerful language constructs with which to write succinct code descriptions of
complex control logic. It also has multiple levels of design description for controlling design
implementation. It supports design libraries and the creation of reusable components. It
provides design hierarchies to create modular designs. It is one language for design and
simulation.
4.3 Device Independent design
VHDL permits us to create a design without having to first choose a device for
implementation. With one design description, we can target many device architectures. Without
being familiar with a device, we can optimize our design for resource use or performance. VHDL
permits multiple styles of design description.
4.4 Portability
VHDL portability permits us to simulate the same design description that we have
synthesized. Simulating a large design description before synthesizing can save considerable
time. As VHDL is a standard, a design description can be taken from one simulator to another,
one synthesis tool to another, and one platform to another, which means a description can be
used in multiple projects.
4.5 Benchmarking capabilities
Device-independent design and portability allow benchmarking a design using different
device architectures and different synthesis tools. We can take a complete design description,
synthesize it, create logic for it, evaluate the results and finally choose the device, a CPLD
or an FPGA, that fits our requirements.
4.6 ASIC Migration

The efficiency that VHDL provides allows our product to reach the market quickly if it has
been synthesized on a CPLD or FPGA. When production volume reaches appropriate levels,
VHDL facilitates the development of an application-specific integrated circuit (ASIC).
Sometimes, the exact code used with the PLD can be used with the ASIC, and because VHDL is a
well-defined language, we can be assured that our ASIC vendor will deliver a device with the
expected functionality.
4.7 VHDL DESCRIPTION
In the search for a standard design and documentation tool for the Very High Speed Integrated
Circuits (VHSIC) program, the United States Department of Defense (DOD) in 1981 sponsored a
workshop on Hardware Description Languages (HDL) at Woods Hole, Massachusetts. In 1983,
the DOD established requirements for a standard VHSIC Hardware Description Language
(VHDL). The contract for the development of the language, its environment and its software
was awarded to IBM, Texas Instruments and Intermetrics corporations.
VHDL 2.0 was released only after the project was begun. The language was significantly
improved, correcting the shortcomings of the earlier versions; VHDL 6.0 was released in 1984.
VHDL formally became an IEEE standard hardware description language, IEEE Std 1076, in 1987.
A VHDL design is defined as an entity declaration and an associated architecture
body. The declaration specifies its interface and is used by architecture bodies of design
entities at upper levels of the hierarchy. The architecture body describes the operation of a
design entity by specifying its interconnection with other design entities (structural
description), by its behavior (behavioral description), or by a mixture of both. The VHDL
language groups subprograms or design entities by use of packages.
For customizing generic descriptions of design entities, configurations are used. VHDL
also supports libraries and contains constructs for accessing packages, design entities or
configurations from various libraries.
Entities and Architectures
Entity Declaration:
The ENTITY declaration declares the name, direction and data type of each port of the
component.
Syntax: entity name is
port ( );
end name;
Architecture Declaration:
The ARCHITECTURE portion of a VHDL description describes the behavior of the
component.
Syntax: architecture <architecture name> of <entity name> is
begin
end <architecture name>;
The BEGIN that follows the signal declarations marks the start of the architecture body. There
may follow a process declaration, marked by the keyword PROCESS and an ensuing BEGIN.
The END statement ending the architecture may be accompanied by the name of the
architecture, which, if present, must match the name shown in the first line of the architecture.
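As an illustration of the two parts, a minimal entity and architecture pair is sketched below. The half adder and the names half_adder and dataflow are chosen here only as an example and are not part of the MAC design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Entity declaration: name, direction and data type of each port.
entity half_adder is
  port (
    a, b  : in  std_logic;
    sum   : out std_logic;
    carry : out std_logic
  );
end half_adder;

-- Architecture body: describes the behavior of the entity.
architecture dataflow of half_adder is
begin
  sum   <= a xor b;   -- concurrent signal assignments
  carry <= a and b;
end dataflow;
```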
Sequential Processing
Sequential statements are statements that execute serially, one after the other. In the
architecture of an entity, all statements are concurrent. In VHDL, process statements can exist
in the architecture, and inside a process all statements are sequential.
Syntax:
[process-label:] process [(sensitivity list)]
process-declarative-part
begin
{sequential-statements}
end process [process-label];
A process statement has a declaration section and a statement part. In the declaration
section, types, variables, constants, subprograms, etc., can be declared. The statement part
contains only sequential statements, such as CASE statements, IF THEN ELSE statements, LOOP
statements, etc.
Sensitivity list
This list defines the signals that will cause the statements inside the process statement to
execute whenever one or more elements of the list change value, i.e., it is the list of signals
that the process is sensitive to. Changes in the values of these signals will cause the process
to be invoked.
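A small sketch of a process with a sensitivity list is given below. The D flip-flop with asynchronous reset and the name dff_sketch are assumptions chosen for illustration; the process is invoked only when clk or rst changes value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: D flip-flop with asynchronous reset.
entity dff_sketch is
  port (
    clk, rst, d : in  std_logic;
    q           : out std_logic
  );
end dff_sketch;

architecture behavioral of dff_sketch is
begin
  -- The process is sensitive to clk and rst only; a change on d alone
  -- does not invoke it.
  process (clk, rst)
  begin
    if rst = '1' then
      q <= '0';
    elsif rising_edge(clk) then
      q <= d;
    end if;
  end process;
end behavioral;
```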
Sequential Statements

Sequential statements exist inside the boundaries of a process statement, as well as in
subprograms. The sequential statements that are generally used are:
IF
CASE
LOOP
ASSERT
WAIT
IF statement
Syntax: IF condition THEN
sequence_of_statements;
{ELSIF condition THEN
sequence_of_statements;}
[ELSE
sequence_of_statements;]
END IF;
The IF statement starts with the keyword IF and ends with the keywords END IF.
There are also two optional clauses: the ELSIF clause and the ELSE clause. The
conditional construct in all cases is a Boolean expression, that is, an expression that
evaluates to either true or false. Whenever a condition evaluates to true, the sequence of
statements following it is executed. If no condition is true, the sequence of statements for
the ELSE clause is executed, if one exists. The IF statement can have multiple ELSIF statement
parts but only one ELSE statement part. Each statement part can contain more than one
sequential statement.
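The following sketch shows an IF statement with one ELSIF part and an ELSE part. The 8-bit unsigned comparator and the name compare_sketch are assumptions chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: unsigned 8-bit comparator.
entity compare_sketch is
  port (
    a, b       : in  std_logic_vector(7 downto 0);
    gt, eq, lt : out std_logic
  );
end compare_sketch;

architecture behavioral of compare_sketch is
begin
  process (a, b)
  begin
    -- Default values, so that every output is assigned on every path.
    gt <= '0';
    eq <= '0';
    lt <= '0';
    if unsigned(a) > unsigned(b) then
      gt <= '1';
    elsif unsigned(a) = unsigned(b) then
      eq <= '1';
    else
      lt <= '1';
    end if;
  end process;
end behavioral;
```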
CASE Statement
The CASE statement is used whenever a single expression value can be used to
select between a number of actions.
Syntax: CASE expression IS
case_statement_alternative
{case_statement_alternative}
END CASE;
Alternative: WHEN choices =>
sequence_of_statements
Where choice::=
simple_expression
discrete_range
element_simple_name
OTHERS
A CASE statement consists of the keyword CASE followed by an expression and the
keyword IS. The expression will either return a value that matches one of the choices in a
WHEN statement part or match an OTHERS clause. After the statements of the matching
alternative are executed, control is transferred to the statement following the END CASE clause.
The CASE statement will execute the proper statement depending on the value of the input
expression. If the value of the expression is one of the choices listed in a WHEN clause, the
statement following that WHEN clause is executed.
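A minimal CASE sketch is shown below. The 4-to-1 multiplexer and the name mux4_sketch are assumptions chosen for illustration; the OTHERS choice covers "11" as well as any metavalue on sel.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: 4-to-1 multiplexer.
entity mux4_sketch is
  port (
    sel            : in  std_logic_vector(1 downto 0);
    d0, d1, d2, d3 : in  std_logic;
    y              : out std_logic
  );
end mux4_sketch;

architecture behavioral of mux4_sketch is
begin
  process (sel, d0, d1, d2, d3)
  begin
    case sel is
      when "00"   => y <= d0;
      when "01"   => y <= d1;
      when "10"   => y <= d2;
      when others => y <= d3;
    end case;
  end process;
end behavioral;
```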
LOOP STATEMENT
The LOOP statement is used whenever an operation needs to be repeated. LOOP
statements are used when powerful iteration capability is needed to implement a model.
Syntax: [loop_label:] [iteration_scheme] LOOP
sequence_of_statements
END LOOP [loop_label];
Where iteration_scheme::=
WHILE condition
FOR loop_parameter_specification
And loop_parameter_specification::=
identifier IN discrete_range
The LOOP statement has an optional label, which can be used to identify the LOOP statement.
It also has an optional iteration scheme that determines which kind of LOOP statement is being
used. The iteration scheme includes two types of LOOP statements: a WHILE condition LOOP
statement and a FOR identifier IN discrete range statement. The FOR loop will loop as many
times as specified in the discrete range, unless the loop is exited. The WHILE condition
LOOP statement will loop as long as the condition expression is TRUE.
In some languages, the loop index can be assigned a value inside its loop to change
its value. VHDL does not allow any assignment to the index. This also precludes the loop index
from being used as the return value of a function, or as an OUT or INOUT parameter of a
procedure.
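A FOR loop can be sketched as follows. The parity generator and the name parity_sketch are assumptions chosen for illustration; note that the loop index i is only read, never assigned.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: even-parity generator over an 8-bit word.
entity parity_sketch is
  port (
    data   : in  std_logic_vector(7 downto 0);
    parity : out std_logic
  );
end parity_sketch;

architecture behavioral of parity_sketch is
begin
  process (data)
    variable p : std_logic;
  begin
    p := '0';
    for i in data'range loop   -- iterates over 7 downto 0
      p := p xor data(i);
    end loop;
    parity <= p;
  end process;
end behavioral;
```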
NEXT Statement
There are cases when it is necessary to stop executing the statements inside the
loop for this iteration and go to the next iteration. VHDL includes a construct that will
accomplish this. The NEXT statement allows the designer to stop processing this iteration and
skip to the successor. When the NEXT statement is executed, processing of the model stops at
the current point and is transferred to the beginning of the loop statement. Execution will
begin with the first statement in the loop, but the loop variable will be incremented to the
next iteration value. If the iteration limit has been reached, processing of the loop will
stop; otherwise execution will continue.
EXIT Statement
During the execution of the loop statement, it may be necessary to jump out of the
loop. This can occur because a significant error has occurred during the execution of the model
or because all of the processing has finished early. The VHDL EXIT statement allows the
designer to exit or jump out of a LOOP statement currently in execution. The EXIT statement
causes execution to halt at the location of the EXIT statement. Execution will continue at the
statement following the LOOP statement.
The EXIT statement has three basic types of operation. The first involves an EXIT
statement without a loop label or a WHEN condition. In this case, the EXIT statement will
behave as follows: it will exit from the most current LOOP statement encountered. If an EXIT
statement is inside a LOOP statement that is nested inside another LOOP statement, the EXIT
statement will exit only the inner LOOP statement. Execution will still remain in the outer
LOOP statement.
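The NEXT and EXIT statements can be sketched together in one loop. The package name loop_ctrl_sketch, the function name count_ones_until_stop and its stop-mask parameter are hypothetical, introduced only to give both statements something to do; the WHEN form of each statement is used here.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package illustrating NEXT and EXIT inside a FOR loop.
package loop_ctrl_sketch is
  -- Counts the '1' bits of data, scanning from bit 0 upwards and stopping
  -- at the first position where stop is '1'.
  function count_ones_until_stop(data, stop : std_logic_vector(7 downto 0))
    return natural;
end package loop_ctrl_sketch;

package body loop_ctrl_sketch is
  function count_ones_until_stop(data, stop : std_logic_vector(7 downto 0))
    return natural is
    variable n : natural := 0;
  begin
    for i in 0 to 7 loop
      exit when stop(i) = '1';    -- leave the loop entirely
      next when data(i) /= '1';   -- skip the rest of this iteration
      n := n + 1;
    end loop;
    return n;                     -- execution continues here after EXIT
  end function count_ones_until_stop;
end package body loop_ctrl_sketch;
```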
ASSERT Statement
The ASSERT statement is a very useful statement for reporting textual strings to
the designer. The ASSERT statement checks the value of a Boolean expression for true or false.
If the value is true, the statement does nothing. If the value is false, the ASSERT statement
will output a user-defined string to the standard output of the terminal.
The designer can also specify a severity level with which to output the text string. The
four levels are, in increasing level of severity: note, warning, error and failure. The
severity level allows the designer to classify messages into proper categories.
The note category is useful for relaying information to the user about what is currently
happening in the model. Assertions of category warning can be used to alert the designer of
conditions that can cause erroneous behavior. Assertions of severity level error are used to
alert the designer of conditions that will cause the model to work incorrectly, or not work at
all. Assertions of severity level failure are used to alert the designer of conditions within
the model that have disastrous effects.
The ASSERT statement is currently ignored by synthesis tools. Since the ASSERT
statement is mainly for exception handling while writing a model, no hardware is built.
Syntax: ASSERT condition
[REPORT expression]
[SEVERITY expression];
The keyword ASSERT is followed by a Boolean-valued expression called a condition.
The condition determines whether the text expression specified by the REPORT clause is output
or not. If the condition is false, the text expression is output; if true, the text expression
is not output.
The REPORT and SEVERITY clauses are optional. The REPORT clause allows the designer
to specify the value of a text expression to output. The SEVERITY clause allows the designer
to specify the severity level of the ASSERT statement. If the REPORT clause is not specified,
the default value for the ASSERT statement is "Assertion violation." If the SEVERITY clause is
not specified, the default value is error.
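A short ASSERT sketch follows. The entity name assert_sketch and the set/reset inputs are assumptions chosen for illustration; the message is printed by the simulator only when the condition is false, that is, when both inputs are active on a rising clock edge.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: checks that set and reset are never active together.
entity assert_sketch is
  port (
    clk, set, reset : in std_logic
  );
end assert_sketch;

architecture behavioral of assert_sketch is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      assert not (set = '1' and reset = '1')
        report "set and reset are both active"
        severity warning;
    end if;
  end process;
end behavioral;
```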
WAIT Statement
The WAIT statement allows the designer to suspend the execution of a process or
subprogram. The conditions for resuming execution of the suspended process or
subprogram can be specified by three different means. These are:
WAIT ON signal changes
WAIT UNTIL an expression is true
WAIT FOR a specific amount of time

The WAIT statement can be used for a number of different purposes. The most common
use is for specifying clock inputs to synthesis tools. The WAIT statement specifies the clock
for a process statement that is read by the synthesis tool to create sequential logic such as
registers and flip-flops. Other uses of WAIT are to delay process execution for an amount of
time or to modify the sensitivity list of the process dynamically.
WAIT ON signal
The WAIT ON signal clause specifies a list of one or more signals upon which the WAIT
statement will wait for events.
WAIT UNTIL expression
The WAIT UNTIL Boolean-expression clause will suspend execution of the process until
the expression returns a value of true.
WAIT FOR time-expression
The WAIT FOR time-expression clause will suspend execution of the process for the time
specified by the time expression. After the time specified in the time expression has elapsed,
execution will continue on the statement following the WAIT statement.
Multiple WAIT statement

A single WAIT statement can include ON signal, UNTIL expression and FOR time-expression
clauses.
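The different forms can be sketched in a simulation-only process. The test bench name wait_sketch, the signal names and the time values are assumptions chosen for illustration; because the clock is free-running, the simulation has to be run for a bounded time.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical test bench illustrating the forms of the WAIT statement.
entity wait_sketch is
end entity wait_sketch;

architecture tb of wait_sketch is
  signal clk   : std_logic := '0';
  signal ready : std_logic := '0';
begin
  clk <= not clk after 5 ns;        -- free-running clock, 10 ns period

  stimulus : process
  begin
    wait for 20 ns;                 -- WAIT FOR a specific amount of time
    ready <= '1';
    wait until rising_edge(clk);    -- WAIT UNTIL an expression is true
    ready <= '0';
    wait on clk;                    -- WAIT ON a signal change
    -- Combined form: resume on a clk event with ready = '1',
    -- or after 50 ns at the latest.
    wait on clk until ready = '1' for 50 ns;
    wait;                           -- suspend this process forever
  end process stimulus;
end architecture tb;
```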
Subprograms
In many programming languages, subprograms are used to simplify coding and to improve the
modularity and readability of descriptions. VHDL uses subprograms for these applications
as well as for those that are more specific to hardware descriptions. Regardless
of the application, behavioral software-like constructs are allowed in subprograms. VHDL
allows two forms of subprograms, functions and procedures. Functions return a value and cannot
alter the values of their parameters. A procedure, on the other hand, is used as a statement
and can alter the values of its parameters.
Functions can be declared in VHDL by specifying:
1. The name of the function
2. The input parameters, if any

3. The type of returned value

4. Any declarations required by computation of the returned value.
Procedures can also be written in VHDL. A procedure is declared by specifying:
1. The name of the procedure.
2. The input and output parameters, if any
3. Any declaration required by the procedure itself.
4. An algorithm
The main difference between a function and a procedure is that the procedure argument list
will most likely have a direction associated with each parameter, while the function argument
list does not. In a procedure, the arguments can be of mode IN, OUT or INOUT, while in a
function all arguments are of mode IN by default and can only be of mode IN.
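The difference can be sketched with one function and one procedure. The package name subprog_sketch and the subprogram names max_of and swap are assumptions chosen for illustration.

```vhdl
-- Hypothetical package with one function and one procedure.
package subprog_sketch is
  -- Function: parameters are of mode IN; the result is returned.
  function max_of(a, b : integer) return integer;
  -- Procedure: INOUT parameters may be read and altered.
  procedure swap(x, y : inout integer);
end package subprog_sketch;

package body subprog_sketch is
  function max_of(a, b : integer) return integer is
  begin
    if a > b then
      return a;
    else
      return b;
    end if;
  end function max_of;

  procedure swap(x, y : inout integer) is
    variable tmp : integer;   -- declaration required by the procedure itself
  begin
    tmp := x;
    x   := y;
    y   := tmp;
  end procedure swap;
end package body subprog_sketch;
```

A function call such as max_of(3, 7) is used inside an expression, whereas swap(p, q); is written as a statement on two variables.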
Side effects
Procedures have an interesting problem that is not shared by their function counterparts.
Procedures can cause side effects to occur. A side effect is the result of changing the value
of an object inside a procedure when that object was not an argument to the procedure.
Packages
The primary purpose of a package is to encapsulate elements that can be shared (globally)
among two or more design units. A package is a common storage area used to hold data to be
shared among a number of entities. Declaring data inside of a package allows the data to be
referenced by other entities; thus, the data can be shared.
A package consists of two parts: a package declaration section and a package body. The
package declaration defines the interface for the package, much the way that the entity defines
the interface for a model. The body specifies the actual behavior of the package in the same
way that the architecture statement does for a model.
The package declaration section can contain the following declarations:
#Subprogram declaration
#Type, subtype declaration
#Constant, deferred constant declaration
#Signal declaration, creates a global signal
#File declaration
#Alias declaration
#Component declaration

#Attribute declaration, a user-defined attribute

#Disconnection specification
#Use clause
All of the items declared in the package declaration section are visible to any design that
uses the package with a USE clause. The interface to a package consists of any subprograms or
deferred constants declared in the package declaration. The subprogram and deferred constant
must have a corresponding subprogram body and deferred constants value in the package body
or an error will result.
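A small package with a deferred constant and a subprogram can be sketched as follows. The package name mac_types_sketch and all identifiers in it are assumptions chosen for illustration; both the deferred constant and the function receive their definitions in the package body.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package declaration: the interface visible through a USE clause.
package mac_types_sketch is
  constant WIDTH : natural := 8;
  subtype operand_t is std_logic_vector(WIDTH - 1 downto 0);
  constant ZERO_OPERAND : operand_t;                -- deferred constant
  function is_zero(v : operand_t) return boolean;   -- subprogram declaration
end package mac_types_sketch;

-- Package body: supplies the deferred value and the subprogram body.
package body mac_types_sketch is
  constant ZERO_OPERAND : operand_t := (others => '0');

  function is_zero(v : operand_t) return boolean is
  begin
    return v = ZERO_OPERAND;
  end function is_zero;
end package body mac_types_sketch;
```

A design unit compiled into the same library would make these items visible with use work.mac_types_sketch.all;.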
Data Objects
A data object holds a value of a specified type. Every data object belongs to one of the
following four classes:
Constants
The object of a constant class can hold a single value of a given type. This value is
assigned before simulation and cannot be changed during the course of simulation.
Syntax: constant identifier : type := value;
Variables
An object of the variable class can also hold a single value of a given type. However,
different values can be assigned to the object at different times. Variables can be declared
only inside a process or a subprogram.
Syntax: variable identifier : type [:= value];
Signals
An object of the signal class has a past history of values, a current value, and a set of
future values. Signal objects are typically used to model wires and flip-flops, while variables
and constants are used to model the behavior of the circuit. Signals cannot be declared in a
process statement.
Syntax: signal identifier : type [:= value];
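The three synthesizable object classes can be seen side by side in the sketch below. The entity name objects_sketch and the identifiers MASK, reg and tmp are assumptions chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: masks the input and registers the result.
entity objects_sketch is
  port (
    clk  : in  std_logic;
    din  : in  std_logic_vector(7 downto 0);
    dout : out std_logic_vector(7 downto 0)
  );
end objects_sketch;

architecture rtl of objects_sketch is
  -- Constant: value fixed before simulation starts.
  constant MASK : std_logic_vector(7 downto 0) := "00001111";
  -- Signal: declared in the architecture, models the register output.
  signal reg : std_logic_vector(7 downto 0) := (others => '0');
begin
  process (clk)
    -- Variable: declared inside the process.
    variable tmp : std_logic_vector(7 downto 0);
  begin
    if rising_edge(clk) then
      tmp := din and MASK;   -- variable takes its new value immediately
      reg <= tmp;            -- signal is updated when the process suspends
    end if;
  end process;

  dout <= reg;
end rtl;
```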

Files
Files contain values of a specified type. We use files to read in stimulus and to write data
when using test benches.
Data types
Every object in VHDL belongs to a certain type. A type is a name that has associated with
it a set of values and a set of operations. All the possible types that can exist in the language can
be categorized into three major categories, which are described next.
XILINX INTRODUCTION
Xilinx Tools is a suite of software tools used for the design of digital circuits
implemented using a Xilinx Field Programmable Gate Array (FPGA) or Complex Programmable
Logic Device (CPLD). The design procedure consists of (a) design entry, (b) synthesis and
implementation of the design, (c) functional simulation and (d) testing and verification.
Digital designs can be entered in various ways using the above CAD tools: using a schematic
entry tool, using a hardware description language (HDL) such as Verilog or VHDL, or a
combination of both. In this lab we will only use the design flow that involves the use of
Verilog HDL. The CAD tools enable you to design combinational and sequential circuits starting
with Verilog HDL design specifications. The steps of this design procedure are listed below:
1. Create Verilog design input file(s) using the template-driven editor.
2. Compile and implement the Verilog design file(s).
3. Create the test vectors and simulate the design (functional simulation) without using a PLD
(FPGA or CPLD).
4. Assign input/output pins to implement the design on a target device.
5. Download the bit stream to an FPGA or CPLD device.
6. Test the design on the FPGA/CPLD device.
A Verilog input file in the Xilinx software environment consists of the following segments:
Header : module name, list of input and output ports.
Declarations : input and output ports, registers and wires.
Logic Descriptions: equations, state machines and logic functions.
End : endmodule
DESIGN FLOW OVERVIEWS
The standard design flow comprises the following steps
Design Entry and Synthesis: In this step of the design flow, you create your
design using a Xilinx-supported schematic editor, a hardware description
language (HDL) for text-based entry, or both. If you use an HDL for text-based
entry, you must synthesize the HDL file into an EDIF file or, if you are using the
Xilinx Synthesis Technology (XST) GUI, you must synthesize the HDL file into
an NGC file.
Design Implementation: By implementing to a specific Xilinx architecture, you
convert the logical design file format, such as EDIF, that you created in the design
entry and synthesis stage into a physical file format. The physical information is
contained in the native circuit description (NCD) file for FPGAs and the VM6 file
for CPLDs. Then you create a bit stream file from these files and optionally
program a PROM or EPROM for subsequent programming of your Xilinx device.
Design Verification: Using a gate-level simulator or cable, you ensure that your design meets
your timing requirements and functions properly. See the iMPACT online help for information
about Xilinx download cables and demonstration boards. The full design flow is an iterative
process of entering, implementing, and verifying your design until it is correct and complete.
The Xilinx Development System allows quick design iterations through the design flow cycle.
Because Xilinx devices permit unlimited reprogramming, you do not need to discard devices
when debugging your design in circuit.

CODE:
RADIX4 BOOTH ALGORITHM:
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

-- Top level of the 8x8 radix-4 modified Booth multiplier
entity modified_booth_pro is
  port( clock  : in  std_logic;
        data1  : in  std_logic_vector(7 downto 0);    -- multiplicand
        data2  : in  std_logic_vector(7 downto 0);    -- multiplier
        output : out std_logic_vector(15 downto 0) );
end modified_booth_pro;

architecture ar of modified_booth_pro is

  -- Component ports follow the port maps used in this architecture.

  -- Derives one 3-bit select code per group of multiplier bits
  component encoder_sel
    port( clock : in  std_logic;
          input : in  std_logic_vector(7 downto 0);
          op1   : out std_logic_vector(2 downto 0);
          op2   : out std_logic_vector(2 downto 0);
          op3   : out std_logic_vector(2 downto 0);
          op4   : out std_logic_vector(2 downto 0) );
  end component;

  -- Produces one 9-bit partial product from the multiplicand and a select code
  component booth_encoder
    port( clock   : in  std_logic;
          data_in : in  std_logic_vector(7 downto 0);
          ip2     : in  std_logic_vector(2 downto 0);
          output  : out std_logic_vector(8 downto 0) );
  end component;

  -- Adds the four partial products into the 16-bit product
  component carry_save
    port( a      : in  std_logic_vector(8 downto 0);
          b      : in  std_logic_vector(8 downto 0);
          c      : in  std_logic_vector(8 downto 0);
          d      : in  std_logic_vector(8 downto 0);
          output : out std_logic_vector(15 downto 0) );
  end component;

  signal sel1, sel2, sel3, sel4 : std_logic_vector(2 downto 0);
  signal pp1, pp2, pp3, pp4     : std_logic_vector(8 downto 0);

begin

  sa: encoder_sel port map( clock => clock, input => data2,
                            op1 => sel1, op2 => sel2,
                            op3 => sel3, op4 => sel4 );

  sb: booth_encoder port map( clock => clock, data_in => data1,
                              ip2 => sel1, output => pp1 );
  sc: booth_encoder port map( clock => clock, data_in => data1,
                              ip2 => sel2, output => pp2 );
  sd: booth_encoder port map( clock => clock, data_in => data1,
                              ip2 => sel3, output => pp3 );
  se: booth_encoder port map( clock => clock, data_in => data1,
                              ip2 => sel4, output => pp4 );

  sf: carry_save port map( a => pp1, b => pp2, c => pp3, d => pp4,
                           output => output );

end ar;
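
The recoding behind the four select codes can be sketched as a software reference model. The Python below is a minimal sketch of textbook radix-4 modified Booth recoding, not a transcription of the encoder_sel or booth_encoder components, whose internals are not listed here. The function names and the signed two's-complement interpretation of the operands are assumptions of this example.

```python
def booth_radix4_digits(y, bits=8):
    """Recode a signed multiplier into radix-4 Booth digits in {-2,...,2}.

    Digits are returned least-significant group first. Each digit is built
    from the overlapping bit triplet (y[i+1], y[i], y[i-1]) with y[-1] = 0.
    """
    pattern = (y & ((1 << bits) - 1)) << 1   # append the implicit y[-1] = 0
    digits = []
    for i in range(0, bits, 2):
        triplet = (pattern >> i) & 0b111
        b2, b1, b0 = (triplet >> 2) & 1, (triplet >> 1) & 1, triplet & 1
        digits.append(-2 * b2 + b1 + b0)
    return digits


def booth_radix4_multiply(x, y, bits=8):
    """Sum the partial products digit * x * 4**k, one per Booth digit."""
    return sum(d * x * 4 ** k
               for k, d in enumerate(booth_radix4_digits(y, bits)))


print(booth_radix4_digits(127))          # [-1, 0, 0, 2]
print(booth_radix4_multiply(-128, 127))  # -16256
```

An 8-bit multiplier yields four digits, which is consistent with the four partial products pp1 to pp4 in the listing above.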
RADIX8 MODIFIED BOOTH:

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_signed.all;

entity BWM_8_8 is
  port(
    X : in  std_logic_vector(7 downto 0);
    Y : in  std_logic_vector(7 downto 0);
    C : out std_logic_vector(15 downto 0));
end BWM_8_8;

architecture BOOTH of BWM_8_8 is

  component PARTIAL_PRODUCT
    port(
      X   : in  std_logic_vector(7 downto 0);
      Y   : in  std_logic_vector(7 downto 0);
      PP1 : out std_logic_vector(15 downto 0);
      PP2 : out std_logic_vector(11 downto 0));
  end component;

  signal PP1 : std_logic_vector(15 downto 0);
  signal PP2 : std_logic_vector(11 downto 0);
  signal PP3 : std_logic_vector(15 downto 0);

begin

  x0 : PARTIAL_PRODUCT
    port map (X, Y, PP1, PP2);

  PP3 <= PP2 & "0000";  -- shift PP2 left by four bit positions
  C   <= PP1 + PP3;     -- signed addition from std_logic_signed

end BOOTH;
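
For comparison, a generic software model of radix-8 Booth recoding is sketched below in Python. It is not derived from the PARTIAL_PRODUCT component, whose internals are not listed here. The function names are assumptions of this example, and the operands are treated as signed two's-complement values. Each digit lies in the range -4 to 4, so the hardware needs the multiple 3X, which cannot be formed by shifting alone.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def booth_radix8_digits(y, bits=8):
    """Recode a signed multiplier into radix-8 Booth digits in {-4,...,4}.

    The operand is sign-extended to a multiple of three bits. Each digit is
    built from the overlapping group (y[i+2], y[i+1], y[i], y[i-1]) with
    y[-1] = 0. Digits are returned least-significant group first.
    """
    groups = -(-bits // 3)                   # ceil(bits / 3)
    ext_bits = 3 * groups
    pattern = (sign_extend(y, bits) & ((1 << ext_bits) - 1)) << 1
    digits = []
    for g in range(groups):
        quad = (pattern >> (3 * g)) & 0b1111
        b3, b2 = (quad >> 3) & 1, (quad >> 2) & 1
        b1, b0 = (quad >> 1) & 1, quad & 1
        digits.append(-4 * b3 + 2 * b2 + b1 + b0)
    return digits


def booth_radix8_multiply(x, y, bits=8):
    """Sum the partial products digit * x * 8**k, one per Booth digit."""
    return sum(d * x * 8 ** k
               for k, d in enumerate(booth_radix8_digits(y, bits)))


print(booth_radix8_digits(127))   # [-1, 0, 2]
print(booth_radix8_digits(-128))  # [0, 0, -2]
```

In this model an 8-bit multiplier needs three radix-8 digits, against four digits for radix-4.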

4.8 RESULT:
RADIX4:
RADIX8:

4.9 APPLICATIONS:
Dot product
Matrix multiplication
Polynomial evaluation (e.g., with Horner's rule)
Newton's method for evaluating functions.
Convolutions and artificial neural networks
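
Two of the applications listed above reduce to repeated multiply-accumulate steps, as the short Python sketch below shows. The function names mac, dot and horner are hypothetical and serve only to illustrate how each loop iteration maps onto a single MAC operation.

```python
def mac(acc, a, b):
    """One multiply-accumulate step: returns acc + a * b."""
    return acc + a * b


def dot(u, v):
    """Dot product: one MAC per pair of elements."""
    acc = 0
    for a, b in zip(u, v):
        acc = mac(acc, a, b)
    return acc


def horner(coeffs, x):
    """Evaluate a polynomial with Horner's rule, highest-degree coefficient first.

    Each step computes c + acc * x, which is one MAC with the coefficient
    taking the place of the accumulator input.
    """
    acc = 0
    for c in coeffs:
        acc = mac(c, acc, x)
    return acc


print(dot([1, 2, 3], [4, 5, 6]))  # 32
print(horner([2, -3, 1], 4))      # 2*4**2 - 3*4 + 1 = 21
```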
4.10 FUTURE SCOPE:

The arithmetic-algorithm and architecture-level optimization techniques presented in this thesis
for low-power, high-speed multiplier design have achieved good results. Several future research
directions are possible. To enhance performance further, higher radices such as radix-16 or
radix-32 can be used to generate and accumulate the partial products. A deeply pipelined
architecture can be used for speed improvements. The ability to construct very small
high-performance multipliers opens many other interesting possibilities.
Multiplication-intensive applications, such as DSP or graphics, could benefit significantly from
several high-performance multipliers on the same chip. A single very high-throughput
multiplier, or several multipliers working in parallel on the same chip, could open up new
possibilities such as single-chip video signal processors.

4.11 CONCLUSION:
In this work, a new radix-8 modified Booth multiplier architecture was proposed to efficiently
execute the multiplication-accumulation operation, which is the key operation in digital signal
processing and multimedia information processing. By removing the independent accumulation
process, which has the largest delay, and merging it into the compression process of the partial
products, the overall multiplier performance has been improved to almost twice that of the
previous algorithms.
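
One way to picture merging the accumulation into the compression of the partial products is the carry-save model below. It is a minimal Python sketch under stated assumptions, not the architecture of this work: the accumulator is treated as one extra row, rows are reduced with 3:2 compressors, and a single carry-propagate addition is performed at the end. The 16-bit width and the function names are assumptions of this example.

```python
MASK = (1 << 16) - 1  # assumed 16-bit datapath; arithmetic is modulo 2**16


def compress_3_2(a, b, c):
    """3:2 compressor (row of full adders): a + b + c == s + carry mod 2**16."""
    s = a ^ b ^ c
    carry = ((a & b) | (b & c) | (a & c)) << 1
    return s & MASK, carry & MASK


def mac_carry_save(acc, partial_products):
    """Reduce the accumulator and all partial products to two rows, then add once."""
    rows = [acc & MASK] + [p & MASK for p in partial_products]
    while len(rows) > 2:
        s, c = compress_3_2(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    return sum(rows) & MASK  # the only carry-propagate addition


# 13 * 11 = 13*1 + 13*2 + 13*8, accumulated onto 100
print(mac_carry_save(100, [13, 26, 104]))  # 243
```

Because the accumulator enters the compressor tree as an ordinary row, no separate accumulation adder sits after the multiplier in this model.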

4.12 REFERENCES:
[1] Wen-Chang Yeh and Chein-Wei Jen, High-speed Booth encoded parallel multiplier design,
IEEE Trans. on Computers, vol. 49, issue 7, pp. 692-701, July 2000.
[2] Jung-Yup Kang and Jean-Luc Gaudiot, A simple high-speed multiplier design, IEEE
Trans. on Computers, vol. 55, issue 10, pp. 1253-1258, Oct. 2006.
[3] Shiann-Rong Kuang, Jiun-Ping Wang and Cang-Yuan Guo, Modified Booth multipliers
with a regular partial product array, IEEE Trans. on Circuits and Systems, vol. 56, issue 5, pp.
404-408, May 2009.
[4] Li-rong Wang, Shyh-Jye Jou and Chung-Len Lee, A well-structured modified Booth
multiplier design, Proc. of IEEE VLSI-DAT, pp. 85-88, April 2008.
[5] A. A. Khatibzadeh, K. Raahemifar and M. Ahmadi, A 1.8V 1.1GHz Novel Digital
Multiplier, Proc. of IEEE CCECE, pp. 686-689, May 2005.
[6] S. Hsu, V. Venkatraman, S. Mathew, H. Kaul, M. Anders, S. Dighe, W. Burleson and R.
Krishnamurthy, A 2GHz 13.6mW 12x9b multiplier for energy efficient FFT accelerators, Proc.
of IEEE ESSCIRC, pp. 199-202, Sept. 2005.
[7] Hwang-Cherng Chow and I-Chyn Wey, A 3.3V 1GHz high speed pipelined Booth
multiplier, Proc. of IEEE ISCAS, vol. 1, pp. 457-460, May 2002.
[8] M. Aguirre-Hernandez and M. Linares-Aranda, Energy-efficient high-speed CMOS
pipelined multiplier, Proc. of IEEE CCE, pp. 460-464, Nov. 2008.
[9] Yung-chin Liang, Ching-ji Huang and Wei-bin Yang, A 320-MHz 8bit x 8bit pipelined
multiplier in ultra-low supply voltage, Proc. of IEEE A-SSCC, pp. 73-76, Nov. 2008.
[10] S. B. Tatapudi and J. G. Delgado-Frias, Designing pipelined systems with a clock period
approaching pipeline register delay, Proc. of IEEE MWSCAS, vol. 1, pp. 871-874, Aug. 2005.
[11] A. D. Booth, A signed binary multiplication technique, Quarterly J. Mechanical and
Applied Math, vol. 4, pp.236-240, 1951.
[12] M. D. Ercegovac and T. Lang, Digital Arithmetic, Morgan Kaufmann Publishers, Los Altos,
CA 94022, USA, 2003.
[13] C. S. Wallace, A suggestion for a fast multiplier, IEEE Trans. on Computers, vol. EC-13,
pp. 14-17, Feb. 1964.
[14] M.D. Ercegovac et al., Fast Multiplication without Carry-Propagate Addition, IEEE
Trans. Computers, vol. 39, no. 11, Nov. 1990.
[15] R.K. Kolagotla et al., VLSI Implementation of a 200-MHz 16 × 16 Left-to-Right Carry-Free
Multiplier in 0.35-µm CMOS Technology for Next-Generation DSPs, Proc. IEEE 1997 Custom
Integrated Circuits Conf., pp. 469-472, 1997.
[16] P.F. Stelling and V.G. Oklobdzija, Optimal Designs for Multipliers and
Multiply-Accumulators, Proc. 15th IMACS World Congress Scientific Computation, Modeling, and
Applied Math., A. Sydow, ed., pp. 739-744, Aug. 1997.
[17] Passport 0.35 micron, 3.3 volt, Optimum Silicon SC Library, CB35OS142, Avant!
Corporation, Mar. 1998.
[18] G. Goto et al., A 4.1ns compact 54 × 54-b Multiplier Utilizing Sign-Select Booth Encoders,
IEEE J. Solid-State Circuits, vol. 32, no. 11, pp. 1676-1682, Nov. 1997.
[19] G. Goto et al., A 54 × 54-b Regularly Structured Tree Multiplier, IEEE J. Solid-State
Circuits, vol. 27, no. 9, Sept. 1992.
[20] R. Fried, Minimizing Energy Dissipation in High-Speed Multipliers, Proc. 1997 Int'l
Symp. Low Power Electronics and Design, pp. 214-219, 1997.
[21] N.H.E. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A Systems
Perspective, second ed., chapter 8, p. 520. Addison Wesley, 1993.
[22] J. Fadavi-Ardekani, M × N Booth Encoded Multiplier Generator Using Optimized Wallace
Trees, IEEE Trans. VLSI Systems, vol. 1, no. 2, June 1993.
[23] A.A. Farooqui et al., General Data-Path Organization of a MAC Unit for VLSI
Implementation of DSP Processors, Proc. 1998 IEEE Int'l Symp. Circuits and Systems, vol. 2,
pp. 260-263, 1998.
[24] K. Hwang, Computer Arithmetic: Principles, Architecture, and Design, chapter 3, p. 81.
John Wiley & Sons, 1976.
[25] K.H. Cheng et al., The Improvement of Conditional Sum Adder for Low Power
Applications, Proc. 11th Ann. IEEE Int'l ASIC Conf., pp. 131-134, 1998.
[26] S. Waser and M. J. Flynn, Introduction to Arithmetic for Digital Systems Designers. New
York: Holt, Rinehart and Winston, 1982.
[27] A. R. Omondi, Computer Arithmetic Systems. Englewood Cliffs, NJ: Prentice-Hall, 1994.
[28] A. D. Booth, A signed binary multiplication technique, Quart. J. Math., vol. IV, pp.
236-240, 1952.
http://www.ece.rutgers.edu/~bushnell/dsdwebsite/booth.pdf
[29] C. S. Wallace, A suggestion for a fast multiplier, IEEE Trans. Electron Comput., vol.
EC-13, no. 1, pp. 14-17, Feb. 1964.
http://lapwww.epfl.ch/courses/comparith/Papers/1_Wallace_mult.pdf
[30] N. R. Shanbhag and P. Juneja, Parallel implementation of a 4 × 4-bit multiplier using modified
Booth's algorithm, IEEE J. Solid-State Circuits, vol. 23, no. 4, pp. 1010-1013, Aug. 1988.
[31] G. Goto, T. Sato, M. Nakajima, and T. Sukemura, A 54 × 54 regular structured tree
multiplier, IEEE J. Solid-State Circuits, vol. 27, no. 9, pp. 1229-1236, Sep. 1992.
[32] J. Fadavi-Ardekani, M × N Booth encoded multiplier generator using optimized Wallace
trees, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 1, no. 2, pp. 120-125, Jun. 1993.
[33] N. Ohkubo, M. Suzuki, T. Shinbo, T. Yamanaka, A. Shimizu, K. Sasaki, and Y. Nakagome,
A 4.4 ns CMOS 54 × 54 multiplier using pass-transistor multiplexer, IEEE J. Solid-State
Circuits, vol. 30, no. 3, pp. 251-257, Mar. 1995.
http://www.ece.ucdavis.edu/~vojin/CLASSES/EEC280/Web page/papers/Use%20of%20Pass-Transistor%20Logic/54x54mult-CMOS-Okhubo-CICC94.pdf
[34] A. Tawfik, F. Elguibaly, and P. Agathoklis, New realization and implementation of fixed-
point IIR digital filters, J. Circuits, Syst., Comput., vol. 7, no. 3, pp. 191-209, 1997.

