Implementation of 8×8 Wallace tree multiplier using VHDL
1. Introduction
Recent advancements in mobile computing and multimedia applications demand high-performance, low-power VLSI (very large scale integrated circuit) Digital Signal Processing (DSP) systems. One of the most important components of a DSP system is the multiplier. Multiplication is basically a shift-and-add operation. In a typical DSP system, the multiplier units consume a large amount of power and cause most of the delay compared to other units such as adders. As the size of the inputs grows (2×2 bit, 4×4, 8×8, etc.), the number of steps a normal binary multiplier takes to compute the product increases drastically. The more calculation steps there are, the larger the delay and the power consumption, and the larger the area the multiplier occupies on an FPGA (Field Programmable Gate Array). Hence various algorithms have been developed to reduce the number of calculation steps, which in turn reduces the delay, power and area of multipliers. Digital multipliers can be classified into serial, parallel and serial-parallel multipliers. Parallel multipliers are further divided into two types: array multipliers and tree multipliers. The Wallace tree multiplier, as the name indicates, belongs to the tree multiplier category. Professor Christopher Stewart Wallace (26 October 1933 – 7 August 2004), an Australian computer scientist (and physicist who also contributed to a variety of other areas), devised the Wallace tree multiplier algorithm in 1964. The Wallace tree multiplier is considerably faster than a simple array multiplier and is an efficient implementation of a digital circuit that multiplies two integers. The Wallace tree has three computation steps:
1. Multiply (that is, AND) each bit of one of the arguments by each bit of the other, yielding n² results for two n-bit arguments. Depending on the position of the multiplied bits, the wires carry different weights; for example, the wire carrying the result of a2b3 has weight 2^(2+3) = 32 (see the explanation of weights below).
2. Reduce the number of partial products to two by layers of full and half adders.
3. Group the wires into two numbers and add them with a conventional adder.
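Step 1 can be sketched in VHDL as an array of AND gates. This is a minimal illustration, not the project's code: the entity name pp_gen8 and the flattened output PP, in which bit 8*i + j holds aj AND bi (weight 2^(i+j)), are assumptions made for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: step 1 of the Wallace tree for 8-bit operands.
-- PP(8*i + j) holds A(j) AND B(i); its weight is 2**(i + j).
entity pp_gen8 is
  port (
    A, B : in  std_logic_vector(7 downto 0);
    PP   : out std_logic_vector(63 downto 0)  -- 8 x 8 = 64 bit products
  );
end entity pp_gen8;

architecture rtl of pp_gen8 is
begin
  rows : for i in 0 to 7 generate        -- one row per multiplier bit b(i)
    cols : for j in 0 to 7 generate      -- one column per multiplicand bit a(j)
      PP(8*i + j) <= A(j) and B(i);
    end generate cols;
  end generate rows;
end architecture rtl;
```

With A = x"05" and B = x"02", only row 1 is non-zero, so PP(15 downto 8) = x"05".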
Fig 1.1: Wallace tree multiplier reduction stages for 8X8 multiplication.
The detailed explanation of the WTM algorithm is given in chapter 3. The Wallace tree is an implementation of an adder tree designed for minimum propagation delay. Rather than completely adding the partial products in pairs as a ripple adder tree does, the Wallace tree sums up all the bits of the same weight in a merged tree. Usually full adders are used, so that 3 equally weighted bits are combined to produce two bits: one (the carry) of weight 2^(n+1) and the other (the sum) of weight 2^n. Each layer of the tree therefore reduces the number of vectors by a factor of 3:2. (Another popular scheme obtains a 4:2 reduction using a different adder style that adds little delay in an ASIC implementation.) The tree has as many layers as necessary to reduce the number of vectors to two (a carry and a sum). A conventional adder is then used to combine these to obtain the final product. These multipliers are also used in FIR filter implementations to reduce redundant computations. Although the Wallace tree multiplier is faster than the carry-save structure for large multiplier word lengths, it has the disadvantage of being very irregular, which complicates the task of producing a dense and efficient layout. Wallace tree multipliers are therefore generally used only in designs where performance is critical and design time is a secondary consideration.
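One 3:2 layer of the tree applies a full adder to every bit position of three equally weighted vectors. The sketch below assumes a generic width N and uses the hypothetical entity name csa_3to2; it shows that no carry ripples between positions, because each carry is simply stored one position to the left.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: one 3:2 (carry-save) layer. Three vectors X, Y, Z are
-- reduced to two: S(k) keeps weight 2**k, and the carry out of position k
-- has weight 2**(k+1), so it is stored in C(k+1).
entity csa_3to2 is
  generic (N : positive := 8);
  port (
    X, Y, Z : in  std_logic_vector(N-1 downto 0);
    S       : out std_logic_vector(N-1 downto 0);
    C       : out std_logic_vector(N downto 0)
  );
end entity csa_3to2;

architecture rtl of csa_3to2 is
begin
  C(0) <= '0';  -- no carry enters the least significant position
  bits : for k in 0 to N-1 generate
    -- one full adder per bit position; carries do not ripple sideways
    S(k)   <= X(k) xor Y(k) xor Z(k);
    C(k+1) <= (X(k) and Y(k)) or (X(k) and Z(k)) or (Y(k) and Z(k));
  end generate bits;
end architecture rtl;
```

With X = 0111, Y = 0101 and Z = 0011 (7 + 5 + 3 = 15), the layer produces S = 0001 and C = 01110, and 1 + 14 = 15.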
With the advancement of VLSI technology and the availability of high-performance FPGA kits, complex algorithms can be implemented with relatively little difficulty. In this project we have built an 8×8 multiplier using VHDL (Very high speed integrated circuit Hardware Description Language). VHDL is a hardware description language that can be used to model a digital system. It contains elements that can be used to describe the behaviour or structure of the digital system and its timing explicitly. The language provides support for modelling the system hierarchically and also supports top-down and bottom-up design methodologies. The system and its subsystems can be described at any level of abstraction ranging from the architecture level to the gate level. Precise simulation semantics are associated with all the language constructs, and therefore models written in this language can be verified using a VHDL simulator (for example, the Xilinx ISE simulator; we have used version 9.2i of the simulator for development of the WTM VHDL code).
Bapatla Engineering College, Department of E.I.E
2. Binary Multipliers
2.1 Types of Multipliers:
Fig. 2.1 Tree diagram for multipliers
In many digital signal processing operations, such as correlation, convolution, filtering and frequency analysis, one needs to perform multiplication. Multiplication algorithms are used here to illustrate methods of designing different cells so that they fit into a larger structure. To introduce these designs, simple serial and parallel multipliers will be presented first. In its simplest form, multiplication is an abbreviated process of adding an integer to itself a specified number of times. A number (the multiplicand) is added to itself as many times as specified by another number (the multiplier) to form a result (the product). In long multiplication, the multiplicand is multiplied by each digit of the multiplier, beginning with the rightmost, least significant digit (LSD). The intermediate results (partial products) are placed one atop the other, offset by one digit to align digits of the same weight. The final product is determined by summing all the partial products. Although most people think of multiplication only in base 10, this technique applies equally to any base, including binary.
2.2 Parallel Multiplication:

The principle of operation of the parallel multipliers usually used in VLSI circuits is rather simple. It originates from the classical algorithm for the product of two binary numbers.
EXAMPLE:
        1010   Multiplicand
      × 1001   Multiplier
      ------
        1010
       0000    Partial products
      0000
     1010
     -------
     1011010   Result (10 × 9 = 90)
In general, the product of two n-bit strings is a string of at most 2n bits, obtained from the sum of n partial products. One can distinguish two phases: generation of the partial products and their summation. Speeding up multiplication relies on two main techniques: reducing the number of partial products and accelerating their summation.
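The two phases can be illustrated with a small VHDL function that generates one partial product per multiplier bit and sums them. This is a behavioural sketch assuming ieee.numeric_std; the package and function names are hypothetical. The result is sized to a'length + b'length bits, since the product of two n-bit numbers may need 2n bits.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical package: shift-and-add multiplication as a pure function.
package shift_add_pkg is
  function shift_add_mult(a, b : unsigned) return unsigned;
end package shift_add_pkg;

package body shift_add_pkg is
  function shift_add_mult(a, b : unsigned) return unsigned is
    constant W   : natural := a'length + b'length;   -- room for the full product
    constant AW  : unsigned(W-1 downto 0) := resize(a, W);
    alias    bb  : unsigned(b'length-1 downto 0) is b;
    variable acc : unsigned(W-1 downto 0) := (others => '0');
  begin
    for i in 0 to b'length-1 loop
      if bb(i) = '1' then
        -- partial product i is the multiplicand shifted left by i (weight 2**i);
        -- a '0' multiplier bit contributes an all-zero partial product
        acc := acc + shift_left(AW, i);
      end if;
    end loop;
    return acc;
  end function shift_add_mult;
end package body shift_add_pkg;
```

For the example above, shift_add_mult(to_unsigned(10, 4), to_unsigned(9, 4)) returns 01011010 (90), and 1111 × 1111 gives 11100001 (225), which needs all 8 bits.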
2.3 Binary Multiplication:

In the binary number system the digits, called bits, are limited to the set {0, 1}. The result of multiplying any binary number by a single binary bit is either 0 or the original number. This makes forming the intermediate partial products simple and efficient. Summing these partial products is the time-consuming task for binary multipliers. One logical approach is to form the partial products one at a time and sum them as they are generated. This technique, often implemented in software on processors that do not have a hardware multiplier, works, but is slow because at least one machine cycle is required to sum each additional partial product. For applications where this approach does not provide enough performance, multipliers can be implemented directly in hardware.
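A hardware version of this one-partial-product-at-a-time approach can be sketched as a clocked shift-and-add multiplier. The entity seq_mult and its ports are hypothetical; the sketch assumes ieee.numeric_std, unsigned operands and a START pulse that loads the operands, after which one partial product is added per clock cycle, so an N-bit multiplication takes N cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical sequential shift-and-add multiplier (one partial product per cycle).
entity seq_mult is
  generic (N : positive := 8);
  port (
    CLK, START : in  std_logic;
    A, B       : in  unsigned(N-1 downto 0);
    P          : out unsigned(2*N-1 downto 0);
    DONE       : out std_logic  -- '1' whenever no multiplication is in progress
  );
end entity seq_mult;

architecture rtl of seq_mult is
  signal acc   : unsigned(2*N-1 downto 0) := (others => '0');
  signal mcand : unsigned(2*N-1 downto 0) := (others => '0');  -- shifted left each cycle
  signal mplr  : unsigned(N-1 downto 0)   := (others => '0');  -- shifted right each cycle
  signal count : natural range 0 to N     := 0;                -- partial products left
begin
  process (CLK)
  begin
    if rising_edge(CLK) then
      if START = '1' then
        acc   <= (others => '0');
        mcand <= resize(A, 2*N);
        mplr  <= B;
        count <= N;
      elsif count > 0 then
        if mplr(0) = '1' then
          acc <= acc + mcand;          -- add the current partial product
        end if;
        mcand <= shift_left(mcand, 1);  -- next partial product has twice the weight
        mplr  <= shift_right(mplr, 1);  -- expose the next multiplier bit
        count <= count - 1;
      end if;
    end if;
  end process;

  P    <= acc;
  DONE <= '1' when count = 0 else '0';
end architecture rtl;
```

Each clock cycle performs one full-width carry-propagate addition, which is exactly the cost the text attributes to this approach.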
2.4 Hardware Multipliers:

Direct hardware implementations of shift-and-add multipliers can increase performance over software synthesis, but are still quite slow. The reason is that as each additional partial product is summed, a carry must be propagated from the least significant bit (LSB) to the most significant bit (MSB). This carry propagation is time consuming and must be repeated for each partial product to be summed. One method to increase multiplier performance is to use encoding techniques that reduce the number of partial products to be summed. Booth first proposed such a technique. The original Booth's algorithm skips over contiguous strings of 1s by using the property that $2^{n} + 2^{n-1} + 2^{n-2} + \cdots + 2^{n-m} = 2^{n+1} - 2^{n-m}$. Although Booth's algorithm produces at most N/2 encoded partial products from an N-bit operand, the number of partial products produced varies. This has caused designers to use modified versions of Booth's algorithm for hardware multipliers. Modified 2-bit Booth encoding halves the number of partial products to be summed. Since the resulting encoded partial products can then be summed using any suitable method, modified 2-bit Booth encoding is used on most modern floating-point chips [LU 881], [MCA 861]. A few designers have even turned to modified 3-bit Booth encoding, which reduces the number of partial products further. The problem with 3-bit encoding is that the carry-propagate addition required to form the 3X multiples often overshadows the potential gains of 3-bit Booth encoding. To achieve even higher performance, advanced hardware multiplier architectures search for faster and more efficient methods of summing the partial products. Most increase performance by eliminating the time-consuming carry-propagate additions. To accomplish this, they sum the partial products in a redundant number representation. The advantage of a redundant representation is that two numbers, or partial products, can be added together without propagating a carry across the entire width of the number. Many redundant number representations are possible. One commonly used representation is known as carry-save form. In this redundant representation two bits, known as the carry and the sum, are used to represent each bit position. When two numbers in carry-save form are added together, any carries that result are never propagated more than one bit position. This makes adding two numbers in carry-save form much faster than adding two normal binary numbers, where a carry may propagate across the whole width. One common method that has been developed for summing rows of partial products using a carry-save representation is the array multiplier.
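Modified 2-bit (radix-4) Booth encoding examines overlapping 3-bit groups of the multiplier and maps each group to a digit in {-2, -1, 0, +1, +2}, so an N-bit operand yields N/2 encoded partial products. The package below is a minimal sketch of that recoding rule, digit = -2·b(2i+1) + b(2i) + b(2i-1) with b(-1) = 0. The names booth_pkg, booth_digit and booth_value are hypothetical, and booth_value only checks that the digits reconstruct the two's-complement value; it does not form the partial products themselves.

```vhdl
-- Hypothetical package: radix-4 (modified 2-bit) Booth recoding.
package booth_pkg is
  -- Digit for one overlapping group (b(2i+1), b(2i), b(2i-1)); result in -2 .. +2.
  function booth_digit(hi, mid, lo : bit) return integer;
  -- Sum of digit_i * 4**i for an even-length two's-complement vector.
  function booth_value(b : bit_vector) return integer;
end package booth_pkg;

package body booth_pkg is
  function booth_digit(hi, mid, lo : bit) return integer is
    variable d : integer := 0;
  begin
    if hi = '1'  then d := d - 2; end if;
    if mid = '1' then d := d + 1; end if;
    if lo = '1'  then d := d + 1; end if;
    return d;  -- equals -2*hi + mid + lo
  end function booth_digit;

  function booth_value(b : bit_vector) return integer is
    alias    v    : bit_vector(b'length-1 downto 0) is b;
    variable prev : bit := '0';      -- b(-1) is taken as '0'
    variable sum  : integer := 0;
    variable wgt  : integer := 1;    -- 4**i
  begin
    for i in 0 to v'length/2 - 1 loop  -- N/2 digits for an N-bit operand
      sum  := sum + booth_digit(v(2*i+1), v(2*i), prev) * wgt;
      prev := v(2*i+1);                -- groups overlap by one bit
      wgt  := wgt * 4;
    end loop;
    return sum;
  end function booth_value;
end package body booth_pkg;
```

For example, 0111 (7) recodes to the digits +2, -1 (2·4 - 1 = 7), and 1001 (-7) recodes to -2, +1 (-2·4 + 1 = -7).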
2.5 Array Multipliers:

Conventional linear array multipliers consist of rows of carry-save adders (CSA). In a linear array multiplier, as the data propagates down through the array, each row of CSAs adds one additional partial product to the partial sum. Since the intermediate sum is kept in redundant carry-save form, there is no carry propagation between the rows. This means that the delay of an array multiplier depends only on the depth of the array and is independent of the partial-product width. Linear array multipliers are also regular, consisting of replicated rows of CSAs. Their high performance and regular structure have perpetuated the use of array multipliers in VLSI math co-processors and special-purpose DSP chips.
Fig. 2.2 Array multiplier
2.5.1 Ripple Carry Array Multipliers:
Features:

1. Row ripple form.
2. Unrolled shift-add algorithm.
3. Delay is proportional to N.
A ripple carry array multiplier (also called row ripple form) is an unrolled embodiment of the classic shift-add multiplication algorithm. The illustration shows the adder structure used to combine all the bit products in a 4x4 multiplier. The bit products are the logical AND of the bits from each input; they are shown in the form x, y in the drawing. The maximum delay is the path from either LSB input to the MSB of the product, and is the same (ignoring routing delays) regardless of the path taken. The delay is approximately 2n.
Fig. 2.3 Ripple carry array multiplier
This basic structure is simple to implement in FPGAs, but does not make efficient use of the logic in many FPGAs, and is therefore larger and slower than other implementations.
2.5.2 Carry Save Array Multipliers:
Features:

1. Column ripple form.
2. Fundamentally the same delay and gate count as the row ripple form.
3. Gate-level speed-ups available for ASICs.
4. The ripple adder can be replaced with a faster carry tree adder.
5. Regular routing pattern.
Fig 2.4 Carry Save Adder Multiplier
Instead of using a ripple carry adder (RCA), we can use carry look-ahead logic and carry save adders for adding each group of partial product terms, because the RCA is the slowest of the available adders. Figure 2.4 shows the architecture of a carry save array multiplier that adds each group of partial products in parallel. The basic multiplication algorithm remains the same; the only difference is in the way the partial products are added and the final addition is performed.
2.5.3 Wallace Tree Multiplier:
Features:

1. Optimized column adder tree.
2. Combines all partial products into two vectors (carry and sum).
3. Carry and sum outputs are combined using a conventional adder.
4. Delay is proportional to log(n).
5. The Wallace tree multiplier uses a Wallace tree to combine the 1 × m partial products.
6. Irregular routing.
A detailed description of the Wallace tree is presented in the next chapter.
2.6 Computation Sharing Multiplier:

The CSHM consists of a precomputer, select units and adders (S&A). The precomputer produces the products of the alphabets with the input x, and the S&A performs the add and shift operations required to obtain the final output.
Fig 2.5 Computation Sharing Multiplier
3. WTM Algorithm
3.1 Algorithm for 4 bit multiplier:
To understand the Wallace tree algorithm, consider the multiplication of two 4 bit integers (both positive) using the WTM method. Let A = a3a2a1a0 and B = b3b2b1b0, where ai and bi are binary digits, i.e. they can take the value either 0 or 1. Step 1: Generation of partial products. The first step is similar to the normal binary multiplication that we use. This step generates the partial products (PPs).
Multiplicand A              a3    a2    a1    a0
Multiplier B                b3    b2    b1    b0
Partial products            a3b0  a2b0  a1b0  a0b0
                      a3b1  a2b1  a1b1  a0b1
                a3b2  a2b2  a1b2  a0b2
          a3b3  a2b3  a1b3  a0b3
Weight    2^6   2^5   2^4   2^3   2^2   2^1   2^0
Fig 3.1. 4 bit multiplication Partial Products

The elements a0b0 to a3b3 are called partial products (PPs). Each PP has its place weight in powers of 2, as shown in figure 3.1. The weight of a partial product aibj is given by 2^x, where x = i + j.
Step 2: Reduction Stages: Now the generated partial products are added using half adders and full adders. The following guidelines are followed in the addition process:
1. If a column has only one partial product, it is propagated directly to the output in the same column (sum output of the same weight). No reduction is necessary.
2. If a column has only two partial products, a half adder is used to generate a sum output of the same weight and a carry output of the next weight.
3. If a column has 3 or more PPs, at least one full adder must be used. As many full adders as possible are used, because a full adder reduces 3 PPs at a time to a sum and a carry.
4. Adding PPs results in two outputs: i) the sum bit and ii) the carry bit.
5. The sum bit has weight 2^x and the carry bit has weight 2^(x+1), where 2^x is the weight of the partial products in that addition.
6. After addition, the sum bit remains in the same column for the next reduction stage, and the carry bit propagates to the next column to the left for the next reduction stage.
7. When only two rows of PPs are left, the final reduction stage uses a parallel adder to produce the final output (this is discussed in Step 3 below).
First Level Reduction of 4 bit multiplication partial products is shown below
Fig 3.2. Use of full adders and half adders for reduction of PPs
The above figure shows which partial products are given as inputs to which adder (half or full adder). Bits in the same column have the same weight in powers of 2. The green rectangles represent half adders and the blue ellipses represent full adders. Bits marked with red circles are carried directly to the final output level. The partial products P00 and P33 are not given as inputs to any adder and are marked with red circles, since there are no other PPs in those columns. Note that column 3 has four PPs, but a full adder can take a maximum of 3 inputs. Hence the 4th PP (P03, marked in grey) is left over and will be added in the next reduction level to the sum bit of P30 + P21 + P12 and the carry bit of the previous column's addition, i.e. P20 + P11 + P02. Note: Pij = aibj; the products aibj in Fig 3.1 are represented as Pij in Fig 3.2.
Second Level Reduction of 4 bit multiplication partial products is shown below.
Fig 3.3. Second Level of reduction

S1, S2, S3, S4 and S5 are the sum bits of the full adders and half adders in the level-1 reduction. C1, C2, C3, C4 and C5 are the carry bits of the adders of the previous column. Rectangles are half adders and ovals are full adders, as in Fig 3.2. We can see that the 4 rows in level 1 are reduced to 3 rows. In level 2 these 3 rows are reduced to just 2 rows, and these 2 rows are reduced in level 3.
Third Level Reduction of 4 bit multiplication partial products is shown below
Fig 3.4. Third Level of reduction

M1 to M6 are sum bits (M0 = P00 and M1 = S1 are taken directly; the remaining Ms are outputs from adders). N2 to N5 are carry bits from previous column additions.
Now the two leftover rows are given to a single parallel adder instead of using 4 half adders. This is discussed in detail in Step 3 below.
Step 3: Last Stage Reduction
For the last stage reduction it is easy and convenient to use the predefined adders provided on FPGA chips. These adders are designed to add their inputs efficiently with little delay. Hence the final stage of the Wallace tree multiplier uses the parallel adders available on the chip. (When describing the architecture of the multiplier in VHDL, we convert the final two rows into integers and add them using the + operator. Zeros are appended to a row wherever a column has only one partial product in the two leftover rows. This step is equivalent to the process above.)
Fig: 3.5. Last stage reduction using parallel adder
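The last stage described above can be written in a few lines. This sketch assumes ieee.numeric_std and adds the two rows as unsigned vectors, which is equivalent to converting them to integers and using the + operator; the entity final_adder and its generic W are hypothetical. For an 8×8 multiplier, W = 16 is enough, because the two rows always sum to the product, which is below 2^16.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity: final carry-propagate addition of the two leftover rows.
-- "+" on unsigned leaves the choice of adder structure to the synthesis tool.
entity final_adder is
  generic (W : positive := 16);
  port (
    ROW1, ROW2 : in  std_logic_vector(W-1 downto 0);  -- zero-padded where a column is empty
    P          : out std_logic_vector(W-1 downto 0)
  );
end entity final_adder;

architecture rtl of final_adder is
begin
  P <= std_logic_vector(unsigned(ROW1) + unsigned(ROW2));
end architecture rtl;
```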

3.2 Algorithm for 8 bit multiplier:

To understand the Wallace tree algorithm, consider the multiplication of two 8 bit integers (both positive) using the WTM method. Let A = a7a6a5a4a3a2a1a0 and B = b7b6b5b4b3b2b1b0, where ai and bi are binary digits, i.e. they can take the value either 0 or 1.
Step 1: Generation of partial products
[Figure: Generation of partial products for 8 bit multiplication]
Step 2: Reduction stages
[Figures: Stage 1, Stage 2, Stage 3 and Stage 4 of the reduction of the 8 bit partial products]
Final stage:
[Figure: addition of the two remaining rows with a parallel adder]
S. No  Size of input  Reduction stages  Including last stage
1      2              1                 2
2      3              1                 2
3      4              3                 4
4      5              3                 4
5      6              3                 4
6      7              4                 5
7      8              4                 5
8      9              4                 5
9      10             5                 6
10     11             5                 6
11     12             5                 6
12     13             5                 6
13     14             6                 7
14     15             6                 7
15     16             6                 7
16     17             6                 7
17     18             6                 7
18     19             6                 7
19     20             6                 7
20     21             7                 8
21     22             7                 8
22     23             7                 8
23     24             7                 8
24     25             7                 8
25     26             7                 8
26     27             7                 8
27     28             7                 8
28     29             7                 8
29     30             8                 9
30     31             8                 9
31     32             8                 9
Table 3.1 Size of multiplier versus number of reduction stages in WTM algorithm
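The number of 3:2 stages can be estimated with the usual row-count recurrence: each group of three rows becomes two rows and leftover rows pass through unchanged, r → 2⌊r/3⌋ + (r mod 3), until two rows remain. The function below is a minimal sketch of that recurrence; wallace_pkg and wallace_stages are hypothetical names. It gives 4 stages for n = 8, as used in Section 3.2, but for a few sizes (for example n = 2 and n = 4) it gives fewer stages than Table 3.1 lists, so treat it as the textbook row-count bound rather than a reproduction of the table.

```vhdl
-- Hypothetical package: counts greedy 3:2 reduction stages for n rows.
package wallace_pkg is
  function wallace_stages(n : positive) return natural;
end package wallace_pkg;

package body wallace_pkg is
  function wallace_stages(n : positive) return natural is
    variable rows   : natural := n;
    variable stages : natural := 0;
  begin
    while rows > 2 loop
      -- every full group of three rows becomes two rows; leftovers pass through
      rows   := 2 * (rows / 3) + (rows mod 3);
      stages := stages + 1;
    end loop;
    return stages;  -- excludes the final carry-propagate addition
  end function wallace_stages;
end package body wallace_pkg;
```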
4. VHDL description of WTM

4.1 Introduction to VHDL:
VHDL is the abbreviation of Very high speed integrated circuit Hardware Description Language. It is a hardware description language that can be used to model a digital system. It contains elements that can be used to describe the behaviour or structure of any digital system and its timing explicitly. The language provides support for modelling the system hierarchically and also supports top-down and bottom-up design methodologies. The system and its subsystems can be described at any level of abstraction ranging from the architecture level to the gate level. Precise simulation semantics are associated with all the language constructs, and therefore models written in this language can be verified using a VHDL simulator (for example, the Xilinx ISE simulator; we have used version 9.2i of the simulator for development of the WTM VHDL code).
In the mid 1980s, the U.S. Department of Defence (DoD) and the IEEE (Institute of Electrical and Electronics Engineers) sponsored the development of a highly capable hardware description language called VHDL. This language started out as a documentation and modelling language, allowing the behaviour of digital system designs to be precisely specified and simulated. While the language and the simulation environment were important innovations by themselves, VHDL's utility and popularity took a quantum leap with the commercial development of VHDL synthesis tools. These programs can create logic-circuit structures directly from VHDL behavioural descriptions. Using VHDL you can design, simulate and synthesize anything from a simple combinational circuit to a complete microprocessor system on a chip.
4.2 VHDL Structure and Behaviour:

There are two fundamentally different ways of specifying logic: the first is to specify the structure of the logic, and the other is to specify the behaviour of the system. Structure: The structure of a system describes that system in terms of "what is connected to what". The system is thus broken down into smaller units which are connected together to form a whole. The whole unit is called an entity. Inputs and outputs to/from the entity are called ports.
Every VHDL program consists of two main blocks: the entity declaration and the architecture description. The entity declaration specifies the name of the digital system being modelled and lists its interface ports. Ports are the signals through which the entity communicates with other models in its external environment. Example:
entity HALF_ADDER is
  Port (A, B: in BIT; SUM, CARRY: out BIT);
end HALF_ADDER;
Fig 4.1: Half adder component of VHDL modelled WTM
Here the half adder system has two input ports, A and B and two output ports SUM and CARRY. The type BIT implies binary data type i.e. the ports can take either 0 or 1 value.
Fig 4.2: Full adder component of VHDL modelled WTM
Here the full adder system has three input ports, A, B and C, and two output ports, SUM and CARRY. The type BIT implies a binary data type, i.e. the ports can take either the value 0 or 1.
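The entity declarations above only list the ports. A minimal sketch of matching dataflow architectures is shown below. It uses STD_LOGIC and the port names A, B, C, S and Cout so that the entities would bind to the FA and HA component declarations in the WTM architecture; these bodies are an illustration, not the project's source.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Half adder: two inputs of equal weight -> sum (same weight) and carry (next weight).
entity HA is
  port (A, B : in std_logic; S, Cout : out std_logic);
end entity HA;

architecture dataflow of HA is
begin
  S    <= A xor B;
  Cout <= A and B;
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;

-- Full adder: the 3:2 counter used in every Wallace reduction stage.
entity FA is
  port (A, B, C : in std_logic; S, Cout : out std_logic);
end entity FA;

architecture dataflow of FA is
begin
  S    <= A xor B xor C;
  Cout <= (A and B) or (A and C) or (B and C);  -- majority of the three inputs
end architecture dataflow;
```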
Behaviour: The structural descriptions are extremely useful for building systems from smaller components. Ultimately we need to specify what each component actually does. In this case structural descriptions are of limited use. We need some way of describing the behaviour of the entities that we use. At the very minimum we need to specify the behaviour of the entities at the bottom of the hierarchical structure. VHDL thus provides a means of specifying the behaviour of entities by means of a behavioral description. Behavioral descriptions resemble a programming language. We will discuss syntax as we need it to implement the concepts that follow. Please bear in mind as we go that, despite the similarities, a VHDL description is a hardware description, and that the analogy to a programming language is actually fairly weak.
Architectures: The architecture description gives the complete functioning of the entity. The internal details of the entity are specified using any of the following modelling styles:
1. Structural modelling
2. Data flow modelling
3. Behavioural modelling
4. A combination of any of the above three.
For the description of the Wallace tree multiplier we have used structural modelling, in which the entity is described as a set of interconnected components. The entity declaration is given below.
entity wallstree is
  Port (A, B : in STD_LOGIC_VECTOR (7 downto 0);
        O    : out STD_LOGIC_VECTOR (15 downto 0));
end wallstree;
Here are the declarations of the two components used by the Wallace tree multiplier, the half adder and the full adder, together with an overview of the total structure (the full-scale executable VHDL code for the WTM is not presented here).
architecture Behavioral of wallstree is
  COMPONENT FA
    PORT (A, B, C : IN STD_LOGIC; S, Cout : OUT STD_LOGIC);
  END COMPONENT;
  COMPONENT HA
    PORT (A, B : IN STD_LOGIC; S, Cout : OUT STD_LOGIC);
  END COMPONENT;
  ---------- signal descriptions ----------
begin
  ---------- round 1 reduction ----------
  -- ... (statements not presented here)
  ---------- round 4 reduction ----------
  -- ... (statements not presented here)
end Behavioral;
The FA (full adder) component takes three signals, A, B and C, as inputs and gives two outputs, the sum S and the carry Cout. Similarly, the HA (half adder) component takes two inputs and produces two outputs. Recall from the algorithm that these components are used in the reduction stages. As described in the algorithm, we have used four stages of reduction plus a last-stage parallel addition in the structural description of the WTM. The partial products are taken as individual signals in order to reduce delays. In the final stage, the two leftover rows are converted to integers and added using the + operator.
4.3 VHDL Syntax

Comments: Comments are preceded by "--" (two dashes) and extend to the end of the line. Identifiers: Identifiers are names of signals, variables, components, etc. They must begin with a letter; after the initial letter they may contain any combination of letters, digits and underscores, although an identifier may not end with an underscore or contain two consecutive underscores. VHDL is not case sensitive. Numbers: VHDL supports a very wide range of number representations. Numbers may be in any base between 2 and 16. In a based literal the base is written first and the digits are enclosed between hash (#) characters, e.g. 2#10101010# = 10101010 in base 2 (170 in decimal). Note that the base itself is an integer expressed in base 10.
Numbers may have decimal points; such numbers are real literals, e.g. 3.1415.
The VHDL standard supports floating point numbers, although some VHDL tools, in particular synthesis tools, do not. Floating point operations are expensive to implement because each operation must be synthesized from a large amount of logic.
Remember that VHDL only looks like a programming language. Bear in mind that when you use a floating point number, the required operation has to be constructed out of gates when the design is put onto a chip. Numbers may have exponents. The exponents are expressed in decimal and may only be integers. Exponents may be positive or negative for real literals, but an integer literal may not have a negative exponent. The exponent is signified by an 'E', and the number is multiplied by the base raised to the power of the exponent. For example, a number expressed in base 2 is multiplied by 2 to the power of the exponent: 2#1010#E1 = 1010 × 2^1 = 10100 (base 2) = 20 (decimal). Characters and Strings: Characters are enclosed in single quotation marks; the position of each character in the predefined type CHARACTER equals its ASCII code, e.g. CHARACTER'POS('0') = 48. Strings are enclosed in double quotation marks, e.g. "Samuel". To include a double quotation mark inside a string, write it twice: "Fred said ""hello""". Bit string literals may be written in binary, octal or hexadecimal, e.g. B"10101010" = X"AA"; O"252" denotes the same value but is 9 bits long, because each octal digit stands for 3 bits.
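The literal forms above can be checked in a small self-contained design intended for simulation only. In this sketch, whose entity name literal_demo is hypothetical, each assertion stops the simulation if a literal does not have the stated value.

```vhdl
-- Hypothetical simulation-only check of VHDL literal forms.
entity literal_demo is
end entity literal_demo;

architecture sim of literal_demo is
  constant BASED_BIN : integer := 2#10101010#;          -- 170
  constant BASED_EXP : integer := 2#1010#E1;            -- 10 * 2**1 = 20
  constant HEX_INT   : integer := 16#FF#;               -- 255
  constant ZERO_POS  : integer := character'pos('0');   -- 48, the ASCII code
  constant QUOTED    : string  := "Fred said ""hello""";  -- doubled quotes
begin
  check : process
  begin
    assert BASED_BIN = 170 severity failure;
    assert BASED_EXP = 20 severity failure;
    assert HEX_INT = 255 severity failure;
    assert ZERO_POS = 48 severity failure;
    assert QUOTED'length = 17 severity failure;  -- Fred said "hello"
    assert bit_vector'(B"10101010") = X"AA" severity failure;
    -- each octal digit expands to 3 bits, so O"252" is 9 bits long
    assert bit_vector'(O"252") = "010101010" severity failure;
    wait;
  end process check;
end architecture sim;
```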
Data Types: There are a number of data types in VHDL. Not all data types supported by VHDL will be available to us; we will only mention a subset of those that are supported by QUARTUS.
1. Integer types: These are used for representing numbers with no fractional part. VHDL guarantees that the predefined type INTEGER covers at least the range -2147483647 to +2147483647, i.e. it needs 32 bits. Integers are signed. We can limit the range of integers by using the range keyword. This is useful for limiting the size of the representation and for making your code more compiler independent. In general, limit your range to the size that you need. When we limit the range of a type we obtain a subtype. It is possible to have ascending or descending ranges by using the to or downto keywords; this is particularly useful when defining groups of bits. Enumerated types are supported in QUARTUS. These types are used for convenience.
2. Array types: These have the same meaning that you are used to. Arrays may have any number of dimensions and may be formed from any supported type or previously defined type. There are two predefined array types of interest here, string and bit_vector. The definition of the string type is: type string is array (positive range <>) of character;
The definition for bit vectors is this: type bit_vector is array (natural range <>) of bit;
The "<>" angle brackets are called a box. They are placeholders which indicate that the range will be filled in when the types are used.
Bit vectors are essentially an array of bits used for binary numbers.
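A few declarations illustrating these data types are collected in the package below. It is a sketch: the names byte_int, up_index, down_index, state_t, pp_row, pp_matrix and word_array are hypothetical, chosen to echo the multiplier.

```vhdl
-- Hypothetical package illustrating integer subtypes, enumerations and arrays.
package type_demo_pkg is
  -- Integer subtype limited with the range keyword (fits in 8 bits).
  subtype byte_int   is integer range 0 to 255;
  -- Ascending and descending ranges.
  subtype up_index   is integer range 0 to 7;
  subtype down_index is integer range 7 downto 0;
  -- Enumerated type, e.g. for the states of a controller.
  type state_t is (IDLE, REDUCE, ADD_FINAL, DONE);
  -- Constrained one-dimensional array of a predefined type.
  type pp_row is array (7 downto 0) of bit;
  -- Two-dimensional array: eight rows of eight partial-product bits.
  type pp_matrix is array (0 to 7, 7 downto 0) of bit;
  -- Unconstrained array using the box "<>"; the range is given when it is used.
  type word_array is array (natural range <>) of bit_vector(7 downto 0);
end package type_demo_pkg;
```

A declaration such as constant words : word_array(0 to 2) := (others => X"00"); then fixes the range of the unconstrained type.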
4.4 VHDL Operators

Like any language, VHDL has a set of operators. These operators may be divided into three types: logical, arithmetic and comparison. Logical operators: Logical operators are used to derive logical results from bits and bit_vectors. When used with bit_vectors, the operators act in a bitwise manner, and the vectors must be of the same length. The logical operators are NOT, AND, OR, NAND, NOR and XOR. These should require no further explanation. In addition there is a concatenation operator, the ampersand "&", which concatenates two bit vectors. Arithmetic operators: These operators perform basic arithmetic operations on numbers. Normally these operations are intended only for use on integers. Some VHDL suites come with libraries which contain arithmetic functions for use with bit_vectors and std_logic vectors.
The operators are: + addition, - subtraction, * multiplication and / division. The only caution that applies here is that some packages only support multiplication and division by integer powers of two, which in effect reduces multiplication and division to arithmetic shift operations. Comparison operators: These operators are used to compare two scalar or bit_vector arguments, and they return TRUE or FALSE. The predefined operators compare bit_vectors element by element from the left (lexicographically), not numerically, so the vectors should be of the same length. The operators are: < less than, <= less than or equal, > greater than, >= greater than or equal, = equal, /= not equal. Processes: We sometimes want a set of VHDL statements to execute sequentially, one after the other. This allows us to build up a set of instructions that are 'executed' in order. The execution of this list is triggered by some event, very often on a port of the entity.
It is possible to have more than one process in an architecture. Each process is triggered independently, based on its sensitivity list, and the processes execute concurrently. This means that the processes effectively execute in parallel; there is no time correlation between lines in different processes. Note that if different processes drive a common signal or port, that signal has more than one driver: for an unresolved type such as BIT this is an error, and for a resolved type such as STD_LOGIC the driven values are combined by the resolution function, which is rarely the intended behaviour. Many compilers will issue a warning or error message about this. As an example, suppose you wish to implement the following hardware:
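Assuming the hardware is an adder and a subtractor whose results are latched into two registers on the rising clock edge, as described below, it could be sketched with two independent processes. The entity add_sub_regs, its 8-bit ports and the use of ieee.numeric_std are assumptions; each process drives its own output, so there is no conflict between drivers.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example: two clocked processes that model parallel hardware.
entity add_sub_regs is
  port (
    CLK       : in  std_logic;
    A, B      : in  unsigned(7 downto 0);
    SUM, DIFF : out unsigned(7 downto 0)
  );
end entity add_sub_regs;

architecture rtl of add_sub_regs is
begin
  add_reg : process (CLK)
  begin
    if rising_edge(CLK) then
      SUM <= A + B;    -- 8-bit result; overflow wraps around
    end if;
  end process add_reg;

  sub_reg : process (CLK)
  begin
    if rising_edge(CLK) then
      DIFF <= A - B;   -- 8-bit result; underflow wraps around
    end if;
  end process sub_reg;
end architecture rtl;
```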
Effectively we have parallel hardware. We want the addition and subtraction to occur simultaneously if possible and we need to latch the results of these operations in the registers on the rising clock edge. Changing Execution Flow in VHDL: In order for any sort of program to be truly versatile there must exist a means of changing the "program flow". You will have seen examples of this in any programming language that you may have used. VHDL offers similar facilities. Variable assignment: Variables are needed to hold data temporarily. Variables are distinctly different from ports and signals. A value may be assigned to a variable by means of the variable assignment operator ":=". The type of the variable(s) and data must be the same. Overloading: VHDL allows us to overload procedures and functions. This means that we may use the same procedure name for more than one function provided the parameter list differs.
The procedure may have a different number of parameters and/or parameters of different types, which allows us to use the same procedure name on different data. We can also overload operators, for example:
    function "+" (a, b : word_32) return word_32 is
    begin
      return int_to_word_32( word_32_to_int(a) + word_32_to_int(b) );
    end "+";
Within the body of this function, the addition operator is used to add integers, since its operands are both integers. However, in the expression X"1000_0010" + X"0000_FFD0" the newly declared function is called, since the operands to the addition operator are both of type word_32. Note that it is also possible to call operators using the prefix notation used for ordinary subprogram calls, for example: "+" (X"1000_0010", X"0000_FFD0"). Many functions are overloaded in the VHDL libraries: the IEEE std_logic packages overload operators for use with the std_logic types, and we will see more examples of this later on.
Packages: A package is a collection of VHDL constructs. The constructs can be totally unrelated; usually, however, a package implements some particular service and all of its constructs contribute to it. A package also provides a means of hiding the working code behind functions and procedures, which is good for security as well as for "information hiding and abstraction".
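The operator fragment above relies on a type word_32 and two conversion helpers that the text does not define. Below is a self-contained package sketch under stated assumptions: word_32 is taken to be a subtype of bit_vector(31 downto 0), and instead of the integer conversion helpers the body uses unsigned addition from ieee.numeric_bit, since a 32-bit word of 2**31 or more may not fit in VHDL's integer range. The package name word_32_pkg is hypothetical; the body also shows a variable being assigned with ":=".

```vhdl
library ieee;
use ieee.numeric_bit.all;

-- Hypothetical package: a 32-bit word subtype with an overloaded "+".
package word_32_pkg is
  subtype word_32 is bit_vector(31 downto 0);
  function "+" (a, b : word_32) return word_32;
end package word_32_pkg;

package body word_32_pkg is
  function "+" (a, b : word_32) return word_32 is
    -- Variable: holds the intermediate result; ":=" updates it immediately
    variable sum : unsigned(31 downto 0);
  begin
    -- numeric_bit "+" on UNSIGNED; the carry out of bit 31 is dropped,
    -- so the result wraps modulo 2**32
    sum := unsigned(a) + unsigned(b);
    return word_32(sum);
  end function "+";
end package body word_32_pkg;
```

With use work.word_32_pkg.all in scope, X"1000_0010" + X"0000_FFD0" resolves to this function and yields X"1000_FFE0".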
4.5 The Xilinx ISE Design Suite:

Xilinx ISE is a software tool produced by Xilinx for the synthesis and analysis of HDL designs. It enables the developer to synthesize ("compile") their designs, perform timing analysis, examine RTL diagrams, simulate a design's reaction to different stimuli, and configure the target device with the programmer. For the development of our project, version 9.1i of the design suite is used; however, the code can also be used with earlier or later versions of the software. The environment provides an editor to type in the code, synthesis and verification tools, timing and power analyses, and many other utilities.
Fig 4.3: Xilinx ISE 9.2i Project Navigator window.
4.6 Simulation of WTM using ModelSim 6

After coding a VHDL program in the ISE design suite and synthesising it, we will know whether there are any syntax errors in the program. The synthesis tool also provides a synthesis report with many details of the design (the synthesis report of the WTM is provided in Appendix A). However, synthesis does not check whether the program does what it is supposed to do. To verify the functioning of the code, we have to simulate it by applying virtual inputs in a simulator. The ISE suite provides the ISE Simulator, for which we have to develop a test bench file, whereas ModelSim lets us give inputs dynamically. With a test bench, the inputs have to be specified before the simulation.
Fig 4.4: Test bench waveform
The test bench shows how inputs A and B are applied at different times. The simulation gives the output O for the inputs at every instant specified in the test bench. The following image of the ModelSim simulator gives an idea of how the outputs are verified.
Fig 4.5: Simulation waveform showing inputs a, b and output o
Fig 4.5 shows the simulated output for the inputs a and b. Here a is 00010010, i.e. 18, and b is 00101000, i.e. 40. The simulated output is 0000001011010000, which is 720 in decimal. Since 18 × 40 = 720, the result is correct, which verifies that the code functions as a proper multiplier for this input pair.
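A self-checking test bench along these lines can replay the case in Fig 4.5 and then try every input pair. It assumes the multiplier entity is called mult8 (a hypothetical name) with ports A, B and O, following the names in Fig 4.5 and the synthesis report. A behavioural stand-in for mult8 is included so the sketch compiles on its own; replacing that stand-in with the structural WTM turns the test bench into a real check of the design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioural stand-in with the same ports as the WTM (A, B -> O).
-- Replace this entity with the Wallace tree design under test.
entity mult8 is
  port (A, B : in  std_logic_vector(7 downto 0);
        O    : out std_logic_vector(15 downto 0));
end entity mult8;

architecture behavioral of mult8 is
begin
  O <= std_logic_vector(unsigned(A) * unsigned(B));  -- 8 x 8 -> 16 bits
end architecture behavioral;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Self-checking test bench: applies every pair of 8-bit inputs
-- and compares O with the expected unsigned product.
entity tb_mult8 is
end entity tb_mult8;

architecture sim of tb_mult8 is
  signal A, B : std_logic_vector(7 downto 0) := (others => '0');
  signal O    : std_logic_vector(15 downto 0);
begin
  dut : entity work.mult8 port map (A => A, B => B, O => O);

  stimulus : process
  begin
    -- The case shown in Fig 4.5: 18 x 40 = 720
    A <= std_logic_vector(to_unsigned(18, 8));
    B <= std_logic_vector(to_unsigned(40, 8));
    wait for 10 ns;
    assert unsigned(O) = 720 report "18 x 40 failed" severity error;

    -- Exhaustive check of all 65536 input pairs
    for i in 0 to 255 loop
      for j in 0 to 255 loop
        A <= std_logic_vector(to_unsigned(i, 8));
        B <= std_logic_vector(to_unsigned(j, 8));
        wait for 10 ns;
        assert to_integer(unsigned(O)) = i * j
          report "mismatch for " & integer'image(i) & " x " & integer'image(j)
          severity error;
      end loop;
    end loop;

    report "test bench finished";
    wait;  -- suspend for good; the simulation ends when no events remain
  end process stimulus;
end architecture sim;
```

Unlike a waveform that has to be inspected by eye, the assertions report any wrong product automatically.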
4.7 Design Summary:
Fig 4.6: Design summary of Wallace tree multiplier code.
5.Modelsim 6.3f
This chapter is a quick introduction to ModelSim.
5.1 Starting the program

You can start ModelSim with the command vsim. You will then probably see something similar to this:
In the left-hand window, you can now see the standard library. In the picture, I've opened the package std_logic_arith in the ieee library; if you want to see what's in it, right-click on it and select edit. In this window you will see what you are working with. Notice that there are tabs below it; when we open a project or start a simulation, more tabs will appear there.
In the right-hand window, you can type commands to ModelSim. Most commands you will need are also available in the menus. ModelSim seems to have a will of its own, and sometimes you will find these two windows stacked on top of each other instead of next to each other.
5.2 Starting a project

In the File menu, choose New > Project. Choose a project name and a home directory, and leave work as the default library name. The window "add items to the project" will appear, where you can create VHDL files, add existing files, and so on. As an example, we will add the example source from the second lecture. Download it and put it in the directory where you created your project. Click on "Add Existing File". In the dialog that appears, locate the file using Browse, then press OK. The file will then appear in the main ModelSim window. Close the "add items to the project" window (you must do that before you can do anything in the main window). The main window may now look something like the following:
Here you see all the files that you have added to your project. Note the four rightmost buttons at the top; the first three of these are: compile one file, compile all files, and simulate. If you press compile all files, the blue question mark will probably turn into a red X, which indicates a compilation error. Select Compile Summary in the Compile menu (or double-click on the red X) to see the error messages:
As you can see, there was a large number of errors. You probably want to check the code and see what they mean. You can do that by right-clicking on the file (in the main screen) and selecting edit (or by double-clicking on an error message). This will show you the following window:
In line 7, the compiler expected a semicolon and found entity. Looking at the edit window, we see that line 7 reads end entity count_pos_edges;. The error arises because ModelSim by default reads all VHDL code as if it were VHDL-87, in which one wrote, for instance, end; instead of end entity ... The solution is to right-click on the file again and select properties.
Under the VHDL tab, there is an option labelled Use 1993 Language Syntax. Select that option, press OK and recompile the file. You should now get a green check mark, indicating successful compilation.
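The lecture source itself is not reproduced in this report. As an illustration only, here is a hypothetical count_pos_edges written in VHDL-93 style; its end entity and end architecture forms are exactly what a VHDL-87 parse rejects, which is why the 1993 option is needed. The ports and the edge-counting behaviour are assumptions suggested by the name.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical edge counter: counts 0 -> 1 transitions of x, sampled on clk.
entity count_pos_edges is
  port (clk, reset, x : in  std_logic;
        count         : out unsigned(7 downto 0));
end entity count_pos_edges;  -- VHDL-93 form; VHDL-87 accepts only "end count_pos_edges;" or "end;"

architecture rtl of count_pos_edges is
  signal x_prev : std_logic := '0';                  -- x at the previous clock edge
  signal cnt    : unsigned(7 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then                 -- synchronous reset
        cnt <= (others => '0');
      elsif x = '1' and x_prev = '0' then -- rising edge seen on x
        cnt <= cnt + 1;                   -- wraps from 255 to 0
      end if;
      x_prev <= x;
    end if;
  end process;

  count <= cnt;
end architecture rtl;  -- "end architecture" is likewise VHDL-93 only
```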
5.3 Simulating the circuit

When we've successfully compiled our circuit we can simulate it. Press the simulate button (or select simulate in the simulate menu). The following window will appear:
The window shows all loaded libraries. All entities we declare end up in the library work, so expand that library by clicking the plus sign next to it. Then we can select which circuit to simulate. Select add_tester and press OK. A new tab will now appear in your main window:
Here you see a hierarchical view of the architecture we chose to simulate. At the top level you see the architecture add_tester(arch) (which means the architecture arch of the entity add_tester), which we chose to simulate. It contains instances of two architectures: add4bit(structural) and add4bit(behavioral). The structural version of add4bit contains sub-entities of its own. Right-click on the top-level architecture and select Add > Add to wave. The wave window will appear:
This window shows how the values of the different signals vary over time (like an oscilloscope). If we like, we can add the sub-entities to the wave window too, just as we did with the top-level entity. By pressing the plus sign on a composite signal (which includes std_logic_vector, signed, unsigned and similar signals), we can show the values of each individual bit. There are different ways to run the simulation. Here are a few:
1. Select how long you want to run the simulation for in the small input field at the top of the simulation window, and then press the run button next to it. For this testbench, 4 us is an appropriate value.
2. Type the command run 4 us in the command window.
3. Use the functions in the Run submenu of the Simulate menu.
After you have run your simulation, you will probably want to press the dark magnifying lens at the top of the wave window to view your whole simulation:
Some useful things you can do now include: Hold down the left mouse button somewhere in the waves to see the values of all waves at that particular point in time. You can find the previous/next event on a signal by clicking on the signal and then pressing
You may also want to look at the edit window during simulation:
Using the buttons you can step through your VHDL code, one line at a time. This can be rather useful for debugging.
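The hierarchy described in 5.3 can be reproduced with a sketch like the following. It is not the lecture's source: full_adder, the port names and the test loop are assumptions, chosen so that add_tester instantiates add4bit(structural) and add4bit(behavioral) and checks that they agree. The loop applies 512 input combinations at 5 ns each, i.e. 2.56 us, so run 4 us covers it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- One-bit full adder, used as the sub-entity of the structural version.
entity full_adder is
  port (x, y, ci : in  std_logic;
        sum, co  : out std_logic);
end entity full_adder;

architecture dataflow of full_adder is
begin
  sum <= x xor y xor ci;
  co  <= (x and y) or (ci and (x xor y));
end architecture dataflow;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 4-bit adder with two architectures, named as in the structure view.
entity add4bit is
  port (a, b : in  std_logic_vector(3 downto 0);
        cin  : in  std_logic;
        s    : out std_logic_vector(3 downto 0);
        cout : out std_logic);
end entity add4bit;

-- Structural: a ripple chain of four full_adder instances.
architecture structural of add4bit is
  signal c : std_logic_vector(4 downto 0);  -- c(i) = carry into bit i
begin
  c(0) <= cin;
  chain : for i in 0 to 3 generate
    fa : entity work.full_adder
      port map (x => a(i), y => b(i), ci => c(i), sum => s(i), co => c(i + 1));
  end generate chain;
  cout <= c(4);
end architecture structural;

-- Behavioural: numeric_std performs the addition in 5 bits.
architecture behavioral of add4bit is
  signal sum5 : unsigned(4 downto 0);
begin
  sum5 <= resize(unsigned(a), 5) + resize(unsigned(b), 5) + unsigned'("0000" & cin);
  s    <= std_logic_vector(sum5(3 downto 0));
  cout <= sum5(4);
end architecture behavioral;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Top level: drives both architectures with the same inputs and
-- reports any case where their outputs differ.
entity add_tester is
end entity add_tester;

architecture arch of add_tester is
  signal a, b           : std_logic_vector(3 downto 0) := (others => '0');
  signal cin            : std_logic := '0';
  signal s_str, s_beh   : std_logic_vector(3 downto 0);
  signal co_str, co_beh : std_logic;
begin
  u_str : entity work.add4bit(structural)
    port map (a => a, b => b, cin => cin, s => s_str, cout => co_str);
  u_beh : entity work.add4bit(behavioral)
    port map (a => a, b => b, cin => cin, s => s_beh, cout => co_beh);

  stimulus : process
  begin
    for i in 0 to 15 loop
      for j in 0 to 15 loop
        for k in 0 to 1 loop
          a <= std_logic_vector(to_unsigned(i, 4));
          b <= std_logic_vector(to_unsigned(j, 4));
          if k = 1 then cin <= '1'; else cin <= '0'; end if;
          wait for 5 ns;
          assert s_str = s_beh and co_str = co_beh
            report "structural and behavioral add4bit disagree" severity error;
        end loop;
      end loop;
    end loop;
    wait;
  end process stimulus;
end architecture arch;
```

Expanding u_str in the structure view would then show the four fa instances as the sub-entities of the structural version.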
Result and Conclusion

VHDL is used to implement the 8-bit Wallace tree multiplier, and its functioning is verified using the ISE simulator and ModelSim. Compared with the array multiplier, the WTM is very advantageous in terms of computational complexity and propagation delay: its estimated maximum combinational path delay is 18.904 ns against 27.850 ns, about 32% lower. The computation sharing multiplier and the carry save adder multiplier, which are also expected to perform multiplication efficiently, perform worse than the WTM. The main advantage of the WTM is that, as the number of bits increases (i.e. 16×16 or 32×32), its delay and power ratings improve further. However, the WTM has an irregular structure and occupies slightly more area than the array multiplier, but it gives the best performance for high-speed multiplication. In conclusion, the WTM is a high-performance algorithm for applications where area is not a constraint but power and complexity are vital.
RESULTS:
Multiplier      Time delay (ns)   Area (No. of LUTs)   Memory used during synthesis (kilobytes)
Wallace tree    18.904            146                  164904
Array           27.850            123                  166568
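One way to see why the gap should widen for larger operands, assuming the classic Wallace rule: in each stage every complete group of three partial-product rows passes through a layer of full adders and becomes two rows, while one or two leftover rows pass straight through, until two rows remain for the final carry-propagate adder. By comparison, an array multiplier that adds one partial-product row at a time needs roughly n - 1 adder rows.

```latex
\begin{aligned}
&r_0 = n, \qquad r_{k+1} = 2\left\lfloor \frac{r_k}{3} \right\rfloor + (r_k \bmod 3), \qquad \text{stop when } r_k = 2\\[4pt]
&n = 8:\quad 8 \to 6 \to 4 \to 3 \to 2 &&\text{4 stages (array: 7 adder rows)}\\
&n = 16:\quad 16 \to 11 \to 8 \to 6 \to 4 \to 3 \to 2 &&\text{6 stages (array: 15 adder rows)}\\
&n = 32:\quad 32 \to 22 \to 15 \to 10 \to 7 \to 5 \to 4 \to 3 \to 2 &&\text{8 stages (array: 31 adder rows)}
\end{aligned}
```

For n = 8 this gives the four reduction stages listed for an 8-bit input in Table 3.1. These are stage counts only: routing, which accounts for 49.4% and 60.3% of the critical-path delays in the synthesis reports of Appendices A and B, is not modelled.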
Future Developments: Having studied and developed an 8-bit WTM for unsigned multiplication, we aim at:
1. Improving the code to compute signed multiplication as well.
2. Reducing the redundant signal declarations in the code by defining them as arrays, to reduce the code length.
3. Developing a high-performance adder in VHDL for the last-stage reduction, instead of using the adder available in the FPGA.
4. Implementing an FIR (finite impulse response) filter using the Wallace tree multiplier and optimising its performance.
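As an illustration of the item about defining signals as arrays, the 64 partial-product bits of an 8×8 multiplier can be held in one array signal and generated with two nested generate loops instead of 64 separate declarations. This is a sketch only: the package wtm_types, the type pp_matrix and the entity pp_gen8 are hypothetical names, and the later reduction stages are not shown.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical package holding the partial-product array type.
package wtm_types is
  -- Row j holds A(i) and B(j) for i = 0..7; bit pp(j)(i) has weight 2**(i+j)
  type pp_matrix is array (0 to 7) of std_logic_vector(7 downto 0);
end package wtm_types;

library ieee;
use ieee.std_logic_1164.all;
use work.wtm_types.all;

-- Partial-product generation for an 8x8 multiplier using one array signal.
entity pp_gen8 is
  port (A, B : in  std_logic_vector(7 downto 0);
        pp   : out pp_matrix);
end entity pp_gen8;

architecture rtl of pp_gen8 is
begin
  rows : for j in 0 to 7 generate
    cols : for i in 0 to 7 generate
      pp(j)(i) <= A(i) and B(j);   -- one AND gate per partial-product bit
    end generate cols;
  end generate rows;
end architecture rtl;
```

Later stages can then index rows and columns with loop variables (pp(j)(i)) rather than naming each wire individually.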
APPENDIX A
Synthesis Report for 8×8 Wallace tree multiplier:
Release 9.2i - xst J.36
Copyright (c) 1995-2007 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to ./xst/projnav.tmp
CPU : 0.00 / 0.14 s | Elapsed : 0.00 / 0.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.14 s | Elapsed : 0.00 / 0.00 s
--> Reading design: multifinal.prj
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
No clock signals found in this design

Asynchronous Control Signals Information:
----------------------------------------
No asynchronous control signals found in this design

Timing Summary:
---------------
Speed Grade: -7
Minimum period: No path found
Minimum input arrival time before clock: No path found
Maximum output required time after clock: No path found
Maximum combinational path delay: 18.904ns

Timing Detail:
--------------
All values displayed in nanoseconds (ns)
=========================================================================
Timing constraint: Default path analysis
Total number of paths / destination ports: 15844 / 16
-------------------------------------------------------------------------
Delay:        18.904ns (Levels of Logic = 20)
Source:       B<0> (PAD)
Destination:  O<15> (PAD)

Data Path: B<0> to O<15>
                            Gate    Net
Cell:in->out      fanout   Delay   Delay   Logical Name (Net Name)
----------------------------------------   ------------
IBUF:I->O             13   0.769   2.295   B_0_IBUF (B_0_IBUF)
LUT2:I1->O             2   0.418   1.035   PP201 (PP20)
LUT4:I3->O             2   0.418   1.035   U2/Cout1 (C20)
LUT4:I0->O             2   0.418   1.035   V2/Cout1 (N30)
LUT4:I0->O             2   0.418   1.035   W2/Cout1 (P40)
LUT4:I1->O             2   0.418   1.035   X2/Mxor_S_Result1 (A5)
LUT4:I1->O             1   0.418   0.000   Madd_C_lut<5>1 (N176)
MUXF5:I1->O            2   0.360   1.035   Madd_C_lut<5>_f5 (O_5_OBUF)
MUXCY:S->O             1   0.461   0.000   Madd_C_cy<5> (Madd_C_cy<5>)
MUXCY:CI->O            1   0.052   0.000   Madd_C_cy<6> (Madd_C_cy<6>)
MUXCY:CI->O            1   0.052   0.000   Madd_C_cy<7> (Madd_C_cy<7>)
MUXCY:CI->O            1   0.052   0.000   Madd_C_cy<8> (Madd_C_cy<8>)
MUXCY:CI->O            1   0.052   0.000   Madd_C_cy<9> (Madd_C_cy<9>)
MUXCY:CI->O            1   0.052   0.000   Madd_C_cy<10> (Madd_C_cy<10>)
MUXCY:CI->O            1   0.052   0.000   Madd_C_cy<11> (Madd_C_cy<11>)
MUXCY:CI->O            1   0.052   0.000   Madd_C_cy<12> (Madd_C_cy<12>)
MUXCY:CI->O            1   0.052   0.000   Madd_C_cy<13> (Madd_C_cy<13>)
MUXCY:CI->O            1   0.052   0.000   Madd_C_cy<14> (Madd_C_cy<14>)
XORCY:CI->O            0   0.579   0.828   Madd_C_xor<15> (O_15_OBUF)
OBUF:I->O                  4.426           O_15_OBUF (O<15>)
----------------------------------------
Total                     18.904ns (9.571ns logic, 9.333ns route)
                                   (50.6% logic, 49.4% route)
=========================================================================
CPU : 6.63 / 6.97 s | Elapsed : 7.00 / 7.00 s
--> Total memory usage is 164904 kilobytes
Number of errors   :    0 (   0 filtered)
Number of warnings :    1 (   0 filtered)
Number of infos    :    0 (   0 filtered)
Process "Synthesize" completed successfully
APPENDIX B
Synthesis Report for array multiplier:
Release 9.2i - xst J.36
Copyright (c) 1995-2007 Xilinx, Inc. All rights reserved.
--> Parameter TMPDIR set to ./xst/projnav.tmp
CPU : 0.00 / 0.45 s | Elapsed : 0.00 / 0.00 s
--> Parameter xsthdpdir set to ./xst
CPU : 0.00 / 0.45 s | Elapsed : 0.00 / 0.00 s
--> Reading design: mularray.prj
=========================================================================
TIMING REPORT
NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
GENERATED AFTER PLACE-and-ROUTE.

Clock Information:
------------------
No clock signals found in this design

Asynchronous Control Signals Information:
----------------------------------------
No asynchronous control signals found in this design

Timing Summary:
---------------
Speed Grade: -7
Minimum period: No path found
Minimum input arrival time before clock: No path found
Maximum output required time after clock: No path found
Maximum combinational path delay: 27.850ns

Timing Detail:
--------------
All values displayed in nanoseconds (ns)
=========================================================================
Timing constraint: Default path analysis
Total number of paths / destination ports: 20382 / 16
-------------------------------------------------------------------------
Delay:        27.850ns (Levels of Logic = 16)
Source:       b<1> (PAD)
Destination:  prod<15> (PAD)

Data Path: b<1> to prod<15>
                            Gate    Net
Cell:in->out      fanout   Delay   Delay   Logical Name (Net Name)
----------------------------------------   ------------
IBUF:I->O             16   0.769   2.520   b_1_IBUF (b_1_IBUF)
LUT2:I1->O             2   0.418   1.035   _and01891 (pp<1><1>)
LUT4:I2->O             2   0.418   1.035   _or00011 (pc<2><1>)
LUT4:I0->O             2   0.418   1.035   _or00291 (pc<3><1>)
LUT4:I0->O             2   0.418   1.035   _or00151 (pc<4><1>)
LUT4:I0->O             2   0.418   1.035   _or00421 (pc<5><1>)
LUT4:I2->O             2   0.418   1.035   Mxor_ps<6><1>_xo<1>1 (ps<6><1>)
LUT4:I3->O             2   0.418   1.035   _or00031 (pc<7><0>)
LUT4:I0->O             2   0.418   1.035   _and02001 (pc<8><1>)
LUT3:I0->O             2   0.418   1.035   _or00331 (pc<8><2>)
LUT3:I0->O             2   0.418   1.035   _or00351 (pc<8><3>)
LUT3:I0->O             2   0.418   1.035   _or00371 (pc<8><4>)
LUT3:I0->O             2   0.418   1.035   _or00381 (pc<8><5>)
LUT3:I0->O             2   0.418   1.035   _or00391 (pc<8><6>)
LUT4:I3->O             1   0.418   0.828   _or00401 (pc<8><7>)
OBUF:I->O                  4.426           prod_15_OBUF (prod<15>)
----------------------------------------
Total                     27.850ns (11.047ns logic, 16.803ns route)
                                   (39.7% logic, 60.3% route)
=========================================================================
CPU : 9.62 / 10.33 s | Elapsed : 10.00 / 10.00 s
--> Total memory usage is 166568 kilobytes
Number of errors   :    0 (   0 filtered)
Number of warnings :   12 (   0 filtered)
Number of infos    :    0 (   0 filtered)
Process "Synthesize" completed successfully.
Bapatla Engineering College, Department of E.I.E
