Floating Point Arithmetic Final

FLOATING POINT ARITHMETIC ON FPGA
PROJECT REPORT
ON

IMPLEMENTATION OF FLOATING POINT
ARITHMETIC ON FPGA

DIGITAL SYSTEM ARCHITECTURE
WINTER SEMESTER-2010

BY
SUBHASH C (200911005)
A N MANOJ KUMAR (200911030)
PARTH GOSWAMI (200911049)



ACKNOWLEDGEMENT:

We would like to express our deep gratitude to Dr. Rahul Dubey, who not
only gave us the opportunity to work on this project but also guided and
encouraged us throughout the course. He and the TAs of the course, Neeraj
Chasta and Purushothaman, patiently helped us throughout the project. We
take this opportunity to thank them, and our classmates and friends, for
extending their support and working together in a friendly learning
environment. Last but not least, we would like to thank the non-teaching
lab staff, who patiently helped us verify that all the kits were working
properly.

By

Subhash C
A N Manoj Kumar
Parth Goswami



CONTENTS

1. PROBLEM STATEMENT
2. ABSTRACT
3. INTRODUCTION
3.1. FLOATING POINT FORMAT USED
3.2. DETECTION OF SPECIAL INPUTS
4. FLOATING POINT ADDER/SUBTRACTOR
5. FLOATING POINT MULTIPLIER
5.1. DESIGNED ARCHITECTURE FOR MULTIPLICATION IN FPGAS
5.2. DESIGNED 4X4 BIT MULTIPLIER
6. VERIFICATION PLAN
7. RTL SCHEMATICS AND SIMULATION RESULTS
8. FLOOR PLAN OF DESIGN AND MAPPING REPORT
9. POWER ANALYSIS USING XPOWER ANALYZER
10. CONCLUSION
11. FUTURE SCOPE
12. REFERENCES
13. APPENDIX



1. PROBLEM STATEMENT:
Implement the arithmetic (addition/subtraction & multiplication) for IEEE-754
single precision floating point numbers on an FPGA. Display the resultant
value on an LCD screen.

2. ABSTRACT:
Floating point operations are hard to implement on FPGAs because of the
complexity of their algorithms. On the other hand, many scientific problems
require floating point arithmetic with high accuracy in their calculations.
We have therefore explored FPGA implementations of addition and
multiplication for IEEE-754 single precision floating point numbers.
Floating point multiplication in the IEEE single precision format requires
multiplying two 24-bit significands, whereas the Spartan 3E provides
dedicated 18-bit multipliers. The main idea is to replace the existing 18-bit
multiplier with a dedicated 24-bit multiplier built from small 4-bit
multipliers. For floating point addition, the exponent matching, the shifting
of the 24-bit mantissa and the sign logic are coded in behavioral style. The
project is divided into four modules:
1. Design of the floating point adder/subtractor.
2. Design of the floating point multiplier.
3. Creation of the combined control & data paths.
4. I/O interfacing: interfacing the LCD to display the output and
taking inputs from block RAM.
Prototypes have been implemented on a Xilinx Spartan 3E.



3. INTRODUCTION:

Image and digital signal processing applications require high floating
point calculation throughput, and FPGAs are now widely used to perform
these Digital Signal Processing (DSP) operations. Floating point operations
are hard to implement on FPGAs because their algorithms are quite complex.
To combat this performance bottleneck, FPGA vendors including Xilinx have
introduced FPGAs with nearly 254 dedicated 18x18 bit multipliers. These
architectures cater to the need for high speed integer operations but are not
well suited to floating point operations, especially multiplication. Floating
point multiplication is one of the performance bottlenecks in high speed, low
power image and digital signal processing applications. Recently there has
been significant work on the analysis of high-performance floating point
arithmetic on FPGAs, but to our knowledge little of it has addressed
replacing the dedicated 18x18 multipliers in FPGAs with an alternative
implementation that improves floating point efficiency. The single precision
floating point multiplication algorithm is divided into three main parts
corresponding to the three fields of the single precision format. In FPGAs,
the bottleneck of any single precision floating point design is the 24x24 bit
integer multiplier required for multiplying the mantissas. To address these
problems, we designed floating point multiplication and addition units.

The designed architecture can perform both single precision floating
point addition and single precision floating point multiplication with a
single dedicated 24x24 bit multiplier block built from small 4x4 bit
multipliers. The basic idea is to replace the existing 18x18 multipliers in
FPGAs with dedicated 24x24 bit multiplier blocks, which are implemented with
dedicated 4x4 bit multipliers. This architecture can also be used for integer
multiplication.


3.1. FLOATING POINT FORMAT USED:

As mentioned above, the IEEE Standard for Binary Floating Point
Arithmetic (ANSI/IEEE Std 754-1985) will be used throughout our work. The
single precision format is shown in Figure 1. Numbers in this format are
composed of the following three fields:

1-bit sign, S: A value of 1 indicates that the number is negative, and a 0
indicates a positive number.

Bias-127 exponent, e = E + bias: This gives us an exponent range from
Emin = -126 to Emax = 127.

Fraction, f/mantissa: The fractional part of the number.

The fractional part must not be confused with the significand, which is 1 plus
the fractional part. The leading 1 in the significand is implicit. When
performing arithmetic with this format, the implicit bit is usually made explicit.
To determine the value of a floating point number in this format we use the
following formula:

$$\text{Value} = (-1)^{\text{sign}} \times 2^{(\text{exponent} - 127)} \times 1.f_{22}f_{21}f_{20}\ldots f_{1}f_{0}$$

Fig 1. Representation of floating point number
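
The field layout above can be sketched in Verilog as follows. This is only an illustration of the format: the module name fp_unpack and its port names are hypothetical and do not appear in the project code, and the explicit leading 1 is valid only for normalized numbers.

```verilog
// Illustrative sketch (hypothetical names): split a 32-bit IEEE-754
// single precision word into its three fields.
module fp_unpack (
    input  wire [31:0] x,
    output wire        sign,        // 1 = negative, 0 = positive
    output wire [7:0]  exponent,    // biased exponent e = E + 127
    output wire [22:0] fraction,    // stored fraction bits f22..f0
    output wire [23:0] significand  // implicit leading 1 made explicit: 1.f
);
    assign sign        = x[31];
    assign exponent    = x[30:23];
    assign fraction    = x[22:0];
    assign significand = {1'b1, x[22:0]};  // assumes a normalized input
endmodule
```

For the test input BFAE6666 used later in the report, this gives sign = 1, exponent = 7F (127, so E = 0) and significand = AE6666, which is 1.3625 in binary fixed point, matching the decimal value -1.3625.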

3.2. DETECTION OF SPECIAL INPUTS:
The IEEE-754 single precision format defines special values that the design
detects at its inputs: signed infinities and NaNs. Zero operands are detected
as well, for the reasons given at the end of this section.
Signed Infinities
The two infinities, +∞ and -∞, lie beyond the largest positive and
negative real numbers, respectively, that can be represented in the
floating-point format. Infinity is always represented by a zero fraction and
the maximum biased exponent allowed in the format (255 in decimal for the
single precision format).
The signs of infinities are observed, and comparisons are possible.
Infinities are always interpreted in the affine sense; that is, -∞ is less
than any finite number and +∞ is greater than any finite number. Arithmetic
on infinities is always exact. Exceptions are generated only when the use of
an infinity as a source operand constitutes an invalid operation.
Whereas de-normalized numbers represent an underflow condition, the
two infinities represent the result of an overflow condition, in which the
normalized result of a computation has a biased exponent greater than the
largest allowable exponent for the selected result format.
NaNs
Since NaNs are non-numbers, they are not part of the real number line.
The encoding space for NaNs includes any value with the maximum allowable
biased exponent and a non-zero fraction. (The sign bit is ignored for NaNs.)
The IEEE standard defines two classes of NaNs: quiet NaNs (QNaNs) and
signaling NaNs (SNaNs). A QNaN is a NaN with the most significant fraction bit
set; an SNaN is a NaN with the most significant fraction bit clear. QNaNs are
allowed to propagate through most arithmetic operations without signaling an
exception. SNaNs generally signal an invalid-operation exception whenever they
appear as operands in arithmetic operations.
Though zero is not a special input in this sense, if one of the operands
is zero the result is known without performing any operation, so zero
(denoted by a zero exponent and a zero mantissa) is detected as well. A
further reason to detect zeroes is that the arithmetic units cannot handle
them directly: once the hidden 1 is added to the mantissa, the significand
reads as 1.0 rather than 0, so the operand would be treated as a non-zero
number.
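
The encodings described in this section can be checked with a few reduction operators. The sketch below is illustrative only; the module name fp_classify and its outputs are hypothetical. Unlike the detection logic in the appendix, it ignores the sign bit when testing for zero and reports quiet and signaling NaNs separately.

```verilog
// Illustrative sketch (hypothetical names): classify a single precision
// word as zero, infinity or NaN from its exponent and fraction fields.
module fp_classify (
    input  wire [31:0] x,
    output wire        is_zero,  // zero exponent and zero fraction
    output wire        is_inf,   // maximum exponent and zero fraction
    output wire        is_nan,   // maximum exponent and non-zero fraction
    output wire        is_qnan   // NaN with the top fraction bit set
);
    wire exp_all_ones  = &x[30:23];
    wire exp_all_zeros = ~|x[30:23];
    wire frac_zero     = ~|x[22:0];

    assign is_zero = exp_all_zeros & frac_zero;
    assign is_inf  = exp_all_ones  & frac_zero;
    assign is_nan  = exp_all_ones  & ~frac_zero;
    assign is_qnan = is_nan & x[22];
endmodule
```

With the test inputs from the verification plan, 00000000 is flagged as zero, 7F800000 and FF800000 as infinities, and 7FC00000 as a quiet NaN.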


4. FLOATING POINT ADDER/SUBTRACTOR:
Floating-point addition has three main parts:

1. Adding the hidden 1 and aligning the mantissas to make the exponents
equal.
2. Adding the aligned mantissas.
3. Normalizing and rounding the result.

The stored mantissa is 23 bits wide. After the hidden 1 is added, it is
24 bits wide.

First the exponents are compared by subtracting one from the other and
looking at the sign of the result (the MSB, which is the borrow). To equalize
the exponents, the mantissa of the number with the smaller exponent is
shifted right d times, where d is the absolute difference between the
exponents.
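
The exponent comparison and alignment step can be sketched as below. This is a minimal illustration rather than the adder used in the project: the module name fp_align and its ports are hypothetical, and no extra bits are kept for the bits shifted out.

```verilog
// Illustrative sketch (hypothetical names): compare two biased exponents
// and right-shift the significand that belongs to the smaller one.
module fp_align (
    input  wire [7:0]  e1,
    input  wire [7:0]  e2,
    input  wire [23:0] m1,      // significands with the hidden 1 explicit
    input  wire [23:0] m2,
    output wire [7:0]  e_big,   // the larger (anchored) exponent
    output wire [23:0] m_big,   // significand that goes with e_big
    output wire [23:0] m_small  // other significand, shifted right d places
);
    // 9-bit subtraction: bit 8 is the borrow, set when e1 < e2
    wire [8:0] diff = {1'b0, e1} - {1'b0, e2};
    wire       swap = diff[8];
    // d = absolute difference between the exponents
    wire [7:0] d    = swap ? (e2 - e1) : (e1 - e2);

    assign e_big   = swap ? e2 : e1;
    assign m_big   = swap ? m2 : m1;
    assign m_small = (swap ? m1 : m2) >> d;
endmodule
```

For example, 6.0 (exponent 129, significand C00000) and 1.5 (exponent 127, significand C00000) give d = 2, so the smaller operand becomes 300000. Adding C00000 and 300000 gives F00000, which is 1.111 in binary at exponent 129, i.e. 7.5.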

The sign of the number with the larger exponent is anchored. The XOR of
the sign bits of the two numbers decides the operation (addition or
subtraction) to be performed.

Because the shifting may cause the loss of some bits, the mantissas to
be added are generally made wider than 24 bits to limit this loss. In our
implementation, the mantissas to be added are 25 bits wide. The two
mantissas are added (or subtracted), and the most significant 24 bits of the
absolute value of the result form the normalized mantissa for the final
packed floating point result.

The XOR of the anchored sign bit and the sign of the result then forms
the sign bit of the final packed floating point result.

The remaining part of the result is the exponent. Before normalization,
the exponent is equal to the anchored exponent, which is the larger of the
two exponents. During normalization, the leading zeroes are detected and the
mantissa is shifted left until a leading one appears. The exponent is
adjusted accordingly, forming the exponent of the final packed floating
point result.

The whole process is summarized in the figure below.



Fig 2. Architecture for floating point adder/subtractor

5. FLOATING POINT MULTIPLIER:

The single precision floating point multiplication algorithm is divided
into three main parts corresponding to the three fields of the single
precision format. The first part of the product, the sign, is determined by
an exclusive OR of the two input signs. The second part, the exponent of the
product, is calculated by adding the two input exponents and subtracting the
bias of 127 so that the sum stays in biased form. The third part, the
significand of the product, is determined by multiplying the two input
significands, each with the hidden 1 concatenated to it.
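
The three parts can be written compactly as a behavioral reference model. The sketch below is an assumption-laden illustration, not the project's multiplier: the module name fp_mul_ref is hypothetical, it uses Verilog's built-in * operator instead of the 4x4-based multiplier, it truncates rather than rounds, and it ignores special inputs, overflow and underflow. Its value is mainly as something to compare a structural design against in simulation.

```verilog
// Illustrative behavioral reference (hypothetical names), normalized
// inputs only: sign by XOR, exponent by biased addition, significand by
// a 24x24 multiply followed by a one-bit normalization.
module fp_mul_ref (
    input  wire [31:0] a,
    input  wire [31:0] b,
    output wire [31:0] p
);
    wire        sign = a[31] ^ b[31];
    wire [23:0] ma   = {1'b1, a[22:0]};
    wire [23:0] mb   = {1'b1, b[22:0]};
    wire [47:0] prod = ma * mb;            // value in [1, 4), point after bit 46
    wire        norm = prod[47];           // set when the product is >= 2
    // biased exponent: ea + eb - 127, plus 1 if the product was >= 2
    wire [9:0]  exp_sum = a[30:23] + b[30:23] - 10'd127 + norm;
    // drop the leading 1 and truncate the low-order bits
    wire [22:0] frac = norm ? prod[46:24] : prod[45:23];

    assign p = {sign, exp_sum[7:0], frac};
endmodule
```

As a check, 1.5 x 2.5 (3FC00000 and 40200000) yields 40700000, which is 3.75, and 3.0 x 3.0 (40400000 twice) yields 41100000, which is 9.0 and exercises the normalization shift.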
The figure below shows the architecture and flowchart of the single
precision floating point multiplier. It can be seen from the figure that the
24x24 bit integer multiplier is the main performance bottleneck for high
speed and low power operation. In FPGAs, the availability of dedicated 18x18
multipliers rather than dedicated 24x24 bit multiply blocks further
complicates this problem.

5.1. DESIGNED ARCHITECTURE FOR MULTIPLICATION IN FPGAS:

We propose a combined floating point multiplier and adder for FPGAs.
The existing 18x18 bit multipliers in FPGAs are to be replaced with
dedicated 24x24 bit integer multiplier blocks built from 4x4 bit
multipliers. In the designed architecture, the dedicated 24x24 bit
multiplication block is split into four parallel 12x12 bit multiplication
modules operating on AH, AL, BH and BL, the 12-bit upper and lower halves of
the two operands. The 12x12 multiplication modules are in turn implemented
using small 4x4 bit multipliers: the 12-bit numbers A and B to be multiplied
in each module are divided into 4-bit groups A3, A2, A1 and B3, B2, B1
respectively, giving nine 4x4 products per module. Thus the whole 24x24 bit
multiplication is divided into 36 4x4 multiply modules working in parallel.
The flowchart and the architecture for the multiplier block are shown below.

Fig 3. Flowchart for floating point multiplication

Fig 4. Designed architecture for floating point multiplication
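
The split into 12-bit halves and 4-bit groups follows from writing each operand as a weighted sum of its parts. The identities below are a worked restatement using the report's own symbols (AH, AL, BH, BL and A3..A1, B3..B1); they add no design detail beyond what the shift-and-add blocks in the appendix compute.

```latex
\begin{aligned}
A &= A_H \cdot 2^{12} + A_L, \qquad B = B_H \cdot 2^{12} + B_L \\
A \cdot B &= A_H B_H \cdot 2^{24} + \left(A_H B_L + A_L B_H\right) \cdot 2^{12} + A_L B_L \\[4pt]
\text{and, within each 12x12 module,}\quad
A &= A_3 \cdot 2^{8} + A_2 \cdot 2^{4} + A_1, \qquad B = B_3 \cdot 2^{8} + B_2 \cdot 2^{4} + B_1 \\
A \cdot B &= \sum_{i=1}^{3} \sum_{j=1}^{3} A_i B_j \cdot 2^{4(i+j-2)}
\end{aligned}
```

Each 12x12 module therefore needs 3 x 3 = 9 products of 4-bit groups, and the four modules together need 4 x 9 = 36, which is the count of 4x4 multipliers quoted above.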

Additional Advantages:

An additional advantage of the proposed combined integer and floating
point multiplier (CIFM) is that floating point multiplication can be
performed in the FPGA without the resource and performance bottleneck
described above. In single precision floating point multiplication, the
stored mantissas are 23 bits wide, so a 24x24 bit multiply operation (23-bit
mantissa + 1 hidden bit) is required to obtain the intermediate product.
With the proposed architecture, the 24x24 bit mantissa multiplication is
performed by passing the operands to the dedicated 24x24 bit multiply block,
which generates the product with its dedicated small 4x4 bit multipliers.



5.2. DESIGNED 4X4 BIT MULTIPLIER

As is evident from the proposed architecture, a high speed, low power
dedicated 4x4 bit multiplier will significantly improve the efficiency of the
designed architecture. A dedicated 4x4 bit multiplier that is efficient in
terms of area, speed and power is therefore proposed. Figure 5 shows the
architecture of the proposed multiplier. For 4x4 bits, four partial products
are generated and added in parallel. Each pair of adjacent partial products
is subdivided into 2-bit blocks, and a 2-bit sum is generated for each block
by a 2-bit parallel adder built from a suitable combination of two half
adders, or of a half adder and a full adder (forming blocks 1, 2, 3 and 4,
which work in parallel).

This forms the first level of computation. The partial sums thus
generated are added again in blocks 5 and 6 (parallel adders), which work in
parallel and are again built from suitable combinations of half adders and
full adders. This forms the second level of computation. The partial sums
generated in the second level are used in the third level (blocks 7 and 8)
to arrive at the final product. Because the whole computation is
hierarchically divided into levels, power consumption can be reduced
significantly: power is provided only to the level that is involved in the
computation, while the remaining two levels are switched off by control
circuitry. Working in parallel significantly improves the speed of the
proposed multiplier.

The proposed architecture is optimized for area, speed and power. It
has been functionally verified in Verilog HDL and synthesized on a Xilinx
FPGA. The designed 4-bit multiplier architecture is shown below.



Fig 5. Designed 4 bit optimized multiplier
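
For comparison with the half adder and full adder network, the same 4x4 product can be described behaviorally as the sum of four shifted partial products. The sketch below is only a reference model under that assumption; the module name mul4x4_ref is hypothetical, and the synthesis tool, not the code, decides how the additions are implemented.

```verilog
// Illustrative reference model (hypothetical name): 4x4 unsigned multiply
// as the sum of four partial-product rows, one per bit of b.
module mul4x4_ref (
    input  wire [3:0] a,
    input  wire [3:0] b,
    output wire [7:0] p
);
    // Row k is a shifted left by k places when b[k] is 1, otherwise zero
    wire [7:0] pp0 = b[0] ? {4'b0, a}       : 8'b0;
    wire [7:0] pp1 = b[1] ? {3'b0, a, 1'b0} : 8'b0;
    wire [7:0] pp2 = b[2] ? {2'b0, a, 2'b0} : 8'b0;
    wire [7:0] pp3 = b[3] ? {1'b0, a, 3'b0} : 8'b0;

    assign p = pp0 + pp1 + pp2 + pp3;
endmodule
```

Because there are only 256 input combinations, a model like this can be compared exhaustively against the structural multiplier in simulation.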

The simulation results, RTL schematics of the designed architecture,
synthesis report and Verilog code are shown below.

6. VERIFICATION PLAN:

White box testing:

We chose inputs to the various sub-blocks such that every logic block
was exercised, and verified all the internal signals in simulation, as shown
in the following section.

Black box testing:

We applied various random inputs without first working out what they
represent, then analyzed the same inputs to determine the expected outputs,
and verified these against the simulation as shown below.

RANDOM TEST INPUT (HEX)    ANALYZED EQUIVALENT DECIMAL
BFAE6666                   -1.3625
44231762                   652.3654
C479C800                   -999.125
C2510831                   -52.258
00000000                   ZERO
7FC00000                   NOT A NUMBER
7F800000                   +VE INFINITY
FF800000                   -VE INFINITY

CONTROL = 0 selects multiplication; CONTROL = 1 selects addition/subtraction.

INDEX    CONTROL    OUTPUT (HEX)
4'h0     0          C45E3641
         1          4422C02E
4'h1     0          C91F2128
         1          C3AD613C
4'h2     0          474BF446
         1          C4836C41
4'h3     0          BE3B4AEB
         1          C251049C

7. RTL SCHEMATICS AND SIMULATION RESULTS:

RTL SCHEMATICS AND CORRESPONDING TEST SIGNALS OF THE
ARCHITECTURE:

RTL SCHEMATIC FOR TOP MODULE:

[Figure: TOP MODULE block with ports index, cntr, clk and out]

BLACKBOX TEST-TOP ARITHMETIC MODULE:

WHITE BOX TESTING:
DATA PATH & CONTROL:

BLOCK DIAGRAM FOR DATA PATH & CONTROL

RTL SCHEMATIC FOR DATAPATH & CONTROLLER:

TEST FOR DATAPATH & CONTROLLER:


VARIOUS SIGNALS DESCRIPTION (ADDER/SUBTRACTOR MODULE):
1. A, B: input 32-bit single precision numbers
2. C: output 32-bit single precision number
3. s1, s2, s3, e1, e2, e3 & f1, f2, f3: sign, exponent and fraction parts of inputs
4. new_f1, new_f2: aligned mantissas
5. de: difference between exponents
6. fr: 25-bit result of addition of mantissas
7. fr_us: unsigned 25-bit result
8. f_fr: normalized 24-bit fraction result
9. er, sr: exponent and sign of result

RTL SCHEMATIC FOR ADDER MODULE:

TEST FOR ADDER MODULE:

VARIOUS SIGNALS DESCRIPTION (MULTIPLIER MODULE):
1. IN1, IN2: input 32-bit single precision numbers
2. OUT: output 32-bit single precision number
3. SA, SB, EA, EB, MA, MB: sign, exponent and mantissa parts of inputs
4. PFPM: 48-bit multiplication result
5. SPFPM: shifted result of multiplication
6. EFPM: exponent result (output of exponent addition module)
7. PFP: 48-bit fraction multiplication result after normalization
8. SFP: 1-bit sign of final result
9. EFP: 8-bit exponent of final result
10. MFP: 23-bit mantissa of final result


RTL SCHEMATIC FOR MULTIPLIER MODULE:

TEST FOR MULTIPLICATION MODULE:


8. FLOOR PLAN OF DESIGN AND MAPPING REPORT:
FLOOR PLAN WITHOUT PIN CONNECTIONS:


FLOOR PLAN WITH PIN CONNECTIONS:


SYNTHESIS REPORT:

======================================================================
* Final Report *
======================================================================
Final Results
RTL Top Level Output File Name : TOPMODULE.ngr
Top Level Output File Name : TOPMODULE
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO

Design Statistics
# IOs : 38

Cell Usage:
# BELS : 3944
# GND : 3
# INV : 11
# LUT1 : 48
# LUT2 : 417
# LUT3 : 628
# LUT4 : 1777
# MULT_AND : 112
# MUXCY : 375
# MUXF5 : 207
# MUXF6 : 2
# MUXF7 : 1
# VCC : 3
# XORCY : 360
# RAMS : 2
# RAMB16_S36 : 2
# Clock Buffers : 1
# BUFGP : 1
# IO Buffers : 37
# IBUF : 5
# OBUF : 32
======================================================================

Device utilization summary:

Selected Device: 3s500efg320-5
Number of Slices: 1616 out of 4656 34%
Number of 4 input LUTs: 2881 out of 9312 30%
Number of IOs: 38
Number of bonded IOBs: 38 out of 232 16%
Number of BRAMs: 2 out of 20 10%
Number of GCLKs: 1 out of 24 4%
=====================================================================


TIMING REPORT
======================================================================
Clock Information:
------------------
No clock signals found in this design

Asynchronous Control Signals Information:
----------------------------------------
No asynchronous control signals found in this design

Timing Summary:
---------------
Speed Grade: -5

Minimum period: No path found
Minimum input arrival time before clock: No path found
Maximum output required time after clock: No path found
Maximum combinational path delay: 12.253ns

Timing Detail:
--------------
All values displayed in nanoseconds (ns)

Timing constraint: Default path analysis
Total number of paths / destination ports: 96 / 32
-------------------------------------------------------------------------
Delay: 12.253ns (Levels of Logic = 9)
Source: cntr (PAD)
Destination: out<16> (PAD)

Data Path: cntr to out<16>
Gate Net
Cell: in->out fan-out Delay Delay Logical Name (Net Name)
---------------------------------------- ------------
IBUF: I->O 59 1.106 1.232 cntr_IBUF (cntr_IBUF)
LUT4:I0->O 8 0.612 0.795 FPARITH/OUT<29>41 (FPARITH/N372)
LUT4:I0->O 1 0.612 0.000 FPARITH/OUT<16>211(FPARITH/OUT<16>211)
MUXF5:I1->O 1 0.278 0.509 FPARITH/OUT<16>21_f5(FPARITH/OUT<16>21)
LUT4:I0->O 1 0.612 0.387 FPARITH/OUT<16>40_SW0 (N666)
LUT4:I3->O 1 0.612 0.387 FPARITH/OUT<16>51_SW0 (N668)
OBUF:I->O 3.169 out_16_OBUF (out<16>)
------------------------------------------------------------------------------------------------------
Total 12.253ns (8.225ns logic, 4.028ns route) (67.1% logic, 32.9% route)

======================================================================


9. POWER ANALYSIS USING XPOWER ANALYZER:

TEMPERATURE ANALYSIS:


10. CONCLUSION:

We have successfully implemented arithmetic (addition/subtraction &
multiplication) for IEEE single precision floating point numbers on an FPGA,
and displayed the corresponding output values on the LCD.

11. FUTURE SCOPE:
Because a MUX selects between the outputs of the two computational
blocks, both the adder and the multiplier are active even though only one of
them is needed at a time. This consumes considerable dynamic power, which can
be reduced by disabling the block that is not required.
A further improvement would be to skip an entire 4*4 adder block in the
multiplier when the values to be added are zero (a scenario that is quite
likely to occur in floating point operations).
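
One possible way to disable the unused block, not something implemented in this project, is operand isolation: the inputs of the unselected unit are forced to zero so that its internal nodes stop toggling. The sketch below assumes the same control convention as the appendix (cntr = 1 for addition/subtraction, cntr = 0 for multiplication); the module name operand_isolate and its port names are hypothetical.

```verilog
// Illustrative sketch (hypothetical names): forward the operands only to
// the arithmetic unit selected by cntr and hold the other unit's inputs
// at zero.
module operand_isolate (
    input  wire [31:0] in1,
    input  wire [31:0] in2,
    input  wire        cntr,    // 1: add/subtract, 0: multiply
    output wire [31:0] add_a,   // operands seen by the adder
    output wire [31:0] add_b,
    output wire [31:0] mul_a,   // operands seen by the multiplier
    output wire [31:0] mul_b
);
    assign add_a = in1 & {32{cntr}};
    assign add_b = in2 & {32{cntr}};
    assign mul_a = in1 & {32{~cntr}};
    assign mul_b = in2 & {32{~cntr}};
endmodule
```

The output MUX stays as it is, since the gated unit's result is never selected. Whether this saves power in practice would have to be confirmed with the power analysis tool, because the gating logic itself costs LUTs.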



12. REFERENCES:

1. www.xilinx.com
2. Himanshu Thapliyal, Hamid R. Arabnia and A. P. Vinod, "Combined integer
and floating point multiplication in FPGAs".
3. Behrooz Parhami, "Computer Arithmetic: Algorithms and Hardware Designs".
4. Israel Koren, "Computer Arithmetic Algorithms".
5. www.randelshofer.ch/fhw/gri/lcd-init, for part of the code for LCD
interfacing.
6. http://babbage.cs.qc.cuny.edu/IEEE-754/Decimal.html, for Java applets
for floating point conversions.
7. Hitachi HD44780 LCD data sheet.



APPENDIX
VERILOG CODE FOR FLOATING POINT ARITHMETIC
// MAIN MODULE FOR FLOATING POINT ARITHMETIC
// (ADDITION/SUBTRACTION & MULTIPLICATION)
// IF CNTR == 1: ADDITION/SUBTRACTION
// IF CNTR == 0: MULTIPLICATION
///////////////////////////////////////////////////////////

module FLOATINGPOINTARITHEMATIC(input [31:0] IN1, input [31:0] IN2,
input cntr, input [1:0] ZeroAdd, output [31:0] OUT);
wire [31:0] FPADD,FPMUL,OUT1;
// ZeroAdd = {IN1 IS ZERO, IN2 IS ZERO} DURING ADDITION. WHEN EXACTLY ONE
// OPERAND IS ZERO, THE OTHER OPERAND IS PASSED THROUGH AS THE RESULT
wire [31:0] ADDZA = ZeroAdd[1] ? IN2: IN1;
assign OUT = ^ZeroAdd ? ADDZA : OUT1;

// INSTANTIATION OF FLOATING POINT ADDER MODULE
fpadder adder(.A(IN1),.B(IN2),.C(FPADD));
// INSTANTIATION OF FLOATING POINT MULTIPLICATION MODULE
FLOATINGMULTIPLICATION multiplication(.IN1(IN1),.IN2(IN2),.OUT(FPMUL));
// ASSIGNING THE REQUIRED VALUE TO OUTPUT VARIABLE DEPENDING
// ON THE CONTROL(CNTR) VALUE
//assign OUT = cntr ? FPADD: FPMUL;
assign OUT1 = cntr ? FPADD: FPMUL;

endmodule

// MODULE FOR FLOATING POINT MULTIPLICATION
module FLOATINGMULTIPLICATION(input [31:0] IN1, input [31:0] IN2,
output [31:0] OUT);
// UNPACKING THE INPUT BITS AND ASSIGNING TO SOME OTHER
// TEMPORARY VARIABLES
wire SA = IN1 [31];
wire SB = IN2 [31];
wire [7:0] EA = IN1 [30:23];
wire [7:0] EB = IN2 [30:23];
wire [23:0] MA = { 1'b1,IN1 [22:0] };
wire [23:0] MB = { 1'b1,IN2 [22:0] };
// DECLARATION OF WIRES
wire SFP;
wire [7:0] EFPM,EFP;
wire [47:0] PFPM, PFP;
// GENERATION OF SIGN BIT USING XOR GATE
xor (SFP,SA,SB);
// INSTANTIATION OF EXPONENT ADDITION MODULE TO ADD EXPONENTS
EXPONENTADDITION FPEXP(.A(EA),.B(EB),.E(EFPM));
// INSTANTIATION OF 24 BIT MULTIPLIER MODULE TO MULTIPLY FRACTION PART
MULTIPLIER24BIT FPMUL(.A(MA),.B(MB),.P(PFPM));
// NORMALIZATION: IF BIT 47 OF THE PRODUCT IS SET (PRODUCT >= 2),
// SHIFT RIGHT BY ONE AND INCREMENT THE EXPONENT
wire S = PFPM [47];
wire [47:0] SPFPM = PFPM >> 1;
assign PFP = S ? SPFPM : PFPM;
assign EFP = S ? EFPM+1 : EFPM;
// OUTPUT OF 24 BIT MULTIPLIER WILL GIVE 48 BIT RESULT
// SO WE ARE TRUNCATING THE LEAST SIGNIFICANT BITS (THE FINAL RESULT
// IS AN APPROXIMATION)
wire [22:0] MFP = PFP [45:23];
// PACKING THE RESULTS FOR GENERATING SINGLE PRECISION 32 BIT OUTPUT
assign OUT = { SFP,EFP,MFP };
endmodule

// MODULE FOR ADDITION OF EXPONENTS
// NOTE THAT THE EXPONENTS ARE IN BIAS FORMAT
// SO WE NEED TO ADD THE BIASED EXPONENTS AND SUBTRACT 127 TO
// GET PROPER BIAS
module EXPONENTADDITION(input [7:0] A, input [7:0] B, output [7:0] E);
// DECLARATION OF WIRES AND ASSIGNING VALUES AT A TIME
// ADDING THE BIASED EXPONENTS AND SUBTRACTING 127 USING 2'S
// COMPLEMENT METHOD (Y IS -127 AS A 10 BIT 2'S COMPLEMENT NUMBER)
wire [8:0] X = A + B;
parameter Y = 10'b1110000001;
wire [10:0] Z = X + Y;
// FINAL RESULT OF EXPONENT (IN THE BIAS FORMAT)
assign E = Z [7:0];
endmodule

// MODULE FOR 24 BIT MULTIPLIER USING 12 BIT MULTIPLIER
module MULTIPLIER24BIT(input [23:0] A, input [23:0] B, output [47:0] P);
// EACH 24 BIT INPUT IS DIVIDED INTO TWO 12 BIT VALUES
wire [11:0] AL = A [11:0];
wire [11:0] AH = A [23:12];
wire [11:0] BL = B [11:0];

wire [11:0] BH = B [23:12];
wire [23:0] ALBL,ALBH,AHBL,AHBH;
wire [35:0] PL,PH;
// INSTANTIATION OF 12 BIT MULTIPLIERS BY PORT NAMES
MULTIPLIER12BIT mul1(.A(AL),.B(BL),.P(ALBL));
MULTIPLIER12BIT mul2(.A(AL),.B(BH),.P(ALBH));
MULTIPLIER12BIT mul3(.A(AH),.B(BL),.P(AHBL));
MULTIPLIER12BIT mul4(.A(AH),.B(BH),.P(AHBH));
// INSTANTIATION OF ADDER BLOCKS BY PORT NAMES
ADDER24IN36OUT adder1(.X(ALBL),.Y(AHBL),.W(PL));
ADDER24IN36OUT adder2(.X(ALBH),.Y(AHBH),.W(PH));
ADDER36IN48OUT adder3(.X(PL),.Y(PH),.W(P));
endmodule

// MODULE FOR ADDER INPUT BIT LENGTH IS 24 & OUTPUT BIT LENGTH IS 36
// (Y IS WEIGHTED BY 2^12 BEFORE THE ADDITION)
module ADDER24IN36OUT(input [23:0] X, input [23:0] Y, output [35:0] W);
wire [35:0] XM = { 12'b0,X };
wire [35:0] YM = { Y,12'b0 };
assign W = XM + YM;
endmodule

// MODULE FOR ADDER INPUT BIT LENGTH IS 36 & OUTPUT BIT LENGTH IS 48
// (Y IS WEIGHTED BY 2^12 BEFORE THE ADDITION)
module ADDER36IN48OUT(input [35:0] X, input [35:0] Y, output [47:0] W);
wire [47:0] XM = { 12'b0,X };
wire [47:0] YM = { Y,12'b0 };
assign W = XM + YM;
endmodule

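// MODULE FOR 12 BIT MULTIPLIER USING 4 BIT MULTIPLIERS
module MULTIPLIER12BIT(input [11:0] A, input [11:0] B, output [23:0] P);
// EACH 12 BIT INPUT IS DIVIDED INTO THREE 4 BIT VALUES
wire [3:0] A1 = A [3:0];
wire [3:0] A2 = A [7:4];
wire [3:0] A3 = A [11:8];
wire [3:0] B1 = B [3:0];
wire [3:0] B2 = B [7:4];
wire [3:0] B3 = B [11:8];
// DECLARATION OF WIRES
wire [7:0] A1B1,A1B2,A1B3;
wire [7:0] A2B1,A2B2,A2B3;
wire [7:0] A3B1,A3B2,A3B3;
wire [15:0] PL,PM,PH;
// INSTANTIATION OF 4 BIT MULTIPLIERS BY PORT NAMES
MULTIPLIER4BIT mul1 (.A(A1),.B(B1),.P(A1B1));
MULTIPLIER4BIT mul2 (.A(A1),.B(B2),.P(A1B2));
MULTIPLIER4BIT mul3 (.A(A1),.B(B3),.P(A1B3));
MULTIPLIER4BIT mul4 (.A(A2),.B(B1),.P(A2B1));
MULTIPLIER4BIT mul5 (.A(A2),.B(B2),.P(A2B2));
MULTIPLIER4BIT mul6 (.A(A2),.B(B3),.P(A2B3));
MULTIPLIER4BIT mul7 (.A(A3),.B(B1),.P(A3B1));
MULTIPLIER4BIT mul8 (.A(A3),.B(B2),.P(A3B2));
MULTIPLIER4BIT mul9 (.A(A3),.B(B3),.P(A3B3));
// INSTANTIATION OF ADDER BLOCKS BY PORT NAMES
ADDER8IN16OUT adder1 (.X(A1B1),.Y(A2B1),.Z(A3B1),.W(PL));
ADDER8IN16OUT adder2 (.X(A1B2),.Y(A2B2),.Z(A3B2),.W(PM));
ADDER8IN16OUT adder3 (.X(A1B3),.Y(A2B3),.Z(A3B3),.W(PH));
ADDER16IN24OUT adder4 (.X(PL),.Y(PM),.Z(PH),.W(P));
endmodule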

// MODULE FOR ADDER INPUT BIT LENGTH IS 8 & OUTPUT BIT LENGTH IS 16
// (Y IS WEIGHTED BY 2^4 AND Z BY 2^8 BEFORE THE ADDITION)
module ADDER8IN16OUT(input [7:0] X, input [7:0] Y, input [7:0] Z,
output [15:0] W);
wire [15:0] XM = { 8'b0,X };
wire [15:0] YM = { 4'b0,Y,4'b0 };
wire [15:0] ZM = { Z,8'b0 };
assign W = XM + YM + ZM;
endmodule

// MODULE FOR ADDER INPUT BIT LENGTH IS 16 & OUTPUT BIT LENGTH IS 24
// (Y IS WEIGHTED BY 2^4 AND Z BY 2^8 BEFORE THE ADDITION)
module ADDER16IN24OUT(input [15:0] X, input [15:0] Y, input [15:0] Z,
output [23:0] W);
wire [23:0] XM = { 8'b0,X };
wire [23:0] YM = { 4'b0,Y,4'b0 };
wire [23:0] ZM = { Z,8'b0 };
assign W = XM + YM + ZM;
endmodule

// MODULE FOR SIMPLE 4 BIT MULTIPLIER
module MULTIPLIER4BIT(input [3:0] A, input [3:0] B, output [7:0] P);
// DECLARING THE WIRES
wire pp00,pp01,pp02,pp03,pp10,pp11,pp12,pp13;
wire pp20,pp21,pp22,pp23,pp30,pp31,pp32,pp33;
wire hc1,hs2,hc2,hs3,hc3,hs4,hc4,hs5,hc5,hs6,hc6,hc7,hc8,hc9;
wire fs1,fc1,fs2,fc2,fs3,fc3,fs4,fc4,fs5,fc5,fc6,fc7;
// INSTANTIATION OF PARTIAL PRODUCTS BY ORDER
PARTIALPRODUCTS pp
(A,B,pp00,pp01,pp02,pp03,pp10,pp11,pp12,pp13,pp20,pp21,pp22,pp23,pp30,
pp31,pp32,pp33);
assign P[0] = pp00;
// INSTANTIATION OF HALF ADDERS & FULL ADDERS BY PORT NAMES
// LEVEL 1
HA ha1 (.a(pp01),.b(pp10),.s(P[1]),.c(hc1));
FA fa1 (.a(pp11),.b(pp20),.cin(hc1),.s(fs1),.cout(fc1));
HA ha2 (.a(pp21),.b(pp30),.s(hs2),.c(hc2));
HA ha3 (.a(pp31),.b(hc2),.s(hs3),.c(hc3));
HA ha4 (.a(pp03),.b(pp12),.s(hs4),.c(hc4));
FA fa2 (.a(pp13),.b(pp22),.cin(hc4),.s(fs2),.cout(fc2));
HA ha5 (.a(pp23),.b(pp32),.s(hs5),.c(hc5));
HA ha6 (.a(pp33),.b(hc5),.s(hs6),.c(hc6));
// LEVEL 2
HA ha7 (.a(pp02),.b(fs1),.s(P[2]),.c(hc7));
FA fa3 (.a(fc1),.b(hs2),.cin(hc7),.s(fs3),.cout(fc3));
FA fa4 (.a(hs3),.b(fs2),.cin(fc3),.s(fs4),.cout(fc4));
FA fa5 (.a(hc3),.b(fc2),.cin(hs5),.s(fs5),.cout(fc5));
// LEVEL 3
HA ha8 (.a(fs3),.b(hs4),.s(P[3]),.c(hc8));
HA ha9 (.a(fs4),.b(hc8),.s(P[4]),.c(hc9));
FA fa6 (.a(hc9),.b(fc4),.cin(fs5),.s(P[5]),.cout(fc6));
FA fa7 (.a(fc6),.b(fc5),.cin(hs6),.s(P[6]),.cout(fc7));
HA ha10 (.a(hc6),.b(fc7),.s(P[7]));
endmodule

// MODULE FOR GENERATION OF PARTIAL PRODUCTS USING AND GATES
module PARTIALPRODUCTS(input [3:0] x, input [3:0] y, output
pp00,pp01,pp02,pp03,
output pp10,pp11,pp12,pp13,pp20,pp21,pp22,pp23,pp30,pp31,pp32,pp33);
and (pp00,x[0],y[0]),(pp01,x[0],y[1]),(pp02,x[0],y[2]),(pp03,x[0],y[3]),
(pp10,x[1],y[0]),(pp11,x[1],y[1]),(pp12,x[1],y[2]),(pp13,x[1],y[3]),
(pp20,x[2],y[0]),(pp21,x[2],y[1]),(pp22,x[2],y[2]),(pp23,x[2],y[3]),
(pp30,x[3],y[0]),(pp31,x[3],y[1]),(pp32,x[3],y[2]),(pp33,x[3],y[3]);
endmodule

// MODULE FOR HALF ADDER
module HA(input a,b, output s,c);
xor (s,a,b);
and (c,a,b);
endmodule


// MODULE FOR FULL ADDER
module FA(input a,b,cin, output s,cout);
// DECLARING WIRES
wire s1,c1,c2;
// INSTANTIATION OF HALF ADDER MODULE BY PORT NAME
HA ha1(.a(a),.b(b),.c(c1),.s(s1));
HA ha2(.a(s1),.b(cin),.c(c2),.s(s));
or (cout,c1,c2);
endmodule

// MODULE FOR FLOATINGPOINT ADDER
module fpadder(input [31:0] A,B,output reg [31:0]C
);

reg [24:0] fr,fr_us;
reg [8:0] de;
reg [23:0] new_f1,new_f2,f_fr;
reg f_sel,s,sr;
reg [7:0] er;
integer I;
wire [7:0] e1,e2;
wire [23:0] f1,f2;
wire s1,s2;
assign e1=A[30:23];
assign e2=B[30:23];
assign f1[23]=1'b1;
assign f2[23]=1'b1;
assign f1[22:0]=A[22:0];
assign f2[22:0]=B[22:0];
assign s1=A[31];
assign s2=B[31];
always@(*)
begin
de=e1-e2;
s=s1^s2;
f_sel=1'b1;
if(de[8]==1'b1)
begin
de=~de+9'b1;
f_sel=1'b0;
end
new_f1=f_sel ? f1 :f2;
new_f2=f_sel ? f2 :f1;
er=f_sel?e1+8'b1 :e2+8'b1;
new_f2=new_f2>>de;

fr=s? new_f1-new_f2 : new_f1+new_f2;
sr=f_sel?s1^(fr[24]&s):s2^(fr[24]&s);
fr_us=(fr[24] & s)? ~fr+25'b1:fr;
f_fr=fr_us[24:1];
I=f_fr[23];
repeat(24)
begin
if(f_fr[23]==1'b0)
begin
f_fr=f_fr<<1'b1;
er=er-8'b1;
end
end
C={sr,er,f_fr[22:0]};

end

endmodule

VERILOG CODE FOR CONVERSION FROM BINARY TO SPECIFIC FORMAT
TO DISPLAY IN LCD

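module CONVERSION(input [31:0] A, output [63:0] COUT);
// SPLITTING THE 32 BIT INPUT INTO EIGHT 4 BIT (HEX DIGIT) GROUPS
wire [3:0] A8 = A [31:28];
wire [3:0] A7 = A [27:24];
wire [3:0] A6 = A [23:20];
wire [3:0] A5 = A [19:16];
wire [3:0] A4 = A [15:12];
wire [3:0] A3 = A [11:8];
wire [3:0] A2 = A [7:4];
wire [3:0] A1 = A [3:0];

wire [7:0] COUT1,COUT2,COUT3,COUT4,COUT5,COUT6,COUT7,COUT8;

// EACH SUBBLOCK CONVERTS ONE HEX DIGIT TO ITS ASCII CODE
SUBBLOCK block1(.IN(A1),.OUT(COUT1));
SUBBLOCK block2(.IN(A2),.OUT(COUT2));
SUBBLOCK block3(.IN(A3),.OUT(COUT3));
SUBBLOCK block4(.IN(A4),.OUT(COUT4));
SUBBLOCK block5(.IN(A5),.OUT(COUT5));
SUBBLOCK block6(.IN(A6),.OUT(COUT6));
SUBBLOCK block7(.IN(A7),.OUT(COUT7));
SUBBLOCK block8(.IN(A8),.OUT(COUT8));

assign COUT = {
COUT8,COUT7,COUT6,COUT5,COUT4,COUT3,COUT2,COUT1 };

endmodule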

module SUBBLOCK(input [3:0] IN, output reg [7:0] OUT);
always @ (IN)
begin
case(IN)
4'b0000: OUT <= 8'b00110000;
4'b0001: OUT <= 8'b00110001;
4'b0010: OUT <= 8'b00110010;
4'b0011: OUT <= 8'b00110011;
4'b0100: OUT <= 8'b00110100;
4'b0101: OUT <= 8'b00110101;
4'b0110: OUT <= 8'b00110110;
4'b0111: OUT <= 8'b00110111;
4'b1000: OUT <= 8'b00111000;
4'b1001: OUT <= 8'b00111001;
4'b1010: OUT <= 8'b01000001;
4'b1011: OUT <= 8'b01000010;
4'b1100: OUT <= 8'b01000011;
4'b1101: OUT <= 8'b01000100;
4'b1110: OUT <= 8'b01000101;
4'b1111: OUT <= 8'b01000110;
default: $display("Sir, please enter the correct value...");
endcase
end
endmodule

FINAL VERILOG CODE FOR FLOATING POINT ARITHMETIC

module FPARITHMETICFINAL(input [31:0] IN1, input [31:0] IN2,input CNTR,
output [63:0] OUT);

wire [31:0] FPOUT;

wire Z1 = ~(|IN1);
wire Z2 = ~(|IN2);

wire I1 = ~(|IN1[22:0]) && &IN1[30:23];
wire I2 = ~(|IN2[22:0]) && &IN2[30:23];


wire SI1 = IN1[31] && I1;
wire SI2 = IN2[31] && I2;

wire NAN1 = &IN1[30:22];
wire NAN2 = &IN2[30:22];

wire SP = Z1 || Z2 || I1 || I2 || NAN1 || NAN2;

wire [63:0] OUT1;

wire [1:0] ZA = CNTR ? {Z1,Z2} : 2'b00 ;

// EACH 64 BIT CONSTANT HOLDS EIGHT 8 BIT CHARACTER CODES FOR THE LCD
assign OUT = (NAN1 || NAN2) ?
// "  NAN   "
64'b00100000_00100000_01001110_01000001_01001110_00100000_00100000_00100000
: ((I1 || I2) ?
// "NO", SPACE, CHARACTER CODE 0xF3, SPACE, "pls"
64'b01001110_01001111_00100000_11110011_00100000_01110000_01101100_01110011
: (((Z1 || Z2) && ~CNTR) ?
// "  ZERO  "
64'b00100000_00100000_01011010_01000101_01010010_01001111_00100000_00100000
: OUT1));

/*always @ (IN1 or IN2)
begin
if(NAN1 || NAN2)
SPOUT <=
64'b00100000_00100000_01001110_01000001_01001110_00100000_00100000_00100000;
else if(I1 || I2)
SPOUT <=
64'b01001110_01001111_00100000_11110011_00100000_01110000_01101100_01110011;
else if((Z1 || Z2) && ~CNTR)
SPOUT <=
64'b00110000_00110000_00110000_00110000_00110000_00110000_00110000_00110000;
end*/

//assign OUT = SP ? SPOUT: OUT1;

FLOATINGPOINTARITHEMATIC
FPARITH(.IN1(IN1),.IN2(IN2),.ZeroAdd(ZA),.cntr(CNTR),.OUT(FPOUT));
CONVERSION CONV(.A(FPOUT),.COUT(OUT1));

endmodule


LCD INTERFACING CODE

module top (clk,in,cntr,lcd_rs, lcd_rw, lcd_e, lcd_4, lcd_5, lcd_6, lcd_7);
parameter n = 27;
parameter k = 17;
(* LOC="C9" *) input clk; // synthesis attribute PERIOD clk "100.0 MHz"
(*LOC="n17"*)input cntr;
input [2:0] in;
reg [n-1:0] count=0;
reg lcd_busy=1; // Lumex LCM-S01602DTR/B
reg lcd_stb;
reg [5:0] lcd_code;
reg [6:0] lcd_stuff;
(* LOC="l18" *) output reg lcd_rs;
(* LOC="l17" *) output reg lcd_rw;
(* LOC="m15" *) output reg lcd_7;
(* LOC="p17" *) output reg lcd_6;
(* LOC="r16" *) output reg lcd_5;
(* LOC="r15" *) output reg lcd_4;
(* LOC="m18" *) output reg lcd_e;
wire [63:0] dout1;
TOPMODULE T1(clk,cntr,in,dout1);

always @ (posedge clk) begin
count <= count + 1;
case (count[k+7:k+2])
0: lcd_code <= 6'b000010; // function set
1: lcd_code <= 6'b000010;
2: lcd_code <= 6'b001100;
3: lcd_code <= 6'b000000; // display on/off control
4: lcd_code <= 6'b001100;
5: lcd_code <= 6'b000000; // display clear
6: lcd_code <= 6'b000001;
7: lcd_code <= 6'b000000; // entry mode set
8: lcd_code <= 6'b000110;
9: lcd_code <= 6'h22;
10: lcd_code <= 6'h20;
11: lcd_code <= {2'b10,dout1[63:60]};
12: lcd_code <= {2'b10,dout1[59:56]};
13: lcd_code <= {2'b10,dout1[55:52]};
14: lcd_code <= {2'b10,dout1[51:48]};
15: lcd_code <= {2'b10,dout1[47:44]};
16: lcd_code <= {2'b10,dout1[43:40]};

17: lcd_code <= {2'b10,dout1[39:36]};
18: lcd_code <= {2'b10,dout1[35:32]};
19: lcd_code <= {2'b10,dout1[31:28]};
20: lcd_code <= {2'b10,dout1[27:24]};
21: lcd_code <= {2'b10,dout1[23:20]};
22: lcd_code <= {2'b10,dout1[19:16]};
23: lcd_code <= {2'b10,dout1[15:12]};
24: lcd_code <= {2'b10,dout1[11:8]};
25: lcd_code <= {2'b10,dout1[7:4]};
26: lcd_code <= {2'b10,dout1[3:0]};
27: lcd_code <= 6'h22;
28: lcd_code <= 6'h20;
29: lcd_code <= 6'h22;
30: lcd_code <= 6'h20;
31: lcd_code <= 6'h22;
32: lcd_code <= 6'h20;
default: lcd_code <= 6'b010000;
endcase
if (lcd_rw)
lcd_busy <= 0;
lcd_stb <= ^count[k+1:k+0] & ~lcd_rw & lcd_busy; // clkrate / 2^(k+2)
lcd_stuff <= {lcd_stb,lcd_code};
{lcd_e,lcd_rs,lcd_rw,lcd_7,lcd_6,lcd_5,lcd_4} <= lcd_stuff;
end
endmodule
