International Journal of Advance Foundation and Research in Computer (IJAFRC)

Volume 3, Issue 1, January - 2016. ISSN 2348 4853, Impact Factor 1.317

FPGA Implementation of Low Power Pipelined CORDIC Processor
Ms. Preeya D. Ambulkar#1, Prof. A. B. Kharate#2
PG student, Asso. Professor, HVPMs college of Engineering and Technology, Amravati (M.S)
priyaambulkar9@rediffmail.com

ABSTRACT
Most DSP applications today must support real-time multimedia processing. Digital
representations of multimedia data can be handled in the same way as text, but
the processing rate has to be much higher. Because of this real-time throughput constraint,
general-purpose processors are not suitable for modern DSP systems. Hardware-efficient
algorithms are therefore needed for these high-speed applications. These algorithms must
be implemented and optimized in hardware so that they can handle real-time
data while maintaining a good trade-off between performance parameters
such as speed and power. CORDIC is one such algorithm. CORDIC
(Coordinate Rotation Digital Computer) is a hardware-efficient shift-and-add algorithm
that can be used to calculate various arithmetic functions. Its operation is very simple,
requiring only shift and add operations. This project aims
to implement a CORDIC processor with both rotation mode and vectoring mode on a Spartan-6 FPGA.
It focuses on reducing power in bit-parallel unrolled CORDIC structures by
modeling the switching activity and the charging/discharging capacitance within the critical
path.
Index Terms: VLSI, FPGA Spartan-6, DSP, DCT, LUT (Look up table), Arithmetic Circuits.

I. INTRODUCTION
For a long time the field of digital signal processing has been dominated by microprocessors.
This is mainly because they provide designers with the benefits of a single-cycle
multiply-accumulate instruction as well as special addressing modes. Although these
processors are cheap and flexible, they are relatively slow when it comes to performing certain
demanding signal processing tasks, e.g. image compression, digital communication and video processing.
Recently, rapid advancements have been made in the field of VLSI and IC design. As a result, special-purpose
processors with custom architectures have emerged. These custom hardware solutions
achieve higher speed at competitive cost. In addition,
various simple and hardware-efficient algorithms exist that map well onto these chips and can
be used to enhance speed and flexibility while performing the desired signal processing tasks.
One such simple and hardware-efficient algorithm is CORDIC, an acronym for
Coordinate Rotation Digital Computer, proposed by Jack E. Volder in 1959. It was
developed to replace the analog resolver in the B-58 bomber's navigation computer.
CORDIC uses only shift-and-add arithmetic with table look-up to implement different
functions. By making slight changes to the initial conditions and the LUT values, it can
be used to efficiently implement trigonometric, hyperbolic and exponential functions,
16 | 2016, IJAFRC All Rights Reserved

www.ijfarc.org

coordinate transformations, etc. using the same hardware. Since it uses only shift-add
arithmetic, a VLSI implementation of such an algorithm is easily
realizable. The DCT algorithm has many applications and is widely used for
compression. Implementing the DCT using the CORDIC algorithm reduces the number
of computations during processing, increases the accuracy of reconstruction of the image,
and reduces the chip area of a processor designed for this purpose. This
reduces the overall power consumption. An FPGA provides the hardware environment in which
dedicated processors can be tested for their functionality. FPGAs perform various high-speed
operations that cannot be realized by a simple microprocessor. The primary advantage that
an FPGA offers is on-site programmability. Thus, it forms an ideal platform to implement and
test the functionality of a dedicated processor designed around the CORDIC algorithm.
CORDIC is a very simple, iterative convergence algorithm that avoids
complex multiplication, greatly simplifying overall hardware complexity. This makes it an attractive choice for
system designers as they continue to face the challenge of balancing aggressive cost and power
targets with the increased performance required in next-generation signal processing solutions.
This section outlines the basic principle underlying CORDIC-based computation and its
iterative formulation for different operating modes and planar coordinate systems. The CORDIC
algorithm has two computing modes, rotation and vectoring. Table 1
summarizes the two modes.
Table 1. The comparison of rotation mode and vectoring mode

Thus, this project aims to implement a low-power CORDIC processor on an FPGA using both
parallel and pipelined schemes. The CORDIC processor takes a generalized and unified form that is
suitable for performing rotations in circular, hyperbolic and linear coordinate systems. The unified
formulation includes a new variable m that is assigned different values for the different
coordinate systems. The hardware requirement and cost of a CORDIC processor are low, as only shift
registers, adders and a look-up table (ROM) are needed. The number of gates required in a hardware
implementation, such as on an FPGA, is therefore small, as hardware complexity is greatly reduced compared to
other processors such as DSP multipliers. This makes it relatively simple to design. Modern DSP systems, in turn, demand
low-power realization of the circuits they use. This project aims to bring the advantages
of low-power pipelining techniques to the CORDIC processor.
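
The unified formulation with the variable m mentioned above is commonly written in the following form, after Walther's generalization of the algorithm. This is a sketch for orientation only: the symbols d_i (rotation direction) and e_i (elementary angle) are notation introduced here and do not come from the original text.

```latex
\begin{aligned}
x_{i+1} &= x_i - m\, d_i\, y_i\, 2^{-i} \\
y_{i+1} &= y_i + d_i\, x_i\, 2^{-i} \\
z_{i+1} &= z_i - d_i\, e_i
\end{aligned}
\qquad
e_i =
\begin{cases}
\arctan 2^{-i}, & m = 1 \ \text{(circular)} \\
2^{-i}, & m = 0 \ \text{(linear)} \\
\operatorname{artanh} 2^{-i}, & m = -1 \ \text{(hyperbolic)}
\end{cases}
```

In rotation mode the direction is chosen as d_i = +1 when z_i is non-negative and -1 otherwise, so that z is driven to zero. In vectoring mode d_i = -1 when y_i is non-negative and +1 otherwise, so that y is driven to zero. In the hyperbolic case the iterations start at i = 1 and certain iterations have to be repeated for convergence.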

II. RELATED WORK


The need for CORDIC processors has been established in the introduction. CORDIC already finds
wide use in embedded processors, and its use is expected to grow in future electronic systems.
Many advantages and drawbacks of CORDIC processors have been discussed by system designers. This
section traces the prior work on which the proposed project builds. Volder [6] showed that
the CORDIC computing technique is especially suitable for use in a special-purpose computer where
the majority of the computations involve trigonometric relationships. Andraka [5] surveys existing
CORDIC and CORDIC-like algorithms with an eye towards implementation in FPGAs, and shows that such
algorithms are available for use in FPGA-based computing machines, which are the likely basis for
the next generation of DSP systems. Marking the completion of 50 years since the invention of CORDIC
(COordinate Rotation DIgital Computer) by Jack E. Volder [6], the article [4] presents a brief overview
of the key developments in CORDIC algorithms and architectures along with their potential and
upcoming applications. The paper [3] explores different implementations of CORDIC architectures
specific to FPGA devices. The algorithm is implemented in two styles, folded and unfolded, and the
unfolded design is improved architecturally by pipelining it. The architectures are then compared
on area, speed, throughput and power, and conclusions are drawn. All three designs were
coded in VHDL and implemented using the Xilinx FPGA synthesis tool. To check the functionality of the
algorithm, each design was simulated for sine and cosine function evaluation. The papers
[1], [2] are the inspiration for this project. The work in [1] proposes a promising solution for high
performance, high speed and reduced dynamic power dissipation. It focuses on reducing the power
dissipation in bit-parallel unfolded CORDIC structures by modeling the switching activity and the
charging/discharging capacitances within the critical path. In [2], CORDIC is implemented to rotate
given coordinates (X, Y) by a given angle, the sine and cosine of the given angle are also computed,
and waveforms are generated using the Xilinx ISE tool. Example applications include the Discrete
Cosine Transformation (DCT), which is used for image compression and video compression. DCT also
improves speed, as compared to other standard image compression algorithms like JPEG. CORDIC can
also be used in communication for efficient generation of amplitude modulation, frequency modulation,
phase modulation, ASK, FSK, PSK and orthogonal frequency division multiplexing.
As a result, combining the parallel and pipelined CORDIC methods is expected to lead to low power
consumption, with low latency achievable through angle recoding.

III. CORDIC ALGORITHM


To implement the project, we need to design the key modules, namely rotation mode and vectoring mode, to
rotate the given coordinates by the required angle. Figure 1 gives a simple overview of the
CORDIC algorithm. Only shifters, registers and adder/subtractors are used for the calculations.
The adder/subtractors perform binary addition and subtraction. The shift registers perform the bit
shifting required by the algorithm. LUTs (look-up tables) supply the constants
corresponding to the elementary angles used by the algorithm.
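
The two modes described above can be sketched in a few lines of Python. This is a minimal floating-point model for illustration, not the authors' hardware design: the function names `cordic` and `cordic_gain`, the default of 16 iterations, and the use of multiplication by a power of two to stand in for a hardware right shift are all assumptions made here.

```python
import math

def cordic_gain(n):
    """Cumulative gain of n circular CORDIC iterations: prod sqrt(1 + 2^-2i)."""
    k = 1.0
    for i in range(n):
        k *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return k

def cordic(x, y, z, n=16, mode="rotation"):
    """Run n circular CORDIC iterations (hypothetical helper).

    rotation:  drives z to 0, rotating (x, y) by the initial angle z.
    vectoring: drives y to 0, accumulating the angle of (x, y) in z.
    The x and y outputs are scaled by the CORDIC gain (about 1.6468).
    """
    for i in range(n):
        angle = math.atan(2.0 ** -i)   # constant a hardware LUT would hold
        if mode == "rotation":
            d = 1.0 if z >= 0 else -1.0
        else:
            d = -1.0 if y >= 0 else 1.0
        shift = 2.0 ** -i              # stands in for a right shift by i bits
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angle
    return x, y, z

# Rotation mode: start from (1/gain, 0) to obtain cos and sin directly.
k = cordic_gain(16)
cos30, sin30, _ = cordic(1.0 / k, 0.0, math.pi / 6)

# Vectoring mode: magnitude (times gain) ends up in x, angle in z.
mag_scaled, _, ang = cordic(3.0, 4.0, 0.0, mode="vectoring")
```

Only the choice of the direction d differs between the two modes, which is why one datapath of adder/subtractors, shifters and an angle table can serve both.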


Fig . 1. Simple CORDIC Algorithm

IV. CORDIC ARCHITECTURE and APPLICATION


A. Basic CORDIC Architecture
Figure 2 shows the basic hardware architecture of a CORDIC processor. It consists of
adder/subtractors and shift registers. The adder/subtractors perform the addition and subtraction of
binary numbers. The shift registers perform the bit-shift operations required by the algorithm. The
constants corresponding to fixed angle values are obtained from a look-up table implemented as a
ROM. Current research in the design of high-speed VLSI architectures for real-time digital signal
processing (DSP) algorithms has been driven by advances in VLSI technology, which have
given designers significant impetus for porting algorithms into architectures. Many of the
algorithms used in DSP and matrix arithmetic require elementary functions such as trigonometric,
inverse trigonometric, logarithmic, exponential, multiplication and division functions.

Fig . 2. Basic CORDIC Architecture


There are two types of architecture, as given below:
a) Parallel or Cascaded CORDIC architecture:


In this type of architecture, all the iterations take place in a single clock cycle. The CORDIC
algorithm can be implemented in a variety of ways. A direct mapping of the equations into
hardware results in an iterative design. The iterative design may be either word-serial or
bit-serial, depending on whether the functional unit implements the logic for one word or for one
bit. The iterative design has to perform iterations at n times the data rate. The iterative structure
can be unrolled so that each of the n processing elements always performs the same iteration.
Unrolled architectures have two advantages. First, the shifters can be designed for fixed shifts, which
means that they can be implemented in the wiring. Second, the constant values for
the z-branch need not be fetched from a ROM at each iteration; these constants can be hardwired instead of
requiring storage space. The entire CORDIC processor is thereby reduced to an array of
interconnected adder-subtractor units. The unrolled design can easily be pipelined by inserting
pipeline registers between the adder-subtractor units. The architecture is shown in Figures 2
and 3.

Fig.3 Implementation of the parallel CORDIC architecture


It is a combinational circuit. It has considerable delay, but processing time is reduced compared to the
iterative design. The shifts are fixed and so can be implemented in the wiring. Constants can be
hardwired instead of requiring storage space.
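
The unrolled structure described above can be modelled with integer arithmetic, which makes the fixed shifts and hardwired constants explicit. The sketch below is illustrative only: the Q2.14 fixed-point format, the 12 stages, and the names `stage` and `unrolled_sincos` are assumptions made here, not parameters taken from the paper.

```python
import math

FRAC_BITS = 14            # assumed fixed-point format with 14 fractional bits
N_STAGES = 12             # assumed number of unrolled stages
ONE = 1 << FRAC_BITS

# Constants that an unrolled design would hardwire instead of storing in a ROM.
ATAN_CONST = [round(math.atan(2.0 ** -i) * ONE) for i in range(N_STAGES)]
GAIN = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_STAGES))
X_INIT = round(ONE / GAIN)   # pre-scaled so the outputs are cos and sin directly

def stage(i, x, y, z):
    """One adder/subtractor stage; the right shift by i is fixed per stage."""
    if z >= 0:
        return x - (y >> i), y + (x >> i), z - ATAN_CONST[i]
    return x + (y >> i), y - (x >> i), z + ATAN_CONST[i]

def unrolled_sincos(angle_rad):
    """Chain all stages; valid for angles within about +/-1.74 rad."""
    x, y, z = X_INIT, 0, round(angle_rad * ONE)
    # In hardware these would be N_STAGES physical copies with no loop counter.
    for i in range(N_STAGES):
        x, y, z = stage(i, x, y, z)
    return x / ONE, y / ONE

cos_q, sin_q = unrolled_sincos(math.pi / 4)
```

Because each stage always shifts by the same amount, the shift costs no logic in hardware, only routing, which is the first advantage named above.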
b) Pipelined CORDIC architecture
As CORDIC iterations are identical, it is convenient to map them onto pipelined architectures.
The main emphasis in efficient pipelined implementation lies in minimizing the critical path.
Pipelined CORDIC circuits have been used for high-throughput implementation of sinusoidal
wave generation, fixed and adaptive filters, discrete orthogonal transforms and other signal processing
applications. A generic architecture of a pipelined CORDIC circuit is shown in Figure 4. It consists of stages
of CORDIC units, where each pipelined stage consists of a basic CORDIC engine of the kind shown
in Figure 2. Since the number of shifts to be performed by the shifters at different stages is fixed (a shift
through n bit positions is performed at the nth stage), the shift
operations in a pipelined CORDIC can be hardwired with the adders, and the shifters are therefore eliminated in the pipelined
implementation. The critical path of a pipelined CORDIC thus amounts to the time required by the
add/subtract operation in each stage. The pipelined architecture uses a structure similar to that of the
parallel CORDIC, with pipeline registers between the iteration stages. Pipelined CORDIC proves to
be advantageous with continuous input values. For an N-bit CORDIC core, an N-stage pipeline gives
the best result. The first output of an N-stage pipelined CORDIC core is obtained after N clock cycles.

Thereafter, an output is generated every clock cycle. The advantage of the pipelined CORDIC core
over the parallel and iterative cores is its operating frequency, which is much higher than
that of the latter two structures. The pipelined core realizes the same throughput as the parallel core with
an improved frequency of operation. The drawback of the pipelined structure is the increase in area introduced by
the registers. Pipelined CORDIC implementations are described in [8] and [9]. Hence, there is a trade-off
between parallel and pipelined cores based on frequency and area. It is comparatively the most efficient
CORDIC architecture. In this method, multiple iterations take place over multiple clock cycles. It is
implemented by inserting registers between the adder stages. The architecture is shown in
Figure 4.
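
The latency and throughput behaviour described above (first result after N clock cycles, then one result per cycle) can be checked with a small cycle-level model. This is a behavioural sketch under stated assumptions, not the hardware itself: the stage count N = 8 and the names `stage` and `run_pipeline` are chosen here for illustration.

```python
import math

N = 8  # assumed number of pipeline stages
ATAN = [math.atan(2.0 ** -i) for i in range(N)]
K = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N))

def stage(i, data):
    """Combinational logic of stage i: one rotation-mode micro-rotation."""
    x, y, z = data
    d = 1.0 if z >= 0 else -1.0
    return (x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i, z - d * ATAN[i])

def run_pipeline(angles):
    """Feed one angle per clock cycle; return a list of (cycle, cos, sin)."""
    regs = [None] * N                      # regs[i]: register after stage i
    outputs = []
    stream = list(angles) + [None] * N     # idle cycles to flush the pipeline
    for cycle, angle in enumerate(stream, start=1):
        new_in = None if angle is None else (1.0 / K, 0.0, angle)
        # On each clock edge, stage i latches the result computed from the
        # register of stage i-1 (stage 0 takes the new input).
        prev = [new_in] + regs[:-1]
        regs = [None if p is None else stage(i, p) for i, p in enumerate(prev)]
        if regs[-1] is not None:
            x, y, _ = regs[-1]
            outputs.append((cycle, x, y))
    return outputs

results = run_pipeline([0.1, 0.5, 1.0])
```

With three inputs and eight stages, the results appear on cycles 8, 9 and 10: the pipeline fills once and then delivers one result per clock.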

Fig. 4 Implementation Of The Pipeline CORDIC Architecture


Parallel CORDIC can be pipelined by inserting registers between the adder stages. In most FPGA
architectures registers are already present in each logic cell, so the pipeline registers carry no additional hardware
cost. The number of stages after which a pipeline register is inserted can be chosen according to the clock
frequency of the system. When operating with a longer clock period, power consumption in the later stages falls
because of the lower switching activity in each clock period. All the above modules are to be described in
Verilog HDL, synthesized and implemented on an FPGA.
a. Polar To Rectangular Conversion
A logical extension of the sine and cosine computer is a polar to Cartesian coordinate transformer. The
transformation from polar to Cartesian coordinates is defined by:
X = r cos θ
Y = r sin θ
As pointed out above, the multiplication by the magnitude comes for free using the CORDIC rotator. The
transformation is accomplished by selecting the rotation mode with x0 = polar magnitude, z0 = polar
phase, and y0 = 0. The vector result represents the polar input transformed to Cartesian space. The
transform has a gain equal to the rotator gain, which needs to be accounted for somewhere in the system.
If the gain is unacceptable, the polar magnitude may be multiplied by the reciprocal of the rotator gain
before it is presented to the CORDIC rotator.
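
The conversion above, including the optional gain pre-compensation, might be sketched as follows. This is a floating-point illustration only: the function name `polar_to_rect`, the `compensate` flag and the 16 iterations are assumptions made here, and the input angle is assumed to lie within the CORDIC convergence range of roughly ±1.74 rad.

```python
import math

N_ITER = 16  # assumed iteration count
ROTATOR_GAIN = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(N_ITER))

def polar_to_rect(r, theta, compensate=True):
    """Rotation mode with x0 = r, y0 = 0, z0 = theta.

    With compensate=True the magnitude is multiplied by 1/gain before the
    rotator; otherwise the outputs carry the rotator gain.
    """
    x = r / ROTATOR_GAIN if compensate else r
    y, z = 0.0, theta
    for i in range(N_ITER):
        d = 1.0 if z >= 0 else -1.0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * math.atan(2.0 ** -i)
    return x, y

x, y = polar_to_rect(2.0, math.pi / 3)                           # close to (1.0, 1.732)
x_raw, y_raw = polar_to_rect(2.0, math.pi / 3, compensate=False)  # scaled by the gain
```

Since the gain depends only on the number of iterations, its reciprocal is a single constant, so the compensation can be applied once at the input or deferred to a later point in the system.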


Fig. 5 Core Structure of Polar to Rectangular conversion


b. IO Ports
The core structure of the polar to rectangular conversion is shown in Figure 5. The clock signal,
clock enable and angle are the inputs. The sine and cosine magnitudes are the outputs.

Fig. 6 IO Ports of Polar to Rectangular conversion


B. Application
CORDIC has a variety of applications in many fields. Because the CORDIC
algorithm involves no multiplication, only binary addition and bit-shifting,
VLSI implementation is simple. The hardware requirement and cost of a CORDIC processor are low, as
only shift registers, adders and a look-up table (ROM) are needed. The number of gates required in a
hardware implementation, such as on an FPGA, is therefore small, as hardware complexity is greatly
reduced compared to other processors such as DSP multipliers. This makes it relatively simple to
design. The delay involved during processing is comparable to that of a division or square-rooting
operation. Because of these advantages, CORDIC continues to be used in many
applications, and there are a number of fields at present that can take advantage of
them. The main areas where CORDIC can be used are the following:
a. CORDIC to calculate DCT
The Discrete Cosine Transformation (DCT) is one of the most widely used transformation algorithms. The DCT, first
proposed by Ahmed et al. in 1974, has gained importance in recent years, especially in the fields of image
compression and video compression. This work focuses on efficient hardware implementation of the DCT
by decreasing the number of computations, enhancing the accuracy of reconstruction of the original data,
and decreasing chip area, as a result of which the power consumption also decreases. DCT also improves
speed, as compared to other standard image compression algorithms like JPEG.
b. CORDIC for robotics and 3D

CORDIC has also been applied to robot control, where CORDIC circuits serve as
the functional units of a programmable CPU co-processor. Another application of
CORDIC is the kinematics of redundant manipulators. Inverse kinematics can be implemented
efficiently in parallel by computing the pseudo-inverse through singular value decomposition. Collision
detection is another area of robotics where CORDIC has been applied. A CORDIC-based highly
parallel solution for collision detection between a robot manipulator and multiple obstacles in the
workspace has been suggested. The collision detection problem is formulated as one that involves a number of
coordinate transformations. CORDIC-based processing elements are used to efficiently perform the
coordinate transformations by shift-add operations. Graphics operations such as 3D vector rotation,
lighting and vector interpolation are computation-intensive and geometric in nature. The CORDIC architecture
is therefore a natural candidate for cost-effective implementation of these geometric computations in graphics.
3D vector interpolation is also an important function in graphics, required for good-quality shading
in graphic rendering. It has been shown that the variable-precision capability of a CORDIC engine can be used to
realize a power-aware implementation of the 3D vector interpolator.
c. CORDIC in Communication
CORDIC also has some useful applications in communication. CORDIC is used for efficient
generation of amplitude modulation, frequency modulation, phase modulation, ASK, FSK, PSK and orthogonal
frequency division multiplexing. With these applications, CORDIC is used in software defined
radios (SDR), which involve modulation and reception of digitally generated waves. A
pipelined CORDIC-based design for a sine and cosine wave generator
aimed at supporting modulation and reception in SDR has been presented. Compared with other techniques, CORDIC
has been shown to have advantages when applied to SDR, the main one being that it makes it possible to
generate high-accuracy waves even at low frequencies. The use of CORDIC in software defined
radios has also been discussed in connection with direct digital synthesis, a method of generating waveforms
directly in the digital domain. That work shows the generation of various modulator systems and also covers
up/down converters of in-phase and quadrature signals, full mixers for complex signals, and phase
detection for synchronizers, all of which are commonly used in software defined radio.
d. Other Application
The algorithm was originally developed to replace analog resolvers with digital
resolvers for solving real-time navigation problems of the B-58 bomber. John Walther later extended the
basic CORDIC theory to provide solutions to, and implement, a diverse range of functions. The algorithm
finds use in the 8087 math coprocessor, the HP-35 calculator, radar signal processors
and robotics. Most calculators, particularly those designed by Texas Instruments and Hewlett-Packard,
use the CORDIC algorithm for calculation of transcendental functions.

V. POWER DISSIPATION
The dynamic power dissipation in an FPGA is given by

$$P_{dynamic} = \sum_{i=1}^{n} \alpha_i \, C_{L_i} \, V_{supply}^{2} \, F_{clk}$$

where
n is the amount of logic in the critical path;
α is the switching activity;
CL is the load capacitance at a particular node;
Vsupply is the supply voltage and
Fclk is the clock frequency
Given a fixed supply voltage and clock frequency, power consumption in a mapped FPGA
circuit is determined by the switching activities and load capacitances of the various nodes within the circuit.
The switching activity on the critical path can be reduced by concealing the high-activity nodes within
the look-up tables (LUTs), thus leaving LUTs with low output-switching activity in the mapped
netlist. The basic block in a CORDIC structure is the adder-cum-subtractor structure, and a
consequent reduction in power dissipation is achievable if this structure is mapped within the LUTs.
This is shown in Figure 2, where the functionality of an adder-cum-subtractor
structure is concealed within two 4-input LUTs. The dynamic power may be further reduced by retiming
the structure to shorten the critical path. This is achieved by inserting pipeline registers on the
feed-forward paths. Retiming also allows the structure to be operated at a reduced supply voltage,
thereby reducing static power dissipation as well. In a pipelined system the
critical path is shortened such that the capacitance to be charged/discharged in a single clock
cycle (Ccritical) is reduced by some factor, say M. If the same clock speed, fclk, is maintained, then
only a fraction of the original capacitance (Ccritical/M) is charged/discharged in the same
amount of time that was previously needed to charge/discharge the capacitance Ccritical.
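In other words, the supply voltage, Vsupply, can be reduced to βVsupply, where β is a positive
constant less than one. Since dynamic power is proportional to the square of the supply voltage, the power consumption of the pipelined structure is thus given by

$$P_{pipelined} = \beta^{2} \, P_{dynamic}$$

where P_dynamic is the dynamic power of the original, unpipelined structure.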
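
The effect of voltage scaling on dynamic power can be illustrated numerically with the standard switched-capacitance model, in which each node contributes its switching activity times load capacitance times the square of the supply voltage times the clock frequency. The sketch below uses that model as an assumption. The node values, the 1.2 V supply, the 100 MHz clock and the scaling factor 0.8 are all hypothetical numbers chosen for illustration and are not measurements from this work.

```python
def dynamic_power(nodes, v_supply, f_clk):
    """Sum activity * C_load * V^2 * f over (activity, C_load) pairs, in watts."""
    return sum(alpha * c_load for alpha, c_load in nodes) * v_supply ** 2 * f_clk

# Hypothetical nodes: (switching activity, load capacitance in farads)
nodes = [(0.5, 2e-12), (0.25, 4e-12), (0.1, 5e-12)]

p_base = dynamic_power(nodes, v_supply=1.2, f_clk=100e6)

beta = 0.8  # assumed voltage scaling factor, 0 < beta < 1
p_pipe = dynamic_power(nodes, v_supply=beta * 1.2, f_clk=100e6)

ratio = p_pipe / p_base  # equals beta squared, independent of the node values
```

Because the supply voltage enters squared, even a modest reduction made possible by the shorter critical path gives a more than proportional saving in dynamic power.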

VI. IMPLICATION
The implementation in this work targets the Spartan-6 FPGA family. Only the LX series has been
considered because it is suited to general logic applications. The CORDIC engine is designed with 10
stages. The implementation is carried out for input word lengths varying from 4 to 32 bits. To
ensure a fair comparison, the same test benches are used for all the implemented designs, i.e. the input
statistics remain the same in every case. The initial design entry is done using VHDL.
Period and offset constraints are duly provided and complete timing
closure is ensured. Design synthesis, mapping, translation and simulation are
carried out in Xilinx ISE 12.1 and the Xilinx ISim tool. Power metrics are obtained using the XPower
tool. CORDIC processors are expected to expand their presence
in future high-performance systems. This results in lower scalability. Since the algorithm
involves only add and shift operations, it has excellent hardware efficiency and very little
control overhead. The realization of this project is expected to address most of the difficulties discussed above.
The project is expected to have the following results:
a. Power efficiency:
The proposed structure can easily be retimed to reduce the capacitance associated with the critical
path, thereby reducing the dynamic power dissipation. In addition, with the proposed modifications the
conventional CORDIC algorithm can be operated at a reduced voltage to reduce static power
dissipation as well.


b. Lower Latency:
CORDIC uses the same shift-add operation for all applications. Lower latency is achieved chiefly
by reducing the complexity of the barrel shifter and by reducing the scaling factor.
Reduced-latency realization can also be achieved by schemes such as angle recoding. Thus, combining the parallel and pipelined
methods draws on the advantages of both, leading to a faster, more responsive system, which is the
current need of real-time application-specific ICs.

VII. REFERENCES

[1] Burhan Khurshid and Roohie Naaz Mir (Department of CSE, National Institute of Technology Srinagar, J&K, India), "Power Efficient Implementation of Bit-Parallel Unrolled CORDIC Structures for FPGA Platforms," VLSI System, Architectures, Technology and Application (VLSI-SATA), January 2015.

[2] A. Ramya Bharathi and Md. Masood Ahmad (GITAM University, Hyderabad), "Rotation of Coordinates With Given Angle And To Calculate Sine/Cosine Using CORDIC Algorithm," IEEE MAGAZINE, vol. 2, no. 3, March 2015. www.ijmetmr.com

[3] Burhan Khurshid, Ghulam Mohd Rather and Hakim Najeeb-ud-din (Department of Computer Science and Software Engineering, National Institute of Technology Srinagar, J&K, India), "Performance Analysis of CORDIC Architectures Targeted for FPGA Devices," vol. 2, no. 2, February 2012. www.ijarcsse.com

[4] Pramod K. Meher, Javier Valls, Tso-Bing Juang, K. Sridharan and Koushik Maharatna, "50 Years of CORDIC: Algorithms, Architectures and Applications," IEEE, 2009.

[5] R. Andraka, "A survey of CORDIC algorithms for FPGA based computers," in ACM/SIGDA International Symposium on Field Programmable Gate Arrays (FPGA '98), pp. 191-200, 1998.

[6] J. E. Volder, "The CORDIC trigonometric computing technique," IRE Trans. Electronic Computers, vol. EC-8, pp. 330-334, 1959.
