

IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 55, NO. 5, MAY 2007

Novel Memory Reference Reduction Methods for FFT Implementations on DSP Processors
Yuke Wang, Yiyan (Felix) Tang, Yingtao Jiang, Member, IEEE, Jin-Gyun Chung, Member, IEEE, Sang-Seob Song, Member, IEEE, and Myoung-Seob Lim, Member, IEEE

Abstract: Memory references in digital signal processors (DSP) are expensive due to their long latencies and high power consumption. Implementing fast Fourier transform (FFT) algorithms on DSP involves many memory references to access butterfly inputs and twiddle factors. Conventional FFT implementations require redundant memory references to load identical twiddle factors for butterflies from different stages in the FFT diagrams. In this paper, we present novel memory reference reduction methods to minimize memory references due to twiddle factors for implementing various FFT algorithms on DSP. The proposed methods first group the butterflies with identical twiddle factors from different stages in the FFT diagrams and compute them before computing other butterflies with different twiddle factors, and then reduce the number of twiddle factor lookups by taking advantage of the properties of twiddle factors. Consequently, each twiddle factor is loaded only once and the number of memory references due to twiddle factors can be minimized. We have applied the proposed methods to implement the radix-2 DIF FFT algorithm on the TI TMS320C64x DSP. Experimental results show that the proposed methods achieve an average 76.4% reduction in the number of memory references, a 53.5% saving of memory space for twiddle factors, and an average 36.5% reduction in the number of clock cycles to compute the radix-2 DIF FFT on DSP compared to the conventional implementation. Similar performance gain is reported for implementing radix-2 DIT FFT algorithms using the new methods.

Index Terms: Digital signal processor (DSP), fast Fourier transform (FFT), memory reference.

I. INTRODUCTION

IN THE field of digital signal processing, the discrete Fourier transform (DFT) plays an important role in the analysis, design, and implementation of discrete-time signal-processing algorithms and systems [1], [2]. For instance, the DFT can be used to calculate a signal's frequency response, and to serve as an intermediate step in more elaborate signal processing techniques. The DFT of a discrete signal $x(n)$ can be directly computed by

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1$$

Manuscript received September 16, 2004; revised June 16, 2006. This research was supported by the Ministry of Information and Communication (MIC), South Korea, under the Information Technology Research Center (ITRC) support program supervised by the Institute of Information Technology Assessment (IITA). The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Shuvra S. Bhattacharyya.
Y. Wang is with the Department of Computer Science, Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, TX 75083-0688 USA (e-mail: yuke@utdallas.edu).
Y. Tang was with the Department of Computer Science, Erik Jonsson School of Engineering and Computer Science, University of Texas at Dallas, Richardson, TX 75083-0688 USA. He is now with the 3DSP Corporation, Irvine, CA 92618 USA (e-mail: yiyan@utdallas.edu).
Y. Jiang is with the Department of Electrical and Computer Engineering, University of Nevada, Las Vegas, Las Vegas, NV 89154-4026 USA (e-mail: yingtao@egr.unlv.edu).
J.-G. Chung, S.-S. Song, and M.-S. Lim are with the Division of Electrical and Information Engineering, Chonbuk National University, Jeonbuk 561-756, Korea (e-mail: jgchung@chonbuk.ac.kr; ssong@chonbuk.ac.kr; mslim@chonbuk.ac.kr).
Digital Object Identifier 10.1109/TSP.2007.892722

where $x(n)$ and $X(k)$ are sequences of complex numbers, and $W_N = e^{-j2\pi/N}$.

The fast Fourier transforms (FFTs) are a class of efficient algorithms to compute the DFT. The FFT algorithms are based on the principle of decomposing the computation of the DFT into sequences of smaller DFTs. The first efficient FFT algorithm was discovered by Gauss in the early 19th century and rediscovered by Cooley and Tukey [3] in the 1960s. Later advances in the research of FFT algorithms include the higher radix FFT [4], the mixed-radix FFT [5], the prime-factor FFT [6], the Winograd (WFTA) FFT [7], the split-radix FFT [8], [9], the recursive FFT [10], and the combination of decimation-in-time (DIT) and decimation-in-frequency (DIF) FFT algorithms [11]. Most of these algorithms illustrate the FFT with similar FFT diagrams, which evolve from the recursive nature of the FFT algorithms and are constructed from a basic butterfly structure, such as the 16-point radix-2 DIT FFT diagram shown in Fig. 1. The complex coefficient $W_N^{k}$ is called the twiddle factor in the butterfly structure in the FFT diagram.

FFT algorithms can be implemented on multiple platforms. For example, FFT algorithms have been implemented on application-specific integrated circuits (ASIC) as FFT processors [12]. Hardware designs of FFT processors are often tailored to fit high-speed or low-power specifications but lack flexibility. FFT algorithms have also been implemented in software on general-purpose processors as building blocks of simulation or data processing systems [13]. Software-based implementations on general-purpose processors are flexible but typically much slower than hardware implementations based on comparable hardware technologies. Digital signal processors (DSPs) are a specific type of processor optimized for digital signal processing applications such as FIR filters, IIR filters, and FFT.

Software implementations of FFT algorithms on DSPs are becoming more popular than ASIC and general-purpose processor-based implementations because they offer excellent tradeoffs among cost, performance, flexibility, and implementation complexity. However, implementing FFT algorithms effectively on DSPs is not trivial. It has been recognized that memory references in DSP are expensive due to their long latencies and high power

1053-587X/$25.00 © 2007 IEEE

WANG et al.: NOVEL MEMORY REFERENCE REDUCTION METHODS FOR FFT IMPLEMENTATIONS ON DSP PROCESSORS


Fig. 1. 16-pt radix-2 DIT FFT diagram. (a) Basic radix-2 DIT FFT butterfly. (b) Complete 16-pt radix-2 DIT FFT diagram.

consumption. For example, in the TI TMS320C64x DSP [15], the memory load operation takes five pipeline execution phases to complete, which corresponds to four delay slots in the execution time. The implementations of FFT algorithms on DSP involve many memory references to access butterfly inputs and twiddle factors. In general, an $N$-pt radix-2 FFT diagram can be divided into $\log_2 N$ stages, each of which contains a column of $N/2$ butterflies. Conventional implementations of FFT algorithms compute butterflies in the natural order of the FFT diagram, i.e., the order of stages. The butterflies within each stage can be computed either in parallel or in serial. Many butterflies with identical twiddle factors can be found in multiple stages of the FFT diagrams. For example, seven butterflies with the twiddle factor $W_{16}^{4}$ can be found in Stage 2 to Stage 4 of the 16-pt radix-2 DIT FFT diagram in Fig. 1(b). Hence, memory reference methods that load identical twiddle factors only once would reduce the total memory reference time and the power consumption as well.

In this paper, we propose novel memory reference reduction methods to minimize the memory references due to twiddle factors in FFT implementations on DSP. The proposed methods first group the butterflies with identical twiddle factors from different stages of the FFT diagram and compute them before computing other butterflies with different twiddle factors. Hence, each twiddle factor is loaded only once and the redundant memory references for identical twiddle factors are removed. The memory reference reduction methods further take advantage of the properties of twiddle factors to reduce the number of twiddle factor lookups so that even more butterflies can be computed by loading one twiddle factor from memory. We have applied the memory reference reduction methods to implement the radix-2 DIF and DIT FFT algorithms on the TI TMS320C64x DSP. Experimental results show that the number

of memory references and the amount of memory space for twiddle factors are greatly reduced, and the number of clock cycles to compute the radix-2 DIF FFT algorithm is also reduced. Our methods can be applied to other kinds of FFT algorithms as well.

In the following, Section II gives the background of the DIF/DIT FFT algorithms and an example of a conventional FFT implementation on DSP. Section III describes how to implement radix-2 DIF/DIT FFT algorithms on DSP with the memory reference reduction methods. Experimental results on the TI TMS320C64x DSP are shown in Section IV and conclusions are drawn in Section V.

II. BACKGROUND

In this section, we first briefly present the basic ideas of the two most widely used FFT algorithms: the DIT FFT and the DIF FFT. We then show the implementation of the radix-2 DIF FFT from TI's DSP library [15] as a typical example of a conventional implementation of FFT algorithms on DSP. The DFT of a discrete signal $x(n)$ can be directly computed as

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \tag{1}$$

where $x(n)$ and $X(k)$ are sequences of complex numbers, and $W_N = e^{-j2\pi/N}$. DIT and DIF FFT algorithms are obtained by decomposing the input sequence $x(n)$ and the output sequence $X(k)$ in (1) into successively smaller subsequences, respectively. For example, the radix-2 DIT and DIF FFT algorithms can be obtained by splitting $x(n)$ and $X(k)$ into odd and even indexed terms, respectively. The computation of the radix-2 DIT and DIF FFT


Fig. 2. (a) Basic radix-2 DIF FFT butterfly. (b) Complete 16-pt radix-2 DIF FFT diagram.

algorithm can be represented by radix-2 DIT and DIF FFT diagrams, which are shown in Fig. 1(b) and Fig. 2(b), respectively.

The computation order of the butterflies in conventional FFT implementations on DSP is based on the partitioning of the FFT diagrams. In general, the FFT diagram can be partitioned into several stages. Each stage contains a constant number of butterflies. For example, the $N$-pt radix-2 DIT/DIF FFT diagram can be partitioned into $\log_2 N$ stages, each of which contains $N/2$ butterflies. The butterflies within a stage have no data dependencies with each other but have data dependencies with butterflies in other stages. For example, the butterflies in Stage 2 of the FFT diagram in Fig. 2 have no data dependencies with each other but have data dependencies with butterflies in both Stage 1 and Stage 3.

The butterflies in the same stage of the FFT diagram can be further partitioned into groups. Each group contains all butterflies sharing identical twiddle factors within the same stage. In particular, the butterflies in Stage $s$ of the $N$-pt radix-2 DIT FFT diagram are divided into $2^{s-1}$ groups, while Stage $s$ of the $N$-pt radix-2 DIF FFT diagram contains $N/2^{s}$ groups. Fig. 3 illustrates the partitioning of the 16-pt radix-2 DIT and DIF FFT diagrams.

Based on the partitioning of the radix-2 DIT and DIF FFT diagrams, the butterflies can be computed following the index order of the stages and groups. The butterflies in the same group are computed from top to bottom. Butterflies with identical twiddle factors are computed in multiple stages of the FFT diagrams in Fig. 3. For example, seven butterflies with the twiddle factor $W_{16}^{4}$ are computed in Stage 2 to Stage 4 of the 16-pt radix-2 DIT FFT diagram in Fig. 3. Hence, identical twiddle factors are accessed multiple times in conventional FFT implementations. Fig. 4 shows the C code taken from TI's DSP library [15], which implements the $N$-pt radix-2 DIF FFT algorithm, where the value of $N$ is given as an input to the C code.

The C code in Fig. 4 shows a three-loop structure: 1) the outermost loop counts the stages and iterates $\log_2 N$ times; 2) the middle loop counts the groups within each stage and decides which twiddle factor is to be loaded; and 3) the innermost loop computes the butterflies within each group. The loop variables of the two outer loops indicate the stage and group number, respectively. Two index variables give the upper and lower input indexes of the butterfly computed by the innermost loop, and a third index selects the twiddle factor to be loaded. Since the conventional implementations strictly follow the natural order of the FFT diagram, identical twiddle factors are loaded multiple times when computing butterflies from different stages of the FFT diagram. For example, the C code in Fig. 4 loads an identical twiddle factor in the group loop when computing butterflies in both Stage 1 and Stage 2 of the 16-pt radix-2 DIF FFT diagram.
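The three-loop structure described above can be sketched in a few lines of Python. This is an illustrative stand-in, not the TI library routine: the names `dif_fft_conventional`, `bit_reverse`, `w_table`, and `loads` are hypothetical, floating-point complex numbers replace the fixed-point data of the DSP, and each table read is simply counted as one memory reference.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dif_fft_conventional(x):
    """Radix-2 DIF FFT computed in stage order. Returns (X, twiddle_loads)."""
    n = len(x)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    a = [complex(v) for v in x]
    # Assumed stand-in for the twiddle table in memory: W_N^k, k = 0 .. N/2 - 1.
    w_table = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    loads = 0
    span, stride = n // 2, 1
    while span >= 1:                          # stage loop: log2(N) passes
        for j in range(span):                 # group loop: one twiddle per group
            w = w_table[j * stride]           # twiddle lookup, counted as a load
            loads += 1
            for i in range(j, n, 2 * span):   # butterflies of this group
                lo = i + span
                diff = a[i] - a[lo]
                a[i] = a[i] + a[lo]
                a[lo] = diff * w
        span //= 2
        stride *= 2
    bits = n.bit_length() - 1
    out = [0j] * n
    for i in range(n):                        # undo the bit-reversed output order
        out[bit_reverse(i, bits)] = a[i]
    return out, loads
```

For a 16-point input the counter ends at 15, one load per group, which is the redundancy the later sections remove.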

III. FFT IMPLEMENTATIONS WITH THE NOVEL MEMORY REFERENCE REDUCTION METHODS

In order to remove redundant memory references due to identical twiddle factors, we propose novel memory reference reduction methods to implement FFT algorithms such that each twiddle factor is loaded only once by grouping butterflies with identical twiddle factors together. Furthermore, the proposed methods minimize the number of twiddle factors needed in FFT diagrams by taking advantage of properties of the twiddle factors. The memory reference reduction methods work for implementations of many kinds of FFT algorithms. As examples, we will demonstrate applications of the memory reference reduction methods on the two most popular FFT algorithms: the radix-2 DIF and DIT FFT algorithms.


Fig. 4. C code of radix-2 DIF FFT from [15].

Fig. 5. Single butterfly at Stage s in radix-2 DIF FFT diagram with twiddle factor.

Fig. 3. Partitioning of the 16-pt radix-2 DIT and DIF FFT. (a) Partitioning of 16-pt radix-2 DIT FFT diagram. (b) Partitioning of 16-pt radix-2 DIF FFT diagram.

A. Grouping of Butterflies With Identical Twiddle Factors

In this subsection, we will use the radix-2 DIF FFT diagram to demonstrate how to group and compute the butterflies with identical twiddle factors from different stages together. For the radix-2 DIF FFT algorithm, a butterfly in Stage $s$ is composed of the inputs $x(i)$ and $x(i + N/2^{s})$, and the twiddle factor $W_N^{2^{s-1}(i \bmod N/2^{s})}$. Fig. 5 shows the butterfly at Stage $s$ of an $N$-pt radix-2 DIF FFT diagram with the corresponding twiddle factor. For example, in the second stage of a 16-pt radix-2 DIF FFT diagram shown in Fig. 2, the butterfly with the inputs $x(i)$ and $x(i+4)$ uses the twiddle factor $W_{16}^{2(i \bmod 4)}$.

Theorem 1: In Stage $s$ of the $N$-pt radix-2 DIF FFT diagram, there are $N/2^{s}$ different twiddle factors that can be represented by $W_N^{2^{s-1}k}$, where $k = 0, 1, \ldots, N/2^{s} - 1$. Among them, twiddle factors of the form $W_N^{2^{s-1}k}$ for odd $k$ will not show up in Stage $s+1$ or any later stage.

The butterflies within a stage with identical twiddle factors can be grouped and computed in any order without destroying the data dependencies in the original radix-2 DIF FFT diagram. For example, the butterflies with twiddle factors $W_{16}^{1}$, $W_{16}^{3}$, $W_{16}^{5}$, and $W_{16}^{7}$ are only found in Stage 1 of the 16-pt radix-2 DIF FFT diagram. These butterflies can be grouped and computed in any order without affecting the computations of other butterflies. In addition, the butterflies with twiddle factors $W_{16}^{2}$ and $W_{16}^{6}$ do not exist in any stage later than Stage 2 of the 16-pt radix-2 DIF FFT diagram. Hence, they can be grouped and computed in any order after the butterflies with twiddle factors $W_{16}^{1}$, $W_{16}^{3}$, $W_{16}^{5}$, and $W_{16}^{7}$ are computed. Similarly, the butterflies with twiddle factor $W_{16}^{4}$ can be grouped and computed in any order in the 16-pt radix-2 DIF FFT diagram after the butterflies with twiddle factors $W_{16}^{2}$ and $W_{16}^{6}$ are computed. The butterflies with the twiddle factor $W_{16}^{4}$ do not exist in stages later than Stage 3.

Following this principle, the computation of the $N$-pt radix-2 DIF FFT diagram can be done in $\log_2 N$ steps. Each step groups and computes the butterflies whose twiddle factors appear in all stages up to the stage of interest and will not occur in the later stages of the FFT diagram. The butterflies within a step can be computed in any order except for the butterflies with the twiddle factor $W_N^{0}$. Butterflies with the twiddle factor $W_N^{0}$ appear in all the stages of the FFT diagram and have data dependencies between the stages.
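As a quick check of Theorem 1, the sketch below enumerates the twiddle exponents of every stage of a radix-2 DIF diagram and assigns each exponent to the last stage in which it appears. The function names (`dif_stage_exponents`, `last_stage_of`, `step_plan`) are hypothetical, and stages are assumed to be numbered from 1 as in the text.

```python
def dif_stage_exponents(n, s):
    """Exponents e of the twiddles W_n^e used in Stage s (1-based) of a radix-2 DIF FFT."""
    return [(1 << (s - 1)) * k for k in range(n >> s)]


def last_stage_of(n, e):
    """Last stage of the n-point DIF diagram in which W_n^e appears."""
    stages = n.bit_length() - 1
    return max(s for s in range(1, stages + 1) if e in dif_stage_exponents(n, s))


def step_plan(n):
    """Map each step to the twiddle exponents whose butterflies it computes."""
    stages = n.bit_length() - 1
    plan = {s: [] for s in range(1, stages + 1)}
    for e in range(n // 2):
        plan[last_stage_of(n, e)].append(e)
    return plan


print(step_plan(16))
```

For $N = 16$ the plan places the odd exponents in Step 1, exponents 2 and 6 in Step 2, exponent 4 in Step 3, and exponent 0 in the final step, matching the example in the text.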


Fig. 6. Grouping butterflies with identical twiddle factors together in radix-2 DIF FFT diagram.

Reduced Memory Reference FFT algorithm: Based on Theorem 1, the $N$-pt radix-2 DIF FFT diagram can be computed in $\log_2 N$ steps as follows.

Step 1: Compute the butterflies with twiddle factors that will not occur after Stage 1 of the FFT diagram. Compute the $N/4$ butterflies with twiddle factor $W_N^{k}$, where $k = 1, 3, \ldots, N/2 - 1$, in Stage 1 of the $N$-pt radix-2 DIF FFT diagram.

Step 2: Compute the butterflies with twiddle factors that will not occur after Stage 2 of the FFT diagram. The butterflies with twiddle factors that will not occur after Stage 2 of the $N$-pt radix-2 DIF FFT diagram include: 1) $N/4$ butterflies in the second stage of the DIF FFT diagram and 2) $N/8$ butterflies in the first stage. These butterflies have twiddle factors $W_N^{2k}$, where $k = 1, 3, \ldots, N/4 - 1$.

Step $s$: Compute the butterflies with twiddle factors that will not occur after Stage $s$ of the FFT diagram, where $3 \le s \le \log_2 N - 1$. The butterflies with twiddle factors that will not occur after Stage $s$ of the $N$-pt radix-2 DIF FFT diagram include $N/4$ butterflies in Stage $s$, $N/8$ butterflies in Stage $s-1$, ..., and $N/2^{s+1}$ butterflies in the first stage of the radix-2 DIF FFT diagram. These butterflies have twiddle factors $W_N^{2^{s-1}k}$, where $k = 1, 3, \ldots, N/2^{s} - 1$.

Step $\log_2 N$: Compute the butterflies with twiddle factor $W_N^{0}$. In total, $N - 1$ butterflies with twiddle factor $W_N^{0}$ in an $N$-pt radix-2 DIF FFT diagram are computed.

In this way, each twiddle factor is loaded exactly once during the computation of the $N$-pt radix-2 DIF FFT diagram. To illustrate the above $\log_2 N$ steps, we can redraw the 16-pt radix-2 DIF FFT diagram of Fig. 2 as Fig. 6, where the butterflies with identical twiddle factors are grouped together. All butterflies in Step 4 of Fig. 6 use the twiddle factor $W_{16}^{0} = 1$ and can be computed without multiplications.

B. Reduction of the Number of Necessary Lookups of Twiddle Factors

The method in Section III-A can reduce the number of memory accesses for each twiddle factor in implementing the $N$-pt radix-2 DIF FFT algorithm. Furthermore, the memory references can be minimized by reducing the number of twiddle factors to be looked up using the properties of the twiddle factors. For example, the butterflies in Step 2 of the FFT diagram in Fig. 6 are computed with twiddle factors $W_{16}^{2}$ and $W_{16}^{6}$. The twiddle factor $W_{16}^{6}$ can be replaced by $W_{16}^{2}$ with a simple derivation

$$W_{16}^{6} = W_{16}^{2}\, W_{16}^{4} = -j\, W_{16}^{2}.$$

Hence, only the twiddle factor $W_{16}^{2}$ is needed in Step 2. Similarly, twiddle factors $W_{16}^{5}$ and $W_{16}^{7}$ can be replaced by $W_{16}^{1}$ and $W_{16}^{3}$, respectively. Hence, only twiddle factors $W_{16}^{1}$ and $W_{16}^{3}$ are necessary in Step 1 of the FFT diagram in Fig. 6. In general, we have the following property:

$$W_N^{k+N/4} = W_N^{k}\, W_N^{N/4} = -j\, W_N^{k} \tag{2}$$

Therefore, we have $W_N^{k+N/4} = \mathrm{Im}(W_N^{k}) - j\,\mathrm{Re}(W_N^{k})$, which implies that

$$\mathrm{Re}(W_N^{k+N/4}) = \mathrm{Im}(W_N^{k})$$

and

$$\mathrm{Im}(W_N^{k+N/4}) = -\mathrm{Re}(W_N^{k}).$$
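The quarter-period relation behind property (2) is easy to confirm numerically. The short sketch below (hypothetical helper names `twiddle` and `derived_twiddle`, floating-point arithmetic assumed) builds $W_N^{k+N/4}$ purely by swapping and negating the real and imaginary parts of $W_N^{k}$, so no second table entry is read.

```python
import cmath


def twiddle(n, k):
    """W_n^k = exp(-j*2*pi*k/n), computed directly for reference."""
    return cmath.exp(-2j * cmath.pi * k / n)


def derived_twiddle(n, k):
    """W_n^(k + n/4) obtained from the parts of W_n^k alone."""
    w = twiddle(n, k)
    # -j * (a + jb) = b - ja: swap the parts and negate the new imaginary part.
    return complex(w.imag, -w.real)


for k in range(4):
    print(k, derived_twiddle(16, k), twiddle(16, k + 4))
```

On a fixed-point DSP the same swap would apply to the separately stored real and imaginary words; that mapping is an assumption here, not something the sketch demonstrates.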

Fig. 7. Computing two butterflies together in one stage of the radix-2 DIF FFT diagram.

The twiddle factors $W_N^{k+N/4}$, where $k = 0, 1, \ldots, N/4 - 1$, can therefore be obtained from $W_N^{k}$ without a separate lookup. The above property of twiddle factors can be applied to any FFT algorithm and reduces the number of twiddle factors that need to be stored in memory. More butterflies can be computed by loading one twiddle factor than by only grouping the butterflies with identical twiddle factors together.

Theorem 2: Considering the two butterflies at Stage $s$ of the radix-2 DIF FFT diagram shown in Fig. 7(a), both butterflies can be computed together by loading only one twiddle factor $W_N^{k}$ in Stage $s$, as shown in Fig. 7(b).

Proof: Based on Fig. 5, input $x(i)$ pairs with input $x(i + N/2^{s})$ to form one butterfly, and the twiddle factor used in the butterfly is $W_N^{k}$ with $k = 2^{s-1}(i \bmod N/2^{s})$. For $k < N/4$, input $x(i + N/2^{s+1})$ pairs with input $x(i + N/2^{s+1} + N/2^{s})$ to form a second butterfly in the same stage, and its twiddle factor is $W_N^{k + 2^{s-1} N/2^{s+1}} = W_N^{k+N/4}$. Since $W_N^{k+N/4} = -j\,W_N^{k}$ by (2), we have

$$\mathrm{Re}(W_N^{k+N/4}) = \mathrm{Im}(W_N^{k})$$

and

$$\mathrm{Im}(W_N^{k+N/4}) = -\mathrm{Re}(W_N^{k}).$$

The twiddle factors are complex numbers with separated real and imaginary parts, which are stored separately in memory. Therefore, by loading $\mathrm{Re}(W_N^{k})$ and $\mathrm{Im}(W_N^{k})$, we can compute both butterflies.

After the number of necessary twiddle factors is reduced, the FFT diagram in Fig. 6, where the butterflies with identical twiddle factors are grouped together, can be further redrawn as Fig. 8. Only three twiddle factors need to be looked up in Fig. 8 compared to seven in the original 16-pt radix-2 DIF FFT diagram in Fig. 2.

The method proposed in this subsection to reduce the number of twiddle factors to be looked up and to compute two butterflies together by loading one twiddle factor is different from the radix-4 FFT algorithm [1]. The radix-4 FFT algorithm saves complex multiplications with twiddle factors by combining two adjacent stages in the radix-2 FFT diagram. The method proposed in this subsection does not save complex multiplications in the radix-2 FFT diagram but reduces the number of memory lookups needed for the twiddle factors. Moreover, the proposed method can be applied to radix-4 FFT algorithms to reduce the number of twiddle factors that need to be looked up in the radix-4 FFT diagram. Fig. 9 shows an example of reducing the number of necessary twiddle factors in the 16-pt radix-4 DIF FFT diagram. The twiddle factors in the conventional 16-pt radix-4 DIF FFT diagram shown in Fig. 9(c) are $W_{16}^{1}$, $W_{16}^{2}$, $W_{16}^{3}$, $W_{16}^{4}$, $W_{16}^{6}$, and $W_{16}^{9}$. By taking advantage of the properties of the twiddle factors, the twiddle factors that need to be looked up in the 16-pt radix-4 DIF FFT diagram are reduced to $W_{16}^{1}$, $W_{16}^{2}$, and $W_{16}^{3}$, as shown in Fig. 9(d).

C. Application of Memory Reference Reduction Methods on Radix-2 DIT FFT

The new memory reference reduction methods can be applied to implement many existing FFT algorithms. As an example, we apply the memory reference reduction methods to


Fig. 8. 16-pt radix-2 DIF FFT diagram after the proposed methods are applied.

Fig. 9. Reducing the number of twiddle factors needed to be looked up in the 16-pt radix-4 DIF FFT diagram. (a) Radix-4 DIF FFT butterfly. (b) Simplified representation of the radix-4 DIF FFT butterfly. (c) Conventional 16-pt radix-4 DIF FFT diagram. (d) 16-pt radix-4 DIF FFT diagram with the number of twiddle factors to be looked up reduced.

implement the radix-2 DIT FFT algorithm. Due to the difference between the DIT and DIF FFT diagrams, the methods described in Sections III-A and III-B cannot be applied to the radix-2 DIT FFT diagram directly. To apply the method described in Section III-A to an $N$-pt radix-2 DIT FFT diagram, the butterflies with twiddle factor $W_N^{0}$ are grouped and computed


Fig. 10. Grouping butterflies with identical twiddle factors together in radix-2 DIT FFT diagram.

Fig. 11. 16-pt radix-2 DIT FFT diagram after the proposed methods are applied.

together before computing butterflies with other twiddle factors in Step 1. In the following step $s$, the butterflies with twiddle factors $W_N^{(N/2^{s})k}$, where $k = 1, 3, \ldots, 2^{s-1} - 1$, are computed. By grouping the butterflies with identical twiddle factors together, we can redraw the 16-pt radix-2 DIT FFT diagram as Fig. 10. To apply the method described in Section III-B to an $N$-pt radix-2 DIT FFT diagram, we first compute all the butterflies with twiddle factor $W_N^{0}$ in Stage 1. Then, we compute the remaining butterflies with twiddle factor $W_N^{0}$ and the butterflies with twiddle factor $W_N^{N/4}$ together. Finally, the remaining butterflies in the FFT diagram are computed following the principle of Section III-B. After the number of twiddle factors to be looked up is reduced, the FFT diagram in Fig. 10 can be further

redrawn as Fig. 11. Only three twiddle factors need to be looked up in Fig. 11 compared to seven in the original 16-pt radix-2 DIT FFT diagram in Fig. 1.

IV. PERFORMANCE EVALUATION FOR THE NOVEL MEMORY REFERENCE REDUCTION METHODS

The number of memory references due to twiddle factors in conventional implementations of $N$-pt radix-2 DIF or DIT FFT algorithms is $N - 1$, which equals the number of groups in the FFT diagram. Grouping the butterflies with identical twiddle factors together reduces the number of memory references due to twiddle factors from $N - 1$ to $N/2 - 1$. After the number of necessary twiddle factors is minimized by the properties of


Fig. 12. DIF FFT code 1 groups the butterflies with identical twiddle factors together only.
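The listing of Fig. 12 is not reproduced in this text. As a rough illustration of the idea it names, the following Python sketch (not the authors' C code; `dif_fft_grouped`, `butterflies`, and `loads` are hypothetical names, and floating-point complex arithmetic is assumed) computes the radix-2 DIF FFT step by step in the grouped order of Section III-A, reading each twiddle factor other than $W_N^{0}$ from the table exactly once.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dif_fft_grouped(x):
    """Radix-2 DIF FFT in grouped (step) order. Returns (X, twiddle_loads)."""
    n = len(x)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    m = n.bit_length() - 1
    a = [complex(v) for v in x]
    w_table = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 2)]
    loads = 0

    def butterflies(e, w, t):
        """Compute all Stage-t butterflies whose twiddle exponent is e."""
        span = n >> t
        for i in range(e >> (t - 1), n, 2 * span):
            diff = a[i] - a[i + span]
            a[i] += a[i + span]
            a[i + span] = diff * w

    for s in range(1, m):                      # Steps 1 .. log2(N) - 1
        for k in range(1, n >> s, 2):          # odd k only
            e = k << (s - 1)                   # exponent 2^(s-1) * k
            w = w_table[e]                     # the only load of this twiddle
            loads += 1
            for t in range(1, s + 1):          # it occurs in Stages 1 .. s
                butterflies(e, w, t)
    for t in range(1, m + 1):                  # last step: W^0 = 1, stage order
        butterflies(0, 1, t)
    out = [0j] * n
    for i in range(n):                         # undo the bit-reversed output order
        out[bit_reverse(i, m)] = a[i]
    return out, loads
```

With a 16-point input the load counter ends at 7, that is $N/2 - 1$, while the result still matches a direct DFT.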

the twiddle factors, only $N/4 - 1$ memory references for twiddle factors are needed to implement $N$-pt radix-2 FFT algorithms.

We have applied the memory reference reduction methods to implement the radix-2 DIF and DIT FFT on the TI TMS320C64x DSP, which is a fixed-point DSP with an enhanced very long instruction word (VLIW) architecture. The C64x DSP has eight functional units that can execute a maximum of eight operations in parallel, two register files with 32 32-bit registers each, and 32-bit internal communication bandwidth. Four pieces of C code are compiled with the maximum compiler effort (-o3) and executed in the TI Code Composer Studio (CCS) v2.1 [14], which is the software development and simulation environment for the TI TMS320C64x DSP. TI's DIF FFT code is the radix-2 DIF FFT code in Fig. 4 taken from TI's DSP library [15]. The DIF FFT code 1 in Fig. 12 only groups the butterflies with identical twiddle factors together in the radix-2 DIF FFT diagram without reducing the number of twiddle factors that need to be looked up. The DIF FFT code 2 in Fig. 13 is written based on the radix-2 DIF FFT diagram in Fig. 8, where the memory reference reduction methods are applied. Besides the above three codes, the radix-2 DIT FFT code with memory reference reduction methods is shown in Fig. 14, which is based on the radix-2 DIT FFT diagram in Fig. 11. The performance figures of the four codes are compared in Table I, including the number of memory references due to twiddle factors, the amount of memory storage for twiddle factors, and the number

of clock cycles to compute FFTs of different sizes. The number of clock cycles for each code to compute the FFTs is measured using the breakpoint function in CCS.

The experimental results show that the radix-2 DIF FFT algorithm implementation that only groups the butterflies with identical twiddle factors together achieves an average 50.9% reduction in the number of memory references due to twiddle factors and an average 29.7% reduction in the number of clock cycles compared to the conventional implementation taken from TI's library. Furthermore, when the number of twiddle factors that need to be looked up is also reduced, an average 76.4% reduction in the number of memory references due to twiddle factors, an average 53.5% saving of memory space for twiddle factors, and an average 36.5% reduction in the number of clock cycles can be achieved compared to the conventional implementation taken from TI's library. The performance of the radix-2 DIT FFT algorithm implementation with memory reference reduction methods is slightly better than that of the radix-2 DIF FFT algorithm implementation.

V. CONCLUSION

In this paper, we propose novel memory reference reduction methods to minimize the number of memory references due to twiddle factors in FFT implementations on DSP. The proposed methods first group the butterflies with identical twiddle factors from different stages in the FFT diagram and compute them


Fig. 13. DIF FFT code 2 with the memory reference reduction methods based on Fig. 8.
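The listing of Fig. 13 is likewise absent from this text. The sketch below is an assumed Python illustration of the combined methods, not the authors' code: butterflies are processed in grouped order, only a quarter-length twiddle table is kept, and each loaded $W_N^{k}$ also serves the butterflies of $W_N^{k+N/4}$ through property (2). The names `dif_fft_reduced`, `butterflies`, `w_q`, and `loads` are hypothetical.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dif_fft_reduced(x):
    """Grouped radix-2 DIF FFT with quarter-table lookups. Returns (X, loads)."""
    n = len(x)
    assert n >= 4 and n & (n - 1) == 0, "length must be a power of two, at least 4"
    m = n.bit_length() - 1
    a = [complex(v) for v in x]
    # Only W_N^k for k < N/4 is stored (entry 0 is never read).
    w_table = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 4)]
    loads = 0

    def butterflies(e, w, t):
        """Compute all Stage-t butterflies whose twiddle exponent is e."""
        span = n >> t
        for i in range(e >> (t - 1), n, 2 * span):
            diff = a[i] - a[i + span]
            a[i] += a[i + span]
            a[i + span] = diff * w

    for s in range(1, m - 1):                        # steps that need a lookup
        for e in range(1 << (s - 1), n // 4, 1 << s):    # odd multiples of 2^(s-1)
            w = w_table[e]
            loads += 1
            w_q = complex(w.imag, -w.real)           # W^(e+N/4) = -j * W^e
            for t in range(1, s + 1):
                butterflies(e, w, t)
                butterflies(e + n // 4, w_q, t)
    for t in range(1, m):                            # W^(N/4) = -j, no lookup
        butterflies(n // 4, -1j, t)
    for t in range(1, m + 1):                        # W^0 = 1, stage order
        butterflies(0, 1, t)
    out = [0j] * n
    for i in range(n):                               # undo the bit-reversed order
        out[bit_reverse(i, m)] = a[i]
    return out, loads
```

For 16 points this reads three table entries ($W_{16}^{1}$, $W_{16}^{3}$, $W_{16}^{2}$), in line with the three lookups stated for Fig. 8. The multiplications by $1$ and $-j$ are written as ordinary complex products here for brevity; a DSP implementation would presumably replace them with adds and part swaps.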

Fig. 14. DIT FFT code with the memory reference reduction methods based on Fig. 11.
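The listing of Fig. 14 is also missing here. A possible Python rendering of the DIT variant is sketched below under the assumption of an in-place radix-2 DIT FFT with bit-reversed input order: the $W_N^{0}$ butterflies are computed first in stage order, then those with $W_N^{N/4} = -j$, and then each remaining step loads one twiddle $W_N^{k}$ with $k < N/4$ and reuses it for $W_N^{k+N/4}$. It is not the authors' code, and `dit_fft_reduced`, `butterflies`, `w_q`, and `loads` are hypothetical names.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def dit_fft_reduced(x):
    """Grouped radix-2 DIT FFT with quarter-table lookups. Returns (X, loads)."""
    n = len(x)
    assert n >= 4 and n & (n - 1) == 0, "length must be a power of two, at least 4"
    m = n.bit_length() - 1
    a = [complex(x[bit_reverse(i, m)]) for i in range(n)]   # bit-reversed input
    w_table = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n // 4)]
    loads = 0

    def butterflies(e, w, s):
        """Compute all Stage-s butterflies whose twiddle exponent is e."""
        span = 1 << (s - 1)
        for i in range(e >> (m - s), n, 2 * span):
            t = a[i + span] * w
            a[i + span] = a[i] - t
            a[i] = a[i] + t

    for s in range(1, m + 1):                 # Step 1: W^0 = 1 in stage order
        butterflies(0, 1, s)
    for s in range(2, m + 1):                 # W^(N/4) = -j, no lookup
        butterflies(n // 4, -1j, s)
    for s in range(3, m + 1):                 # twiddles first used in Stage s
        step = 1 << (m - s)
        for e in range(step, n // 4, 2 * step):
            w = w_table[e]
            loads += 1
            w_q = complex(w.imag, -w.real)    # W^(e+N/4) = -j * W^e
            for t in range(s, m + 1):         # they recur in Stages s .. log2(N)
                butterflies(e, w, t)
                butterflies(e + n // 4, w_q, t)
    return a, loads
```

The ordering mirrors the DIF case in reverse: in a DIT diagram a butterfly with exponent $e$ depends only on butterflies with exponent $2e \bmod N/2$ in the previous stage, so twiddles that appear in earlier stages are handled first.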

together, and then reduce the total number of necessary twiddle factors by taking advantage of the properties of twiddle factors. Consequently, each twiddle factor is loaded only once

and the number of memory references due to twiddle factors can be minimized. Experimental results show that the proposed methods achieve an average 76.4% reduction in the number


TABLE I PERFORMANCE COMPARISON OF THE IMPLEMENTATIONS

of memory references, a 53.5% saving of memory space for twiddle factors, and an average 36.5% reduction in the number of clock cycles to compute the radix-2 DIF FFT on DSP compared to the conventional implementation.

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their careful reading and valuable comments that improved the quality of this paper. A reviewer has also brought to our attention that C. M. Rader of MIT, in 1965, wrote an FFT program which used the idea in Section III-A, but he did not publish it.

REFERENCES

[1] C. S. Burrus and T. W. Parks, DFT/FFT and Convolution Algorithms and Implementation. New York: Wiley, 1985.
[2] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Upper Saddle River, NJ: Prentice-Hall, 1999, ISBN 0137549202.
[3] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, 1965.
[4] G. D. Bergland, "A radix-eight fast-Fourier transform subroutine for real-valued series," IEEE Trans. Electroacoust., vol. AE-17, no. 2, pp. 138-144, Jun. 1969.
[5] R. C. Singleton, "An algorithm for computing the mixed radix fast Fourier transform," IEEE Trans. Audio Electroacoust., vol. AE-17, no. 2, pp. 93-103, Jun. 1969.
[6] D. P. Kolba and T. W. Parks, "A prime factor FFT algorithm using high-speed convolution," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-25, no. 4, pp. 281-294, Aug. 1977.
[7] S. Winograd, "On computing the discrete Fourier transform," Math. Comput., vol. 32, no. 141, pp. 175-199, Jan. 1978.
[8] P. Duhamel and H. Hollmann, "Split radix FFT algorithm," Electron. Lett., vol. 20, pp. 14-16, Jan. 5, 1984.
[9] D. Takahashi, "An extended split-radix FFT algorithm," IEEE Signal Process. Lett., vol. 8, no. 5, pp. 145-147, May 2001.
[10] A. R. Varkonyi-Koczy, "A recursive fast Fourier transform algorithm," IEEE Trans. Circuits Syst. II, vol. 42, no. 9, pp. 614-616, Sep. 1995.
[11] A. Saidi, "Decimation-in-time-frequency FFT algorithm," in Proc. ICASSP, Apr. 1994, pp. III:453-III:456.
[12] B. M. Baas, "A low-power, high-performance, 1024-point FFT processor," IEEE J. Solid-State Circuits, vol. 34, no. 3, pp. 380-387, Mar. 1999.
[13] Matlab Function Reference: FFT. Mathworks, Inc. [Online]. Available: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/fft.shtml?BB=1
[14] TMS320C6000 Programmer's Guide (Rev. G), Texas Instruments, Aug. 1, 2002, SPRU198G.
[15] TMS320C64x DSP Library Programmer's Reference (Rev. B), Texas Instruments, Oct. 23, 2003, SPRU565A.

Yuke Wang received the B.Sc. degree from the University of Science and Technology of China, Hefei, China, in 1989, and the M.Sc. and Ph.D. degrees from the University of Saskatchewan, Saskatoon, Canada, in 1992 and 1996, respectively. He has held faculty positions at Concordia University, Montreal, QC, Canada, and Florida Atlantic University, Boca Raton. Currently, he is an Associate Professor in the Computer Science Department, University of Texas at Dallas, Richardson. He has also held visiting assistant professor positions at the University of Minnesota, the University of Maryland, and the University of California at Berkeley. His research interests include VLSI design of circuits and systems for DSP and communication, computer-aided design, and computer architectures. He has published more than 20 papers in IEEE/ACM Transactions. Dr. Wang served as an Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, PART II (2002-2003), as an Editor of the IEEE TRANSACTIONS ON VLSI SYSTEMS (2001-2002), as an Editor of Applied Signal Processing, and for a few other journals.

Yiyan (Felix) Tang received the B.Sc. degree in electrical engineering from South China University of Technology, Guangzhou, China, in 2000, and the M.Sc. degree in computer engineering and the Ph.D. degree in computer science from the University of Texas at Dallas, Richardson, in 2002 and 2005, respectively. Since 2005, he has been with the 3DSP Corporation, Irvine, CA, where he works on design and implementation of wireless communication systems on digital signal processors. His current research interests lie in efficient and effective design and implementation of wireless communication and signal processing systems on digital signal processors.

Yingtao Jiang (M'01) received the B.Eng. degree in biomedical engineering and electronics from Chongqing University, Chongqing, China, the M.A.Sc. degree in electrical engineering from Concordia University, Montreal, QC, Canada, and the Ph.D. degree in computer science from the University of Texas at Dallas, Richardson, in 1993, 1997, and 2001, respectively. He is currently an Assistant Professor in the Department of Electrical and Computer Engineering, University of Nevada, Las Vegas. His research interests include algorithms, VLSI architectures, and circuit-level techniques for the design of DSP, networking, and telecommunications systems, computer architectures, and biomedical signal processing, instrumentation, and medical informatics.


Jin-Gyun Chung (S'90-M'98) received the B.S. degree in electronic engineering from Chonbuk National University, Chonju, Korea, in 1985 and the M.S. and Ph.D. degrees in electrical engineering from the University of Minnesota, Minneapolis, in 1991 and 1994, respectively. Since 1995, he has been with the Department of Electronic and Information Engineering, Chonbuk National University, where he is currently a Professor. His research interests are in the area of VLSI architectures and algorithms for signal processing and communication systems, which include the design of high-speed and low-power algorithms for arithmetic circuits, OFDM systems, and communication systems for automobiles.

Myoung-Seob Lim (S'85-M'90) received the B.S. degree in electronic engineering from Yonsei University, Seoul, Korea, in 1980 and the M.S. and Ph.D. degrees in electrical engineering from Yonsei University in 1982 and 1990, respectively. He worked at the Electronic Telecommunication Research Institute from 1985 to 1996. Since 1996, he has been with the Department of Electronic and Information Engineering, Chonbuk National University, Jeonbuk, Korea, where he is currently a Professor. His research interests are in the area of design of CDMA and OFDM communication systems, which include performance analysis, bandwidth-efficient modulation, and synchronization, and also CAN for in-vehicle networks.

Sang-Seob Song (S'78-M'81) received the B.S. degree in electrical engineering from Chonbuk National University in 1978 and the M.S. and Ph.D. degrees in electrical and computer engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Korea, and the University of Manitoba, Winnipeg, MB, Canada, in 1980 and 1990, respectively. Since 1981, he has been with the Department of Electronic and Information Engineering, Chonbuk National University, Jeonbuk, Korea, where he is currently a Professor. His research interests are in the area of high-speed modems, which includes channel coding and modulation.
