Sie sind auf Seite 1von 6

Design and Implementation of the Discrete Wavelet Transform on an FPGA Platform to Process Data Sets of up to Three Dimensions

E. A. Rivera-Juarico, Student Member, IEEE, J. M. Ramrez-Cort s, Senior Member, IEEE, V. Alarcon-Aquino e and J. Escamilla-Ambrosio, Member, IEEE

AbstractThe aim of this work is the design and implementation of a DWT architecture FPGA-based platform for up to three dimensional signal processing. First, lter banks are designed using a distributed arithmetic technique. Then, we design controllers, interfaces and protocols that handle, transmit and sequence all the data during the computing process. Data is sent via USB to the FPGA and the user interface is programmed in MATLAB. A graphical user interface manages the system operation and displays the results on a PC. Designed lters are compared with a fully parallel architecture in relation to the number of gates used, speed, and algorithm performance. KeywordsDaubechies, Distributed Arithmetic, Filter Design.

up to order n exist and are bounded. A set of orthogonal functions with compact support, widely used now in the wavelet eld, and a method to obtain the corresponding lter coefcients was proposed by Ingrid Daubechies [3]. Several interesting properties are listed below. A. Multiresolution A multi-resolution or multi-scale scheme consists of a sequence of successive approximation spaces Vj with the following conditions: ...V2 V1 V0 V1 V2 ... jZ Vj = L2 (R) jZ Vj = {0} (3) (4) (5)

I. I NTRODUCTION Wavelet signal processing is a technique that can be used to analyze the temporal and spectral properties of non-stationary signals. A prototype function called Mother Wavelet () is stretched and shifted to obtain an orthogonal function set that is used to analyze in a multiresolution approach the time-scale energy distribution of signals. Mother Wavelets can be designed to achieve different requirements. Essentially, mother wavelets are oscillating waves with nite energy and short duration (conditions 1 and 2) that are well localized in the time and frequency spaces [1]. The designers can use the well documented mother wavelets functions found in literature [2] or they can propose their own wavelets to use special characteristics to reach their purposes. So, wavelet theory can view as a exible signal processing framework rather than a simple transformation. dt = 0
+

The condition 3 says that V0 is a central space which includes all subspaces Vj with j +Z and belongs in turn to a series of spaces Vj with j Z. In addition the complex conjugate of the union of all spaces should be a function in L2 (R) as indicated by the condition 4. These spaces should be independent, and thus their intersection must be zero (condition 5). In addition, the multi-resolution appearance is a consequence of the following requirements: f Vj f (2j ) V0 (6) (7)

(1) f V0 f ( n) V0 , n Z (2)

c = 2

|()|2 d < ||

One type of mother wavelet can be distinguish by a property named compact support such that all derivatives of function
This work was supported by CONACYT grant No. 290541 E. A. Rivera-Juarico is a MSc by National Institute of Astrophysics, Optics and Electronics, St. Ma. Tonantzintla, PUE, Mex., e-mail: rije@inaoep.mx J. M. Ramrez-Cort s is a titular researcher at the Electronics Department, e National Institute of Astrophysics, Optics, and Electronics, St. Ma. Tonantzintla, PUE, Mex., e-mail: jmram@inaoep.mx. V. Alarcon-Aquino is titular professor at the Electronics Department, Universidad de las Americas, Puebla, Mex., e-mail: vicente.alarcon@udlap.mx J. Escamilla-Ambrosio is a titular professor at the Electronics Department, Universidad Politcnica de Guanajuato, Mex., e-mail: pjorgeea@gmail.com

That is, all spaces are scaled versions of the central space V0 (condition 6) and V0 is invariant under integer translations (condition 7). This implies that if f Vj , then f ( 2j n) Vj , for all n Z. Finally it requires that there be a function V0 such that 0,n ; n Z (8) is a normalized orthogonal basis in V0 , where {j, n} Z, j,n (s) = 2j/2 (2j s n) (9)

This implies that {j,n ; n Z} is a normalized orthogonal basis in Vj for all j Z. The function is called scaling

function multi-resolution analysis and equation 9 is known as dilation equation. Multi-resolution analysis can be described through a pyramidal structure that decomposes a signal into the approximations and details coefcients, creating a level of decomposition. It is required to apply the same scaling function and mother wavelet in each level on scaled versions of the input signal obtained through the previous levels. The multi-resolution leads naturally to a hierarchical and fast computation of wavelet coefcients at different scales of a given function or sequence. B. Filters Banks for Wavelets Suppose, has been obtained the inner product of f with 0,k , i.e. the scale j = 0. Then the calculation of f, j,k for j 1 is as follows: starting with f, 0,n , estimate f, 1,n by, f, j,k =
n

Fig. 1: Wavelet decomposition for three dimensional functions. low pass lter and high pass for each dimension respectively. The whole procedure is shown in g. 1 where partial results are displayed in each dimension ltering: (a) original volume, (b) ltering output in x, (c) ltered output in y, and (d) ltering output in z which represents the nal result of the DWT descomposition three dimensional process at one level. Fig. 2 shows the resulting ltering scheme using lter banks; (a) for x, (b) for y and (c) for z directions.
2 2
LPF

gn2k f, j1,k

(10)

Which can be obtained by the convolution of f, j1,k with gn , for n Z and retaining only pairs of output samples (decimation by a factor of 2). Then f, 1,n is calculated using a similar process of convolution and decimation given by, f, j,k =
n

hn2k f, j1,k

(11)
HPF LPF

This can be found for all the family functions and , which is useful for multiple levels of wavelet coefcients. The entire procedure can be viewed as the computation of approximations of f , together with the difference in information between two successive levels of approximation. The function can be viewed as a high pass lter where we get the wavelet coefcients and the function as a low pass lter which gives us the coefcients of approximation to calculate the next level of decomposition by feeding this output to another pair of lters.This pair of lters are known as analysis lter banks. Then the next level decomposition gives us a representation of f with a version of and scaled by a factor of 2j with j the level of decomposition. Moreover, the multiresolution analysis occurs when there are different levels or scaled versions of the input signal. In order to obtain j decomposition levels, j analysis lter banks are connected in such a way that the approximation coefcients of level j 1 enters the level j with j > 1 , and j = 0 corresponds to the initial level of the original signal f . C. DWT processing on Three Dimensional data sets Processing functions in three dimensions can be considered as a combination of one-dimensional DWT applied to each of the dimensions of the function to be processed. So, what is considered a data volume of N1 N2 N3 , is divided into eight sub-volumes with dimensions N1 N2 N3 each. The following 2 2 2 nomenclature is used for sub-volumes: LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH. Where L and H denote the outputs of

HPF HPF LPF LPF LPF

2 2 2 2 2 2
(c)

LLL

LLH LHL

f(n)

HPF

LPF

2 2

LHH

LPF

HPF (b)

2 2

HPF HPF

(a)

2 2

HLH HHH

HLL

HHL

Fig. 2: DWT ltering scheme for functions in three dimensions. In a general point of view, n-dimensional functions can be processed by a similar procedure of ltering in every dimension. D. Distributed Arithmetic Equation 12 (linear convolution) can be modied with distributed arithmetic (DA) to process signed numbers in 2s complement [4].
L1

y[n] = f [n] x[n] =


k=0

f [k]x[n k]

(12)

In 2s complement, the most signicant bit is used to distinguish between positive and negative numbers. If |x[n]| < 1, then it can be represented by,
B1

II. M ATERIALS AND M ETHODS A. Target Technology and Tools Field programmable gate array (FPGA) Spartan3E XC3S1200e-4fg320 of Xilinx Inc. over Nexys2 development board of Digilent Inc. was selected as primary target technology. Brief overview of this device is shown in table II. TABLE II: OVERVIEW XC3S1200e-4fg320. OF THE
1200K 19512 2168 8672 136k 504K 28 8 304 124

x[i] = x0 [i] +
b=1

xb [i]2

(13)

With b = 0 representing the most signicant bit (and sign bit) and b = B 1 least signicant bit. The distribution of the architecture for the linear convolution is given by,
N 1

Spartan3E

y[i] =
n=0 N 1

c[n]x[i n]
B1

(14)

y[i] =
n=0 N 1

c[n] x0 [i n] +
b=1 N 1

xb [i n]2b
B1

(15)

y[i] =
n=0 N 1

c[n]x0 [i n] +
n=0 N 1

c[n]
b=1 B1

xb [i n]2b (16)

Gates Equivalent Logic Cells Total CLBs Total Slices Distributed RAM bits Block RAM bits Dedicated Multipliers DCMs Inputs/Outputs Differential Inputs/Outputs pairs

y[i] =
n=0 N 1

c[n]x0 [i n] +
n=0

c[n]
b=1 N 1 B1

xb [i n]2

(17)

y[i] =
n=0 N 1

c[n]x0 [i n] +
n=0 b=1 B1

c[n]xb [i n]2b (18)


N 1

y[i] =
n=0

c[n]x0 [i n] +
b=1

2b
n=0

c[n]xb [i n] (19)
c[n],xb [in]

Design tools used were MATLAB R2010b (Wavelet Toolbox 4.6, Filter Design Toolbox 4.7.1 and Filter Design HDL Coder 2.7) from The MathWorks Inc. for wavelet lter banks design, DA HDL code generation and user interface for running and debugging purposes. Other tools used were ISE WebPack 12.4 from Xilinx Inc. for implementation of designed lters, ADEPT 2.6.1 System and Adept 2.0.1 SDK from Digilent Inc. for programming and data transmission from the register bus implemented on the FPGA system. B. Design Method First, both Daubechies wavelets coefcients lters bank db1 to db8 are loaded into the MATLAB environment by the Wavelet Toolbox and passed as parameters to the Filter Design Toolbox to generate a FIR structures. Table III species the requirements for this system. TABLE III: PARAMETERS OF THE FILTERS
Filter Structure Arithmetic Coefcients Input Output Rounding Mode Over-ow mode Direct-Form FIR Fixed Point 16 bits in range [-1 1) 16 bits in range [-1 1) 16 bits in range [-2 2) Convergent Saturate

LU T contents from c[n], xb [i n] are shown in table I. When processing the vector x0 the accumulator performs a subtraction operation and the result is obtained after B cycles. This process is represented through g. 3. TABLE I: LU T CONTENTS IN GENERAL DA SCHEME
c[n]xb [i n] 0 c[N 1] c[N 2] c[N 2] + c[N 1] . . . c[0] + c[1] + ... + c[n 1] xb [i] 0 0 0 0 . . . 1 xb [i 1] 0 0 0 0 . . . 1 ... ... ... ... ... . . . ... xb [i N + 1] 0 1 0 1 . . . 1

x0[i]

x1[i]

xB-1[i]

x0[i-1]

x1[i-1]

xB-1[i-1]

+/-

x0[i-N+1]

x1[i-N+1] xB-1[i-N+1]

2-1

Fig. 3: Structure based on DA.

Register

LUT

y[i]

+ 0b<B-2

b= B-1

Then, LUT partition is calculated by considering the length of lters and LUTs of 4 inputs.The sum of the partitions must be equal to the length of the lter. Finally, lters are coded in VHDL language by the Filter Design HDL Coder for DA and parallel architectures. Testbench les are also generated with impulse, chirp and white noise signals for both architectures types. Once the lters designed, system is implemented with the ISE WebPack tool. Epp bus with 8 bits registers was implemented with USB-EPP and EPP-Wavelet interfaces along with a set of eight wavelet analysis lters banks, and decimation, data path control and synchronization modules. Each lter bank is similar to lter prototype Wdb1DA shown in g. 4.

Fig. 4: Wavelet analysis lter bank module Wdb1DA with signals and routings.

Fig. 7: Implemented system resulting; USB interface, wavelet interface and wavelet module C. n-dimensional processing Developed management and administration routines provide the pyramidal structure of iterative computation for three dimensional signals. The routines are based on the structures proposed in [5]. Kernel of ltering convolution was named waveletFPGA1Dv2 function. The waveletFPGA1Dv2 function is responsible for sending the data to process into the FPGA and retrieve the results from it. The rst is to reset the wavelet module, select the wavelet to use and prepares the data to be sent byte to byte by a USB channel using dynamic link library DPCUTIL.DLL from the Adept 2.0.1 SDK. System retrieve the data when the counter is even. Latency of the module and the length of the selected lter were considered to process an input vector and obtain all the coefcients that are generated. III. R ESULTS

Fig. 5: Module WdbNda01; Set of eight wavelet analysis lters banks with multiplexing data path modules for wavelet selection. The implementation include a set of eight wavelet analysis lter banks with the necessary control for multiplexing data path as shown in g. 5. Total lters implemented was sixteen, i.e. low pass lter and high pass lter for each bank. Each lter ranges from rst to eighth order. Wavelet Module implemented with synchronization and decimation modules are shown in g. 6. EnablePulse submodule generate a pulse of 16 clock cycles to process every data when enable signal goes high in one clock cycle. Signal generated from this sub-module serves as clock to the decimation sub-modules decimationby2. Module also outputs a decimation state signal. Finally the entire system designed for the target FPGA is illustrated in g. 7.

Both wavelet analysis lter bank architectures kernels were synthesized one by one in ISE WebPack. Results of device utilization for each lter bank is presented in tables IV and V for DA and parallel architectures respectively. Particular synthesis options were setup to ensure both architecture were implemented without special or dedicated FPGA resources like dedicated multipliers or embedded shift register in Spartan3E to ensure portability through different technologies. TABLE IV: SUMMARY ARCHITECTURES
N Flip Flops Total LUTs Logic Routing Shifters Slices 1 232 329 287 40 2 220 2 231 363 322 35 6 230

RESULTS

FOR

DA

WdbNDA lter banks 3 4 5 6 239 252 259 267 396 439 488 515 351 391 425 458 35 34 45 35 10 14 18 22 249 275 297 312

7 277 553 493 34 26 334

8 291 583 519 34 30 355

TABLE V: SUMMARY RESULTS FOR PARALLEL ARCHITECTURES


N Flip Flops Total LUTs - Logic - Route Slices 1 64 803 660 143 430 2 96 1604 1425 179 854 3 128 2082 1856 226 1114 WdbNP Filter Banks 4 5 6 164 192 224 2827 3292 3860 2520 2944 3486 307 348 374 1517 1771 2079 7 256 4240 3858 382 2290 8 288 5281 4836 445 2833

Fig. 6: Implemented wavelet module

The architecture based on DA provides a better relationship in the use of FPGA resources and is even possible to implement the 8 lter banks occupying less than 26 % of the slices available in the FPGA while the parallel architecture exceeds available resources. Simulations of testbenches generated are presented in g. 8, 9, 10 and 11 for high and low pass lters of db8 wavelet in DA and parallel architectures respectively. This test process a vector data with the characteristics of such signals within the input range and compare the outputs obtained by the lter in VHDL with the expected output generated by MATLAB.

Fig. 12 shows the simulation for the db1H lter in DA after processing 2056 words, the computing process is activated at 70 ns after the start of the simulation. The indicator recorded a time of 658.01 microseconds when getting the last word useful output. This is 3 samples after the last word given input latency of 3 samples of design. This is 657.94/2056 = 3.2001e 07 seconds/sample equivalent to 3124905 samples/sec. The timing results are identical to the lter db1L in g. 13. Tests with designed architectures provide similar results in the processing speed that can reach the module.

Fig. 12: Filter Simulation Speed db1H in DA Fig. 8: DAdb8 H module simulation.

Fig. 13: Filter Simulation Speed db1L in DA Fig. 14 shows a one-dimensional test signal of 1024 elements with two Sinc functions with different properties being processed by the FPGA and a function encoded in MATLAB such that compute the same function. The results are displayed in gure 15. In the case of a two-dimensional signal, as shown in g. 16a (a picture of 64x64 pixels), we have the results in g. 16b. All gures selected from db1 wavelet analysis lter bank tests. For three-dimensional signals, we develop a small user interface vis_3Dby2Dplanes (Fig. 17) that allows to process volumes of a series of 3D images in Analyze 5.7 format developed by Mayo Clinic, for magnetic resonance imaging data storing. IV. C ONCLUSION In this work, design and implementation of a 3D DWT based on a FPGA platform have been described. Daubechies wavelet analysis bank lters dbN with N = 1, 2, ..., 8 in fully parallel and DA architectures are compared. As can be seen in the simulations and results, the architecture based on DA obtains a higher efciency than the parallel architecture in terms of resource utilization. The results show that the Daubechies wavelet, one of the most widely used wavelet can be implemented efciently in an FPGA using fewer resources.

Fig. 9: DAdb8 L module simulation.

Fig. 10: Pdb8 H module simulation.

Fig. 11: Pdb8 L module simulation.

Fig. 14: Signal test.

Fig. 17: Viewing sub-volumes of wavelet coefcients in developed graphical user interface.

ACKNOWLEDGMENT The authors acknowledge the nancial support from the Mexican National Council for Science and Technology (CONACYT). R EFERENCES
[1] I. Daubechies, Ten Lectures on Wavelets (CBMS-NSF Regional Conference Series in Applied Mathematics). SIAM: Society for Industrial and Applied Mathematics, 1992. [2] P. S. Addison, The Illustrated Wavelet Transform Handbook - Introductory Theory and Applications in Science, Engineering, Medicine and Finance. Institute of Physics Publishing, 2002. [3] R. M. Rao and A. S. Bopardikar, Wavelet Transforms: Introduction to Theory & Applications. Prentice Hall PTR, 1998. [4] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays (Signals and Communication Technology). Springer, 2007. [5] S. Cai, K. Li, and I. Selesnick, Matlab implementation of wavelet transform, Electrical Engineering at Polytechnic University, NY, 2003. [6] P. Longa and A. Miri, Area-efcient r lter design on fpgas using distributed arithmetic, in Signal Processing and Information Technology, 2006 IEEE International Symposium on, pp. 248 252, 2006. [7] P. Longa, A. Miri, and M. Bolic, A exible design of lterbank architectures for discrete wavelet transforms, in Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on, vol. 3, pp. III1441 III1444, 2007. [8] M. Sharif, A. Sazish, and A. Amira, An efcient algorithm and architecture for medical image segmentation and tumour detection, in Biomedical Circuits and Systems Conference, 2008. BioCAS 2008. IEEE, pp. 157 160, 2008. [9] R. Jiang and D. Crookes, Fpga implementation of 3d discrete wavelet transform for real-time medical imaging, in Circuit Theory and Design, 2007. ECCTD 2007. 18th European Conference on, pp. 519 522, 2007. [10] S. Ismail, A. Salama, and M. Abu-ElYazeed, Fpga implementation of an efcient 3d-wt temporal decomposition algorithm for video compression, in Signal Processing and Information Technology, 2007 IEEE International Symposium on, pp. 154 159, 2007. [11] Y. Li, C. Peng, D. Yu, and X. Zhang, The implementation methods of high speed r lter on fpga, in Solid-State and Integrated-Circuit Technology, 2008. ICSICT 2008. 9th International Conference on, pp. 2216 2219, oct. 2008. [12] X.-W. Wang, W.-N. Chen, C.-L. Peng, and H. jun You, Hardwaresoftware monitoring techniques for dynamic partial recongurable embedded systems, in Embedded Software and Systems Symposia, 2008. ICESS Symposia 08. International Conference on, pp. 113 119, 2008.

Fig. 15: FPGA and MATLAB computing results for signal test of g. 14.

This expand the framework commonly used in the literature ([6], [7], [8], [9], [10]), which mainly implement one or two wavelet lters. In addition, resource utilization measured in this work extends other studies like [11] by consider words of 16 bits, which usually had worked with up to 8 bits. We have seen that the FPGA technology allows parallel computing, thus, further improvements also include more than one module for process two or more data channels simultaneously and improve in architecture using techniques as dynamic partial recongurable systems as proposed in [12].

(a)

(b)

Fig. 16: Image test and result coefcients from FPGA.

Das könnte Ihnen auch gefallen