
3D Haar Wavelet Transform with Dynamic Partial Reconfiguration for 3D Medical Image Compression

Afandi Ahmad, Benjamin Krill, Abbes Amira
Electronic and Computer Engineering
School of Engineering and Design
Brunel University, West London, United Kingdom
Email: {Afandi.Ahmad, Benjamin.Krill, Abbes.Amira}

Hassan Rabah
Laboratoire d'Instrumentation Electronique de Nancy
University Henri Poincare, France
Email:

Abstract— This paper describes the design and implementation of a 3D Haar wavelet transform (HWT) with transpose-based computation and dynamic partial reconfiguration (DPR). Owing to the separability of the multi-dimensional HWT, the proposed architecture has been implemented as a cascade of three N-point 1D HWTs and two transpose memories for a 3D volume of N × N × N, suitable for 3D medical image compression. The proposed 3D HWT architectures were implemented on a Xilinx Virtex-5 field programmable gate array (FPGA) using VHDL. An in-depth performance analysis and comparison shows that the DPR-based implementation improves both speed and power consumption while reducing the hardware required for the system.

978-1-4244-4918-7/09/$25.00 ©2009 IEEE 137

I. INTRODUCTION

Medical image processing applications involve performing complex tasks, mainly matrix transforms, repeatedly on large sets of volume data, often under real-time requirements. For example, the computational complexity of the fast Fourier transform (FFT) and of the recently developed curvelet transform ranges from $O(N \log N)$ to $O(N^2 \times J)$, where N is the transform size and J is the maximum transform resolution level; these transforms are therefore extremely computationally intensive for large medical volumes [1]. Efficient implementations of these operations are thus important and lead to efficient solutions for three-dimensional (3D) medical image compression. Higher compression ratios can be achieved using multi-resolution analysis, where the 3D wavelet transform is widely applied because of its perfect reconstruction property and lack of blocking artifacts. In this research, the Haar wavelet transform (HWT), the simplest of all wavelets, has been chosen because it is conceptually simple, fast and memory efficient, and it is exactly reversible without the edge effects that are a problem with other wavelet transforms [2].

Reconfigurable hardware, especially field programmable gate arrays (FPGAs), offers significant potential for the efficient implementation of a wide range of computationally intensive signal and image processing algorithms and applications, from simple low-resolution, low-bandwidth applications (multimedia, picture phone) to very high-resolution, high-bandwidth ones (medical imaging, HDTV) [3].

The complexity of data addressing and access, the massive amount of data to be processed and the need for several building blocks for the computationally intensive matrix transformations have greatly restricted hardware implementations of 3D medical image compression. FPGAs with dynamic partial reconfiguration (DPR) are a promising solution for reducing the hardware required for an efficient design while improving the performance, speed and power consumption of a 3D medical image compression system.

Despite its complexity, there has recently been interest in implementing the 3D discrete wavelet transform (DWT) on FPGAs. However, a survey of existing implementations and architectures indicates that this research is still in its infancy, as demonstrated by the limited number of contributions [4], [5].

The DPR mechanism itself has been widely studied in various fields [6]-[10]. A significant contribution in [11] presented a novel scalable FPGA-based architecture for the discrete cosine transform (DCT) using DPR and reported significant results for the partial reconfiguration process: better power savings and fewer processing clock cycles, together with a reduced reconfiguration overhead. These achievements strongly motivate further exploration of a 3D HWT implementation with DPR and an evaluation of its performance in terms of area, power consumption and maximum speed.

In this paper, the proposed FPGA architectures for the 3D HWT with transpose-based computation and the DPR mechanism, both suitable for 3D medical image compression, are discussed and evaluated. Comparative studies of both architectures in terms of area, power consumption and maximum speed, and of the influence of the transform size on hardware performance, are also presented. The paper is organised as follows. Section II presents the proposed 3D HWT architecture with the DPR mechanism. Experimental results, comparison and analysis are described in Section III. Section IV concludes the paper.

II. PROPOSED ARCHITECTURE FOR 3D HWT WITH DPR

This section briefly explains the proposed system architecture depicted in Fig. 1(a) to (e), including the 3D wavelet compression and decompression system, the computation of the 3D HWT with transpose-based computation, the top-level architecture of the 3D HWT with DPR, and the transpose module implementation without DPR.

Fig. 1. Proposed system architecture framework. (a) and (b) Block diagrams of the 3D wavelet compression/decompression. (c) Computation of 3D HWT coefficients using transpose based computation. (d) Proposed top level architecture for 3D HWT using DPR. (e) Transpose module implementation without DPR mechanism.

A. 3D HWT and Transpose

The 3D HWT is computed as follows. The input to the first one-dimensional (1D) HWT is read row by row, and the 1D HWT is performed on each input vector as it arrives. The resulting values are sent to the transpose module T1, which calculates the memory addresses for the transposition and stores the data in memory. Since the 1D HWT produces row vectors, T1 acts as a memory forwarder that performs the matrix transpose.

After the resultant matrix has been transposed, another 1D HWT is performed on the coefficients stored in memory to yield the two-dimensional (2D) HWT coefficients; this is the conventional row-column 2D HWT computation. The 2D HWT is performed on each sub-image, S0 to S7 for N = 8, where S0 is the first and S7 the eighth sub-image of the input volume. The output coefficients of the 2D HWT are sent to the second transpose, T2. As with all other coefficients, the data transposed by T2 are stored in memory after transformation.

Instead of using logic and other embedded resources for the transpose implementation, this work makes optimised use of block random access memory (BRAM). This approach significantly improves the utilisation of the available storage resources, optimises system performance and meets the design goals.

B. DPR System Architecture and Implementation

The DPR framework has two kinds of area: reconfigurable and static. The reconfigurable areas are used for the 1D HWT and the different transposition modules, while the static area contains the data fetch unit and the memory controller (Wishbone compliant). Fig. 1(d) shows the details of the working system for the 3D HWT implementation with DPR.

The DPR modules are connected through simple bus interfaces. The data fetch unit and the HWT DPR area are connected by a data bus of defined bit width, a request line and a return signal, free. The fetch unit sends data and a request to the HWT core as long as the free signal is active. The HWT and the transposition module are connected by a data bus of the defined bit width and an enable signal; in every cycle in which the enable signal is active, data are transposed and written to memory.

The proposed system is implemented with the Xilinx partial reconfiguration tool suites ISE 9.2PR and PlanAhead 10.1 [12]. It uses module-based DPR, in which configuration frames are reconfigured and bus macros connect the DPR areas to the static area [13]. This methodology has the restriction that all design files and reconfigurable modules must be available to the build environment in order to build the partial modules. The main advantage of DPR is that a given design can be implemented on a smaller FPGA, which reduces cost, package size and power. Power consumption and logic size can also be reduced by cascading calculation modules.
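To make the transpose-based computation of Section II-A concrete, the following Python sketch models it in software. It is a minimal illustration under stated assumptions, not the authors' VHDL design: it assumes a single-level, unnormalised Haar step (pairwise averages followed by half-differences), a cubic N × N × N volume with N even, and nested lists indexed vol[z][y][x], so that vol[z] plays the role of sub-image S_z. The helper names haar_1d, ihaar_1d, transpose, hwt_3d and ihwt_3d are hypothetical; the two explicit reorderings stand in for T1 and T2.

```python
from typing import List

Volume = List[List[List[float]]]


def haar_1d(v: List[float]) -> List[float]:
    """Single-level N-point Haar step (assumed scaling): N/2 averages, then N/2 half-differences."""
    half = len(v) // 2
    avg = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    dif = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return avg + dif


def ihaar_1d(c: List[float]) -> List[float]:
    """Exact inverse of haar_1d: each (average s, half-difference d) pair becomes (s + d, s - d)."""
    half = len(c) // 2
    out: List[float] = []
    for s, d in zip(c[:half], c[half:]):
        out.extend([s + d, s - d])
    return out


def transpose(m: List[List[float]]) -> List[List[float]]:
    """Plain matrix transpose, standing in for the T1 address remapping."""
    return [list(row) for row in zip(*m)]


def hwt_3d(vol: Volume) -> Volume:
    """Forward 3D HWT as three cascaded 1D passes; the result is indexed [x][y][z]."""
    n = len(vol)
    # Pass 1: 1D HWT on every row (x direction) of every sub-image.
    vol = [[haar_1d(row) for row in sub] for sub in vol]
    # T1 + pass 2: transpose each sub-image, then 1D HWT along y (2D HWT per slice).
    vol = [[haar_1d(row) for row in transpose(sub)] for sub in vol]  # now [z][x][y]
    # T2: regroup so that the n values along z for each (x, y) form one row.
    vol = [[[vol[z][x][y] for z in range(n)] for y in range(n)] for x in range(n)]
    # Pass 3: final 1D HWT along z.
    return [[haar_1d(row) for row in plane] for plane in vol]


def ihwt_3d(coef: Volume) -> Volume:
    """Inverse of hwt_3d: undo each pass and each reordering in reverse order."""
    n = len(coef)
    c = [[ihaar_1d(row) for row in plane] for plane in coef]  # [x][y][z]
    c = [[[c[x][y][z] for y in range(n)] for x in range(n)] for z in range(n)]  # [z][x][y]
    c = [transpose([ihaar_1d(row) for row in sub]) for sub in c]  # [z][y][x]
    return [[ihaar_1d(row) for row in sub] for sub in c]
```

For N = 8, the sub-images S0 to S7 are vol[0] to vol[7]. Applying ihwt_3d to the output of hwt_3d reproduces the input exactly, which mirrors the exact reversibility of the HWT cited in Section I.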
In the 3D HWT case, both the transposition module and the 1D HWT module can be changed. During the image calculation, the transposition module is changed three times for each sub-image. First, transposition T1 performs the row-to-column transposition and remains active until a sub-image has been transposed. After the T1 transposition of a sub-image, the DPR area is reconfigured with the T2 transposition, which stores the sub-image; these operations are repeated for all sub-images. Once all sub-images have been computed and transposed with T2, the transposition DPR area is reconfigured with the straight transposition, and the last 1D HWT is performed on all T2 sub-images. The HWT DPR area can be reconfigured to switch between different transform sizes. The dependency on the transform size N propagates from the HWT module to all connected modules, so no other logic changes are necessary.

On the other hand, Fig. 1(e) illustrates the implementation of the transpose module without DPR, in which all modules have to be combined and connected through a multiplexer, leading to a higher demand for area resources.

Fig. 2. Comparison of original and reconstructed CT, MRI and PET images for the first slices.
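The sequence in which the transposition DPR area is loaded can be written out as a schedule. The Python sketch below is only an illustration under assumptions: it follows the order stated above (T1 then T2 for each sub-image, then a single switch to the straight transposition), and it models each mode as a hypothetical address mapping for an n × n sub-image whose words arrive in row-major order. The names t1_addr, t2_addr, straight_addr, MODES, schedule and write_sub_image are illustrative and do not come from the design.

```python
from typing import Callable, Dict, List

# Hypothetical address mappings for an n x n sub-image S_z whose words arrive in
# row-major order (index k = row * n + col); all three are illustrative assumptions.


def t1_addr(n: int, z: int, k: int) -> int:
    """T1: row-to-column transposition inside sub-image z."""
    row, col = divmod(k, n)
    return z * n * n + col * n + row


def t2_addr(n: int, z: int, k: int) -> int:
    """T2: place word k of sub-image z so that the n values along z become contiguous."""
    return k * n + z


def straight_addr(n: int, z: int, k: int) -> int:
    """Straight transposition: words are stored in arrival order."""
    return z * n * n + k


# Without DPR (Fig. 1(e)) every mapping is present and one is picked by a multiplexer;
# this lookup table loosely mimics that selection.
MODES: Dict[str, Callable[[int, int, int], int]] = {
    "T1": t1_addr,
    "T2": t2_addr,
    "straight": straight_addr,
}


def schedule(n: int) -> List[str]:
    """Order in which modules occupy the transposition DPR area for sub-images S0..S(n-1)."""
    seq: List[str] = []
    for _ in range(n):
        seq += ["T1", "T2"]  # T1 until the sub-image is transposed, then T2 stores it
    seq.append("straight")  # loaded once every sub-image has passed through T2
    return seq


def write_sub_image(mem: List[int], mode: str, n: int, z: int, data: List[int]) -> None:
    """One word per enable cycle: store each incoming word at the active mode's address."""
    addr = MODES[mode]
    for k, word in enumerate(data):
        mem[addr(n, z, k)] = word
```

In the DPR version, only the currently loaded entry of MODES would occupy the reconfigurable area at any time, which is the intuition behind the lower area of the DPR design compared with the multiplexed one.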
A. Medical Images Simulation

Fig. 2(a) to (i) compares the first slices of the original and reconstructed medical volumes for computerised tomography (CT), magnetic resonance imaging (MRI) and positron emission tomography (PET), showing the best quality and compression obtained with the 3D HWT in a medical image compression system using context-based adaptive variable length coding (CAVLC).

B. FPGA Implementation

Both architectures were implemented using VHDL on the Xilinx University Program XUPV5-LX110T Development System. This development platform comes with on-board memory and industry-standard connectivity interfaces, and is equipped with …

Table I lists the overall performance results for both proposed architectures. The 3D HWT implementation with the DPR mechanism gives significant improvements, reducing area by 1.27% and power consumption by 13.96%. In terms of maximum frequency, the DPR design reaches 347.92 MHz against 288.02 MHz without DPR, an improvement of about 20.80% over the non-DPR design (the figure of 17.216% is obtained when the same difference is divided by the DPR frequency).

TABLE I
Parameters                  Proposed 3D HWT
                            Without DPR         With DPR
Area (Slices)               21,047 (30.45%)     20,779 (30.06%)
Power consumption (mW)      1964.14             1689.84
Maximum frequency (MHz)     288.02              347.92

Concerning the generated bitstream files and the required configuration times, a full bitstream of 3,889,941 bytes is required for the 3D HWT configuration, and it needs at least 4.8 ms to configure, the worst case of all. In contrast, the partial bitstreams generated are significantly smaller, which reduces the storage space required for the various bitstreams. For transform size N = 64, the partial bitstream is about 86.95% smaller than the full bitstream, and the configuration time is reduced by 86.88%. In summary, comparing the bitstream file sizes shows that partial reconfiguration produces more compact bitstreams, and, as demonstrated, a smaller bitstream reduces the configuration time.

C. Discussions

To evaluate how the transform size relates to area, power consumption and maximum speed, five different transform sizes (N = 8, 16, 32, 64 and 128) have been used for the FPGA implementation. These transform sizes reflect the various sizes of volume data in 3D medical imaging.

The influence of transform size on area, power consumption and maximum frequency is depicted in Fig. 3; for ease of visualisation, the graphs are plotted on a base-10 log scale. The results indicate that the 3D HWT with transpose-based computation alone requires more area, whereas the DPR mechanism achieves area savings of between 2.75% and 12.87%. In terms of power consumption, the design without partial reconfiguration consumes up to 1377.96 mW for N = 64, and partial reconfiguration saves 4.20% to 18.81%.

Fig. 3. Influence of transform size on area, power consumption and maximum frequency.
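The relative figures quoted in Section III-B follow from simple ratios, and a short Python check can recompute them from Table I and the bitstream data given in the text. This is a sketch, not part of the authors' flow: relative_change is a hypothetical helper, and the linear time estimate assumes that configuration time scales in proportion to bitstream size, which the reported 86.88% reduction matches closely but not exactly.

```python
def relative_change(baseline: float, new: float) -> float:
    """Percentage change of `new` with respect to `baseline`."""
    return (new - baseline) / baseline * 100.0


# Table I: without DPR (baseline) versus with DPR.
area = relative_change(21047, 20779)        # slices
power = relative_change(1964.14, 1689.84)   # mW
fmax = relative_change(288.02, 347.92)      # MHz, relative to the non-DPR design
fmax_vs_dpr = (347.92 - 288.02) / 347.92 * 100.0  # same gap, divided by the DPR frequency

# Bitstreams: full bitstream size and configuration time quoted in the text.
FULL_BYTES = 3_889_941
FULL_TIME_MS = 4.8
partial_bytes = FULL_BYTES * (1 - 0.8695)  # N = 64 partial bitstream, 86.95% smaller
linear_time_ms = FULL_TIME_MS * partial_bytes / FULL_BYTES  # assumes time proportional to size
reported_time_ms = FULL_TIME_MS * (1 - 0.8688)  # from the reported 86.88% reduction

for name, value in [("area", area), ("power", power), ("fmax", fmax), ("fmax_vs_dpr", fmax_vs_dpr)]:
    print(f"{name}: {value:+.3f}%")
print(f"partial bitstream ~{partial_bytes:,.0f} bytes; "
      f"{reported_time_ms:.4f} ms reported vs {linear_time_ms:.4f} ms linear estimate")
```

The recomputed area and power changes agree with the quoted 1.27% and 13.96% to within rounding, while the frequency gain is about 20.8% against the non-DPR baseline and about 17.2% against the DPR figure.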

Moreover, to visualise the impact of non-partial and partial reconfiguration on the proposed architecture, chip layouts on different Virtex-5 FPGA devices are shown in Fig. 4. With the DPR mechanism, the static and reconfigurable areas can be specified explicitly, as can be clearly seen in the generated layouts.

Fig. 4. Comparison of chip layout for different Virtex-5 devices for N = 64.
A comparative study of the non-partial and partial reconfiguration processes leads to an important conclusion concerning the advantages offered by DPR, especially for processing large medical volumes. Analysis of the performance achieved for different parameters, such as area utilised, power consumed and maximum frequency achieved, clearly reveals that with DPR, complex designs can be implemented on limited hardware resources, leading to better performance.

IV. CONCLUSIONS

Two architectures for the 3D HWT, based on transpose computation and on partial reconfiguration, have been proposed in this paper. A comparative study of the non-partial and partial reconfiguration processes yields interesting conclusions concerning the advantages offered by DPR, which is a promising solution for implementing computationally intensive applications such as 3D medical image compression. Using DPR, several large systems can be mapped onto small hardware resources while the area, power and maximum frequency are optimised and improved.

REFERENCES

[1] I. S. Uzun, "Design and FPGA Implementation of Matrix Transforms for Image and Video Processing", PhD Thesis, School of Computer Science, The Queen's University of Belfast, 2006.
[2] A. Khashman and K. Dimililer, "Image Compression using Neural Networks and Haar Wavelet", WSEAS Trans. Sig. Proc., vol. 4, pp. 330–339, 2008.
[3] A. Ahmad, K. K. Loo and J. Cosmas, "VLSI Architecture Design Approaches for Real-time Video Processing", WSEAS Trans. Cir. and Sys., vol. 7, pp. 855–868, 2008.
[4] M. Jiang and D. Crookes, "Area-Efficient High-Speed 3D DWT Processor Architecture", Electronics Letters, vol. 43, pp. 502–503, 2007.
[5] M. Jiang and D. Crookes, "FPGA Implementation of 3D Discrete Wavelet Transform for Real-time Medical Imaging", in Proc. 18th European Conf. on Circuit Theory and Design (ECCTD 2007), Seville, Spain, pp. 519–522, 2007.
[6] M. Majer, J. Teich, A. Ahmadinia and C. Bobda, "The Erlangen Slot Machine: A Dynamically Reconfigurable FPGA-based Computer", Journal of VLSI Signal Processing, vol. 47, pp. 15–31, 2007.
[7] C. Claus, J. Zeppenfeld, F. Muller and W. Stechele, "Using Partial-Run-Time Reconfigurable Hardware to Accelerate Video Processing in Driver Assistance System", in Proc. Design, Automation and Test in Europe Conference and Exhibition (DATE '07), Nice, France, pp. 1–6, 2007.
[8] L. Braun, K. Paulsson, H. Kromer, M. Hubner and J. Becker, "Data Path Driven Waveform-like Reconfiguration", in Proc. International Conference on Field Programmable Logic and Applications (FPL 2008), Heidelberg, Germany, pp. 607–610, 2008.
[9] A. Shoa and S. Shirani, "Run-Time Reconfigurable Systems for Digital Signal Processing Applications: A Survey", Journal of VLSI Signal Processing, vol. 39, pp. 213–235, 2005.
[10] P. Manet, D. Maufroid, L. Tosi, G. Gailliard, O. Mulertt, M. D. Ciano, J.-D. Legat, D. Aulagnier, C. Gamrat, R. Liberati, V. L. Barba, P. Cuvelier, B. Rousseau and P. Bertrand, "An Evaluation of Dynamic Partial Reconfiguration for Signal and Image Processing in Professional Electronics Applications", EURASIP J. Embedded Syst., vol. 2008, pp. 1–11, 2008.
[11] J. Huang, M. Parris, J. Lee and R. F. DeMara, "Scalable FPGA-based Architecture for DCT Computation Using Dynamic Partial Reconfiguration", ACM Trans. on Embedded Comput. Syst., vol. V, pp. 1–18, 2008.
[12] Xilinx Inc., "Partial Reconfiguration Design with PlanAhead", v2.1, 2008.
[13] P. Lysaght, B. Blodget, J. Mason, J. Young and B. Bridgford, "Invited Paper: Enhanced Architectures, Design Methodologies and CAD Tools for Dynamic Reconfiguration of Xilinx FPGAs", in Proc. International Conference on Field Programmable Logic and Applications (FPL '06), Madrid, Spain, pp. 1–6, 2006.