
Modified DA based DWT-IDWT on FPGA for Image Compression

CHAPTER 1

INTRODUCTION

Dept of E&C, Sir MVIT, Bengaluru


1.1 IMAGE
An image (from Latin imago) is an artifact, for example a two-dimensional picture, that has a similar appearance to some subject, usually a physical object or a person. Images may be two-dimensional, such as a photograph or a screen display, or three-dimensional, such as a statue. They may be captured by optical devices such as cameras, mirrors, lenses, telescopes and microscopes, and by natural objects and phenomena, such as the human eye or water surfaces. The word image is also used in the broader sense of any two-dimensional figure such as a map, a graph, a pie chart, or an abstract painting. In this wider sense, images can also be rendered manually, such as by drawing, painting or carving, rendered automatically by printing or computer graphics technology, or developed by a combination of methods, especially in a pseudo-photograph. A volatile image is one that exists only for a short period of time. This may be a reflection of an object by a mirror, a projection of a camera obscura, or a scene displayed on a cathode ray tube. A fixed image, also called a hard copy, is one that has been recorded on a material object, such as paper or textile, by photography or digital processes [1].

1.1.1 STILL IMAGE


A still image is a single static image, as distinguished from a moving image (see Section 1.1.2). The phrase is used in photography, visual media and the computer industry to emphasize that one is not talking about movies, and in very precise technical writing such as standards. A film still is a photograph taken on the set of a movie or television program during production and used for promotional purposes.


Figure 1.1 : Still Image.

1.1.2 MOVING IMAGE


A moving image is typically a movie (film), or video, including digital video. It could also be an animated display such as a zoetrope.

1.1.3 IMAGE FILE SIZE


Image file size, expressed as the number of bytes, increases with the number of pixels composing an image and with the colour depth of the pixels. The greater the number of rows and columns, the greater the image resolution and the larger the file. Each pixel also takes more storage as its colour depth increases: an 8-bit pixel (1 byte) can represent 256 colours, while a 24-bit pixel (3 bytes) can represent about 16 million colours, the latter known as true colour [1].
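The relation between pixel count, colour depth and file size can be sketched as follows. This is only an illustration for uncompressed data; the 640 x 480 image size and the function names are assumptions chosen for the example and do not come from the report.

```python
def image_size_bytes(rows, cols, bits_per_pixel):
    """Uncompressed size in bytes: rows * cols pixels, each of the given depth."""
    return rows * cols * bits_per_pixel // 8


def representable_colours(bits_per_pixel):
    """Number of distinct values a pixel of the given depth can take."""
    return 2 ** bits_per_pixel


# Hypothetical 640 x 480 image at the two colour depths named in the text.
for bits in (8, 24):
    print(bits, "bits/pixel:",
          representable_colours(bits), "colours,",
          image_size_bytes(480, 640, bits), "bytes")
```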

1.2 IMAGE COMPRESSION


Image compression, the art and science of reducing the amount of data required to represent an image, is one of the most useful and commercially successful technologies in the field of digital image processing. The number of images that are compressed and decompressed daily is staggering, and the compressions and decompressions are virtually invisible to the user. Anyone who owns a digital camera, surfs the web, or watches the latest Hollywood movies on digital video disks (DVDs) benefits from the algorithms and standards discussed in this section [2]. Compression is basically of two types: lossy compression and lossless compression.

Lossy compression of data concedes a certain loss of accuracy in exchange for greatly increased compression. An image reconstructed following lossy compression contains degradation relative to the original, often because the compression scheme completely discards redundant information. Under normal viewing conditions, however, no visible loss is perceived. Lossy compression proves effective when applied to graphics images and digitized voice.

Lossless compression consists of those techniques guaranteed to generate an exact duplicate of the input data stream after a compress and expand cycle. Here the image reconstructed after compression is numerically identical to the original image. Lossless compression can only achieve a modest amount of compression. This is the type of compression used when storing database records, spreadsheets or word processing files [2].

1.2.1 NEED FOR THE COMPRESSION


To better understand the need for compact image representations, consider the amount of data required to represent a two-hour standard definition (SD) television movie using 720 x 480 x 24 bit pixel arrays. A digital movie (or video) is a sequence of video frames in which each frame is a full-color still image. Because video players must display the frames sequentially at rates near 30 fps (frames per second), SD digital video data must be accessed at

(30 frames/sec) x (720 x 480 pixels/frame) x (3 bytes/pixel) = 31,104,000 bytes/sec

and a two-hour movie consists of

(31,104,000 bytes/sec) x (3600 sec/hour) x (2 hours) = 2.24 x 10^11 bytes

or 224 GB (gigabytes) of data. Twenty-seven 8.5 GB dual-layer DVDs (assuming conventional 12 cm disks) are needed to store it. To put a two-hour movie on a single DVD, each frame must be compressed, on average, by a factor of 26.3. The compression must be even higher for High Definition (HD) television, where image resolutions reach 1920 x 1080 x 24 bits/image [1].
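The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch; the constant names are chosen for the example, and 1 GB is taken as 10^9 bytes, which is the convention the figures in the text imply.

```python
import math

FPS = 30                      # frames per second
WIDTH, HEIGHT = 720, 480      # SD frame size in pixels
BYTES_PER_PIXEL = 3           # 24-bit colour
MOVIE_SECONDS = 2 * 3600      # two-hour movie
DVD_BYTES = 8.5e9             # dual-layer DVD, with 1 GB = 10**9 bytes

bytes_per_second = FPS * WIDTH * HEIGHT * BYTES_PER_PIXEL
movie_bytes = bytes_per_second * MOVIE_SECONDS
dvds_needed = math.ceil(movie_bytes / DVD_BYTES)
compression_factor = movie_bytes / DVD_BYTES   # to fit on a single DVD

print("data rate  :", bytes_per_second, "bytes/sec")
print("movie size :", movie_bytes / 1e9, "GB")
print("DVDs needed:", dvds_needed)
print("compression factor for one DVD:", round(compression_factor, 1))
```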


1.3 OVERVIEW

Figure 1.2 Block Diagram

1.3.1 Experimental Setup

Figure 1.3: Experimental Setup


1.3.2 RESOURCES USED:
Xilinx ISE
MATLAB
Virtex-II Pro FPGA development kit
Desktop PC
Interfacing model

1.3.3 OBJECTIVE
1) To carry out a literature survey on
a) Image and image compression
b) Need for compression
c) JPEG standard
d) DWT-IDWT
e) DA arithmetic
f) Real-time setup for image compression
2) To develop a system-level block diagram for image compression and the DWT-IDWT processor
3) To develop a software reference model for image compression and analyze the results for multiple test images
4) To design and implement the DA DWT-IDWT processor and analyze its performance with respect to area, time and power on FPGA
5) To design the Modified DA DWT-IDWT processor and analyze its performance
6) To implement the proposed architecture on FPGA and verify the results in a real-time experimental setup

1.4 APPLICATIONS:
Although the Fourier transform has long been the mainstay of transform-based digital signal processing, a more recent transformation, called the wavelet transform, is making strides in DSP applications owing to some of its unique advantages. Wavelets have their energy concentrated in time. Sinusoids (Fourier transform) are useful in analyzing periodic and time-invariant phenomena, while wavelets are well suited to the analysis of transient, time-varying signals. Since most real-life signals are time-varying in nature, the wavelet transform is well suited to many applications [4].

1.4.1 Wavelets in Audio


The DWT can be used to analyze temporal and spectral properties of non-stationary signals such as audio. Unlike the Fourier transform, whose basis functions are sinusoids, wavelet transforms are based on small waves, called wavelets, of varying frequency and limited duration. In musical terms, a wavelet transform reveals not only what notes (or frequencies) to play but also when to play them. Conventional Fourier transforms, on the other hand, provide only the notes or frequency information; temporal information is lost in the transformation process. Audio applications where the DWT could offer considerable improvement include extraction of beat attributes from music signals and automatic classification of non-speech audio signals using statistical pattern recognition. Shrinking the transform coefficients towards zero in the wavelet domain is one such wavelet technique; it removes noise from a wide variety of signal types while preserving non-smooth features.
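The coefficient shrinkage mentioned above can be sketched as follows. This is a minimal illustration, not the method of any cited work: it assumes a one-level Haar transform and the common soft-threshold rule, and the function names and the threshold value are hypothetical.

```python
import math

S = 1 / math.sqrt(2)


def soft_threshold(c, t):
    """Shrink coefficient c towards zero by t; values within [-t, t] become 0."""
    if c > t:
        return c - t
    if c < -t:
        return c + t
    return 0.0


def denoise(x, t):
    """One-level Haar DWT, shrink the detail coefficients, inverse DWT.

    Assumes len(x) is even.
    """
    approx = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    detail = [soft_threshold(d, t) for d in detail]
    y = []
    for a, d in zip(approx, detail):
        y += [(a + d) * S, (a - d) * S]
    return y


# A step edge with small sample-to-sample fluctuations.
noisy = [0.1, -0.1, 0.2, 0.0, 10.1, 9.9, 10.0, 10.2]
print(denoise(noisy, t=0.5))
```

In this example the small fluctuations within each pair of samples are removed, while the step between the fourth and fifth samples, a non-smooth feature, is left in place.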

1.4.2 Wavelets in Video


Wavelet basis functions are obtained from a single mother wavelet by translation and scaling. The multi-resolution concept, satisfied by almost all useful wavelet functions, makes wavelets very useful in analyzing real-world signals. Multi-resolution theory is concerned with the representation and analysis of signals at more than one resolution. The multi-resolution representation of video has the advantage of scalability, i.e. the possibility of transmitting the same sequence at different resolutions for high-resolution television, videophone and videoconferencing. At each finer level, the DWT offers a better approximation by using wavelets of half the width and half as wide translation steps. This is conceptually similar to improving frequency resolution by doubling the number of harmonics in a Fourier series expansion.

While DCT-based image coders like JPEG perform very well at moderate bit rates, at low bit rates the image quality degrades rapidly because of the blocking artifacts introduced by the block-based DCT. JPEG-2000 is an emerging standard in image processing that uses the DWT to achieve far superior image quality at very low bit rates because of the overlapping basis functions and better energy compaction property of the wavelet transformation.

1.4.3 Wavelets in Wireless applications


The analysis, design and measurement of antennas have been extremely important in the development and success of wireless communication and its applications. Unfortunately, mathematical simulations of antennas are extremely complex and require extensive computation and a large amount of memory. Using wavelets in conjunction with other techniques in the numerical methods that solve for the current distribution on the antenna offers many advantages: it reduces the computation, helps to reduce errors, and brings the results closer to the true values.

With the recent developments in wireless communication technologies, video streaming and image compression techniques are very important for transmitting multimedia content over wireless channels. As wireless channels are very noisy and have narrow bandwidth, higher compression is required for both image and video signals. The wavelet transform could therefore be a good choice as the image compression technique in wireless applications, because of its advantage of providing better compression at higher bit rates.

1.4.4 Wavelets in Neural Networks


Neural Networks (NN) have emerged as a powerful tool for data mining applications due to their ability to learn patterns and relationships in complex, multidimensional data sets. The effectiveness of any NN-based solution depends largely on a range of factors, such as the scalability of the network, its generalization capability and the dimensionality of the parameter space, and these factors often restrict the effectiveness of the NN. As such, any method that is able to increase the quality or accessibility of the input data will be invaluable. It is here that wavelets are likely to be extremely useful. NNs are useful in conjunction with wavelets, with the latter serving as a preprocessing tool that transforms hidden patterns into a more recognizable form suitable for use as a training set.


CHAPTER 2

IMAGE COMPRESSION STANDARD


2.1 NEED FOR A COMPRESSION STANDARD


With the rapid development of imaging technology and of image compression and coding tools and techniques, it is necessary to evolve coding standards so that there is compatibility and interoperability between the image communication and storage products manufactured by different vendors. Without the availability of standards, encoders and decoders cannot communicate with each other. The most commonly used standards are JPEG and JPEG2000 [3].

2.2 JPEG
The aim of JPEG compression is to take full-color (and gray-scale) "real-world" scenes and reduce the file size of the images for storage and transmission. While capacity and bandwidth have improved dramatically over the last decade, the increased size of images keeps JPEG relevant for digital camera users and websites. The standard does not define exactly how this process is to be implemented, but it is sufficiently broad that images from any program can be viewed. The most common version in use is that produced by the Independent JPEG Group (IJG) [3].

2.2.1 Need for JPEG


JPEG is needed to make image files smaller and to store 24-bit-per-pixel color data instead of 8-bit-per-pixel data. An advantage of JPEG is that it stores full color information: 24 bits/pixel.

2.2.2 JPEG STANDARD


In computing, JPEG (named after the Joint Photographic Experts Group, which created the standard) is a commonly used method of lossy compression for photographic images. The degree of compression can be adjusted, allowing a selectable trade-off between storage size and image quality. JPEG files use the extensions .jpeg, .jfif, .jpg and .jpe. JPEG is one of the two most common formats for storing and sending images on the Web. JPEG images are full-color images, meaning they are capable of storing 24 bits per pixel and using 16 million colors [3].

JPEG is best for compressing full-color or gray-scale images, including photographs and graphic images. The JPEG format is distinctive in that images are compressed based on the characteristics of the human eye. Because the human eye does not pick up subtle color distinctions and high-frequency brightness variations, data can be removed without completely changing the image. However, as this data is removed the quality of the image decreases. This is the reason JPEG compression is considered lossy.

Figure 2.1: Image describing JPEG standard. Edges in a typical JPEG image, split by red, green and blue channels.

As with all image compression formats, JPEG has both advantages and disadvantages:

2.2.3 ADVANTAGES OF JPEG


Large compression ratios, and therefore shorter file transfer times
Full-color information
Great for photographs, graphic artwork, banner ads, etc.

2.2.4 DISADVANTAGES OF JPEG


Loss of image quality
Sharp edges tend to come out blurry
Longer page load time than the GIF format



JPEG uses a lossy compression algorithm, so some detail is lost when converting other formats such as BMP to JPEG
For an illustrated image or a vector image, JPEG should not be used because the edges of lines may get blurred

2.2.5 EMERGENCE OF A JPEG 2000


JPEG 2000 addresses most of the problems of JPEG:
The biggest problem is that JPEGs are lossy: when an image is converted to JPEG, some of the information in the image is lost. Professional photographers tend to avoid working repeatedly with JPEG images, as repeatedly loading and saving an image causes it to lose quality.
JPEGs do not support layers. Most photo manipulation software uses layers, so to save an image as a JPEG it has to be "flattened".
JPEGs only support 8 bits per color channel. Modern digital cameras can operate in 12, 14 or 16 bit mode, but if the images are saved as JPEGs the extra information is discarded.

2.3 JPEG 2000


The JPEG-2000 image compression system has a rate-distortion advantage over the original JPEG. JPEG-2000 is an emerging standard for still image compression. As digital imagery becomes more commonplace and of higher quality, there is a need to manipulate more and more data. Thus, image compression must not only reduce the necessary storage and bandwidth requirements, but also allow extraction for editing, processing, and targeting particular devices and applications. More importantly, JPEG-2000 allows extraction of different resolutions, pixel fidelities, regions of interest, components, and more, all from a single compressed bit stream. This allows an application to manipulate or transmit only the essential information for any target device from any JPEG 2000 compressed source image.


2.3.1 FEATURES OF JPEG-2000

State-of-the-art low bit-rate compression performance
Progressive transmission by quality, resolution, component, or spatial locality
Lossy and lossless compression (with lossless decompression available naturally through all types of progression)
Random (spatial) access to the bit stream
Pan and zoom (with decompression of only a subset of the compressed data)
Compressed domain processing (e.g., rotation and cropping)
Region of interest coding by progression
Limited memory implementations

The aims of JPEG 2000 are not only improved compression performance over JPEG but also the addition (or improvement) of features such as scalability and editability. Very low and very high compression rates are supported in JPEG 2000. In fact, the graceful ability of the design to handle a very large range of effective bit rates is one of the strengths of JPEG 2000. While there is a modest increase in compression performance of JPEG 2000 compared to JPEG, the main advantage offered by JPEG 2000 is the significant flexibility of the code stream [3].

Figure 2.2 : COMPRESSION (ENCODING AND DECODING)

Conventional methods of lossless compression such as Zip reversibly reduce file sizes while preserving information by compacting regularities in the data. JPEG compression goes one step further, by exploiting regularities in the visual perception of an image and using lossy compression to reduce the file size of the image. This process involves a small but irreversible loss of quality, as discussed below.


Figure 2.3: Edges in a typical image, zoomed in to see the pixels. After compression most of the edges are still present, with some artifacts.

The main steps are as follows (some require heavy mathematics):
1) Standard color space of 256 levels each of red, green and blue (16.7 million RGB colors)
2) Color space separation (YCbCr) from RGB, e.g. Y (luminance) = 0.299 * R + 0.587 * G + 0.114 * B
3) Spatial separation into 8x8 pixel blocks
4) Sub-sampling (if required) of the chroma components Cb and Cr in 16x16 pixel blocks
5) Discrete Cosine Transform (DCT) of each 8x8 block into spatial frequencies
6) Quantization of the spatial frequency matrix
7) Lossless compression of the resulting matrix

For illustrative purposes large images are not needed, since the entire JPEG compression takes place inside 8x8 (or 16x16) pixel blocks. Note that a JPEG cannot be compressed further using Zip or any other process of lossless compression, since this is already done as the last step of the JPEG encoding. In the figure, note the predominance of green and blue pixels, with few red pixels. The green channel is closest to what the eye sees, with blue having the next most artifacts. Decoding an image from a JPEG is the reverse of this process, and does not need elaboration here.
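The luminance conversion, block transform and quantization steps listed above can be sketched for a single 8x8 block as follows. This is a simplified illustration and not a JPEG encoder: it uses an orthonormal DCT-II, a level shift by 128 before the transform, and one uniform quantization step for all coefficients (an assumption made for brevity; JPEG uses an 8x8 table of step sizes). All function names are chosen for the example.

```python
import math

N = 8  # block size used by JPEG


def luminance(r, g, b):
    """Y component, using the formula quoted in the text."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def dct_matrix(n=N):
    """Orthonormal DCT-II matrix: row u holds the u-th cosine basis vector."""
    rows = []
    for u in range(n):
        scale = math.sqrt((1 if u == 0 else 2) / n)
        rows.append([scale * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                     for x in range(n)])
    return rows


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct2(block):
    """2-D DCT of one block, computed as C * block * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))


def quantize(coeffs, step):
    """Uniform quantization with a single, assumed step size."""
    return [[round(v / step) for v in row] for row in coeffs]


# A flat grey block (R = G = B = 138): level-shift, transform, quantize.
block = [[luminance(138, 138, 138) - 128 for _ in range(N)] for _ in range(N)]
quantized = quantize(dct2(block), step=16)
print(quantized[0])   # first row of the quantized coefficient matrix
```

For a flat block only the DC coefficient (top-left entry) is non-zero after quantization, which is why smooth image regions compress well in the final lossless step.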


2.4 Implications
JPEG-2000 is unlikely to replace JPEG in low-complexity applications at bit rates in the range where JPEG performs well. However, for applications requiring either higher quality or lower bit rates, or any of the features provided, JPEG-2000 should be a welcome standard. JPEG-2000 provides better rate-distortion performance, for any given rate, than the original JPEG standard. However, the largest improvements are observed at very high and very low bit rates. The improvements in the near visually lossless realm are more modest (approximately 20%). Thus, widespread adoption of the new standard will likely be based on the JPEG-2000 feature set. While JPEG provided different methods of generating progressive bit streams, with JPEG-2000 the progression is simply a matter of the order in which the compressed bytes are stored in a file.


CHAPTER 3

DISCRETE WAVELET TRANSFORM


3.1 INTRODUCTION
The transform of a signal is just another form of representing the signal. It does not change the information content present in the signal. The Wavelet Transform provides a time-frequency representation of the signal. It was developed to overcome the shortcomings of the Short Time Fourier Transform (STFT), which can also be used to analyze non-stationary signals. While the STFT gives a constant resolution at all frequencies, the Wavelet Transform uses a multi-resolution technique by which different frequencies are analyzed with different resolutions. A wave is an oscillating function of time or space and is periodic. In contrast, wavelets are localized waves. They have their energy concentrated in time or space and are suited to the analysis of transient signals. While the Fourier Transform and the STFT use waves to analyze signals, the Wavelet Transform uses wavelets of finite energy.

Figure 3.1: Demonstration of (a) a Wave and (b) a Wavelet

The wavelet analysis is done in a manner similar to the STFT analysis. The signal to be analyzed is multiplied with a wavelet function, just as it is multiplied with a window function in the STFT, and then the transform is computed for each segment generated [4].

However, unlike STFT, in Wavelet Transform, the width of the wavelet function changes with each spectral component. The Wavelet Transform, at high frequencies, gives good time resolution and poor frequency resolution, while at low frequencies, the Wavelet Transform gives good frequency resolution and poor time resolution.


3.2 CONTINUOUS WAVELET TRANSFORM AND WAVELET SERIES


The Continuous Wavelet Transform (CWT) is given by equation 3.1, where x(t) is the signal to be analyzed and $\psi(t)$ is the mother wavelet or the basis function. All the wavelet functions used in the transformation are derived from the mother wavelet through translation (shifting) and scaling (dilation or compression).

$$X_{WT}(\tau, s) = \frac{1}{\sqrt{|s|}} \int x(t)\, \psi^{*}\!\left(\frac{t-\tau}{s}\right) dt \qquad (3.1)$$

The mother wavelet used to generate all the basis functions is designed based on some desired characteristics associated with that function. The translation parameter $\tau$ relates to the location of the wavelet function as it is shifted through the signal. Thus, it corresponds to the time information in the Wavelet Transform. The scale parameter s is defined as |1/frequency| and corresponds to frequency information. Scaling either dilates (expands) or compresses a signal. Large scales (low frequencies) dilate the signal and provide detailed information hidden in the signal, while small scales (high frequencies) compress the signal and provide global information about the signal. Notice that the Wavelet Transform merely performs the convolution operation of the signal and the basis function. The above analysis becomes very useful because, in most practical applications, high frequencies (low scales) do not last for a long duration but instead appear as short bursts, while low frequencies (high scales) usually last for the entire duration of the signal.

The Wavelet Series is obtained by discretizing the CWT. This aids in the computation of the CWT using computers and is obtained by sampling the time-scale plane. The sampling rate can be changed in accordance with the scale change without violating the Nyquist criterion. The Nyquist criterion states that the minimum sampling rate that allows reconstruction of the original signal is $2\omega$ radians, where $\omega$ is the highest frequency in the signal. Therefore, as the scale goes higher (lower frequencies), the sampling rate can be decreased, thus reducing the number of computations [4].
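A sampled version of the CWT integral can be evaluated numerically as sketched below. This is an illustration under stated assumptions: the Mexican hat function (unnormalized) is used as the mother wavelet, the integral is approximated by a Riemann sum, and the test signal, sampling interval and function names are chosen for the example only.

```python
import math


def mexican_hat(t):
    """Unnormalized Mexican hat wavelet (real-valued, so psi* = psi)."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def cwt(x, dt, tau, s, psi=mexican_hat):
    """Riemann-sum approximation of the CWT at translation tau and scale s."""
    total = sum(xn * psi((n * dt - tau) / s) for n, xn in enumerate(x))
    return total * dt / math.sqrt(abs(s))


# Test signal: a short burst centred at t = 5, sampled every 0.01 on [0, 10].
dt = 0.01
signal = [mexican_hat(n * dt - 5.0) for n in range(1001)]

# Evaluate one row of the time-scale plane (s = 1) at translations 0, 0.5, ..., 10.
taus = [k * 0.5 for k in range(21)]
row = [cwt(signal, dt, tau, 1.0) for tau in taus]
best_tau = taus[max(range(len(row)), key=lambda k: abs(row[k]))]
print("largest response at tau =", best_tau)
```

The response is largest where the translated wavelet lines up with the burst, which is how the translation parameter carries the time information.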


3.3 DWT
The Wavelet Series is just a sampled version of CWT and its computation may consume significant amount of time and resources, depending on the resolution required. The Discrete Wavelet Transform (DWT), which is based on sub-band coding is found to yield a fast computation of Wavelet Transform. It is easy to implement and reduces the computation time and resources required.

The foundations of the DWT go back to 1976, when techniques to decompose discrete-time signals were devised. Similar work was done in speech signal coding, where it was named sub-band coding. In 1983, a technique similar to sub-band coding was developed and named pyramidal coding. Later, many improvements were made to these coding schemes, which resulted in efficient multi-resolution analysis schemes.

In CWT, the signals are analyzed using a set of basis functions which relate to each other by simple scaling and translation. In the case of DWT, a time-scale representation of the digital signal is obtained using digital filtering techniques. The signal to be analyzed is passed through filters with different cutoff frequencies at different scales.

3.4 Filter Banks


3.4.1 Multi-Resolution Analysis using Filter Banks
Filters are one of the most widely used signal processing functions. Wavelets can be realized by iteration of filters with rescaling. The resolution of the signal, which is a measure of the amount of detail information in the signal, is determined by the filtering operations, and the scale is determined by upsampling and downsampling (subsampling) operations.

The DWT is computed by successive lowpass and highpass filtering of the discrete time-domain signal as shown in figure 3.2. This is called the Mallat algorithm or Mallat-tree decomposition. Its significance is in the manner it connects the continuous-time multiresolution to discrete-time filters. In the figure, the signal is denoted by the sequence x[n], where n is an integer. The low pass filter is denoted by $G_0$ while the high pass filter is denoted by $H_0$. At each level, the high pass filter produces detail information, d[n], while the low pass filter associated with the scaling function produces coarse approximations, a[n].

Figure 3.2: Three level decomposition tree

At each decomposition level, the half band filters produce signals spanning only half the frequency band. This doubles the frequency resolution, as the uncertainty in frequency is reduced by half. In accordance with Nyquist's rule, if the original signal has a highest frequency of $\omega$, which requires a sampling frequency of $2\omega$ radians, then it now has a highest frequency of $\omega/2$ radians. It can now be sampled at a frequency of $\omega$ radians, thus discarding half the samples with no loss of information. This decimation by 2 halves the time resolution, as the entire signal is now represented by only half the number of samples. Thus, while the half band low pass filtering removes half of the frequencies and thus halves the resolution, the decimation by 2 doubles the scale.

With this approach, the time resolution becomes arbitrarily good at high frequencies, while the frequency resolution becomes arbitrarily good at low frequencies. The time-frequency plane is thus resolved as shown in figure 1.1(d) of Chapter 1. The filtering and decimation process is continued until the desired level is reached. The maximum number of levels depends on the length of the signal. The DWT of the original signal is then obtained by concatenating all the coefficients, a[n] and d[n], starting from the last level of decomposition.
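The filtering and decimation procedure described above can be sketched as follows. This is a minimal illustration that assumes the two-tap Haar filter pair, for which filtering followed by decimation reduces to scaled sums and differences of adjacent samples; the signal length is assumed to be divisible by 2 at every level, and the function names are chosen for the example.

```python
import math


def haar_step(x):
    """One decomposition level: low-pass and high-pass outputs, decimated by 2."""
    s = 1 / math.sqrt(2)
    a = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]   # approximation a[n]
    d = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]   # detail d[n]
    return a, d


def dwt(x, levels):
    """Multi-level DWT: coefficients concatenated starting from the last level."""
    details = []
    a = list(x)
    for _ in range(levels):
        a, d = haar_step(a)          # only the approximation is decomposed again
        details.append(d)
    coeffs = list(a)
    for d in reversed(details):      # last-level details first
        coeffs += d
    return coeffs


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coeffs = dwt(x, levels=3)
print(coeffs)
```

For a signal of length 8, three levels is the maximum: the output holds one approximation coefficient followed by 1, 2 and 4 detail coefficients from levels 3, 2 and 1.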


Figure 3.3: Three level reconstruction tree

Figure 3.3 shows the reconstruction of the original signal from the wavelet coefficients. Basically, the reconstruction is the reverse process of decomposition. The approximation and detail coefficients at every level are up-sampled by two, passed through the low pass and high pass synthesis filters and then added. This process is continued through the same number of levels as in the decomposition process to obtain the original signal. The Mallat algorithm works equally well if the analysis filters, $G_0$ and $H_0$, are exchanged with the synthesis filters, $G_1$ and $H_1$.
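One level of analysis followed by synthesis (up-sample by two, filter, add) can be sketched with explicit convolutions as below. The Haar filter pair and the choice of which sample phase is kept during decimation are assumptions made for this example; with these choices the output equals the input without any delay.

```python
import math

S = 1 / math.sqrt(2)
G0, H0 = [S, S], [-S, S]     # analysis low-pass / high-pass (Haar, assumed)
G1, H1 = [S, S], [S, -S]     # synthesis low-pass / high-pass


def convolve(x, f):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(f) - 1)
    for n, xv in enumerate(x):
        for k, fv in enumerate(f):
            y[n + k] += xv * fv
    return y


def analyze(x):
    """Filter with G0 and H0, then keep every second sample (odd phase)."""
    return convolve(x, G0)[1::2], convolve(x, H0)[1::2]


def upsample(c):
    """Insert a zero after every coefficient."""
    u = []
    for v in c:
        u += [v, 0.0]
    return u


def synthesize(a, d):
    """Up-sample by two, filter with G1 and H1, and add the two branches."""
    lo = convolve(upsample(a), G1)
    hi = convolve(upsample(d), H1)
    return [l + h for l, h in zip(lo, hi)][:2 * len(a)]


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # even length assumed
a, d = analyze(x)
y = synthesize(a, d)
print(y)
```

Applying `analyze` repeatedly to the approximation branch, and `synthesize` in the reverse order, gives the multi-level trees of the figures.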

3.4.2 Conditions for Perfect Reconstruction


In most Wavelet Transform applications, it is required that the original signal be synthesized from the wavelet coefficients. To achieve perfect reconstruction, the analysis and synthesis filters have to satisfy certain conditions. Let $G_0(z)$ and $G_1(z)$ be the low pass analysis and synthesis filters, respectively, and $H_0(z)$ and $H_1(z)$ the high pass analysis and synthesis filters, respectively. Then the filters have to satisfy the following two conditions, given in equations 3.2 and 3.3:

$$G_0(-z)\, G_1(z) + H_0(-z)\, H_1(z) = 0 \qquad (3.2)$$

$$G_0(z)\, G_1(z) + H_0(z)\, H_1(z) = 2z^{-d} \qquad (3.3)$$

The first condition implies that the reconstruction is aliasing-free and the second condition implies that the amplitude distortion has amplitude of one. It can be observed that the perfect reconstruction condition does not change if we switch the analysis and synthesis filters.
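The two conditions can be checked numerically for a given filter set by treating each filter as a polynomial in $z^{-1}$. The sketch below does this for the Haar filters, which are an assumption chosen for the example; the helper names are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
# Coefficient lists in increasing powers of z^-1: [c0, c1] means c0 + c1 * z^-1.
G0, H0 = [S, S], [-S, S]     # analysis filters (Haar, assumed)
G1, H1 = [S, S], [S, -S]     # synthesis filters


def at_minus_z(p):
    """P(-z): the coefficient of z^-k is multiplied by (-1)^k."""
    return [c * (-1) ** k for k, c in enumerate(p)]


def multiply(p, q):
    """Polynomial product (convolution of the coefficient lists)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def add(p, q):
    return [a + b for a, b in zip(p, q)]


# G0(-z) G1(z) + H0(-z) H1(z): should vanish (no aliasing).
alias = add(multiply(at_minus_z(G0), G1), multiply(at_minus_z(H0), H1))
# G0(z) G1(z) + H0(z) H1(z): should be a pure delay 2 z^-d.
distortion = add(multiply(G0, G1), multiply(H0, H1))

print("aliasing term  :", [round(c, 12) for c in alias])
print("distortion term:", [round(c, 12) for c in distortion])
```

For these filters the distortion term has a single non-zero coefficient, 2, at $z^{-1}$, so d = 1.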


There are a number of filters which satisfy these conditions, but not all of them give accurate Wavelet Transforms, especially when the filter coefficients are quantized. The accuracy of the Wavelet Transform can be determined after reconstruction by calculating the Signal to Noise Ratio (SNR) of the signal. Some applications, like pattern recognition, do not need reconstruction, and in such applications the above conditions need not apply.

3.4.3 Classification of Wavelets


We can classify wavelets into two classes: (a) orthogonal and (b) biorthogonal. Based on the application, either of them can be used.

(a) Features of orthogonal wavelet filter banks

The coefficients of orthogonal filters are real numbers. The filters are of the same length and are not symmetric. The low pass filter, $G_0$, and the high pass filter, $H_0$, are related to each other by

$$H_0(z) = z^{-N}\, G_0(-z^{-1}) \qquad (3.4)$$

The two filters are alternating flips of each other. The alternating flip automatically gives double-shift orthogonality between the lowpass and highpass filters, i.e., the scalar product of the filters for a shift by two is zero: $\sum_k G[k]\, H[k-2l] = 0$, where $k, l \in \mathbb{Z}$. Filters that satisfy equation 3.4 are known as Conjugate Mirror Filters (CMF). Perfect reconstruction is possible with the alternating flip. Also, for perfect reconstruction, the synthesis filters are identical to the analysis filters except for a time reversal. Orthogonal filters offer a high number of vanishing moments. This property is useful in many signal and image processing applications. They have a regular structure, which leads to easy implementation and a scalable architecture.
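The alternating flip and the resulting double-shift orthogonality can be verified numerically. The sketch below assumes the four-tap Daubechies low pass filter as an example; in the time domain, the relation between the two filters reads h[n] = (-1)^(N-n) g[N-n], where N + 1 is the filter length. The helper name is hypothetical.

```python
import math

# Four-tap Daubechies low-pass filter (assumed example), so N = 3.
r3 = math.sqrt(3)
g = [c / (4 * math.sqrt(2)) for c in (1 + r3, 3 + r3, 3 - r3, 1 - r3)]
N = len(g) - 1

# Alternating flip: reverse the filter and alternate the signs.
h = [(-1) ** (N - n) * g[N - n] for n in range(len(g))]


def shifted_product(a, b, l):
    """Scalar product sum_k a[k] * b[k - 2l], with zeros outside the filter support."""
    return sum(a[k] * b[k - 2 * l]
               for k in range(len(a)) if 0 <= k - 2 * l < len(b))


for l in (-1, 0, 1):
    print("shift", 2 * l, ":", round(shifted_product(g, h, l), 12))
```

The same helper also confirms that the low pass filter has unit energy and is orthogonal to its own shifts by two.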

(b) Features of biorthogonal wavelet filter banks

In the case of the biorthogonal wavelet filters, the low pass and the high pass filters do not have the same length. The low pass filter is always symmetric, while the high pass filter could be either symmetric or anti-symmetric. The coefficients of the filters are either real numbers or integers. For perfect reconstruction, a biorthogonal filter bank has all odd length or all even length filters. The two analysis filters can be symmetric with odd length, or one symmetric and the other antisymmetric with even length. Also, the two sets of analysis and synthesis filters must be dual. The linear phase biorthogonal filters are the most popular filters for data compression applications.

3.5 Wavelet Families


There are a number of basis functions that can be used as the mother wavelet for Wavelet Transformation. Since the mother wavelet produces all wavelet functions used in the transformation through translation and scaling, it determines the characteristics of the resulting Wavelet Transform. Therefore, the details of the particular application should be taken into account and the appropriate mother wavelet should be chosen in order to use the Wavelet Transform effectively[4].

Figure 3.4: Wavelet families (a) Haar (b) Daubechies4 (c) Coiflet1 (d) Symlet2 (e) Meyer (f) Morlet (g) Mexican Hat.


CHAPTER 4

Overview of DWT Algorithm and DA for DWT


4.1 DWT of an image


A low pass filter and a high pass filter are chosen such that they exactly halve the frequency range between themselves. This filter pair is called the analysis filter pair. First, the low pass filter is applied to each row of data, thereby obtaining the low frequency components of the row. Since the low pass filter is a half band filter, the output data contains frequencies only in the first half of the original frequency range, so it can be subsampled by two and the output then contains only half the original number of samples. Next, the high pass filter is applied to the same row of data, and the high pass components are similarly separated and placed by the side of the low pass components. This procedure is done for all rows. The filtering is then done for each column of the intermediate data. The resulting two dimensional array of coefficients contains four bands of data, labeled LL (low-low), HL (high-low), LH (low-high) and HH (high-high). The LL band can be decomposed once again in the same manner, thereby producing even more sub bands. This can be done up to any level, resulting in a pyramidal decomposition as shown.
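The row-then-column procedure can be illustrated with a minimal sketch. It assumes the Haar filter pair with a simple averaging normalization (pairwise mean and half-difference); the function names are hypothetical and the image is a made-up 4x4 example.

```python
def haar_1d(row):
    """One analysis step: low pass outputs first, high pass outputs beside them."""
    half = len(row) // 2
    low = [(row[2 * i] + row[2 * i + 1]) / 2 for i in range(half)]
    high = [(row[2 * i] - row[2 * i + 1]) / 2 for i in range(half)]
    return low + high

def dwt2_one_level(image):
    """Filter every row, then every column of the intermediate data."""
    rows = [haar_1d(r) for r in image]
    cols = [haar_1d(list(c)) for c in zip(*rows)]  # each column as a list
    return [list(r) for r in zip(*cols)]           # transpose back

image = [
    [10, 10, 20, 20],
    [10, 10, 20, 20],
    [30, 30, 40, 40],
    [30, 30, 40, 40],
]
coeffs = dwt2_one_level(image)
ll = [row[:2] for row in coeffs[:2]]   # top-left quadrant: the LL band
print(ll)
```

Because this image is constant inside each 2x2 block, all three detail quadrants come out as zero and the LL band is a half-size copy of the image. Applying `dwt2_one_level` again to `ll` would give the next level of the pyramid.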

The LL band at the highest level can be classified as most important and the other detail bands can be classified as of lesser importance, with the degree of importance decreasing from the top of the pyramid to the bands at the bottom[5].


Figure 4.1:Image encoding.

4.2 INVERSE DWT OF AN IMAGE.


Just as a forward transform is used to separate the image data into various classes of importance, a reverse transform is used to reassemble the various classes of data into a reconstructed image. A pair of high pass and low pass filters is used here also. This filter pair is called the synthesis filter pair. The filtering procedure is just the opposite: we start from the topmost level, apply the filters column wise first and then row wise, and proceed to the next level until we reach the first level.

In this section the theoretical background and algorithm development are discussed. The first recorded mention of what is now called a "wavelet" seems to be in 1909, in a thesis by Alfred Haar. An image is represented as a two dimensional (2D) array of coefficients, each coefficient representing the brightness level at that point[5].

When looking from a higher perspective, it is not possible to differentiate between coefficients as more important ones, and lesser important ones. But thinking more intuitively, it is possible. Most natural images have smooth color variations, with the fine details being represented as sharp edges in between the smooth variations.

Technically, the smooth variations in color can be termed low frequency components and the sharp variations high frequency components. The low frequency components (smooth variations) constitute the base of an image, and the high frequency components (the edges which give the detail) add upon them to refine the image, thereby giving a detailed image. Hence the averages (smooth variations) carry more importance than the details.

In wavelet analysis, a signal can be separated into approximations (averages) and details (coefficients). Averages are the high-scale, low frequency components of the signal. The details are the low-scale, high frequency components. If we perform the forward transform on a real digital signal, filtering with both filters, we wind up with twice as much data as we started with. That is why down sampling has to be done after filtering.

The inverse process describes how those components can be assembled back into the original signal without loss of information. This process is called reconstruction or synthesis. The mathematical manipulation that effects synthesis is called the inverse discrete wavelet transform. The original signal is reconstructed from the wavelet coefficients. Where wavelet analysis involves filtering and down sampling, the wavelet reconstruction process consists of up sampling and filtering. The DWT algorithm consists of the Forward DWT (FDWT) and the Inverse DWT (IDWT), which are shown in Figures 4.2 and 4.3 respectively.
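Analysis followed by synthesis can be sketched in one dimension. The example assumes the Haar pair with an averaging normalization, for which up sampling and synthesis filtering reduce to interleaving sums and differences; the function names and the sample signal are illustrative only.

```python
def analyze(signal):
    """Filter and down sample: one average and one detail per input pair."""
    pairs = range(len(signal) // 2)
    averages = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in pairs]
    details = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in pairs]
    return averages, details

def synthesize(averages, details):
    """Up sample and filter: rebuild each input pair from (average, detail)."""
    signal = []
    for a, d in zip(averages, details):
        signal.append(a + d)   # even-indexed sample
        signal.append(a - d)   # odd-indexed sample
    return signal

signal = [9, 7, 3, 5, 6, 10, 2, 6]
averages, details = analyze(signal)
restored = synthesize(averages, details)
print(averages, details, restored == signal)
```

The averages and details together hold exactly as many values as the input, which is the effect of the down sampling step described above.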


Figure 4.2:Two dimensional decomposition.

Figure 4.3:Two Dimensional IDWT

The FDWT can be performed on a signal using different types of filters such as db7, db4 or Haar. The forward transform can be done in two ways: the matrix multiply method and linear equations. In the FDWT, each step calculates a set of wavelet averages (approximation or smooth values) and a set of details. If a data set s0, s1, ..., sN-1 contains N elements, there will be N/2 averages and N/2 detail values. The averages are stored in the upper half and the details are stored in the lower half of the N element array.

4.3. DISTRIBUTED ARITHMETIC FOR DWT


With the rapid progress of VLSI design technologies, many processors for audio and image signal processing have been developed recently. The two-dimensional discrete wavelet transform (2D DWT) plays a major role in image/video compression standards such as JPEG2000 and MPEG4. Wavelets decompose the signal into an approximation at one level and detail signals at the next level, so subsequent levels can add more detail to the information content. Presently, research on the DWT is attracting a great deal of attention. In addition to audio and image compression, the DWT has important applications in many areas, such as computer graphics, numerical analysis and radar target discrimination. The architecture of the 2D DWT is mainly composed of multi-rate filters. Because practical applications such as digital cameras involve extensive computation and require real-time manipulation of digital images, high-efficiency, low-cost hardware is indispensable. Because of this, fast algorithms and specific circuits for the DWT have been developed. Among the methods for the two-dimensional DWT, the indirect method based on row-column decomposition is the best adapted to a hardware implementation.

Distributed arithmetic (DA) was proposed about two decades ago and has since been used widely in VLSI implementations of DSP architectures. Most of these applications are computation intensive, with multiplication and/or addition being the predominant operation. The main advantage of the distributed arithmetic approach is that it speeds up the multiply process by pre-computing all the possible intermediate values and storing them in a ROM. The input data can then be used to directly address the memory and read out the result.

In this section, we consider only the separable 2-D DWT. We propose an efficient 2D DWT architecture based on distributed arithmetic. The proposed architecture uses RAM instead of ROM, because the size of the ROM grows exponentially as the number of inputs and the internal precision increase. Distributed arithmetic and row-column decomposition reduce the amount of hardware and enhance the speed. The basic architecture deals with the separable 2D DWT, whose mathematical formulas are defined as follows. In the decomposition, the wavelet coefficients of any stage can be calculated from the DWT coefficients of the previous stage. The following expressions show how the scaling and wavelet coefficients $X_h(n, j+1)$ and $X_g(n, j+1)$ are obtained at stage $(j+1)$:

$X_h(n, j+1) = \sum_k h(k)\, X_h(2n - k, j)$    (4.1)

$X_g(n, j+1) = \sum_k g(k)\, X_h(2n - k, j)$    (4.2)
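One decomposition stage can be sketched as filtering followed by down sampling by two. The sketch assumes the common form X_h(n, j+1) = sum_k h(k) X_h(2n - k, j), and likewise with g for X_g, with samples outside the signal taken as zero; the two-tap filters and the function name are placeholders, not the filters used in this work.

```python
def analysis_stage(x, h, g):
    """Return (X_h, X_g) for the next stage from the current sequence x.

    Assumed form: out[n] = sum_k taps[k] * x[2n - k], with x taken as
    zero outside its index range.
    """
    def filter_and_decimate(taps):
        out = []
        for n in range(len(x) // 2):
            acc = 0.0
            for k, tap in enumerate(taps):
                idx = 2 * n - k
                if 0 <= idx < len(x):
                    acc += tap * x[idx]
            out.append(acc)
        return out

    return filter_and_decimate(h), filter_and_decimate(g)

# Placeholder two-tap filters, used only to exercise the function.
h = [0.5, 0.5]
g = [0.5, -0.5]
xh, xg = analysis_stage([4, 6, 10, 12], h, g)
print(xh, xg)
```

Feeding `xh` back into `analysis_stage` gives the coefficients of the following stage, which is how each stage is computed from the previous one.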

Figure 4.4 shows a classical one-level implementation of the analysis and synthesis stages of the DWT system using a filter bank structure. The input signal x(n) is filtered in the analysis process using the low pass h and the high pass g filters. The symbols ↑2 and ↓2 denote up sampling and down sampling by a factor of two; down sampling decimates the filter results. The synthesis process is the dual of the analysis process[5].

Figure 4.4: One level implementation using filter bank

To derive the DA architecture for the DWT, consider the following sum of products:

$Y = \sum_{k=1}^{L} A_k X_k$    (4.3)

where $A_k$ are the fixed coefficients of the filter bank and $X_k$ are the input samples. The decomposed expression of (4.3) in DA form can be written as equation (2):


Note that in equation (2), A is the distributed arithmetic matrix of fixed coefficient bits $A_{k,i}$, where k = 1, 2, ..., L and i = 0, 1, ..., N-1, with $A_{k,N-1}$ the MSB and $A_{k,0}$ the LSB. Note again that in the DA architecture for the DWT the bits of the coefficients are distributed, unlike conventional DA, where the bits of the input data words are distributed. Furthermore, the DA matrix contains only 0s and 1s, which means that Y can be computed just by shifting and adding the input vectors. Matrix A is very important to the DA architecture of the DWT, since its structure can lead to savings in the hardware that implements the computations. We therefore refer to matrix A as the Adder Butterflies. Overall, by using the DA architecture of the DWT, the inner product of vectors (4.3) can be implemented with basic adder cells.
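The idea of distributing the coefficient bits can be sketched as follows. The example assumes unsigned integer coefficients, so sign handling for the MSB is left out, and it uses Y = sum_i 2^i (sum_k A[k][i] X_k); the function names and values are illustrative.

```python
def coefficient_bit_matrix(coeffs, nbits):
    """A[k][i] is bit i (LSB = 0) of coefficient k; entries are 0 or 1."""
    return [[(c >> i) & 1 for i in range(nbits)] for c in coeffs]

def da_inner_product(coeffs, samples, nbits):
    """Inner product computed with additions and shifts only."""
    a = coefficient_bit_matrix(coeffs, nbits)
    y = 0
    for i in range(nbits):
        # Adder stage: sum the inputs whose coefficient has bit i set.
        partial = sum(x for k, x in enumerate(samples) if a[k][i])
        y += partial << i          # weight the partial sum by 2^i
    return y

coeffs = [5, 4, 3, 2]      # assumed 3-bit unsigned coefficients
samples = [1, 2, 3, 4]
y = da_inner_product(coeffs, samples, nbits=3)
print(y, y == sum(c * x for c, x in zip(coeffs, samples)))
```

No multiplier appears in `da_inner_product`: each bit column of A selects which inputs feed an adder, and the column weight is applied as a shift.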

Consider four high pass filter coefficients [2 3 4 2] and image samples [X0, X1, X2, ..., X7]. Samples before X0 are taken as zero. The first sample X0 enters the filter and the sum-of-products (SOP) output Y0 is

Y0 = 2X0 + 3X(-1) + 4X(-2) + 2X(-3) = 2X0

Now X1 enters and Y1 is

Y1 = 3X0 + 2X1

Similarly,

Y2 = 4X0 + 3X1 + 2X2
Y3 = 2X0 + 4X1 + 3X2 + 2X3
Y4 = 2X1 + 4X2 + 3X3 + 2X4
Y5 = 2X2 + 4X3 + 3X4 + 2X5
Y6 = ...
Y7 = ...

Now take the input samples as [1 2 3 4 5 6 7 8] for easy computation, in order to configure and realize the distributed arithmetic architecture. Let the filter coefficients be H = [2 3 4 5] and the input samples X = [1 2 3 4 5 6 7 8]. The filter is slid across the time-reversed input sequence 8 7 6 5 4 3 2 1, one position per output, giving

Y0 = 2*1
Y1 = 3*1 + 2*2
Y2 = 4*1 + 3*2 + 2*3
Y3 = 5*1 + 4*2 + 3*3 + 2*4
Y4 = 5*2 + 4*3 + 3*4 + 2*5
Y5 = ...

Now Y3 can be rewritten with each sample expanded into its 3-bit binary form:

Y3 = 5*[0 0 1] + 4*[0 1 0] + 3*[0 1 1] + 2*[1 0 0]
   = 5*(0*2^2 + 0*2^1 + 1*2^0) + 4*(0*2^2 + 1*2^1 + 0*2^0) + 3*(0*2^2 + 1*2^1 + 1*2^0) + 2*(1*2^2 + 0*2^1 + 0*2^0)

Grouping the terms by bit weight gives

Y3 = (0*5 + 0*4 + 0*3 + 1*2)*2^2 + (0*5 + 1*4 + 1*3 + 0*2)*2^1 + (1*5 + 0*4 + 1*3 + 0*2)*2^0 = 8 + 14 + 8 = 30

Similarly, the input samples can be extended to a fourth bit, in contrast with the example above, where 3 bits were used for each sample; in other words, each input sample is then represented by 4 bits.

Let us consider another example to demonstrate the form of the above equation for efficient realization, with H = [2 3 4 5] and X = [9 7 5 8]. Each sample is written with 4 bits (9 = 1001, 7 = 0111, 5 = 0101, 8 = 1000), and the output is

Y = (1*5 + 0*4 + 0*3 + 1*2)*2^3 + (0*5 + 1*4 + 1*3 + 0*2)*2^2 + (0*5 + 1*4 + 0*3 + 0*2)*2^1 + (1*5 + 1*4 + 1*3 + 0*2)*2^0 = 56 + 28 + 8 + 12 = 104

From this it follows that a total of 2^4 (or 16) coefficient combinations can be stored in the ROM. Having developed this simplified representation of the sum-of-products (SOP) equation Y, we move on to design the prototype architecture of the DA. It consists of serial-in serial-out (SISO) shift registers and the ROM, where the number of SISO registers depends on the filter employed for the particular application. One bit of data is fed serially into the SISO register on each clock pulse and a shift operation (either left or right) is performed; at the end of the operation, a 1-bit output is fed serially out of the register. The ROM contains the mappable coefficients. In other words, the LSBs (least significant bits) of all the input samples together address the ROM, and the ROM entry that matches this bit pattern is given as the output.
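The ROM contents and the lookup can be sketched in a few lines. This is an illustration under assumptions: unsigned samples, an address formed from one bit of every sample, and hypothetical function names. It reproduces the pairing of coefficients 5, 4, 3, 2 with the samples 9, 7, 5, 8 from the example above.

```python
def build_rom(coeffs):
    """ROM[address] = sum of the coefficients selected by the address bits."""
    return [
        sum(c for k, c in enumerate(coeffs) if (address >> k) & 1)
        for address in range(1 << len(coeffs))
    ]

def da_lookup(coeffs, samples, nbits):
    """Sum of products using one ROM read per bit plane of the samples."""
    rom = build_rom(coeffs)
    y = 0
    for b in range(nbits):
        address = 0
        for k, x in enumerate(samples):
            address |= ((x >> b) & 1) << k   # bit b of sample k
        y += rom[address] << b               # weight the ROM word by 2^b
    return y

coeffs = [5, 4, 3, 2]
samples = [9, 7, 5, 8]
rom = build_rom(coeffs)
y = da_lookup(coeffs, samples, nbits=4)
print(len(rom), y)
```

With four taps the table has 2^4 = 16 entries, which is why the ROM size grows exponentially with the number of taps.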


Figure 4.5: Mapping of the serial outputs onto the ROM coefficients

The above prototype has the following characteristics. It takes 3 clock cycles to load a single SISO. At the 4th clock, 1 bit of SISO0 is right-shifted into SISO1. Therefore, a total of 3*4 = 12 clock cycles is needed to load the shifters. The next 3 clocks are needed to map the LSBs of the shifters onto the ROM and generate one output, i.e., to compute the first output by parallel mapping of the serial outputs. So a total of 21 cycles is required to generate the first 3 outputs. Another input sample enters at SISO0 during the next 3 clocks, and the contents of SISO3 are replaced by the contents of SISO2.

The distributed arithmetic architecture is incomplete without the section discussed below. The output of the ROM is given to the adder, where it is summed with the accumulator contents; the accumulator is initialized to zero at first. The output of the adder is right-shifted and stored in the accumulator. The prototype together with the adder, accumulator and shifter forms the complete distributed arithmetic architecture, which is represented diagrammatically in Figure 4.6.
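The adder, accumulator and shifter loop can be simulated clock by clock. The sketch below is a behavioural model under assumptions, not the hardware description: samples are unsigned, and the ROM word is pre-scaled by 2^nbits so that the right shifts lose no bits. The function name is hypothetical.

```python
def da_bit_serial(coeffs, samples, nbits):
    """Bit-serial DA: one ROM read, one add and one right shift per clock."""
    rom = [
        sum(c for k, c in enumerate(coeffs) if (address >> k) & 1)
        for address in range(1 << len(coeffs))
    ]
    shift_regs = list(samples)   # one shift register per filter tap
    acc = 0                      # accumulator starts at zero
    for _ in range(nbits):
        address = 0
        for k in range(len(shift_regs)):
            address |= (shift_regs[k] & 1) << k   # LSB of each register
            shift_regs[k] >>= 1                   # shift the register right
        total = acc + (rom[address] << nbits)     # adder (ROM word pre-scaled)
        acc = total >> 1                          # right shift into accumulator
    return acc

y3 = da_bit_serial([5, 4, 3, 2], [1, 2, 3, 4], nbits=3)
y = da_bit_serial([5, 4, 3, 2], [9, 7, 5, 8], nbits=4)
print(y3, y)
```

Each pass of the loop corresponds to one clock in the description above, so an output needs as many clocks as there are bits per sample.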


Figure 4.6: General Distributive Arithmetic Architecture


CHAPTER 5

IMAGE COMPRESSION


5.1 PROBLEM STATEMENT


The distributed arithmetic architecture can be used for the 9/7 tap filters in the 2-dimensional discrete wavelet transform.

The 9-tap high pass filter with the DA architecture has the following salient features. It has 9 SISOs, each of 8 bits. The first 8*9 = 72 cycles are for loading all SISOs. 8 cycles are needed for generating the first output, the next 8 cycles to load the first SISO, and the next 8 cycles to compute, so a total of 8+8+8 = 24 cycles is required to compute the first 3 outputs. The first output is fed to the adder, where it is summed with the accumulator contents, i.e., zero. The output is right-shifted and fed to the accumulator, and the cycle continues.

The 7-tap low pass filter with the DA architecture has the following salient features. It has 7 SISOs, each of 8 bits. The first 8*7 = 56 cycles are for loading all SISOs. 5 cycles are needed for generating the first output, the next 5 cycles to load the first SISO, and the next 5 cycles to compute, so a total of 5+5+5 = 15 cycles is required to compute the first 3 outputs. The first output is fed to the adder, where it is summed with the accumulator contents, i.e., zero. The output is right-shifted and fed to the accumulator, and the cycle continues.

Figure 5.1: 9-tap high pass filter with DA-architecture (blocks shown: nine 8-bit SISO registers, ROM memory map, adder, accumulator, shifter)

Figure 5.2: 7-tap low pass filter with DA-architecture (blocks shown: seven 8-bit SISO registers, ROM memory map, adder, accumulator, shifter)

5.2 PROPOSED ARCHITECTURE


The architecture is based on the popular Daubechies 9/7 filter bank (floating point) used in JPEG2000 and MPEG4. The floating-point 9/7 forward transform uses two analysis filters, h (high-pass) and g (low-pass). Without loss of generality we assume accuracy up to 5 decimal places, hence the coefficients are as given below. The finite precision of the hardware limits the accurate representation of floating-point numbers; hence for the purpose of implementation we represent the coefficients with an accuracy of 13 bits. The assumption is reasonable, as a 13-bit representation gives high enough accuracy for the fixed-point implementation.

The 9/7 tap high pass and low pass FIR filters are as follows:

Y(2i+1) = (-0.045636)*[X(2i-2)+X(2i+4)] + (0.028772)*[X(2i-1)+X(2i+3)] + (0.295636)*[X(2i)+X(2i+2)] + (-0.557543)*X(2i+1);

Y(2i) = (0.026749)*[X(2i-4)+X(2i+4)] + (-0.016864)*[X(2i-3)+X(2i+3)] + (-0.078223)*[X(2i-2)+X(2i+2)] + (0.266864)*[X(2i-1)+X(2i+1)] + (0.602949)*X(2i);

So the coefficient matrices are as follows:

h = [(-0.045636) (0.028772) (0.295636) (-0.557543)];
g = [(0.026749) (-0.016864) (-0.078223) (0.266864) (0.602949)];
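The two symmetric filter equations can be exercised with a short sketch. The coefficients are the h and g values listed above; the border handling (whole-sample symmetric extension) and the function names are assumptions made for this example only.

```python
# Outermost tap first, centre tap last, as in the h and g vectors above.
H = [-0.045636, 0.028772, 0.295636, -0.557543]            # high pass
G = [0.026749, -0.016864, -0.078223, 0.266864, 0.602949]  # low pass

def sample(x, n):
    """Read x[n]; assumed whole-sample symmetric extension at the borders."""
    last = len(x) - 1          # assumes len(x) >= 2
    while n < 0 or n > last:
        n = -n if n < 0 else 2 * last - n
    return x[n]

def symmetric_fir(x, centre, taps):
    """Centre tap plus mirrored tap pairs around the centre sample."""
    reach = len(taps) - 1
    acc = taps[-1] * sample(x, centre)
    for d in range(1, reach + 1):
        acc += taps[reach - d] * (sample(x, centre - d) + sample(x, centre + d))
    return acc

def dwt97_row(x):
    """Y(2i) from the low pass filter, Y(2i+1) from the high pass filter."""
    low = [symmetric_fir(x, 2 * i, G) for i in range(len(x) // 2)]
    high = [symmetric_fir(x, 2 * i + 1, H) for i in range(len(x) // 2)]
    return low, high

low, high = dwt97_row([100] * 8)
print([round(v, 3) for v in low], [round(v, 3) for v in high])
```

A constant row is a quick sanity check: the low pass taps sum to approximately one and the high pass taps to approximately zero, so the low band reproduces the constant and the high band is close to zero.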
Then each coefficient matrix (9/7 tap high and low pass FIR filter) can be distributed into 13 bits (the coefficient word length), so h and g can also be written as[5]:

$h = [\,2^{12}\;\; 2^{11}\;\; \ldots\;\; 2^{1}\;\; 2^{0}\,]\, A_h$

$g = [\,2^{12}\;\; 2^{11}\;\; \ldots\;\; 2^{1}\;\; 2^{0}\,]\, A_g$

$A_h$ and $A_g$ are represented as follows:

$A_h = \ldots$    (5.1)

$A_g = \ldots$    (5.2)
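How a coefficient maps onto a 13-bit word can be sketched as below. The number format is an assumption made for illustration (two's complement with 1 sign bit and 12 fractional bits); the text states only the 13-bit word length, and the function names are hypothetical.

```python
WORD_BITS = 13
FRAC_BITS = 12   # assumption: 1 sign bit + 12 fractional bits

def quantize(value):
    """Nearest integer multiple of 2^-12."""
    return int(round(value * (1 << FRAC_BITS)))

def to_bits(q):
    """13-bit two's-complement bit string, MSB first."""
    return format(q & ((1 << WORD_BITS) - 1), "013b")

def from_bits(bits):
    """Decode a 13-bit two's-complement string back to a real value."""
    raw = int(bits, 2)
    if raw >= 1 << (WORD_BITS - 1):
        raw -= 1 << WORD_BITS
    return raw / (1 << FRAC_BITS)

g = [0.026749, -0.016864, -0.078223, 0.266864, 0.602949]
words = [to_bits(quantize(c)) for c in g]
errors = [abs(from_bits(w) - c) for w, c in zip(words, g)]
for c, w, e in zip(g, words, errors):
    print(c, w, e)
```

Under this assumed format the rounding error of each coefficient is at most half an LSB, i.e. 2^-13, and the rows of bit strings play the role of the bit matrix in the DA formulation.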


CHAPTER 6

SOFTWARE REFERENCE MODEL


6.1 MATLAB
6.1.1 OVERVIEW OF MATLAB
MATLAB is a high-performance language for technical computing. It integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation. Typical uses include math and computation; algorithm development; data acquisition; modeling, simulation, and prototyping; data analysis, exploration, and visualization; scientific and engineering graphics; and application development, including graphical user interface building.

MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar noninteractive language such as C or FORTRAN.

The name MATLAB stands for matrix laboratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK and EISPACK projects. Today, MATLAB engines incorporate the LAPACK and BLAS libraries, embedding the state of the art in software for matrix computation.

MATLAB has evolved over a period of years with input from many users. In university environments, it is the standard instructional tool for introductory and advanced courses in mathematics, engineering, and science. In industry, MATLAB is the tool of choice for high-productivity research, development, and analysis[6].


6.2 MATLAB SYSTEM


The MATLAB system consists of these main parts:

6.2.1 DESKTOP TOOLS AND DEVELOPMENT ENVIRONMENT


This is the set of tools and facilities that help you use MATLAB functions and files. Many of these tools are graphical user interfaces. It includes the MATLAB desktop and Command Window, a command history, an editor and debugger, a code analyzer and other reports, and browsers for viewing help, the workspace, files, and the search path.

6.2.2 MATLAB MATHEMATICAL FUNCTION LIBRARY


This is a vast collection of computational algorithms ranging from elementary functions, like sum, sine, cosine, and complex arithmetic, to more sophisticated functions like matrix inverse, matrix eigenvalues, Bessel functions, and fast Fourier transforms.

6.2.3 MATLAB LANGUAGE


This is a high-level matrix/array language with control flow statements, functions, data structures, input/output, and object-oriented programming features. It allows both programming in the small to rapidly create quick and dirty throw-away programs, and programming in the large to create large and complex application programs.

6.2.4 GRAPHICS
MATLAB has extensive facilities for displaying vectors and matrices as graphs, as well as annotating and printing these graphs. It includes high-level functions for two-dimensional and three-dimensional data visualization, image processing, animation, and presentation graphics. It also includes low-level functions that allow you to fully customize the appearance of graphics as well as to build complete graphical user interfaces on your MATLAB applications.


6.2.5 MATLAB EXTERNAL INTERFACES


This is a library that allows you to write C and FORTRAN programs that interact with MATLAB. It includes facilities for calling routines from MATLAB (dynamic linking), calling MATLAB as a computational engine, and for reading and writing MAT-files.

6.3 IMAGE PROCESSING TOOLBOX


6.3.1 INTRODUCTION
Image Processing Toolbox is a collection of functions that extend the capability of the MATLAB numeric computing environment. The toolbox supports a wide range of image processing operations, including spatial image transformations; morphological operations; neighborhood and block operations; linear filtering and filter design; transforms; image analysis and enhancement; image registration; and region of interest operations.

Many of the toolbox functions are MATLAB M-files, a series of MATLAB statements that implement specialized image processing algorithms. We can view the MATLAB code for these functions using the statement

type function_name

We can extend the capabilities of Image Processing Toolbox by writing our own M-files, or by using the toolbox in combination with other toolboxes, such as Signal Processing Toolbox and Wavelet Toolbox.


6.3.2 READ AND DISPLAY AN IMAGE


First, clear the MATLAB workspace of any variables and close open figure windows.

close all

To read an image, use the imread command. The example reads one of the sample images included with Image Processing Toolbox, pout.tif, and stores it in an array named I.

I = imread('pout.tif');

Now display the image. The toolbox includes two image display functions: imshow and imtool. imshow is the toolbox's fundamental image display function. imtool starts the Image Tool, which presents an integrated environment for displaying images and performing some common image processing tasks. The Image Tool provides all the image display capabilities of imshow but also provides access to several other tools for navigating and exploring images, such as scroll bars, the Pixel Region tool, the Image Information tool, and the Contrast Adjustment tool.

6.3.3 IMAGE APPEARANCE IN THE WORKSPACE


To see how the imread function stores the image data in the workspace, check the Workspace browser in the MATLAB desktop. The Workspace browser displays information about all the variables you create during a MATLAB session. The imread function returned the image data in the variable I, which is a 291-by-240 element array of uint8 data. MATLAB can store images as uint8, uint16, or double arrays.

6.3.4 IMPROVING IMAGE CONTRAST


pout.tif is a somewhat low contrast image. To see the distribution of intensities in pout.tif, we can create a histogram by calling the imhist function.

figure, imhist(I)

The intensity range is rather narrow. It does not cover the potential range of [0, 255], and is missing the high and low values that would result in good contrast. The toolbox provides several ways to improve the contrast in an image.

One way is to call the histeq function to spread the intensity values over the full range of the image, a process called histogram equalization.

I2 = histeq(I);

Display the new equalized image, I2, in a new figure window.

figure, imshow(I2)
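The principle of histogram equalization can be sketched outside MATLAB. The code below is a generic textbook version based on the cumulative histogram, written in Python for illustration; it is not a reproduction of the algorithm used by the toolbox's histeq function, and the pixel values are made up.

```python
def equalize(pixels, levels=256):
    """Map intensities through the normalized cumulative histogram."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)   # CDF at the darkest used level
    if total == cdf_min:                      # single-valued image: nothing to do
        return list(pixels)

    lut = [
        round((c - cdf_min) / (total - cdf_min) * (levels - 1)) if c >= cdf_min else 0
        for c in cdf
    ]
    return [lut[p] for p in pixels]

narrow = [100, 100, 110, 120, 120, 130]   # low contrast: values in 100..130
stretched = equalize(narrow)
print(stretched)
```

The input occupies only a narrow band of intensities, while the output spans the full 0 to 255 range with the ordering of the pixels preserved.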


6.4 PSNR AND MSE FOR IMAGES


6.4.1 PSNR
Compute peak signal-to-noise ratio (PSNR) between images. The PSNR block computes the peak signal-to-noise ratio, in decibels, between two images[6]. This ratio is often used as a quality measurement between the original and a compressed image. The higher the PSNR, the better the quality of the compressed image[1].

6.4.2 MSE
In statistics, the mean square error or MSE of an estimator is one of many ways to quantify the difference between an estimator and the true value of the quantity being estimated.

MSE is a risk function, corresponding to the expected value of the squared error loss or quadratic loss. MSE measures the average of the square of the "error." The error is the amount by which the estimator differs from the quantity to be estimated.
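For two images of the same size, the two measures can be computed as in the sketch below. It assumes 8-bit images (peak value 255) stored as lists of rows and the usual definition PSNR = 10 log10(peak^2 / MSE); the function names and the tiny test images are illustrative.

```python
import math

def mse(original, compressed):
    """Mean of the squared pixel differences between two equal-size images."""
    squared = [
        (p - q) ** 2
        for row_a, row_b in zip(original, compressed)
        for p, q in zip(row_a, row_b)
    ]
    return sum(squared) / len(squared)

def psnr(original, compressed, peak=255):
    """Peak signal-to-noise ratio in decibels; infinite for identical images."""
    error = mse(original, compressed)
    if error == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / error)

original = [[0, 0], [0, 0]]
compressed = [[0, 0], [0, 10]]
print(mse(original, compressed), round(psnr(original, compressed), 2))
```

A smaller MSE gives a larger PSNR, which is why a higher PSNR indicates a better reconstructed image.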


CHAPTER 7:

FPGA IMPLEMENTATION


7.1 FPGA basic design Flow Overview:


The ISE design flow comprises the following steps: design entry, design synthesis, design implementation, and Xilinx device programming. Design verification, which includes both functional verification and timing verification, takes place at different points during the design flow. This section describes what to do during each step.

Figure 7.1:FPGA Basic Design Flow

7.2 Design Summary:


Design entry is the first step in the ISE design flow. During design entry, you create your source files based on your design objectives. You can create your top-level design file using a Hardware Description Language (HDL), such as VHDL, Verilog, or ABEL, or using a schematic. You specify your top-level module type when you create your project as described in Creating a Project[9].

You can use multiple formats for the lower-level source files in your design. Different source types are available, depending on your project properties (top-level module type, device type, synthesis tool, and language). You can create these source files in Project Navigator, as described in Creating a Source File. Some source types launch additional tools to help you create the file, as described in Source File Types.

Table 7.1: Design Summary

image_inte Project Status
Project File: image_inte.ise
Module Name: video
Target Device: xc2vp30-7ff896
Product Version: ISE 10.1 WebPACK
Design Goal: Balanced
Design Strategy: Xilinx Default (unlocked)
Current State: Programming File Generated
Errors: No Errors
Warnings: 703 Warnings (676 new, 0 filtered)
Routing Results: All Signals Completely Routed
Timing Constraints: All Constraints Met
Final Timing Score: 0 (Timing Report)

image_inte Partition Summary
No partition information was found.

Device Utilization Summary
Logic Utilization | Used | Available | Utilization
Number of Slice Flip Flops | 113 | 27,392 | 1%
Number of 4 input LUTs | 333 | 27,392 | 1%
Logic Distribution | | |
Number of occupied Slices | 203 | 13,696 | 1%
Number of Slices containing only related logic | 203 | 203 | 100%
Number of Slices containing unrelated logic | 0 | 203 | 0%
Total Number of 4 input LUTs | 378 | 27,392 | 1%
Number used as logic | 333 | |
Number used as a route-thru | 45 | |
Number of bonded IOBs | 31 | 556 | 5%
Number of RAMB16s | 15 | 136 | 11%
Number of BUFGMUXs | 2 | 16 | 12%
Number of DCMs | 1 | 8 | 12%

Performance Summary
Final Timing Score: 0
Routing Results: All Signals Completely Routed
Timing Constraints: All Constraints Met
Pinout Data: Pinout Report
Clock Data: Clock Report

Detailed Reports
Report Name | Status | Generated | Errors | Warnings | Infos
Synthesis Report | Current | Wed 9. Jun 00:02:30 2010 | 0 | 676 Warnings (676 new, 0 filtered) | 25 Infos (24 new, 0 filtered)
Translation Report | Current | Wed 9. Jun 00:03:10 2010 | 0 | 24 Warnings (0 new, 0 filtered) | 0
Map Report | Current | Wed 9. Jun 00:03:52 2010 | 0 | 2 Warnings (0 new, 0 filtered) | 3 Infos (0 new, 0 filtered)
Place and Route Report | Current | Wed 9. Jun 00:05:20 2010 | 0 | 1 Warning (0 new, 0 filtered) | 2 Infos (0 new, 0 filtered)
Static Timing Report | Current | Wed 9. Jun 00:05:50 2010 | 0 | 0 | 3 Infos (0 new, 0 filtered)
Bitgen Report | Current | Wed 9. Jun 00:06:38 2010 | 0 | 0 | 2 Infos (0 new, 0 filtered)

7.3 Timing Constraints:


The ISE software allows you to enter timing constraints that describe the timing performance requirements of the design. Providing a concise set of constraints achieves the following. It allows the software to create a design that meets your requirements. It allows you to compare the constraints to the performance of the resulting design, using the timing reports output by the ISE software; by analyzing the timing reports, you can identify the paths in the design that may require coding modifications, placement directives, or additional constraints to achieve timing closure. It increases the performance of the ISE software by reducing the memory and runtime requirements[9].

Table 7.2: Timing Constraints

For every constraint listed, the SETUP worst case slack is reported as N/A.

Met | Constraint | SETUP Best Case Achievable | HOLD Worst Case Slack | Timing Errors (SETUP / HOLD) | Timing Score (SETUP / HOLD)
Yes | Autotimespec constraint for clock net dwt1/dw_2d/d1/s1 | 3.018ns | 0.701ns | N/A / 0 | 0 / 0
Yes | Autotimespec constraint for clock net vga_out_pixel_clock_OBUF | 23.268ns | 0.562ns | N/A / 0 | 0 / 0
Yes | Autotimespec constraint for clock net dwt1/dw_2d/clkd3 | 1.863ns | 0.635ns | N/A / 0 | 0 / 0
Yes | Autotimespec constraint for clock net dwt1/dw_2d/d2/s1 | 2.949ns | 0.701ns | N/A / 0 | 0 / 0
Yes | Autotimespec constraint for clock net dwt1/dw_2d/d2/s | 3.035ns | 0.721ns | N/A / 0 | 0 / 0
Yes | Autotimespec constraint for clock net dwt1/dw_2d/d3/s1 | 3.138ns | 0.712ns | N/A / 0 | 0 / 0
Yes | Autotimespec constraint for clock net dwt1/dw_2d/d3/s | 3.297ns | 0.713ns | N/A / 0 | 0 / 0
Yes | Autotimespec constraint for clock net dwt1/dw_2d/d1/s | 3.445ns | 0.855ns | N/A / 0 | 0 / 0

7.4 Clock Report

This report contains information on the resource utilization of each clock region and lists any clock conflicts between global clock buffers in a clock region.

Table 7.3: Clock Report

Clock Net | Resource | Locked | Fanout | Net Skew(ns) | Max Delay(ns)
vga_out_pixel_clock_OBUF | BUFGMUX0P | No | 443 | 0.233 | 1.212
dwt1/dw_2d/clkd3 | BUFGMUX4P | No | 50 | 0.024 | 1.122
dwt1/dw_2d/d2/s | BUFGMUX6P | No | 36 | 0.020 | 1.006
dwt1/dw_2d/d3/s | BUFGMUX5P | No | 36 | 0.014 | 1.121
dwt1/dw_2d/d1/s | BUFGMUX3P | No | 36 | 0.048 | 1.039
dwt1/dw_2d/d1/s1 | Local | | 63 | 0.038 | 2.192
dwt1/dw_2d/d2/s1 | Local | | 62 | 0.145 | 2.480
dwt1/dw_2d/d3/s1 | Local | | 62 | 0.046 | 2.239

7.5 Synthesis Report:


After design entry and optional simulation, you run synthesis. In the Sources tab, select Synthesis/Implementation from the Design View drop-down list, and select the top module. In the Processes tab, double-click Synthesize. The ISE software includes Xilinx Synthesis Technology (XST), which synthesizes VHDL, Verilog, or mixed language designs to create Xilinx-specific netlist files known as NGC files. Unlike output from other vendors, which consists of an EDIF file with an associated NCF file, NGC files contain both logical design data and constraints. XST places the NGC file in your project directory and the file is accepted as input to the Translate (NGDBuild) step of the Implement Design process. To specify XST as your synthesis tool, you must set the Synthesis Tool Project Property to XST, as described in Changing Project, Source, and Snapshot Properties[9].

Table 7.4: Synthesis Report

---- Source Parameters
Input File Name : "video.prj"
Input Format : mixed
Ignore Synthesis Constraint File : NO

---- Target Parameters
Output File Name : "video"
Output Format : NGC
Target Device : xc2vp30-7ff896

---- Source Options
Top Module Name : video
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
Safe Implementation : No
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
Mux Style : Auto
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
ROM Style : Auto
Mux Extraction : YES
Resource Sharing : YES
Asynchronous To Synchronous : NO
Multiplier Style : auto
Automatic Register Balancing : No

---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 500
Add Generic Clock Buffer(BUFG) : 16
Register Duplication : YES
Slice Packing : YES
Optimize Instantiated Primitives : NO
Convert Tristates To Logic : Yes
Use Clock Enable : Yes
Use Synchronous Set : Yes
Use Synchronous Reset : Yes
Pack IO Registers into IOBs : auto
Equivalent register Removal : YES

---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Library Search Order : video.lso
Keep Hierarchy : NO
Netlist Hierarchy : as_optimized
RTL Output : Yes
Global Optimization : AllClockNets
Read Cores : YES
Write Timing Constraints : NO
Cross Clock Analysis : NO
Hierarchy Separator : /
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
BRAM Utilization Ratio : 100
Verilog 2001 : YES
Auto BRAM Packing : NO
Slice Utilization Ratio Delta : 5


7.6 RTL Schematic:


The synthesized design can be viewed as a schematic in the register transfer level (RTL) viewer. This view displays gates and elements independently of the targeted Xilinx device.

Figure 7.2 : RTL Schematic

The schematic shows a representation of the pre-optimized design in terms of generic symbols, such as adders, multipliers, counters, AND gates, and OR gates, which are independent of the targeted Xilinx device. Viewing this schematic may help you discover design issues early in the design process. [9]

Figure 7.3: Pictorial view of RTL schematic


Figure 7.4: Technology Schematic Overview

The synthesized design can be viewed as a schematic in a technology schematic viewer. This view displays gates and elements as they will appear on the Xilinx device.

Figure 7.5: Technology Schematic

7.7 Implement Design:


Translate: The Translate process merges all of the input net-lists and design constraints and outputs a Xilinx native generic database (NGD) file, which describes the logical design reduced to Xilinx primitives. See the following table for details. [9]

Translate Process

Command line tool                      : NGDBuild
Tcl command                            : process run "Translate"
Input files                            : EDIF, SEDIF, EDN, EDF, NGC, UCF, NCF, URF, NMC, BMM
Output files                           : BLD (report), NGD
Process properties                     : Translate Properties
Tools available after running process  : Constraints Editor, Floorplan Editor, Floorplanner, PACE

Note: Each of these tools modifies the UCF file. When you rerun Translate with the updated UCF, the NGD file is updated.

Table 7.5: Translate Process

NGDBUILD Design Results Summary:
Number of errors    : 0
Number of warnings  : 25
Total memory usage is 102260 kilobytes

7.7.1 Floor plan design after Translate


The general steps in the basic flow are as follows:

1. The design is created, synthesized, and transformed into an NGD file. The NGD file includes location constraints that originated in your design source, a UCF, or an NCF. The file may also include references or instances of IP macros.
2. Floorplan Editor reads the NGD file, reads the design hierarchy, pulls in data for any IP macros, and creates a representation of your design. While reading the NGD file, Floorplan Editor interprets any I/O standards applied to buffers connected to I/Os and displays them in the Design Objects tab window.
3. Floorplan Editor modifies one or more UCFs [9].

Note: Floorplan Editor does not create the UCF. If you don't already have one, you must first create at least one UCF using the Project Navigator New Source or Add Source functions.

The UCFs are then input to NGDBuild and the remainder of the Xilinx implementation flow is completed. When the initial constraints are from your design source or an NCF, these constraints cannot be removed when a UCF is used as Floorplan Editor output. They can only be overridden by constraints applied in Floorplan Editor and finally saved in a UCF.

7.8 Map Report:


The Map process maps the logic defined by an NGD file into FPGA elements, such as CLBs and IOBs. The output design is a native circuit description (NCD) file that physically represents the design mapped to the components in the Xilinx FPGA. See the following table for details. [9]

Map Process

Command line tools                     : MAP
Tcl command                            : process run "Map"
Input files                            : NGD, NMC, NCD, NGM
                                         Note: The NCD and NGM files are for guiding.
Output files                           : NCD, PCF, NGM, MRP (report), GRF, MAP, PSR
Process Properties                     : Map Properties
Tools available after running process  : Floorplanner, FPGA Editor, Timing Analyzer

Table 7.6: Map Process

Table 7.7: Map Report

Target Device   : xc2vp30
Target Package  : ff896
Target Speed    : -7

Design Summary
Number of errors    : 0
Number of warnings  : 2

Logic Utilization:
Number of Slice Flip Flops                       : 113 out of 27,392    1%
Number of 4 input LUTs                           : 339 out of 27,392    1%

Logic Distribution:
Number of occupied Slices                        : 200 out of 13,696    1%
Number of Slices containing only related logic   : 200 out of 200     100%
Number of Slices containing unrelated logic      : 0 out of 200         0%
Total Number of 4 input LUTs                     : 371 out of 27,392    1%
    Number used as logic                         : 339
    Number used as a route-thru                  : 32
Number of bonded IOBs                            : 31 out of 556        5%
Number of RAMB16s                                : 15 out of 136       11%
Number of BUFGMUXs                               : 2 out of 16         12%
Number of DCMs                                   : 1 out of 8          12%

Peak Memory Usage                  : 231 MB
Total REAL time to MAP completion  : 11 secs
Total CPU time to MAP completion   : 8 secs
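
The percentages in the Map report can be cross-checked from the used and available counts. The following is a minimal Python sketch, not part of the original tool flow. The rounding rule (round down, but show at least 1% for any resource in use) is an assumption inferred from the reported figures, and the names `MAP_REPORT`, `REPORTED_PERCENT`, `utilization_percent` and `check_report` are illustrative.

```python
# (used, available) pairs quoted in the Map report.
MAP_REPORT = {
    "Slice Flip Flops": (113, 27392),
    "4 input LUTs": (339, 27392),
    "Occupied Slices": (200, 13696),
    "Total 4 input LUTs": (371, 27392),
    "Bonded IOBs": (31, 556),
    "RAMB16s": (15, 136),
    "BUFGMUXs": (2, 16),
    "DCMs": (1, 8),
}

# Percentages printed next to each entry in the report.
REPORTED_PERCENT = {
    "Slice Flip Flops": 1,
    "4 input LUTs": 1,
    "Occupied Slices": 1,
    "Total 4 input LUTs": 1,
    "Bonded IOBs": 5,
    "RAMB16s": 11,
    "BUFGMUXs": 12,
    "DCMs": 12,
}


def utilization_percent(used, available):
    """Whole-number utilization.

    Assumed rule: round down, but never show 0% for a resource that is used.
    """
    if available <= 0:
        raise ValueError("available must be positive")
    if used == 0:
        return 0
    return max(1, (100 * used) // available)


def check_report():
    """Return True if every recomputed percentage matches the reported one."""
    return all(
        utilization_percent(*MAP_REPORT[name]) == REPORTED_PERCENT[name]
        for name in MAP_REPORT
    )


if __name__ == "__main__":
    for name, (used, available) in MAP_REPORT.items():
        print(f"{name}: {used} of {available} = {utilization_percent(used, available)}%")
```

The same data also confirms that the LUT total is consistent: 339 LUTs used as logic plus 32 used as route-thrus gives the 371 LUTs reported in total.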

7.9 Place and Route:


The Place and Route process takes a mapped NCD file, places and routes the design, and produces an NCD file that is used as input for bitstream generation.
Place and Route Process

Command line tools                     : PAR
Tcl command                            : process run "Place & Route"
Input files                            : NCD, PCF
                                         Note: In addition to the NCD file from MAP, PAR also accepts an NCD file for guiding.
Output files                           : NCD, PAR (report), PAD, CSV, TXT, GRF, DLY
Process Properties                     : Place & Route Properties
Tools available after running process  : Floorplanner, FPGA Editor, Timing Analyzer, TRACE, XPower Analyzer

Table 7.8: Place and Route Process

Device Utilization Summary:
Number of BUFGMUXs       : 2 out of 16        12%
Number of DCMs           : 1 out of 8         12%
Number of External IOBs  : 31 out of 556       5%
Number of LOCed IOBs     : 31 out of 31      100%
Number of RAMB16s        : 15 out of 136      11%
Number of SLICEs         : 200 out of 13696    1%

Overall effort level (-ol)    : Standard
Placer effort level (-pl)     : High
Placer cost table entry (-t)  :
Router effort level (-rl)     : Standard
REAL time consumed by placer  : 24 secs
CPU time consumed by placer   : 21 secs

Table 7.9: Place and Route


Figure 7.6: View of the design after place and route [9]

Data in XPower Analyzer

Table 7.10: XPower Analyzer [9]

7.10 Configure target device:

Target Device Properties

The following properties are available for the Configure Target Device process for a CPLD or FPGA device.

iMPACT Project File: The iMPACT Project File (IPF) contains information from a previous session of iMPACT. If you specify an IPF file in this property and run the Configure Target Device process, the target device will be configured according to the settings in the specified IPF file. If Default is specified here, the target device will be configured according to the settings in the default IPF file, <ISE_image_inte>.ipf.

Port to be used (Advanced): Specifies the port you would like to use for configuration; here we use USB. Auto-default causes the software to search every port for a connection, automatically detect an available cable, and connect to it.

Run Generate Target PROM/ACE File: If selected, the Configure Target Device process will automatically run the Generate Target PROM/ACE File process to generate a PROM or ACE file before configuring the target device. The file will be generated using the information from the .ipf file specified in the iMPACT Project File property. When Automatically Generate Target PROM/ACE File is set to True (checkbox is checked), the PROM or ACE file is generated in the background before the target device is configured. This is useful for quick PROM or System ACE file regeneration when a bitstream has changed. [9]


Figure 7.7: Output Simulation Window

Figure 7.8: Snapshot1 of Image Compression Chip(internal view 1)


Figure 7.9: Image Compression Chip (internal view 2)

Figure 7.10: Image Compression Chip Internal View 3


RESULT:

[a] Original image

[b] Reconstructed image

The original image and the reconstructed image are compared in terms of PSNR (dB) and MSE. The two images are observed to be similar to each other, which validates our result.
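
The two metrics named above follow standard definitions, and the comparison can be sketched in a few lines of Python. This is an illustrative sketch, not the code used in the project: it assumes 8-bit grayscale images stored as lists of rows, a peak value of 255, and the function names `mse` and `psnr_db` are hypothetical.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized grayscale images."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of rows")
    total = 0
    count = 0
    for row_o, row_r in zip(original, reconstructed):
        if len(row_o) != len(row_r):
            raise ValueError("rows must have the same length")
        for a, b in zip(row_o, row_r):
            total += (a - b) ** 2
            count += 1
    if count == 0:
        raise ValueError("images must not be empty")
    return total / count


def psnr_db(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / err)


if __name__ == "__main__":
    # Tiny made-up 2 x 2 images, used only to exercise the functions.
    a = [[10, 20], [30, 40]]
    b = [[10, 22], [30, 40]]
    print(mse(a, b), psnr_db(a, b))
```

A lower MSE and a higher PSNR both indicate that the reconstructed image is closer to the original.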


CHAPTER 8

Conclusion and Scope for Future Work


8.1 Conclusion
An image compression algorithm was simulated using Matlab to comprehend the process of image compression. Modifying the padding style reduced the error, because it offers a better reproduction of the image at its edges. It also supports faithful reproduction of the image while keeping the size of the transform coefficient matrix equal to the image size. For the VLSI implementation of an image compression encoder, Verilog HDL was chosen. The theoretical benefits of DA are that it realizes the full potential of the FPGA architecture for hardware implementation and achieves large parallelism. The relative area and speed efficiencies of DA turn out to be good when implemented in hardware on an FPGA. The DA approach can achieve close to the maximum clock rates possible with a given FPGA technology using only basic 4-LUT based blocks and the fast ripple carry chains, whereas the multi-stage modulo adders required in an RNS implementation are slow, even for small word lengths, so that the accumulator stage becomes the performance bottleneck. It has also been observed that large adders implemented in FPGAs with fast carry chains are quite fast, and that the adder delay scales less than linearly with increasing word length. In light of the implementation results, DA based architectures have an area, speed and simplicity advantage over implementations based on other methods such as RNS. It is in this context that we can say DA implementations are superior when targeting FPGAs.

8.2 Scope for Future work


The newly developed concept of sparsity in signal processing can be used in the context of image compression. The first step of the scheme is to apply a sparsifying transform to the image. The sparse set of coefficients is then encoded via Sparse PCA. The wavelet transform has been used profusely for image compression tasks, but it is not the ideal choice. The partial reconstruction error from wavelet coefficients is an order of magnitude higher than the ideal error rate for many critical applications. Image compression can instead be carried out in the curvelet domain, a better choice than wavelets, at least theoretically, since the reconstruction error rate with curvelet coefficients is of the same asymptotic order as the ideal error rate.

APPENDIX-A

FPGA ARCHITECTURE

A Field Programmable Gate Array (FPGA) is a semiconductor device containing programmable logic components and programmable interconnects. The programmable logic components can be programmed to duplicate the functionality of basic logic gates such as AND, OR, XOR, NOT or more complex combinational functions such as decoders or simple mathematical functions. In most FPGAs these programmable logic components also include memory elements, which may be simple flip-flops or more complete blocks of memory. FPGAs are generally slower than their Application Specific Integrated Circuit (ASIC) counterparts, cannot handle as complex a design, and draw more power. [7]

The programmable logic devices are capable of implementing a sequential network but not a complete digital system. Programmable gate arrays (PGAs) and complex programmable logic devices (CPLDs) are more flexible and more versatile and can be used to implement a complete digital system on a chip. Some of the largest devices can implement a small microprocessor.

A typical PGA is an IC that contains an array of identical logic cells with programmable interconnections. We can program the functions realized by each logic cell and connections between the cells. Such PGAs are called FPGAs since they are field programmable.[7]


A.1 APPLICATION OF FPGA

Figure A.1: Multiply accumulate operation (a) Conventional implementation (b) Distributed arithmetic implementation. [7]


A.2 Virtex-II Pro


One of the most advanced FPGA families in the industry is the series produced by Xilinx. The Virtex user programmable gate array comprises two major configurable elements: configurable logic blocks (CLBs) and input/output blocks (IOBs). Each CLB is composed of two slices, as shown in Figure A.2. A slice contains 4-input, 1-output LUTs and two registers. Interconnections between these elements are configured by multiplexers controlled by SRAM cells programmed by a user's bitstream. The LUTs allow any function of five inputs, two functions of four inputs, or some functions of up to nine inputs to be created within a CLB slice. This structure allows a very powerful method of implementing arbitrary, complex digital logic.

Figure A.2: Simplified Architecture of Virtex configurable logic block.

Virtex FPGAs are programmed using Verilog HDL, a popular hardware description language. The language has capabilities to describe the behavioral nature of a design, the data flow of a design, a design's structural composition, delays and a waveform generation mechanism. Models written in this language can be verified using a Verilog simulator. As a programming and development environment, Xilinx ISE Foundation Series tools have been used to produce a physical implementation for the Virtex FPGA.

Field programmable gate arrays (FPGAs) provide a new implementation platform for the discrete wavelet transform. FPGAs maintain the advantages of the custom functionality of VLSI ASIC devices, while avoiding the high development costs and the inability to make design modifications after production. Furthermore, FPGAs inherit the design flexibility and adaptability of software implementations.

We make maximal utilization of the lookup table (LUT) architecture of Virtex FPGAs by reformulating the wavelet transform computation in accordance with the distributed arithmetic algorithm. Distributed arithmetic makes extensive use of look-up tables, which makes it ideal for implementing the discrete wavelet transform functions onto the LUT-based architecture of Virtex FPGAs. Moreover, distributed arithmetic is suitable for low power portable applications because it allows replacement of costly multipliers with shifts and look-up tables. Indeed, one of the unique features of our discrete wavelet transform implementation is exploiting the natural match between the Virtex architecture and distributed arithmetic.
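
How distributed arithmetic replaces multipliers with a look-up table and shifts can be illustrated with a small bit-serial model. The Python sketch below is a behavioural illustration under stated assumptions, not the Verilog design described in this report: the coefficients are taken to be integers (for example, scaled fixed-point filter taps), the samples are two's complement values of `nbits` bits, and the names `build_da_lut` and `da_inner_product` are hypothetical.

```python
def build_da_lut(coeffs):
    """Precompute, for every K-bit address, the sum of the selected coefficients.

    Bit i of the address selects coeffs[i]; a K-tap filter needs 2**K entries.
    """
    k = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << k)
    ]


def da_inner_product(coeffs, samples, nbits=8):
    """Compute sum(coeffs[i] * samples[i]) using only LUT reads, shifts and adds.

    Each sample must fit in an nbits-wide two's complement word.
    """
    if len(coeffs) != len(samples):
        raise ValueError("coeffs and samples must have the same length")
    lo, hi = -(1 << (nbits - 1)), (1 << (nbits - 1)) - 1
    if any(not lo <= x <= hi for x in samples):
        raise ValueError("sample out of range for the chosen word length")

    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(nbits):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> b) & 1) << i
        term = lut[addr] << b
        # The most significant bit carries negative weight in two's complement.
        acc += -term if b == nbits - 1 else term
    return acc


if __name__ == "__main__":
    taps = [3, -1, 4, 2]          # made-up integer coefficients
    data = [5, -7, 12, -128]      # made-up 8-bit samples
    print(da_inner_product(taps, data))
```

The table grows as 2**K with the number of taps K, which is why hardware implementations usually split long filters into several smaller tables whose outputs are added.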

Three more unique features are worth mentioning at this point. The first is the flexibility of the implementation, made possible by the re-programmability of FPGAs, which allows easy modification of the wavelet type. The second is that, unlike most reported implementations, which concentrate on architecture development, this implementation goes down to the actual implementation level. Finally, this work describes implementations for both the forward and inverse transforms.

A.3 INTERNAL CONFIGURATION


The basic Virtex logic element in a CLB is the slice. Two slices are present in each CLB as shown in Figure 2.6. Each slice contains 4-input, 1-output LUTs and two registers. Interconnections between these elements are configured by multiplexers controlled by SRAM cells programmed by a user's bitstream. The LUTs allow any function of five inputs, two functions of four inputs, or some functions of up to nine inputs to be created within a CLB slice. The outputs of these functions may be registered, or the registers may be used independently of the LUTs. This structure allows a very powerful method of implementing arbitrary, complex digital logic.

Figure A.3. Simplified Virtex configurable slice

A.3.1 LOOK-UP TABLE IMPLEMENTATION


Virtex slices have the ability to implement distributed memory instead of logic. Each 4-input LUT in a slice may be used to implement a 16x1 ROM or RAM, or the two LUTs may be combined to create a 32x1 ROM or RAM or a 16x1 dual-port RAM. This allows each slice to trade logic resources for memory in order to maximize the resources available for a particular application.
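
The idea of a LUT acting as a small memory can be modelled directly: the LUT contents form a 16-bit word, and the four inputs form the read address. The Python sketch below is a behavioural model only, with hypothetical names `lut4_rom` and `rom32x1`. It assumes that bit `addr` of the contents word holds the value stored at address `addr`, and that the fifth address bit selects between the two LUTs.

```python
def lut4_rom(init, addr):
    """Read one bit from a 16x1 ROM whose contents are the 16-bit word `init`."""
    if not 0 <= addr < 16:
        raise ValueError("a 4-input LUT has addresses 0..15")
    return (init >> addr) & 1


def rom32x1(init_low, init_high, addr):
    """Combine two 16x1 ROMs into a 32x1 ROM.

    Address bit 4 selects which LUT is read; bits 3..0 address within it.
    """
    if not 0 <= addr < 32:
        raise ValueError("a 32x1 ROM has addresses 0..31")
    selected = init_high if (addr >> 4) & 1 else init_low
    return lut4_rom(selected, addr & 0xF)


if __name__ == "__main__":
    # 0x8000 stores a 1 only at address 15, i.e. a 4-input AND.
    print([lut4_rom(0x8000, a) for a in range(16)])
```

Used as logic, the same 16-bit word is simply the truth table of the function; used as memory, it is the stored data.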


APPENDIX- B

VIRTEX-II PRO ARCHITECTURE


B.1 Introduction
The XUP Virtex-II Pro Development System provides an advanced hardware platform that consists of a high performance Virtex-II Pro Platform FPGA surrounded by a comprehensive collection of peripheral components that can be used to create a complex system and to demonstrate the capability of the Virtex-II Pro Platform FPGA[8].

Features
Figure-I shows the Virtex-II Trainer, which includes the following components and features:

- Virtex-II Pro FPGA with PowerPC 405 cores
- Up to 2 GB of Double Data Rate (DDR) SDRAM
- System ACE controller and Type II Compact Flash connector for FPGA configuration and data storage
- Embedded Platform Cable USB configuration port
- High-speed SelectMAP FPGA configuration from Platform Flash In-System Programmable Configuration PROM
- Support for Golden and User FPGA configuration bitstreams
- On-board 10/100 Ethernet PHY device
- Silicon Serial Number for unique board identification
- RS-232 DB9 serial port
- Two PS/2 serial ports
- Four LEDs connected to Virtex-II Pro I/O pins
- Four switches connected to Virtex-II Pro I/O pins
- Five push buttons connected to Virtex-II Pro I/O pins
- Six expansion connectors joined to 80 Virtex-II Pro I/O pins with over-voltage protection
- High-speed expansion connector joined to 40 Virtex-II Pro I/O pins that can be used differentially or single ended
- AC-97 audio CODEC with audio amplifier and speaker/headphone output and line level output
- Microphone and line level audio input
- On-board XSGA output, up to 1600 x 1200 at 70 Hz refresh
- Three Serial ATA ports, two Host ports and one Target port
- Off-board expansion MGT link, with user-supplied clock
- 100 MHz system clock, 75 MHz SATA clock
- Provision for user-supplied clock
- On-board power supplies
- Power-on reset circuitry
- PowerPC 405 reset circuitry

Block Diagram

Figure B.1: XUP Virtex-II Pro Development System Block Diagram[8]


Figure B.2: XUP Virtex-II Pro Development System Board Photo[8]


B.2 Virtex-II Pro FPGA:


U1 is a Virtex-II Pro FPGA device packaged in a flip-chip-fine-pitch FF896 BGA package. Two different capacity FPGAs can be used on the XUP Virtex-II Pro Development System with no change in functionality. Table B-1 lists the Virtex-II Pro device features.

Features                     XC2VP20   XC2VP30
Slices                       9280      13696
Array Size                   56x46     80x46
Distributed RAM              290Kb     428Kb
Multiplier Blocks            88        136
Block RAMs                   1584Kb    2448Kb
DCMs                         8         8
PowerPC RISC Cores           2         2
Multi-Gigabit Transceivers   8         8

Table B-1: XC2VP20 and XC2VP30 Device Features

Power Supplies and FPGA Configuration

The XUP Virtex-II Pro Development System is powered from a 5V regulated power supply. On-board switching power supplies generate 3.3V, 2.5V, and 1.5V for the FPGA and peripheral components, and linear regulators power the MGTs. The board has provisioning for current measurement for all of the FPGA digital power supplies, as well as for application of external power if the capacity of the on-board switching power supplies is exceeded.

The XUP Virtex-II Pro Development System provides several methods for the configuration of the Virtex-II Pro FPGA. The configuration data can originate from the internal Platform Flash PROM (two potential configurations), the internal CompactFlash storage media (eight potential configurations), and external configurations delivered from the embedded Platform Cable USB or parallel port interface.

Truth table of LUT3

I2   I1   I0   O
0    0    0    0
0    0    1    0
0    1    0    1
0    1    1    0
1    0    0    0
1    0    1    1
1    1    0    1
1    1    1    1

Table B.2: Truth table of LUT3

Figure B.3: Internal structure of a basic LUT3[9]

Figure B.4: Karnaugh Map for LUT3[9]
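
The truth table of the LUT3 can be packed into a single 8-bit contents word, one bit per input combination. The Python sketch below illustrates this. It assumes that the three input columns of Table B.2 are ordered I2, I1, I0 with I2 as the most significant address bit, and that bit `addr` of the word holds the output for address `addr`; the names `init_from_outputs` and `lut3` are hypothetical.

```python
# Output column O of Table B.2 for inputs 000, 001, ..., 111.
TRUTH_TABLE_OUTPUTS = [0, 0, 1, 0, 0, 1, 1, 1]


def init_from_outputs(outputs):
    """Pack a list of output bits into a contents word, bit `addr` = outputs[addr]."""
    init = 0
    for addr, bit in enumerate(outputs):
        init |= (bit & 1) << addr
    return init


def lut3(init, i2, i1, i0):
    """Evaluate a 3-input LUT: the inputs form an address into the contents word."""
    addr = ((i2 & 1) << 2) | ((i1 & 1) << 1) | (i0 & 1)
    return (init >> addr) & 1


if __name__ == "__main__":
    init = init_from_outputs(TRUTH_TABLE_OUTPUTS)
    print(hex(init))
    for addr in range(8):
        i2, i1, i0 = (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
        print(i2, i1, i0, lut3(init, i2, i1, i0))
```

Under the assumed column order, the table describes a 2-to-1 selection: the output equals I2 when I0 is 1 and I1 when I0 is 0.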


Figure B.5: I/O Connections to Peripheral Devices[8]

Multi-Gigabit Transceivers
Four of the eight Multi-Gigabit Transceivers (MGTs) that are present in the Virtex-II Pro FPGA are brought out to connectors and can be utilized by the user. Three of the bidirectional MGT channels are terminated at Serial Advanced Technology Attachment (SATA) connectors and the fourth channel terminates at user-supplied SubMiniature A (SMA) connectors. The MGT transceivers are equipped with a 75 MHz clock source that is independent of the system clock to support standard SATA communication. An additional MGT clock source is available through a differential user-supplied SMA connector pair. Two of the ports with SATA connectors are configured as Host ports and the third SATA port is configured as a Target port to allow for simple board-to-board networking. [8]


Figure B.6: SMA-based MGT Connections

Signal             MGT Location   PAD Name   I/O Pin   Notes
SATA_PORT0_TXN     MGT_X0Y1       TXNPAD4    A27       HOST
SATA_PORT0_TXP     MGT_X0Y1       TXPPAD4    A26
SATA_PORT0_RXN     MGT_X0Y1       RXNPAD4    A24
SATA_PORT0_RXP     MGT_X0Y1       RXPPAD4    A25
SATA_PORT0_IDLE                              B15
SATA_PORT1_TXN     MGT_X1Y1       TXNPAD6    A20       TARGET
SATA_PORT1_TXP     MGT_X1Y1       TXPPAD4    A19
SATA_PORT1_RXN     MGT_X1Y1       RXNPAD6    A17
SATA_PORT1_RXP     MGT_X1Y1       RXPPAD6    A18
SATA_PORT1_IDLE                              AK3
SATA_PORT2_TXN     MGT_X2Y1       TXNPAD7    A14       HOST
SATA_PORT2_TXP     MGT_X2Y1       TXPPAD7    A13
SATA_PORT2_RXN     MGT_X2Y1       RXNPAD7    A11
SATA_PORT2_RXP     MGT_X2Y1       RXPPAD7    A12
SATA_PORT2_IDLE                              C15
MGT_TXN            MGT_X3Y1       TXNPAD9    A7        USER
MGT_TXP            MGT_X3Y1       TXPPAD9    A6
MGT_RXN            MGT_X3Y1       RXNPAD9    A4
MGT_RXP            MGT_X3Y1       RXPPAD9    A5
MGT_CLK_N                                    G16       BREFCLK
MGT_CLK_P                                    F16
EXTERNAL_CLOCK_N                             F15       BREFCLK2
EXTERNAL_CLOCK_P                             G15

Table B.3: SATA and MGT Signals


System RAM
The XUP Virtex-II Pro Development System has provision for the installation of a user-supplied JEDEC-standard 184-pin dual in-line Double Data Rate Synchronous Dynamic RAM memory module. The board supports buffered and unbuffered memory modules with a capacity of 2 GB or less in either 64-bit or 72-bit organizations. The 72-bit organization should be used if ECC error detection and correction is required.

System ACE Compact Flash Controller


The System Advanced Configuration Environment (System ACE) Controller manages FPGA configuration data. The controller provides an intelligent interface between an FPGA target chain and various supported configuration sources. The controller has several ports: the Compact Flash port, the Configuration JTAG port, the Microprocessor (MPU) port and the Test JTAG port. The XUP Virtex-II Pro Development System supports a single System ACE Controller. The Configuration JTAG ports connect to the FPGA and front expansion connectors. The Test JTAG port connects to the JTAG port header and USB2 interface CPLD, and the MPU ports connect directly to the FPGA. [8]

Serial Ports
The XUP Virtex-II Pro Development System provides three serial ports: a single RS-232 port and two PS/2 ports. The RS-232 port is configured as a DCE with hardware handshake using a standard DB-9 serial connector. This connector is typically used for communications with a host computer using a standard 9-pin serial cable connected to a COM port. The two PS/2 ports could be used to attach a keyboard and mouse to the XUP Virtex-II Pro Development System. All of the serial ports are equipped with level-shifting circuits, because the Virtex-II Pro FPGAs cannot interface directly to the voltage levels required by RS-232 or PS/2.

User LEDs, Switches, and Push Buttons


A total of four LEDs are provided for user-defined purposes. When the FPGA drives a logic 0, the corresponding LED turns on. A single four-position DIP switch and five push buttons are provided for user input. If the DIP switch is up, closed, or on, or the push button is pressed, a logic 0 is seen by the FPGA; otherwise a logic 1 is indicated. [8]

Table B.4: System Configuration Status LEDs

Expansion Connectors
A total of 80 Virtex-II Pro I/O pins are brought out to four user-supplied 60-pin headers and two 40-pin right angle connectors for user-defined use. The 60-pin headers are designed to accept ribbon-cable connectors, with every second signal a ground for signal integrity. Some of these signals are shared with the front-mounted right-angle connectors. The front-mounted connectors support Digilent expansion modules. In addition, a high-speed connector is provided to support Digilent high-speed expansion modules. This connector provides 40 single-ended or differential I/O signals in addition to three clocks. [8]

XSGA Output
The XUP Virtex-II Pro Development System includes a video DAC and a 15-pin high-density D-sub connector to support XSGA output. The video DAC can operate with a pixel clock of up to 180 MHz. This allows for a VESA-compatible output of 1280 x 1024 at 75 Hz refresh and a maximum resolution of 1600 x 1200 at 70 Hz refresh [8].
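
The pixel clock needed for a given video format is the total number of pixel periods per frame, including blanking, multiplied by the refresh rate. The Python sketch below illustrates the calculation for the 1280 x 1024 at 75 Hz format. The frame totals of 1688 x 1066 are an assumption taken from commonly published VESA timing figures and are not given in this report; the names `pixel_clock_hz` and `fits_video_dac` are hypothetical.

```python
# Maximum pixel clock of the video DAC, as stated in the text.
MAX_DAC_CLOCK_HZ = 180_000_000

# Assumed frame totals (active + blanking) for 1280 x 1024 at 75 Hz.
H_TOTAL_1280X1024 = 1688
V_TOTAL_1280X1024 = 1066


def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixel clock = pixel periods per line x lines per frame x frames per second."""
    return h_total * v_total * refresh_hz


def fits_video_dac(h_total, v_total, refresh_hz, limit_hz=MAX_DAC_CLOCK_HZ):
    """Return True if the format's pixel clock is within the DAC limit."""
    return pixel_clock_hz(h_total, v_total, refresh_hz) <= limit_hz


if __name__ == "__main__":
    clk = pixel_clock_hz(H_TOTAL_1280X1024, V_TOTAL_1280X1024, 75)
    print(f"{clk / 1e6:.2f} MHz, fits DAC: {fits_video_dac(1688, 1066, 75)}")
```

With the assumed totals, the required clock is about 135 MHz, which is within the 180 MHz limit of the DAC.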

DCM and XSGA Controller Settings for Various XSGA Formats

Table B.5: DCM and XSGA Controller settings for various XSGA Formats

USB 2 Programming Interface


The XUP Virtex-II Pro Development System includes an embedded USB 2.0 microcontroller capable of communications with either high-speed (480 Mb/s) or full-speed (12 Mb/s) USB hosts. This interface is used for programming or configuring the Virtex-II Pro FPGA in Boundary-Scan (IEEE 1149.1/IEEE 1532) mode. Target clock speeds are selectable from 750 kHz to 24 MHz. The USB 2.0 microcontroller attaches to a desktop or laptop PC with an off-the-shelf high-speed A-B USB cable [8].

Table B.6: XSGA Output Connections

Using the CPU Debug Port and CPU Reset


The CPU Debug port (J36) is a right angle header that provides connections to the debugging resources of the PowerPC 405 CPU core [8]. The PowerPC 405 CPU cores include dedicated debug resources that support a variety of debug modes for debugging during hardware and software development. These debug resources include:

- Internal debug mode for use by ROM monitors and software debuggers
- External debug mode for use by JTAG debuggers
- Debug wait mode, which allows the servicing of interrupts while the processor appears to be stopped
- Real-time trace mode, which supports event triggering for real time tracing

Debug modes and events are controlled using debug registers in the processor. The debug registers are accessed either through software running on the processor or through the JTAG port. The debug modes, events, controls, and interfaces provide a powerful combination of debug resources for hardware and software development tools.

The JTAG port interface supports the attachment of external debug tools, such as the ChipScope Integrated Logic Analyzer, a powerful tool providing logic analyzer capabilities for signals inside an FPGA without the need for expensive external instrumentation. Using the JTAG test access port, a debug tool can single-step the processor and examine the internal processor state to facilitate software debugging. This capability complies with standard JTAG hardware for boundary scan system testing.

External debug mode can be used to alter normal program execution. It provides the ability to debug system hardware as well as software. The mode supports multiple functions: starting and stopping the processor, single-stepping instruction execution, setting breakpoints, and monitoring processor status. Access to processor resources is provided through the CPU Debug Port.

The PPC405 JTAG Debug Port supports the four required JTAG signals: CPU_TCK, CPU_TMS, CPU_TDO, and CPU_TDI. It also implements the optional CPU_TRST signal. The frequency of the JTAG clock signal, CPU_TCK, can range from 0 MHz up to one-half of the processor clock frequency. The JTAG debug port logic is reset at the same time the system is reset, using the CPU_TRST signal. When CPU_TRST is asserted, the JTAG TAP controller returns to the test-logic reset state.

Figure B.7: CPU Debug Connector Pinouts

Figure B.7 shows the pinout of the header used to debug the operation of software in the CPU. This is accomplished using debug tools, such as the Xilinx Parallel Cable IV or third party tools. The JTAG debug resources are not hardwired to specific pins and are available for attachment in the FPGA fabric, making it possible to route these signals to whichever FPGA pins the user prefers to use. The signal-pin connections used on the XUP Virtex-II Pro Development System are identified in Table B.7 along with the recommended I/O characteristics. Level shifting circuitry is provided for all signals to convert from the 3.3V levels at the connector to the 2.5V levels at the FPGA. [8]

Table B.7: CPU Debug Port Connections and CPU Reset

The RESET_RELOAD pushbutton (SW1) provides two different functions depending on how long the switch is held down. If the switch is activated for more than 2 seconds, the XUP Virtex-II Pro Development System undergoes a complete reset and reloads the selected configuration. If, however, the switch is activated for less than 2 seconds, a processor reset pulse of 100 microseconds is applied to the PROCESSOR_RESET_Z signal.[8]
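The two functions of the RESET_RELOAD pushbutton can be summarised as a simple decision on the press duration. The following is a minimal sketch, not the board's actual reset logic: the function name `reset_action` and the returned labels are hypothetical, and because the text does not say what happens for a press of exactly 2 seconds, the sketch assumes it is treated as a short press.

```python
# Illustrative model of the RESET_RELOAD (SW1) behaviour described above.
RELOAD_THRESHOLD_S = 2.0        # presses longer than this trigger a full reload
PROC_RESET_PULSE_S = 100e-6     # processor reset pulse width: 100 microseconds


def reset_action(press_duration_s):
    """Return (action, pulse_width_s) for a given press duration in seconds.

    Assumption: a press of exactly 2 seconds is treated as a short press,
    since the source text only covers "more than" and "less than" 2 seconds.
    """
    if press_duration_s < 0:
        raise ValueError("press duration cannot be negative")
    if press_duration_s > RELOAD_THRESHOLD_S:
        # Complete system reset; the selected configuration is reloaded.
        return ("full_reset_and_reload", None)
    # Short press: only the processor is reset via PROCESSOR_RESET_Z.
    return ("processor_reset_pulse", PROC_RESET_PULSE_S)


if __name__ == "__main__":
    print(reset_action(0.3))
    print(reset_action(2.5))
```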

Configuring the FPGA:


At power up, or when the RESET_RELOAD pushbutton (SW1) is pressed for longer than 2 seconds, the FPGA begins to configure. The two configuration methods supported, JTAG and master SelectMAP, are determined by the CONFIG SOURCE switch, the most significant switch (left side) of SW9. If the CONFIG SOURCE switch is closed, on, or up, a high-speed SelectMAP byte-wide configuration from the on-board Platform Flash configuration PROM (U3) is selected as the configuration source. This is identified to the user through the illumination of the PROM CONFIG LED (D19).

The Platform Flash configuration PROM supports two different FPGA configurations (versions) selected by the position of the PROM VERSION switch, the least significant switch (right side) of SW9. If the PROM VERSION switch is closed, on, or up, the GOLDEN configuration from the on-board Platform Flash configuration PROM is selected as the configuration data. This is identified to the user through the illumination of the GOLDEN CONFIG LED (D14). This configuration can be a board test utility provided by Xilinx, or another safe default configuration. It is important to note that the PROM VERSION switch is only sampled on board power-up and after a complete system reset. This means that if this switch is changed after board power-up, the RESET_RELOAD pushbutton (SW1) must be pressed for more than 2 seconds for the new state of the switch to be recognized.

If the PROM VERSION switch is open, off, or down, a User configuration from the on-board Platform Flash configuration PROM is selected as the configuration data. This configuration must be programmed into the Platform Flash PROM from the JTAG Platform Cable USB interface or the USB interface.

The Platform Flash is normally disabled after the FPGA has finished configuring and has asserted the DONE signal. If additional data is to be made available to the FPGA after the completion of configuration, jumper JP9 must be moved from the NORMAL to the EXTENDED position to permanently enable the PROM and allow the FPGA to clock out the additional data using the FPGA_PROM_CLOCK signal.

If the CONFIG SOURCE switch is open, off, or down, a lower speed JTAG-based configuration from Compact Flash or an external JTAG source is selected as the configuration source. This is identified to the user through the illumination of the JTAG CONFIG LED (D20).
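The effect of the two SW9 switches described above can be written out as a small decision function. This is only a sketch for orientation: the function name `decode_sw9` and the dictionary layout are hypothetical, `True` stands for "closed, on, or up", and it is an assumption of this sketch that the PROM VERSION switch has no effect (and the GOLDEN CONFIG LED stays off) when JTAG-based configuration is selected.

```python
# Illustrative decode of the SW9 configuration switches (not board firmware).
def decode_sw9(config_source_on, prom_version_on):
    """Map the CONFIG SOURCE and PROM VERSION switch positions to a description.

    True means the switch is closed/on/up, False means open/off/down.
    """
    if config_source_on:
        # High-speed byte-wide SelectMAP configuration from the Platform Flash PROM.
        leds = ["PROM CONFIG (D19)"]
        if prom_version_on:
            image = "GOLDEN"
            leds.append("GOLDEN CONFIG (D14)")
        else:
            image = "USER"
        return {
            "method": "master SelectMAP",
            "source": "Platform Flash PROM (U3)",
            "image": image,
            "leds": leds,
        }
    # Lower-speed JTAG-based configuration; assumption: PROM VERSION is ignored here.
    return {
        "method": "JTAG",
        "source": "Compact Flash or external JTAG",
        "image": None,
        "leds": ["JTAG CONFIG (D20)"],
    }


if __name__ == "__main__":
    for source_on in (True, False):
        for version_on in (True, False):
            print(source_on, version_on, decode_sw9(source_on, version_on))
```

Note that on the real board the PROM VERSION switch is sampled only at power-up and after a complete system reset, so a model like this describes the state latched at that moment rather than the live switch position.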
The JTAG-based configuration can originate from several sources: the Compact Flash card, a PC4 cable connection through J27, and a USB-to-PC connection through J8, the embedded Platform Cable USB interface. If a JTAG-based configuration is selected, the default source is the Compact Flash port (J7). The System ACE controller checks the associated Compact Flash socket and storage device for the existence of configuration data. If configuration data exists on the storage device, the storage device becomes the source for the configuration data. The file structure on the Compact Flash storage device supports up to eight different configuration data files, selected by the triple CF CONFIG SELECT DIP switch (SW8). During JTAG configuration, the SYSTEMACE STATUS LED (D12) flashes until the configuration process is completed, and the FPGA asserts the FPGA_DONE signal and illuminates the DONE LED (D4). At any time, the RESET_RELOAD pushbutton (SW1) can be used to load any of the eight different configuration data files by pressing the switch for more than 2 seconds.

If a JTAG-based configuration is selected and a valid configuration file is not found on the Compact Flash card by the System ACE controller (U2), the SYSTEMACE ERROR LED (D11) flashes, and the System ACE controller connects to an external JTAG port for FPGA configuration. The default external source for FPGA configuration is the high-speed embedded Platform Cable USB configuration port (J8), which is enabled when the System ACE controller does not find configuration data on the storage device. If a USB-equipped host PC is not available as a configuration source, a Parallel Cable 4 (PC4) interface can be used instead by connecting a PC4 cable to J27.

If the Platform Flash configuration PROM is enabled, the FPGA Start-Up Clock should be set to CCLK in the Startup Options section of the Process Options for the generation of the programming file; otherwise, JTAG Clock should be selected.[8]
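The fallback order for JTAG-based configuration and the Start-Up Clock rule can be sketched as follows. The names `resolve_jtag_source` and `startup_clock` are hypothetical, and the mapping from the three SW8 switch positions to a file index (most significant bit first, on = 1) is an assumption of this sketch, since the text only states that the three switches select one of eight files.

```python
# Illustrative model of JTAG configuration source selection (not System ACE firmware).
def resolve_jtag_source(cf_has_valid_file, cf_select_bits=(0, 0, 0),
                        usb_host_available=True):
    """Return (source, file_index) for a JTAG-based configuration.

    cf_select_bits models the three CF CONFIG SELECT switches (SW8).
    Assumption: bits are given MSB first and on = 1; the real polarity
    and ordering are not specified in the text.
    """
    if cf_has_valid_file:
        if len(cf_select_bits) != 3 or any(b not in (0, 1) for b in cf_select_bits):
            raise ValueError("cf_select_bits must be three values of 0 or 1")
        b2, b1, b0 = cf_select_bits
        index = (b2 << 2) | (b1 << 1) | b0   # one of eight files, 0..7
        return ("Compact Flash (J7)", index)
    # No valid file on the card: System ACE hands over to an external JTAG port.
    if usb_host_available:
        return ("Platform Cable USB (J8)", None)
    return ("PC4 cable (J27)", None)


def startup_clock(prom_enabled):
    """Start-Up Clock setting to use when generating the programming file."""
    return "CCLK" if prom_enabled else "JTAG Clock"


if __name__ == "__main__":
    print(resolve_jtag_source(True, (1, 0, 1)))
    print(resolve_jtag_source(False))
    print(resolve_jtag_source(False, usb_host_available=False))
    print(startup_clock(True), startup_clock(False))
```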

Figure B.8: Configuration data path


Table B.8: System Configuration Status LEDs

Four status LEDs show the configuration state of the XUP Virtex-II Pro Development System at all times. From the status LEDs listed in Table B.8, the user can see the configuration source and configuration version, and can tell when configuration has completed.


References
[1] Rafael C. Gonzalez (University of Tennessee) and Richard E. Woods (MedData Interactive), Digital Image Processing, 3rd edition, Pearson Prentice Hall, 2009.

[2] Sonja Grgic, Mislav Grgic, and Branka Zovko-Cihlar, "Performance Analysis of Image Compression Using Wavelets," IEEE Transactions on Industrial Electronics, vol. 48, no. 3, June 2001.

[3] JPEG official website, www.jpeg.org/jpeg2000.html

[4] Sonja Grgic, Mislav Grgic, and Branka Zovko-Cihlar, "Performance Analysis of Image Compression Using Wavelets," IEEE Transactions on Industrial Electronics, vol. 48, no. 3, June 2001.

[5] Xixin Cao and Qingqing Xie, "An Efficient VLSI Implementation of Distributed Architecture for DWT," School of Software and Microelectronics, Peking University, Beijing, China.

[6] Matlab support for Image Compression, http://www.mathworks.nl/matlabcentral/fileexchange/4772

[7] http://www.support.xilinx.com/support/techsup/tutorials

[8] Virtex-II Pro Datasheet, http://www.xilinx.com/support/documentation/virtexii_pro_data_sheets.htm

[9] Xilinx XST software toolbar help.
