


Digital imagery has had an enormous impact on industrial and scientific applications, so it is no surprise that image coding has been a subject of great commercial interest. Generally, an image is a positive function on a plane. The value of this function at each point specifies the luminance, or brightness, of the picture at that point. Digital images are sampled versions of such functions, where the value of the function is specified only at discrete locations on the image plane, known as pixels. The luminance at each pixel is represented to a predefined precision of M bits. Eight bits of precision for luminance is common in imaging applications; this precision is motivated both by existing memory structures (1 byte = 8 bits) and by the dynamic range of the human eye. By convention, the samples (pixels) reside on a rectangular lattice, which for convenience will be assumed to be an N x N matrix. The brightness value at each pixel is a number between 0 and 2^M − 1. The simplest binary representation of such an image is a list of the brightness values at each pixel, a list containing N²M bits.
In some image processing applications, exact reproduction of the image bits is not necessary. In this case, one can perturb the image slightly to obtain a shorter representation. Such a coding procedure, in which perturbations reduce storage requirements, is known as lossy image coding. In other applications, exact reproduction of the image is necessary; this is known as lossless compression. In both cases, the image is usually compressed before transmission to reduce transmission time and improve the overall efficiency of the system.
Uncompressed image data requires considerable storage capacity and transmission bandwidth. Despite rapid progress in mass-storage density, processor speeds, and digital communication system performance, demand for data storage capacity and data-transmission bandwidth continues to outstrip the capabilities of available technologies. The recent growth of data-intensive, multimedia-based web applications has sustained the need for more efficient ways to compress images.
Wavelet coding has been recognized as an efficient technique for image compression. The wavelet transform decomposes typical image data into a few coefficients with large magnitude and many coefficients with small magnitude. Since most of the energy of the image is concentrated in the large-magnitude coefficients, lossy compression systems that keep only those coefficients can achieve both a high compression ratio and good reconstructed image quality at the same time [3].
Efficient transmission of data over a channel remains a challenge, because transmission speed and clarity affect each other: as the transmission speed increases, the transmitted bits become more likely to be corrupted. This problem is more severe for image transmission, since images are represented by large matrices, and transmitting such large matrices at high speed tends to result in poor clarity at the receiver. To overcome this problem, the image should be suitably transformed and compressed before transmission.
The main objective of this project is to use the wavelet transform for lossless data compression. In the proposed method, the lifting scheme, which is the latest implementation method of the wavelet transform, is adopted for lossless compression. In existing lossless compression methods, either the bit rate must be compromised to obtain better accuracy, or accuracy must be compromised to obtain a better bit rate. To overcome this problem, the lifting scheme is proposed for lossless compression: using the lifting scheme, better accuracy can be achieved without compromising the bit rate.
The wavelet transform, which decomposes a signal into constituent parts in the time-frequency domain, has been successful in providing high compression ratios while maintaining good image quality. It is well known that, at low bit rates, wavelets can achieve compression with a higher signal-to-noise ratio than the DCT (discrete cosine transform). Accordingly, JPEG2000, the international standard that succeeds JPEG, is based on wavelet transforms. However, the conventional wavelet transform needs additional mechanisms to perform lossless compression, in which the recovered data is identical to the original.
In contrast, the lifting scheme (LS), which is the latest implementation method of the wavelet transform, does not have such problems and is very well suited to lossless compression. It can decorrelate the target data in the spatial domain without using the Fourier transform. The main advantages of LS are:
(a) a simple and fast procedure,
(b) easy handling of integers, and
(c) an easily obtained inverse transform.
Advantage (a) means that LS is exceedingly suitable for hardware implementation, because it uses only additions and multiplications and requires little memory for calculations.
From the viewpoint of (b), the conventional wavelet transform has a problem mapping integers to integers: it needs an additional mechanism to cancel the rounding error for lossless compression. LS, on the other hand, is well suited to lossless compression, because it does not require such mechanisms to handle integer data.
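The last advantage (c) makes LS useful in practical implementations: the inverse transform is obtained simply by performing the lifting steps in reverse order with the signs of the additions flipped.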
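As a minimal sketch of advantages (a) to (c), assuming the simplest case, an integer-to-integer Haar transform (often called the S-transform) written in lifting form: the predict step stores the difference of each pixel pair, the update step adds half of it back to the even sample using a floor shift, and the inverse replays the same steps backwards. The function names are illustrative only and are not taken from the project.

```python
from typing import List, Tuple

def haar_lift_forward(x: List[int]) -> Tuple[List[int], List[int]]:
    """Integer Haar (S-transform) via lifting; len(x) is assumed to be even."""
    even, odd = x[0::2], x[1::2]                    # split
    d = [o - e for e, o in zip(even, odd)]          # predict: odd predicted by its even neighbour
    s = [e + (di >> 1) for e, di in zip(even, d)]   # update: s = floor((even + odd) / 2)
    return s, d

def haar_lift_inverse(s: List[int], d: List[int]) -> List[int]:
    """Undo the steps in reverse order with flipped signs."""
    even = [si - (di >> 1) for si, di in zip(s, d)]  # undo update (same term, opposite sign)
    odd = [di + e for di, e in zip(d, even)]         # undo predict
    x = [0] * (2 * len(s))
    x[0::2], x[1::2] = even, odd                     # merge
    return x

pixels = [5, 3, 200, 201, 0, 255]
s, d = haar_lift_forward(pixels)
print(s, d)  # [4, 200, 127] [-2, 1, 255]
assert haar_lift_inverse(s, d) == pixels
```

Only integer additions, subtractions and shifts appear, so the round trip is exact without any separate rounding-error correction.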
The lifting scheme is realized in a modular approach with three main modules, namely the split module, the predict module, and the update module. These modules are integrated into the image compression system to perform lossless compression.
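The three modules can be sketched as separate functions. This is a sketch under stated assumptions: it uses the reversible integer 5/3 (LeGall) lifting filter known from JPEG2000 Part 1 with simple symmetric extension at the signal borders, whereas the document itself does not say which predictor and updater the project uses; all function names are hypothetical.

```python
from typing import List, Tuple

def split(x: List[int]) -> Tuple[List[int], List[int]]:
    """Split module: separate even- and odd-indexed samples (len(x) >= 2 assumed)."""
    return x[0::2], x[1::2]

def _pred(even: List[int], n: int) -> int:
    # floor of the mean of the two even neighbours; mirror at the right border
    right = even[n + 1] if n + 1 < len(even) else even[n]
    return (even[n] + right) >> 1

def _upd(d: List[int], n: int) -> int:
    # floor((d[n-1] + d[n] + 2) / 4); mirror at both borders
    left = d[n - 1] if n > 0 else d[0]
    cur = d[n] if n < len(d) else d[-1]
    return (left + cur + 2) >> 2

def predict(even: List[int], odd: List[int]) -> List[int]:
    """Predict module: detail = odd sample minus its prediction from the even samples."""
    return [odd[n] - _pred(even, n) for n in range(len(odd))]

def update(even: List[int], d: List[int]) -> List[int]:
    """Update module: correct the even samples with the details to get the approximation."""
    return [even[n] + _upd(d, n) for n in range(len(even))]

def forward_53(x: List[int]) -> Tuple[List[int], List[int]]:
    even, odd = split(x)
    d = predict(even, odd)
    return update(even, d), d

def inverse_53(s: List[int], d: List[int]) -> List[int]:
    even = [s[n] - _upd(d, n) for n in range(len(s))]      # undo update
    odd = [d[n] + _pred(even, n) for n in range(len(d))]   # undo predict
    x = [0] * (len(s) + len(d))
    x[0::2], x[1::2] = even, odd                           # merge (inverse of split)
    return x

ramp = [10, 12, 14, 16, 18, 20, 22, 24]
s, d = forward_53(ramp)
print(s, d)  # [10, 14, 18, 23] [0, 0, 0, 2]
assert inverse_53(s, d) == ramp
```

On the linear ramp the predict step leaves zero details everywhere except at the mirrored border, which illustrates the decorrelation described above; reversibility holds for any integer input because the inverse recomputes exactly the same integer terms.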
In chapter one, we give a brief introduction to image compression, the problems that arise in image compression, and a survey of relevant work. In chapter two, an overview of the proposed system is given, together with a brief description of image compression and its types, i.e. lossy and lossless image compression and the different methods within each. Wavelet compression is then discussed and an overview of the lifting scheme is given. In chapter three, the design of the system is discussed, and the lifting algorithm is explained in detail, including its split, predict and update phases. In chapter four, the implementation of the system is described.



Compression is a way of encoding digital data so that it takes up less storage space and requires less network bandwidth to be transmitted. To save bandwidth, image files are compressed before being transmitted over a network. This is helpful when sending e-mail or downloading information from the Internet, because compressed files take up less space and take less time to transfer. Compression eliminates unnecessary information, such as empty fields and redundant data. A compressed file must be expanded to its original size, or a size close to it, before it can be used. Whether one needs to send images over the Internet or squeeze multiple files into a single file for backup and storage, image compression makes the images smaller and can make the applications that handle them run much faster.
There are two basic types of compression: lossy compression and lossless compression. Efficient compression is one of the important aspects of image storage. For example, a 1024 pixel x 1024 pixel x 24 bit image without compression would require 3 MB of storage and about 7 minutes for transmission over a high-speed, 64 kbit/s ISDN line. If the image is compressed at a 10:1 compression ratio, the storage requirement is reduced to about 300 KB and the transmission time drops to about 40 seconds. Seven 1 MB images can be compressed and transferred to a floppy disk in less time than it takes to send one of the original files, uncompressed, over an AppleTalk network.
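The figures in this example can be checked with a few lines of arithmetic. This sketch assumes that 64 kbit/s means 64,000 bit/s and ignores protocol overhead; it shows that the 7-minute figure is a rounded 6.6 minutes and that the 10:1 compressed transfer takes about 40 seconds.

```python
def transmission_time_s(size_bits: float, rate_bit_s: float) -> float:
    """Ideal transfer time in seconds: size / rate (no protocol overhead)."""
    return size_bits / rate_bit_s

raw_bits = 1024 * 1024 * 24        # 1024 x 1024 pixels, 24 bits each
raw_bytes = raw_bits // 8          # 3,145,728 bytes = 3 MB (binary megabytes)
isdn_rate = 64_000                 # 64 kbit/s ISDN line

t_raw = transmission_time_s(raw_bits, isdn_rate)
t_compressed = transmission_time_s(raw_bits / 10, isdn_rate)  # 10:1 compression
print(f"storage: {raw_bytes} bytes, compressed: {raw_bytes / 10:.0f} bytes")
print(f"uncompressed: {t_raw / 60:.1f} min, compressed: {t_compressed:.1f} s")
```

This prints roughly 6.6 minutes for the raw image and 39.3 seconds for the compressed one.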
In a distributed environment large image files remain a major bottleneck within
systems. Compression is an important component of the solutions available for creating
file sizes of manageable and transmittable dimensions. Increasing the bandwidth is
another method, but the cost sometimes makes this a less attractive solution. Platform
portability and performance are important in the selection of the
compression/decompression technique to be employed. Compression solutions today are
more portable due to the change from proprietary high-end solutions to accepted and
implemented international standards.
Two categories of data compression algorithms can be distinguished: lossless and lossy. Lossy techniques cause image quality degradation in each compression/decompression step. Careful consideration of human visual perception ensures that the degradation is often unrecognizable, though this depends on the selected compression ratio. In general, lossy techniques provide far greater compression ratios than lossless techniques.
The storage requirement for uncompressed video is 23.6 Megabytes/second (512
pixels x 512 pixels x 3 bytes/pixel x 30 frames/second). With MPEG compression, full-
motion video can be compressed down to 187 kilobytes/second at a small sacrifice in
quality. As computer graphics attain higher resolution and image processing applications
require higher intensity resolution (more bits per pixel), the need for image compression
will increase. Medical imagery is a prime example of images increasing in both spatial
resolution and intensity resolution. Although humans don't need more than 8 bits per
pixel to view gray scale images, computer vision can analyze data of much higher
intensity resolutions.
Compression ratios are commonly used in discussions of data compression. A compression ratio is simply the size of the original data divided by the size of the compressed data. A technique that compresses a 1 megabyte image to 100 kilobytes has achieved a compression ratio of 10:

$$\text{compression ratio} = \frac{\text{original data}}{\text{compressed data}} = \frac{1\ \text{MB}}{100\ \text{KB}} = 10.0$$

For a given image, the greater the compression ratio, the smaller the final image will be.
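As a small worked example, assuming decimal units (1 MB = 1,000,000 bytes, 1 KB = 1,000 bytes), the same formula reproduces the ratio of 10 and also gives the ratio implied by the MPEG video figures quoted above.

```python
def compression_ratio(original_size: float, compressed_size: float) -> float:
    """Compression ratio = original size / compressed size (same units for both)."""
    if compressed_size <= 0:
        raise ValueError("compressed size must be positive")
    return original_size / compressed_size

# 1 MB image compressed to 100 KB
print(compression_ratio(1_000_000, 100_000))  # 10.0

# Uncompressed video: 512 x 512 pixels x 3 bytes x 30 frames per second
video_rate = 512 * 512 * 3 * 30    # 23,592,960 bytes/s, about 23.6 MB/s
mpeg_rate = 187_000                # 187 kilobytes/s after MPEG compression
print(round(compression_ratio(video_rate, mpeg_rate)))  # 126
```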


There are two basic types of image compression: lossless compression and lossy compression. A lossless scheme encodes and decodes the data perfectly, and the resulting image matches the original image exactly. There is no degradation in the process; no data is lost.
Lossy compression schemes allow redundant and nonessential information to be lost. Typically with lossy schemes there is a tradeoff between compression and image quality. You may be able to compress an image down to an incredibly small size, but the result may look so poor that it isn't worth the trouble. Though not always the case, lossy compression techniques are typically more complex and require more computations.
Lossy image compression schemes remove data from an image that the human eye
wouldn't notice. This works well for images that are meant to be viewed by humans. If
the image is to be analyzed by a machine, lossy compression schemes may not be
appropriate. Computers can easily detect the information loss that the human eye may
not. The goal of lossy compression is that the final decompressed image be visually
lossless. Hopefully, the information removed from the image goes unnoticed by the
human eye.
Many people associate huge degradations with lossy image compression. What they don't realize is that most of the degradations are small, if noticeable at all. The entire imaging operation is lossy: scanning or digitizing the image is a lossy process, and displaying an image on a screen or printing a hardcopy is lossy. The goal is to keep the losses indistinguishable.
Which compression technique to use depends on the image data. Some images, especially those used for medical diagnosis, cannot afford to lose any data, so a lossless compression scheme must be used. Computer-generated graphics with large areas of the same color compress well with simple lossless schemes like run length encoding or LZW. Continuous-tone images with complex shapes and shading require a lossy compression technique to achieve a high compression ratio. Images with a high degree of detail that can't be lost, such as detailed CAD drawings, cannot be compressed with lossy methods.
When choosing a compression technique, you must look at more than the achievable compression ratio. The compression ratio alone tells you nothing about the quality of the resulting image. Other things to consider are the compression/decompression time, algorithm complexity, cost and availability of computational resources, and how standardized the technique is. If you use a compression method that achieves fantastic compression ratios but you are the only one using it, you will be limited in your applications. If your images need to be viewed by any hospital in the world, you had better use a standardized compression technique and file format. If the compression/decompression will be limited to one system or set of systems, you may wish to develop your own algorithm. The algorithms presented in this chapter can be used like recipes in a cookbook; you may wish to draw different aspects from different algorithms and optimize them for your specific application.

Figure: A typical data compression system (labels: input stream; data storage or transmission).

Before presenting the compression algorithms, we need to define a few terms used in the data compression world. A character is a fundamental data element in the input stream; it may be a single letter of text or a pixel in an image file. Strings are sequences of characters. The input stream is the source of the uncompressed data to be compressed; it may be a data file or some communication medium. Code words are the data elements used to represent the input characters or character strings. The term encoding is also used to mean compressing and, as expected, decoding and decompressing are the opposite terms. In many of the following discussions, ASCII strings are used as the data set. The data objects used in compression could be text, binary data, or, in our case, pixels, but it is easy to follow a text string through compression and decompression examples.


Lossless coding guarantees that the decompressed image is absolutely identical to the image before compression. This is an important requirement for some application domains, e.g. medical imaging, where not only is high quality in demand, but unaltered archiving is a legal requirement. Lossless techniques can also be used for the compression of other data types where loss of information is not acceptable, e.g. text documents and program executables. Some compression methods can be made more effective by adding 1D or 2D delta coding to the compression process. Storing deltas makes run length encoding more effective, produces (statistically) higher maxima in the code tables (leading to better results in Huffman and general entropy coding), and creates larger areas of equal value usable for area coding.
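A minimal sketch of 1D delta coding on a single image row (the function names are illustrative): a smooth gradient, which contains no runs at all, turns into runs of equal deltas that run length or entropy coding can exploit, and a running sum restores the row losslessly.

```python
from itertools import accumulate
from typing import List

def delta_encode(row: List[int]) -> List[int]:
    """Keep the first value, then store the difference to the predecessor."""
    return row[:1] + [b - a for a, b in zip(row, row[1:])]

def delta_decode(deltas: List[int]) -> List[int]:
    """Inverse: the running sum of the deltas restores the original row exactly."""
    return list(accumulate(deltas))

row = [100, 102, 104, 106, 108, 108, 108]
print(delta_encode(row))  # [100, 2, 2, 2, 2, 0, 0]
assert delta_decode(delta_encode(row)) == row
```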

Some of these methods can easily be modified to be lossy. A lossy element fits naturally into the 1D/2D run length search, and logarithmic quantisation may also be inserted to provide better or more effective results.

2.3.1 Run length encoding

Run length encoding is a very simple method for compressing sequential data. It takes advantage of the fact that, in many data streams, consecutive tokens are often identical. Run length encoding checks the stream for such repetitions and inserts a special token each time a chain of more than two equal input tokens is found. This special token instructs the decoder to insert the following token n times into its output stream. Run length coding is easily implemented, either in software or in hardware. It is fast and its correctness is easy to verify, but its compression ability is very limited.
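A minimal run length coder following the description above. It assumes the tokens are non-negative integers (e.g. 8-bit pixel values), so the hypothetical escape token ESC = -1 can never be confused with data; a byte-oriented coder would need an escaping rule instead. A run of three or more equal tokens is replaced by the triple (ESC, n, token), while shorter runs are copied literally.

```python
from typing import List

ESC = -1      # hypothetical escape token, assumed never to occur in the input
MIN_RUN = 3   # runs of more than two equal tokens are replaced

def rle_encode(data: List[int]) -> List[int]:
    out: List[int] = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                             # find the end of the current run
        run = j - i
        if run >= MIN_RUN:
            out.extend([ESC, run, data[i]])    # "insert data[i] run times"
        else:
            out.extend(data[i:j])              # short runs are copied literally
        i = j
    return out

def rle_decode(code: List[int]) -> List[int]:
    out: List[int] = []
    i = 0
    while i < len(code):
        if code[i] == ESC:
            n, token = code[i + 1], code[i + 2]
            out.extend([token] * n)
            i += 3
        else:
            out.append(code[i])
            i += 1
    return out

row = [0, 0, 0, 0, 0, 7, 7, 255, 255, 255]
print(rle_encode(row))  # [-1, 5, 0, 7, 7, -1, 3, 255]
assert rle_decode(rle_encode(row)) == row
```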

2.3.2 Entropy coding (Lempel/Ziv)

Nowadays, there is a wide range of so-called modified Lempel/Ziv codings. These algorithms all work in a similar way. The coder and the decoder both build up an equivalent dictionary of meta symbols, each of which represents a whole sequence of input tokens. If a sequence is repeated after a symbol has been created for it, only the symbol becomes part of the coded data, and the decoder later replaces the symbol with the sequence of tokens it references. Because the dictionary is built up from the data itself, it does not need to be included in the coded data, as must be done with the tables in a Huffman coder.
This method is efficient on a wide variety of data, although, like any lossless coder, it cannot compress truly random data. The average compression on text and program data is about 1:2, and the ratio on image data comes up to 1:8 on the average GIF image. Here again, a high level of input noise degrades the efficiency significantly. Entropy coders are a little tricky to implement, as there are usually a few tables, all growing while the algorithm runs. LZ coding was subject to patents owned by IBM and Unisys (formerly Sperry).
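LZW is one member of the Lempel/Ziv family; the sketch below uses it to show how coder and decoder grow identical dictionaries without ever transmitting them. Assumptions: byte input, an unbounded dictionary, and output codes kept as Python integers rather than packed into variable-width bit fields as GIF does.

```python
from typing import Dict, List

def lzw_encode(data: bytes) -> List[int]:
    table: Dict[bytes, int] = {bytes([i]): i for i in range(256)}  # all single bytes
    w = b""
    out: List[int] = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in table:
            w = wc                      # keep extending the current match
        else:
            out.append(table[w])        # emit code of the longest known sequence
            table[wc] = len(table)      # new meta symbol, rebuilt identically by the decoder
            w = bytes([byte])
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes: List[int]) -> bytes:
    if not codes:
        return b""
    table: Dict[int, bytes] = {i: bytes([i]) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            entry = w + w[:1]           # code created by the encoder in this very step
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        table[len(table)] = w + entry[:1]  # mirror the encoder's dictionary growth
        w = entry
    return b"".join(out)

text = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(text)
print(len(text), "bytes ->", len(codes), "codes")
assert lzw_decode(codes) == text
```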

2.3.3 Area coding

Area coding is an enhanced form of run length coding, reflecting the two-dimensional character of images. This is a significant advance over the other lossless methods. For coding an image, it does not make much sense to interpret it as a sequential stream, as it is in fact an array of sequences building up a two-dimensional object. Therefore, as the two dimensions are independent and of equal importance, a coding scheme aware of this clearly has some advantages. The algorithms for area coding try to find rectangular regions with the same characteristics. These regions are coded in a descriptive form as an element with two points and a certain structure. The whole input image has to be described in this form to allow lossless decoding afterwards.

The possible performance of this coding method is limited mostly by the very high complexity of the task of finding the largest areas with the same characteristics. Practical implementations use recursive algorithms that reduce the whole area to equal-sized sub-rectangles until a rectangle fulfils the criterion of having the same characteristic for every pixel.
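One common recursive strategy of this kind is a quadtree split. The sketch below assumes a square image whose side is a power of two and takes "same characteristic" to mean "identical pixel value"; each uniform block is stored as (row, column, size, value), and the hypothetical decoder simply paints the blocks back.

```python
from typing import List, Tuple

Image = List[List[int]]
Block = Tuple[int, int, int, int]  # (row, col, size, value)

def is_uniform(img: Image, r: int, c: int, size: int) -> bool:
    """True if every pixel of the size x size square at (r, c) has the same value."""
    v = img[r][c]
    return all(img[i][j] == v for i in range(r, r + size) for j in range(c, c + size))

def area_encode(img: Image, r: int = 0, c: int = 0, size: int = 0) -> List[Block]:
    """Recursively split into four equal sub-squares until each one is uniform."""
    size = size or len(img)
    if is_uniform(img, r, c, size):   # a 1 x 1 square is always uniform, so recursion stops
        return [(r, c, size, img[r][c])]
    h = size // 2
    blocks: List[Block] = []
    for dr, dc in ((0, 0), (0, h), (h, 0), (h, h)):
        blocks += area_encode(img, r + dr, c + dc, h)
    return blocks

def area_decode(blocks: List[Block], n: int) -> Image:
    """Paint every block back into an n x n image (lossless)."""
    img = [[0] * n for _ in range(n)]
    for r, c, size, v in blocks:
        for i in range(r, r + size):
            for j in range(c, c + size):
                img[i][j] = v
    return img

img = [[1, 1, 0, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 5]]
blocks = area_encode(img)
print(len(blocks), "blocks for", len(img) ** 2, "pixels")
assert area_decode(blocks, 4) == img
```

The exhaustive uniformity tests at every level also hint at why the text calls this approach computationally expensive.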

This type of coding can be highly effective, but it suffers from being a nonlinear method that cannot easily be implemented in hardware. Therefore, its performance in terms of compression time is not competitive, although its compression ratio is.


In most applications, there is no need for exact restoration of the stored image. This fact can be used to make storage more effective, and it leads to lossy compression methods. Lossy image coding techniques normally have three components:

• Image modelling, which defines such things as the transformation to be applied to the image.
• Parameter quantisation, whereby the data generated by the transformation is quantised to reduce the amount of information.
• Encoding, where a code is generated by associating appropriate code words with the raw data produced by the quantiser.

Each of these operations is in some part responsible for the compression. Image modelling is aimed at exploiting the statistical characteristics of the image (i.e. high correlation, redundancy). Typical examples are transform coding methods, in which the data is represented in a different domain (for example, frequency in the case of the Fourier transform, the discrete cosine transform, the Karhunen-Loève transform, and so on), where a reduced number of coefficients contains most of the original information. In many cases, this first phase does not result in any loss of information.

The aim of quantisation is to reduce the amount of data used to represent the information within the new domain. Quantisation is in most cases not a reversible operation; therefore, it belongs to the so-called 'lossy' methods.
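A uniform scalar quantiser is the simplest case. In this sketch (the step size and input values are arbitrary examples) 3.2 and 4.9 map to the same index, so the original value cannot be recovered from the index, which is exactly why quantisation is the lossy stage; the reconstruction error is, however, bounded by half the step size.

```python
def quantise(x: float, step: float) -> int:
    """Index of the nearest multiple of step (Python's round() sends exact halves to even)."""
    return round(x / step)

def dequantise(q: int, step: float) -> float:
    """Reconstruction value for index q."""
    return q * step

step = 4.0
values = [0.0, 3.2, 4.9, -7.9, 12.4]
indices = [quantise(v, step) for v in values]
recon = [dequantise(q, step) for q in indices]
print(indices)  # [0, 1, 1, -2, 3]
print(recon)    # [0.0, 4.0, 4.0, -8.0, 12.0]
```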

Encoding is usually error free. It optimises the representation of the information (helping,
sometimes, to further reduce the bit rate), and may introduce some error detection codes.

2.4.1 Transform coding (DCT/Wavelets)

A general transform coding scheme involves subdividing an N x N image into smaller n x n blocks and performing a unitary transform on each sub-image. A unitary transform is a reversible linear transform whose kernel describes a set of complete, orthonormal discrete basis functions. The goal of the transform is to decorrelate the original signal, and this decorrelation generally results in the signal energy being redistributed among only a small set of transform coefficients. In this way, many coefficients may be discarded after quantisation and prior to encoding. Also, visually lossless compression can often be achieved by incorporating the HVS (human visual system) contrast sensitivity function in the quantisation of the coefficients.
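To illustrate the energy redistribution, here is a direct orthonormal 2D DCT-II written out from its definition (O(n³) per n x n block, applied first to rows and then to columns). It is a teaching sketch rather than an optimised transform, and the 8 x 8 ramp block is an arbitrary example.

```python
import math
from typing import List

Matrix = List[List[float]]

def dct_1d(x: List[float]) -> List[float]:
    """Orthonormal DCT-II of one row or column."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def dct_2d(block: Matrix) -> Matrix:
    """Separable 2D DCT: transform each row, then each column."""
    rows = [dct_1d(list(r)) for r in block]
    cols = [dct_1d(list(c)) for c in zip(*rows)]   # cols[v][u] = coefficient (u, v)
    return [list(r) for r in zip(*cols)]           # transpose back to [u][v]

block = [[100 + 2 * i + 3 * j for j in range(8)] for i in range(8)]
coeffs = dct_2d(block)
energy = sum(c * c for row in coeffs for c in row)
print(f"DC coefficient: {coeffs[0][0]:.1f}, share of energy: {coeffs[0][0] ** 2 / energy:.4f}")
```

Because the transform is orthonormal, the total energy is unchanged, but more than 99 % of it ends up in the single DC coefficient, so most of the other 63 coefficients are candidates for coarse quantisation or removal.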

Transform coding can be generalized into four stages:

• image subdivision
• image transformation
• coefficient quantisation
• Huffman encoding.

For a transform coding scheme, logical modelling is done in two steps: a segmentation step, in which the image is subdivided into two-dimensional vectors (possibly of different sizes), and a transformation step, in which the chosen transform (e.g. KLT, DCT, Hadamard) is applied.

Quantisation can be performed in several ways. Most classical approaches use 'zonal coding', which consists of the scalar quantisation of the coefficients belonging to a predefined area (with a fixed bit allocation), and 'threshold coding', which consists of choosing, in each block, the coefficients whose absolute value exceeds a predefined threshold. Another possibility, which leads to higher compression factors, is to apply a vector quantisation scheme to the transformed coefficients.
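The two classical selection rules can be contrasted on a small block of coefficients. This sketch only shows which coefficients each rule keeps; the zone shape (u + v < zone), the threshold and the 3 x 3 values are assumptions chosen for illustration, and the subsequent bit allocation and scalar quantisation are omitted.

```python
from typing import List

Matrix = List[List[float]]

def zonal_select(coeffs: Matrix, zone: int) -> Matrix:
    """Zonal coding: keep coefficients in a fixed low-frequency zone (here u + v < zone)."""
    return [[c if u + v < zone else 0.0 for v, c in enumerate(row)]
            for u, row in enumerate(coeffs)]

def threshold_select(coeffs: Matrix, t: float) -> Matrix:
    """Threshold coding: keep the coefficients of this block whose magnitude exceeds t."""
    return [[c if abs(c) > t else 0.0 for c in row] for row in coeffs]

block = [[50.0, 4.0, 0.5],
         [3.0, 0.2, 0.1],
         [0.4, 6.0, 0.3]]
print(zonal_select(block, 2))        # the 6.0 outside the zone is lost
print(threshold_select(block, 1.0))  # the 6.0 survives, small values are dropped
```

Zonal coding needs no side information about which coefficients were kept, while threshold coding adapts to each block at the cost of signalling the positions of the retained coefficients.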

2.4.2 Huffman encoding

This algorithm, developed by D.A. Huffman, is based on the fact that in an input stream certain tokens occur more often than others. Based on this knowledge, the algorithm builds up a weighted binary tree according to the tokens' rates of occurrence. Each element of this tree is assigned a new code word, where the length of the code word is determined by its position in the tree. Therefore, the most frequent token, which lies closest to the root of the tree, is assigned the shortest code word. Each less common element is assigned a longer code word. The least frequent element may even be assigned a code word that is longer than the original input token.
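A minimal sketch of the tree construction using Python's heapq module: the two lightest subtrees are merged repeatedly, and every merge prepends one bit to the codes of the tokens inside them. The tie-breaking counter is only an implementation detail that stops equal weights from forcing a comparison of the dictionaries.

```python
import heapq
from collections import Counter
from typing import Dict, Hashable, Iterable

def huffman_code(tokens: Iterable[Hashable]) -> Dict[Hashable, str]:
    freq = Counter(tokens)
    if len(freq) == 1:                       # degenerate case: one distinct token
        return {next(iter(freq)): "0"}
    # heap entries: (weight, tie_breaker, {token: code_so_far})
    heap = [(w, i, {t: ""}) for i, (t, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)      # lightest subtree gets prefix "0"
        w2, _, c2 = heapq.heappop(heap)      # next lightest gets prefix "1"
        merged = {t: "0" + code for t, code in c1.items()}
        merged.update({t: "1" + code for t, code in c2.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

data = "aaaaaaabbbccd"
code = huffman_code(data)
print(code)
print(sum(len(code[t]) for t in data), "bits instead of", 2 * len(data))
```

For this input the most frequent token 'a' gets a 1-bit code and the rare 'c' and 'd' get 3-bit codes, giving 22 bits instead of 26 with a fixed 2-bit code.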

The compression ratio achieved by Huffman encoding on uncorrelated data is something like 1:2. On slightly correlated data, such as images, the compression ratio may become much higher, the absolute maximum being defined by the size of a single input token and the size of the shortest possible output token (max. compression = token size [bits] / 2 [bits]). While standard palettised images with a limit of 256 colours may be compressed by 1:4 if they use only one colour, more typical images give results in the range of 1:1.2 to 1:2.5.

2.4.3 Vector quantisation

A vector quantiser can be defined mathematically as a transform operator T from a K-dimensional Euclidean space R^K to a finite subset X in R^K made up of N vectors. This subset X becomes the vector codebook or, more generally, the codebook. An optimum scalar quantiser was proposed by Lloyd and Max. Later on, Linde, Buzo and Gray resumed and generalized this method, extending it to the case of a vector quantiser. The algorithm that they proposed is closely related to the k-means clustering method, and is performed by iterating the following basic operations:

• subdivide the training set into N groups (called 'partitions' or 'Voronoi regions'), which are associated with the N codebook letters, according to a minimum distance criterion;
• the centroids of the Voronoi regions become the updated codebook vectors;
• compute the average distortion: if the percent reduction in the distortion (as compared with the previous step) is below a certain threshold, then STOP.

Once the codebook has been designed, the coding process simply consists of applying the T operator to the vectors of the original image. In practice, each group of n pixels is coded as an address in the vector codebook, that is, as a number from 1 to N.
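A compact sketch of the iteration listed above, using squared Euclidean distance as the minimum distance criterion. Assumptions: the initial codebook is supplied by the caller (the original LBG method also describes a codebook-splitting initialisation, omitted here), empty regions keep their old codeword, and codebook addresses run from 0 to N-1 rather than 1 to N.

```python
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]

def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(v: Sequence[float], codebook: List[Vec]) -> int:
    """Address of the closest codeword (minimum distance criterion)."""
    return min(range(len(codebook)), key=lambda k: dist2(v, codebook[k]))

def lbg(training: List[Vec], codebook: List[Vec],
        eps: float = 1e-3, max_iter: int = 100) -> List[Vec]:
    prev = float("inf")
    for _ in range(max_iter):
        # 1) partition the training set into Voronoi regions
        regions: List[List[Vec]] = [[] for _ in codebook]
        for v in training:
            regions[nearest(v, codebook)].append(v)
        # 2) centroids become the new codewords (empty regions keep the old one)
        codebook = [tuple(sum(c) / len(r) for c in zip(*r)) if r else codebook[k]
                    for k, r in enumerate(regions)]
        # 3) average distortion; stop when its relative reduction drops below eps
        dist = sum(dist2(v, codebook[nearest(v, codebook)]) for v in training) / len(training)
        if prev != float("inf") and prev - dist <= eps * prev:
            break
        prev = dist
    return codebook

def vq_encode(vectors: List[Vec], codebook: List[Vec]) -> List[int]:
    """Apply T: replace each vector by the address of its nearest codeword."""
    return [nearest(v, codebook) for v in vectors]

training = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
cb = lbg(training, [(0.0, 0.0), (1.0, 1.0)])
print(cb)                          # [(0.5, 0.5), (10.5, 10.5)]
print(vq_encode(training, cb))     # [0, 0, 0, 0, 1, 1, 1, 1]
```

Even though both initial codewords start inside the first cluster, the centroid updates move one of them to the second cluster within two iterations.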
2.4.4 Segmentation and approximation methods

With segmentation and approximation coding methods, the image is modelled as a mosaic of regions, each one characterised by a sufficient degree of uniformity of its pixels with respect to a certain feature (e.g. grey level, texture); each region then has some parameters related to the characterising feature associated with it.

The operations of finding a suitable segmentation and an optimum set of approximating parameters are highly correlated, since the segmentation algorithm must take into account the error produced by the region reconstruction (in order to limit this value within determined bounds). These two operations constitute the logical modelling for this class of coding schemes; quantisation and encoding are strongly dependent on the statistical characteristics of the parameters of this approximation (and, therefore, on the approximation itself).

Classical examples are polynomial approximation and texture approximation. In polynomial approximation, regions are reconstructed by means of polynomial functions in (x, y), and the task of the encoder is to find the optimum coefficients. In texture approximation, regions are filled by synthesizing a parameterized texture based on some model (e.g. fractals, statistical methods, Markov Random Fields [MRF]). It must be pointed out that, while in polynomial approximations the problem of finding optimum coefficients is quite simple (it is possible to use least squares approximation or similar exact formulations), for texture-based techniques this problem can be very complex.
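For the polynomial case, fitting a first-order polynomial (a plane a + b·x + c·y) to a region by least squares reduces to a 3 x 3 linear system. The sketch below assumes the region is given as a list of (x, y, value) triples that are not all collinear, and solves the normal equations by Gaussian elimination; higher polynomial degrees follow the same pattern with more columns.

```python
from typing import List, Tuple

def fit_plane(pixels: List[Tuple[int, int, float]]) -> Tuple[float, float, float]:
    """Least-squares coefficients (a, b, c) of value ~ a + b*x + c*y over a region."""
    m = [[0.0] * 3 for _ in range(3)]   # normal matrix A^T A
    rhs = [0.0] * 3                     # right-hand side A^T z
    for x, y, z in pixels:
        row = (1.0, float(x), float(y))
        for i in range(3):
            rhs[i] += row[i] * z
            for j in range(3):
                m[i][j] += row[i] * row[j]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 3):
                m[r][c] -= f * m[col][c]
            rhs[r] -= f * rhs[col]
    # back substitution
    p = [0.0] * 3
    for i in (2, 1, 0):
        p[i] = (rhs[i] - sum(m[i][j] * p[j] for j in range(i + 1, 3))) / m[i][i]
    return p[0], p[1], p[2]

region = [(x, y, 10 + 2 * x + 3 * y) for x in range(4) for y in range(4)]
print(fit_plane(region))  # approximately (10.0, 2.0, 3.0)
```

Only the three coefficients need to be quantised and encoded for the region, instead of its 16 pixel values.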

2.4.5 Comparison of Different Compression Methods

In recent years, some standardization processes based on transform coding, such as JPEG, have been started. The performance of such a standard is quite good if compression factors are kept under a given threshold (about 20 times). Above this threshold, artifacts become visible in the reconstruction, and the tile (blocking) effect seriously affects the decoded images, due to quantisation of the DCT coefficients.
On the other hand, JPEG has two advantages: first, it is a standard, and second, dedicated hardware implementations exist. For applications which require higher compression factors, with some minor loss of accuracy compared with JPEG, different techniques should be selected, such as wavelet coding or spline interpolation, followed by an efficient entropy encoder such as Huffman coding, arithmetic coding or vector quantisation. Some of these coding schemes are suitable for progressive reconstruction (Pyramidal Wavelet Coding, Two Source Decomposition, etc.). This property can be exploited by applications such as coding of images in a database, for previewing purposes, or for transmission over a limited-bandwidth channel.

The fundamental idea behind wavelets is to analyze the signal at different scales or resolutions, which is called multiresolution. Wavelets are a class of functions used to localize a given signal in both the space and scale domains. A family of wavelets can be constructed from a mother wavelet. In contrast to windowed Fourier analysis, which uses a window of fixed size, the mother wavelet is stretched or compressed to change the size of the window. In this way, big wavelets give an approximate image of the signal, while smaller and smaller wavelets zoom in on details. Therefore, wavelets automatically adapt to both the high-frequency and the low-frequency components of a signal through different window sizes. Any small change in the wavelet representation produces a correspondingly small change in the original signal, which means that local errors do not influence the entire transform. The wavelet transform is suited to nonstationary signals, such as very brief signals and signals with interesting components at different scales.

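Wavelets are functions generated from one single function $\psi$, called the mother wavelet, by dilations and translations:

$$\psi_{a,b}(x) = |a|^{-1/2}\,\psi\!\left(\frac{x-b}{a}\right), \qquad a, b \in \mathbb{R},\ a \neq 0,$$

where $\psi$ must satisfy $\int \psi(x)\,dx = 0$.

The basic idea of the wavelet transform is to represent an arbitrary function $f$ as a superposition of wavelets, i.e. to write $f$ as an integral over $a$ and $b$ of $\psi_{a,b}$. Let $a = a_0^{m}$ and $b = n\,b_0\,a_0^{m}$, with $m, n \in \mathbb{Z}$ and $a_0 > 1$, $b_0 > 0$ fixed, so that

$$\psi_{m,n}(x) = a_0^{-m/2}\,\psi\!\left(a_0^{-m}x - n\,b_0\right).$$

Then the wavelet decomposition is

$$f = \sum_{m,n} c_{m,n}(f)\,\psi_{m,n},$$

where, for an orthonormal wavelet basis, $c_{m,n}(f) = \langle f, \psi_{m,n}\rangle$.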

In image compression, the sampled data are discrete in time, so a discrete representation of both time and frequency is required. This is called the discrete wavelet transform (DWT).
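A minimal sketch of a multi-level DWT, assuming the Haar wavelet (the simplest orthonormal choice, not necessarily the filter used later in this project): each level splits the current approximation into a coarser approximation and a detail signal, so the scales are sampled dyadically. Because the Haar filters are orthonormal, the coefficient energy equals the signal energy, the discrete counterpart of the frame bounds A = B = 1 discussed below.

```python
import math
from typing import List, Tuple

def haar_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One DWT level with orthonormal Haar filters: pairwise sums and differences / sqrt(2)."""
    r = 1.0 / math.sqrt(2.0)
    approx = [(x[2 * i] + x[2 * i + 1]) * r for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) * r for i in range(len(x) // 2)]
    return approx, detail

def haar_dwt(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Repeat the step on the approximation; len(x) must be divisible by 2**levels."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

signal = [4, 6, 10, 12, 8, 6, 5, 5]
a, ds = haar_dwt(signal, 3)
print([len(d) for d in ds], len(a))  # [4, 2, 1] 1
energy_in = sum(v * v for v in signal)
energy_out = sum(v * v for v in a) + sum(v * v for d in ds for v in d)
print(energy_in, round(energy_out, 9))
```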

The wavelet transform (WT) is used to analyze non-stationary signals, i.e. signals whose frequency response varies in time. Although the time and frequency resolution problems are the result of a physical phenomenon and exist regardless of the transform used, it is possible to analyze any signal by using an alternative approach called multiresolution analysis (MRA). MRA analyzes the signal at different frequencies with different resolutions. It is designed to give good time resolution and poor frequency resolution at high frequencies, and good frequency resolution and poor time resolution at low frequencies. This approach is especially useful when the signal has high-frequency components of short duration and low-frequency components of long duration. Signals encountered in practical applications are often of this type.


A continuous wavelet transform is given as:

$$\gamma(s,\tau) = \int f(t)\,\psi^{*}_{s,\tau}(t)\,dt \qquad (1)$$

where * denotes complex conjugation. This equation shows how a function $f(t)$ is decomposed into a set of basis functions $\psi_{s,\tau}(t)$, called the wavelets. The variables $s$ and $\tau$, scale and translation, are the new dimensions after the wavelet transform. For completeness' sake, (2) gives the inverse wavelet transform:

$$f(t) = \frac{1}{C_\psi}\int\!\!\int \gamma(s,\tau)\,\psi_{s,\tau}(t)\,d\tau\,\frac{ds}{s^{2}} \qquad (2)$$

where $C_\psi$ is a normalisation constant (the admissibility constant) that depends only on $\psi$. The wavelets are generated from a single basic wavelet $\psi(t)$, the so-called mother wavelet, by scaling and translation:

$$\psi_{s,\tau}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-\tau}{s}\right) \qquad (3)$$

In (3), $s$ is the scale factor, $\tau$ is the translation factor, and the factor $s^{-1/2}$ is for energy normalization across the different scales. It is important to note that in (1), (2) and (3) the wavelet basis functions are not specified. This is a difference between the wavelet transform and the Fourier transform, or other transforms. The theory of wavelet transforms deals only with the general properties of wavelets and wavelet transforms; it defines a framework within which one can design wavelets to taste.
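Equation (1) can be evaluated numerically once a concrete wavelet is chosen. The sketch below assumes the real-valued Haar wavelet as the mother wavelet, the scaling of (3), and a simple midpoint rule; it shows that the coefficient is large where the wavelet straddles a jump in the signal and vanishes where the signal is constant under the wavelet.

```python
import math
from typing import Callable

def haar_wavelet(t: float) -> float:
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere (real-valued)."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0

def cwt_coefficient(f: Callable[[float], float], psi: Callable[[float], float],
                    s: float, tau: float, t0: float, t1: float, dt: float = 1e-3) -> float:
    """Midpoint-rule approximation of gamma(s, tau) from (1) on [t0, t1].

    psi is real here, so the complex conjugate in (1) changes nothing.
    """
    n = int(round((t1 - t0) / dt))
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt
        total += f(t) * psi((t - tau) / s) / math.sqrt(s)
    return total * dt

def step_signal(t: float) -> float:
    """A test signal with a jump at t = 2."""
    return 1.0 if t >= 2.0 else 0.0

at_jump = cwt_coefficient(step_signal, haar_wavelet, s=1.0, tau=1.5, t0=0.0, t1=10.0)
flat = cwt_coefficient(step_signal, haar_wavelet, s=1.0, tau=5.0, t0=0.0, t1=10.0)
print(round(at_jump, 6), round(flat, 6))  # -0.5 0.0
```

Evaluating this for every (s, τ) pair is what makes the CWT so redundant and expensive, which motivates the discussion that follows.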

The wavelet transform has three properties that make it difficult to use directly in the form of (1). The first is the redundancy of the CWT. In (1), the wavelet transform is calculated by continuously shifting a continuously scalable function over a signal and calculating the correlation between the two. These scaled functions are nowhere near an orthogonal basis, and the obtained wavelet coefficients are therefore highly redundant. For most practical applications, this redundancy is removed.

Even without the redundancy of the CWT, one still has an infinite number of wavelets in the wavelet transform and would like to see this number reduced to a more manageable count. This is the second problem.

The third problem is that for most functions the wavelet transforms have no analytical solutions, and they can be calculated only numerically or by an optical analog computer. Fast algorithms are needed to exploit the power of the wavelet transform, and it is in fact the existence of these fast algorithms that has put wavelet transforms where they are today.

Let us start with the removal of redundancy.

As mentioned before, the CWT maps a one-dimensional signal to a two-dimensional time-scale joint representation that is highly redundant. The time-bandwidth product of the CWT is the square of that of the signal and, for most applications, which seek a signal description with as few components as possible, this is not efficient. To overcome this problem, discrete wavelets have been introduced.

Discrete wavelets are not continuously scalable and translatable but can only be scaled and translated in discrete steps. This is achieved by modifying the wavelet representation (3) to create

$$\psi_{j,k}(t) = \frac{1}{\sqrt{s_0^{\,j}}}\,\psi\!\left(\frac{t - k\,\tau_0\,s_0^{\,j}}{s_0^{\,j}}\right) \tag{4}$$

Although it is called a discrete wavelet, it is normally a (piecewise) continuous function. In (4), j and k are integers and s_0 > 1 is a fixed dilation step. The translation step τ_0 s_0^j in (4) depends on the dilation step. The effect of discretizing the wavelet is that the time-scale space is now sampled at discrete intervals. Usually s_0 = 2 is chosen, so that the sampling of the frequency axis corresponds to dyadic sampling. This is a very natural choice for computers, the human ear and music, for instance. For the translation factor, τ_0 = 1 is usually chosen, so that dyadic sampling of the time axis is obtained as well.
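The dyadic choice s_0 = 2, τ_0 = 1 can be made concrete by listing the points of the time-scale plane that the discrete wavelets sample. In the sketch below, the helper dyadic_grid is a hypothetical name. Restricting translations to a finite duration, and starting the scale index j at 1, are illustrative assumptions that keep the listing short.

```python
def dyadic_grid(num_levels: int, duration: int, s0: int = 2, tau0: int = 1):
    """List the (j, scale, translations) sampled by the discrete wavelets of (4).

    Scale s = s0**j and translation tau = k * tau0 * s0**j; only translations
    inside [0, duration) are kept (a hypothetical finite-signal restriction).
    """
    grid = []
    for j in range(1, num_levels + 1):
        scale = s0 ** j
        step = tau0 * scale
        grid.append((j, scale, list(range(0, duration, step))))
    return grid


for j, scale, taus in dyadic_grid(num_levels=3, duration=16):
    print(f"j = {j}: s = {scale}, tau = {taus}")
```

Each doubling of the scale halves the number of translations. This is the dyadic sampling of both axes described above.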

When discrete wavelets are used to transform a continuous signal, the result is a series of wavelet coefficients, and it is referred to as the wavelet series decomposition.

An important issue in such a decomposition scheme is, of course, the question of reconstruction. It is all very well to sample the time-scale joint representation on a dyadic grid, but if it is not possible to reconstruct the signal, the result will not be of great use. As it turns out, it is indeed possible to reconstruct a signal from its wavelet series decomposition. It has been proven that the necessary and sufficient condition for stable reconstruction is that the energy of the wavelet coefficients must lie between two positive bounds, i.e.

$$A\,\|f\|^{2} \le \sum_{j,k} \left|\langle f, \psi_{j,k}\rangle\right|^{2} \le B\,\|f\|^{2} \tag{5}$$

where ||f||^2 is the energy of f(t), A > 0, B < ∞, and A and B are independent of f(t). When (5) is satisfied, the family of basis functions ψ_{j,k}(t) with j, k ∈ Z is referred to as a frame with frame bounds A and B. When A = B, the frame is tight and the discrete wavelets behave like an orthonormal basis, up to the constant factor 1/A in the reconstruction. When A ≠ B, exact reconstruction is still possible at the expense of a dual frame. In a dual-frame discrete wavelet transform, the decomposition wavelet is different from the reconstruction wavelet.
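A finite-dimensional toy example may make the frame bounds in (5) more tangible. It is not a wavelet frame. The sketch below uses three unit vectors in the plane, 120 degrees apart, chosen only because their bounds are easy to verify. The vectors are not orthogonal, since there are three of them in a two-dimensional space. Even so, the coefficient energy is always exactly 3/2 times ||f||², so A = B = 3/2 and the frame is tight. The vector f is recovered with the same formula as for an orthonormal basis, apart from the factor 1/A.

```python
import math

# Three unit vectors 120 degrees apart: a redundant frame for the plane.
FRAME = [(math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)) for k in range(3)]


def analyze(f):
    """Frame coefficients <f, e_k> of the vector f = (x, y)."""
    return [f[0] * ex + f[1] * ey for ex, ey in FRAME]


def synthesize(coeffs, bound):
    """Tight-frame reconstruction f = (1 / A) * sum_k <f, e_k> e_k with A = B = bound."""
    x = sum(c * ex for c, (ex, _) in zip(coeffs, FRAME)) / bound
    y = sum(c * ey for c, (_, ey) in zip(coeffs, FRAME)) / bound
    return (x, y)


for f in [(1.0, 0.0), (0.3, -1.7), (-2.0, 5.0)]:
    coeffs = analyze(f)
    ratio = sum(c * c for c in coeffs) / (f[0] ** 2 + f[1] ** 2)
    print(f, "energy ratio:", round(ratio, 12), "reconstructed:", synthesize(coeffs, 1.5))
```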

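The last step that has to be taken is making the discrete wavelets orthonormal. This can be done only with discrete wavelets. The discrete wavelets can be made orthogonal to their own dilations and translations by special choices of the mother wavelet, which means:

$$\int \psi_{j,k}(t)\,\psi_{m,n}^{*}(t)\,dt = \begin{cases} 1 & \text{if } j = m \text{ and } k = n \\ 0 & \text{otherwise} \end{cases} \tag{6}$$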

An arbitrary signal can be reconstructed by summing the orthogonal wavelet basis functions, weighted by the wavelet transform coefficients:

$$f(t) = \sum_{j,k} \gamma(j,k)\,\psi_{j,k}(t) \tag{7}$$
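The orthonormality condition (6) and the reconstruction formula (7) can be checked numerically for the simplest choice, the Haar wavelet. Haar is an assumption here rather than something the text prescribes. The sketch builds ψ_{j,k}(t) from (4) with s_0 = 2 and τ_0 = 1. It restricts everything to the interval [0, 8) and uses the 15 Haar wavelets with j = 0, ..., 3 whose supports lie in that interval. On a finite interval these wavelets span only zero-mean signals, because the mean needs the scaling function introduced later. The test signal is therefore made zero-mean, and its 16 cell values are arbitrary.

```python
import math


def haar(t: float) -> float:
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


def psi(j: int, k: int, t: float) -> float:
    """Discrete wavelet (4) with s0 = 2, tau0 = 1: 2**(-j/2) * psi(t / 2**j - k)."""
    return 2.0 ** (-j / 2) * haar(t / 2 ** j - k)


LENGTH, STEPS = 8, 128                      # interval [0, 8), 16 samples per unit
DT = LENGTH / STEPS
GRID = [(i + 0.5) * DT for i in range(STEPS)]            # midpoints avoid the jumps
INDICES = [(j, k) for j in range(4) for k in range(LENGTH // 2 ** j)]


def inner(f, g) -> float:
    """Midpoint-rule inner product; exact here because every function involved
    is constant on each cell of the 1/16 grid."""
    return sum(f(t) * g(t) for t in GRID) * DT


# (6): the Gram matrix of the 15 wavelets should be the identity.
gram_error = max(
    abs(inner(lambda t: psi(j, k, t), lambda t: psi(m, n, t)) - float((j, k) == (m, n)))
    for (j, k) in INDICES
    for (m, n) in INDICES
)

# (7): expand a zero-mean, piecewise-constant test signal and resum it.
cells = [3, 1, -2, 0, 5, 5, -1, 4, 2, -3, 0, 1, 1, -2, 6, 0]   # arbitrary values
mean = sum(cells) / len(cells)


def f(t: float) -> float:
    return cells[int(2 * t)] - mean          # constant on each half-unit cell


gamma = {(j, k): inner(f, lambda t: psi(j, k, t)) for (j, k) in INDICES}


def f_rec(t: float) -> float:
    return sum(c * psi(j, k, t) for (j, k), c in gamma.items())


rec_error = max(abs(f(t) - f_rec(t)) for t in GRID)
print("max Gram deviation:", gram_error, " max reconstruction error:", rec_error)
```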

Equation (7) shows the inverse wavelet transform for discrete wavelets, which had not been given before. Orthogonality is not essential in the representation of signals. The wavelets need not be orthogonal, and in some applications the redundancy can help to reduce the sensitivity to noise or improve the shift invariance of the transform. This points to a disadvantage of discrete wavelets: the resulting wavelet transform is no longer shift invariant. In other words, the wavelet transforms of a signal and of a time-shifted version of the same signal are not simply shifted versions of each other.
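The loss of shift invariance is easy to demonstrate with one level of the orthonormal Haar DWT, which serves here as an assumed example. In the sketch below, a short pulse and the same pulse delayed by one sample give completely different detail coefficients.

```python
import math


def haar_level(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    r = 1 / math.sqrt(2)
    half = len(x) // 2
    approx = [(x[2 * i] + x[2 * i + 1]) * r for i in range(half)]
    detail = [(x[2 * i] - x[2 * i + 1]) * r for i in range(half)]
    return approx, detail


x = [0, 1, 1, 0, 0, 0, 0, 0]
x_shifted = x[-1:] + x[:-1]            # the same signal delayed by one sample

print("detail of x:        ", haar_level(x)[1])
print("detail of x_shifted:", haar_level(x_shifted)[1])
```

The detail coefficients of x are nonzero, while those of the shifted signal vanish entirely. The one-sample delay therefore does not appear as a shift of the coefficients.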

In many practical applications the signal of interest is sampled. In order to use the results achieved so far with a discrete signal, the wavelet transform has to be made discrete too. Remember that the discrete wavelets are not time-discrete; only the translation and scale steps are discrete. Simply implementing the wavelet filter bank as a digital filter bank intuitively seems to do the job. But intuition alone is not good enough; it has to be verified.

It was stated that the scaling function could be expressed in wavelets from minus infinity up to a certain scale j. If a wavelet spectrum is added to the scaling function spectrum, a new scaling function is obtained, with a spectrum twice as wide as the first. The effect of this addition is that the first scaling function can be expressed in terms of the second, because all the information needed to do this is contained in the second scaling function. This can be expressed formally in the so-called multiresolution formulation, or two-scale relation:

$$\varphi(2^{j}t) = \sum_{k} h_{j+1}(k)\,\varphi(2^{j+1}t - k) \tag{8}$$

The two-scale relation states that the scaling function at a certain scale can be expressed in terms of translated scaling functions at the next smaller scale. Do not get confused here: smaller scale means more detail. The first scaling function replaced a set of wavelets, and therefore the wavelets in this set can also be expressed in terms of translated scaling functions at the next scale. More specifically, for the wavelet at level j one can write

$$\psi(2^{j}t) = \sum_{k} g_{j+1}(k)\,\varphi(2^{j+1}t - k) \tag{9}$$

which is the two-scale relation between the scaling function and the wavelet.
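For the Haar pair, used here as an assumed example, the two-scale relations take a particularly simple form: φ(t) = φ(2t) + φ(2t - 1) and ψ(t) = φ(2t) - φ(2t - 1). In the unnormalized form of (8) and (9), this means h = (1, 1) and g = (1, -1). The sketch checks both relations pointwise at a few levels j. Other wavelets have longer h and g and no such closed form for φ.

```python
def phi(t: float) -> float:
    """Haar scaling function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= t < 1.0 else 0.0


def psi(t: float) -> float:
    """Haar wavelet, defined directly: +1 on [0, 1/2), -1 on [1/2, 1)."""
    if 0.0 <= t < 0.5:
        return 1.0
    if 0.5 <= t < 1.0:
        return -1.0
    return 0.0


H = [1.0, 1.0]    # h_{j+1}(k) of (8) for Haar, in the unnormalized form used there
G = [1.0, -1.0]   # g_{j+1}(k) of (9) for Haar


def two_scale_errors(j: int):
    """Largest violation of (8) and (9) at level j over sample points in [-1, 2)."""
    points = [i / 64 for i in range(-64, 128)]
    err8 = max(abs(phi(2 ** j * t) - sum(h * phi(2 ** (j + 1) * t - k) for k, h in enumerate(H)))
               for t in points)
    err9 = max(abs(psi(2 ** j * t) - sum(g * phi(2 ** (j + 1) * t - k) for k, g in enumerate(G)))
               for t in points)
    return err8, err9


for j in range(3):
    print(j, two_scale_errors(j))   # both errors are 0.0 for the Haar pair
```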

Since a signal f(t) could be expressed in terms of dilated and translated wavelets up to a scale j-1, it follows that f(t) can also be expressed in terms of dilated and translated scaling functions at a scale j:

$$f(t) = \sum_{k} \lambda_{j}(k)\,\varphi(2^{j}t - k) \tag{10}$$
To be consistent in the notation, discrete scaling functions should be considered, since only discrete dilations and translations are allowed. If in this equation one steps up a scale to j-1, wavelets have to be added in order to keep the same level of detail. The signal f(t) can then be expressed as

$$f(t) = \sum_{k} \lambda_{j-1}(k)\,\varphi(2^{j-1}t - k) + \sum_{k} \gamma_{j-1}(k)\,\psi(2^{j-1}t - k) \tag{11}$$

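Let φ_{j,k}(t) and ψ_{j,k}(t) denote the suitably normalized versions of φ(2^j t - k) and ψ(2^j t - k). If these scaling functions and wavelets are orthonormal or form a tight frame, then the coefficients λ_{j-1}(k) and γ_{j-1}(k) are found by taking the inner products

$$\lambda_{j-1}(k) = \langle f(t), \varphi_{j-1,k}(t)\rangle, \qquad \gamma_{j-1}(k) = \langle f(t), \psi_{j-1,k}(t)\rangle \tag{12}$$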

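Suppose the scaling functions and wavelets in these inner products are replaced by suitably scaled and translated versions of (8) and (9). Keeping in mind that the inner product can also be written as an integration, a little manipulation then gives

$$\lambda_{j-1}(k) = \sum_{m} h(m - 2k)\,\lambda_{j}(m) \tag{13}$$

$$\gamma_{j-1}(k) = \sum_{m} g(m - 2k)\,\lambda_{j}(m) \tag{14}$$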
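Equations (13) and (14) translate almost literally into code. The sketch below assumes the orthonormal Haar filters h = (1/√2, 1/√2) and g = (1/√2, -1/√2); any valid orthonormal pair would do. The sampled signal is taken as λ_j(m) at the finest scale, as suggested in the text. No boundary extension is performed, which is exact for 2-tap filters on an even-length input. Longer filters would need periodic or symmetric extension, which is omitted here.

```python
import math

# Orthonormal Haar filters (an assumed example; (13) and (14) hold for any
# valid orthonormal scaling/wavelet filter pair).
h = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # scaling (low-pass) filter
g = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # wavelet (high-pass) filter


def analysis_step(lam, h, g):
    """Equations (13) and (14):
        lambda_{j-1}(k) = sum_m h(m - 2k) * lambda_j(m)
        gamma_{j-1}(k)  = sum_m g(m - 2k) * lambda_j(m)
    Filter taps outside 0..len-1 are zero and no boundary extension is done."""
    n = len(lam)
    lam_next, gam_next = [], []
    for k in range(n // 2):
        ms = range(2 * k, min(2 * k + len(h), n))
        lam_next.append(sum(h[m - 2 * k] * lam[m] for m in ms))
        ms = range(2 * k, min(2 * k + len(g), n))
        gam_next.append(sum(g[m - 2 * k] * lam[m] for m in ms))
    return lam_next, gam_next


samples = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]   # taken as lambda_j(m) at the finest scale
approx, detail = analysis_step(samples, h, g)
print(approx)   # 8 inputs -> 4 low-pass coefficients
print(detail)   # ... and 4 high-pass coefficients
```

Because k advances the filters by two samples per output, the 8 input values produce 4 + 4 output values. This is the subsampling property discussed below.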

These two equations state that the wavelet and scaling function coefficients on a certain scale can be found by calculating a weighted sum of the scaling function coefficients from the previous scale. Now recall from the section on the scaling function that the scaling function coefficients came from a low-pass filter. Recall also, from the section on subband coding, how a filter bank is iterated by repeatedly splitting the low-pass spectrum into a low-pass and a high-pass part. The filter bank iteration started with the signal spectrum. If the signal spectrum is therefore imagined to be the output of a low-pass filter at the previous (imaginary) scale, the sampled signal can be considered as the scaling function coefficients from that previous (imaginary) scale. In other words, the sampled signal f(k) is simply equal to λ(k) at the largest scale.

As is known from signal processing theory, a discrete weighted sum is the same as a digital filter. Since the coefficients λ_j(k) come from the low-pass part of the split signal spectrum, the weighting factors h(k) must form a low-pass filter. Likewise, since the coefficients γ_j(k) come from the high-pass part of the split signal spectrum, the weighting factors g(k) must form a high-pass filter.

This means that h(k) and g(k) form one stage of an iterated digital filter bank. From now on, the coefficients h(k) are referred to as the scaling filter and the coefficients g(k) as the wavelet filter.

It is now certain that implementing the wavelet transform as an iterated digital filter bank is possible, and from now on one can speak of the discrete wavelet transform, or DWT. Our intuition turned out to be correct. As a reward, (13) and (14) come with a useful bonus property, the subsampling property. Looking once more at these two equations, one sees that the scaling and wavelet filters have a step size of 2 in the variable k. As a result, each filter output is computed only at every other position. Each of the two outputs therefore has half the input data rate, and together they have the same data rate as the input. This is not a new idea, since it has always been exploited in subband coding schemes. Still, it is nice to see it emerge here as part of the deal.

The subsampling property also solves a problem that came up at the end of the section on the scaling function: how to choose the width of the scaling function spectrum. Every time the filter bank is iterated, the number of samples for the next stage is halved, so that in the extreme case only one sample is left in the end. Clearly this is where the iteration definitely has to stop, and this determines the width of the spectrum of the scaling function.

Normally the iteration is stopped at the point where the number of samples has become smaller than the length of the scaling filter or the wavelet filter, whichever is longer. The length of the longer filter therefore determines the width of the spectrum of the scaling function.
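One way to express this stopping rule in code is sketched below. The function name is hypothetical, and so is the exact reading of the rule: split again as long as the remaining number of samples is at least the length of the longer filter. With 2-tap filters the iteration runs down to the extreme case of a single sample. Longer filters stop earlier and so leave a wider scaling function spectrum.

```python
def levels_until_too_short(num_samples: int, filter_length: int):
    """Iterate the filter bank while the current number of samples is at least
    the length of the longer filter (and at least 2); each iteration halves
    the sample count. Returns (number of levels, samples left)."""
    levels, n = 0, num_samples
    while n >= max(filter_length, 2):
        n //= 2
        levels += 1
    return levels, n


for length in (2, 4, 8):        # e.g. Haar, 4-tap and 8-tap filters
    print(length, levels_until_too_short(512, length))
```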

With the redundancy removed, two hurdles still have to be taken before the wavelet transform is in a practical form. The discussion continues by trying to reduce the number of wavelets needed in the wavelet transform and saves the problem of the difficult analytical solutions for the end.

Even with discrete wavelets, an infinite number of scalings and translations is still needed to calculate the wavelet transform. The easiest way to tackle this problem is simply not to use an infinite number of discrete wavelets. Of course, this raises the question of the quality of the transform: is it possible to reduce the number of wavelets used to analyze a signal and still obtain a useful result?

The translations of the wavelets are, of course, limited by the duration of the signal under investigation, which gives an upper bound on the translations.

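From (18) it can be seen that the wavelet has a band-pass-like spectrum. From Fourier theory it is known that compression in time is equivalent to stretching the spectrum and shifting it upwards:

$$\mathcal{F}\{f(at)\} = \frac{1}{|a|}\,F\!\left(\frac{\omega}{a}\right)$$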

This means that a time compression of the wavelet by a factor of 2 will stretch the frequency spectrum of the wavelet by a factor of 2 and also shift all frequency components up by a factor of 2. Using this insight, the finite spectrum of the signal can be covered with the spectra of dilated wavelets, in the same way as the signal was covered in the time domain with translated wavelets. To get a good coverage of the signal spectrum, the stretched wavelet spectra should touch each other, as if they were standing hand in hand. This can be arranged by designing the wavelets correctly.

Fig : Touching wavelet spectra resulting from scaling of the mother wavelet in the time domain

Summarizing, if one wavelet can be seen as a band-pass filter, then a series of dilated wavelets can be seen as a band-pass filter bank. The ratio between the center frequency of a wavelet spectrum and the width of this spectrum turns out to be the same for all wavelets. This ratio is normally referred to as the quality factor Q of a filter, and in the case of wavelets one therefore speaks of a constant-Q filter bank.
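The constant ratio can be computed directly under an idealized assumption: each dilated wavelet is a brick-wall band-pass filter covering exactly one octave. Real wavelet spectra overlap and roll off gradually. In the sketch below, the center frequency divided by the bandwidth is 3/2 for every band, which is what makes the bank constant-Q. The leftover low band is the part later assigned to the scaling function.

```python
import math


def octave_bands(levels: int):
    """Ideal (brick-wall) pass-bands in radians for `levels` dilated wavelets:
    wavelet j covers [pi / 2**j, pi / 2**(j - 1)], and the remaining band
    [0, pi / 2**levels] is left for the scaling function."""
    wavelet_bands = [(math.pi / 2 ** j, math.pi / 2 ** (j - 1)) for j in range(1, levels + 1)]
    return wavelet_bands, (0.0, math.pi / 2 ** levels)


wavelet_bands, scaling_band = octave_bands(4)
for lo, hi in wavelet_bands:
    center, width = (lo + hi) / 2, hi - lo
    print(f"band [{lo:.4f}, {hi:.4f}] rad: Q = center / width = {center / width:.2f}")
print(f"scaling function band: [0, {scaling_band[1]:.4f}] rad")
```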

The scaling function was introduced by Mallat. Because of the low-pass nature of the scaling function spectrum, it is sometimes referred to as the averaging filter. If the scaling function is considered as just a signal with a low-pass spectrum, then it can be decomposed into wavelet components and expressed as in (7):

$$\varphi(t) = \sum_{j,k} \gamma(j,k)\,\psi_{j,k}(t) \tag{15}$$

The scaling function φ(t) is selected in such a way that its spectrum neatly fits in the space left open by the wavelets. The expression (15) therefore uses an infinite number of wavelets up to a certain scale j, as shown in figure 2.2. This means that when a signal is analyzed with the combination of scaling function and wavelets, the scaling function by itself takes care of the spectrum otherwise covered by all the wavelets up to scale j, while the rest is done by the wavelets. In this way, the number of wavelets is reduced from an infinite number to a finite number.

Fig : An infinite set of wavelets replaced by one scaling function

Introducing the scaling function circumvents the problem of the infinite number of wavelets and sets a lower bound for the wavelets. Of course, when a scaling function is used instead of wavelets, information is lost. From a signal representation point of view no information is lost, since it will still be possible to reconstruct the original signal. From a wavelet analysis point of view, however, possibly valuable scale information is discarded. The width of the scaling function spectrum is therefore an important parameter in the wavelet transform design. The narrower its spectrum, the more wavelet coefficients there will be and the more scale information. But, as always, there will be practical limitations on the number of wavelet coefficients that can be handled. Later on, in the discrete wavelet transform, this problem is more or less automatically solved. The low-pass spectrum of the scaling function allows one to state some sort of admissibility condition similar to (19):

$$\int \varphi(t)\,dt = 1 \tag{16}$$

which shows that the 0th moment of the scaling function cannot vanish.

Summarizing once more, if one wavelet can be seen as a band-pass filter and a

scaling function is a low pass filter, then a series of dilated wavelets together with a

scaling function can be seen as a filter bank.

A time-scale representation of a digital signal is obtained using digital filtering techniques. Recall that the CWT is a correlation between a wavelet at different scales and the signal, with the scale (or the frequency) being used as a measure of similarity. The continuous wavelet transform was computed by changing the scale of the analysis window, shifting the window in time, multiplying by the signal, and integrating over all times. In the discrete case, filters of different cutoff frequencies are used to analyze the signal at different scales. The signal is passed through a series of high-pass filters to analyze the high frequencies, and through a series of low-pass filters to analyze the low frequencies.

The resolution of the signal, which is a measure of the amount of detail

information in the signal, is changed by the filtering operations, and the scale is changed

by up sampling and down sampling (sub sampling) operations. Sub sampling a signal

corresponds to reducing the sampling rate, or removing some of the samples of the signal.

For example, sub sampling by two refers to dropping every other sample of the signal.

Sub sampling by a factor n reduces the number of samples in the signal n times.

Up sampling a signal corresponds to increasing the sampling rate of a signal by

adding new samples to the signal. For example, up sampling by two refers to adding a

new sample, usually a zero or an interpolated value, between every two samples of the

signal. Up sampling a signal by a factor of n increases the number of samples in the

signal by a factor of n.
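Both operations are simple index manipulations, as the sketch below shows. It uses zeros for the inserted samples, although the text also allows interpolated values. It places them after each original sample, one common convention that makes the output exactly n times longer.

```python
def downsample(x, n: int):
    """Sub sampling by n: keep every n-th sample, starting with the first."""
    return x[::n]


def upsample(x, n: int, fill=0):
    """Up sampling by n: insert n - 1 new samples (zeros by default) after each sample."""
    out = []
    for value in x:
        out.append(value)
        out.extend([fill] * (n - 1))
    return out


x = [1, 2, 3, 4, 5, 6]
print(downsample(x, 2))   # [1, 3, 5]
print(upsample(x, 2))     # [1, 0, 2, 0, 3, 0, 4, 0, 5, 0, 6, 0]
```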

Although it is not the only possible choice, DWT coefficients are usually sampled from the CWT on a dyadic grid, i.e., s_0 = 2 and τ_0 = 1, yielding s = 2^j and τ = k·2^j. Since the signal is a discrete-time function, the terms function and sequence will be used interchangeably in the following discussion. This sequence will be denoted by x[n], where n is an integer.

The procedure starts with passing this signal (sequence) through a half-band digital low-pass filter with impulse response h[n]. Filtering a signal corresponds to the mathematical operation of convolution of the signal with the impulse response of the filter. The convolution operation in discrete time is defined as follows:

$$x[n] * h[n] = \sum_{k=-\infty}^{\infty} x[k]\,h[n-k]$$
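For finite sequences, the infinite sum reduces to the terms where both factors are defined. The sketch below computes the full-length result directly from the definition. Its two-tap averaging filter is only an illustrative low-pass filter, not a true half-band design.

```python
def convolve(x, h):
    """Full discrete convolution y[n] = sum_k x[k] * h[n - k] of two finite
    sequences; samples outside the given ranges are treated as zero."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm      # contribution of x[k] to y[k + m]
    return y


h = [0.5, 0.5]                       # a simple two-tap averaging (low-pass) filter
print(convolve([1, 2, 3], h))        # [0.5, 1.5, 2.5, 1.5]
```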

A half-band low-pass filter removes all frequencies that are above half of the highest frequency in the signal. For example, if the highest frequency component of a signal is 1000 Hz, then half-band low-pass filtering removes all the frequencies above 500 Hz.

The unit of frequency is of particular importance at this point. In discrete signals, frequency is expressed in terms of radians. Accordingly, the sampling frequency of the signal is equal to 2π radians in terms of radial frequency. Therefore, the highest frequency component that exists in a signal will be π radians if the signal is sampled at the Nyquist rate, which is twice the maximum frequency that exists in the signal. That is, the Nyquist frequency, half the sampling rate, corresponds to π radians in the discrete frequency domain. Therefore, using Hz is not appropriate for discrete signals. However, Hz is used whenever it is needed to clarify a discussion, since it is very common to think of frequency in terms of Hz. It should always be remembered that the unit of frequency for discrete-time signals is radians.
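The conversion between the two units is a single scaling: the sampling rate maps to 2π radians. The sketch below reuses the 1000 Hz example above and assumes that signal is sampled at its Nyquist rate of 2000 Hz. The sampling rate is a hypothetical value chosen only for illustration.

```python
import math


def hz_to_radians(f_hz: float, fs_hz: float) -> float:
    """Normalized angular frequency: the sampling rate fs maps to 2*pi radians."""
    return 2 * math.pi * f_hz / fs_hz


fs = 2000.0                        # hypothetical: the 1000 Hz example sampled at its Nyquist rate
print(hz_to_radians(1000.0, fs))   # highest frequency -> pi
print(hz_to_radians(500.0, fs))    # half-band cut-off -> pi / 2
```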


After passing the signal through a half-band low-pass filter, half of the samples can be eliminated according to Nyquist's rule, since the signal now has a highest frequency of π/2 radians instead of π radians. Simply discarding every other sample will subsample the signal by two, and the signal will then have half the number of points. The scale of the signal is now doubled. Note that the low-pass filtering removes the high-frequency information but leaves the scale unchanged; only the subsampling process changes the scale. Resolution, on the other hand, is related to the amount of information in the signal, and therefore it is affected by the filtering operations. Half-band low-pass filtering removes half of the frequencies, which can be interpreted as losing half of the information. Therefore, the resolution is halved after the filtering operation. Note, however, that the subsampling operation after filtering does not affect the resolution. Removing half of the spectral components from the signal makes half the number of samples redundant anyway, so half the samples can be discarded without any loss of information. In summary, the low-pass filtering halves the resolution but leaves the scale unchanged. The signal is then subsampled by 2, since half of the samples are redundant, and this doubles the scale.

This procedure can mathematically be expressed as

$$y[n] = \sum_{k=-\infty}^{\infty} h[k]\,x[2n-k]$$
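The expression combines the convolution with subsampling by two: the outputs at odd positions are simply never computed. The sketch below uses the same illustrative two-tap averaging filter as before, as an assumed stand-in for a half-band filter. It computes y[n] from the formula and compares the result with a full convolution followed by discarding every other sample.

```python
def convolve(x, h):
    """Full convolution sum_k x[k] h[n - k] (zero outside the given samples)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y


def filter_and_subsample(x, h):
    """y[n] = sum_k h[k] * x[2n - k]: only the even-indexed outputs of the
    full convolution are computed; x is zero outside 0..len(x)-1."""
    full_length = len(x) + len(h) - 1
    y = []
    for n in range((full_length + 1) // 2):
        y.append(sum(hk * x[2 * n - k] for k, hk in enumerate(h) if 0 <= 2 * n - k < len(x)))
    return y


x = [1, 2, 3, 4, 5, 6]
h = [0.5, 0.5]
print(filter_and_subsample(x, h))    # [0.5, 2.5, 4.5, 3.0]
print(convolve(x, h)[::2])           # the same values
```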

With this background, the actual computation of the DWT can be described. The DWT analyzes the signal at different frequency bands with different resolutions by decomposing the signal into a coarse approximation and detail information. The DWT employs two sets of functions, called scaling functions and wavelet functions, which are associated with low-pass and high-pass filters, respectively. The decomposition of the signal into different frequency bands is simply obtained by successive high-pass and low-pass filtering of the time-domain signal. The original signal x[n] is first passed through a half-band high-pass filter g[n] and a low-pass filter h[n]. After the filtering, half of the samples can be eliminated according to Nyquist's rule, since the signal now has a highest frequency of π/2 radians instead of π. The signal can therefore be subsampled by 2, simply by discarding every other sample. This constitutes one level of decomposition and can mathematically be expressed as follows:

$$y_{\text{high}}[k] = \sum_{n} x[n]\,g[2k-n], \qquad y_{\text{low}}[k] = \sum_{n} x[n]\,h[2k-n]$$

where y_high[k] and y_low[k] are the outputs of the high-pass and low-pass filters, respectively, after subsampling by 2. This decomposition halves the time resolution, since only half the number of samples now characterizes the entire signal. However, this operation doubles the frequency resolution. The frequency band of the signal now spans only half the previous frequency band, which effectively reduces the uncertainty in the frequency by half. The above procedure, which is also known as subband coding, can be repeated for further decomposition. At every level, the filtering and subsampling will result in half the number of samples (and hence half the time resolution) and half the frequency band spanned (and hence double the frequency resolution). Figure 2.3 illustrates this procedure, where x[n] is the original signal to be decomposed, and h[n] and g[n] are low-pass and high-pass filters, respectively. The bandwidth of the signal at every level is marked on the figure as "f".

Fig: Decomposition of signal x[n] using the low-pass and high-pass filters h[n] and g[n]

As an example of the subband coding algorithm, suppose that the original signal x[n] has 512 sample points, spanning a frequency band of zero to π rad/s. At the first decomposition level, the signal is passed through the high-pass and low-pass filters, followed by subsampling by 2. The output of the high-pass filter has 256 points (hence half the time resolution), but it only spans the frequencies π/2 to π rad/s (hence double the frequency resolution). These 256 samples constitute the first level of DWT coefficients. The output of the low-pass filter also has 256 samples, but it spans the other half of the frequency band, frequencies from 0 to π/2 rad/s. This signal is then passed through the same low-pass and high-pass filters for further decomposition. The output of the second low-pass filter followed by subsampling has 128 samples spanning a frequency band of 0 to π/4 rad/s. The output of the second high-pass filter followed by subsampling has 128 samples spanning a frequency band of π/4 to π/2 rad/s. The second high-pass filtered signal constitutes the second level of DWT coefficients. This signal has half the time resolution, but twice the frequency resolution of the first-level signal. In other words, time resolution has decreased by a factor of 4, and frequency resolution has increased by a factor of 4 compared to the original signal. The low-pass filter output is then filtered once again for further decomposition. This process continues until two samples are left. For this specific example there would be 8 levels of decomposition, each having half the number of samples of the previous level. The DWT of the original signal is then obtained by concatenating all coefficients starting from the last level of decomposition (the remaining two samples, in this case). The DWT will then have the same number of coefficients as the original signal.
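The 512-sample example can be reproduced with a few lines of code. The sketch below assumes the orthonormal Haar filter pair, the simplest choice and not one fixed by the text, and a hypothetical two-tone test signal. It prints the number of high-pass coefficients per level and confirms that the concatenated DWT has as many coefficients as the signal has samples.

```python
import math

R = 1 / math.sqrt(2)


def haar_split(x):
    """One subband-coding level with orthonormal Haar filters (an assumed
    choice), followed by sub sampling by 2: returns (low-pass, high-pass)."""
    half = len(x) // 2
    low = [(x[2 * i] + x[2 * i + 1]) * R for i in range(half)]
    high = [(x[2 * i] - x[2 * i + 1]) * R for i in range(half)]
    return low, high


def dwt(x, stop_at=2):
    """Split the low-pass output repeatedly until `stop_at` samples are left.
    Assumes len(x) is a power of two. Returns the coefficients concatenated
    from the last level (approximation, then details of the last level, ...,
    then the first-level details) and the detail length of each level."""
    approx, details = list(x), []
    while len(approx) > stop_at:
        approx, high = haar_split(approx)
        details.append(high)
    coeffs = approx[:]
    for d in reversed(details):
        coeffs.extend(d)
    return coeffs, [len(d) for d in details]


# Hypothetical 512-sample test signal: a slow and a fast sinusoid.
signal = [math.sin(2 * math.pi * 5 * n / 512) + 0.5 * math.sin(2 * math.pi * 100 * n / 512)
          for n in range(512)]
coeffs, detail_lengths = dwt(signal)
print(detail_lengths)   # [256, 128, 64, 32, 16, 8, 4, 2]: 8 levels
print(len(coeffs))      # 512, the same number of coefficients as samples
```

Since the Haar filters are orthonormal, the total energy of the coefficients also equals that of the signal. This gives a quick sanity check on an implementation.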

The frequencies that are most prominent in the original signal will appear as high amplitudes in that region of the DWT signal that includes those particular frequencies. Unlike the Fourier transform, this transform does not lose the time localization of these frequencies. However, the time localization will have a resolution that depends on the level at which they appear. If the main information of the signal lies in the high frequencies, as happens most often, the time localization of these frequencies will be more precise, since they are characterized by a larger number of samples. If the main information lies only at very low frequencies, the time localization will not be very precise, since few samples are used to express the signal at these frequencies. This procedure in effect offers good time resolution at high frequencies and good frequency resolution at low frequencies. Most practical signals encountered are of this type.


Two of the three problems mentioned in the above section have now been resolved, but it is still not known how to calculate the wavelet transform. If the wavelet transform is regarded as a filter bank, then wavelet-transforming a signal can be considered as passing the signal through this filter bank. The outputs of the different filter stages are the wavelet and scaling function transform coefficients. Analyzing a signal by passing it through a filter bank is not a new idea and has been around for many years under the name subband coding. It is used, for instance, in computer vision applications.

Fig : Splitting the signal spectrum with an iterated filter bank

The filter bank needed in subband coding can be built in several ways. One way is to build many band-pass filters to split the spectrum into frequency bands. The advantage is that the width of every band can be chosen freely, so that the spectrum of the signal to analyze is covered in the places where it might be interesting. The disadvantage is that every filter has to be designed separately, and this can be a time-consuming process. Another way is to split the signal spectrum into two (equal) parts, a low-pass and a high-pass part. The high-pass part contains the smallest details of interest, and one could stop here. However, the low-pass part still contains some details, and therefore it can be split again. The splitting is repeated until a satisfactory number of bands has been created. In this way an iterated filter bank is created.

Fig : Implementation of one stage of an iterated filter bank

Usually the number of bands is limited by, for instance, the amount of data or computation power available. The process of splitting the spectrum is shown in figure 2.5. The advantage of this scheme is that only two filters have to be designed; the disadvantage is that the signal spectrum coverage is fixed.


The most important properties of wavelets are the admissibility and the regularity conditions, and these are the properties that gave wavelets their name. It can be shown that square-integrable functions ψ(t) satisfying the admissibility condition

$$\int \frac{|\Psi(\omega)|^{2}}{|\omega|}\,d\omega < +\infty \tag{17}$$

can be used to first analyze and then reconstruct a signal without loss of information. In (17), Ψ(ω) stands for the Fourier transform of ψ(t). The admissibility condition implies that the Fourier transform of ψ(t) vanishes at the zero frequency, i.e.

$$|\Psi(\omega)|^{2}\,\Big|_{\omega=0} = 0 \tag{18}$$

This means that wavelets must have a band-pass-like spectrum. This is a very important observation, which will be used later on to build an efficient wavelet transform. A zero at the zero frequency also means that the average value of the wavelet in the time domain must be zero,

$$\int \psi(t)\,dt = 0 \tag{19}$$

and therefore it must be oscillatory. In other words, ψ(t) must be a wave.
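Condition (19) is easy to check numerically for a candidate function. In the sketch below, the Mexican-hat function serves as an assumed example of an admissible wavelet, with a Gaussian as a counterexample. The integration range and step count are arbitrary but ample, since both functions decay very quickly.

```python
import math


def integrate(f, a: float = -40.0, b: float = 40.0, steps: int = 80_000) -> float:
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    dt = (b - a) / steps
    return sum(f(a + (i + 0.5) * dt) for i in range(steps)) * dt


def mexican_hat(t: float) -> float:
    """(1 - t^2) exp(-t^2 / 2): oscillates around zero, so it qualifies as a wave."""
    return (1.0 - t * t) * math.exp(-t * t / 2.0)


def gaussian(t: float) -> float:
    """exp(-t^2 / 2): positive everywhere, so its mean cannot vanish."""
    return math.exp(-t * t / 2.0)


print(integrate(mexican_hat))   # approximately 0, as (19) requires
print(integrate(gaussian))      # approximately sqrt(2*pi) = 2.5066..., so not a wavelet
```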

As can be seen from (1), the wavelet transform of a one-dimensional function is two-dimensional, and the wavelet transform of a two-dimensional function is four-dimensional. The time-bandwidth product of the wavelet transform is the square of that of the input signal, and for most practical applications this is not a desirable property. Therefore one imposes some additional conditions on the wavelet functions in order to make the wavelet transform decrease quickly with decreasing scale s. These are the regularity conditions. They state that the wavelet function should have some smoothness and concentration in both the time and frequency domains. Regularity is a quite complex concept; it is explained here only briefly, using the concept of vanishing moments.

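Expanding the signal f(t) in the wavelet transform (1) into a Taylor series at t = 0 up to order n (with τ = 0 for simplicity) gives

$$\gamma(s,0) = \frac{1}{\sqrt{s}}\left[\sum_{p=0}^{n} f^{(p)}(0)\int \frac{t^{p}}{p!}\,\psi\!\left(\frac{t}{s}\right)dt + O(n+1)\right] \tag{20}$$

Here f^{(p)} stands for the pth derivative of f and O(n+1) means the rest of the expansion. Now, if the moments of the wavelet are defined by

$$M_{p} = \int t^{p}\,\psi(t)\,dt \tag{21}$$

then (20) can be rewritten as the finite development

$$\gamma(s,0) = \frac{1}{\sqrt{s}}\left[f(0)\,M_{0}\,s + \frac{f^{(1)}(0)}{1!}\,M_{1}\,s^{2} + \frac{f^{(2)}(0)}{2!}\,M_{2}\,s^{3} + \cdots + \frac{f^{(n)}(0)}{n!}\,M_{n}\,s^{n+1} + O(s^{n+2})\right] \tag{22}$$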

From the admissibility condition it is already known that the 0th moment M_0 = 0, so that the first term on the right-hand side of (22) is zero. Suppose the other moments up to M_n can also be made zero. The bracketed term in (22) then reduces to O(s^{n+2}), and the wavelet transform coefficients γ(s,τ) decay as fast as s^{n+2}/√s = s^{n+3/2} for a smooth signal f(t). If a wavelet has N vanishing moments, then the approximation order of the wavelet transform is also N. The moments do not have to be exactly zero; a small value is often good enough. In fact, experimental research suggests that the number of vanishing moments required depends heavily on the application.
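For filter-based wavelets, the vanishing moments show up in the wavelet filter. For compactly supported orthonormal wavelets, M_p = 0 for p < N corresponds to Σ_k k^p g(k) = 0 for p < N. The sketch below checks this for the Daubechies 4-tap filter, an example not used elsewhere in the text. It builds the wavelet filter by the common alternating-flip rule g(k) = (-1)^k h(3 - k). The first two discrete moments vanish and the third does not, matching the two vanishing moments of this wavelet.

```python
import math

SQRT3, SQRT2 = math.sqrt(3), math.sqrt(2)
# Daubechies 4-tap scaling filter in orthonormal normalization (sum = sqrt(2)).
h = [(1 + SQRT3) / (4 * SQRT2), (3 + SQRT3) / (4 * SQRT2),
     (3 - SQRT3) / (4 * SQRT2), (1 - SQRT3) / (4 * SQRT2)]
# Wavelet filter by the alternating flip g(k) = (-1)**k * h(3 - k).
g = [(-1) ** k * h[3 - k] for k in range(4)]


def discrete_moment(filt, p: int) -> float:
    """sum_k k**p * filt[k], the filter-level counterpart of M_p in (21)."""
    return sum(k ** p * c for k, c in enumerate(filt))


for p in range(3):
    print(f"p = {p}: {discrete_moment(g, p):+.3e}")
```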

Summarizing, the admissibility condition gave us the wave, regularity and vanishing moments gave us the fast decay, or the "let", and put together they give us the wavelet.

The wavelet transform is capable of providing time and frequency information simultaneously; hence it gives a time-frequency representation of the signal. This is useful when one wants to know which spectral components exist at a given instant of time. In such cases it may be very beneficial to know the time intervals in which these particular spectral components occur.

Wavelets (small waves) are functions defined over a finite interval and having an average value of zero. The basic idea of the wavelet transform is to represent an arbitrary function f(t) as a superposition of a set of such wavelets or basis functions. These basis functions are obtained from a single prototype wave, the mother wavelet, by dilations or contractions (scaling) and translations (shifts). The discrete wavelet transform of a finite-length signal x(n) having N components, for example, can be expressed by an N x N matrix, similar to the discrete cosine transform.
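Because the DWT is linear, it can indeed be written as an N x N matrix. The sketch below assumes orthonormal Haar filters and N = 8. It builds the matrix by transforming each unit vector. It then checks that multiplying by this matrix gives the same result as the fast filter-bank computation. It also checks that the matrix is orthogonal, so its inverse is its transpose, as for the orthonormal DCT.

```python
import math


def haar_dwt(x):
    """Full orthonormal Haar DWT (an assumed example) of a power-of-two-length list."""
    r = 1 / math.sqrt(2)
    out, n = list(x), len(x)
    while n > 1:
        half = n // 2
        low = [(out[2 * i] + out[2 * i + 1]) * r for i in range(half)]
        high = [(out[2 * i] - out[2 * i + 1]) * r for i in range(half)]
        out[:n] = low + high          # keep splitting the low-pass part
        n = half
    return out


def dwt_matrix(size):
    """N x N matrix W with W x = haar_dwt(x): column j is the transform of the j-th unit vector."""
    columns = [haar_dwt([1.0 if i == j else 0.0 for i in range(size)]) for j in range(size)]
    return [[columns[j][i] for j in range(size)] for i in range(size)]


def matvec(matrix, x):
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]


W = dwt_matrix(8)
x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
print(max(abs(a - b) for a, b in zip(matvec(W, x), haar_dwt(x))))   # agrees with the fast transform
# Rows of W are orthonormal, so W W^T = I and the inverse is simply W^T.
gram = [[sum(a * b for a, b in zip(ri, rk)) for rk in W] for ri in W]
print(max(abs(gram[i][k] - (i == k)) for i in range(8) for k in range(8)))
```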


There are several ways in which wavelet transforms can decompose a signal into various subbands. These include uniform decomposition, octave-band decomposition, and adaptive or wavelet-packet decomposition. Of these, octave-band decomposition is the most widely used.

The procedure is as follows. The wavelet has two functions, the “wavelet” and the “scaling function”, which split the frequency band between them into two halves. The scaling function acts like a low-pass filter and the wavelet like a high-pass filter. Figure 2-6 shows a typical decomposition scheme. The decomposition of the signal into different frequency bands is simply obtained by successive high-pass and low-pass filtering of the time-domain signal. This filter pair is called the analysis filter pair. First, the low-pass filter is applied to each row of data, giving the low-frequency components of the row. Since the low-pass filter is a half-band filter, the output data contains frequencies only in the first half of the original frequency range. By Shannon's sampling theorem, the output can therefore be subsampled by two, so that it contains only half the original number of samples. Next, the high-pass filter is applied to the same row of data, and the high-pass components are separated in the same way.

Fig : Two-dimensional, four-band filter bank for subband image coding.

This is a non-uniform band-splitting method that decomposes the lower frequency part into narrower bands, while the high-pass output at each level is left without any further decomposition. This procedure is done for all rows. Next, the filtering is done for each column of the intermediate data. The resulting two-dimensional array of coefficients contains four bands of data, labeled LL (low-low), HL (high-low), LH (low-high) and HH (high-high). The LL band can be decomposed once again in the same manner, producing even more subbands. This can be done up to any level, resulting in a pyramidal decomposition as shown in Figure 2-6.
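The row-then-column procedure can be sketched as follows, assuming Haar filters and a tiny hypothetical 4 x 4 image. The naming convention for HL and LH varies between texts. Here the first letter refers to the filter applied along the rows and the second to the filter applied along the columns. A flat image ends up entirely in LL, while a pattern that alternates along each row shows up in HL.

```python
import math

R = 1 / math.sqrt(2)


def split_1d(v):
    """Haar low/high split of an even-length sequence, each sub sampled by 2."""
    half = len(v) // 2
    low = [(v[2 * i] + v[2 * i + 1]) * R for i in range(half)]
    high = [(v[2 * i] - v[2 * i + 1]) * R for i in range(half)]
    return low, high


def split_columns(block):
    """Apply split_1d to every column of `block`; return both halves row by row."""
    lows, highs = zip(*(split_1d(list(col)) for col in zip(*block)))
    return [list(row) for row in zip(*lows)], [list(row) for row in zip(*highs)]


def dwt2_level(image):
    """One level of a separable 2-D Haar DWT: filter all rows, then all columns.
    Band names: first letter = filter along the rows, second = along the columns."""
    row_low, row_high = zip(*(split_1d(row) for row in image))
    ll, lh = split_columns(row_low)
    hl, hh = split_columns(row_high)
    return {"LL": ll, "HL": hl, "LH": lh, "HH": hh}


stripes = [[1.0, 0.0, 1.0, 0.0] for _ in range(4)]    # alternates along each row
for name, band in dwt2_level(stripes).items():
    print(name, [[round(v, 6) for v in row] for row in band])
```

Applying dwt2_level again to the LL band gives the next level of the pyramid.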



Fig : (a), (b), (c), (d), (e) Pyramidal decomposition of the ‘Barbara’ image

The LL band is decomposed three times, as shown in figure 2.7(d). Compression ratios with wavelet-based compression can be as high as 300 to 1, depending on the number of iterations. The LL band at the highest level is the most important. The other 'detail' bands are of lesser importance, with the degree of importance decreasing from the top of the pyramid to the bands at the bottom.


The inverse fast wavelet transform can be computed iteratively using digital filters. The figure below shows the required synthesis or reconstruction filter bank, which reverses the process of the analysis or decomposition filter bank of the forward transform. At each iteration, the four scale-j approximation and detail subimages are upsampled and convolved with two one-dimensional filters, one operating on the subimages' columns and the other on their rows. Addition of the results yields the scale j+1 approximation, and the process is repeated until the original image is reconstructed. The filters used in the convolutions are a function of the wavelets employed in the forward transform.

Fig : The inverse wavelet transform. The scale-j subimages W_ψ(j,m,n), W_ψ^V(j,m,n), W_ψ^H(j,m,n) and W_φ(j,m,n) are upsampled by 2 and filtered with h_ψ and h_φ along the columns (n) and the rows (m). The results are added to give W_φ(j+1,m,n).
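A one-dimensional version of one synthesis stage is sketched below, with Haar filters as an assumed example. The 2-D filter bank applies the same step along the columns and then along the rows. Each channel is upsampled by two and filtered, and the results are added, which exactly undoes the forward split.

```python
import math

R = 1 / math.sqrt(2)


def haar_analysis(x):
    """Forward step: Haar low/high filtering, each followed by sub sampling by 2."""
    half = len(x) // 2
    return ([(x[2 * i] + x[2 * i + 1]) * R for i in range(half)],
            [(x[2 * i] - x[2 * i + 1]) * R for i in range(half)])


def upsample2(v):
    """Insert a zero after every sample."""
    out = []
    for s in v:
        out.extend([s, 0.0])
    return out


def convolve(x, h):
    """Full discrete convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for k, xk in enumerate(x):
        for m, hm in enumerate(h):
            y[k + m] += xk * hm
    return y


def haar_synthesis(low, high):
    """Up sample each channel by 2, filter with the Haar synthesis filters
    [R, R] and [R, -R], add, and drop the trailing (zero) sample."""
    a = convolve(upsample2(low), [R, R])
    d = convolve(upsample2(high), [R, -R])
    return [p + q for p, q in zip(a, d)][: 2 * len(low)]


x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0]
print(haar_synthesis(*haar_analysis(x)))   # x again, up to rounding
```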

Inverse Fourier Transform

The Fourier transform relates a signal's time-domain and frequency-domain representations to each other. The direct Fourier transform (or simply the Fourier transform) calculates a signal's frequency-domain representation from its time-domain variant. The inverse Fourier transform finds the time-domain representation from the frequency domain.

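A function F(ω) is called the Fourier transform of f(t) if

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt,$$

and

$$f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} F(\omega)\, e^{i\omega t}\, d\omega$$

is called the inverse Fourier transform of F(ω).

The Fourier transform of f is therefore a function F{f(t)} of the new variable ω. This function, evaluated at ω, is F(ω).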
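The integrals above cannot be evaluated directly on sampled data; their discrete counterpart, the discrete Fourier transform (DFT) pair, can. Below is a minimal, deliberately slow O(n²) sketch in Python. The function names are illustrative, and the 1/n factor is placed on the inverse, mirroring the 1/2π above.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """X[k] = sum_t x[t] * exp(-2*pi*i*k*t/n): the sampled analogue of F(omega)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X: Sequence[complex]) -> List[complex]:
    """x[t] = (1/n) * sum_k X[k] * exp(+2*pi*i*k*t/n): the sampled analogue of f(t)."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]
```

Applying `idft` to `dft(x)` returns `x` up to rounding error; practical code would use a fast Fourier transform (FFT) instead of these direct sums.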

Wavelet algorithms are recursive: the output of one step of the algorithm becomes the input for the next step. The initial input data set consists of 2^n elements. Each successive step operates on 2^(n-i) elements, where i = 1 ... n-1. For example, if the initial data set contains 128 elements, the wavelet transform will consist of seven steps on 128, 64, 32, 16, 8, 4, and 2 elements.

In this notation, step j+1 follows step j. If element i in step j is being updated, it is written step j,i. The forward lifting scheme wavelet transform divides the data set being processed into an even half and an odd half. In the notation below, even_i is the ith element in the even half and odd_i is the ith element in the odd half. Viewed as a contiguous array (which is what the software does), the even element would be a[i] and the odd element would be a[i+(n/2)].
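A small Python sketch of this index bookkeeping: the ordered split that moves the even elements to a[0 .. n/2-1] and the odd elements to a[n/2 .. n-1], its inverse, and the sequence of step sizes. The helper names are hypothetical, and the slices make temporary copies for readability, which a truly in-place implementation would avoid.

```python
from typing import List


def split(a: List[float], n: int) -> None:
    """Reorder a[0:n] so evens come first (a[i]) and odds second (a[i + n//2])."""
    evens, odds = a[0:n:2], a[1:n:2]
    a[0:n // 2], a[n // 2:n] = evens, odds


def merge(a: List[float], n: int) -> None:
    """Inverse of split: interleave the two halves of a[0:n] again."""
    evens, odds = a[0:n // 2], a[n // 2:n]
    a[0:n:2], a[1:n:2] = evens, odds


def step_sizes(length: int) -> List[int]:
    """Sizes handled by successive steps of a transform on 2**n elements."""
    sizes = []
    while length >= 2:
        sizes.append(length)
        length //= 2
    return sizes


def lazy_transform(a: List[float]) -> None:
    """Each step re-splits only the even (low) half produced by the previous step."""
    for n in step_sizes(len(a)):
        split(a, n)
```

For example, `step_sizes(128)` gives the seven sizes 128, 64, 32, 16, 8, 4 and 2 listed above.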

Another way to refer to the recursive steps is by their power of two. This notation is used in Ripples in Mathematics. Here step j-1 follows step j, since each wavelet step operates on a decreasing power of two. This notation is convenient because the references to the recursive step in a summation also correspond to the power of two being calculated.

The wavelet lifting scheme is a method for decomposing wavelet transforms into a set of stages. Lifting scheme algorithms have the advantage that they do not require temporary arrays in the calculation steps, as some versions of the wavelet algorithm do.

Lossless compression is a compression technique that does not lose any data in the compression process. Lossless compression "packs" data into a smaller file by using a kind of internal shorthand to signify redundant data. If an original file is 1.5 MB (megabytes), lossless compression can reduce it to about half that size, depending on the type of file being compressed. This makes lossless compression convenient for transferring files across the Internet, as smaller files transfer faster. Lossless compression is also handy for storing files, as they take up less room. The zip format, used in programs like WinZip, uses lossless compression. For this reason zip software is popular for compressing program and data files: when these files are decompressed, all bytes must be present to ensure their integrity. If bytes are missing from a program, it won't run; if bytes are missing from a data file, it will be incomplete and garbled. GIF image files also use lossless compression. Lossless compression has advantages and disadvantages.

The advantage is that the compressed file decompresses to an exact duplicate of the original file, preserving its quality. The disadvantage is that the compression ratio is not very high, precisely because no data is lost. To get a higher compression ratio (to reduce a file significantly beyond 50%), you must use lossy compression. Lossy compression strips a file of some of its data. Because of this data loss, only certain applications, such as graphics, audio, and video, are suitable for lossy compression. Lossy compression necessarily reduces the quality of the file to reach the highly compressed size, but depending on the need, the loss may be acceptable and even unnoticeable in some cases. JPEG uses lossy compression, which is why converting a GIF file to JPEG usually reduces its size. It will also reduce the quality to some extent. Lossless and lossy compression have become part of our everyday vocabulary largely due to the popularity of MP3 music files. A standard sound file in WAV format converted to an MP3 file loses much data, as MP3 employs a lossy, high-compression algorithm that discards much of the data.

This makes the resulting file much smaller, so that several dozen MP3 files can fit, for example, on a single compact disc, versus a handful of WAV files. However, the sound quality of the MP3 file will be slightly lower than that of the original WAV, noticeably so to some listeners. As always, whether compressing video, graphics or audio, the ideal is to balance the high quality of lossless compression against the convenience of lossy compression. Choosing the right lossy format is a matter of personal choice, and good results depend heavily on the quality of the original file.
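The exact-reconstruction property of lossless compression can be checked with Python's standard `zlib` module, which implements DEFLATE, the method most zip archives use (zip wraps it in a different container format). The helper name and sample data below are illustrative.

```python
import zlib


def roundtrip(data: bytes, level: int = 9):
    """Compress with DEFLATE (via zlib) and decompress again."""
    compressed = zlib.compress(data, level)
    restored = zlib.decompress(compressed)
    return compressed, restored


if __name__ == "__main__":
    original = b"wavelet lifting scheme " * 100  # highly redundant input
    packed, unpacked = roundtrip(original)
    print(len(original), len(packed), unpacked == original)
```

The restored bytes always equal the input; only the compressed size depends on how much redundancy the data contains.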

The lifting scheme is a new method for constructing wavelets. The main difference from classical constructions is that it does not rely on the Fourier transform. This way, lifting can be used to construct second-generation wavelets, i.e., wavelets that are not necessarily translations and dilations of one function. The latter are referred to as first-generation or classical wavelets. Since the lifting scheme does not depend on the Fourier transform, it has applications in the following settings:

Wavelets on bounded domains: The construction of wavelets on an interval is needed to transform finite-length signals without introducing artifacts at the boundaries. The remainder of this paper gives more details.
Wavelets on curves and surfaces: This case is related to solving equations on curves or surfaces, or to analyzing data that live on curves or surfaces.
Weighted wavelets: Wavelets biorthogonal with respect to a weighted inner product are needed for diagonalization of differential operators and for weighted approximation.
Wavelets and irregular sampling: Many real-life problems require basis functions and transforms adapted to irregularly sampled data.
Wavelets adapted to these settings cannot be formed by translation and dilation, so the Fourier transform can no longer be used as a construction tool. The lifting scheme provides an alternative.
The basic idea behind the lifting scheme is very simple. Begin with a trivial wavelet, the Lazy wavelet, a function that essentially does not calculate anything but has the formal properties of a wavelet. The lifting scheme then gradually builds a new wavelet with improved properties; this is the inspiration behind the name lifting scheme. To explain how the lifting scheme works, refer to the block diagram that shows its three stages: split, predict, and update. We start with a simple case and then develop a general framework to create any type of wavelet. Suppose we sample a signal f(t) with sampling distance 1 and denote the original samples as λ0,k = f(k) for k ∈ Z. We want to decorrelate this signal; in other words, we want to see whether the information it contains can be captured with fewer coefficients, i.e., coefficients with a larger sampling distance. A more compact representation is needed in applications such as data compression. It may not be possible to represent the signal exactly with fewer coefficients, but we can instead find an approximation within an acceptable error bound. We therefore want precise control over the information that is lost by using fewer coefficients; obviously, the difference between the original and approximated signals should be small.


3.1.1 Split phase
Figure: Lifting scheme (forward transform: λj+1,k passes through split, predict and update to give λj,k and γj,k)

The number of coefficients can be reduced by simply subsampling the even samples, obtaining a new sequence given by

λ-1,k = λ0,2k for k ∈ Z

where negative indices are used because the convention is that the smaller the data set, the smaller the index. The lost information must also be kept track of. In other words, we need to know which extra information is required to recover the original set {λ0,k} from the set {λ-1,k}.
We use coefficients {γ-1,k} to encode this difference and refer to them as wavelet coefficients. Many different choices are possible and, depending on the statistical behavior of the signal, one will be better than another; better means smaller wavelet coefficients. The most naive choice is to say that the lost information is simply contained in the odd samples, γ-1,k = λ0,2k+1 for k ∈ Z. This choice corresponds to the Lazy wavelet. Indeed, not much is done except subsampling the signal into even and odd samples. Obviously this will not decorrelate the signal: the wavelet coefficients are only small when the odd samples are small, and there is no reason whatsoever why this should be the case. No restriction needs to be imposed on how the data set is split, nor on the relative sizes of the subsets; all that is needed is a procedure to join {λ-1,k} and {γ-1,k} back into the original data set {λ0,k}. The easiest possibility for the split is simply a brutal cut of the data set into two disjoint parts, but a split between even and odd samples, i.e., the Lazy wavelet, is a better choice. The next step, the predict, will help to find a more elaborate scheme to recover the original samples {λ0,k} from the subsampled coefficients {λ-1,k}.
3.1.2 Predict phase
A more compact representation of {λ0,k} should be obtained. Consider the case where {γ-1,k} does not contain any information (e.g., that part of the signal is equal to zero). Then a more compact representation is obtained, since {λ0,k} can be replaced with the smaller set {λ-1,k}: the extra part needed to reassemble {λ0,k} does not contain any information. This situation hardly ever occurs in practice, so a way is needed to find the odd samples {γ-1,k}. The even samples {λ0,2k} are immediately available as λ0,2k = λ-1,k. The odd samples {λ0,2k+1}, on the other hand, are predicted based on the correlation present in the original data. If a prediction operator P, independent of the data, can be found so that

γ-1,k = P(λ-1,k)

then again the original data set can be replaced with {λ-1,k}, since the missing part can be predicted to reassemble {λ0,k}. The construction of the prediction operator is typically based on some model of the data that reflects its correlation structure. Obviously, the prediction operator P cannot depend on the data; otherwise the information would be hidden in P.
Again, in practice, it might not be possible to predict {γ-1,k} exactly from {λ-1,k}. However, P(λ-1,k) is likely to be close to {γ-1,k}. Thus, {γ-1,k} is replaced with the difference between itself and its predicted value P(λ-1,k). If the prediction is reasonable, this difference will contain much less information than the original {γ-1,k} set. Denoting this abstract difference operator with a - sign, we get

γ-1,k := λ0,2k+1 - P(λ-1,k)

The wavelet subset now encodes how much the data deviates from the model on which P was built. If the signal is somehow correlated, the majority of the wavelet coefficients are small. This also gives insight into how to split the original data set: to get maximal data reduction from prediction, the subsets {λ-1,k} and {γ-1,k} should be maximally correlated. Cutting the signal into left and right parts might not be the best idea, since values on the far left and the far right are hardly correlated, so predicting the right half of the signal from the left is a tough job. A better method is to interlace the two sets, as done in the previous step. Now the original data set can be replaced with the smaller set {λ-1,k} and the wavelet set {γ-1,k}. With a good prediction, the two subsets {λ-1,k, γ-1,k} yield a more compact representation than the original set {λ0,k}. To find a good prediction operator, we again assume maximal correlation among neighboring samples. Hence the odd sample λ0,2k+1 can be predicted as the average of its two (even) neighbors, λ-1,k and λ-1,k+1.

The difference with this prediction scheme then becomes

γ-1,k = λ0,2k+1 - 1/2(λ-1,k + λ-1,k+1)

The model used to build P is a function that is piecewise linear over intervals of length 2. If the original signal complies with the model, all wavelet coefficients in {γ-1,k} are zero. In other words, the wavelet coefficients measure the extent to which the original signal fails to be linear, and their expected value is small. In terms of frequency content, the wavelet coefficients capture the high frequencies present in the original signal: they encode the detail needed to go from the {λ-1,k} coefficients to {λ0,k}, while the {λ-1,k} capture the low frequencies. This scheme can now be iterated. Split {λ-1,k} into two subsets {λ-2,k} and {γ-2,k} (by subsampling) and then replace {γ-2,k} with the difference between {γ-2,k} and P(λ-2,k). After n steps, the original data set is replaced with the wavelet representation {λ-n,k, γ-n,k, ..., γ-1,k}. Given that the wavelet sets encode the difference from some predicted value based on a correlation model, this is likely to give a more compact representation.
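A minimal Python sketch of the split and linear predict steps, assuming a plain list of samples and at least three of them. For the last odd sample of an even-length signal there is no right neighbour, so the sketch extrapolates from the two nearest even samples on the left with weights (-1/2, 3/2), the boundary case of the interpolating scheme discussed below; function names are hypothetical.

```python
from typing import List, Tuple


def split(x: List[float]) -> Tuple[List[float], List[float]]:
    """Lazy wavelet: even samples become lambda(-1,k), odd samples the raw gamma(-1,k)."""
    return x[0::2], x[1::2]


def linear_prediction(lam: List[float], k: int) -> float:
    """Predict the odd sample lying between lam[k] and lam[k+1]."""
    if k + 1 < len(lam):
        return 0.5 * (lam[k] + lam[k + 1])       # one lambda on each side: (1/2, 1/2)
    return -0.5 * lam[k - 1] + 1.5 * lam[k]      # no right neighbour: (-1/2, 3/2)


def predict(lam: List[float], odd: List[float]) -> List[float]:
    """gamma(-1,k) = lambda(0,2k+1) - P(lambda(-1,k))."""
    if len(lam) < 2:
        raise ValueError("need at least two even samples")
    return [o - linear_prediction(lam, k) for k, o in enumerate(odd)]


def undo_predict(lam: List[float], gam: List[float]) -> List[float]:
    """Recover the odd samples by adding the same prediction back."""
    return [g + linear_prediction(lam, k) for k, g in enumerate(gam)]


def merge(lam: List[float], odd: List[float]) -> List[float]:
    """Interleave even and odd samples again."""
    out: List[float] = []
    for k, e in enumerate(lam):
        out.append(e)
        if k < len(odd):
            out.append(odd[k])
    return out
```

For a linear signal every wavelet coefficient is zero; for the quadratic t² each interior coefficient equals -1, a constant measure of how far the signal is from piecewise linear.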
Different prediction functions
The prediction does not necessarily have to be linear; it can be cubic or of any higher order. This introduces the concept of interpolating subdivision, in which the original sampled function is extended to a function defined on the whole real line. A value N denotes the order of the subdivision (interpolation) scheme. For instance, a piecewise linear approximation uses N = 2, and a cubic approximation uses N = 4. N is important because it sets the smoothness of the interpolating function used to find the wavelet coefficients (high frequencies). This function is referred to as the dual wavelet, and N as the number of dual vanishing moments.

Figure: Linear and cubic interpolation

Consider fancier interpolation schemes than the piecewise linear one. Instead of defining the new value at the midpoint between two old values as a linear interpolation of the neighboring values, use two neighboring values on either side and define the (unique) cubic polynomial p(x) that interpolates those four values:

λj,k-1 = p(xj,k-1), λj,k = p(xj,k), λj,k+1 = p(xj,k+1), λj,k+2 = p(xj,k+2)

The new sample value (odd index) is then the value this cubic polynomial takes at the midpoint, while all old samples (even index) are preserved:

λj+1,2k = λj,k, λj+1,2k+1 = p(xj+1,2k+1)

Even though each step in the subdivision involves cubic polynomials, the limit function will no longer be a polynomial. Although it is not yet clear what the limit function looks like, it is easy to see that it can reproduce cubic polynomials. Assume that the original sequence of samples came from some given cubic polynomial. In that case the interpolating polynomial over each set of 4 neighboring sample values will be that same polynomial, and all newly generated samples will lie on the original cubic, in the limit reproducing it. In general, N (N even) samples are used to build polynomials of degree N-1; the order of the subdivision scheme is then N.
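A minimal Python sketch of one cubic interpolating-subdivision step on uniformly spaced samples (at least 4 are assumed). Interior midpoints use the weights (-1/16, 9/16, 9/16, -1/16); next to the ends the sketch uses the one-sided cubic weights for "1 sample on one side, 3 on the other", which is one way to realise the boundary treatment described below. The weight constants are derived from cubic interpolation; the function name is illustrative.

```python
from typing import List

INTERIOR = (-1 / 16, 9 / 16, 9 / 16, -1 / 16)    # two old samples on each side
LEFT_EDGE = (5 / 16, 15 / 16, -5 / 16, 1 / 16)   # one sample on the left, three on the right
RIGHT_EDGE = (1 / 16, -5 / 16, 15 / 16, 5 / 16)  # three samples on the left, one on the right


def refine_cubic(s: List[float]) -> List[float]:
    """Keep the old samples at even positions and insert cubic midpoints at odd positions."""
    n = len(s)
    if n < 4:
        raise ValueError("cubic subdivision needs at least 4 samples")
    out: List[float] = []
    for k in range(n - 1):
        out.append(s[k])                          # old sample preserved
        if k == 0:
            w, window = LEFT_EDGE, s[0:4]
        elif k == n - 2:
            w, window = RIGHT_EDGE, s[n - 4:n]
        else:
            w, window = INTERIOR, s[k - 1:k + 3]
        out.append(sum(wi * si for wi, si in zip(w, window)))  # new midpoint value
    out.append(s[-1])
    return out
```

Starting from samples of any cubic, every refinement lands exactly on that cubic, which is the reproduction property argued above.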
The prediction function P thus uses polynomial interpolation of degree N-1 to find the predicted values. The higher the order of this function, the better the approximation of the γ coefficients based on the λ coefficients. This is good if it is known that the original data set resembles some polynomial of degree N-1, as said before. Then the {γj,k} set is going to be zero or very small, for there is almost no difference between the original data and the predicted values. What makes interpolating subdivision so attractive from an implementation point of view is that only a routine that can construct an interpolating polynomial from some numbers and locations is needed. The new sample value is then simply given by the evaluation of this polynomial at the new, refined location. The algorithm best suited to the interpolating subdivision scheme is Neville's algorithm. Notice also that nothing in the definition of this procedure requires the original samples to be located at integers. This feature can be used to define scaling functions over irregular subdivisions, which is outside the scope of this paper. This interpolation scheme also easily accommodates interval boundaries for finite sequences. For the cubic construction described above, at the left boundary of an interval 1 sample is taken on the left and 3 on the right. The cases are similar at the right boundary. As soon as the calculation of new γ values begins near the right boundary, there will be fewer λ coefficients on the right side and more on the left. If the γ coefficient is at the right boundary, there are no λ coefficients on the right; all of them are on the left.

Figure: Behaviour of cubic interpolating subdivision near the boundary (three panels, each with samples k = 1 to 4: unaffected by boundary; unaffected by boundary; affected by boundary)

All the possible cases when N = 4 are the following:
Case 1: Near left boundary: more λ coefficients on the right side of the γ coefficient than on the left side.
1 λ on the left and 3 λ's on the right (remember that, due to the splitting, a λ is always in the first position)
Case 2: Middle: enough λ coefficients on either side of the γ coefficient.
2 λ's on the left and 2 λ's on the right
Case 3: Near right boundary: more λ coefficients on the left side of the γ coefficient than on the right side.
3 λ's on the left and 1 λ on the right
4 λ's on the left and 0 λ's on the right
Using the interpolation scheme and Neville's algorithm, a set of coefficients is generated that helps to find the correct approximation using a function of degree N-1. For example, if N = 2, two coefficients are needed for each of the two possible cases (one λ on the left and one on the right, and 2 λ's on the left and none on the right). If N = 4, four coefficients are needed for each of the four cases mentioned above. These coefficients will be called filter coefficients.
Due to symmetry, all the cases on the right side are mirror images of the cases on the left side. For example, the coefficients used for the case "3 λ's on the left and 1 λ on the right" are the same as those used for the case "1 λ on the left and 3 λ's on the right", but in the opposite order. Thus there are a total of N/2 + 1 different cases: one for the middle case and N/2 for the symmetric boundary cases. Two of them are the case with two λ's on either side and the case with one λ on the left and three λ's on the right; they are referred to as cases (a) and (b). Since the cubic interpolation is unique (however far apart the samples are, they always have the same interpolating cubic function), the set of coefficients needed to predict any γ is always known in advance.
The basic idea is the following. N is equal to 4, so there are 4 coefficients for every case. To calculate c1, set its value to 1 and the other three, c2, c3 and c4, to zero. Construct the polynomial that interpolates these values and evaluate it at the point to be predicted. For case (a), evaluate the function where two coefficients are on the left and two on the right; for case (b), evaluate it where there is one coefficient on the left and three on the right. The procedure is the same for the other coefficients. Tables list the filter coefficients needed for the interpolation with N = 2 and 4. One property of these filter coefficients is that every set of N coefficients for each case adds up to 1. The prediction phase thus reduces to a lookup in these tables to calculate the wavelet coefficients. For example, to predict a γ value using N = 4 with three λ coefficients on the left and one λ on the right, we would perform the following operation:

$$\gamma_{-j,k} = \lambda_{-j+1,k} - \left(0.0625\,\lambda_{-j,k-3} - 0.3125\,\lambda_{-j,k-2} + 0.9375\,\lambda_{-j,k-1} + 0.3125\,\lambda_{-j,k+1}\right)$$

The prediction of other γ coefficients is a similar process, using the neighboring λ coefficients and the corresponding filter coefficients.
Cases                          Coefficients
# λ on left    # λ on right    k-3     k-1     k+1     k+3
0              2                               1.5     -0.5
1              1                       0.5     0.5
2              0               -0.5    1.5

Table: Filter coefficients for N = 2
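The procedure of setting one coefficient to 1 and the others to 0, then evaluating the interpolating polynomial at the γ position, produces exactly the Lagrange basis values. The Python sketch below computes them in Lagrange form rather than with Neville's algorithm (both give the same polynomial values). It assumes λ positions are given as odd offsets from the γ being predicted (γ at offset 0, as in the k-3, k-1, k+1, k+3 columns of the table); the function name is hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def filter_coefficients(positions: Sequence[int], target: int = 0) -> List[Fraction]:
    """Weight of each known lambda when the interpolating polynomial is evaluated at `target`.

    Weight i is the Lagrange basis polynomial l_i(target): the value obtained by
    setting c_i = 1 and all other samples to 0, as described in the text.
    """
    coeffs = []
    for i, xi in enumerate(positions):
        w = Fraction(1)
        for j, xj in enumerate(positions):
            if j != i:
                w *= Fraction(target - xj, xi - xj)
        coeffs.append(w)
    return coeffs


if __name__ == "__main__":
    # N = 2, two lambdas on the left: offsets k-3 and k-1
    print(filter_coefficients([-3, -1]))          # [Fraction(-1, 2), Fraction(3, 2)]
    # N = 4, three on the left and one on the right
    print(filter_coefficients([-5, -3, -1, 1]))   # 1/16, -5/16, 15/16, 5/16
```

The second call reproduces the weights 0.0625, -0.3125, 0.9375 and 0.3125 used in the example operation above, and every returned set sums to 1.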

The wavelet coefficients have now been calculated. However, the choice of {λ-1,k} is not yet satisfactory, for the following reason. Suppose we are given 2^n + 1 original samples {λ0,k | 0 ≤ k ≤ 2^n}. The scheme (split and predict) could be applied n times, obtaining {γj,k | -n ≤ j ≤ -1, 0 ≤ k < 2^(n+j)} and two (coarsest level) coefficients λ-n,0 and λ-n,1. These are the first (λ-n,0 = λ0,0) and the last (λ-n,1 = λ0,2^n) original samples. This introduces considerable aliasing. Some global properties of the original data set should be maintained in the smaller versions {λ-j,k}. For example, in the case of an image, the smaller images {λ-j,k} should have the same overall brightness, i.e., the same average pixel value. Therefore the last values should be the average of all the pixel values in the original image. Part of this problem can be solved by introducing a third stage: the update.
3.1.3 Update phase
We lift λ-1,k with the help of the wavelet coefficients γ-1,k, again using neighboring wavelet coefficients. The idea is to find a better λ-1,k so that a certain scalar quantity Q(), e.g., the mean, is preserved, i.e.,

Q(λ-1,k) = Q(λ0,k)

This could be done by finding a new operator to extract λ-1,k directly from λ0,k, but that is not done, for two reasons. First, it would create a scheme that is very hard to invert. Second, it is better to reuse as much of the work already done as possible. Therefore, the proposed system uses the already computed wavelet set {γ-1,k} to update {λ-1,k} so that the latter preserves Q(). In other words, an operator U is constructed and {λ-1,k} is updated as

λ-1,k = λ-1,k + U(γ-1,k)
We also want to find a scaling function, using the previously calculated wavelet coefficients, in order to maintain some properties among all the λ coefficients throughout all levels. One way is to set all λ0,k to zero except for λ0,0, which is set to one, and then run the interpolating subdivision ad infinitum. The resulting function is φ(x), the scaling function, which will help to create a real wavelet that maintains some desired properties of the original signal. This function will have an order depending on some (even) value Ñ, which is not necessarily equal to N. We will call Ñ the number of real vanishing moments. The higher the order of this function, the less aliasing is seen in the resulting transform. The basic properties to be preserved are the moments of the wavelet function ψ at every level. One of the known properties of wavelets is that the integral of ψ along the real line, from -∞ to ∞, must be equal to zero. This is also true for the higher moments. Thus,

$$\int_{-\infty}^{\infty} x^{p}\,\psi(x)\,dx = 0, \qquad p = 0, \ldots, \tilde{N}-1$$

Basically, we want to preserve up to Ñ-1 moments of the λ's at every level and use this information to see how much of every coefficient is needed to update every λ. These update values are called lifting coefficients. Before starting the algorithm to calculate the lifting coefficients, we first need to initialize the moment information for every coefficient at the first level. The integral is set to one for all the coefficients because all the filter coefficients for each λ add up to one. The initial values for the higher-order moments are calculated using the coefficient indices, as shown in the table below.

Table 2: Initial moments using index k

k             0    1    2    3    4    5
Mom1 = k^1    0    1    2    3    4    5
Mom2 = k^2    0    1    4    9    16   25
Mom3 = k^3    0    1    8    27   64   125

Once the moments are initialized, the following steps can be applied.
1. Check which λ's contributed to predicting every γ and how large each contribution was. (These values are given by the filter coefficients found in the prediction phase.)
2. Update the moments for every λ at the current level with the following equation:
mj,k = mj,k + fj * ml,k
where j is the index relative to a λ coefficient, fj is its corresponding filter coefficient (0 < j ≤ N), k is the moment being updated (0 < k ≤ Ñ), and l is the index relative to a γ coefficient.
3. Knowing that all the moments must be zero at every level, a linear system can be constructed to find the lifting coefficients for every γ. The steps are:
(a) Put a one in a γ coefficient and zero in all the remaining γ's.
(b) Apply an inverse transform one step up to see how much this γ contributes to the λ's that it updates, and create a linear system of Ñ x Ñ variables.
(c) Solve the system to find the set of lifting coefficients for the γ coefficient whose value was set to one.
This linear system is built and solved for every γ coefficient to find its corresponding set of lifting coefficients. After applying the previous steps, we have a set of Ñ lifting coefficients for every γ at every level. These values are used to apply the update operator U to the λ coefficients before iterating to the next level. To update the λ's, position at a γj,k coefficient and take its corresponding lifting coefficients, e.g., (a, b) for Ñ = 2. Identify the λ's that were affected by this γ, e.g., λ-j,k-1 and λ-j,k+1. Now,

λ-j,k-1 = λ-j,k-1 + a*γ-j,k and λ-j,k+1 = λ-j,k+1 + b*γ-j,k

Then move to the next γ and do the same. An example of the split, predict and update phases for a 1-D signal of length L = 8 with N = 2 and Ñ = 2 follows. First, consider the split and predict:

λ1   λ2   λ3   λ4   λ5   λ6   λ7   λ8
     γ1        γ2        γ3        γ4

γ1 uses λ1 and λ3 for prediction. Similarly, γ2 uses λ3 and λ5, γ3 uses λ5 and λ7, and γ4 uses λ5 and λ7. Next, the update stage is performed. In this example, the following lifting coefficients ((a, b) pairs) are obtained for updating λ1, λ3, λ5 and λ7:

γ1: (2/5, 1/5)    γ2: (0, 2/3)    γ3: (4/15, 1/5)    γ4: (-2/15, 2/5)

λ1 uses a from γ1 for updating. Similarly, λ3 uses b from γ1 and a from γ2; λ5 uses b from γ2, a from γ3, and a from γ4; and λ7 uses b from γ3 and b from γ4.
At the next level, the coefficients are organized as follows after the split and predict:

λ1   λ3   λ5   λ7
     γ1        γ2

γ1 uses λ1 and λ5, and γ2 uses λ1 and λ5 for prediction. The lifting coefficients are:

γ1: (1/2, 0.214286)    γ2: (-1/3, 0.476190)

In the update stage, λ1 uses a from γ1 and a from γ2, and λ5 uses b from γ1 and b from γ2.
It is important to note that for a longer signal, the lifting coefficients are going to be
(1/4,1/4) for all the λ's unaffected by the boundaries. Using these values, the λ
coefficients can be updated with the following equation,
λ-1,k= λ-1,k + ¼* γ-1,k-1 + ¼ * γ-1,k
The three stages of lifting described by the equations above and depicted in the block diagram are combined and iterated to generate the 1-D fast lifted forward wavelet transform:

For j = -1 down to -n:
    {λj,k, γj,k} = split(λj+1,k)
    γj,k -= P(λj,k)
    λj,k += U(γj,k)

One of the nice properties of lifting can now be illustrated: once the forward transform is known, the inverse can be derived immediately by reversing the operations and toggling + and -. This leads to the following algorithm for the inverse transform:

For j = -n to -1:
    λj,k -= U(γj,k)
    γj,k += P(λj,k)
    λj+1,k = join(λj,k, γj,k)
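A minimal Python sketch of the forward and inverse loops for the (N = 2, Ñ = 2) case with the interior coefficients (1/2, 1/2) and (1/4, 1/4). As a simplifying assumption, the signal is treated as periodic (even lengths only), so the same coefficients apply everywhere instead of the boundary-specific lifting coefficients computed in the text; with this wrap-around the update preserves the mean exactly. Function names are illustrative.

```python
from typing import List, Tuple


def forward_step(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of split, predict and update for N = 2, N~ = 2 (periodic boundaries)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch needs an even number of samples")
    lam, gam = x[0::2], x[1::2]                                               # split
    m = len(lam)
    gam = [g - 0.5 * (lam[k] + lam[(k + 1) % m]) for k, g in enumerate(gam)]  # predict; wraps to lam[0]
    lam = [v + 0.25 * (gam[k - 1] + gam[k]) for k, v in enumerate(lam)]       # update; gam[-1] wraps
    return lam, gam


def inverse_step(lam: List[float], gam: List[float]) -> List[float]:
    """Same operations in reverse order with + and - toggled."""
    m = len(lam)
    lam = [v - 0.25 * (gam[k - 1] + gam[k]) for k, v in enumerate(lam)]       # undo update
    gam = [g + 0.5 * (lam[k] + lam[(k + 1) % m]) for k, g in enumerate(gam)]  # undo predict
    out = [0.0] * (2 * m)
    out[0::2], out[1::2] = lam, gam                                           # join
    return out


def forward(x: List[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Iterate the step on the lambdas: j = -1 down to -levels."""
    lam, details = list(x), []
    for _ in range(levels):
        lam, gam = forward_step(lam)
        details.append(gam)
    return lam, details


def inverse(lam: List[float], details: List[List[float]]) -> List[float]:
    """Undo the levels from the coarsest back to the finest."""
    for gam in reversed(details):
        lam = inverse_step(lam, gam)
    return lam
```

The inverse needs no separate derivation: it re-evaluates the same predictions and updates from values that are still available, which is why lifting is invertible by construction.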

To calculate the total number of iterations of this transform, three factors have to be considered: the signal length (L), the number of dual vanishing moments (N), and the number of real vanishing moments (Ñ). It can be proven that the total number of iterations n is given by

$$n = \left\lfloor \log_2 \frac{L-1}{N_{\max}-1} \right\rfloor$$

where Nmax = max(N, Ñ). It can be seen from this equation (Equation 6) that the size of the signal does not matter, i.e., signals do not necessarily have to have dyadic dimensions. The interpolating subdivision guarantees correct treatment of the boundaries in every case.
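A small Python sketch of this iteration count. It computes the floor of the base-2 logarithm with integer arithmetic (the largest n with 2^n (Nmax - 1) ≤ L - 1) to avoid floating-point edge cases; the function name and the error check are assumptions of this sketch.

```python
def num_levels(length: int, n_dual: int, n_real: int) -> int:
    """floor(log2((L - 1) / (Nmax - 1))), computed with integers only."""
    n_max = max(n_dual, n_real)
    if n_max < 2 or length < n_max:
        raise ValueError("need Nmax >= 2 and at least Nmax samples")
    numerator, denominator = length - 1, n_max - 1
    levels = 0
    # largest n with 2**n * (Nmax - 1) <= L - 1
    while (denominator << (levels + 1)) <= numerator:
        levels += 1
    return levels
```

For the L = 8, N = Ñ = 2 example above, `num_levels(8, 2, 2)` gives 2, matching the two levels worked through there; a non-dyadic length such as L = 100 simply yields 6 levels.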
An extension of the 1-D algorithm to 2-D signals is a simple repetition of the 1-D transform over rows and columns, as the transform is separable. For better frequency support, the square 2-D method is proposed. The basic idea is to apply the 1-D transform to all the rows first and then to all the columns. This is done at every level in order to create a square window transform, which gives better frequency support than a rectangular window transform. Different filter and lifting coefficients are used for each dimension (X, Y) if they differ. Using the filter coefficients (1/2, 1/2) and lifting coefficients (1/4, 1/4), the wavelet transform presented here is the (N = 2, Ñ = 2) biorthogonal wavelet transform of Cohen-Daubechies-Feauveau. This simple example already shows how the lifting scheme can speed up the implementation of the wavelet transform. Classically, the {λ-1,k} coefficients are found as the convolution of the {λ0,k} coefficients with the filter

$$\tilde{h} = \left(-\tfrac{1}{8},\ \tfrac{1}{4},\ \tfrac{3}{4},\ \tfrac{1}{4},\ -\tfrac{1}{8}\right)$$

This step would take 6 operations per coefficient, while lifting needs only 3.
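Substituting the interior predict equation into the interior update equation shows where this five-tap filter comes from. The derivation below uses only the two lifting equations given earlier, with λ-1,k = λ0,2k before the update:

$$
\begin{aligned}
\lambda_{-1,k} &= \lambda_{0,2k} + \tfrac{1}{4}\left(\gamma_{-1,k-1} + \gamma_{-1,k}\right) \\
&= \lambda_{0,2k} + \tfrac{1}{4}\Big(\lambda_{0,2k-1} - \tfrac{1}{2}\left(\lambda_{0,2k-2} + \lambda_{0,2k}\right) + \lambda_{0,2k+1} - \tfrac{1}{2}\left(\lambda_{0,2k} + \lambda_{0,2k+2}\right)\Big) \\
&= -\tfrac{1}{8}\lambda_{0,2k-2} + \tfrac{1}{4}\lambda_{0,2k-1} + \tfrac{3}{4}\lambda_{0,2k} + \tfrac{1}{4}\lambda_{0,2k+1} - \tfrac{1}{8}\lambda_{0,2k+2}
\end{aligned}
$$

Lifting computes the same λ-1,k but reuses the already computed γ-1,k instead of recombining five input samples, which is where the saving in operations comes from.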

3.1.4 Inverse lifting scheme

Figure: Inverse lifting scheme (λj,k and γj,k pass through update, predict and merge to give λj+1,k)

A whole family of biorthogonal wavelets can be constructed by varying the three stages of lifting:
1. Split: Choices other than the Lazy wavelet are possible for the initial split. A typical alternative is the Haar wavelet transform.
2. Predict: In wavelet terminology, the prediction step establishes the number of vanishing moments (N) of the dual wavelet. In other words, if the original signal is a polynomial of degree less than N, all wavelet coefficients will be zero. It has been shown that schemes of order higher than N = 2 are easily obtained by involving more neighbors.
3. Update: Again in wavelet terminology, the update step establishes the number of vanishing moments (Ñ) of the primal or real wavelet. In other words, the transform preserves the first Ñ moments of the λj,k sequences. It has been shown that schemes of order higher than Ñ = 2 can be constructed by involving more neighbors. In some cases, namely when the split stage already creates a wavelet with a vanishing moment (such as the Haar), the update stage can be omitted. In other cases, with more than one update step applied, another family of wavelets, rather than the biorthogonal ones, can be created.
The in-place implementation can easily be derived from the diagrams and equations. Assume the original samples are stored in a vector v[k]. Each coefficient λj,k or γj,k is stored in location . The Lazy wavelet transform is then immediate. All other operations can be done with += or -= operations. In other words, when predicting the γ coefficients, they can overwrite the λ values in the same positions, and when updating the λ coefficients, the updated values can be saved in the same positions.
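A minimal in-place Python sketch for the (N = 2, Ñ = 2) case. The storage formula is not preserved in this text, so the sketch assumes a common interleaved layout: at stride s, λ's sit at multiples of 2s and γ's at odd multiples of s. It also assumes a power-of-two length and periodic boundaries (a simplification of the boundary handling described earlier). Only += and -= are used, and no second array is needed.

```python
from typing import List


def forward_in_place(v: List[float]) -> None:
    """Multi-level lifting on v; afterwards v[0] holds the coarsest lambda."""
    n = len(v)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s = 1
    while s < n:
        for i in range(s, n, 2 * s):      # predict: gammas overwrite the odd slots
            v[i] -= 0.5 * (v[i - s] + v[(i + s) % n])
        for i in range(0, n, 2 * s):      # update: lambdas stay in the even slots
            v[i] += 0.25 * (v[(i - s) % n] + v[i + s])
        s *= 2


def inverse_in_place(v: List[float]) -> None:
    """Undo the levels from the coarsest stride down, toggling + and -."""
    n = len(v)
    s = n // 2
    while s >= 1:
        for i in range(0, n, 2 * s):      # undo update
            v[i] -= 0.25 * (v[(i - s) % n] + v[i + s])
        for i in range(s, n, 2 * s):      # undo predict
            v[i] += 0.5 * (v[i - s] + v[(i + s) % n])
        s //= 2
```

Because the update preserves the mean at every level in this periodic setting, the single remaining λ in v[0] equals the average of the input, the "same overall brightness" property asked for in the update phase.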
Chapter 4

Design approach

The embedded zerotree wavelet coding system is fundamentally divided into 6 major parts:

1) Preprocessing of the source image.

2) Transformation of the processed image using the wavelet transform.

3) Encoding of the transformed data for compression.

4) Decoding of the compressed, encoded data.

5) Performing the inverse transformation on the decoded data to retrieve the original data.

6) Processing the retrieved data to obtain the original image.

The design unit implements the embedded zerotree wavelet coding system for data compression. The coding system reads the multiresolution components of the image obtained from the transformation module and passes the data to the decoder unit to retrieve the image. The figure below shows the implemented embedded zerotree wavelet coding system for image processing.


Fig 4.9 shows the block diagram of the JPEG 2000 coding system used for compressing still images.

Fig: Block diagram for the JPEG 2000 image coding system (encoding path: source image, wavelet, quantizer, compressed data; decoding path: decoder, dequantizer, inverse wavelet, retrieved image)


To perform the forward DWT the JPEG2000 system uses a one-dimensional (1-

D) subband decomposition of a 1-D set of samples into low-pass and high-pass samples.

Low-pass samples represent a down-sampled, low-resolution version of the original set

and High-pass samples represent a down-sampled residual version of the original set.

The transformation uses the convolution-based filtering mode, and the process is similar to the one explained in Section 4.2.2 on embedded coding.


Quantization refers to the process of approximating the continuous set of values in the image data with a finite (preferably small) set of values. The input to a quantizer is the original data, and the output is always one of a finite number of levels. The quantizer is thus a function whose set of output values is discrete and usually finite. This is a process of approximation, and a good quantizer is one that represents the original signal with minimum loss or distortion.

The input-output characteristic of the quantization process is shown in Fig 4.10. Inverse quantization is the reverse of quantization. In the quantization process, each block of transformed coefficients is quantized, and some grayscale and color information is discarded. Each transformed coefficient is divided by its corresponding element in a scaled quantization matrix, and the resulting value is rounded. The default quantization matrices for luminance and chrominance are specified in the JPEG standard and were designed in accordance with a model of human perception. The scale factor of the quantization matrix directly affects the amount of image compression, and the lossy quality of JPEG compression arises as a direct result of this quantization process.
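As an illustration of the divide-and-round step, the sketch below quantizes a small block of transformed coefficients with a scaled quantization matrix and then dequantizes it. The 2x2 matrix entries are hypothetical (they are not the JPEG tables), and rounding half up is an assumed convention.

```python
import math


def quantize_block(coeffs, qmatrix, scale=1.0):
    """Divide each coefficient by its scaled matrix entry and round (half up)."""
    return [[math.floor(c / (q * scale) + 0.5) for c, q in zip(row, qrow)]
            for row, qrow in zip(coeffs, qmatrix)]


def dequantize_block(levels, qmatrix, scale=1.0):
    """Inverse quantization: multiply each level back by its scaled step."""
    return [[n * q * scale for n, q in zip(row, qrow)]
            for row, qrow in zip(levels, qmatrix)]


coeffs = [[100, -37], [12, 3]]
qmatrix = [[16, 11], [12, 14]]  # hypothetical entries, not the JPEG tables

levels = quantize_block(coeffs, qmatrix)
print(levels)                                      # [[6, -3], [1, 0]]
print(dequantize_block(levels, qmatrix))           # [[96.0, -33.0], [12.0, 0.0]]
print(quantize_block(coeffs, qmatrix, scale=2.0))  # [[3, -2], [1, 0]]
```

Doubling the scale factor maps the same coefficients to smaller levels, which compress better but reconstruct less accurately (100 already comes back as 96 at scale 1).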

Fig : Quantization relationship wrt. a given input

In quantization, each input symbol is treated separately in producing the output. A quantizer can be specified by its input partitions and output levels (also called reproduction points). If the input range is divided into intervals of equal width, the quantizer is termed a uniform quantizer; otherwise it is termed a non-uniform quantizer. A uniform quantizer can be easily specified by its lower bound and its step size, and it is also easier to implement than a non-uniform quantizer. For example, in Fig 4.11, if the input falls between n*r and (n+1)*r, the quantizer outputs the symbol n.
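A uniform quantizer of this kind needs only its lower bound and step size. In this minimal sketch the step size `r` and the lower bound are parameters, and reconstructing each symbol at the midpoint of its interval is an assumption, not something the text specifies.

```python
import math


def uniform_quantize(x, r, lower=0.0):
    """Return the symbol n such that lower + n*r <= x < lower + (n+1)*r."""
    return math.floor((x - lower) / r)


def uniform_reconstruct(n, r, lower=0.0):
    """Map symbol n back to the midpoint of its interval (assumed convention)."""
    return lower + (n + 0.5) * r


r = 10.0
symbols = [uniform_quantize(x, r) for x in (0.0, 9.99, 23.0, 30.0)]
print(symbols)                    # [0, 0, 2, 3]
print(uniform_reconstruct(2, r))  # 25.0
```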

Fig : A uniform quantizer

Just as a quantizer partitions its input and outputs discrete levels, a dequantizer receives the output levels of a quantizer and converts them back into data values by translating each level into a 'reproduction point' in the actual range of the data. It is known from the literature that the optimum quantizer (encoder) and optimum dequantizer (decoder) must satisfy the following conditions.

• Given the output levels or partitions of the encoder, the best decoder is one that puts the reproduction points x' at the centers of mass of the partitions. This is known as the centroid condition.
• Given the reproduction points of the decoder, the best encoder is one that puts the partition boundaries exactly midway between the reproduction points, i.e. each x is translated to its nearest reproduction point. This is known as the nearest neighbour condition.
• The quantization error (x - x') is used as a measure of the optimality of the quantizer and dequantizer.
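The two optimality conditions suggest an alternating procedure, the classic Lloyd (Lloyd-Max) algorithm, which the text above does not describe: fix the reproduction points and put each boundary midway between neighbours (nearest neighbour condition), then move each point to the mean of the samples in its cell (centroid condition), and repeat. A minimal 1-D sketch, assuming evenly spaced initial points and sample data chosen only for illustration:

```python
import bisect


def midpoints(points):
    """Nearest neighbour condition: boundaries halfway between points."""
    return [(a + b) / 2 for a, b in zip(points, points[1:])]


def lloyd_max(samples, levels, iterations=100):
    lo, hi = min(samples), max(samples)
    step = (hi - lo) / levels
    points = [lo + (i + 0.5) * step for i in range(levels)]  # initial guess
    for _ in range(iterations):
        bounds = midpoints(points)
        cells = [[] for _ in points]
        for x in samples:
            cells[bisect.bisect_right(bounds, x)].append(x)
        # Centroid condition: each point moves to the mean of its cell
        # (an empty cell keeps its previous point).
        new_points = [sum(c) / len(c) if c else p for c, p in zip(cells, points)]
        if new_points == points:
            break
        points = new_points
    return points, midpoints(points)


def mse(samples, points, bounds):
    """Mean squared quantization error (x - x')**2 over the samples."""
    return sum((x - points[bisect.bisect_right(bounds, x)]) ** 2
               for x in samples) / len(samples)


data = [0, 0, 1, 1, 10, 10, 11, 11]
points, bounds = lloyd_max(data, levels=2)
print(points, bounds)             # [0.5, 10.5] [5.5]
print(mse(data, points, bounds))  # 0.25
```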

The quantizer quantizes the data with a step size ∆ given by

$$\Delta = \frac{R}{2^b - 1}, \qquad R = \max(I) - \min(I)$$

where R is the range of the image matrix I and b is the number of bits output for each quantized value. Quantization is carried out as a thresholding operation: for every image element fed to the quantizer, an equivalent binary sequence is produced and passed to the encoder module for further processing.
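The step-size rule can be sketched as follows. It assumes each pixel is mapped to the nearest of the 2^b levels min(I) + q∆ (rounding half up) and emitted as a b-bit binary string; the text does not state the rounding rule, and the function names are hypothetical.

```python
import math


def quantize_image(pixels, b):
    """Quantize a flat list of pixel values to b-bit binary code strings."""
    lo, hi = min(pixels), max(pixels)
    step = (hi - lo) / (2 ** b - 1)          # Delta = R / (2^b - 1)
    codes = []
    for p in pixels:
        q = 0 if step == 0 else math.floor((p - lo) / step + 0.5)
        codes.append(format(q, f"0{b}b"))    # e.g. q=1, b=2 -> '01'
    return codes, lo, step


def dequantize_image(codes, lo, step):
    """Map each b-bit code back to its reproduction level lo + q * Delta."""
    return [lo + int(c, 2) * step for c in codes]


codes, lo, step = quantize_image([0, 51, 100, 255], b=2)
print(step, codes)                        # 85.0 ['00', '01', '01', '11']
print(dequantize_image(codes, lo, step))  # [0.0, 85.0, 85.0, 255.0]
```

The reconstruction error (51 and 100 both come back as 85) is the loss introduced by this stage.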


After the data has been quantized into a finite set of values, it can be encoded using an entropy coder to obtain additional compression. Entropy here means the amount of information present in the data, and an entropy coder encodes the given set of symbols with close to the minimum number of bits required to represent them. Various algorithms have been proposed for image coding, and Huffman coding was chosen for this system.

Huffman encoding works best on images in which some values repeat much more often than others. The general concept is to assign fewer bits to the most frequently used bytes and more bits to the least frequently used ones. Each byte value in the image is assigned a variable-length binary code: the more often a byte occurs, the shorter its code, and the less often it occurs, the longer its code.
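Such a variable-length code can be built with the standard Huffman construction using a priority queue: repeatedly merge the two least frequent groups of symbols, prefixing their codes with 0 and 1. The sketch below uses Python's `heapq` and `collections.Counter`; the tie-breaking counter and the function names are illustrative choices.

```python
import heapq
import itertools
from collections import Counter


def huffman_code(data):
    """Build a prefix-free code {symbol: bit string} from symbol frequencies."""
    freq = Counter(data)
    if len(freq) == 1:                   # degenerate case: one distinct symbol
        return {next(iter(freq)): "0"}
    tie = itertools.count()              # breaks ties so dicts are never compared
    heap = [(f, next(tie), {s: ""}) for s, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # least frequent group
        f2, _, c2 = heapq.heappop(heap)  # next least frequent group
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


def huffman_encode(data, code):
    return "".join(code[s] for s in data)


pixels = bytes([7, 7, 7, 7, 3, 3, 9])
code = huffman_code(pixels)
print(code)                               # {9: '00', 3: '01', 7: '1'}
print(len(huffman_encode(pixels, code)))  # 10 bits instead of 56
```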


The decoder unit decodes the encoded bit stream from the encoder module, performing the reverse of the encoding process. It uses the same codebook of code words that was used for encoding. The Huffman decoder block reads the unique code bits that were transmitted in place of the data bits. The bits are received serially and compared with the unique code words, and whenever a match occurs, the equivalent data bits are output. For decoding a block of 'm' data bits, a minimum of 2 m-1 iterations is performed, which makes the system much slower in operation.
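The serial matching can be sketched as follows: bits are appended one at a time and compared against the codebook, and a symbol is emitted as soon as the accumulated bits match a code word. Because a Huffman code is prefix-free, the first match is always the right one. The codebook here is a hypothetical example, hard-coded so the block stands alone.

```python
def huffman_decode(bits, code):
    """Decode a bit string with a prefix-free codebook {symbol: bit string}."""
    lookup = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:              # bits arrive serially
        current += bit
        if current in lookup:     # accumulated bits match a code word
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("bit stream ends in the middle of a code word")
    return out


codebook = {7: "1", 3: "01", 9: "00"}      # hypothetical codebook
print(huffman_decode("110100", codebook))  # [7, 7, 3, 9]
```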

The dequantizer unit dequantizes the decoded data. The dequantization operation is carried out in the reverse manner to quantization, using the same step sizes from the quantization table as were used for quantization. The reconstructed data do not exactly match the original image, which makes the system a lossy compression system.

The inverse transformation is carried out in a similar fashion to the one explained under embedded coding.


[Figure: Source Image → Pre-Processor → Wavelet Transformation → Lifting Coding → Compressed data → Inverse Lifting Coding → Inverse wavelet Transformation → Post Processor → Retrieved image]

Fig : Block diagram for the proposed EZW coding system


Before the image data is processed, the image is preprocessed to speed up the coding system. Preprocessing consists of tiling the original image. The term “tiling” refers to the partition of the original (source) image into rectangular non-overlapping blocks (tiles), which are compressed independently, as though they were entirely distinct images. All operations, including component mixing, wavelet transform, quantization and entropy coding, are performed independently on the image tiles. The tile component, as shown in Fig 4.2, is the basic unit of the original or reconstructed image. Tiling reduces memory requirements, and since tiles are also reconstructed independently, they can be used to decode specific parts of the image instead of the whole image. All tiles have exactly the same dimensions, except possibly those at the boundary of the image. Arbitrary tile sizes are allowed, up to and including the entire image (i.e., the whole image regarded as one tile).
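Tiling and the matching reassembly on the decoding side can be sketched with the image held as a list of rows. The tile size is a parameter (8x8 in this system); as noted above, tiles at the right and bottom edges may be smaller. The function names are hypothetical.

```python
def tile_image(img, tile_h, tile_w):
    """Split a 2-D list into non-overlapping tiles keyed by their top-left corner."""
    tiles = {}
    for r0 in range(0, len(img), tile_h):
        for c0 in range(0, len(img[0]), tile_w):
            tiles[(r0, c0)] = [row[c0:c0 + tile_w] for row in img[r0:r0 + tile_h]]
    return tiles


def untile_image(tiles, rows, cols):
    """Place each tile back at its original position."""
    img = [[None] * cols for _ in range(rows)]
    for (r0, c0), tile in tiles.items():
        for i, tile_row in enumerate(tile):
            img[r0 + i][c0:c0 + len(tile_row)] = tile_row
    return img


image = [[r * 12 + c for c in range(12)] for r in range(10)]  # 10 x 12 test image
tiles = tile_image(image, 8, 8)
print(sorted(tiles))                              # [(0, 0), (0, 8), (8, 0), (8, 8)]
print(len(tiles[(8, 8)]), len(tiles[(8, 8)][0]))  # 2 4  (smaller boundary tile)
print(untile_image(tiles, 10, 12) == image)       # True
```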

[Figure panels: Original image; 8x8 tile image]

Fig : Tiling of original image


This unit transforms the input image from the spatial domain to the wavelet (frequency) domain and decomposes the original image into its fundamental components. The transformation is performed using the Daubechies wavelet transform. The wavelet transform is a very useful tool for signal analysis and image processing, especially for multi-resolution representation. The one-dimensional discrete wavelet transform (1-D DWT) decomposes an input sequence into two components (the average component and the detail component) by filtering with a low-pass filter and a high-pass filter. The two-dimensional discrete wavelet transform (2-D DWT) decomposes an input image into four sub-bands, one average component (LL) and three detail components (LH, HL, HH), as shown in Fig 4.3.

Fig : The result of 2-D DWT decomposition

The wavelet transform uses the filter banks shown in Fig 4.5 to decompose the preprocessed original image into three detail sub-bands and one approximation sub-band. The 2-D DWT is achieved by two ordered 1-D DWT operations (row and column): the row operation is carried out first to obtain a 1-D decomposition, and the result is then transformed by the column operation to obtain the final 2-D DWT.
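The row-then-column ordering can be sketched using the Haar averaging/half-difference pair as a stand-in (the project uses a Daubechies wavelet, whose longer filters are omitted here for brevity). The sub-band labels follow one common convention, in which the first letter names the row filter and the second the column filter; conventions differ between texts.

```python
def haar_1d(x):
    """One Haar analysis step: averages (low-pass) and half-differences (high-pass)."""
    low = [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]
    high = [(x[2 * i + 1] - x[2 * i]) / 2 for i in range(len(x) // 2)]
    return low, high


def transpose(m):
    return [list(col) for col in zip(*m)]


def dwt2_one_level(img):
    """Row transform first, then column transform, giving LL, LH, HL, HH."""
    row_low, row_high = zip(*(haar_1d(row) for row in img))  # 1-D DWT on rows

    def column_pass(block):
        low, high = zip(*(haar_1d(col) for col in transpose(block)))
        return transpose(low), transpose(high)

    ll, lh = column_pass(row_low)   # row low-pass, then column low/high-pass
    hl, hh = column_pass(row_high)  # row high-pass, then column low/high-pass
    return ll, lh, hl, hh


ll, lh, hl, hh = dwt2_one_level([[0, 2], [4, 6]])
print(ll, lh, hl, hh)  # [[3.0]] [[2.0]] [[1.0]] [[0.0]]
```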

Fig : Filter Bank Implementation of Wavelet sub-band decomposition

The filtering is carried out by convolving the input image with the filter coefficients. Each decomposition stage consists of a pair of high-pass and low-pass filters, which separate the rapidly varying and slowly varying components of the image. Fig 4.5 (a) shows the original image used for decomposition into fundamental subbands using the filter banks shown in Fig 4.4, and Fig 4.5 (d) shows the 3-level decomposition of the original image. The approximation coefficients obtained at each level are further decomposed into three detail sub-bands and one approximation sub-band, for n levels.
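The bookkeeping for an n-level decomposition can be checked with a little arithmetic: each level halves both dimensions of the approximation band and produces three detail bands of that halved size. The sketch below assumes dimensions divisible by 2^n and uses a 256 x 256 image with 3 levels as the example; the function name is hypothetical.

```python
def subband_layout(rows, cols, levels):
    """Return [(level, rows, cols)] of the detail bands and the final LL size."""
    layout = []
    for level in range(1, levels + 1):
        if rows % 2 or cols % 2:
            raise ValueError("dimensions must stay even at every level")
        rows, cols = rows // 2, cols // 2
        layout.append((level, rows, cols))  # LH, HL and HH each have this size
    return layout, (rows, cols)             # final approximation (LL) band


layout, ll_size = subband_layout(256, 256, 3)
print(layout)   # [(1, 128, 128), (2, 64, 64), (3, 32, 32)]
print(ll_size)  # (32, 32)
total = sum(3 * r * c for _, r, c in layout) + ll_size[0] * ll_size[1]
print(total == 256 * 256)  # True: the decomposition keeps the coefficient count
```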

Fig : (a) Original image (b) 1 scale level decomposition (c) 2 scale level decomposition (d) 3 scale level decomposition


This technique overcomes the lossy nature of wavelet compression using two modules, namely encoding and decoding. The encoding module consists of three sub-modules: split, predict and update. The decoding module, which reconstructs the losslessly compressed data, consists of three sub-modules: update, predict and merge.
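Because the Haar update step in Section 4.5.3 divides by 2, a transform computed in floating point produces fractional coefficients. A common way to make lifting exactly invertible on integer pixels is the integer-to-integer variant of Calderbank, Daubechies, Sweldens and Yeo, which rounds inside the update step. The sketch below shows that variant (often called the S-transform) for one level of an even-length signal; it illustrates the technique and is not necessarily the implementation used in this project.

```python
def forward_int_lifting(x):
    """One level of integer Haar lifting: split, predict, update."""
    even, odd = x[0::2], x[1::2]                 # split (Lazy wavelet)
    d = [o - e for o, e in zip(odd, even)]       # predict: detail = odd - even
    s = [e + di // 2 for e, di in zip(even, d)]  # update with floor(d / 2)
    return s, d


def inverse_int_lifting(s, d):
    """Undo update, undo predict, merge: reverse order, flipped signs."""
    even = [si - di // 2 for si, di in zip(s, d)]  # undo update
    odd = [di + e for di, e in zip(d, even)]       # undo predict
    x = [0] * (2 * len(s))
    x[0::2], x[1::2] = even, odd                   # merge
    return x


signal = [5, 2, 7, 8, -3, 4]
s, d = forward_int_lifting(signal)
print(s, d)                                 # [3, 7, 0] [-3, 1, 7]
print(inverse_int_lifting(s, d) == signal)  # True: exact, lossless round trip
```

The same rounded quantity, floor(d/2), is added in the forward step and subtracted in the inverse, so the rounding cancels exactly and no information is lost.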


In this section we describe the lifting scheme in more detail. Consider a signal $s_j$ with $2^j$ samples which we want to transform into a coarser signal $s_{j-1}$ and a detail signal $d_{j-1}$. A typical wavelet transform built through lifting consists of three steps: split, predict and update.
4.5.1 Split:
This stage does little more than split the signal into two disjoint sets of samples. In our case one group consists of the even-indexed samples $s_{j,2l}$ and the other of the odd-indexed samples $s_{j,2l+1}$. Each group contains half as many samples as the original signal. Splitting into even and odd samples is called the Lazy wavelet transform.

$$(\mathrm{even}_{j-1}, \mathrm{odd}_{j-1}) = \mathrm{Split}(s_j)$$
4.5.2 Predict:
The even and odd subsets are interleaved. If the signal has a local correlation structure, the even and odd subsets will be highly correlated; in other words, given one of the two sets, it should be possible to predict the other with reasonable accuracy. Here the even set is always used to predict the odd one. In the Haar case the prediction is particularly simple: an odd sample $s_{j,2l+1}$ uses its left neighboring even sample $s_{j,2l}$ as its predictor. The detail $d_{j-1,l}$ is then the difference between the odd sample and its prediction:

$$d_{j-1,l} = s_{j,2l+1} - s_{j,2l}$$

which defines an operator $P$ such that

$$d_{j-1} = \mathrm{odd}_{j-1} - P(\mathrm{even}_{j-1})$$

When the prediction is good, the details are small and can be represented more efficiently than the odd samples themselves. Note that if the original signal is constant, all details are exactly zero.
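The Haar predict step can be written directly from the equations above. The function names are hypothetical; the test signals show that a constant or slowly varying signal yields zero or small constant details, while a rapidly varying one does not.

```python
def split(s):
    """Lazy wavelet: even-indexed and odd-indexed samples."""
    return s[0::2], s[1::2]


def haar_predict(even):
    """P(even): each odd sample is predicted by its left even neighbour."""
    return list(even)


def predict_details(s):
    """d_{j-1} = odd_{j-1} - P(even_{j-1})."""
    even, odd = split(s)
    return [o - p for o, p in zip(odd, haar_predict(even))]


print(predict_details([7, 7, 7, 7, 7, 7, 7, 7]))  # [0, 0, 0, 0]
print(predict_details([1, 2, 3, 4, 5, 6]))        # [1, 1, 1]
print(predict_details([10, 90, 20, 80]))          # [80, 60]
```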
4.5.3 Update:
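One of the key properties of the coarser signals is that they have the same average value as the original signal, i.e., the quantity

$$2^{-j} \sum_{l=0}^{2^j-1} s_{j,l}$$

is independent of $j$. As a result, the last coefficient $s_{0,0}$ is the DC component, or overall average, of the signal. The update stage ensures this by letting

$$s_{j-1,l} = s_{j,2l} + d_{j-1,l}/2$$

Substituting this definition, we easily verify that

$$2^{-(j-1)} \sum_{l=0}^{2^{j-1}-1} s_{j-1,l} = 2^{-(j-1)} \sum_{l=0}^{2^{j-1}-1} \frac{s_{j,2l} + s_{j,2l+1}}{2} = 2^{-j} \sum_{l=0}^{2^j-1} s_{j,l}$$

The update rule defines an operator $U$ of the form

$$s_{j-1} = \mathrm{even}_{j-1} + U(d_{j-1})$$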

All of this can be computed in place: the even locations can be overwritten with the averages and the odd ones with the details. An abstract implementation is:

    (even_{j-1}, odd_{j-1}) := Split(s_j);
    odd_{j-1}  -= P(even_{j-1});
    even_{j-1} += U(odd_{j-1});
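Applying split, predict and update repeatedly to the averages gives the full multilevel transform. The sketch below does this in place on a Python list whose length is a power of two: at each level the even slot of every pair keeps the running average and the odd slot receives the detail. The stride-based indexing is one possible layout, assumed for illustration.

```python
def haar_lifting_forward(v):
    """In-place multilevel Haar lifting; len(v) must be a power of two."""
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    step = 1
    while step < n:
        for i in range(0, n, 2 * step):
            v[i + step] -= v[i]      # predict: odd -= P(even)
            v[i] += v[i + step] / 2  # update: even += U(odd)
        step *= 2
    return v


coeffs = haar_lifting_forward([2.0, 4.0, 6.0, 8.0])
print(coeffs)     # [5.0, 2.0, 4.0, 2.0]
print(coeffs[0])  # 5.0, the overall average s_{0,0}
```

As the update step guarantees, v[0] ends up holding the average of the input.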

These three stages make up the forward lifting scheme, from which the inverse scheme can be built immediately. Again it has three stages:

4.5.4 Undo update:

Given $d_{j-1}$ and $s_{j-1}$, we can recover the even samples by simply subtracting the update:

$$\mathrm{even}_{j-1} = s_{j-1} - U(d_{j-1})$$

In the Haar case, we compute this by letting

$$s_{j,2l} = s_{j-1,l} - d_{j-1,l}/2$$

4.5.5 Undo predict:
Given $\mathrm{even}_{j-1}$ and $d_{j-1}$, we can recover the odd samples by adding the prediction:

$$\mathrm{odd}_{j-1} = d_{j-1} + P(\mathrm{even}_{j-1})$$

In the Haar case, we compute this by letting

$$s_{j,2l+1} = d_{j-1,l} + s_{j,2l}$$

4.5.6 Merge:
Now that we have the even and odd samples, we simply interleave them to recover the original signal. This is the inverse Lazy wavelet:

$$s_j = \mathrm{Merge}(\mathrm{even}_{j-1}, \mathrm{odd}_{j-1})$$

Assuming that the even slots contain the averages and the odd ones contain the differences, the implementation of the inverse transform is:

    even_{j-1} -= U(odd_{j-1});
    odd_{j-1}  += P(even_{j-1});
    s_j := Merge(even_{j-1}, odd_{j-1});

The inverse transform is thus always found by reversing the order of the operations and flipping the signs.
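Reversing the order and flipping the signs can be sketched for the in-place multilevel Haar case, assuming a stride layout in which the even slot of each pair holds averages, the odd slot holds details, and v[0] holds the overall average. Levels are undone from the coarsest to the finest; the function name is hypothetical.

```python
def haar_lifting_inverse(v):
    """Invert in-place multilevel Haar lifting coefficients (power-of-two length)."""
    n = len(v)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    step = n // 2
    while step >= 1:                 # coarsest level first
        for i in range(0, n, 2 * step):
            v[i] -= v[i + step] / 2  # undo update: even -= U(odd)
            v[i + step] += v[i]      # undo predict: odd += P(even)
        step //= 2
    return v


# Multilevel Haar lifting coefficients of the signal [2, 4, 6, 8]
# (overall average 5 first, then details in the stride layout)
print(haar_lifting_inverse([5.0, 2.0, 4.0, 2.0]))  # [2.0, 4.0, 6.0, 8.0]
```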


Inverse transformation is the process of retrieving the image data from the transformed coefficients. The image data transformed and decomposed on the encoding side is rearranged from the higher decomposition levels to the lower ones, with the highest decomposition level placed at the top and the lower levels following it, as shown in Fig 4.6.



[Figure panel: Retrieved Image]

Fig : Decomposition levels and the reconstruction of the image

The decomposed image is reconstructed by iteratively upsampling each sub-band by 2, passing it through the synthesis filter bank and adding the two filtered components. Each addition gives the approximation coefficients of the next higher level, and in a full reconstruction the final addition returns the reconstructed image. Fig 4.7 shows the reconstruction of the decomposed components for a two-level decomposition.
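One synthesis step can be sketched in 1-D: each sub-band is upsampled by 2 (zeros inserted), filtered with its synthesis filter, and the two results are added. The synthesis filters below are the ones that invert a Haar averaging/half-difference analysis (low = (x[2m]+x[2m+1])/2, high = (x[2m+1]-x[2m])/2); they are an assumed example, not the Daubechies filters used in the project. In 2-D the same step would be applied along the columns and then along the rows (the reverse of the analysis order), level by level from the coarsest.

```python
def convolve(x, f):
    """Full linear convolution of sequence x with filter taps f."""
    y = [0.0] * (len(x) + len(f) - 1)
    for i, xi in enumerate(x):
        for k, fk in enumerate(f):
            y[i + k] += xi * fk
    return y


def upsample(c):
    """Insert a zero after every sample (upsampling by 2)."""
    out = []
    for value in c:
        out.extend([value, 0.0])
    return out


def synthesize_1d(low, high, f_low=(1.0, 1.0), f_high=(-1.0, 1.0)):
    """Upsample, filter and add: x[2m] = low - high, x[2m+1] = low + high."""
    a = convolve(upsample(low), f_low)
    b = convolve(upsample(high), f_high)
    return [a[i] + b[i] for i in range(2 * len(low))]


print(synthesize_1d([5.0, 11.0], [1.0, 1.0]))  # [4.0, 6.0, 10.0, 12.0]
```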





Fig : Multilevel reconstruction of the decomposed image


This unit reassembles the image from the reconstructed tiles by placing the 8x8 blocks in sequence over the complete image dimensions. The tiles are placed in the same sequence in which they were segmented on the encoding side.

[Figure panels: Reconstructed tile; Reconstructed image]
Fig : Reconstruction of the image from tiles


Wavelet transform using lifting scheme: flow chart

[Flow chart: Wavelet transform → Split (separate even values matrix and odd values matrix) → Predict (prediction function) → Update]

Inverse wavelet transform using lifting scheme: flow chart

[Flow chart: Inverse wavelet → Update → Predict (prediction function) → Merge (odd value matrix)]



5.1 Case I: A 256 x 256 tif image

In the screen above, the input image is read first. In this case a tif image is used, and the Read input button is clicked to load it.
The input image, a 256 x 256 tif image, is shown above.
After the image is read, the wavelet method button is pressed, and the compressed image obtained by applying the wavelet transform is displayed.
The image above is the image retrieved after the lifting scheme method is applied to the original.
The error rates of the wavelet compression method and the lifting scheme are compared in the graph above.
5.2 Case II: A 307 x 593 tif image

An image is read by clicking the Read input button.

Compressed image obtained by the wavelet transformation method.
The image above is the retrieved image after applying the lifting scheme method.
The error rates of the wavelet compression method and the lifting scheme are compared in the graph above.
5.3 Case III: A 256 x 256 bmp image

An image is read by clicking the Read input button.

Compressed image obtained by the wavelet transformation method. The image retrieved after the wavelet method is blurred.
The image above is the retrieved image after applying the lifting scheme method; it is nearly identical to the original image.
The error rates of the wavelet compression method and the lifting scheme are compared in the graph above.


This project implements a method to optimize the wavelet function for lossless data compression using the lifting scheme. The project realized an optimal data compression system using the lifting scheme, which compresses the image in a lossless manner while retaining accuracy. The lifting scheme is implemented with a modular approach on the Matlab platform, with three major blocks, namely split, predict and update. Observations were carried out for various types of images in different formats, and it was observed that the lifting scheme provides considerably better accuracy than its counterpart, the wavelet transform with quantization, while retaining a comparable level of compression. From these observations it is concluded that the proposed lifting scheme for lossless compression provides considerable accuracy while maintaining the compression level of the wavelet transform.

The proposed scheme can be further enhanced to achieve higher compression rates with greater accuracy using advanced methods such as genetic algorithms. The analysis could also be extended to various bit-rate applications.