
CHAPTER 1

1. INTRODUCTION:
Uncompressed multimedia (graphics, audio and video) data requires considerable storage
capacity and transmission bandwidth. Despite rapid progress in mass-storage density, processor
speeds, and digital communication system performance, demand for data storage capacity and
data-transmission bandwidth continues to outstrip the capabilities of available technologies. The
recent growth of data-intensive, multimedia-based web applications has not only sustained the
need for more efficient ways to encode signals and images but has also made compression of such
signals central to storage and communication technology.
Image compression can be lossy or lossless. Lossless compression produces data that, when
decompressed, is an exact replica of the original. This is required when binary data such as
executables and documents are compressed: they must be reproduced exactly when decompressed.
Images (and music), on the other hand, need not be reproduced exactly.
Near-lossless compression denotes compression methods that give quantitative bounds on the loss
that is introduced: such techniques guarantee that no pixel difference between the original and the
compressed image exceeds a given value. An approximation of the original image is enough for most
purposes, as long as the error between the original and the reconstructed image is tolerable. Lossy
compression methods, however, especially when used at low bit rates, introduce compression artifacts,
and for the lossy reconstructions at intermediate stages no precise bounds can be set on the extent of
the distortion. Near-lossless compression in such a framework is only possible either by an appropriate
pre-quantization of the wavelet coefficients and lossless transmission of the resulting bit stream,
or by truncation of the bit stream at an appropriate point followed by transmission of a residual
layer to provide the near-lossless bound.
This project provides an application of neural networks to still image compression in the
frequency domain. The sparse properties of Support Vector Machine (SVM) learning are exploited in
the compression algorithm: the SVM chooses the minimum number of training points to use as centres
of the Gaussian kernel functions, and it is this property that forms the basis of the image
compression algorithm. Compression is carried out in frequency space, where it is more efficient.
1.1 OBJECTIVE:
To clearly understand and implement an algorithm applying SVM (non-linear regression)
learning and the DCT to image compression.
To compare the obtained results with standard image compression techniques such as JPEG.
To obtain good image quality, a good compression ratio and a signal-to-noise ratio within
the required bound.
1.2 DETAILED LITERATURE SURVEY:
IMAGE COMPRESSION
The degree of compression is best expressed in terms of the average information or entropy
of a compressed image source, expressed in terms of bits/pixel. Regardless of the particular
technique used, compression engines accomplish their intended purpose in the following manner:
1. Those portions of the image which are not perceptible to the human eye are not
transmitted.
2. Frame redundancies in the image are not transmitted.
3. The remaining information is coded in an efficient manner for transmission.
Currently, a number of image compression techniques are used singly or in combination.
These include the following:
1.2.1 ARTIFICIAL NEURAL NETWORKS[1]
This work describes an algorithm using backpropagation learning in a feed-forward network.
The number of hidden neurons was fixed before learning and the weights of the network after
training were transmitted. The neural network (and hence the image) could then be recovered from
these weights. Compression was generally around 8:1, with an image quality much lower than JPEG.
1.2.2 IMAGE COMPRESSION BY SELF-ORGANIZED KOHONEN MAP[2]
This paper presents a compression scheme based on the discrete cosine transform (DCT), vector
quantization of the DCT coefficients by a Kohonen map, differential coding by a first-order
predictor and entropy coding of the differences. The method gave better performance than JPEG
for compression ratios greater than 30:1.
1.2.3 SUPPORT VECTORS IN IMAGE COMPRESSION[3]
This paper first presented the use of support vector machines (SVMs) in an image compression
algorithm. The method used an SVM to directly model the color surface. The parameters of a
neural network (weights and Gaussian centres) were transmitted so that the color surface could
be reconstructed from a neural network using these parameters.
1.2.4 SUPPORT VECTOR REGRESSION MACHINES[4]
In this paper a new regression technique based on Vapnik's concept of support vectors is
introduced. Support vector regression (SVR) is compared with a committee regression technique
(bagging) based on regression trees and with ridge regression done in feature space. On the basis
of these experiments it is expected that SVR will have advantages in high-dimensional spaces
because the SVR optimization does not depend on the dimensionality of the input space.
1.2.5 SUPPORT VECTOR METHOD FOR FUNCTION APPROXIMATION[5]
The Support Vector (SV) method was proposed for estimating regressions, constructing
multidimensional splines, and solving linear operator equations; this paper reports results of
applying the SV method to these problems. The SV method is a universal tool for solving
multidimensional function estimation problems. Initially it was designed to solve pattern
recognition problems, where, in order to find a decision rule with good generalization ability,
one selects some (small) subset of the training data, called the support vectors (SVs). Optimal
separation of the SVs is equivalent to optimal separation of the entire data. This led to a new
method of representing decision functions, in which a decision function is a linear expansion on
a basis whose elements are nonlinear functions parameterized by the SVs (one SV is needed for
each element of the basis).
1.2.6 THE NATURE OF STATISTICAL LEARNING THEORY[6]
The aim of this book is to discuss the fundamental ideas that lie behind the statistical
theory of learning and generalization. It considers learning as a general problem of function
estimation based on empirical data. Topics include: the setting of learning problems based on the
model of minimizing the risk functional from empirical data; a comprehensive analysis of the
empirical risk minimization principle, including necessary and sufficient conditions for its
consistency; non-asymptotic bounds for the risk achieved using the empirical risk minimization
principle; principles for controlling the generalization ability of learning machines using small
sample sizes based on these bounds; and the Support Vector methods that control the generalization
ability when estimating functions using small sample sizes.
1.2.7 SUPPORT VECTOR MACHINES, NEURAL NETWORKS AND FUZZY LOGIC
MODELS [7]
This is the first textbook that provides a thorough, comprehensive and unified introduction
to the field of learning from experimental data and soft computing. Support vector machines
(SVMs) and neural networks (NNs) are the mathematical structures, or models, that underlie
learning, while fuzzy logic systems (FLS) enable us to embed structured human knowledge into
workable algorithms. The book assumes that it is not only useful, but necessary, to treat SVMs,
NNs, and FLS as parts of a connected whole. This approach enables the reader to develop SVMs,
NNs, and FLS in addition to understanding them.

1.2.8 IMAGE COMPRESSION WITH NEURAL NETWORKS [8]

New technologies such as neural networks and genetic algorithms are being developed to
explore the future of image coding. Successful applications of neural networks to vector
quantization are now well established, and other aspects of neural network involvement in this
area are stepping up to play significant roles in assisting the traditional technologies. This
paper presents an extensive survey of the development of neural networks for image compression,
covering three categories: direct image compression by neural networks; neural network
implementations of existing techniques; and neural-network-based technology which provides
improvements over traditional algorithms.
1.2.9 NEURAL NETWORKS BY SIMON HAYKIN[9]
The author of this book outlines the concepts of the SVM and how the SVM is used in pattern
recognition. The book also gives information about the generalization ability of a linear SVM
and about the kernels used.
CHAPTER 2
BACKGROUND THEORIES
2.1 IMAGE COMPRESSION:
Image compression is the application of data compression to digital images. In
effect, the objective is to reduce redundancy in the image data in order to store or transmit the
data in an efficient form; that is, to minimize the size in bytes of a graphics file without
degrading the quality of the image to an unacceptable level. The reduction
in file size allows more images to be stored in a given amount of disk or memory space. It also
reduces the time required for images to be sent over the Internet or downloaded from Web pages.
2.1.1 APPLICATIONS:
Currently image compression is recognized as an enabling technology. It is used in the
following applications:
Image compression is the natural technology for handling the increased spatial
resolutions of today's imaging sensors and evolving broadcast television standards.
It plays a major role in many important and diverse applications, including video
teleconferencing, remote sensing, document and medical imaging, and facsimile transmission.
It is also very useful in the control of remotely piloted vehicles in military, space and
hazardous-waste-management applications.
2.1.2 NEED FOR COMPRESSION:
One of the important aspects of image storage is its efficient compression. To make
this fact clear, consider an example. An image of 1024 x 1024 pixels x 24 bits, without
compression, would require 3 MB of storage and about 7 minutes for transmission over a high-speed
64 kbps ISDN line. If the image is compressed at a 10:1 compression ratio, the storage
requirement is reduced to 300 KB and the transmission time drops to under 40 seconds. Seven 1
MB images can be compressed and transferred to a floppy disk in less time than it takes to send
one of the original files, uncompressed, over an AppleTalk network.
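As a quick back-of-the-envelope check of these figures (taking 1 MB = 2^20 bytes): 3 MB is roughly 3 x 2^20 x 8 = 25.2 x 10^6 bits, so at 64 kbps the transmission takes about 25.2 x 10^6 / 64000 = 393 s, i.e. around 6.5 minutes; after 10:1 compression, about 2.52 x 10^6 / 64000 = 39 s.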
In a distributed environment large image files remain a major bottleneck within systems.
Compression is an important component of the solutions available for creating file sizes of
manageable and transmittable dimensions. Increasing the bandwidth is another method, but the
cost sometimes makes this a less attractive solution.
At the present state of technology, the only solution is to compress multimedia data
before its storage and transmission, and decompress it at the receiver for playback. For example,
with a compression ratio of 32:1, the space, bandwidth, and transmission time requirements can
be reduced by a factor of 32, with acceptable quality.
Table 2.1 shows the qualitative transition from simple text to full-motion video data and
the disk space, transmission bandwidth, and transmission time needed to store and transmit such
uncompressed data.
Table 2.1 Multimedia data types and uncompressed storage space, transmission bandwidth, and transmission
time required. The prefix kilo- denotes a factor of 1000 rather than 1024.
Multimedia data | Size/Duration | Bits/pixel or bits/sample | Uncompressed size (B for bytes) | Transmission bandwidth (b for bits) | Transmission time (using a 28.8k modem)
A page of text | 11" x 8.5" | Varying resolution | 4-8 kB | 32-64 kb/page | 1.1-2.2 sec
Telephone quality speech | 10 sec | 8 bps | 80 kB | 64 kb/sec | 22.2 sec
Grayscale image | 512 x 512 | 8 bpp | 262 kB | 2.1 Mb/image | 1 min 13 sec
Colour image | 512 x 512 | 24 bpp | 786 kB | 6.29 Mb/image | 3 min 39 sec
Medical image | 2048 x 1680 | 12 bpp | 5.16 MB | 41.3 Mb/image | 23 min 54 sec
SHD image | 2048 x 2048 | 24 bpp | 12.58 MB | 100 Mb/image | 58 min 15 sec
Full-motion video | 640 x 480, 1 min (30 frames/sec) | 24 bpp | 1.66 GB | 221 Mb/sec | 5 days 8 hrs
The examples above clearly illustrate the large storage space, transmission bandwidth, and
transmission time required for uncompressed image, audio, and video data.
2.1.3 COMPRESSION PRINCIPLE:
A common characteristic of most images is that the neighboring pixels are correlated and
therefore contain redundant information. The foremost task then is to find a less correlated
representation of the image.
Image compression addresses the problem of reducing the amount of data required to
represent a digital image. The underlying basis of the reduction process is the removal of
redundant data. From a mathematical viewpoint, this amounts to transforming a 2-D pixel array
into a statistically uncorrelated data set. The transformation is applied prior to storage and
transmission of the image. The compressed image is decompressed at some later time, to
reconstruct the original image or an approximation to it.
Two fundamental components of compression are redundancy and irrelevancy reduction.
Redundancy reduction aims at removing duplication from the signal source (image/video).
Irrelevancy reduction omits parts of the signal that will not be noticed by the signal receiver,
namely the Human Visual System (HVS). In general, three types of redundancy can be
identified:
Spatial Redundancy or correlation between neighboring pixel values.
Spectral Redundancy or correlation between different color planes or spectral
bands.
Temporal Redundancy or correlation between adjacent frames in a sequence of
images (in video applications).
Image compression research aims at reducing the number of bits needed to represent an
image by removing the spatial and spectral redundancies as much as possible.
The best image quality at a given bit-rate (or compression rate) is the main goal of image
compression. However, there are other important properties of image compression schemes.
Scalability generally refers to a quality reduction achieved by manipulation of the bit-stream
or file (without decompression and re-compression). Other names for scalability are progressive
coding or embedded bit-streams. Despite its contrary nature, scalability can also be found in
lossless codecs, usually in the form of coarse-to-fine pixel scans. Scalability is especially useful for
previewing images while downloading them (e.g. in a web browser) or for providing variable-quality
access to image databases. There are several types of scalability:
Quality progressive or layer progressive: The bit-stream successively refines the
reconstructed image.
Resolution progressive: First encode a lower image resolution; then encode the difference
to higher resolutions.
Component progressive: First encode grey; then color.
Region of interest coding: Certain parts of the image are encoded with higher quality than others.
This can be combined with scalability (encode these parts first, others later).
Meta information: Compressed data can contain information about the image which can be used
to categorize, search, or browse images. Such information can include color and texture statistics
and small preview images.
The quality of a compression method is often measured by the peak signal-to-noise ratio (PSNR),
which measures the amount of noise introduced by lossy compression of the image. However,
the subjective judgment of the viewer is also regarded as an important, perhaps the most
important, measure.
2.1.4 CLASSIFICATION OF COMPRESSION TECHNIQUE:
Two ways of classifying compression techniques are mentioned here.
(a) Lossless vs. Lossy compression:
In lossless compression schemes, the reconstructed image, after compression, is
numerically identical to the original image. However, lossless compression can only
achieve a modest amount of compression. An image reconstructed following lossy
compression contains degradation relative to the original, often because the
compression scheme completely discards redundant information. However, lossy
schemes are capable of achieving much higher compression, and under normal viewing
conditions no visible loss may be perceived (visually lossless).
Compressing an image is significantly different than compressing raw binary data. Of
course, general-purpose compression programs can be used to compress images, but the
result is less than optimal. This is because images have certain statistical properties,
which can be exploited by encoders specifically designed for them. Also, some of the
finer details in the image can be sacrificed for the sake of saving a little more bandwidth
or storage space. Lossy compression methods are especially suitable for natural images
such as photos in applications where minor (sometimes imperceptible) loss of fidelity is
acceptable to achieve a substantial reduction in bit rate.
A text file or program can be compressed without the introduction of errors, but only up
to a certain extent; this is called lossless compression. Beyond this point, errors are
introduced. In text and program files, it is crucial that compression be lossless, because a
single error can seriously damage the meaning of a text file or cause a program not to run.
Lossless compression is therefore sometimes preferred for artificial images such as technical
drawings, icons or comics, since lossy compression methods, especially at low bit rates,
introduce compression artifacts. Lossless compression may also be preferred for high-value
content, such as medical imagery or image scans made for archival purposes. In image
compression, on the other hand, a small loss in quality is usually not noticeable. There is no
"critical point" up to which compression works perfectly but beyond which it becomes
impossible. When there is some tolerance for loss, the compression factor can be greater than
when there is no loss tolerance. For this reason, graphic images can be compressed more than
text files or programs.
The information loss in lossy coding comes from quantization of the data. Quantization
can be described as the process of sorting the data into different bins and representing
each bin with a single value. The value selected to represent a bin is called the reconstruction
value. Every item in a bin has the same reconstruction value, which leads to information
loss (unless the quantization is so fine that every item gets its own bin).
(b) Predictive vs. Transform coding: In predictive coding, information already sent or
available is used to predict future values, and the difference is coded. Since this is
done in the image or spatial domain, it is relatively simple to implement and is
readily adapted to local image characteristics. Differential Pulse Code Modulation
(DPCM) is one particular example of predictive coding. Transform coding, on the
other hand, first transforms the image from its spatial domain representation to a
different type of representation using some well-known transform and then codes the
transformed values (coefficients). This method provides greater data compression than
predictive methods, although at the expense of greater computation. (A small sketch of
predictive coding follows.)
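For illustration, a minimal Python sketch (an assumption of this edit, not taken from any project code) of first-order predictive, DPCM-style coding of one image row stored as a NumPy array:

import numpy as np

def dpcm_encode(row):
    # First-order DPCM: predict each pixel by its left neighbour and code the difference.
    row = row.astype(np.int32)          # avoid unsigned-integer wrap-around
    diffs = np.empty_like(row)
    diffs[0] = row[0]
    diffs[1:] = row[1:] - row[:-1]
    return diffs

def dpcm_decode(diffs):
    # Undo the prediction by accumulating the differences.
    return np.cumsum(diffs)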
2.1.5 IMAGE COMPRESSION MODEL:
The block diagram of the image compression model is given in Fig. 2.1. It consists of a source
encoder, a channel encoder, the channel, a channel decoder and a source decoder connected in cascade.
Figure 2.1 Image Compression Model
2.1.5.1 SOURCE ENCODER:
The source encoder is responsible for reducing the coding, interpixel and psychovisual
redundancies in the input image. In the first stage of the source encoding process, the mapper
transforms the input data into a format designed to reduce the interpixel redundancies in the
input image. The second stage, or quantizer block, reduces the accuracy of the mapper's output in
accordance with some pre-established fidelity criterion; this stage reduces the psychovisual
redundancies of the input image. In the third and final stage of the source encoder, the symbol
encoder creates a fixed- or variable-length code to represent the mapped and quantized data set.
Figure 2.2 Source Encoder
2.1.5.2 SOURCE DECODER:
The source decoder contains only two components: a symbol decoder and an inverse mapper.
These blocks perform, in reverse order, the inverse operations of the source encoder's symbol
encoder and mapper blocks.
2.1.5.3 CHANNEL ENCODER &DECODER
The channel encoder and decoder play an important role in the overall encoding-decoding
process when the channel in fig 2.1 is noisy or prone to error. They are designed to reduce the
impact of channel noise by inserting a controlled form of redundancy into the source encoded
data. As the output of the source encoder retains little redundancy, it would be highly sensitive to
transmission noise without the addition of this controlled redundancy.
2.2.1 COMPRESSION RATIO:
The compression ratio is defined as the ratio of the number of bits in the original uncompressed
image, n_1, to the number of bits in the compressed image, n_2:

CR = \frac{n_1}{n_2}    (Eq 2.1)
2.2.2 BITS PER PIXEL:
Bits per pixel is defined as the ratio of the number of bits required to encode the image to the
number of pixels in the image:

BPP = \frac{\text{number of bits in the encoded image}}{\text{number of pixels in the image}}    (Eq 2.2)
2.2.3 ENTROPY:
Entropy is the measure of the average information in an image:

H = -\sum_{k=0}^{L-1} p_k \log_2 p_k    (Eq 2.3)

where p_k = n_k / (MN) is the probability of the k-th gray level, n_k is the total number of
pixels with gray level k, L is the total number of gray levels and MN is the total number of
pixels in the image.
2.2.4 PSNR:
The peak signal-to-noise ratio is defined as

PSNR = 10 \log_{10} \left( \frac{255^2}{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} (X_{ij} - \hat{X}_{ij})^2} \right)    (Eq 2.4)

where X_{ij} and \hat{X}_{ij} are the original and reconstructed pixel values at location (i, j)
respectively, and M x N is the image size.
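For illustration, a minimal Python sketch, not taken from the project code, of how these measures can be computed for 8-bit grayscale images stored as NumPy arrays:

import numpy as np

def compression_ratio(n_original_bits, n_compressed_bits):
    # Eq 2.1: ratio of uncompressed size to compressed size
    return n_original_bits / n_compressed_bits

def entropy(image):
    # Eq 2.3: average information content in bits/pixel for an 8-bit image
    counts = np.bincount(image.ravel(), minlength=256)
    p = counts / counts.sum()
    p = p[p > 0]                      # skip empty gray levels (0 log 0 = 0)
    return -np.sum(p * np.log2(p))

def psnr(original, reconstructed):
    # Eq 2.4: peak signal-to-noise ratio in dB for 8-bit images
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)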
2.3 IMAGE COMPRESSION TECHNIQUES:
2.3.1 JPEG: DCT BASED IMAGE CODING STANDARD:
The DCT can be regarded as a discrete-time version of the Fourier cosine series. It is a
close relative of the DFT, a technique for converting a signal into elementary frequency components.
Thus the DCT can be computed with a Fast Fourier Transform (FFT)-like algorithm in O(n log n)
operations. Unlike the DFT, the DCT is real-valued and provides a better approximation of a signal
with fewer coefficients. The DCT of a discrete signal x(n), n = 0, 1, ..., N-1, is defined as

C(u) = \alpha(u) \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} x(n) \cos\left[ \frac{(2n+1)u\pi}{2N} \right], \quad u = 0, 1, \ldots, N-1    (Eq 2.5)

where \alpha(u) = 0.707 for u = 0 and \alpha(u) = 1 otherwise.
JPEG established the first international standard for still image compression where the encoders
and decoders are DCT-based. The JPEG standard specifies three modes namely sequential,
progressive, and hierarchical for lossy encoding, and one mode of lossless encoding. The
`baseline JPEG coder' which is the sequential encoding in its simplest form, will be briefly
discussed here. Fig. 2.3 and 2.4 show the key processing steps in such an encoder and decoder
for grayscale images. Color image compression can be approximately regarded as compression
of multiple grayscale images, which are either compressed entirely one at a time, or are
compressed by alternately interleaving 8x8 sample blocks from each in turn.
The original image block is recovered from the DCT coefficients by applying the inverse discrete
cosine transform (IDCT), given by

x(n) = \sqrt{\frac{2}{N}} \sum_{u=0}^{N-1} \alpha(u) C(u) \cos\left[ \frac{(2n+1)u\pi}{2N} \right], \quad n = 0, 1, \ldots, N-1    (Eq 2.6)

where \alpha(u) = 0.707 for u = 0 and \alpha(u) = 1 otherwise.
Steps in JPEG Compression:
1. If the color is represented in RGB mode, translate it to YUV.
2. Divide the file into 8 x 8 blocks.
3. Transform the pixel information from the spatial domain to the frequency domain with the
Discrete Cosine Transform.
4. Quantize the resulting values by dividing each coefficient by an integer value and rounding
to the nearest integer.
5. Read the resulting coefficients in zig-zag order and encode them with Huffman coding
(a code sketch of steps 2-4 follows this list).
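A minimal Python sketch of steps 2-4 for a single 8 x 8 block, using SciPy's DCT routines; the uniform quantization matrix Q below is a placeholder assumption, not the standard JPEG luminance table:

import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # 2-D type-II DCT with orthonormal scaling (Eq 2.5 applied along rows and columns)
    return dct(dct(block, axis=0, norm='ortho'), axis=1, norm='ortho')

def idct2(coeffs):
    # 2-D inverse DCT (Eq 2.6)
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

Q = np.full((8, 8), 16.0)                      # placeholder quantization matrix (assumption)
block = np.random.randint(0, 256, (8, 8)).astype(float)

coeffs = dct2(block - 128)                     # level shift, then transform
quantized = np.round(coeffs / Q)               # step 4: quantize and round
reconstructed = idct2(quantized * Q) + 128     # approximate recovery of the block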
Figure 2.3 Encoder block diagram.
Figure 2.4 Decoder block diagram
2.3.2 BASIC CONCEPTS OF SVM:
The support vector machine is a universal learning machine. It has its roots in neural networks
and statistical learning theory.
2.3.2.1 MACHINE LEARNING:
Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow
computers to learn. In most cases this means that an algorithm is given a set of data and infers
information about the properties of the data, and that information allows it to make predictions
about other data that it might see in the future. This is possible because almost all nonrandom
data contains patterns, and these patterns allow the machine to generalize. In order to generalize,
it trains a model with what it determines are the important aspects of the data.
To understand how models come to be, consider a simple example from the otherwise complex
field of email filtering. Suppose we receive a lot of spam that contains the words "online
pharmacy". As human beings we are well equipped to recognize patterns, and can quickly determine
that any message containing the words "online pharmacy" is spam and should be moved directly to
the trash. This is a generalization: we have, in fact, created a mental model of what spam is.
There are many different machine-learning algorithms, all with different strengths and suited to
different types of problems. Some, such as decision trees, are transparent, so that an observer can
fully understand the reasoning process undertaken by the machine. Others, such as neural networks,
are "black boxes", meaning that they produce an answer but it is often very difficult to reproduce
the reasoning behind it.
2.3.2.2 SUPPORT VECTOR MACHINE:
The support vector machine (SVM), introduced by Vapnik and coworkers in 1992, has been noted as
one of the best classifiers of the past 20 years. It is popular in bioinformatics, text analysis
and pattern classification. As a learning method, the support vector machine is regarded as one of
the best classifiers with a strong mathematical foundation, and during the past decade it has been
commonly used as a classifier for various applications.
The handling of high feature dimensionality and the labeling of training data are the two major
challenges in pattern recognition. To handle high feature dimensionality there are two major
approaches; one is to use special classifiers that are not sensitive to dimensionality, such as
the SVM algorithm.
2.3.2.3 LINEAR CLASSIFICATION PROBLEM:
Most matrimonial sites collect a lot of interesting information about their members, including
demographic information, interests, and behavior. Imagine that this site collects the following
information:
Age
List of interests
Location
Qualification
Furthermore, this site collects information about whether two people have made a good match,
whether they initially made contact, and whether they decided to meet in person. This data is
used to create the matchmaker dataset.
Each row has information about a man and a woman and, in the final column, a 1 or a 0 to
indicate whether or not they are considered a good match. For a site with a large number of
profiles, this information might be used to build a predictive algorithm that assists users in
finding other people who are likely to be good matches. It might also indicate particular types of
people that the site is lacking, which would be useful in strategies for promoting the site to new
members. Let us take only the age parameter together with the match information to illustrate how
the classifiers work, since two variables are much easier to visualize.
2.3.2.4 SVM IN LINEAR CLASSIFICATION:
The main idea of the SVM is to construct a hyperplane as the decision surface in such a way that
the margin of separation between the positive and negative examples is maximized. Another basic
idea of the SVM is to map the data into some other dot product space, called the feature space.
Consider a two-class, linearly separable classification problem.

Figure2.5.Linearly Separable Classification
Let {x_1, ..., x_n} be our data set and let d_i ∈ {+1, -1} be the class label of x_i. The
decision boundary should classify all points correctly. The decision boundary is a hyperplane,
whose equation is given by

w^T x + b = 0

where x is the input vector, w is the adjustable weight vector and b is the bias. The problem is
that there can be many such decision boundaries, as shown in Figures 2.6(a), 2.6(b) and 2.6(c).
Figure 2.6 Decision Boundaries that Can Be Formed
The decision boundary should be as far away from the data of both classes as possible; therefore
we should maximize the margin. The margin is the width by which the boundary can be increased
before it hits a data point. The positive plane that acts as the margin for the positive class is
given by

\{ x : w^T x + b = +1 \}

and the negative plane that acts as the margin for the negative class is

\{ x : w^T x + b = -1 \}

Hence we classify a point as +1 if w^T x + b \ge +1 and as -1 if w^T x + b \le -1. The vector w
is perpendicular to both planes, and the margin width is 2 / \|w\|. To maximize the margin we
therefore have to minimize \|w\|. This can be done in several ways; the trick most often used is
the Lagrangian formulation of the problem.
Figure 2.7 Support Vectors and the Hyperplane
Support vectors are the points that the margin pushes up against the hyperplane. The particular
data points for which the equations w^T x + b = ±1 are satisfied with the equality sign are
called the support vectors, hence the name support vector machine. These vectors play a prominent
role in the operation of this class of learning machine. In conceptual terms, the support vectors
are the points that lie closest to the hyperplane and are the most difficult to classify. As
such, they have a direct bearing on the optimum location of the decision surface.
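As an illustrative sketch (assuming scikit-learn is available; this is not part of the project's own implementation), a linear SVM can be trained on two-dimensional data and its hyperplane, margin and support vectors inspected:

import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters labelled -1 and +1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel='linear', C=1e6)    # very large C approximates the hard-margin case
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane: w^T x + b = 0 with w =", w, ", b =", b)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
print("support vectors:\n", clf.support_vectors_)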
2.3.2.5 SOLUTION BY LAGRANGIAN MULTIPLIERS:
The Lagrangian is written as

L(w, b, \alpha) = \frac{1}{2} w^T w - \sum_{i=1}^{l} \alpha_i \left[ y_i (w^T x_i + b) - 1 \right]

where the \alpha_i \ge 0 are the Lagrange multipliers. This turns the constrained problem into one
in which the Lagrangian L(w, b, \alpha) is minimized with respect to w and b (and maximized with
respect to the multipliers).
2.3.2.6 NON-SEPARABLE CLASSIFICATION:
There is no line that can be drawn between the two classes that separates the data without
misclassifying some data points. The aim now is to find the hyperplane that makes the smallest
number of errors. Non-negative slack variables \xi_1, \xi_2, \ldots, \xi_l are introduced; they
measure the deviation of the data from the maximal margin, so it is desirable that the \xi_i be as
small as possible.
The optimization problem is now to minimize

f(x, w) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i

Here C is a design parameter called the penalty parameter; it controls the magnitude of the \xi_i.
An increase in C penalizes larger errors (large \xi_i); however, this can be achieved only by
increasing the weight vector norm \|w\| (which we want to minimize). At the same time, an increase
in \|w\| does not guarantee smaller \xi_i.

Figure 2.8 Non-Linearly Separable Classification
2.3.2.7 FUNCTION APPROXIMATION BY SVM:
Regression is an extension of the non-separable classification problem such that each data point
can be thought of as being in its own class. We are now approximating functions of the form

f(x, w) = \sum_{i=1}^{N} w_i \varphi_i(x)

where the functions \varphi_i(x) are termed kernel (basis) functions and N is the number of
support vectors.
Vapnik's linear loss function with an \varepsilon-insensitivity zone is used as the measure of
the approximation error:

L_\varepsilon(y, f(x, w)) = 0 \ \text{if} \ |y - f(x, w)| \le \varepsilon, \quad |y - f(x, w)| - \varepsilon \ \text{otherwise}

Thus the loss is equal to 0 if the difference between the predicted value f(x; w) and the measured
value is less than \varepsilon. Vapnik's \varepsilon-insensitivity loss function defines an
\varepsilon-tube: if the predicted value is within the tube the error is zero, while for predicted
points outside the tube the error equals the difference between the magnitude of the prediction
error and the radius \varepsilon of the tube.
The error is e = y - f(x, w), and the total (empirical) risk is given by

R_{emp} = \frac{1}{l} \sum_{i=1}^{l} \left| y_i - w^T x_i - b \right|_{\varepsilon}

The goal is now to minimize this risk. From the definition of the slack variables (\xi_i, \xi_i^*)
for data outside the insensitivity tube,

y_i - f(x_i, w) - \varepsilon = \xi_i      for data above the tube
f(x_i, w) - y_i - \varepsilon = \xi_i^*    for data below the tube
so our optimization problem is now to find the w that minimizes the risk

R_{w,\xi,\xi^*} = \frac{1}{2} \|w\|^2 + C \left( \sum_{i=1}^{l} \xi_i + \sum_{i=1}^{l} \xi_i^* \right)

where \xi_i and \xi_i^* are the slack variables for measurements above and below the
\varepsilon-tube respectively. Forming the Lagrangian and setting its partial derivative with
respect to w to zero gives

w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \, x_i

Similarly, taking the partial derivatives with respect to b, \xi_i and \xi_i^* and substituting
back, we obtain the dual problem in matrix notation:

\min_{\lambda} \; L(\lambda) = \frac{1}{2} \lambda^T H \lambda + f^T \lambda

where \lambda = [\alpha_1, \ldots, \alpha_l, \alpha_1^*, \ldots, \alpha_l^*]^T, the Hessian H is
built from the entries K_{ij} = x_i^T x_j + 1,

H = \begin{bmatrix} K & -K \\ -K & K \end{bmatrix}

and

f = [\, \varepsilon - y_1, \ldots, \varepsilon - y_l, \; \varepsilon + y_1, \ldots, \varepsilon + y_l \,]^T
Our final goal is to solve non-linear regression problems, i.e. problems of the type

f(x; w) = w^T G(x) + b

where G(x) is a non-linear mapping that maps the input space x into a feature space. The mapping
is normally represented by the RBF design matrix

G = \begin{bmatrix} G(x_1, c_1) & \cdots & G(x_1, c_l) \\ \vdots & \ddots & \vdots \\ G(x_l, c_1) & \cdots & G(x_l, c_l) \end{bmatrix}

where G(x, c) is the kernel function. Typically a Gaussian kernel function is used, given (in one
dimension) by

G(x, c, \sigma) = \exp\left[ -0.5 \left( \frac{x - c}{\sigma} \right)^2 \right]

where x is the spatial coordinate, c is the centre of the Gaussian and \sigma is the Gaussian
width (or shape parameter). To solve non-linear regression problems the only change required is to
the Hessian matrix, which becomes

H = \begin{bmatrix} G & -G \\ -G & G \end{bmatrix}

The weight vector is found from w_i = \alpha_i - \alpha_i^*. Note that when a positive definite
kernel (such as the Gaussian or the complete polynomial kernel) is used, the bias b equals zero.
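To illustrate the sparseness of the resulting model, a short Python sketch using scikit-learn's SVR as a stand-in for the quadratic-programming formulation above (an assumption of this edit; the gamma parameter plays the role of 1/(2 sigma^2)):

import numpy as np
from sklearn.svm import SVR

# Noisy 1-D function; the RBF-kernel SVR fits it within an epsilon-tube
# using only a subset of the training points as support vectors.
x = np.linspace(0, 1, 64).reshape(-1, 1)
y = np.sin(2 * np.pi * x).ravel() + 0.05 * np.random.randn(64)

svr = SVR(kernel='rbf', gamma=50.0, C=100.0, epsilon=0.1)
svr.fit(x, y)

print("training points :", len(x))
print("support vectors :", len(svr.support_))    # typically far fewer than 64
print("max abs error   :", np.max(np.abs(svr.predict(x) - y)))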
2.3.2.8 APPLICATIONS OF SUPPORT VECTOR MACHINES
Since support-vector machines work well with high-dimensional datasets, they are most often
applied to data-intensive scientific problems and other problems that deal with very complex sets
of data. Some examples include:
Classifying facial expressions
Detecting intruders using military datasets
Predicting the structure of proteins from their sequences
Handwriting recognition
Determining the potential for damage during earthquakes
Digital watermarking
Image compression
CHAPTER 3
3.1 PROGRAMMING METHODOLOGY:
24
Most image compression algorithms operate in the frequency domain. That is, the image is first
processed through some frequency analyzing function, further processing is applied onto the
resulting coefficients and the results generally encoded using an entropy encoding scheme such
as Huffman coding. The JPEG image compression algorithm is an example of an algorithm of
this type. The first step of the JPEG algorithm is to subdivide the image into 88 blocks then
apply the DCT to each block. Next quantization is applied to the resulting DCT coefficients. This
is simply dividing each element in the matrix of DCT coefficients by a corresponding element in
a quantizing matrix. The effect of reduces the value of most coefficients, some of which vanish
(i.e. their value becomes zero) when rounding is applied. Huffman coding is used to encode the
coefficients.
In this chapter the image is transformed into the frequency domain and applies SVM to the
frequency components. The Discrete Cosine Transform (DCT) is used as it has properties which
are exploited in SVM learning. The basic idea is to transform the image using the DCT, use
SVM learning to compress the DCT coefficients and use Huffman coding to encode the data as a
stream of bits
The algorithm presented here uses the discrete cosine transform. The DCT has properties which
make it suitable to SVM learning. SVM learning is applied to the DCT coefficients. Before the
SVM learning is applied the DCT coefficients are processed in such a way as to make the trend
of the DCT curve more suitable to generalization by a SVM.As the DCT is fundamental to the
algorithm a detailed description follows
3.2 DESCRIPTIONS:
3.2.1 Input image:
The input image that is chosen is required to be a gray scale image with intensity levels 0-255.
The input image chosen depends upon the application where the compression is required
3.2.2 DISCRETE COSINE TRANSFORM:
The DCT has properties that make it the transform of choice for a number of compression schemes;
it is the basis of the JPEG compression scheme. The DCT is a transform that maps a block of pixel
color values in the spatial domain to values in the frequency domain. The DCT of a discrete signal
x(n), n = 0, 1, ..., N-1, is defined as

C(u) = \alpha(u) \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} x(n) \cos\left[ \frac{(2n+1)u\pi}{2N} \right]    (Eq 3.1)

where \alpha(u) = 0.707 for u = 0 and \alpha(u) = 1 otherwise.
The DCT is more efficient on smaller images. When the DCT is applied to large images, the rounding
effects that occur when floating-point numbers are stored in a computer result in the DCT
coefficients being stored with insufficient accuracy, and the result is a deterioration in image
quality. As the size of the image is increased, the number of computations also increases
disproportionately. The image is therefore subdivided into 8x8 blocks. Where an image is not an
integral number of 8x8 blocks, it can be padded with white pixels (i.e. extra pixels are added so
that the image can be divided into an integral number of 8x8 blocks). The 2-dimensional DCT is
applied to each block so that an 8x8 matrix of DCT coefficients is produced for each block; this
is termed the DCT matrix. The top-left component of the DCT matrix is termed the DC coefficient
and can be interpreted as the component responsible for the average background colour of the
block. The remaining 63 components of the DCT matrix are termed the AC components, as they are
frequency components. The DC coefficient is often much larger in magnitude than the AC components
in the DCT matrix. The original image block is recovered from the DCT coefficients by applying the
inverse discrete cosine transform (IDCT), given by

x(n) = \sqrt{\frac{2}{N}} \sum_{u=0}^{N-1} \alpha(u) C(u) \cos\left[ \frac{(2n+1)u\pi}{2N} \right]    (Eq 3.2)

where \alpha(u) = 0.707 for u = 0 and \alpha(u) = 1 otherwise.
3.2.3 TRANSFORMATION OF THE DCT MATRIX TO 1-D(ZIG-ZAG
TRANSFORMATION):
The elements of the DCT matrix are mapped using the zig-zag sequence shown in Figure 3.1 to
produce a single row of numbers; that is, a single row of numbers is collected as the zig-zag
trail is followed through the DCT matrix. This produces a row of 64 numbers whose magnitudes tend
to decrease travelling down the row.
Figure 3.1: The zig-zag pattern applied to a block of DCT Coefficients
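A small Python sketch of this zig-zag mapping (one possible implementation, written for illustration; the project itself was implemented in MATLAB):

import numpy as np

def zigzag_indices(n=8):
    # Order the (row, col) indices of an n x n block along anti-diagonals,
    # alternating direction, as in Figure 3.1.
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

def zigzag(block):
    # Flatten an 8x8 DCT matrix into a single row following the zig-zag order.
    return np.array([block[i, j] for i, j in zigzag_indices(block.shape[0])])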
3.2.4 COMBINING SVM WITH DCT
The 1-dimensional row of DCT coefficients is used as the training data for an SVM. The SVM will
produce the minimum number of support vectors required to generalize the training data within a
predefined error (the ε-tube). Thus it is expected that when the row of DCT coefficients is used
as training data for the SVM, a small number of support vectors will be required in order to
recover the DCT coefficients within the predefined error. Examination of the input data (i.e. the
DCT coefficients) reveals that the magnitudes of the coefficients generally decrease travelling
down the row of input data; however, the sign (positive or negative) appears to be random. As a
consequence, two adjacent coefficients can be of similar magnitude but opposite sign, causing a
large swing in the input data. If the sign of each DCT coefficient is ignored when it is used as
input data to the SVM, there is the problem of how to re-assign the signs when the DCT
coefficients have been recovered.

The SVM learning process selects the minimum number of training points to use as the centres of
the Gaussian kernel functions in an RBF network in order for the function to be approximated
within the insensitivity zone. These selected training points are the support vectors, and the
insensitivity zone is drawn around the resulting function. When the penalty parameter C is
infinite, the support vectors always lie at the edge of the zone. There are only three parameters
that affect the compression and must be defined before learning can begin: the maximum allowed
error ε, termed the insensitivity zone in SVM terminology; the penalty parameter C;
and the Gaussian shape parameter σ.
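A hedged end-to-end sketch of this step for a single block, reusing the dct2 and zigzag helpers sketched earlier and scikit-learn's SVR in place of the MATLAB quadratic-programming solver used in the project; the normalization of the coefficients is an assumption added so that epsilon values of the order used later (0.001-0.1) are meaningful:

import numpy as np
from sklearn.svm import SVR

def compress_block(block, epsilon=0.01, C=1e3, gamma=10.0):
    # block: 8x8 array of pixel values
    coeffs = zigzag(dct2(block))        # DCT, then zig-zag to a 1-D row of 64 values
    dc, ac = coeffs[0], coeffs[1:]      # the DC coefficient is handled separately
    signs = (ac < 0).astype(int)        # inversion bits: 1 where the coefficient is negative
    scale = np.abs(ac).max()
    if scale == 0:
        scale = 1.0                     # guard for a flat block
    target = np.abs(ac) / scale         # train on normalized absolute values (assumption)
    positions = np.arange(len(ac), dtype=float).reshape(-1, 1)

    svr = SVR(kernel='rbf', C=C, gamma=gamma, epsilon=epsilon)
    svr.fit(positions, target)
    # To be stored per block: DC term, support-vector positions (svr.support_),
    # the (quantized) weights (svr.dual_coef_), the scale factor and the inversion bits.
    return dc, svr, signs, scale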
3.2.4.1 QUADRATIC PROGRAMMING
Quadratic programming deals with functions in which the variables are raised to the power 0, 1
or 2. The goal of quadratic programming is to determine the values of the variables for which the
objective function L(\lambda) is a minimum. The system is usually stated in matrix and vector
form. A quadratic program is an optimization problem with a quadratic objective and linear
constraints:

\min_{\lambda} \; L(\lambda) = \frac{1}{2} \lambda^T H \lambda + f^T \lambda \quad \text{subject to} \quad A\lambda \le b

(the 1/2 factor is included in the quadratic term to avoid the appearance of a factor of 2 in the
derivatives). L(\lambda) is called the objective function, H is a symmetric matrix called the
Hessian matrix and f is a vector of constants. The SVM training problem described above is exactly
such a constrained minimization problem with a quadratic objective and linear inequality
constraints, where

H = \begin{bmatrix} G & -G \\ -G & G \end{bmatrix}

with G built from the Gaussian kernel function, and, as before,

f = [\, \varepsilon - y_1, \ldots, \varepsilon - y_l, \; \varepsilon + y_1, \ldots, \varepsilon + y_l \,]^T
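A compact Python sketch of solving this dual quadratic program for one row of data, using a generic bound-constrained optimizer instead of a dedicated QP routine (the project used MATLAB's quadratic programming); the block structure of H with alternating signs follows the standard SVR dual and is an assumption where the garbled source is ambiguous:

import numpy as np
from scipy.optimize import minimize

def svr_dual_qp(x, y, epsilon=0.1, C=100.0, sigma=0.1):
    # Gaussian (RBF) design matrix G_ij = G(x_i, c_j), with the centres c at the data points
    d = x[:, None] - x[None, :]
    G = np.exp(-0.5 * (d / sigma) ** 2)

    l = len(x)
    H = np.block([[G, -G], [-G, G]])             # Hessian of the dual problem
    f = np.concatenate([epsilon - y, epsilon + y])

    obj = lambda lam: 0.5 * lam @ H @ lam + f @ lam
    grad = lambda lam: H @ lam + f
    res = minimize(obj, np.zeros(2 * l), jac=grad,
                   bounds=[(0.0, C)] * (2 * l), method='L-BFGS-B')

    alpha, alpha_star = res.x[:l], res.x[l:]
    w = alpha - alpha_star                       # RBF weights; b = 0 for this kernel
    support = np.where(np.abs(w) > 1e-6)[0]      # indices of the support vectors
    return w, support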


3.2.4.2 KERNEL FUNCTION


The relationship between the kernel function K and the mapping \Phi(\cdot) is

K(x, y) = \langle \Phi(x), \Phi(y) \rangle

This is known as the kernel trick. In practice we specify K, thereby specifying \Phi(\cdot)
indirectly instead of choosing \Phi(\cdot) explicitly. Intuitively, K(x, y) represents our desired
notion of similarity between the data points x and y, based on prior knowledge. K(x, y) needs to
satisfy a technical condition (the Mercer condition) in order for \Phi(\cdot) to exist.
A linear operation in the feature space is equivalent to a non-linear operation in the input
space, and the classification task can be easier after a proper transformation. The input space is
the space containing the x_i; the feature space is the space of the \Phi(x_i) after the
transformation.
Figure 3.2: Transformation of input space to feature space

A Gaussian kernel function is used, given (in one dimension) by

G(x, c, \sigma) = \exp\left[ -0.5 \left( \frac{x - c}{\sigma} \right)^2 \right]

where x is the spatial coordinate, c is the centre of the Gaussian and \sigma is the Gaussian
width (or shape parameter).
3.2.5 THE INVERSION BIT
The inversion bit indicates which of the recovered points should be inverted (i.e. multiplied by
-1): recovered points that are positive but were originally negative are made negative by
multiplying them by -1 when the inversion bit is set. The inversion bit is a single 0 or 1 and
records the sign of the corresponding input value; each input value has an inversion bit.

After a block has been processed by the SVM, some of the recovered DCT coefficients may have a
magnitude lower than the maximum error defined for the SVM. If these components had an inversion
bit of 1 it can be set to 0, as the sign of coefficients with small magnitude does not affect the
final recovered image. Put another way, inversion bits for very small DCT coefficients do not
contain significant information required for the recovery of the image.
3.2.6 ENCODING DATA FOR STORAGE
For each block, the weights and the support vectors must be stored; the support vectors are the
Gaussian centres. In our algorithm we combine the weights with the support vectors so that each
block has the same number of weights as DCT coefficients. Where a weight has no corresponding
support vector, the value of the weight is set to zero; that is, the only non-zero weights are
those for which a training point has been chosen as a support vector by the support vector
machine. The next step is to quantize the weights.
3.2.6.1 QUANTIZATION
Quantizing involves reassigning the value of each weight to one of a limited number of values. To
quantize the weights, the maximum and minimum weight values (for the whole image) are found and
the number of quantization levels is pre-defined. The number of quantization levels chosen is a
degree of freedom in the algorithm.
The steps taken to quantize the weights are (a code sketch follows this list):
1. Find the maximum and minimum weight values; call these max and min.
2. Find the difference d between quantization levels: d = (max - min) / n, where n is the number
of quantization levels.
3. Set the lowest quantization level q_1 = min.
4. Set the remaining quantization levels by q_m = q_{m-1} + d, until q_n = max.
5. Reassign each weight the value of the closest matching quantization level q_m.
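A minimal Python sketch of this uniform quantization (an illustrative assumption: n_levels - 1 intervals are used so that the lowest level equals min and the highest equals max):

import numpy as np

def quantize_weights(weights, n_levels=60):
    # Uniform quantization of the weight values (steps 1-5 above).
    w_min, w_max = weights.min(), weights.max()
    d = (w_max - w_min) / (n_levels - 1)
    levels = w_min + d * np.arange(n_levels)
    if d == 0:                                   # all weights identical
        return weights.copy(), np.zeros(len(weights), dtype=int), levels
    indices = np.rint((weights - w_min) / d).astype(int)
    return levels[indices], indices, levels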
The inversion bits are now combined with the weights as follows. After quantization, the minimum
quantization level is subtracted from each weight, which ensures that all weights have a positive
value. An arbitrary number is then added to all weights (the same number is added to every
weight), making all weights positive and non-zero. To recover the weights, both the minimum
quantization level and the arbitrary number must be stored. Each individual weight has an
associated inversion bit. The inversion bit is combined with its corresponding weight by making
the value of the weight negative if the inversion bit is 1; otherwise it remains positive. Where
the weight is not a support vector the inversion data is discarded; this introduces a small error
when the image is decompressed, but significantly increases compression. The above steps introduce
many zero values into the weight data. By setting inversion bits from 1 to 0 when the associated
DCT coefficient is less than the error ε, many more zeros are introduced.
3.2.6.2 HUFFMAN ENCODING
The quantized weights are encoded using Huffman coding. Huffman coding is an entropy coding
algorithm used for lossless data compression; it uses a variable-length code table for encoding
each source symbol.
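An illustrative Python sketch of building a Huffman code table for the quantized symbols (a standard construction, not the project's exact encoder):

import heapq
from collections import Counter

def huffman_code(symbols):
    # Build a prefix code in which frequent symbols get short codewords.
    freq = Counter(symbols)
    heap = [[count, i, [sym, ""]] for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                           # degenerate case: a single distinct symbol
        return {heap[0][2][0]: "0"}
    next_id = len(heap)
    while len(heap) > 1:
        lo, hi = heapq.heappop(heap), heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]              # prepend a bit for the lighter subtree
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]
        heapq.heappush(heap, [lo[0] + hi[0], next_id] + lo[2:] + hi[2:])
        next_id += 1
    return {sym: code for sym, code in heap[0][2:]}

# Example: encode a run of quantized weight indices as a bit string
data = [3, 3, 0, 0, 0, 7, 3, 0]
table = huffman_code(data)
bitstream = "".join(table[s] for s in data)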
CHAPTER 4
4.1 FORMULATION OF THE APPROACH:
The image is first sub-divided into 8x8 blocks. The 2-dimensional DCT is applied to each block to
produce a matrix of DCT coefficients. The zig-zag mapping is applied to each matrix of DCT
coefficients to obtain a single row of numbers for each original block of pixels. The first term
of each row (the DC component) is separated so that only the AC terms are left. Not all the terms
in the row of AC coefficients are needed, since the higher-order terms do not contribute
significantly to the image; exactly how many values are taken is a degree of freedom in the
algorithm. Support vector machine learning is applied to the absolute values of each row of AC
terms as described above, and the inversion number for each block is generated. By following this
method, for each original block the Gaussian centres (i.e. the support vectors), the weights and
the inversion number need to be stored or transmitted to be able to recover the block.

The AC components are used as training data for an SVM. The SVM learning process selects the
minimum number of training points to use as the centres of the Gaussian kernel functions in an
RBF network in order for the function to be approximated within the insensitivity zone; these
selected training points are the support vectors. An SVM is trained on the data with the error ε
and the Gaussian width σ set to different values. The SVM was implemented in MATLAB using
quadratic programming, which returns values for α and α* from which we can compute the weights.
In order to recover the image, the DC coefficient, the support vectors, the weights and the
inversion number are stored.

The next step is to quantize the weights. Quantizing involves reassigning the value of each weight
to one of a limited number of values. To quantize the weights, the maximum and minimum weight
values (for the whole image) are found and the number of quantization levels is pre-defined; the
number of quantization levels chosen is a degree of freedom in the algorithm. The inversion bits
are now combined with the weights. After quantization, the minimum quantization level is
subtracted from each weight, which ensures that all weights have a positive value. An arbitrary
number is added to all weights (the same number is added to every weight), making all weights
positive and non-zero. To recover the weights, both the minimum quantization level and the
arbitrary number must be stored. The quantized weights and the numbers of zeros between non-zero
weights are Huffman encoded to produce a binary file. The compression of the SVM surface-modelled
images was computed from an actual binary file containing all the information necessary to recover
an approximated version of the original image. To objectively measure image quality, the
signal-to-noise ratio (SNR) is calculated.
4.2 FLOW CHART:

DIVIDEING IMAGE INTO
8*8 BLOCKS
READING
IMAGE
APPLYING DCT TO EACH BLOCK
OF IMAGE
ZIG-ZAG TRANSFORMATION
QUANTIZATION AND HUFFMAN ENCODING
APPLYING IDCT TO EACH
BLOCK
OUTPUT IMAGE
MODELLING OF DCT
COEFFICIENTS USING SVM
DEQUANTIZATION AND HUFFMAN DECODING
DCT COEFFICIENTS FROM SVM MODEL
33
CHAPTER 5
RESULTS AND ANALYSIS:
In this section, simulation results for the proposed image compression algorithm are presented and
compared with the existing JPEG algorithm. In the implementation of the algorithm applying SVM
(regression) learning and the DCT to image compression, we first sub-divide the image into 8x8
blocks. The 2-dimensional DCT is applied to each block to produce a matrix of DCT coefficients.
The zig-zag mapping is applied to each matrix of DCT coefficients to obtain a single row of
numbers for each original block of pixels. The first term of each row (the DC component) is
separated so that only the AC terms are left. Not all the terms in the row of AC coefficients are
needed, since the higher-order terms do not contribute significantly to the image; exactly how
many values are taken is a degree of freedom in the algorithm. Support vector machine learning is
applied to the absolute values of each row of AC terms as described above, and the inversion
number for each block is generated. By following this method, for each original block the Gaussian
centres (i.e. the support vectors), the weights and the inversion number need to be stored or
transmitted to be able to recover the block.

The AC components are used as training data for an SVM. The support vector machine learning used
is identical to that described earlier: the training task is a constrained minimization problem
with a quadratic objective function and linear inequality constraints, i.e. quadratic programming.
This returns values for α and α* from which we can compute the weights. In order to recover the
image, the DC coefficient, the support vectors, the weights and the inversion number are stored.

The next step is to quantize the weights. Quantizing involves reassigning the value of each weight
to one of a limited number of values. To quantize the weights, the maximum and minimum weight
values (for the whole image) are found and the number of quantization levels is pre-defined; the
number of quantization levels chosen is a degree of freedom in the algorithm. The inversion bits
are now combined with the weights. After quantization, the minimum quantization level is
subtracted from each weight, which ensures that all weights have a positive value. An arbitrary
number is added to all weights (the same number is added to every weight), making all weights
positive and non-zero. To recover the weights, both the minimum quantization level and the
arbitrary number must be stored. The quantized weights and the numbers of zeros between non-zero
weights are Huffman encoded to produce a binary file. The compression of the SVM surface-modelled
images was computed from an actual binary file containing all the information necessary to recover
an approximated version of the original image. To objectively measure image quality, the
signal-to-noise ratio (SNR) is calculated.
5.1 INPUT IMAGE:
Figure 5.1 Input image: Lena, of size 128 x 128, is considered for compression
5.2 RESULTS OBTAINED FOR IMAGE COMPRESSION:
5.2.2 DIFFERENT VALUES OF EPSILON:
5.2.2.1 EPSILON=0.001
The compressed image is shown in Figure 5.2(e); the input image, the plot of the DCT coefficients
(for one example block), the plot of the absolute values of the DCT coefficients, and the error
between the output and the desired input (for one example block) are shown in Figures 5.2(a),
5.2(b), 5.2(c) and 5.2(d) respectively.

Figure 5.2 (a) Input image; (b) DCT coefficients (for one example block); (c) absolute values of
the DCT coefficients (for one example block); (d) error between the output and the desired input
(for one example block); (e) output image
5.2.2.2 EPSILON=0.01
The compressed image is shown in Figure 5.3(e); the input image, the plot of the DCT coefficients
(for one example block), the plot of the absolute values of the DCT coefficients, and the error
between the output and the desired input (for one example block) are shown in Figures 5.3(a),
5.3(b), 5.3(c) and 5.3(d) respectively.

Figure 5.3 (a) Input image; (b) DCT coefficients (for one example block); (c) absolute values of
the DCT coefficients (for one example block); (d) error between the output and the desired input
(for one example block); (e) output image
5.2.2.3 EPSILON=0.1
The compressed image is shown in Figure 5.4(e); the input image, the plot of the DCT coefficients
(for one example block), the plot of the absolute values of the DCT coefficients, and the error
between the output and the desired input (for one example block) are shown in Figures 5.4(a),
5.4(b), 5.4(c) and 5.4(d) respectively.

Figure 5.4 (a) Input image; (b) DCT coefficients (for one example block); (c) absolute values of
the DCT coefficients (for one example block); (d) error between the output and the desired input
(for one example block); (e) output image
Table 5.1 The number of bits for different epsilon values, quantization levels and numbers of
support vectors, with the resulting compression ratio and SNR (dB).

Epsilon (ε) | Quantization levels | Support vectors | Length of Huffman code | Total number of bits | Compression ratio | SNR (dB)
0.001 | 60 | 16128 | 64343 | 131072 | 2.03 | 38
0.01  | 60 | 16128 | 36107 | 131072 | 3.63 | 22
0.1   | 60 |  8756 | 22354 | 131072 | 5.86 | 18

5.3 COMPARISON OF THE OBTAINED RESULTS WITH THE JPEG ALGORITHM:
Analysis:
For the purpose of comparing the proposed algorithm with the JPEG algorithm, the compression
ratio was computed for both sets of compressed images. Since the error bound can be set in advance
in the proposed algorithm, we set the bound to different values and compare the resulting
compressed images; in JPEG, by contrast, the error cannot be bounded in advance, even though the
picture quality is maintained. The signal-to-noise ratio, in dB, calculated as per the formula
discussed in Chapter 2, was used for the comparison. The signal-to-noise ratio is high in the case
of the proposed algorithm, so the image information is well preserved, and still image compression
is obtained with this algorithm.
5.4 INPUT IMAGE:
Figure 5.5 Input image: Lena, of size 128 x 128, is considered for compression
5.5 RESULTS OBTAINED FOR JPEG COMPRESSION:
5.5.1 ANALYSIS:
The compressed images are shown in Figures 5.6(b), 5.6(c) and 5.6(d), and the input image in
Figure 5.6(a).

Figure 5.6 (a) Input image; (b), (c), (d) compressed images for quality coefficients 2, 5 and 10 respectively
Table 5.2 The number of bits and the length of the Huffman code for different quality
coefficients, with the resulting compression ratio and SNR (dB).

Quality coefficient | Length of Huffman code | Total number of bits | Compression ratio | SNR (dB)
2  | 25201 | 131072 | 5.2  | 21.7
5  | 21264 | 131072 | 6.16 | 19.5
10 | 19381 | 131072 | 6.76 | 18.2
CHAPTER 6
CONCLUSION:

In this project, an image compression algorithm that takes advantage of SVM learning was
presented. The algorithm exploits the trend of the DCT coefficients after the image has been
transformed from the spatial domain to the frequency domain via the DCT. SVM learning is used to
estimate the DCT coefficients within a predefined error. The SVM is trained on the absolute
magnitudes of the DCT coefficients, as these values require fewer support vectors to estimate the
underlying function. The net result of the SVM learning is to compress the DCT coefficients much
further than other methods such as JPEG. The algorithm also defines how the original values are
recovered through the introduction of the inversion number. The inversion number allows us to
recover the original sign (i.e., positive
or negative) of each DCT coefficient, so that, combined with the magnitude of the coefficient as
estimated by the SVM, a close approximation to the original value of the DCT coefficient is
obtained in order to reconstruct the image. The new method produces better image quality than the
JPEG compression algorithm at high compression ratios. Large compression ratios are possible with
the new method while still retaining reasonable image quality.
REFERENCES:
[1] M. H. Hassoun, Fundamentals of Artificial Neural Networks. Cambridge, MA: MIT Press, 1995.
[2] C. Amerijckx, M. Verleysen, P. Thissen, and J. Legat, "Image compression by self-organized
Kohonen map," IEEE Trans. Neural Networks, vol. 9, pp. 503-507, May 1998.
[3] J. Robinson and V. Kecman, "The use of support vectors in image compression," Proc. 2nd
Int. Conf. Engineering Intelligent Systems, June 2000.
[4] H. Drucker, C. J. C. Burges, L. Kaufman, A. Smola, and V. Vapnik, "Support vector regression
machines," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 1997,
pp. 155-161.
[5] V. Vapnik, S. Golowich, and A. Smola, "Support vector method for function approximation,
regression estimation and signal processing," in Advances in Neural Information Processing
Systems, vol. 9. Cambridge, MA: MIT Press, 1997.
[6] V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer-Verlag, 1995.
[7] V. Kecman, Learning and Soft Computing: Support Vector Machines, Neural Networks and Fuzzy
Logic Models. Cambridge, MA: MIT Press, 2001.
[8] J. Jiang, "Image compression with neural networks - a survey," Signal Processing: Image
Communication, vol. 14, 1999.
[9] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed.
[10] J. Miano, Compressed Image File Formats. Reading, MA: Addison-Wesley, 1999.
[11] Digital Compression and Coding of Continuous-Tone Still Images, Amer. Nat. Standards Inst.,
ISO/IEC IS 10918-1, 1994.
BIO DATA:
Name: rama kishor mutyala
Email : ramkishore_mutyala@yahoo.com
Course: Bachelors of Technology
University: Vellore Institute of Technology University
Branch: Electronics and Communication Engineering
Address: rama kishor mutyala
Door no: 2-74,Near ramalayam street
Gandhi Nagar, Vetlapalem
Samalkot Mandal, E.G.Dist
Andhra Pradesh-533434