
Image Compression

The term data compression refers to the process of reducing the amount of data required to represent a given quantity of information.
Data are the means by which information is conveyed.
Because various amounts of data can be used to represent the same amount of information, representations that contain irrelevant or repeated information are said to contain redundant data.

Data Redundancy
Let n1 and n2 denote the number of information-carrying units in two data sets that represent the same information.
The relative redundancy RD is defined as:

    RD = 1 - 1/CR

where CR, commonly called the compression ratio, is

    CR = n1/n2

Data Redundancy
If n1 = n2, then CR = 1 and RD = 0: no redundancy.
If n1 >> n2, then CR → ∞ and RD → 1: high redundancy.
If n1 << n2, then CR → 0 and RD → -∞: undesirable.

A compression ratio of 10 (10:1) means that the first data set has 10 information-carrying units (say, bits) for every 1 unit in the second (compressed) data set.
In image compression, three basic redundancies can be identified:
Coding Redundancy
Interpixel Redundancy
Psychovisual Redundancy

Coding Redundancy
A code is a system of symbols used to represent a body of information or a set of events. Each piece of information or event is assigned a sequence of code symbols, called a codeword. The number of symbols in each codeword is its length. The 8-bit codes that are used to represent the intensities in most 2-D intensity arrays contain more bits than are needed to represent the intensities.
Recall from the histogram calculations:

    p(rk) = h(rk)/n = nk/n

where p(rk) is the probability of a pixel having the value rk.
If the number of bits used to represent rk is l(rk), then the average number of bits required to represent each pixel is

    Lavg = Σ_{k=0}^{L-1} l(rk) p(rk)

Coding Redundancy
Example: Suppose the intensities are represented with a natural 3-bit code, but a variable-length code is used instead:

    Lavg = Σ_{k=0}^{7} l(rk) p(rk)
         = 2(0.19) + 2(0.25) + 3(0.16) + ... + 6(0.02)
         = 2.7 bits

    CR = 3/2.7 ≈ 1.11

    RD = 1 - 1/1.11 ≈ 0.099

As the preceding example shows, coding redundancy is present when the codes assigned to a set of events (such as intensity values) do not take full advantage of the probabilities of the events.
Coding redundancy is almost always present when the intensities of an image are represented using a natural binary code.
The reason is that most images are composed of objects that have a regular and somewhat predictable morphology.

Inter-pixel Redundancy
Also called spatial and temporal redundancy.
Because the pixels of most 2-D intensity arrays are
correlated spatially (i.e. each pixel is similar to or
dependent on neighboring pixels), information is
unnecessarily replicated in the representations of the
correlated pixels.
In a video sequence, temporally correlated pixels also duplicate information.
Here the two pictures have approximately the same histogram. We must exploit pixel dependencies: each pixel can be estimated from its neighbors.
E.g. run-length coding.

In most images, pixels are correlated spatially and in time. Because most pixel intensities can be predicted reasonably well from neighboring intensities, the information carried by a single pixel is small.
Much of its visual contribution is redundant in the sense that it can be inferred from its neighbors.

Psychovisual Redundancy
One of the simplest ways to compress a set of data is to remove superfluous data from the set. In the context of digital image compression, information that is ignored by the human visual system or is extraneous to the intended use of an image is an obvious candidate for omission.

The human visual system is more sensitive to edges.

Measuring Image Information
How few bits are actually needed to represent the information in an image? That is, is there a minimum amount of data that is sufficient to describe an image without losing information?
A random event E with probability P(E) is said to contain

    I(E) = log(1/P(E)) = -log P(E)

units of information.
If P(E) = 1, then I(E) = 0 and no information is attributed to it. Because no uncertainty is associated with the event, no information would be transferred by communicating that the event has occurred.

Given a source of statistically independent random events from a discrete set of possible events {a1, a2, ..., aJ} with associated probabilities {P(a1), P(a2), ..., P(aJ)}, the average information per source output, called the entropy of the source, is

    H = - Σ_{j=1}^{J} P(aj) log P(aj)
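This average-information measure is easy to compute from observed symbol frequencies; a minimal Python sketch (the function name is mine):

```python
import math
from collections import Counter

def entropy(symbols):
    """Average information per source output, in bits:
    H = -sum over j of P(a_j) * log2 P(a_j), with P estimated from counts."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Four equally likely symbols carry 2 bits per symbol.
print(entropy("abcd"))   # 2.0
```

A source that always emits the same symbol (P = 1) has zero entropy, matching the "no uncertainty, no information" remark above.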

Fidelity Criteria
The removal of irrelevant visual information involves a
loss of real or quantitative image information.

The error between two functions is given by:

    e(x, y) = f̂(x, y) - f(x, y)

So, the total error between the two images is

    Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} [f̂(x, y) - f(x, y)]

The root-mean-square error averaged over the whole image is

    erms = √( (1/MN) Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} [f̂(x, y) - f(x, y)]² )

Fidelity Criteria
A closely related objective fidelity criterion is the mean-square signal-to-noise ratio of the compressed-decompressed image:

    SNRms = ( Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} f̂(x, y)² ) / ( Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} [f̂(x, y) - f(x, y)]² )
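Both objective criteria translate directly into code; a small pure-Python sketch (function names are mine), where images are lists of rows:

```python
import math

def rms_error(f, fhat):
    """Root-mean-square error between an M x N image f and its
    decompressed approximation fhat."""
    M, N = len(f), len(f[0])
    total = sum((fhat[x][y] - f[x][y]) ** 2
                for x in range(M) for y in range(N))
    return math.sqrt(total / (M * N))

def snr_ms(f, fhat):
    """Mean-square signal-to-noise ratio of the decompressed image:
    energy of fhat divided by energy of the error."""
    num = sum(v ** 2 for row in fhat for v in row)
    den = sum((fhat[x][y] - f[x][y]) ** 2
              for x in range(len(f)) for y in range(len(f[0])))
    return num / den

f    = [[10, 20], [30, 40]]
fhat = [[11, 19], [29, 41]]   # every pixel off by exactly 1
print(rms_error(f, fhat))     # 1.0
```

With every pixel off by one intensity level, the RMS error is exactly 1, independent of image size.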

Fidelity Criteria
While objective fidelity criteria offer a simple and convenient way to evaluate information loss, decompressed images are ultimately viewed by humans.
So, measuring image quality by the subjective evaluations of people is often more appropriate.
This can be done by presenting a decompressed image to a cross-section of viewers and averaging their evaluations.
The evaluations may be made using an absolute rating scale or by means of side-by-side comparisons of f(x, y) and f̂(x, y).

Compression Model

The source encoder is responsible for removing redundancy (coding, inter-pixel, psycho-visual).
The channel encoder ensures robustness against channel noise.

Compression Types
Compression is of two types: Error-Free (Loss-less) Compression and Lossy Compression.

Lossless Compression
In lossless data compression, the integrity of the data is
preserved.
The original data and the data after compression and
decompression are exactly the same because, in these
methods, the compression and decompression
algorithms are exact inverses of each other: no part of
the data is lost in the process.
Redundant data is removed in compression and added
during decompression.
Lossless compression methods are normally used when
we cannot afford to lose any data.

1. Run-Length Coding

Run-length encoding is probably the simplest method of compression. It can be used to compress data made of any combination of symbols. It does not need to know the frequency of occurrence of symbols and can be very efficient if data is represented as 0s and 1s.
The general idea behind this method is to replace consecutive repeating occurrences of a symbol by one occurrence of the symbol followed by the number of occurrences.
The method can be even more efficient if the data uses only two symbols (for example 0 and 1) in its bit pattern and one symbol is more frequent than the other.

Figure: Run-length encoding example
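The idea can be sketched in a few lines of Python (function names and the (symbol, count) pair format are mine):

```python
def rle_encode(data):
    """Replace each run of a repeated symbol with a (symbol, run length) pair."""
    out, i = [], 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                 # extend the current run
        out.append((data[i], j - i))
        i = j
    return out

def rle_decode(pairs):
    """Inverse operation: expand each pair back into a run."""
    return "".join(sym * count for sym, count in pairs)

encoded = rle_encode("AAAABBBBBCX")
print(encoded)   # [('A', 4), ('B', 5), ('C', 1), ('X', 1)]
assert rle_decode(encoded) == "AAAABBBBBCX"
```

Note that isolated symbols expand rather than shrink (two units per one), which is why RLE pays off only when long runs dominate the data.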

2. Huffman coding
Huffman coding assigns shorter codes to symbols that occur
more frequently and longer codes to those that occur less
frequently. For example, imagine we have a text file that uses
only five characters (A, B, C, D, E).
Before we can assign bit patterns to each character, we
assign each character a weight based on its frequency of use.
In this example, assume that the frequency of the characters
is as shown in Table

Figure: Huffman coding

A character's code is found by starting at the root and following the branches that lead to that character. The code itself is the bit value of each branch on the path, taken in sequence.

Figure Final tree and code

Encoding
Let us see how to encode text using the code for our five
characters. Figure shows the original and the encoded text.

Figure Huffman encoding

Decoding
The recipient has a very easy job in decoding the data it
receives. Figure shows how decoding takes place.

Figure Huffman decoding
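One compact way to build such a code uses a min-heap of partial trees (a sketch; the five-character frequency table below is illustrative, not necessarily the exact table from the figures):

```python
import heapq

def huffman_codes(weights):
    """Build a Huffman code: dict mapping symbol -> bit string.
    More frequent symbols never get longer codes than rarer ones."""
    # Each heap entry: (total weight, tie-breaker, {symbol: partial code})
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(weights.items()))]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)    # two lightest subtrees
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}   # left branch gets 0
        merged.update({s: "1" + c for s, c in right.items()})  # right gets 1
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

codes = huffman_codes({"A": 17, "B": 12, "C": 12, "D": 27, "E": 32})
# The most frequent character never gets a longer code than a rarer one.
assert len(codes["E"]) <= len(codes["B"])
# The code is prefix-free: no codeword is a prefix of another.
values = list(codes.values())
assert all(not a.startswith(b) for a in values for b in values if a != b)
```

The prefix-free property is what makes the decoding in the figure unambiguous: the receiver can split the bit stream into codewords without any separators.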

3. Lempel-Ziv Encoding

Lempel-Ziv (LZ) encoding is an example of a category of algorithms called dictionary-based encoding. The idea is to create a dictionary (a table) of strings used during the communication session. If both the sender and the receiver have a copy of the dictionary, then previously encountered strings can be substituted by their index in the dictionary to reduce the amount of information transmitted.

Compression
In this phase there are two concurrent events: building an
indexed dictionary and compressing a string of symbols.
The algorithm extracts the smallest substring that cannot be
found in the dictionary from the remaining uncompressed
string. It then stores a copy of this substring in the
dictionary as a new entry and assigns it an index value.
Compression occurs when the substring, except for the last
character, is replaced with the index found in the dictionary.
The process then inserts the index and the last character of
the substring into the compressed string.

Example: Consider the following 4 x 4, 8-bit image of a vertical edge:

    39   39   126   126
    39   39   126   126
    39   39   126   126
    39   39   126   126

The image is encoded by processing its pixels in a left-to-right, top-to-bottom manner.

Currently Recognized  Pixel Being  Encoded  Dictionary Location  Dictionary
Sequence              Processed    Output   (Code Word)          Entry
                      39
39                    39           39       256                  39-39
39                    126          39       257                  39-126
126                   126          126      258                  126-126
126                   39           126      259                  126-39
39                    39
39-39                 126          256      260                  39-39-126
126                   126
126-126               39           258      261                  126-126-39
39                    39
39-39                 126
39-39-126             126          260      262                  39-39-126-126
126                   39
126-39                39           259      263                  126-39-39
39                    126
39-126                126          257      264                  39-126-126
126                                126
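The compression phase can be sketched directly in Python (a minimal LZW encoder; names are mine). Running it on the 4 x 4 edge image reproduces the encoded-output column of the table above:

```python
def lzw_encode(pixels):
    """LZW compression for 8-bit pixel values.
    The dictionary starts with all single intensities 0..255; each new
    entry (current sequence + next pixel) gets the next code from 256 up."""
    dictionary = {(i,): i for i in range(256)}
    next_code = 256
    out, current = [], ()
    for p in pixels:
        candidate = current + (p,)
        if candidate in dictionary:
            current = candidate          # keep growing the recognized sequence
        else:
            out.append(dictionary[current])
            dictionary[candidate] = next_code   # new dictionary entry
            next_code += 1
            current = (p,)
    if current:
        out.append(dictionary[current])  # flush the final sequence
    return out

# The 4 x 4 vertical-edge image, scanned left-to-right, top-to-bottom:
pixels = [39, 39, 126, 126] * 4
print(lzw_encode(pixels))
# [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]
```

Ten output codes replace sixteen pixels, matching the code-word column of the table.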

Decompression
Decompression is the inverse of the compression
process. The process extracts the substrings from the
compressed string and tries to replace the indexes with
the corresponding
entry in the dictionary, which is empty at first and built
up gradually. The idea is that when an index is received,
there is already an entry in the dictionary corresponding
to that index.

Predictive Coding
A simpler compression approach that achieves good compression without significant computational overhead and can be either error-free or lossy.
The approach, commonly referred to as predictive coding, is based on eliminating the redundancies of closely spaced pixels, in space and/or time, by extracting and coding only the new information in each pixel.
The new information of a pixel is defined as the difference between the actual and predicted value of that pixel.

Loss-less Predictive Encoding
The system consists of an encoder and a decoder, each containing an identical predictor.
The predictor generates the anticipated value of each sample based on a specified number of past samples.
The output of the predictor is then rounded to the nearest integer, denoted f̂(n), and used to form the difference or prediction error

    e(n) = f(n) - f̂(n)

The prediction error is then encoded using a variable-length code to generate the next element of the compressed data stream.
The decoder reconstructs e(n) from the received variable-length code words and performs the inverse operation

    f(n) = e(n) + f̂(n)

to decompress or recreate the original input sequence.
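With the simplest predictor, the (scaled, rounded) previous sample, the encoder/decoder pair can be sketched as follows; the single-sample predictor and the function names are my choices, not the only possibility:

```python
def predictive_encode(samples, alpha=1.0):
    """First sample is sent as-is; each later sample is replaced by the
    prediction error e(n) = f(n) - round(alpha * f(n-1))."""
    out = [samples[0]]
    for n in range(1, len(samples)):
        fhat = round(alpha * samples[n - 1])   # predictor output
        out.append(samples[n] - fhat)
    return out

def predictive_decode(errors, alpha=1.0):
    """Inverse operation: f(n) = e(n) + round(alpha * f(n-1))."""
    out = [errors[0]]
    for n in range(1, len(errors)):
        fhat = round(alpha * out[n - 1])
        out.append(errors[n] + fhat)
    return out

signal = [14, 15, 14, 15, 13, 15, 15, 14, 20, 26, 27]
errors = predictive_encode(signal)
print(errors)   # [14, 1, -1, 1, -2, 2, 0, -1, 6, 6, 1]
assert predictive_decode(errors) == signal
```

Because decoding adds back exactly what encoding subtracted, the round trip is lossless; the errors cluster near zero, which is what makes the subsequent variable-length coding effective.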


Lossy Compression
Our eyes and ears cannot distinguish subtle changes. In such cases, we can use a lossy data compression method.
These methods are cheaper: they take less time and space when it comes to sending millions of bits per second for images and video.
Several methods have been developed using lossy compression techniques.

Lossy Predictive Encoding


In this encoding technique, we add a quantizer to the lossless predictive coding model.
The block diagram shows the quantizer, which replaces the nearest-integer function of the error-free encoder. It is inserted between the symbol encoder and the point at which the prediction error is formed.
It maps the prediction error into a limited range of outputs, denoted ė(n), which establish the amount of compression and distortion that occurs.
The lossy encoder's predictor is placed within a feedback loop, where its input, denoted ḟ(n), is generated as a function of past predictions and corresponding quantized errors, i.e.

    ḟ(n) = ė(n) + f̂(n)

Lossy Predictive coding


where f̂(n) is defined as earlier. The closed-loop configuration prevents error buildup at the decoder's output.

Delta Modulation
Delta modulation is a simple but well-known form of lossy predictive coding in which the predictor and quantizer are defined as

    f̂(n) = α ḟ(n-1)

and

    ė(n) = +ζ   if e(n) > 0
           -ζ   otherwise

where α is a prediction coefficient (normally less than 1) and ζ is a positive constant.

Example of Delta Modulation
The figure below illustrates the mechanics of the delta modulation process for the input sequence
[14, 15, 14, 15, 13, 15, 15, 14, 20, 26, 27, 28, 27, 27, 29, 37, 47, 62, 75, 77, 78, 79, 80, 81, 82, 82]
with α = 1 and ζ = 6.5.

The process begins with the error-free transfer of the first input sample to the decoder.
When n = 1:

    f̂(1) = α ḟ(0) = 14,  e(1) = f(1) - f̂(1) = 15 - 14 = 1,  ė(1) = +6.5,  ḟ(1) = ė(1) + f̂(1) = 20.5
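A small simulation of the process (a sketch; the fhat/edot naming is mine) shows both characteristic distortions: granular noise where the input is flat, because the output must alternate by ±ζ, and slope overload where the input rises faster than ζ per sample:

```python
def delta_modulate(samples, alpha=1.0, zeta=6.5):
    """Return the reconstructed sequence fdot(n) seen by the decoder.
    Predictor: fhat(n) = alpha * fdot(n-1); 1-bit quantizer: +/- zeta."""
    recon = [float(samples[0])]      # first sample transferred error-free
    for n in range(1, len(samples)):
        fhat = alpha * recon[n - 1]
        e = samples[n] - fhat        # prediction error
        edot = zeta if e > 0 else -zeta
        recon.append(fhat + edot)
    return recon

seq = [14, 15, 14, 15, 13, 15, 15, 14, 20, 26, 27, 28, 27, 27, 29,
       37, 47, 62, 75, 77, 78, 79, 80, 81, 82, 82]
recon = delta_modulate(seq)
print(recon[:4])   # [14.0, 20.5, 14.0, 20.5]  (granular noise around flat input)
```

During the steep 29 → 37 → 47 → 62 ramp, the output can climb by at most 6.5 per sample and lags behind the input, which is the slope-overload effect.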

The distortions noted in this example are common to all forms of lossy predictive coding. The severity of these distortions depends on a complex set of interactions between the quantization and prediction methods employed.

Differential Pulse Code Modulation (DPCM)

Optimal predictors

In many predictive coding applications, the predictor is chosen to minimize the encoder's mean-square prediction error, subject to the constraints that

    ḟ(n) = ė(n) + f̂(n) ≈ f(n)

and

    f̂(n) = Σ_{i=1}^{m} αi f(n-i)

That is, the optimization criterion is minimal mean-square prediction error, the quantization error is assumed to be negligible, and the prediction is constrained to a linear combination of m previous samples.
The resulting predictive coding approach is referred to as differential pulse code modulation (DPCM).

JPEG Compression
One of the most popular and comprehensive continuous-tone, still-frame compression standards is the JPEG standard.
Nearly all digital cameras use a compression algorithm known as JPEG compression.
It defines three different coding systems:
1) a lossy baseline coding system, which is based on the DCT and is adequate for most compression applications;
2) an extended coding system for greater compression, higher precision, or progressive reconstruction applications;
3) a lossless independent coding system for reversible compression.

Facts about JPEG Compression
JPEG stands for Joint Photographic Experts Group.
Used on 24-bit color files.
Works well on photographic images.
Although it is a lossy compression technique, it yields an excellent-quality image with high compression rates.
Some of the information is lost, but only information that is judged to be insignificant.

Steps in JPEG Compression
1. (Optionally) If the color is represented in RGB mode, translate it to YUV or YCbCr.
2. Divide the file into 8 x 8 blocks.
3. Transform the pixel information from the spatial domain to the frequency domain with the Discrete Cosine Transform.
4. Quantize the resulting values by dividing each coefficient by an integer value and rounding off to the nearest integer.
5. Look at the resulting coefficients in a zigzag order. Do a run-length encoding of the coefficients ordered in this manner.

Step 1
We will use a color space transform to produce a new vector whose components represent luminance, Y, and blue and red chrominance, Cb and Cr.
The luminance describes the brightness of the pixel, while the chrominance carries information about its hue.
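One common version of this transform is the full-range BT.601 variant used by JFIF; the JPEG standard itself does not mandate particular coefficients, so treat these values as one representative choice:

```python
def rgb_to_ycbcr(r, g, b):
    """Full-range luminance/chrominance transform (BT.601 coefficients,
    as used by JFIF). Y is brightness; Cb and Cr are blue and red
    chrominance, offset by 128 so they fit in 0..255."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128
    return y, cb, cr

# Any neutral gray has zero chrominance: Cb and Cr come out at 128.
print(rgb_to_ycbcr(255, 255, 255))   # approximately (255.0, 128.0, 128.0)
```

The point of the transform is that most of the visually important detail ends up in Y, so the Cb and Cr planes can later be compressed more aggressively.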

Step 2
The image is divided into 8 by 8
blocks of pixels.

Since each block is processed without reference to the others, we'll concentrate on a single block. In particular, we'll focus on the block highlighted below.

Here is the same block blown up so that the individual pixels are more apparent. Notice that there is not tremendous variation over the 8 by 8 block (though other blocks may have more).

When we apply the color space transformation to each pixel in our block, we obtain three new blocks, one corresponding to each component. These are shown below, where brighter pixels correspond to larger values.

Step 3
Now we come to the heart of the
compression algorithm.
The DCT (Discrete Cosine Transform)
transforms the data from the spatial
domain to the frequency domain.
The spatial domain shows the amplitude of the color as you move through space.
The frequency domain shows how quickly the amplitude of the color is changing from one pixel to the next in an image file.

The frequency domain is a better representation for the data because it makes it possible for you to separate out and throw away information that isn't very important to human perception.
The human eye is not very sensitive to high-frequency changes, especially in photographic images, so the high-frequency data can, to some extent, be discarded.

Forward DCT
For an N x N pixel image p_xy, 0 ≤ x < N, 0 ≤ y < N, the DCT is an array of coefficients DCT_uv, 0 ≤ u < N, 0 ≤ v < N, where

    DCT_uv = (1/√(2N)) Cu Cv Σ_{x=0}^{N-1} Σ_{y=0}^{N-1} p_xy cos[(2x+1)uπ / 2N] cos[(2y+1)vπ / 2N]

where

    Cu = 1/√2 for u = 0, Cv = 1/√2 for v = 0
    Cu = Cv = 1 otherwise
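A direct, unoptimized implementation of this formula (O(N^4); real codecs use fast factorizations, but this makes the definition concrete):

```python
import math

def dct2(block):
    """Forward 2-D DCT exactly as defined above, for an N x N block."""
    N = len(block)
    def c(k):                       # normalizing coefficients Cu, Cv
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * N))
                    for x in range(N) for y in range(N))
            out[u][v] = c(u) * c(v) * s / math.sqrt(2 * N)
    return out

# A perfectly flat 8 x 8 block puts all its energy in the DC coefficient.
flat = [[1] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0], 6))   # 8.0
```

For the 8 x 8 case this normalization agrees with the orthonormal DCT used by JPEG; every AC coefficient of the flat block comes out (numerically) zero.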

The color amplitude information can be thought of as a wave (in two dimensions).
You're decomposing the wave into its component frequencies.
For the 8 x 8 matrix of color data, you're getting an 8 x 8 matrix of coefficients for the frequency components.

We store these coefficients in another 8 by 8 block as shown.
When we move down or to the right, we encounter coefficients corresponding to higher frequencies, which we expect to be less significant.

Step 4
Quantization involves dividing each
coefficient by an integer between 1
and 255 and rounding off.
The quantization table is chosen to
reduce the precision of each
coefficient to no more than
necessary.
The quantization table is carried
along with the compressed file.

Of course, the coefficients F(u, v) are real numbers, which will be stored as integers. This means that we will need to round the coefficients; as we'll see, we do this in a way that facilitates greater compression.
The elements in the quantization matrix control the compression ratio, with larger values producing greater compression.
The quantized DCT coefficients are computed with

    Q(u, v) = round( F(u, v) / Z(u, v) )

where Z(u, v) is the corresponding entry of the quantization matrix.
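In code, quantization is a single elementwise divide-and-round. The 2 x 2 coefficient block and quantization table below are toy stand-ins for a real 8 x 8 JPEG quantization matrix, purely for illustration:

```python
def quantize(coeffs, qtable):
    """Q(u, v) = round(F(u, v) / Z(u, v)).
    Larger Z entries discard more precision (more compression)."""
    return [[round(f / z) for f, z in zip(frow, zrow)]
            for frow, zrow in zip(coeffs, qtable)]

F = [[215.0, -18.4],
     [ -4.7,   2.1]]
Z = [[16, 11],          # toy quantization matrix (illustrative values)
     [12, 10]]
print(quantize(F, Z))   # [[13, -2], [0, 0]]
```

Dequantization multiplies back by Z and recovers only approximate coefficients; this rounding step is where JPEG's information loss actually occurs, and where the small high-frequency coefficients collapse to zero.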

Step 5
This is done so that the coefficients are in order of increasing frequency.
The higher-frequency coefficients are more likely to be 0 after quantization.
This improves the compression of run-length encoding.
Apply run-length encoding.
In this way, the sequences of DCT coefficients are greatly shortened, which is the goal of the compression algorithm.
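The zigzag scan and the run-length step can be sketched as follows; the (zero-run, value) pair format is a simplified stand-in for JPEG's actual entropy-coding symbols:

```python
def zigzag(block):
    """Read an N x N block in zigzag order: anti-diagonals of constant
    x + y, alternating direction, so frequency increases along the scan."""
    n = len(block)
    cells = [(x, y) for x in range(n) for y in range(n)]
    cells.sort(key=lambda p: (p[0] + p[1],
                              p[0] if (p[0] + p[1]) % 2 else -p[0]))
    return [block[x][y] for x, y in cells]

def run_length(seq):
    """Encode as (zeros skipped, nonzero value) pairs, ending with a
    (0, 0) end-of-block marker (a simplified convention, not JPEG's exact one)."""
    out, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            out.append((run, v))
            run = 0
    out.append((0, 0))
    return out

quantized = [[13, -2, 0],
             [ 1,  0, 0],
             [ 0,  0, 0]]
print(zigzag(quantized))               # [13, -2, 1, 0, 0, 0, 0, 0, 0]
print(run_length(zigzag(quantized)))   # [(0, 13), (0, -2), (0, 1), (0, 0)]
```

Because quantization zeroes out most high-frequency coefficients, the zigzag scan groups the zeros into one long tail, which the end-of-block marker then collapses into a single symbol.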


Reconstructing the image from the information is rather straightforward.
The quantization matrices are stored in the file so that approximate values of the DCT coefficients may be recomputed.
From here, the (Y, Cb, Cr) vector is found through the Inverse Discrete Cosine Transform.
Then the (R, G, B) vector is recovered by inverting the color space transform.
