
Enhancement versus Restoration

Both processes try to improve an image in some predefined sense.
Enhancement:
(1)Manipulating an image in order to take advantage of the
psychophysics of the human visual system.
(2)Techniques are usually heuristic.
(3)Example: Contrast stretching, histogram equalization.
Restoration:
(1)A process that attempts to reconstruct or recover an image
that has been degraded by using some prior knowledge of the
degradation phenomenon.
(2)Involves modeling the degradation process and applying the
inverse process to recover the original image.
(3)A criterion for goodness is required that will recover the
image in an optimal fashion with respect to that criterion.
(4) Example: removal of blur by applying a deblurring
function.
(Linear) Degradation Model
g(m,n) = f(m,n) * h(m,n) + η(m,n)
G(u,v) = H(u,v) F(u,v) + N(u,v)
f(m,n): degradation-free image
g(m,n): observed image
h(m,n): degradation function
η(m,n): additive noise
Problem: Given an observed image g(m,n), recover the original image f(m,n), using knowledge about the blur function h(m,n) and the characteristics of the noise η(m,n).
We need to find an image f̂(m,n) ≈ f(m,n) such that the error ||f(m,n) − f̂(m,n)|| is small.
Noise Models
With the exception of periodic interference, we will assume that
noise values are uncorrelated from pixel to pixel and with the
(uncorrupted) image pixel values.
These assumptions are usually met in practice and simplify the
analysis.
With these assumptions in hand, we need to only describe the
statistical properties of noise; i.e., its probability density function
(PDF).
Gaussian noise
The pdf of a Gaussian random variable z is given by:
p(z) = (1 / (√(2π) σ)) exp( −(z − μ)² / (2σ²) )
where z represents the (noise) gray value, μ is the mean, and σ is the standard deviation. The squared standard deviation σ² is usually referred to as the variance.
For a Gaussian pdf, approximately 70% of its values will be in the range [(μ − σ), (μ + σ)], and 95% of its values will be in the range [(μ − 2σ), (μ + 2σ)].
Rayleigh noise
The pdf of Rayleigh noise is given by:
p(z) = (2/b)(z − a) exp( −(z − a)²/b )   for z ≥ a
p(z) = 0                                  for z < a
The mean and variance are given by:
μ = a + √(πb/4)
σ² = b(4 − π)/4
This noise is one-sided and the density function is skewed.
Erlang(Gamma) noise
The pdf of Erlang noise is given by:
p(z) = (a^b z^(b−1) / (b−1)!) e^(−az)   for z ≥ 0
p(z) = 0                                 for z < 0
where a > 0, b is a positive integer, and ! denotes the factorial.
The mean and variance are given by:
μ = b/a
σ² = b/a²
This noise is one-sided and the density function is
skewed.
Exponential noise
The pdf of exponential noise is given by:
p(z) = a e^(−az)   for z ≥ 0
p(z) = 0           for z < 0
where a > 0.
The mean and variance are given by:
μ = 1/a
σ² = 1/a²
This is a special case of the Erlang density with b = 1.
Uniform noise
The pdf of uniform noise is given by:
p(z) = 1/(b − a)   for a ≤ z ≤ b
p(z) = 0           otherwise
The mean and variance are given by:
μ = (a + b)/2
σ² = (b − a)²/12
Impulse (salt-and-pepper) noise
(also called spike noise)
The pdf of (bipolar) impulse noise is given by:
p(z) = P_a   for z = a
p(z) = P_b   for z = b
p(z) = 0     otherwise
where a, b > 0. If b > a, gray level b appears as a bright (salt) dot and gray level a as a dark (pepper) dot in the image.
Plot of density function of different noise models
Test pattern and illustration of the effect of different types
of noise
Estimation of noise parameters
The noise pdf is usually available from sensor specifications.
Sometimes, the form of the pdf is known from physical modeling.
The pdf (or parameters of the pdf) are also often estimated from
the image.
Typically, if feasible, a flat uniformly illuminated surface is
imaged using the imaging system. The histogram of the resulting
image is usually a good indicator of the noise pdf.
If that is not possible, we can usually choose a small patch of an
image that is relatively uniform and compute a histogram of the
image over that region.

Using the histogram, we can estimate the noise mean and variance as follows:
μ = Σ_{z_i ∈ S} z_i p(z_i)
σ² = Σ_{z_i ∈ S} (z_i − μ)² p(z_i)
where z_i is the gray value of pixel i in S, and p(z_i) is the histogram value.
The shape of the histogram identifies the closest pdf match.
The mean and variance are used to solve for the parameters a
and b in the density function.
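As a small illustrative sketch (mine, not from the lecture; the patch size and noise level below are made-up values), the two estimates above can be computed directly from the normalized histogram of a roughly uniform patch:

import numpy as np

def noise_stats(patch):
    # Normalized histogram of an 8-bit patch approximates the noise PDF
    levels = np.arange(256)
    p = np.bincount(patch.ravel(), minlength=256).astype(float)
    p /= p.sum()
    mean = np.sum(levels * p)                # mu = sum z_i p(z_i)
    var = np.sum((levels - mean) ** 2 * p)   # sigma^2 = sum (z_i - mu)^2 p(z_i)
    return mean, var

# Flat gray patch corrupted by Gaussian noise (sigma = 10)
rng = np.random.default_rng(0)
patch = np.clip(128 + rng.normal(0, 10, (64, 64)), 0, 255).astype(np.uint8)
print(noise_stats(patch))   # roughly (128, 100)

The shape of the resulting histogram would then be matched against the pdfs above to pick a model.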
Restoration in the presence of only noise
In this case, the degradation equation becomes:
g(m,n) = f(m,n) + η(m,n)
G(u,v) = F(u,v) + N(u,v)
Spatial filtering is usually the best method to restore images
corrupted purely by noise. The process is similar to that of image
enhancement.
Mean filters
Arithmetic mean
Let S_ab be a rectangular window of size a × b. The arithmetic mean filter computes the average value of the pixels in g(m,n) over the window S_ab:
f̂(m,n) = (1/ab) Σ_{(s,t) ∈ S_ab} g(s,t)
This operation can be thought of as a convolution with a uniform rectangular mask of size a × b, each of whose values is 1/ab.
This smooths out variations, and noise is reduced.
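A minimal sketch of this filter (assumed details: edge padding and a default 3 × 3 window):

import numpy as np

def arithmetic_mean(g, a=3, b=3):
    # Average g over an a x b window centered at each pixel (edge-padded),
    # equivalent to convolving with a uniform mask of values 1/(a*b).
    gp = np.pad(g.astype(float), ((a // 2, a // 2), (b // 2, b // 2)), mode='edge')
    out = np.empty(g.shape, dtype=float)
    for m in range(g.shape[0]):
        for n in range(g.shape[1]):
            out[m, n] = gp[m:m + a, n:n + b].mean()
    return out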
Geometric mean
The geometric mean filter computes the geometric mean of the pixels in g(m,n) over the window S_ab:
f̂(m,n) = [ Π_{(s,t) ∈ S_ab} g(s,t) ]^(1/ab)
This usually gives results similar to the arithmetic mean filter, with possibly less loss of image detail.
Harmonic mean
The harmonic mean filter computes the harmonic mean of the
pixels in g(m,n) over the window S
ab
.
f̂(m,n) = ab / Σ_{(s,t) ∈ S_ab} ( 1 / g(s,t) )
This works well for salt noise, but fails for pepper noise. It also
works well with Gaussian noise.
Contraharmonic mean
The contraharmonic mean filter is given by the expression:
f̂(m,n) = Σ_{(s,t) ∈ S_ab} g(s,t)^(Q+1) / Σ_{(s,t) ∈ S_ab} g(s,t)^Q
where Q is called the order of the filter.
This yields the arithmetic mean filter for Q=0 and the harmonic
mean filter for Q= -1.
For positive values of Q, it reduces pepper noise and for
negative values of Q, it reduces salt noise. It cannot do both
simultaneously.
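A sketch of the contraharmonic filter (the small epsilon is my addition, guarding 0^Q for negative Q); Q = 0 reproduces the arithmetic mean and Q = −1 the harmonic mean:

import numpy as np

def contraharmonic(g, Q, a=3, b=3):
    # f_hat = sum(g^(Q+1)) / sum(g^Q) over each a x b window.
    eps = 1e-8
    gp = np.pad(g.astype(float) + eps, ((a // 2, a // 2), (b // 2, b // 2)), mode='edge')
    out = np.empty(g.shape, dtype=float)
    for m in range(g.shape[0]):
        for n in range(g.shape[1]):
            w = gp[m:m + a, n:n + b]
            out[m, n] = (w ** (Q + 1)).sum() / (w ** Q).sum()
    return out

Choosing Q > 0 attacks pepper noise and Q < 0 attacks salt noise, as noted above.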
Order Statistic filters
Order statistic filters are obtained by first ordering (or ranking) the pixel values in a window S_ab around a given pixel.
Median Filter
It replaces the value of a pixel by the median of the gray values in a neighborhood S_ab of the pixel:
f̂(m,n) = median_{(s,t) ∈ S_ab} { g(s,t) }
Median filters are particularly suited for impulsive/salt-and-pepper noise, and they suppress it without deblurring the image. They often cause much less loss of sharp edges in the original image.
Properties of the median filter:
(1) Smooths additive white noise.
(2) Does not degrade edges.
(3) Effective for removing salt noise and pepper noise, separately or simultaneously.
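A sketch of the median filter over an a × b neighborhood (edge padding assumed):

import numpy as np

def median_filter(g, a=3, b=3):
    # Replace each pixel by the median of the gray values in its a x b window.
    gp = np.pad(g.astype(float), ((a // 2, a // 2), (b // 2, b // 2)), mode='edge')
    out = np.empty(g.shape, dtype=float)
    for m in range(g.shape[0]):
        for n in range(g.shape[1]):
            out[m, n] = np.median(gp[m:m + a, n:n + b])
    return out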
Example of median filter
Max and Min Filter
The Max filter replaces the value of a pixel by the maximum of the gray values in a neighborhood S_ab of the pixel:
f̂(m,n) = max_{(s,t) ∈ S_ab} { g(s,t) }
It is used to reduce pepper noise and to find the bright spots in an image.
The Min filter replaces the value of a pixel by the minimum of the gray values in a neighborhood S_ab of the pixel:
f̂(m,n) = min_{(s,t) ∈ S_ab} { g(s,t) }
It is used to reduce salt noise and to find the dark spots in an image.
Usually, the max and min filters are used in conjunction.
Example
Midpoint Filter
The Midpoint filter replaces the value of a pixel by the midpoint (average) of the maximum and minimum of the gray values in a neighborhood S_ab of the pixel:
f̂(m,n) = [ max_{(s,t) ∈ S_ab} { g(s,t) } + min_{(s,t) ∈ S_ab} { g(s,t) } ] / 2
This filter works best for randomly distributed noise, like
Gaussian or uniform noise
Alpha-trimmed mean Filter
From the pixel values in a neighborhood S_ab of the pixel, we first delete (trim) the d/2 lowest and d/2 highest values. We then compute the arithmetic mean of the remaining (ab − d) values:
f̂(m,n) = (1/(ab − d)) Σ_{(s,t) ∈ S_ab} g_r(s,t)
where g_r(s,t) denotes the pixels remaining after trimming.
This filter combines order statistics with averaging. When d = 0, we get the regular arithmetic mean filter; when d = ab − 1 (i.e., (ab − 1)/2 values trimmed from each end), we get the median filter.
This filter is useful when there are multiple types of noise (for example, salt-and-pepper noise in addition to Gaussian noise).
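A sketch of the alpha-trimmed mean (d is assumed even and less than ab):

import numpy as np

def alpha_trimmed_mean(g, d=2, a=3, b=3):
    # Sort the a*b window values, drop d/2 from each end, average the rest.
    gp = np.pad(g.astype(float), ((a // 2, a // 2), (b // 2, b // 2)), mode='edge')
    out = np.empty(g.shape, dtype=float)
    for m in range(g.shape[0]):
        for n in range(g.shape[1]):
            w = np.sort(gp[m:m + a, n:n + b].ravel())
            out[m, n] = w[d // 2 : a * b - d // 2].mean()   # (ab - d) values
    return out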
Inverse Filter
The simplest approach to restoration is direct inverse filtering.
This is obtained as follows:
F̂(u,v) = G(u,v) / H(u,v)
In the DFT domain, the inverse filter H⁻¹ is defined by H⁻¹(u,v) = 1/H(u,v).
Note that there is a problem of division by zero.
Because G(u,v) = H(u,v)F(u,v) + N(u,v), we get
F̂(u,v) = F(u,v) + N(u,v)/H(u,v)
where H(u,v) is the degradation function.
Hence noise actually gets amplified at frequencies where H(u,v)
is zero or very small. In fact, the contribution from the noise term
dominates at these frequencies.
It is therefore seldom used in practice in the presence of noise.
Note that the inverse filter of a low-pass distortion is always of a high-pass variety.
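Despite these drawbacks, the inverse filter is easy to sketch. In the version below (my formulation, not the lecture's), restoration is applied only within a radius D0 of the spectrum center, leaving G untouched elsewhere; this is one simple way to implement the cutoff discussed later:

import numpy as np

def inverse_filter(g, H, D0=None):
    # F_hat = G / H, applied only inside a radius-D0 disk around DC
    # (centered spectrum); a tiny constant avoids division by zero.
    G = np.fft.fftshift(np.fft.fft2(g))
    Hc = np.fft.fftshift(H)
    F_hat = G / (Hc + 1e-12)
    if D0 is not None:
        M, N = g.shape
        v, u = np.meshgrid(np.arange(N) - N // 2, np.arange(M) - M // 2)
        F_hat = np.where(u ** 2 + v ** 2 <= D0 ** 2, F_hat, G)
    return np.real(np.fft.ifft2(np.fft.ifftshift(F_hat)))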
Wiener Filter
Also known as Minimum Mean Square Error Filter
Norbert Wiener introduced this filter
The DFT of the Wiener filter is given by:
H_Wiener(u,v) = H*(u,v) / ( |H(u,v)|² + K )
where K is a constant (the designer has to experiment with various values of K).
Assumptions:
(1)Image and noise as random process
(2) Image and noise are uncorrelated
(3) Noise is a spectrally white noise and has zero mean (in
spatial domain)
Let us look at the details of K:
K = S_η(u,v) / S_f(u,v) = |N(u,v)|² / |F(u,v)|² = noise power / image power
For spectrally white noise, the noise power is constant.
We don't know the power spectrum of the undegraded image, so we treat the ratio as a constant K and tune different values to see the results of filtering.
If the noise power is zero, i.e., K = 0, the Wiener filter becomes the inverse filter:
H_Wiener(u,v) = 1/H(u,v) = H_Inverse(u,v)
Increase the value of K if the noise is higher.
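A sketch of the Wiener filter in this constant-K form (H is assumed to be the full-size DFT of the degradation; K is hand-tuned):

import numpy as np

def wiener_filter(g, H, K=0.01):
    # F_hat = conj(H) / (|H|^2 + K) * G; K = 0 reduces to the inverse filter.
    G = np.fft.fft2(g)
    W = np.conj(H) / (np.abs(H) ** 2 + K)
    return np.real(np.fft.ifft2(W * G))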
Figure: degraded image, inverse-filter result, and Wiener-filter result, for more (Gaussian) noise and for less noise.
Comments
The Wiener filter is the MMSE linear filter.
Linear filters blur edges.
The Wiener filter may be optimal, but it isn't always good.
Linear filters work poorly with non-Gaussian noise.
Nonlinear filters can be designed using the same methodologies.
(Images from Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd Edition.)
Inverse Filter
From the degradation model:
G(u,v) = F(u,v)H(u,v) + N(u,v)
After we obtain H(u,v), we can estimate F(u,v) by the inverse filter:
F̂(u,v) = G(u,v)/H(u,v) = F(u,v) + N(u,v)/H(u,v)
Noise is enhanced when H(u,v) is small.
To avoid the side effect of enhancing noise, we can apply this formulation only to frequency components (u,v) within a radius D_0 from the center of H(u,v).
In practice, the inverse filter is not widely used.
Inverse Filter: Example
Degradation function (atmospheric turbulence model):
H(u,v) = exp[ −k (u² + v²)^(5/6) ]
where k is a constant controlling the severity of the blur.
Figures: original image; blurred image due to turbulence; result of applying the full filter; results of applying the filter with D_0 = 40, D_0 = 70, and D_0 = 85.
(Images from Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd Edition.)
Wiener Filter: Minimum Mean Square Error Filter
Objective: optimize the mean square error e² = E{ (f − f̂)² }
Wiener Filter Formula:
F̂(u,v) = [ H*(u,v) S_f(u,v) / ( S_f(u,v) |H(u,v)|² + S_η(u,v) ) ] G(u,v)
        = [ H*(u,v) / ( |H(u,v)|² + S_η(u,v)/S_f(u,v) ) ] G(u,v)
        = [ (1/H(u,v)) · |H(u,v)|² / ( |H(u,v)|² + S_η(u,v)/S_f(u,v) ) ] G(u,v)
where
H(u,v) = degradation function
S_η(u,v) = power spectrum of the noise
S_f(u,v) = power spectrum of the undegraded image
Approximation of Wiener Filter
Wiener Filter Formula:
F̂(u,v) = [ (1/H(u,v)) · |H(u,v)|² / ( |H(u,v)|² + S_η(u,v)/S_f(u,v) ) ] G(u,v)
The ratio S_η(u,v)/S_f(u,v) is difficult to estimate.
Approximated Formula:
F̂(u,v) = [ (1/H(u,v)) · |H(u,v)|² / ( |H(u,v)|² + K ) ] G(u,v)
Practically, K is chosen manually to obtain the best visual result!
Wiener Filter: Example
Figures: original image; blurred image due to turbulence; result of the full inverse filter; result of the inverse filter with D_0 = 70; result of the full Wiener filter.
Wiener Filter: Example (cont.)
Figures: original image; blurred image due to turbulence; result of the inverse filter with D_0 = 70; result of the Wiener filter.
(Images from Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd Edition.)
Example: Wiener Filter and Motion Blurring
Figures: image degraded by motion blur + AWGN at noise variances σ² = 650, σ² = 325, and σ² = 130; results of the inverse filter and of the Wiener filter.
Note: K is chosen manually.
Constrained Least Squares Filter
Degradation model:
g(x,y) = f(x,y) * h(x,y) + η(x,y)
Written in matrix form:
g = Hf + η
Objective: to find the minimum of a criterion function
C = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} [ ∇²f(x,y) ]²
subject to the constraint
||g − H f̂||² = ||η||²
We get the constrained least squares filter
F̂(u,v) = [ H*(u,v) / ( |H(u,v)|² + γ |P(u,v)|² ) ] G(u,v)
where P(u,v) is the Fourier transform of the Laplacian operator
p(x,y) =
  0   1   0
  1  −4   1
  0   1   0
and ||w||² = wᵀw.
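A sketch of this filter (my assumptions: the Laplacian kernel is embedded in a full-size array before taking its DFT, and γ is fixed here rather than iterated):

import numpy as np

def cls_filter(g, H, gamma=1e-3):
    # F_hat = conj(H) / (|H|^2 + gamma*|P|^2) * G, P = DFT of the Laplacian.
    M, N = g.shape
    p = np.zeros((M, N))
    p[:3, :3] = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]
    P = np.fft.fft2(p)
    G = np.fft.fft2(g)
    F_hat = np.conj(H) / (np.abs(H) ** 2 + gamma * np.abs(P) ** 2) * G
    return np.real(np.fft.ifft2(F_hat))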
Constrained Least Squares Filter: Example
F̂(u,v) = [ H*(u,v) / ( |H(u,v)|² + γ |P(u,v)|² ) ] G(u,v)
γ is adaptively adjusted to achieve the best result.
Figures: results from the previous slide obtained from the constrained least squares filter.
Constrained Least Squares Filter: Example (cont.)
Figures: image degraded by motion blur + AWGN at noise variances σ² = 650, σ² = 325, and σ² = 130; results of the constrained least squares filter and of the Wiener filter.
Constrained Least Squares Filter: Adjusting γ
Define the residual r = g − H f̂.
It can be shown that φ(γ) = rᵀr = ||r||² is a monotonically increasing function of γ.
We want to adjust γ so that
||r||² = ||η||² ± a
where a is an accuracy factor.
1. Specify an initial value of γ.
2. Compute ||r||².
3. Stop if ||r||² = ||η||² ± a is satisfied; otherwise, return to step 2 after increasing γ if ||r||² < ||η||² − a, or decreasing γ if ||r||² > ||η||² + a. Use the new value of γ to recompute
F̂(u,v) = [ H*(u,v) / ( |H(u,v)|² + γ |P(u,v)|² ) ] G(u,v)
Constrained Least Squares Filter: Adjusting γ (cont.)
F̂(u,v) = [ H*(u,v) / ( |H(u,v)|² + γ |P(u,v)|² ) ] G(u,v)
For computing ||r||²:
R(u,v) = G(u,v) − H(u,v) F̂(u,v), with r(x,y) its inverse transform, so
||r||² = Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} r²(x,y)
For computing ||η||²:
m_η = (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} η(x,y)
σ_η² = (1/MN) Σ_{x=0}^{M−1} Σ_{y=0}^{N−1} [ η(x,y) − m_η ]²
||η||² = MN [ σ_η² + m_η² ]
(Images from Rafael C. Gonzalez and Richard E. Woods, Digital Image Processing, 2nd Edition.)
Constrained Least Squares Filter: Example
Figures: original image; blurred image due to turbulence; results obtained from constrained least squares filters with wrong and with correct noise parameters.
Correct parameters: initial γ = 10⁻⁵, correction factor = 10⁻⁶, a = 0.25, σ_η² = 10⁻⁵.
Wrong noise parameter: σ_η² = 10⁻².
Image Compression
Lossless compression
Lossy compression
Objective: to reduce the amount of data required to
represent an image.
Images usually require an enormous amount of data to
represent:
Example: A standard 8.5 by 11 sheet of paper scanned at 100
samples per inch (dpi) and quantized to two gray levels (binary
image) would require more than 100k bytes to represent.
(8.5 × 100)(11 × 100)/8 = 116,875 bytes
Image compression involves reducing the amount of data (bits)
required to represent a digital image. This is done by removing
redundant information.
Image compression is crucial for efficient storage, retrieval,
transmission, and manipulation of images.
More generally, data compression involves the efficient
representation of data in digital format.
Information theory --- a field pioneered by Claude E. Shannon
in the 1940s --- is the theoretical basis for most data compression
techniques.
Compression can be either lossless (information preserving) or
lossy.
Lossless compression: Images can be compressed and restored
without any loss of information
Lossy compression: Perfect recovery is not possible, but provides
a large data compression
Data compression refers to the process of reducing the amount of
data required to represent a given quantity of information.
Various amounts of data may be used to represent/describe the same information. Some representations may be less efficient in the sense that they have data redundancy.
If n_1 and n_2 are the numbers of information-carrying units (e.g., bits) in two datasets that represent the same information, the relative redundancy R_D of the first dataset is defined as
R_D = 1 − 1/C_R
where C_R = n_1/n_2 is called the compression ratio.
For images, data redundancy can be of three types:
(1) Coding redundancy: This refers to the binary codewords used to represent gray values.
(2) Interpixel redundancy: This refers to the correlation between adjacent pixels in an image.
(3) Psychovisual redundancy: This refers to the unequal sensitivity of the human eye to different visual information.
Coding Redundancy
Use fewer bits to represent frequently occurring symbols.
Let p_r(r_k) = n_k/n, k = 0, 1, 2, …, L−1, where L is the number of gray levels.
Let r_k be represented by l(r_k) bits. Therefore, the average number of bits required to represent each pixel is
L_avg = Σ_{k=0}^{L−1} l(r_k) p_r(r_k)
It makes sense to assign fewer bits to those r_k for which p_r(r_k) is large, in order to reduce this sum.
This achieves data compression and results in a variable-length code.
Example:
10% less code
A fidelity criterion or error criterion is required to quantify the
loss of information (if any) due to a compression scheme.
Objective fidelity criteria are based on some quantitative function of the original input image and the compressed and subsequently decompressed image (no perception involved).
Example: Root-mean-square (RMS) error, which is defined as
the square-root of the MSE.
e(m,n) = f̂(m,n) − f(m,n)
f(m,n): original image
f̂(m,n): reconstructed image
e(m,n): error image
e_rms = [ (1/MN) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} ( f̂(m,n) − f(m,n) )² ]^(1/2)
A related measure is the mean-square signal-to-noise ratio:
SNR_ms = [ Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} f̂(m,n)² ] / [ Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} ( f̂(m,n) − f(m,n) )² ]
Another measure is the peak signal-to-noise ratio (PSNR):
SNR_PSNR = P² / [ (1/MN) Σ_{m=0}^{M−1} Σ_{n=0}^{N−1} ( f̂(m,n) − f(m,n) )² ]
where P is the input signal's maximum peak-to-peak value.
When the ultimate image is to be viewed by a human, a subjective fidelity criterion may be more appropriate.
Here, image quality is measured by subjective evaluations by a
human observer.
Ratings by a number of human observers, based on typical
decompressed images, are averaged to obtain this subjective
fidelity criterion.
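The objective criteria above (RMS error and PSNR) are straightforward to compute; a quick sketch, with PSNR reported in the usual decibel form (10 log10 of the ratio):

import numpy as np

def rms_error(f, f_hat):
    return np.sqrt(np.mean((f_hat.astype(float) - f.astype(float)) ** 2))

def psnr_db(f, f_hat, peak=255.0):
    # Ratio of squared peak value to mean squared error, in dB.
    mse = np.mean((f_hat.astype(float) - f.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)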
Example of an absolute comparison scale:
Image Compression model
Source Encoder is used to remove redundancy in the input image.
Channel Encoder is used to introduce redundancy in a controlled
fashion to help combat noise. Example: Parity bit.
The Channel could be a communication link or a storage/retrieval
system.
Channel Decoder and Source Decoder invert the operations of the
corresponding encoder blocks.
We will mainly concentrate on the source encoder/decoder blocks
and not on the channel encoder/decoder steps.
Source Encoder and Decoder
Source encoder is responsible for reducing or eliminating any
coding, interpixel, or psychovisual redundancy.
The first block Mapper transforms the input data into a (usually
nonvisual) format, designed to reduce interpixel redundancy. This
block is reversible and may or may not reduce the amount of data.
Example: run-length encoding, image transform.
The Quantizer reduces accuracy of the mapper output in
accordance with some fidelity criterion. This block reduces
psychovisual redundancy and is usually not invertible.
The Symbol Encoder creates a fixed or variable length codeword
to represent the quantizer output and maps the output in accordance
with this code. This block is reversible and reduces coding
redundancy.
The decoder blocks are inverse operations of the corresponding
encoder blocks (except the quantizer block, which is not invertible).
Elements of Information Theory
What is information --- how to quantify it?
What is the minimum amount of data that is sufficient to
represent an image without loss of information?
What is theoretically the best compression possible?
What is the theoretically best possible transmission rate for
reliable communication over a noisy channel?
Information theory provides answers to these and other
related fundamental questions.
The fundamental premise of information theory is that the
generation of information can be modeled as a probabilistic
process.
A discrete source of information generates one of N possible symbols from a source alphabet set A = {a_0, a_1, …, a_{N−1}} in unit time.
Example: A = {a, b, c, …, z}, {0, 1}, {0, 1, 2, …, 255}
The source output can be modeled as a discrete random variable E, which can take values in the set A = {a_0, a_1, …, a_{N−1}}, with corresponding probabilities {p_0, p_1, …, p_{N−1}}.
We will denote the symbol probabilities by the vector
z = [P(a_0), P(a_1), …, P(a_{N−1})]ᵀ = [p_0, p_1, …, p_{N−1}]ᵀ
Naturally, Σ_{i=0}^{N−1} p_i = 1.
The information source is characterized by the pair (A, z).
Observing an occurrence (or realization) of the random variable E results in some gain of information, denoted by I(E). This gain of information was defined by Shannon to be:
I(E) = log( 1/P(E) ) = −log P(E)
The base for the logarithm depends on the units for measuring
information. Usually, we use base 2, which gives the
information in units of binary digits or bits. Using a base 10
logarithm would give the entropy in the units of decimal digits.
The entropy H(z) of a source is defined as the average amount of information gained by observing a single source symbol:
H(z) = −Σ_{i=0}^{N−1} p_i log p_i
By convention, in the above formula, we set 0 log 0 = 0.
The entropy of a source quantifies its randomness. The higher the source entropy, the more the uncertainty associated with a source output, and the higher the information associated with the source.
For a fixed number of source symbols, the entropy is maximized if all the symbols are equally likely (recall the uniform histogram).
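A one-function sketch of the entropy computation (log base 2, with the 0 log 0 = 0 convention); it reproduces the value in the example that follows:

import numpy as np

def entropy(probs):
    # H(z) = -sum p_i log2 p_i, skipping zero-probability symbols.
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy([1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/64]))   # 1.96875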
Example:
Symbol a_i | Probability p_i | Information (in bits) I(a_i) = −log p_i
0 | 1/2  | 1
1 | 1/4  | 2
2 | 1/8  | 3
3 | 1/16 | 4
4 | 1/32 | 5
5 | 1/64 | 6
6 | 1/64 | 6
H(z) = −Σ_{i=0}^{6} p_i log p_i = 1.96875 bits
Given that a source produces the above symbols with the indicated probabilities, how do we represent them using binary strings?
Symbol a_i | Probability p_i | Binary string (codeword) | Length of codeword l_i
0 | 1/2  | 000 | 3
1 | 1/4  | 001 | 3
2 | 1/8  | 010 | 3
3 | 1/16 | 011 | 3
4 | 1/32 | 100 | 3
5 | 1/64 | 101 | 3
6 | 1/64 | 110 | 3
L_aver = Σ_{i=0}^{6} p_i l_i = 3 bits/symbol
Is this the best we can do (in terms of L_aver)?
For a fixed-length codeword scheme, yes.
How about if we employ a variable-length scheme?
Idea: Since the symbols are not all equally likely, assign shorter
codewords to symbols with higher probability and longer
codewords to symbols with lower probability, such that the
average length is smaller.
Consider the following scheme:
Symbol a_i | Probability p_i | Binary string (codeword) | Length of codeword l_i
0 | 1/2  | 0      | 1
1 | 1/4  | 10     | 2
2 | 1/8  | 110    | 3
3 | 1/16 | 1110   | 4
4 | 1/32 | 11110  | 5
5 | 1/64 | 111110 | 6
6 | 1/64 | 111111 | 6
L_aver = Σ_{i=0}^{6} p_i l_i = 1.96875 bits/symbol
Notice that this is the same as the source entropy!
Shannon's noiseless coding theorem
Let (A, z) be a discrete source with probability vector z and entropy H(z). The average codeword length of any distortionless (uniquely decodable) coding is bounded by
L_aver ≥ H(z)
In other words, no codes exist that can losslessly represent the source if L_aver < H(z).
Note that Shannon's theorem is quite general in that it refers to any code, not a particular coding scheme.
Also, it does not specify a scheme to construct codes whose average length satisfies the bound, nor does it claim that a code achieving L_aver = H(z) exists.
Error-free compression
Useful in applications where no loss of information is tolerable. This may be due to accuracy requirements, legal requirements, or less-than-perfect quality of the original image.
Compression can be achieved by removing coding and/or interpixel redundancy.
Typical compression ratios achievable by lossless techniques range from 2 to 10.
Variable Length Coding
This is used to reduce coding redundancy.
Coding redundancy is present in any image with a non-uniform
histogram (i.e. when all the gray levels are not equally likely).
Given an image with, say, 256 gray levels, {a_0, a_1, …, a_255} = {0, 1, …, 255}. This is our set of source symbols.
For each gray level a_k, we need its probability p(a_k) in the image. This may be obtained from the image histogram:
p(a_k) = n_k/n, where n_k = number of pixels with value a_k and n = total number of pixels.
To each gray level a_k, we need to assign a codeword (a binary string). Suppose l_k is the length of the codeword for symbol a_k.
The total number of bits required to represent the image is
Σ_{k=0}^{L−1} n_k l_k = n Σ_{k=0}^{L−1} l_k (n_k/n) = n Σ_{k=0}^{L−1} l_k p(a_k) = n L_aver
Naturally, we need an encoding scheme with L_aver as small as possible. From Shannon's theorem, we know that L_aver ≥ H(z).
Huffman Code
The Huffman procedure specifies a code with
H(z) ≤ L_aver < H(z) + 1
The algorithm is best illustrated by means of an example.
Given a source which generates one of six possible symbols A = {a_1, a_2, …, a_6} with corresponding probabilities {0.1, 0.4, 0.06, 0.1, 0.04, 0.3}:
Arrange the symbols in descending order of their probability of
occurrence.
Successively reduce the number of source symbols by replacing
the two symbols having least probability, with a compound
symbol. This way, the number of source symbols is reduced by one
at each stage.
The compound symbol is placed at an appropriate location in the
next stage, so that the probabilities are again in descending order.
Break ties using any arbitrary but consistent rule.
Code each reduced source starting with the smallest source and
working backwards.
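A compact sketch of this procedure using a priority queue (the insertion counter is my tie-breaking rule; under ties the individual codewords and lengths may differ from the tables below, but the average length is the same):

import heapq
from itertools import count

def huffman_code(probs):
    # Repeatedly merge the two least probable (compound) symbols,
    # prefixing '0'/'1' to the codewords of each merged subtree.
    tie = count()
    heap = [(p, next(tie), {s: ''}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)
        p2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c1.items()}
        merged.update({s: '1' + w for s, w in c2.items()})
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]

code = huffman_code({'a1': 0.1, 'a2': 0.4, 'a3': 0.06,
                     'a4': 0.1, 'a5': 0.04, 'a6': 0.3})
print(code)   # an optimal code; L_aver is still 2.2 bits/symbol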
Source Reduction:
Symbol a_i | Prob. p_i | Stage 1 | Stage 2 | Stage 3 | Stage 4
a_2 | 0.4  | 0.4 | 0.4 | 0.4 | 0.6
a_6 | 0.3  | 0.3 | 0.3 | 0.3 | 0.4
a_1 | 0.1  | 0.1 | 0.2 | 0.3 |
a_4 | 0.1  | 0.1 | 0.1 |     |
a_3 | 0.06 | 0.1 |     |     |
a_5 | 0.04 |     |     |     |
Code Assignment:
Symbol a_i | Prob. p_i, code | Stage 1 | Stage 2 | Stage 3 | Stage 4
a_2 | 0.4  1     | 0.4 1    | 0.4 1   | 0.4 1  | 0.6 0
a_6 | 0.3  00    | 0.3 00   | 0.3 00  | 0.3 00 | 0.4 1
a_1 | 0.1  011   | 0.1 011  | 0.2 010 | 0.3 01 |
a_4 | 0.1  0100  | 0.1 0100 | 0.1 011 |        |
a_3 | 0.06 01010 | 0.1 0101 |         |        |
a_5 | 0.04 01011 |          |         |        |
L_aver = Σ_{i=1}^{6} p_i l_i = 0.4×1 + 0.3×2 + 0.1×3 + 0.1×4 + 0.06×5 + 0.04×5 = 2.2 bits/symbol
H(z) = −Σ_i p_i log p_i = 2.14 bits/symbol
The resulting code is called a Huffman code. It has some
interesting properties:
(1)The source symbols can be encoded (and decoded) one at time.
(2) It is called a block code because each source symbol is
mapped into a fixed sequence of code symbols.
(3) It is instantaneous because each codeword in a string of code
symbols can be decoded without referencing succeeding symbols.
(4) It is uniquely decodable because any string of code symbols
can be decoded in only one way.
Example: Given the encoded string 010100111100 and the code table above, it can be decoded as follows:
010100111100 → 01010 = a_3, leaving 0111100
0111100 → 011 = a_1, leaving 1100
1100 → 1 = a_2, leaving 100
100 → 1 = a_2, leaving 00
00 → a_6
Decoded string: a_3 a_1 a_2 a_2 a_6
Disadvantage: For a source with J symbols, we need J−2 source reductions. This can be computationally intensive for large J (e.g., J = 256 for an image with 256 gray levels).
Lempel-Ziv-Welch (LZW) coding
LZW coding is also an error free compression technique.
Uses a dictionary
Dictionary is adaptive to the data
Decoder constructs the matching dictionary based on
the codewords received.
Used in the GIF, TIFF, and PDF file formats.
The LZW encoder sequentially examines the image's pixels; gray-level sequences that are not in the dictionary are placed in algorithmically determined locations.
Consider an example: a 4 × 4, 8-bit image whose rows are
39 39 126 126
39 39 126 126
39 39 126 126
39 39 126 126
The dictionary values 0-255 correspond to the pixel values 0-255. Assume a 512-word dictionary:
Dictionary Location | Entry
0   | 0
1   | 1
…   | …
255 | 255
256 | (assigned during coding)
…   | …
511 | (assigned during coding)
The image is encoded by processing its pixels in a left-to-
right, top-to-down manner.
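A sketch of the encoder loop (my assumptions: a dictionary keyed by tuples of gray values and the 512-word limit from the example); on the 4 × 4 image above it emits the codes shown in the comment:

def lzw_encode(pixels, dict_size=512):
    # Locations 0-255 hold the single gray values; new sequences go to 256+.
    dictionary = {(i,): i for i in range(256)}
    next_code = 256
    w, out = (), []
    for px in pixels:
        wk = w + (px,)
        if wk in dictionary:
            w = wk                        # keep growing the current sequence
        else:
            out.append(dictionary[w])     # emit code for the longest match
            if next_code < dict_size:
                dictionary[wk] = next_code
                next_code += 1
            w = (px,)
    if w:
        out.append(dictionary[w])
    return out

img = [39, 39, 126, 126] * 4   # the 4 x 4 example, scanned row by row
print(lzw_encode(img))          # [39, 39, 126, 126, 256, 258, 260, 259, 257, 126]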
LZW Decoding
Just like the compression algorithm, it adds a new string to the string table each time it reads in a new code. All it needs to do in addition to that is translate each incoming code into a string and send it to the output.
Decompression
The companion algorithm for compression is the decompression
algorithm. It needs to be able to take the stream of codes output
from the compression algorithm, and use them to exactly recreate
the input stream. One reason for the efficiency of the LZW
algorithm is that it does not need to pass the sequence dictionary
to the decompression code.
The dictionary can be built exactly as it was during
compression, using the input stream as data.
Bit-plane Coding
A grayscale image is decomposed into a series of binary images
and each binary image is compressed by some binary compression
method.
This removes coding and interpixel redundancy.
Bit-plane decomposition:
Given a grayscale image with 2^m gray levels, each gray value can be represented by m bits, say (a_{m−1}, a_{m−2}, …, a_1, a_0).
The gray value r represented by (a_{m−1}, a_{m−2}, …, a_1, a_0) is given by the base-2 polynomial
r = a_{m−1} 2^{m−1} + a_{m−2} 2^{m−2} + … + a_1 2^1 + a_0 2^0
This bit representation can be used to decompose the grayscale image into m binary images (bit planes).
Alternatively, one can use the m-bit Gray code (g_{m−1}, g_{m−2}, …, g_1, g_0) to represent a given gray value.
The Gray code (g_{m−1}, g_{m−2}, …, g_1, g_0) can be obtained from (a_{m−1}, a_{m−2}, …, a_1, a_0) by the following relationship:
g_i = a_i ⊕ a_{i+1}   for 0 ≤ i ≤ m−2
g_{m−1} = a_{m−1}
where ⊕ denotes the exclusive OR of bits.
The Gray codes of successive gray levels differ in only one position:
127 = 01111111 (binary code representation) → 01000000 (Gray)
128 = 10000000 (binary code representation) → 11000000 (Gray)
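For integers, the relationship above reduces to a one-liner; a quick sketch:

def binary_to_gray(a):
    # g_i = a_i XOR a_{i+1}, g_{m-1} = a_{m-1}  <=>  a ^ (a >> 1)
    return a ^ (a >> 1)

print(format(binary_to_gray(127), '08b'))   # 01000000
print(format(binary_to_gray(128), '08b'))   # 11000000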
The resulting binary images are then compressed (error-free).
We will study a popular encoding scheme called run-length encoding (RLE).
Runlength encoding
Each row of a bit plane (or binary image) is represented by a
sequence of lengths (integers) that denote the successive runs of
0 and 1 pixels.
Two approaches:
(1)Start position and lengths of runs of 1s for each row is used:
(1,3)(7,2) (12,4) (17,2) (20,3)
(5,13) (19,4)
(1,3) (17,6)
(2) Only lengths of runs, starting with the length of 1 run is used:
3,3,2,3,4,1,2,1,3
0,4,13,1,4
3,13,6
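A sketch of approach (2) (my convention: a row starting with 0s yields a leading run length of 0, as in the second example row); the test row is consistent with the first example above:

def run_lengths(row):
    # Lengths of alternating runs, starting with the run of 1s.
    runs, current, length = [], 1, 0
    for bit in row:
        if bit == current:
            length += 1
        else:
            runs.append(length)
            current, length = bit, 1
    runs.append(length)
    return runs

row = [1,1,1, 0,0,0, 1,1, 0,0,0, 1,1,1,1, 0, 1,1, 0, 1,1,1]
print(run_lengths(row))   # [3, 3, 2, 3, 4, 1, 2, 1, 3]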
This technique is very effective in encoding binary images
with large contiguous black and white regions, which would give
rise to a small number of large runs of 1s and 0s.
The run-lengths can in turn be encoded using a variable
length code (ex. Huffman code), for further compression.
The concept of run-length can be extended to a variety of 2-D
coding procedures.
Let a_k be the fraction of runs of 0s with length k. Naturally, (a_1, a_2, …, a_M) would represent a vector of probabilities (the probability of a run of 0s being of length k).
Let H_0 = −Σ_{i=1}^{M} a_i log a_i be the entropy associated with (a_1, a_2, …, a_M), and L_0 = Σ_{i=1}^{M} i a_i be the average length of runs of 0s.
Let b_k be the fraction of runs of 1s with length k. Naturally, (b_1, b_2, …, b_M) would represent a vector of probabilities (the probability of a run of 1s being of length k).
Let H_1 = −Σ_{i=1}^{M} b_i log b_i be the entropy associated with (b_1, b_2, …, b_M), and L_1 = Σ_{i=1}^{M} i b_i be the average length of runs of 1s.
The approximate run-length entropy of the image is
H_RL = ( H_0 + H_1 ) / ( L_0 + L_1 )
where H_0 + H_1 is measured in bits/run and L_0 + L_1 in symbols/run.
H_RL provides an estimate of the average number of bits per pixel required to code the run lengths in a binary image, using a variable-length code.
Lossless Predictive Coding
Does not require decomposition of grayscale image into
bitplanes.
Eliminate interpixel redundancy by extracting and coding
only the new information in each pixel.
New information in a pixel is the difference between its
actual and predicted (based on previous pixel values)
values.
Encoder
Decoder
e_n = f_n − f̂_n
f̂_n = round[ Σ_{i=1}^{m} α_i f_{n−i} ]
Example: 1-D first-order linear predictor
In each row, a pixel value is predicted based on the value of the pixel to its left:
f̂(m,n) = round[ α f(m, n−1) ]
The resulting prediction error
e(m,n) = f(m,n) − f̂(m,n)
is encoded.
The first element of each row (i.e., the first column of the image) is also encoded (using, for example, a different Huffman code).
The decoder reconstructs e(m,n) from the codewords and obtains the original pixel values using
f(m,n) = e(m,n) + f̂(m,n)
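A sketch of this 1-D first-order predictor on a single row (α = 1, so the prediction is simply the previous pixel):

def predict_encode(row, alpha=1.0):
    # First pixel passed through; then only prediction errors are emitted.
    out = [int(row[0])]
    for n in range(1, len(row)):
        out.append(int(row[n]) - round(alpha * row[n - 1]))
    return out

def predict_decode(codes, alpha=1.0):
    row = [codes[0]]
    for e in codes[1:]:
        row.append(e + round(alpha * row[-1]))
    return row

row = [100, 102, 105, 105, 104]
enc = predict_encode(row)
print(enc)                           # [100, 2, 3, 0, -1]: small, low-entropy errors
assert predict_decode(enc) == row    # lossless round trip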
Transform Coding
Approach: image transforms are able to concentrate the image energy in a few transform coefficients (energy packing), so a large number of coefficients can be discarded: compression.
Transform coding techniques operate on the coefficients of a reversible linear transform of the image (e.g., DCT, DFT, DWT).
The input image is subdivided into subimages of size n × n.
The n × n subimages are converted into transform arrays. This tends to decorrelate pixel values and pack as much information as possible into the smallest number of coefficients.
Quantizer selectively eliminates or coarsely quantizes the
coefficients with least information.
Symbol encoder uses a variable-length code to encode the
quantized coefficients.
Any of the above steps can be adapted to each subimage
(adaptive transform coding), based on local image information, or
fixed for all subimages.
Problem: block artifact: boundaries between subimages
become visible.
Discrete Cosine Transform (DCT)
Given a two-dimensional N × N image f(m,n), its discrete cosine transform (DCT) C(u,v) is defined as:
C(u,v) = α(u) α(v) Σ_{m=0}^{N−1} Σ_{n=0}^{N−1} f(m,n) cos[ (2m+1)uπ / 2N ] cos[ (2n+1)vπ / 2N ]
where α(0) = √(1/N) and α(u) = √(2/N) for u ≠ 0.
Similarly, the inverse discrete cosine transform (IDCT) is given by
f(m,n) = Σ_{u=0}^{N−1} Σ_{v=0}^{N−1} α(u) α(v) C(u,v) cos[ (2m+1)uπ / 2N ] cos[ (2n+1)vπ / 2N ]
The more recent JPEG2000 standard uses wavelet transforms
instead of DCT.
The DCT is the most popular transform for image
compression algorithms like JPEG (still images), MPEG (motion
pictures).
The DCT is
(1)Separable (can perform 2-D transform in terms of 1-D
transform).
(2)Symmetric (the operations on the variables m, n are
identical)
(3)Forward and inverse transforms are identical
Blocking artifact less pronounced in DCT than in DFT
Transform Selection
Commonly used ones are Karhunen-Loeve (Hotelling)
transform (KLT), discrete cosine transform (DCT), discrete
Fourier transform (DFT), discrete Wavelet transform (DWT),
Walsh-Hadamard transform (WHT).
Choice depends on the computational resources available
and the reconstruction error that can be tolerated.
This step by itself is lossless and does not lead to compression.
The quantization of the resulting coefficients results in
compression.
The KLT is optimum in terms of packing the most information
for any given fixed number of coefficients.
However, the KLT is data dependent. Computing it requires
computing the correlation matrix of the image pixel values.
Subimage Size Selection
Images are subdivided into subimages of size n × n to reduce the correlation (redundancy) between adjacent subimages.
Usually n = 2^k for some integer k. This simplifies the computation of the transforms (e.g., FFT algorithm).
Typical block sizes used in practice are 8 × 8 and 16 × 16.
Bit Allocation
After transforming each subimage, only a fraction of the
coefficients are retained. This can be done in two ways:
(1) Zonal coding: Transform coefficients with large variance
are retained. Same set of coefficients retained in all subimages.
(2)Threshold coding: Transform coefficients with large magnitude
in each subimage are retained. Different set of coefficients retained
in different subimages.
The retained coefficients are quantized and then encoded.
The overall process of truncating, quantizing, and coding the
transformed coefficients of the subimage is called bit-allocation.
Zonal Coding:
Transform coefficients with large variance carry most of the
information about the image. Hence a fraction of the coefficients
with the largest variance is retained.
(1) Compute the variance of each of the transform coefficients; use
the subimages to compute this.
(2) Keep X% of their coeff. which have maximum variance.
(3) Variable length coding (proportional to variance)
For each subimage, the number of bits allocated is in general made proportional to the variance of the coefficients. Suppose the total number of bits per block is B, the number of retained coefficients is M, and v(i) is the variance of the i-th coefficient. Then
b(i) = B/M + (1/2) log₂ v(i) − (1/(2M)) Σ_{k=1}^{M} log₂ v(k)
The retained coefficients are then quantized and coded. Two
possible ways:
(1)The retained coefficients are normalized with respect to
their standard deviation and they are all allocated the same
number of bits. A uniform quantizer then used.
(2) A fixed number of bits is distributed among all the
coefficients (based on their relative importance). An optimal
quantizer such as a Lloyd-Max quantizer is designed for each
coefficient.
Zonal Mask & bit allocation: an example
Threshold Coding:
In each subimage, the transform coefficients of largest magnitude
contribute most significantly and are therefore retained.
For each subimage
(1) Arrange the transform coefficients in decreasing order of magnitude.
(2) Keep only the top X% of the coefficients and discard rest.
(3) Encode the retained coefficient using variable length
code.
Threshold mask and zigzag ordering: example
The thresholding itself can be done in three different ways,
depending on the truncation criterion:
(1)A single global threshold is applied to all subimages. The
level of compression differs from image to image depending on
the number of coefficients that exceed the threshold.
(2) N-largest coding: The largest N coefficients are retained in
each subimage. Therefore, a different threshold is used for each
subimage. The resulting code rate (total # of bits required) is
fixed and known in advance.
(3)Threshold is varied as a function of the location of each
coefficient in the subimage. This results in a variable code rate
(compression ratio).
The thresholding and quantization steps can be represented together as:
T̂(u,v) = round[ T(u,v) / Z(u,v) ]
where T(u,v) is the original transform coefficient, Z(u,v) is a normalization factor, and T̂(u,v) is the thresholded and quantized value.
(1) Z(u, v) is a transform normalization matrix. Typical example
is shown below.
(2) The values in the Z matrix weigh the transform coefficients according to heuristically determined perceptual or psychovisual importance. The larger the value of Z(u,v), the smaller the importance of that coefficient.
(3) The Z matrix may be scaled to obtain different levels of compression.
(4) At the decoding end,
T̃(u,v) = T̂(u,v) Z(u,v)
is used to denormalize the transform coefficients before inverse transformation.
The JPEG standard
For the new JPEG-2000 check out the web site
www.jpeg.org.
JPEG is a lossy compression standard using DCT.
Activities started in 1986, and the standard was issued by the ISO in 1992.
Four modes of operation: Sequential (baseline), hierarchical,
progressive, and lossless.
Arbitrary image sizes; DCT mode 8-12 bits/sample.
Luminance and chrominance channels are separately encoded.
We will only discuss the baseline method.
Block diagram of JPEG encoding and decoding: source image data → DCT → quantizer (table specification) → entropy encoder (table specification) → compressed image data; the decoder inverts these steps to produce the reconstructed image data.
JPEG baseline
DCT: The image is divided into 8×8 blocks. Each pixel is level-shifted by 2^{n−1}, where 2^n is the maximum number of gray levels in the image.
Thus for 8 bit images, you subtract 128. Then the 2-D DCT of
each block is computed. For the baseline system, the input and
output data precision is restricted to 8 bits and the DCT values
are restricted to 11 bits.
Quantization: the DCT coefficients are threshold coded using a
quantization matrix, and then reordered using zig-zag scanning to
form a 1-D sequence.
The non-zero AC coefficients are Huffman coded. The DC
coefficients of each block are DPCM coded relative to the DC
coefficient of the previous block.
JPEG baseline method
(1) Consider the 8×8 image block s.
(2) Level shift: s − 128.
(3) Compute the 2-D DCT.
(4) Divide by the quantization matrix qmat: dctshat = round(dcts ./ qmat).
(5) Zigzag scan, as in threshold coding.
Figures: an 8×8 subimage (s); its 2-D DCT (dcts) and the quantization matrix (qmat); the result of division by qmat (dctshat); zigzag scan of dctshat.
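A sketch of steps (2)-(5). The table below is the standard JPEG luminance quantization table, which I assume stands in for qmat; the 2-D DCT is built from its 1-D matrix by separability:

import numpy as np

QMAT = np.array([[16,11,10,16, 24, 40, 51, 61],
                 [12,12,14,19, 26, 58, 60, 55],
                 [14,13,16,24, 40, 57, 69, 56],
                 [14,17,22,29, 51, 87, 80, 62],
                 [18,22,37,56, 68,109,103, 77],
                 [24,35,55,64, 81,104,113, 92],
                 [49,64,78,87,103,121,120,101],
                 [72,92,95,98,112,100,103, 99]])

def dct2(block):
    # Separable 2-D DCT-II: A @ block @ A.T with the orthonormal 1-D matrix A.
    N = block.shape[0]
    n, k = np.meshgrid(np.arange(N), np.arange(N))
    A = np.sqrt(2.0 / N) * np.cos((2 * n + 1) * k * np.pi / (2 * N))
    A[0, :] = np.sqrt(1.0 / N)
    return A @ block @ A.T

def quantize_block(s):
    dcts = dct2(s.astype(float) - 128.0)        # level shift by 128
    return np.round(dcts / QMAT).astype(int)    # divide by qmat and round

def zigzag(block):
    # Anti-diagonal ordering, alternating direction on each diagonal.
    idx = sorted(((m, n) for m in range(8) for n in range(8)),
                 key=lambda t: (t[0] + t[1],
                                t[0] if (t[0] + t[1]) % 2 else -t[0]))
    return [block[m, n] for m, n in idx]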
Threshold coding -revisited
The coefficients along the zig-zag scan lines are mapped into
[run,level] where the level is the value of non-zero coefficient,
and run is the number of zero coeff. preceding it. The DC
coefficients are usually coded separately from the rest.
Zig-zag scanning of the coefficients
Zigzag scan as in threshold coding
[20, 5, -3, -1, -2,-3, 1, 1, -1, -1, 0, 0, 1, 2, 3, -2, 1, 1, 0, 0, 0, 0, 0,
0,1, 1, 0, 1, EOB].
(1) The DC coefficient is DPCM coded (difference between the
DC coefficient of the previous block and the current block.)
(2)The AC coef. are mapped to run-level pairs.
(0,5), (0,-3), (0, -1), (0,-2),(0,-3),(0,1),(0,1),(0,-1),(0,-1), (2,1),
(0,2), (0,3), (0,-2),(0,1),(0,1),(6,1),(0,1),(1,1),EOB.
(3)These are then Huffman coded (codes are specified in the
JPEG scheme.)
(4) The decoder follows an inverse sequence of operations. The
received coefficients are first multiplied by the same quantization
matrix. (recddctshat=dctshat.*qmat.)
(5) Compute the inverse 2-D dct. (recdsd=idct2(recddctshat); add
128 back.
Decoder:
recddctshat = dctshat .* qmat
Inverse 2-D DCT: recds = idct2(recddctshat)
Reconstructed signal = inverse 2-D DCT + 128
Figures: reconstructed s vs. original s.
Motion Estimation and Compensation
(Digital Video Compression)
Compressing digital video involves:
(1) Exploiting the spatial redundancy within a frame
(e.g., JPEG)
(2) Exploiting the temporal redundancy between frames
Simplest temporal coding, DPCM:
(1) Frame 0 (still image)
(2) Difference Frame 1 = Frame 1 − Frame 0
(3) Difference Frame 2 = Frame 2 − Frame 1, etc.
(4) If the video has no movement, all difference frames are zero.
(5) If the video has movement, we can see it in the difference image. If you can see something in the difference image and recognize it, there is still correlation in the difference frame.
(6) Goal: remove the correlation by compensating for the motion.
Causes of differences between frames:
(1) Global motion: camera motion (zoom, pan, tilt)
(2) Local motion: object motion (translation, rotation, zoom)
(3) Variations in illumination
(4) Sensor noise
The actual motions involved can be modeled in a variety of ways. Object motions can be modeled as:
(1) Translation: move from (x, y) to (x + Δx, y + Δy)
(2) Rotation: rotate the object by r radians
(3) Zoom: move in/out from the object to increase its size by t times
Motion estimation
(1) The goal of motion estimation techniques is not to provide a
careful analysis of the motion
(2) The goal is to achieve a given representation of the video
while globally minimizing the sum of two terms: motion
overhead information and prediction error information
(3) If the additional overhead for a complex motion model is
not counterbalanced by the gain due to more accurate motion
vectors, then the more complex models may in fact lead to a
globally lower performance.
General methods:
(1) Determine the parameters for motion description.
(2) For some portion of the frame, estimate its movement between two frames: the current frame and the reference frame.
(3) What is "some portion"?
-- individual pixels
-- lines/edges (have to find them first)
-- objects (must define them first)
-- uniform regions (just chop up the frame)
Block-based motion estimation
(1) Consider a block of pixels in the current frame
(2) Search for the best match in a rectangular segment of the
reference frame
Motion Vectors
A motion vector describes the offset between the block being
coded (in the current frame) and the location of the best-match
block in the reference frame.
Motion Vector search techniques
How do we determine the best-match block in the search window? MPEG does not specify it.
First we need an error measure:
(1) Mean squared error: select the block from the reference frame B_ref which minimizes
MSE = Σ ( P_curr − P_ref )²
where the sum runs over corresponding pixels P_curr in the current block and P_ref in the candidate reference block.
(2) Sum of the absolute values of differences (SAVD), or the mean absolute error (MAE): select the block which minimizes
SAVD = Σ | P_curr − P_ref |,   MAE = (1/n) SAVD
With an error measure, one can test blocks in the search window to
see which one is the best. Some common search strategies are:
(1)Full search: evaluate every position in the search window. Best
results, most computation
(2) Hierarchical: Downsample the image, and do motion estimation
on reduced-size image first.
(3)Logarithm search: search positions decrease by 2 each step.
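A sketch of full-search block matching with the SAVD criterion (block size and window are parameters of my choosing; candidate positions falling outside the reference frame are skipped):

import numpy as np

def full_search(curr_block, ref, top, left, search=8):
    # Try every offset in [-search, +search]^2 around (top, left) in the
    # reference frame; return the motion vector of the minimum-SAVD match.
    B = curr_block.shape[0]
    best_savd, mv = None, (0, 0)
    cb = curr_block.astype(float)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + B > ref.shape[0] or x + B > ref.shape[1]:
                continue
            savd = np.abs(cb - ref[y:y + B, x:x + B].astype(float)).sum()
            if best_savd is None or savd < best_savd:
                best_savd, mv = savd, (dy, dx)
    return mv, best_savd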
Accuracy of motion vectors
-- Digital images are sampled on a grid. What if the actual motion does not move in grid steps?
-- Solution: interpolation of grid points in the reference frame adds a half-pel grid.
-- Reference frame now has effectively four times as many pixels,
four times as many positions for the best-match block to be found.
--The motion vectors have half-pel accuracy.
Motion compensation for chrominance
(1)Luminance is highly correlated, chrominance has been
downsampled and is less correlated
(2)Therefore, the best motion vectors are obtainable by performing
motion estimation on the luminance
(3)Motion vectors for chrominance are simply scaled, They are not
computed separately.
Motion estimation over several frames:
Suppose we want to perform motion estimation on several frames in a row. The reference frame 0 is coded very well; all motion vectors should be determined with respect to it.
Telescoping search: frame 1 has a search window of size W × H. What should the search window be for frame 2? 2W × 2H?
-- Searching over 2W × 2H would take too long.
-- Solution: telescoping search.
-- The size of the search window is constant.
-- The location of the search window is offset by the motion vector for the block in the previous frame.
Motion estimation
--forward predicted blocks: the best-match block occurs in the
reference frame before the current frame.
--Backward predicted blocks: the best-match block occurs in the
reference frame after the current frame
--Interpolatively predicted blocks: the best-match block is the
average of the best-match blocks from the surrounding frames.
Motion compensation
(1) The video coding standards do not specify HOW to find
motion vectors.
(2) The standards do specify
--The allowed sizes of the search window.
--The syntax for encoding the motion vectors for the decoder
--what the decoder will do with the motion vectors.
(3)What the decoder does is to grab the indicated block from the
reference frame, and glue it in place.
MPEG Video Coding
About MPEG

MPEG = Moving Pictures Experts Group
Coding of moving pictures and associated audio.
Basic compression idea on the picture part:
Can achieve a compression ratio of about 50:1 by storing only the differences between successive frames.
Some claim a higher compression ratio; it depends on how we calculate.
Notice that color is often downsampled, and odd/even fields are interleaved.
Audio part:
Compression of audio data at ratios ranging from 5:1 to 10:1.
MP3 ~ MPEG-1 Audio Layer-3
Progressive vs. Interlaced scan
From B.Liu EE330
S01 Princeton
Compression Ratio
Raw video:
24 bits/pixel × (720 × 480 pixels) × 30 fps ≈ 249 Mbps
Potential cheating points => contributing ~4:1 inflation:
Color components are actually downsampled.
30 fps may refer to the field rate in MPEG-2, equivalent to 15 fps:
(8 × 720 × 480 + 16 × 720 × 480 / 4) × 15 fps ≈ 62 Mbps
MPEG Generations
MPEG-1 ~ 1-1.5 Mbps (early '90s)
For compression of 320x240 full-motion video at rates around 1.15 Mb/s.
Applications: video storage (VCD).
MPEG-2 ~ 2-80 Mbps (mid '90s)
For higher resolutions.
Supports interlaced video formats and a number of features for HDTV.
Addresses scalable video coding.
Also used in DVD.
MPEG-4 ~ 9-40 kbps (late '90s)
For very low bit rate video and audio coding.
Applications: interactive multimedia and video telephony.
MPEG-21 ~ ongoing
MPEG Generations (cont'd)
Standard | Format   | Video Parameters   | Compressed bit rate
MPEG-1   | SIF      | 352x240 at 30 Hz   | 1.2-3 Mbps
MPEG-2   | CCIR 601 | 720x486 at 30 Hz   | 5-10 Mbps
MPEG-2   | EDTV     | 960x486 at 30 Hz   | 7-15 Mbps
MPEG-2   | HDTV     | 1920x1080 at 30 Hz | 20-40 Mbps
(From Ken Lam, HK Poly Univ. short course in summer 2001)
MPEG-1 Video Coding Standard
The standard only specifies the decoder's capabilities:
Prefer simple decoding, without limiting the encoder's complexity.
Leave flexibility and competition in implementing the encoder.
Block-based hybrid coding (DCT + M.C.):
8x8 block size as the basic coding unit.
16x16 macroblock size for motion estimation/compensation.
Group-of-Pictures (GOP) structure with 3 types of frames:
Intra coded
Forward-predictively coded
Bidirectional-predictively coded
MPEG-1 Picture Types and Group-of-Pictures
A Group-of-Pictures (GOP) contains 3 types of frames (I/P/B).
Frame order:  I_1 B B B P_1 B B B P_2 B B B I_2
Coding order: I_1 P_1 B B B P_2 B B B I_2 B B B
(From R. Liu, Handbook, Fig. 3.13)
Adaptive Predictive Coding in MPEG-1
Half-pel M.V. search within a +/-64 pel range.
Use spatial differential coding on M.V.s to remove M.V. spatial redundancy.
Coding each block in a P-frame:
Predictive block, using the previous I/P frame as reference.
Intra-block ~ encode without prediction:
use this if prediction costs more bits than non-prediction;
good for occluded areas;
can also avoid error propagation.
Coding each block in a B-frame:
Intra-block ~ encode without prediction.
Predictive block:
Use the previous I/P frame as reference (forward prediction),
or use a future I/P frame as reference (backward prediction),
or use both for prediction and take the average.
Coding of B-frame (cont'd)
With block A in the previous frame, B in the current frame, and C in the future frame:
B = A: forward prediction (one motion vector)
B = C: backward prediction (one motion vector)
or B = (A + C)/2: interpolation (two motion vectors)
(Fig. from Ken Lam HK Poly Univ. short
course in summer2001)
Quantization for I-frames (I-blocks) and M.C. Residues
Quantizer for I-frames (I-blocks):
Different step size for different frequency bands (similar to JPEG).
Default quantization table.
Scale the table for different compression/quality trade-offs.
Quantizer for residues in predictive blocks:
Noise-like residue.
Similar variance in different frequency bands.
Assign the same quantization step size for each frequency band.
Revised from R.Liu Seminar Course 00 @ UMD
