
Unit IV : Image Restoration

Two mark Questions


1. What is image restoration?
Restoration is the process of reconstructing or recovering an
image that has been degraded, using a priori knowledge of the
degradation phenomenon. Thus restoration techniques are
oriented toward modeling the degradation and applying the
inverse process in order to recover the original image.
2. What is meant by unconstrained restoration?
In the absence of any knowledge about the noise n, a
meaningful criterion is to seek an estimate f^ such that Hf^
approximates g in a least-squares sense, by assuming the noise
term is as small as possible.
It is also known as the least-squares error approach.
n = g - Hf
To estimate the original image, the noise n has to be
minimized, giving
f^ = H⁻¹g
Where,
H = system operator.
f^ = estimated input image.
g = degraded image.
3. What is meant by constrained restoration?
Constrained restoration minimizes a smoothness criterion
||Cf^||², where C is a linear operator on f, subject to the
constraint ||g - Hf^||² = ||n||². It leads to the estimate
f^ = (H^T H + γ C^T C)⁻¹ H^T g, where γ is adjusted so that the
constraint is satisfied.
4. What is inverse filtering?
The simplest approach to restoration is direct inverse filtering,
which computes an estimate F^(u,v) of the transform of the
original image simply by dividing the transform of the degraded
image G(u,v) by the degradation function:
F^(u,v) = G(u,v)/H(u,v)
Inverse filtering is the process of recovering the
input of the system from its output.
5. What is iterative restoration?
In general, iterative restoration refers to any technique that
attempts to minimize a function of the form M(f) using an
updating rule for the partially restored image.
6. What is a pattern?
Pattern is a quantitative or structural description of an object
or some other entity of interest in an image. It is formed by one
or more descriptors.
7. What is a pattern classifier?
A pattern class is a family of patterns that share some common
properties; pattern classes are denoted w1, w2, ..., wM, where
M is the number of classes. A pattern classifier is a system
that assigns an unknown input pattern to one of these classes.
8. What are optimal statistical classifiers?
Probability considerations play a role in most fields that
measure and interpret physical events, and they have become
important in pattern recognition because of the randomness
under which pattern classes normally are generated. It is
possible to derive a classification approach that is optimal in
the sense that, on average, its use yields the lowest
probability of committing classification errors.
9. Give the mathematical form of the Bayes decision function.
d_j(x) = p(x/w_j) P(w_j), j = 1, 2, ..., W, where p(x/w_j) is the
probability density function of the patterns of class w_j and
P(w_j) is its prior probability; a pattern x is assigned to the
class w_i whose decision function d_i(x) yields the largest value.
10. What are artificial neural networks?
Artificial neural networks are networks of interconnected
nonlinear computing elements (neurons), organized in layers,
whose connection weights are adjusted during training so that
the network learns to assign input pattern vectors to pattern
classes.
11. What is a multilayer feedforward neural network?
It consists of layers of structurally identical computing nodes
(neurons), arranged so that the output of every neuron in one
layer feeds into the input of every neuron in the next layer.
The number of neurons in the first layer, layer A, is N_A.
Often N_A = n, the dimensionality of the input pattern vectors.
The number of neurons in the output layer, layer Q, is denoted
N_Q. N_Q equals W, the number of pattern classes that the
neural network has been trained to recognize. The network
recognizes a pattern vector x as belonging to class i if the
i-th output of the network is high while all other outputs are low.
12. What is meant by training in artificial neural network?
Training means determining the weights that minimize the
classification error over a set of labelled training patterns,
for example with the Levenberg-Marquardt algorithm (a
multidimensional steepest-descent method). Training is very
expensive computationally. If there are x input nodes, t output
nodes, and p hidden nodes, then the number of weights is (x+t)p.
13. What is the concept behind algebraic approach to
restoration?
The concept of the algebraic approach is to estimate the original
image as the one that minimizes a predefined criterion of
performance.
14. What are the methods of estimating the degradation
function?
The three methods of estimating the degradation function are:
Observation
Experimentation
Mathematical modeling
15. How the blur is removed caused by uniform linear
motion?
An image f(x,y) undergoes planar motion in the x- and
y-directions, and x₀(t) and y₀(t) are the time-varying components
of motion. The total exposure at any point of the recording
medium (digital memory) is obtained by integrating the
instantaneous exposure over the time interval T during which
the imaging system shutter is open:
g(x,y) = ∫₀ᵀ f( x - x₀(t), y - y₀(t) ) dt
16. What is meant by least mean square filter?
The limitation of the inverse and pseudo-inverse filters is that
they are very sensitive to noise.
Wiener filtering is a least-mean-square method of restoring
images in the presence of blur as well as noise.
17. Give the difference between Enhancement and
Restoration.
An enhancement technique is based primarily on the pleasing
aspects it might present to the viewer (for example, contrast
stretching), whereas removal of image blur by applying a
deblurring function is considered a restoration technique.
18. What are the properties of Linear Operator?
+ Additivity
+ Homogeneity
19. What are the methods of algebraic approach?
+Unconstraint restoration approach
+Constraint restoration approach
20. What are the types of noise models?
Gaussian noise
Rayleigh noise
Erlang noise
Exponential noise
Uniform noise
Impulse noise
21. What is meant by noise probability density function?
The spatial noise descriptor is the statistical behavior of gray
level values in the noise component of the model.
22. What is meant by blind image restoration?
Degradation may be difficult to measure or may be time varying in
an unpredictable manner. In such cases information about the
degradation must be extracted from the observed image either
explicitly or implicitly. This task is called blind image restoration.
23. What are the approaches for blind image restoration?
Direct measurement
Indirect estimation
24. What is blur impulse response and noise levels?
Blur impulse response: This parameter is measured by
isolating an image of a suspected object within a picture.
Noise levels: The noise of an observed image can be estimated
by measuring the image covariance over a region of constant
background luminance.
25. What is meant by indirect estimation?
Indirect estimation method employs temporal or spatial
averaging to either obtain a restoration or to obtain key
elements of an image restoration algorithm.
Twelve mark Questions
1. Explain degradation model for (i) continuous function (ii)
discrete formulation
Restoration attempts to reconstruct or recover an image
that has been degraded by using a priori knowledge of the
degradation phenomenon.
Restoration techniques are oriented toward modeling the
degradation and applying the inverse process in order to
recover the original image.
Degradation models:
Many types of degradation can be approximated by linear,
space-invariant processes:
+ Can take advantage of the mature techniques
developed for linear systems
Non-linear and space-variant models are more accurate:
+ Difficult to solve
+ May be unsolvable
Estimating degradation function
Estimation by image observation
Degradation system H is completely characterized by
its impulse response
Select a small section from the degraded image, g_s(x,y).
Reconstruct an unblurred estimate f^_s(x,y) of the same section.
The degradation function can then be estimated by
H_s(u,v) = G_s(u,v) / F^_s(u,v)
By ignoring the noise term, G(u,v) = F(u,v)H(u,v). If f(x,y) is
a point source (impulse), so that F(u,v) is constant, then
G(u,v) approximates H(u,v).
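As an illustration of this estimate, a minimal NumPy sketch (the function and variable names are ours, and the small-constant guard against division by near-zero spectral values is an added assumption):

```python
import numpy as np

def estimate_H_by_observation(g_s, f_hat_s, eps=1e-6):
    """Estimate the degradation transfer function over a small image
    section: H_s(u,v) = G_s(u,v) / F_hat_s(u,v).

    g_s     : observed (degraded) section of the image
    f_hat_s : manually reconstructed, unblurred estimate of that section
    eps     : small constant guarding against division by ~zero
    """
    G_s = np.fft.fft2(g_s)
    F_s = np.fft.fft2(f_hat_s)
    # Avoid dividing by near-zero spectral values of the estimate.
    F_s = np.where(np.abs(F_s) < eps, eps, F_s)
    return G_s / F_s
```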
Fig: A model of the image degradation / restoration
process
Continuous degradation model
Motion blur. It occurs when there is relative motion between
the object and the camera during exposure:
h(i) = 1/L,  if -L/2 ≤ i ≤ L/2;  0, otherwise
Atmospheric turbulence. It is due to random variations in the
refractive index of the medium between the object and the
imaging system, and it occurs in the imaging of astronomical
objects:
h(i,j) = K exp( -(i² + j²) / (2σ²) )
Uniform out-of-focus blur:
h(i,j) = 1/(πR²),  if i² + j² ≤ R²;  0, otherwise
Uniform 2-D blur:
h(i,j) = 1/L²,  if -L/2 ≤ i, j ≤ L/2;  0, otherwise
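The blur models above can be generated numerically. A minimal sketch, assuming an odd-sized PSF grid and unit-volume normalization (our choices, not from the notes):

```python
import numpy as np

def motion_blur_psf(L, size):
    """1-D horizontal motion blur: h = 1/L over a segment of length L.
    Assumes odd `size` and L <= size."""
    h = np.zeros((size, size))
    c = size // 2
    h[c, c - L // 2 : c - L // 2 + L] = 1.0 / L
    return h

def turbulence_psf(K, sigma, size):
    """Atmospheric turbulence: h(i,j) = K exp(-(i^2 + j^2) / (2 sigma^2))."""
    i, j = np.mgrid[-(size // 2) : size // 2 + 1,
                    -(size // 2) : size // 2 + 1]
    h = K * np.exp(-(i**2 + j**2) / (2.0 * sigma**2))
    return h / h.sum()          # normalize to unit volume

def out_of_focus_psf(R, size):
    """Uniform disk of radius R: h = 1/(pi R^2) inside, 0 outside."""
    i, j = np.mgrid[-(size // 2) : size // 2 + 1,
                    -(size // 2) : size // 2 + 1]
    h = ((i**2 + j**2) <= R**2).astype(float)
    return h / h.sum()          # discrete analogue of 1/(pi R^2)
```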
Two-dimensional discrete degradation model. Circular
convolution
Suppose we have a two-dimensional discrete signal f(i,j) of size
A × B samples which is due to a degradation process. The
degradation can now be modeled by a two-dimensional discrete
impulse response h(i,j) of size C × D samples.
We form the extended versions of f(i,j) and h(i,j), both of size
M × N, where M ≥ A + C - 1 and N ≥ B + D - 1, and periodic with
period M × N. These can be denoted as f_e(i,j) and h_e(i,j). For
a space-invariant degradation process we obtain

y(i,j) = Σ_{m=0}^{M-1} Σ_{n=0}^{N-1} f_e(m,n) h_e(i-m, j-n) + n(i,j)

Using matrix notation we can write this in the form
y = Hf + n
where f and y are MN-dimensional column vectors that represent
the lexicographic ordering of the images f_e(i,j) and y_e(i,j)
respectively, and H is the MN × MN block-circulant matrix

H = | H_0      H_{M-1}  ...  H_1 |
    | H_1      H_0      ...  H_2 |
    | ...      ...      ...  ... |
    | H_{M-1}  H_{M-2}  ...  H_0 |

where each block H_j is an N × N circulant matrix formed from
row j of the extended impulse response:

H_j = | h_e(j,0)    h_e(j,N-1)  ...  h_e(j,1) |
      | h_e(j,1)    h_e(j,0)    ...  h_e(j,2) |
      | ...         ...         ...  ...      |
      | h_e(j,N-1)  h_e(j,N-2)  ...  h_e(j,0) |

The analysis of the diagonalisation of H is a straightforward
extension of the one-dimensional case. In that case we end up
with the following set of M × N scalar problems:

Y(u,v) = MN H(u,v) F(u,v) + N(u,v),
u = 0, 1, ..., M-1,  v = 0, 1, ..., N-1
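Since H is block circulant, the model y = Hf + n can be simulated entirely with FFTs. A minimal sketch (the noise level, the seed, and the PSF-centering convention are our assumptions):

```python
import numpy as np

def degrade(f, h, sigma=2.0, seed=0):
    """Simulate y = Hf + n using circular (periodic) convolution via the FFT.

    f     : original image, M x N
    h     : point-spread function, zero-padded to M x N, centered
    sigma : standard deviation of the additive Gaussian noise
    """
    rng = np.random.default_rng(seed)
    F = np.fft.fft2(f)
    H = np.fft.fft2(np.fft.ifftshift(h))   # move PSF center to the origin
    noise = rng.normal(0.0, sigma, f.shape)
    y = np.real(np.fft.ifft2(H * F)) + noise
    return y, H
```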
2. Explain algebraic approach to (i) unconstrained restoration
and (ii) constrained restoration.
(i) Unconstrained restoration
In the absence of any knowledge about the noise n, a
meaningful criterion is to seek an estimate f^ such that Hf^
approximates g in a least-squares sense, by assuming the noise
term is as small as possible.
It is also known as the least-squares error approach.
n = g - Hf
To estimate the original image, the noise n has to be
minimized, giving
f^ = H⁻¹g
Where,
H = system operator.
f^ = estimated input image.
g = degraded image.
(ii) constrained restoration
The set-based approach described previously can be generalized
so that any number of prior constraints can be imposed as long
as the constraint sets are closed convex.
If the constraint sets have a non-empty intersection, then a
solution that belongs to the intersection set can be found by the
method of POCS. Any solution in the intersection set is consistent
with the a priori constraints and therefore it is a feasible solution.
Let Q_1, Q_2, ..., Q_m be closed convex sets in a finite-dimensional
vector space, with P_1, P_2, ..., P_m their respective projectors.
The iterative procedure

f_{k+1} = P_1 P_2 ... P_m f_k

converges to a vector that belongs to the intersection of the sets
Q_i, i = 1, 2, ..., m, for any starting vector f_0. An iteration of
the form f_{k+1} = P_1 P_2 f_k can be applied to the problem
described previously, where we seek an image which lies in the
intersection of the two ellipsoids defined by

Q_{f|y} = { f : ||y - Hf||² ≤ E² }   and   Q_f = { f : ||Cf||² ≤ ε² }

The respective projections P_1 f and P_2 f are defined by

P_1 f = f + λ_1 ( I + λ_1 H^T H )⁻¹ H^T ( y - Hf )
P_2 f = [ I - λ_2 ( I + λ_2 C^T C )⁻¹ C^T C ] f
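A small numerical sketch of this POCS iteration, assuming explicit matrices H and C (feasible only for small problems, since inverses are formed directly) and hand-picked, fixed λ₁ and λ₂:

```python
import numpy as np

def pocs_restore(y, H, C, lam1=1.0, lam2=1.0, iters=50):
    """Alternating projections f_{k+1} = P1(P2(f_k)) onto the two
    ellipsoidal constraint sets described above. lam1, lam2 are kept
    fixed here for simplicity."""
    n = H.shape[1]
    I = np.eye(n)
    A1 = np.linalg.inv(I + lam1 * H.T @ H)                        # used by P1
    A2 = I - lam2 * np.linalg.inv(I + lam2 * C.T @ C) @ C.T @ C   # P2 matrix
    f = np.zeros(n)                                   # starting vector f_0
    for _ in range(iters):
        f = A2 @ f                                    # P2: smoothness set
        f = f + lam1 * A1 @ (H.T @ (y - H @ f))       # P1: data-fidelity set
    return f
```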
3. What is the Inverse Filtering method for restoration of
images? Explain.
Inverse filtering:
Degradation model:
g(x,y) = f(x,y) * h(x,y) + n(x,y)
G(u,v) = F(u,v) H(u,v) + N(u,v)
This expression tells us that even if we know the degradation
function we cannot recover the undegraded image exactly
because of the random noise, whose Fourier transform is not
known. Image restoration is an ill-posed problem. When H(u,v)
is very small, the noise term dominates the restoration result.
Inverse filter:
F^(u,v) = G(u,v)/H(u,v) = F(u,v) + N(u,v)/H(u,v)
Assuming the blur h is known, the inverse filter is simply
1/H(u,v).
Problems with inverse filtering:
H(u,v) = 0 for some (u,v), so the division is undefined there.
In the noisy case, y = f * h + n (n: additive noise), and the
amplified term N(u,v)/H(u,v) dominates wherever H(u,v) is small.
The simplest approach to restoration is direct inverse filtering,
where we compute an estimate, F^(u,v), of the transform of the
original image simply by dividing the transform of the degraded
image, G(u,v), by the degradation function:

F^(u,v) = G(u,v)/H(u,v)

Since G(u,v) = F(u,v)H(u,v) + N(u,v), this gives

F^(u,v) = F(u,v) + N(u,v)/H(u,v)

This is an interesting expression. It tells us that even if we
know the degradation function we cannot recover the undegraded
image exactly, because N(u,v) is a random function whose Fourier
transform is not known.

There is more bad news. If the degradation function has zero or
very small values, then the ratio N(u,v)/H(u,v) could easily
dominate the estimate F^(u,v); this in fact is frequently the case.

One approach to get around the zero or small-value problem is to
limit the filter frequencies to values near the origin. We know
that H(0,0) is equal to the average value of h(x,y) and that this
is usually the highest value of H(u,v) in the frequency domain.
Thus, by limiting the analysis to frequencies near the origin, we
reduce the probability of encountering zero values.
The objective is to minimize
J(f) = ||n(f)||² = ||y - Hf||²
We set the first derivative of the cost function equal to zero:
∂J(f)/∂f = 0  ⇒  -2 H^T ( y - Hf ) = 0
If M = N and H⁻¹ exists, then
f^ = H⁻¹ y
and more generally
f^ = ( H^T H )⁻¹ H^T y
According to the previous analysis, if H (and therefore H⁻¹) is
block circulant, the above problem can be solved as a set of
M × N scalar problems:
F^(u,v) = H*(u,v) Y(u,v) / |H(u,v)|²,  f^(i,j) = F⁻¹{ F^(u,v) }
Computational issues concerning inverse filtering

(I) Suppose first that the additive noise n(i,j) is negligible. A
problem arises if H(u,v) becomes very small or zero for some
point (u,v) or for a whole region in the (u,v) plane. In that
region inverse filtering cannot be applied. Note that in most
real applications H(u,v) drops off rapidly as a function of
distance from the origin. The solution is that if these points
are known they can be neglected in the computation of F^(u,v).

(II) In the presence of external noise we have that

F^(u,v) = H*(u,v) [ Y(u,v) + N(u,v) ] / |H(u,v)|²
        = H*(u,v) Y(u,v) / |H(u,v)|² + H*(u,v) N(u,v) / |H(u,v)|²
F^(u,v) = F(u,v) + N(u,v) / H(u,v)

If H(u,v) becomes very small, the term N(u,v) dominates the
result. The solution is again to carry out the restoration
process in a limited neighborhood about the origin where H(u,v)
is not very small. This procedure is called pseudoinverse
filtering. In that case we set

F^(u,v) = H*(u,v) Y(u,v) / |H(u,v)|²,  if |H(u,v)| ≥ T
F^(u,v) = 0,                           if |H(u,v)| < T

The threshold T is defined by the user. In general, the noise
may very well possess large components at high frequencies
(u,v), while H(u,v) and Y(u,v) normally will be dominated by
low frequency components.
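A hedged sketch of pseudoinverse filtering, with the user-defined threshold T as in the text (the transfer function H is assumed already known on the DFT grid):

```python
import numpy as np

def pseudo_inverse_filter(y, H, T=0.1):
    """Restore an image by pseudoinverse filtering:
    F_hat = Y/H where |H| >= T, and 0 elsewhere.

    y : degraded image (spatial domain), same shape as H
    H : degradation transfer function on the DFT grid
    T : user-defined threshold on |H(u,v)|
    """
    Y = np.fft.fft2(y)
    Hsafe = np.where(np.abs(H) >= T, H, 1.0)   # avoid dividing by tiny values
    F_hat = np.where(np.abs(H) >= T, Y / Hsafe, 0.0)
    return np.real(np.fft.ifft2(F_hat))
```

Raising T trades residual blur for noise suppression, since fewer (noisier) frequencies are inverted.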
4. Discuss the Wiener Filter method for restoration of
images.
Wiener filter:
It removes the additive noise and inverts the blurring
simultaneously.
The Wiener filtering is optimal in terms of the mean square
error.
In most images, adjacent pixels are highly correlated, while
the gray levels of widely separated pixels are only loosely
correlated.
Therefore, the autocorrelation function of typical images
generally decreases away from the origin.
Power spectrum of an image is the Fourier transform of its
autocorrelation function, therefore we can argue that the
power spectrum of an image generally decreases with
frequency.
Typical noise sources have either a flat power spectrum or
one that decreases with frequency more slowly than typical
image power spectrum.
Therefore, the expected situation is for the signal to
dominate the spectrum at low frequencies, while the noise
dominates the high frequencies.
The Wiener filter is

W(f₁,f₂) = H*(f₁,f₂) S_xx(f₁,f₂) / ( |H(f₁,f₂)|² S_xx(f₁,f₂) + S_ηη(f₁,f₂) )

where S_xx(f₁,f₂) and S_ηη(f₁,f₂) are respectively the power
spectra of the original image and the additive noise, and
H(f₁,f₂) is the blurring filter.
The Wiener filter has two separate parts, an inverse filtering
part and a noise smoothing part.
It performs the deconvolution by inverse filtering (highpass
filtering) and removes the noise with a compression operation
(lowpass filtering).
Wiener Filter Formulation

Degradation model:
g(x,y) = f(x,y) * h(x,y) + η(x,y)
G(u,v) = F(u,v) H(u,v) + N(u,v)

Wiener filter:
F^(u,v) = [ 1/H(u,v) ] [ |H(u,v)|² / ( |H(u,v)|² + P_n(u,v)/P_f(u,v) ) ] G(u,v)
        = [ 1/H(u,v) ] [ |H(u,v)|² / ( |H(u,v)|² + K ) ] G(u,v)

Least mean square filter:
F^(u,v) = [ H*(u,v) / ( |H(u,v)|² + S_n(u,v)/S_x(u,v) ) ] G(u,v)

In practice:
F^(u,v) = [ H*(u,v) / ( |H(u,v)|² + K ) ] G(u,v)
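A minimal sketch of the practical form with constant K (K is hand-tuned here; H is assumed known):

```python
import numpy as np

def wiener_filter(y, H, K=0.01):
    """Wiener restoration with a constant noise-to-signal ratio K:
    F_hat(u,v) = conj(H) / (|H|^2 + K) * G(u,v)."""
    G = np.fft.fft2(y)
    W = np.conj(H) / (np.abs(H) ** 2 + K)
    return np.real(np.fft.ifft2(W * G))
```

Larger K suppresses more noise at the cost of sharpness; K = 0 reduces the filter to plain inverse filtering.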
Wiener filtering - problems
The power spectra of the undegraded image and noise must
be known.
Weights all errors equally regardless of their location in the
image, while the eye is considerably more tolerant of errors
in the dark areas and high-gradient areas in the image.
In minimizing the mean square error, the Wiener filter also
smooths the image more than the eye would prefer.
Assumptions
o The original image and noise are statistically
independent
o The power spectral density of the original image and
noise are known
o Both the original image and noise are zero mean
Wiener Filtering: Special Cases

Balancing between two jobs for deblurring a noisy image: an HPF
for de-blurring (undoing the H distortion) and an LPF for
suppressing noise:

H_wiener(ω₁,ω₂) = H*(ω₁,ω₂) S_uu(ω₁,ω₂) / ( |H(ω₁,ω₂)|² S_uu(ω₁,ω₂) + S_ηη(ω₁,ω₂) )
                = H* / ( |H|² + S_ηη/S_uu )

Noiseless case ~ S_ηη = 0
The Wiener filter becomes the pseudo-inverse filter in the limit
S_ηη → 0:

H_wiener(ω₁,ω₂) = 1/H(ω₁,ω₂),  if H(ω₁,ω₂) ≠ 0
H_wiener(ω₁,ω₂) = 0,           if H(ω₁,ω₂) = 0

No-blur case ~ H = 1 (Wiener smoothing filter)
A zero-phase filter that attenuates noise according to the SNR
at each frequency:

H_wiener(ω₁,ω₂) = S_uu(ω₁,ω₂) / ( S_uu(ω₁,ω₂) + S_ηη(ω₁,ω₂) )
                = SNR(ω₁,ω₂) / ( SNR(ω₁,ω₂) + 1 )

In practice, we often approximate the SNR as a constant to work
around the need of estimating the image power spectral density.
Note the phase response of the Wiener filter is the same as that
of the inverse filter 1/H(ω), i.e. it does not compensate phase
distortion due to noise.
Comparisons:
The Wiener filter H_wiener = H* / ( |H|² + S_ηη/S_uu ) behaves
like the inverse filter, approximately 1/H(ω₁,ω₂), at frequencies
where |H(ω₁,ω₂)| is large, and rolls off where |H| is small or
the noise dominates.
Fig : Wiener Filter Characteristics
5. What is constrained least squares restoration? Explain.
Only the mean and variance of the noise are required.

The degradation model in vector-matrix form:
g = Hf + n
where g, f and n are MN × 1 column vectors and H is an
MN × MN matrix.

The objective is to minimize the smoothness criterion
C = Σ_{x=0}^{M-1} Σ_{y=0}^{N-1} [ ∇²f(x,y) ]²
subject to the constraint
||g - Hf^||² = ||n||²

The solution in the frequency domain:
F^(u,v) = [ H*(u,v) / ( |H(u,v)|² + γ |P(u,v)|² ) ] G(u,v)
where γ is a parameter adjusted so that the constraint is
satisfied, and P(u,v) is the Fourier transform of the Laplacian
operator

p(x,y) = |  0   1   0 |
         |  1  -4   1 |
         |  0   1   0 |

Basic (unconstrained) iteration. We first seek a solution that
minimizes the function
M(f) = ||y - Hf||²
A necessary condition for M(f) to have a minimum is that its
gradient with respect to f is equal to zero:
∇_f M(f) = -2 H^T ( y - Hf ) = 0
By using the steepest-descent type of optimization we can
formulate an iterative rule as follows:
f_0 = β H^T y
f_{k+1} = f_k + β ( H^T y - H^T H f_k ) = β H^T y + ( I - β H^T H ) f_k
Constrained least squares iteration
In this method we attempt to solve the problem of
constrained restoration iteratively. As already mentioned, the
following functional is minimized:
M(f,α) = ||y - Hf||² + α ||Cf||²
The necessary condition for a minimum is that the gradient of
M(f,α) is equal to zero. That gradient is
∇_f M(f,α) = 2 [ ( H^T H + α C^T C ) f - H^T y ]
The initial estimate and the updating rule for obtaining the
restored image are now given by
f_0 = β H^T y
f_{k+1} = f_k + β [ H^T y - ( H^T H + α C^T C ) f_k ]
It can be proved that the above iteration (known as Iterative
CLS or the Tikhonov-Miller method) converges if
0 < β < 2/λ_max
where λ_max is the maximum eigenvalue of the matrix
( H^T H + α C^T C ).
If the matrices H and C are block-circulant, the iteration can
be implemented in the frequency domain.
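A sketch of the direct frequency-domain CLS solution (here γ is fixed by hand rather than adjusted to satisfy the noise constraint):

```python
import numpy as np

def cls_filter(g, H, gamma=0.05):
    """Constrained least squares restoration:
    F_hat = conj(H) G / (|H|^2 + gamma |P|^2),
    with P the transfer function of the discrete Laplacian."""
    M, N = g.shape
    p = np.zeros((M, N))
    p[:3, :3] = [[0, 1, 0], [1, -4, 1], [0, 1, 0]]      # Laplacian kernel
    P = np.fft.fft2(np.roll(p, (-1, -1), axis=(0, 1)))  # center at origin
    G = np.fft.fft2(g)
    F_hat = np.conj(H) * G / (np.abs(H) ** 2 + gamma * np.abs(P) ** 2)
    return np.real(np.fft.ifft2(F_hat))
```

In a full implementation γ would be adjusted iteratively until ||g - Hf^||² ≈ ||n||², as the constraint requires.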
6. What is meant by iterative restoration? Explain.
In general, iterative restoration refers to any
technique that attempts to minimize a function of the
form M(f) using an updating rule for the partially
restored image.
Motivation of iterative method
Wiener filter needs prior knowledge of power spectral
density of original image, which is often unavailable
The challenge is to estimate power spectral density of
original image from a single copy of degraded image
Rationale of iterative method
Use the restored image as an improved prototype of the
original image, estimate its power spectral density, and
construct Wiener filter iteratively.
Basic iterative algorithm
The degraded image is used as an initial estimate of
original image, and a restored image is attained from
the corresponding Wiener filter.
The restored image is used as an updated estimate of
the original image and leads to a new restoration.
The iterations continue until the estimate converges.
Additive iterative algorithm
It can be proved that in basic iterative algorithm the
estimate converges, but not to its true value.
Correction item is added in each iteration.
Implementation
Power spectral density is estimated using periodogram
Degradation model is designed to be a low pass filter (a
circulant matrix)
Iterative Procedure
They refer to a class of iterative procedures that
successively use the Wiener filtered signal as an improved
prototype to update the covariance estimates of the original
image as follows.
Step 0: Initial estimate of R_ff:
R_ff(0) = R_yy = E{ y y^T }
Step 1: Construct the i-th restoration filter:
W(i+1) = R_ff(i) H^T ( H R_ff(i) H^T + R_nn )⁻¹
Step 2: Obtain the (i+1)-th estimate of the restored image:
f^(i+1) = W(i+1) y
Step 3: Use f^(i+1) to compute an improved estimate of R_ff,
given by
R_ff(i+1) = E{ f^(i+1) f^(i+1)^T }
Step 4: Increase i and repeat steps 1, 2, 3, 4.
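A frequency-domain sketch of these steps, with the expectations replaced by periodograms as suggested under Implementation above (the noise power spectrum P_nn is assumed known and positive):

```python
import numpy as np

def iterative_wiener(y, H, P_nn, iters=10):
    """Iteratively re-estimate the image power spectrum from the current
    restoration and rebuild the Wiener filter (periodogram approximation
    of the covariance estimates)."""
    Y = np.fft.fft2(y)
    P_ff = np.abs(Y) ** 2 / y.size              # Step 0: start from the data
    for _ in range(iters):
        W = np.conj(H) * P_ff / (np.abs(H) ** 2 * P_ff + P_nn)   # Step 1
        F_hat = W * Y                           # Step 2: restored spectrum
        P_ff = np.abs(F_hat) ** 2 / y.size      # Step 3: improved estimate
    return np.real(np.fft.ifft2(F_hat))
```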
7. Explain (i) minimum distance classifier (ii) matching by
correlation.
i) Minimum distance classifier
Suppose that we define the prototype of each pattern class to be
the mean vector of the patterns of the class:

m_j = (1/N_j) Σ_{x ∈ w_j} x,  j = 1, 2, ..., W    (1)

where N_j is the number of pattern vectors from class w_j and the
summation is taken over these vectors. As before, W is the
number of pattern classes. One way to determine the class
membership of an unknown pattern vector x is to assign it to the
class of its closest prototype, as noted previously. Using the
Euclidean distance to determine closeness reduces the problem
to computing the distance measures

D_j(x) = ||x - m_j||,  j = 1, 2, ..., W    (2)

We then assign x to class w_j if D_j(x) is the smallest distance.
That is, the smallest distance implies the best match in this
formulation. It is not difficult to show that selecting the smallest
distance is equivalent to evaluating the functions

d_j(x) = x^T m_j - ½ m_j^T m_j,  j = 1, 2, ..., W    (3)
and assigning x to class w_i if d_i(x) yields the largest numerical
value. This formulation agrees with the concept of a decision
function, as defined in Eq. (1).
For Equations (2) and (3), the decision boundary between classes
w_i and w_j for a minimum distance classifier is

d_ij(x) = d_i(x) - d_j(x) = x^T ( m_i - m_j ) - ½ ( m_i - m_j )^T ( m_i + m_j ) = 0    (4)

The surface given by Eq. (4) is the perpendicular bisector of the
line segment joining m_i and m_j. For n = 2 the perpendicular
bisector is a line, for n = 3 it is a plane, and for n > 3 it is
called a hyperplane.
Minimum distance classifier works well when the distance
between means is large compared to the spread or randomness
of each class with respect to its mean. The minimum distance
classifier yields optimum performance when the distribution of
each class about its mean is in the form of a spherical hyper
cloud in n- dimensional pattern space.
The simultaneous occurrence of large mean separations and
relatively small class spread occurs seldom in practice unless
the system designer controls the nature of the input. An
excellent example is provided by systems designed to read
stylized character fonts, such as the familiar American Bankers
Association font character set.
This particular font set consists of 14 characters that were
purposely designed on a 9 × 7 grid in order to facilitate their
reading. The characters usually are printed in ink that contains
finely ground magnetic material. Prior to being read, the ink is
subjected to a magnetic field, which accentuates each character
to simplify detection. In other words, the segmentation problem
is solved by artificially highlighting the key characteristics of
each character.
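A compact sketch of the minimum distance classifier using decision function (3) (the training-data layout and all names are our assumptions):

```python
import numpy as np

def train_prototypes(X, labels):
    """m_j = (1/N_j) * sum of the training vectors of class j.
    X: (num_samples, n) array; labels: (num_samples,) class labels."""
    classes = np.unique(labels)
    means = np.array([X[labels == c].mean(axis=0) for c in classes])
    return classes, means

def classify_min_distance(x, classes, means):
    """Assign x to the class with the largest
    d_j(x) = x.m_j - 0.5 m_j.m_j, which is equivalent to choosing
    the smallest Euclidean distance D_j(x) = ||x - m_j||."""
    d = means @ x - 0.5 * np.sum(means**2, axis=1)
    return classes[np.argmax(d)]
```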
ii) Matching by correlation
Let w(x,y) be a subimage of size J × K within the image f(x,y) of
size M × N, where we assume that J ≤ M and K ≤ N. Although the
correlation approach can be expressed in vector form, working
directly with an image or subimage format is more intuitive.
In its simplest form, the correlation between f(x,y) and
w(x,y) is

c(x,y) = Σ_s Σ_t f(s,t) w(x+s, y+t)    (5)

for x = 0, 1, 2, ..., M-1, y = 0, 1, 2, ..., N-1, where the
summation is taken over the image region where w and f overlap.
Note, by comparing this equation with the definition of
correlation, that it is implicitly assumed that the functions are
real quantities and that we left out the MN constant. The reason
is that we are going to use a normalized function in which these
constants cancel out, and this definition is used commonly in
practice. We also use the variables s and t in Eq. (5) to avoid
confusion with m and n.
The figure illustrates the procedure, where we assume that the
origin of f is at its top left and the origin of w is at its
center. For one value of (x,y), say (x₀,y₀) inside f, application
of Eq. (5) yields one value of c. As x and y are varied, w moves
around the image area, giving the function c(x,y). The maximum
value(s) of c indicates the position(s) where w best matches f.
Note that accuracy is lost for values of x and y near the edges
of f, with the amount of error in the correlation being
proportional to the size of w.

The correlation function given in Eq. (5) has the disadvantage
of being sensitive to changes in the amplitude of f and w. For
example, doubling all the values of f doubles the value of
c(x,y). An approach frequently used to overcome this difficulty
is to perform matching via the correlation coefficient, which is
defined as

γ(x,y) = Σ_s Σ_t [f(s,t) - f̄][w(x+s, y+t) - w̄]
         / { Σ_s Σ_t [f(s,t) - f̄]² Σ_s Σ_t [w(x+s, y+t) - w̄]² }^(1/2)

where x = 0, 1, 2, ..., M-1, y = 0, 1, 2, ..., N-1, w̄ is the
average value of the pixels in w (computed only once), f̄ is the
average value of f in the region coincident with the current
location of w, and the summations are taken over the coordinates
common to both f and w. The correlation coefficient γ(x,y) is
scaled in the range -1 to 1, independent of scale changes in the
amplitude of f and w.
Although the correlation function can be normalized for
amplitude changes through the correlation coefficient, obtaining
normalization for changes in size and rotation can be difficult.
Normalizing for size involves spatial scaling, a process that in
itself adds a significant amount of computation. Normalizing for
rotation is even more difficult, unless a clue regarding rotation
can be extracted from f(x,y). If the nature of rotation is
unknown, looking for the best match requires exhaustive
rotations of w(x,y). This procedure is impractical and, as a
consequence, correlation is seldom used in cases when arbitrary
or unconstrained rotation is present.
The correlation also can be carried out in the frequency domain
through the FFT. If f and w are the same size, this approach can
be more efficient than direct implementation of correlation in
the spatial domain. Equation (5) is used when w is much smaller
than f.
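A brute-force sketch of matching by correlation coefficient (valid template positions only; variable names are ours):

```python
import numpy as np

def match_correlation_coeff(f, w):
    """Slide template w over image f and return gamma(x,y) in [-1, 1]
    for every position where w fits entirely inside f."""
    M, N = f.shape
    J, K = w.shape
    wc = w - w.mean()                       # template mean, computed only once
    denom_w = np.sqrt((wc**2).sum())
    gamma = np.zeros((M - J + 1, N - K + 1))
    for x in range(M - J + 1):
        for y in range(N - K + 1):
            region = f[x:x + J, y:y + K]
            rc = region - region.mean()     # local mean of f under w
            denom = denom_w * np.sqrt((rc**2).sum())
            gamma[x, y] = (rc * wc).sum() / denom if denom > 0 else 0.0
    return gamma        # np.unravel_index(gamma.argmax(), gamma.shape)
                        # gives the best-match position
```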
8. What are (i) optimal statistical classifiers (ii) Bayes classifier
for Gaussian pattern classes? Explain.
i) Optimal statistical classifiers
Probability considerations play a role in most fields that
measure and interpret physical events, and they have become
important in pattern recognition because of the randomness
under which pattern classes normally are generated. It is
possible to derive a classification approach that is optimal in
the sense that, on average, its use yields the lowest
probability of committing classification errors.
Foundation:
The probability that a particular pattern x comes from class
w_i is denoted p(w_i/x). If the pattern classifier decides that x
came from w_j when it actually came from w_i, it incurs a loss,
denoted L_ij. As pattern x may belong to any of the W classes
under consideration, the average loss incurred in assigning x to
class w_j is

r_j(x) = Σ_{k=1}^{W} L_kj p(w_k/x)    (1)

This equation often is called the conditional average risk or loss
in decision theory terminology.
From basic probability theory, we know that p(A/B)
=[p(A)p(B/A)]/p(B).
Using this expression, we write Equation (1) in the form

r_j(x) = (1/p(x)) Σ_{k=1}^{W} L_kj p(x/w_k) P(w_k)    (2)

where p(x/w_k) is the probability density function of the patterns
from class w_k and P(w_k) is the probability of occurrence of class
w_k. Because 1/p(x) is positive and common to all the r_j(x),
j = 1, 2, ..., W, it can be dropped from Eq. (2) without affecting
the relative order of these functions from the smallest to the
largest value. The expression for the average loss then reduces to

r_j(x) = Σ_{k=1}^{W} L_kj p(x/w_k) P(w_k)    (3)
The classifier has W possible classes to choose from for any given
unknown pattern. If it computes r₁(x), r₂(x), ..., r_W(x) for each
pattern x and assigns the pattern to the class with the smallest
loss, the total average loss with respect to all decisions will be
minimum.
The classifier that minimizes the total average loss is called the
Bayes classifier. Thus the Bayes classifier assigns an unknown
pattern x to class w_i if r_i(x) < r_j(x) for j = 1, 2, ..., W;
j ≠ i. In other words, x is assigned to class w_i if

Σ_{k=1}^{W} L_ki p(x/w_k) P(w_k) < Σ_{q=1}^{W} L_qj p(x/w_q) P(w_q)    (4)

for all j; j ≠ i. The loss for a correct decision generally is
assigned a value of zero, and the loss for any incorrect decision
usually is assigned the same nonzero value (say, 1). Under these
conditions, the loss function becomes

L_ij = 1 - δ_ij    (5)

where δ_ij = 1 if i = j and δ_ij = 0 if i ≠ j. Equation (5)
indicates a loss of zero for correct decisions. Substituting
Equation (5) into Equation (3) yields

r_j(x) = Σ_{k=1}^{W} (1 - δ_kj) p(x/w_k) P(w_k) = p(x) - p(x/w_j) P(w_j)    (6)

The Bayes classifier then assigns a pattern x to class w_i if,
for all j ≠ i,

p(x) - p(x/w_i) P(w_i) < p(x) - p(x/w_j) P(w_j)    (7)

or, equivalently, if

p(x/w_i) P(w_i) > p(x/w_j) P(w_j),  j = 1, 2, ..., W; j ≠ i    (8)
We see that the Bayes classifier for a 0-1 loss function is nothing
more than computation of decision functions of the form

d_j(x) = p(x/w_j) P(w_j),  j = 1, 2, ..., W    (9)

where a pattern vector x is assigned to the class whose decision
function yields the largest numerical value.
The decision functions in Eq. (9) are optimal in the sense that
they minimize the average loss in misclassification. For this
optimality to hold, however, the probability density functions of
the patterns in each class, as well as the probability of
occurrence of each class, must be known.
The latter requirement usually is not a problem. For instance, if
all classes are equally likely to occur, then P(w_j) = 1/W. Even
if this condition is not true, these probabilities generally can
be inferred from knowledge of the problem. Estimating the
probability density functions p(x/w_j) is another matter:
p(x/w_j) is a function of n variables, which, if its form is not
known, requires methods from multivariate probability theory for
its estimation.
These methods are difficult to apply in practice, especially if
the number of representative patterns from each class is not
large or if the underlying form of the probability density
functions is not well behaved. For these reasons, use of the
Bayes classifier generally is based on the assumption of an
analytic expression for the various density functions and then
an estimation of the necessary parameters from sample patterns
from each class. By far the most prevalent form assumed for
p(x/w_j) is the Gaussian probability density function. The
closer this assumption is to reality, the closer the Bayes
classifier approaches the minimum average loss in classification.
ii) Bayes classifier for Gaussian pattern classes
Let us consider a one-dimensional problem (n = 1) involving
two pattern classes (W = 2) governed by Gaussian densities, with
means m₁ and m₂ and standard deviations σ₁ and σ₂, respectively.
The Bayes decision functions have the form

d_j(x) = p(x/w_j) P(w_j)
       = [ 1/(√(2π) σ_j) ] exp[ -(x - m_j)² / (2σ_j²) ] P(w_j),  j = 1, 2    (1)

where the patterns are now scalars, denoted by x. The figure
shows a plot of the probability density functions for the two
classes.
Fig: Probability density functions for two 1-D pattern classes.
The point x₀ shown is the decision boundary if the two classes
are equally likely to occur.
The boundary between the two classes is a single point, denoted
x₀, such that d₁(x₀) = d₂(x₀). If the two classes are equally
likely to occur, then P(w₁) = P(w₂) = 1/2, and the decision
boundary is the value of x₀ for which p(x₀/w₁) = p(x₀/w₂).
This point is the intersection of the two probability density
functions, as shown in the figure. Any pattern to the right of x₀
is classified as belonging to class w₂. When the classes are not
equally likely to occur, x₀ moves to the left if class w₁ is more
likely to occur or, conversely, to the right if class w₂ is more
likely to occur. This result is to be expected, because the
classifier is trying to minimize the loss of misclassification.
For instance, in the extreme case, if class w₂ never occurs, the
classifier would never make a mistake by always assigning all
patterns to class w₁.
In the n-dimensional case, the Gaussian density of the vectors in
the j-th pattern class has the form

p(x/w_j) = [ 1 / ( (2π)^(n/2) |C_j|^(1/2) ) ] exp[ -½ (x - m_j)^T C_j⁻¹ (x - m_j) ]    (2)

where each density is specified completely by its mean vector
m_j and covariance matrix C_j, which are defined as

m_j = E_j{x}    (3)
and
C_j = E_j{ (x - m_j)(x - m_j)^T }    (4)

where E_j{·} denotes the expected value of the argument
over the patterns of class w_j. In Eqs. (3) and (4), n is the
dimensionality of the pattern vectors, and |C_j| is the
determinant of the matrix C_j. Approximating the expected value
E_j by the average value of the quantities in question yields an
estimate of the mean vector and covariance matrix:

m_j = (1/N_j) Σ_{x ∈ w_j} x    (5)
and
C_j = (1/N_j) Σ_{x ∈ w_j} x x^T - m_j m_j^T    (6)

where N_j is the number of pattern vectors from class w_j
and the summation is taken over these vectors. Later in this
section we give an example of how to use these two expressions.
According to Eq. (1), the Bayes decision function of
class w_j is d_j(x) = p(x/w_j) P(w_j). However, because of the
exponential form of the Gaussian density, working with the
natural logarithm of this decision function is more convenient.
In other words, we can use the form

d_j(x) = ln[ p(x/w_j) P(w_j) ] = ln p(x/w_j) + ln P(w_j)    (7)

This expression is equivalent to Eq. (1) in terms of
classification performance because the logarithm is a
monotonically increasing function. In other words, the numerical
order of the decision functions in Eqs. (1) and (7) is the same.
Substituting Eq. (2) into Eq. (7) yields

d_j(x) = ln P(w_j) - (n/2) ln 2π - ½ ln|C_j| - ½ [ (x - m_j)^T C_j⁻¹ (x - m_j) ]    (8)

The term (n/2) ln 2π is the same for all classes, so it can be
eliminated from Eq. (8), which then becomes

d_j(x) = ln P(w_j) - ½ ln|C_j| - ½ [ (x - m_j)^T C_j⁻¹ (x - m_j) ]    (9)

for j = 1, 2, ..., W. Equation (9) represents the Bayes decision
functions for Gaussian pattern classes under the condition of a
0-1 loss function.

The decision functions in Eq. (9) are hyperquadrics, because no
terms higher than the second degree in the components of x
appear in the equation. Clearly, then, the best that a Bayes
classifier for Gaussian patterns can do is to place a general
second-order decision surface between each pair of pattern
classes. If the pattern populations are truly Gaussian, however,
no other surface would yield a lesser average loss in
classification.

If all covariance matrices are equal, then C_j = C for
j = 1, 2, ..., W. By expanding Eq. (9) and dropping all terms
independent of j we obtain

d_j(x) = ln P(w_j) + x^T C⁻¹ m_j - ½ m_j^T C⁻¹ m_j    (10)

which are linear decision functions for j = 1, 2, ..., W.
If, in addition, C = I, where I is the identity matrix, and also
P(w_j) = 1/W for j = 1, 2, ..., W, then

d_j(x) = x^T m_j - ½ m_j^T m_j,  j = 1, 2, ..., W    (11)

These are the decision functions of the minimum distance
classifier.
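A sketch of the Bayes decision function of Eq. (9) for Gaussian classes (mean vectors, covariance matrices and priors are assumed to have been estimated already, e.g. with Eqs. (5) and (6)):

```python
import numpy as np

def gaussian_bayes_classify(x, means, covs, priors):
    """Evaluate d_j(x) = ln P(w_j) - 0.5 ln|C_j|
    - 0.5 (x - m_j)^T C_j^{-1} (x - m_j) for each class and return the
    index of the largest decision function value."""
    scores = []
    for m, C, P in zip(means, covs, priors):
        diff = x - m
        sign, logdet = np.linalg.slogdet(C)     # stable log-determinant
        d = np.log(P) - 0.5 * logdet - 0.5 * diff @ np.linalg.solve(C, diff)
        scores.append(d)
    return int(np.argmax(scores))
```

With equal covariance matrices and equal priors this reduces to the minimum distance classifier of Eq. (11).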
9.Explain perceptron for two pattern classes.
10. Explain multilayer feedforward networks in pattern
recognition.