A Thesis
Presented to
The Academic Faculty
by
Douglas Frank Britton
In Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy in the
School of Electrical and Computer Engineering
Approved by:
Dr. Russell Mersereau,
Committee Chair
School of Electrical and Computer
Engineering
Georgia Institute of Technology
Glory be to Almighty God who is the source of all knowledge, wisdom and
understanding.
Trust in the Lord with all your heart, and lean not on your own understanding, but in all your ways acknowledge Him, and He will make
straight your path. Proverbs 3:5-6
... the wisdom that comes from above leads us to be pure, friendly, gentle,
sensible, kind, helpful, genuine, and sincere. James 3:17
ACKNOWLEDGEMENTS
Life is as much about the journey as it is the destination, and what makes the journey
interesting are the friendships and experiences that dot the path. So many people
have contributed in so many different ways to make my journey at Georgia Tech
what it has become today. Some have walked beside me the entire time, while others
cheered for a season while our paths ran parallel. Some have quietly been praying,
while others have joined my trip to form a caravan. I've been picked up and carried
from time-to-time when I could not make it on my own, but throughout the trip I
am certain that I have never been alone. So I thank all of you who played a role,
both big and small, in making my journey more than just a trip but a real adventure.
And while my ship's log may not list your name, you know the part you played, and
I want to say thank you from the bottom of my heart.
I would like to thank my advisors, Dr. Mark J. T. Smith and Dr. Russell
Mersereau, for their support and guidance throughout my tenure as a Ph.D. student.
I have learned so much from them and consider it a privilege to have worked under
their guidance. I would also like to thank Dr. George Vachtsevanos, Dr. Paul
Benkeser, Dr. Bonnie Heck-Ferri, and Dr. Sheldon Jeter for serving on my
dissertation committee.
My parents, Skip and Joan Britton, have been a constant source of encouragement, support, and prayer throughout this entire process.
They instilled in me a
love of learning that has enabled me to achieve all that I have accomplished. Their
unconditional love and comfort has been called upon often, and they always freely
gave. But most of all, I want to thank them for leading me into a personal relationship with my Heavenly Father, and for teaching me His ways from a young age. I
am forever grateful for the home and family that they provided, and I honor them
for being faithful to God's calling. I want to thank all of my family who have been
diligent prayer partners and a constant source of encouragement and support.
A special debt of gratitude goes to Fran LaMattina, who came alongside and
provided the extra spark it took for me to complete this thesis.
Her patience is
amazing, and without her steadfast coaching and advice, I would not have finished.
She is an incredible woman whom God has truly blessed with the gift of insight and
encouragement.
One of my closest friends, Peter Cardillo, has been a part of this adventure from the beginning. He is one of the brightest and funniest people I know, and his encouragement and friendship have been unwavering. Others became very close friends during our time together at Georgia Tech. They have consistently cheered me on, and I am thankful for their friendship. I am truly blessed to have friends such as these. I would never have finished if my good friend Caroline Clower had not insisted on helping me organize, plan, and schedule my research proposal. It was her confidence in me that enabled me to take that next step, and for that I am forever grateful.
GTRI has been a fantastic place to work, and I want to thank everyone in the
Food Processing Technology Division for all of their support throughout my Ph.D.
studies.
I want to say a special thank you to Wayne Daley, who took a chance
and supported me as a graduate student in the very beginning. Even though he was
ahead of me by a few years, our journeys have taken similar paths. He has shared a
lot with me, and I have learned a lot from him. But most of all, I am thankful for
his genuine friendship and the encouragement he has provided along the way.
To all of my extended family and friends, and to our church small groups who have been a great source of encouragement and faithful in praying, I want to express my deep appreciation.
A special thanks to all of the people I have met and worked with in CSIP throughout my time as a student: Cheol Park, Tami Randolph, Gerardo Jose Gonzales, Quoc Pham, Tim Brosnan, Sang Park, and Paul Hong.
Dr. Robert "Bob" Stephens planted this seed almost twenty years ago, and while he is now with our Heavenly Father, I want to thank him for making a difference in at least one missionary kid's life.
More than anyone else, I want to thank my wonderful wife Susan for patiently
standing beside me these past several years. It has not been easy, and yet she has
faithfully supported me in this endeavor. She provided the motivation I needed to continue, even when I was ready to quit. She
has been the bedrock of our family and a wonderful mother to our precious gift from
God, Kari Rose.
TABLE OF CONTENTS
DEDICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
1.3 Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.4 Analysis-by-Synthesis . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.4 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4.1 Parameter Extraction . . . . . . . . . . . . . . . . . . . . . 64
3.4.2 Parameter Modification . . . . . . . . . . . . . . . . . . . . 66
3.5.1 Subjective Validation . . . . . . . . . . . . . . . . . . . . . 67
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
LIST OF TABLES
1 Multiplication factors used to modify the analysis and synthesis Gaussians that resulted in the best performance for the SAR ATR application. 72
2 Classification results for the 22.5 degree MSTAR target chip data classified by the ATR process with and without resolution enhancement. 73
4 Resulting parameter and ENL values for the GGABS model when optimizing the ENL for each parameter independently. . . . . . . . . . 77
5 Resulting parameter and ENL values for the GGABS model when using the values from table 4 as the initial starting points for the optimization over ENL. The first row gives the baseline results, while the last row shows the effect of the synthesis multiplier on the ENL value. . . . . . 78
6 The ENL results of processing the image region from figure 29 using the different traditional despeckling filters with various window sizes. 78
9 The sensitivity matrix resulting from the neural network with the twelve generalized Gaussian ABS model configurations as inputs, and the corresponding benchmark complexity scores as outputs. The cumulative neural network sensitivity scores for each configuration are given in the last column. The top five configurations were chosen as features for the overall clutter complexity measure. . . . . . . . . . 100
10 Confusion matrix of correlation coefficients between the clutter complexity scores and LVQ ATR false alarm rates for ten partitions of the COMANCHE FLIR image database when the clutter complexity measure is trained over only one of the partitions [59]. . . . . . . . . . 109
11 Multiplication factors used to modify the analysis and synthesis Gaussians for the ultrasound application. . . . . . . . . . . . . . . . . . . . 119
LIST OF FIGURES
1 (a) Linear chirp signal x[n] = sin(2π · 1.9 × 10⁻⁵ n²) shown with a Hamming window (N = 400). (b) Magnitude of the discrete short-time Fourier transform using the Hamming window shown in (a).
Joint time-frequency domain expansion of a signal into elementary Gabor functions with associated coefficients, c_{n,k}.
5 Gabor filter (a) impulse response and (b) frequency response with u_o = 0.9, σ_{n1} = 5, and σ_{n2} = 10. . . . . . . . . . . . . . . . . . . . 11
13 Block diagram of the generalized Gaussian analysis-by-synthesis decomposition and synthesis technique. . . . . . . . . . . . . . . . . . . 30
15 (a) The range geometry of a side-looking imaging radar with points A and B shown with a separation of ΔRg along the ground and ΔRs along the slant. (b) The timing of illumination pulses and scattering returns required to eliminate overlap between the two along-range points A and B. This specifies the slant resolution of the SAR. . . . . . . . . . 37
Weighting functions for the Lee, Kuan, and Modified Lee speckle filters using Cu = 0.450 and Cmax = 0.705. . . . . . . . . . . . . . . . . . . 51
(a) The original SAR chip image HB03333 of a BMP2 from the MSTAR database. (b) Speckle reduction illustration using the Lee filter. (c) Speckle reduction illustration using the Kuan filter. (d) Speckle reduction illustration using the Modified Lee filter and a 5 × 5 window. The top left 32 × 32 pixel region of the original image was used to establish Cu = 0.450 and Cmax = 0.705 for the Modified Lee filter. . . . . . . . 52
(a) Subregion of the original SAR clutter image HB06211 from the MSTAR database. (b) The speckle reduction results using the Gaussian MAP filter with a 7 × 7 window. (c) The speckle reduction results using the Gamma MAP filter with a 7 × 7 window. The bottom right 50 × 50 pixel region of the original image was used to establish Cu = 0.496, while Cmax = 0.558 for the Gamma MAP filter. . . . . . . . . 55
27 SAR clutter image processed using the Lee filter with a 5 × 5 window. 71
29 (a) Original SAR clutter image and (b) the region of this image used to evaluate the ENL for the various traditional despeckling filters and the GGABS model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
(a) The original FLIR image, (b) the Gaussian density profile image with the analysis constrained to find target-sized objects, and (c) the Gaussian density profile image with the analysis unconstrained. . . . 90
(a) The original FLIR image, (b) the Gaussian density profile image with the analysis performed globally, and (c) the Gaussian density profile image with the analysis performed on independent sub-blocks of the image. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
(a) The original FLIR image, (b) the negative FLIR image, (c) the constrained Gaussian density profile image computed on the original image, and (d) the constrained Gaussian density profile image computed on the negative image. . . . . . . . . . . . . . . . . . . . . . . 94
37 Example of images from [60] used to compute statistically based complexity features: (a) Original FLIR image, (b) Standard deviation, (c) Schmieder Weathersby, (d) FBm Hurst, (e) Target interference ratio, (f) Energy, (g) Entropy, (h) Homogeneity, (i) Outlier. . . . . . . . . . 103
39 Screen shot of the CAU clutter complexity software interface package that shows the current configuration (input features, false alarm threshold, and database), the false alarm vs. clutter complexity scatter plot, the target chip, and a sample image with false alarm locations identified. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
SUMMARY
This thesis presents a new technique for performing image analysis, synthesis, and
modification using a generalized Gaussian model. The joint time-frequency characteristics of a generalized Gaussian are combined with the flexibility of the analysis-by-synthesis (ABS) decomposition technique to form the basis of the model. The good localization properties of the Gaussian make it an appealing basis function for image analysis, while the ABS process provides a more flexible representation with enhanced functionality. ABS was first explored in conjunction with sinusoidal modeling of speech and audio signals [42], [41]. A 2D extension of the ABS technique is developed here to perform the image decomposition. This model forms the basis for new approaches in image analysis and enhancement.

The major contribution is made in the resolution enhancement of images generated using coherent imaging modalities such as Synthetic Aperture Radar (SAR) and ultrasound. The ABS generalized Gaussian model is used to decouple natural image features from the speckle and to facilitate independent control over feature characteristics and speckle granularity. This has the beneficial effect of increasing the perceived resolution and reducing the obtrusiveness of the speckle while preserving the edges and the definition of the image features. As a consequence of its inherent flexibility, the model does not preclude image processing applications for non-coherent image data. This is illustrated by its application as a feature extraction tool for a FLIR imagery complexity measure.
CHAPTER I
The discrete Fourier transform (DFT) maps a signal into the frequency domain, and the ability to study a signal in this alternate space is a very powerful analytical tool, as evidenced by the substantial body of work involving Fourier analysis techniques. One key benefit of the DFT is that it is particularly well suited to the analysis of discrete signals and linear time-invariant systems. However, the connection between signal attributes at a particular instant in time or location in space and the associated transform coefficients is lost during the transformation. This inherently mutually exclusive analysis in either the time or frequency domain becomes a limitation of classical Fourier analysis. As a result, it becomes less attractive for representing rapidly varying signals, tracking local phenomena, or analyzing signals of short duration.
In many signal processing applications, such as speech and image processing, it
is precisely the spatial organization of the data that conveys the information in the
signal. As a result, transforms that do not preserve any spatial connection are not
very useful in the analysis and extraction of highly localized signal features.
The
ability to localize the analysis of the signal in both the spatial and frequency domains
can be a very powerful tool. In order to overcome this time-frequency localization
limitation, a number of alternative techniques have been developed to study the time-dependent spectra of signals. These techniques form the basis of joint time-frequency
analysis and include among others, the short-time Fourier transform and the Gabor
transform [33], [18], [86].
1.1 The Short-Time Fourier Transform
One way to localize a transform spatially is to window the signal and then perform
the decomposition on the contents of each windowed region independently. One such approach is the discrete short-time Fourier transform (DSTFT), defined as

$$X_n[k] = \sum_{m=-\infty}^{\infty} x[m]\, w[n-m]\, e^{-j(2\pi/K)km}, \qquad k = 0, \ldots, K-1, \quad K \geq M. \qquad (1)$$
In one interpretation of the DSTFT, the output, Xn [k], can be viewed as the discrete
Fourier transform (DFT) of a section of the input signal, x[m], at index n as viewed
through the sliding analysis window w[n − m] that typically has a finite length,
M. The window serves to limit the extent of the signal being transformed with the
expectation that the spectral characteristics are approximately stationary over its
duration. The DSTFT keeps track of the window location in time through the index
n.
This preserves the linear phase information, or delay, associated with the time
Figure 1: (a) Linear chirp signal x[n] = sin(2π · 1.9 × 10⁻⁵ n²) shown with a Hamming window (N = 400). (b) Magnitude of the discrete short-time Fourier transform using the Hamming window shown in (a).
offset of the window that would be lost by simply taking the DFT of the data within
the window.
$$m \le n \le M - 1 + m, \qquad (2)$$
where m spans the range of the original indices for all the associated signal segments.
Figure 1 shows a linear chirp signal with time-varying frequency content and
the magnitude plot of its short-time discrete Fourier transform using a window. The
difficulty becomes choosing an appropriate window for the analysis of the entire signal.
A longer time-domain window has a narrower main lobe in the frequency domain,
and inversely, a shorter time-domain window has a wider main lobe in the frequency
domain. Since windowing a signal in the time-domain is equivalent to convolving the
window and signal in the frequency domain, it would be best to have a window with a
very narrow main lobe width (an impulse function is ideal) if good frequency domain
resolution is desired.
In the
case of the DSTFT, good time-domain localization (i.e. a short window) is usually
desired, which results in degraded frequency domain resolution. So once again there
is an inherent tradeo between localization in the time and frequency domains.
The Fourier transform of a rectangular window is a sinc function of the form
$\sin(\omega)/\omega$. While the sinc function has a narrow main lobe, it also has rather large side lobe amplitudes. These large side lobes introduce spectral leakage (or blurring) when convolved with closely spaced spectral peaks in the signal.
The amount of leakage distortion for a given window depends on the amplitude of
the side lobes relative to the main lobe, with lower side lobes resulting in less leakage.
The amplitude of these side lobes can be significantly reduced by smoothly tapering
the window edges to zero in contrast to the abrupt transition of the rectangular
window. A number of windows exist that reduce the effect of the edge transitions,
including the Kaiser, Hamming, and Hanning windows. Figure 2 compares the rectangular and Hamming windows and their Fourier transforms. While it is clear
from this plot that the Hamming window has lower side lobe amplitudes than the
rectangular window, it is achieved at the cost of having a wider main lobe [82], [32].
Even with the design of special windows to reduce the effects of the side lobe distortion, the inherent tradeoff between time and frequency domain localization remains.
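The windowed analysis of equation (1) can be sketched numerically. The following is an illustrative NumPy implementation; the function name, hop size, and chirp parameters are chosen here for illustration and are not from any particular toolbox:

```python
import numpy as np

def dstft(x, w, K, hop=1):
    """Discrete short-time Fourier transform of equation (1):
    X_n[k] = sum_m x[m] w[n - m] exp(-j (2*pi/K) k m).
    Returns one K-point DFT frame per analysis position n."""
    M = len(w)
    frames = []
    for n in range(M - 1, len(x), hop):
        # w[n - m] runs backwards over the segment, hence the reversal
        seg = x[n - M + 1:n + 1] * w[::-1]
        frames.append(np.fft.fft(seg, K))
    return np.array(frames)

# Linear chirp like figure 1: instantaneous frequency grows with n
n = np.arange(2048)
x = np.sin(2 * np.pi * 1.9e-5 * n ** 2)

# 400-point Hamming window, as in figure 1
M = 400
hamming = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(M) / (M - 1))

X = dstft(x, hamming, K=512, hop=64)
# The magnitude ridge of |X| moves to higher bins frame by frame,
# tracking the chirp's rising instantaneous frequency.
```

A long window sharpens the frequency ridge at the cost of smearing the chirp in time, which is exactly the tradeoff discussed above.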
1.2 The Gabor Transform

When Dennis Gabor studied the time-frequency relationship for one-dimensional (1D) signals, he began looking for a new representation in which a continuous signal could be represented as a sum of elementary functions that are localized in both the time and frequency domains. With Δt and Δf representing the effective extent of the signal in the time and frequency domains, respectively, Gabor posed the following question: "What is the shape of the signal for which the product ΔtΔf actually assumes the smallest possible value . . . ?" [39]

[Figure 2: window functions w[n] and the magnitudes of their Fourier transforms, 20 log10 |W[k]|, shown in panels (a)-(h).]
Gabor's pursuit of an answer to this question parallels the derivation of Heisenberg's uncertainty principle in the field of quantum mechanics. Heisenberg's uncertainty principle states that the more precisely the position, x, of a subatomic particle is known, the less precisely the momentum (mass times velocity), p, of the particle is known at that instant, and vice versa [51]. This uncertainty relationship is given by

$$\Delta x\, \Delta p \ge \frac{\hbar}{2} \qquad (3)$$

where ℏ is the reduced Planck constant, and Δx and Δp are the position and momentum uncertainties, respectively. Heisenberg went on to postulate that the uncertainty in the measurement of the canonically conjugate position and momentum variables at a specific point in time is not a function of the accuracy of the tools for measurement, but an inherent limitation of quantum mechanics [15].
Similarly, Gabor recognized that time and frequency are also canonical conjugates and cannot be precisely determined simultaneously. As an alternative he sought to minimize both the temporal extent, Δt, and the frequency extent, Δf, of these elementary signals in the two-dimensional (2D) time-frequency domain. Following the mathematical developments in quantum mechanics, Gabor derived the uncertainty relationship

$$\Delta t\, \Delta f \ge \frac{1}{2} \qquad (4)$$

between the time and frequency resolution of a signal. This relation places a lower bound on the compactness of a signal in both time and frequency.
He further showed that the Gaussian-modulated elementary function

$$\psi(t) = e^{-\alpha^2 (t - t_0)^2}\, e^{j(2\pi f_0 t + \phi)} \qquad (5)$$

provides the optimal tradeoff between effective temporal duration and frequency bandwidth, with the two being inversely related by equation (4).
The α, t₀, f₀, and φ parameters control the effective width, time offset, center frequency, and phase of the elementary function. Its Fourier transform is

$$\Psi(f) = e^{-(\pi/\alpha)^2 (f - f_0)^2}\, e^{-j(2\pi t_0 (f - f_0) + \phi)}, \qquad (6)$$

which is very similar in form to the original. The magnitude plots of the elementary function and its Fourier transform given in figure 3 illustrate how both have Gaussian shapes: |ψ(t)| = e^{−α²(t−t₀)²} is centered at t₀ in time, while |Ψ(f)| = e^{−(π/α)²(f−f₀)²} is centered at f₀ in frequency.
Using this Gaussian basis function, Gabor sought to develop a decomposition that would permit the simultaneous description of a signal in both the time and frequency domains. The elementary functions are arranged on a lattice in the time-frequency plane with spacing Δt in time and Δf = 1/(2Δt) in frequency, each carrying a coefficient c_{n,k} (figure 4). The expansion takes the form

$$x(t) = \sum_{n=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} c_{n,k}\, g(t - n\Delta t)\, e^{j 2\pi k \Delta f t}, \qquad (7)$$
where g(t) is a window function and Δt and Δf are the time and frequency sampling
intervals (or translation and modulation parameters). While it has since been shown
that several other analysis functions can be used in the Gabor expansion [56], the classical choice for the window function g(t), as first proposed by Gabor, is a normalized Gaussian of the form

$$g(t) = \left(\frac{2a}{\pi}\right)^{1/4} e^{-a t^2}. \qquad (8)$$
(8)
The parameter a dictates the balance of concentration between time and frequency [18], [86]. As has been stated previously, the optimal compactness in the joint time-frequency domain is the reason for choosing the Gaussian.
Unfortunately, the set of elementary functions in the Gabor representation is not generally orthogonal. As a result, computing the expansion coefficients is not a trivial task, which is further complicated by the desire to choose the best set [86], [95], [39], [33], [18]. As Gabor himself observed, "The expansion into logons [elementary functions] is, in general, a rather inconvenient process, as the elementary signals are not orthogonal" [39].
Even so, the biorthogonal function method only produced a unique solution in the critically sampled case (when ΔtΔf = 1/2) or when other analytical constraints were imposed, and even then a solution was not guaranteed [95], [7], [86]. Needless to say, the lack of a simple and efficient method for generating unique transform coefficients has hindered the broad acceptance of the Gabor expansion in signal processing.
The application of Gabor functions to image texture segmentation, extraction,
and analysis evolved from research pertaining to the visual cortex.
Daugman [23]
and Marcelja [77] successfully applied 2D Gabor functions in a model of the response
of simple cells of the visual cortex. Daugman was later able to show that these functions closely represented the 2D receptive field profiles of cortical simple cells in terms of spatial localization, orientation selectivity, frequency selectivity, and quadrature phase relationship [24]. While the Gabor expansion and
functions do not yield a comprehensive model of the entire visual system with all of
its complexity, they certainly have served as useful tools in understanding biological
vision [33].
Based on the effectiveness of the Gabor expansion in describing the biological attributes of mammalian vision, Porat and Zeevi [85] proposed a Gabor-based image analysis tool for the nonuniform segmentation of images. Their goal was to describe a decomposition that mimicked the behavior of the first stages of the visual system. The biological evidence indicated that some level of discrimination was being accomplished in a combined spatial-frequency space. With the development of a set of tunable Gaussian-modulated filters, they laid the foundation of what would become Gabor filtering.
The primary application of multi-channel Gabor filtering [11], [54], [53] has been the analysis, segmentation, and extraction of local texture features from digital images. The canonical 2D spatial-domain Gabor filter is given by

$$h(n_1, n_2) = \exp\left\{-\frac{1}{2}\left[\frac{n_1'^2}{\sigma_{n_1}^2} + \frac{n_2'^2}{\sigma_{n_2}^2}\right]\right\} \cos(2\pi u_o n_1' + \phi) \qquad (9)$$

$$(n_1', n_2') = (n_1 \cos\theta + n_2 \sin\theta,\; -n_1 \sin\theta + n_2 \cos\theta) \qquad (10)$$

where u_o and φ are the frequency and phase of the sinusoidal plane wave along the n₁-axis, and σ_{n1} and σ_{n2} are the spatial constraints on the Gaussian. Examples of a Gabor filter impulse response and frequency response are shown in figure 5, where the orientation of the Gabor filter is specified through the parameter θ in the rotation equation (10).
The ability to tune the filters to extract local features at specific frequencies is very
Figure 5: Gabor filter (a) impulse response and (b) frequency response with u_o = 0.9, σ_{n1} = 5, and σ_{n2} = 10.
attractive. Combinations of features extracted from a bank of Gabor-filtered images can be used to classify and segment various patterns in an image. Gabor filters have proven particularly useful in analyzing both natural and artificially textured images due to the localized nature of the image information. The optimal joint localization in both the spatial and frequency domains makes the Gabor filter an excellent choice for this type of analysis.
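A minimal spatial-domain implementation of equations (9) and (10) might look like the following NumPy sketch. The function name and the parameter values in the example call are illustrative (in particular, a lower u_o than the one in figure 5 is used so the plane wave stays below the Nyquist frequency):

```python
import numpy as np

def gabor_filter(size, u_o, theta, sigma1, sigma2, phi=0.0):
    """Spatial-domain Gabor filter of equation (9): a rotated Gaussian
    envelope modulating a cosine plane wave of frequency u_o."""
    half = size // 2
    n1, n2 = np.meshgrid(np.arange(-half, half + 1),
                         np.arange(-half, half + 1), indexing="ij")
    # Coordinate rotation by theta, equation (10)
    r1 = n1 * np.cos(theta) + n2 * np.sin(theta)
    r2 = -n1 * np.sin(theta) + n2 * np.cos(theta)
    envelope = np.exp(-0.5 * ((r1 / sigma1) ** 2 + (r2 / sigma2) ** 2))
    return envelope * np.cos(2 * np.pi * u_o * r1 + phi)

# Illustrative parameters: horizontal orientation, elongated envelope
h = gabor_filter(size=31, u_o=0.1, theta=0.0, sigma1=5.0, sigma2=10.0)
# Convolving an image with h measures local energy near frequency u_o
# along the filter's oriented n1' axis.
```

A filter bank is then just this function evaluated over a grid of (u_o, θ) pairs, with the per-channel output energies serving as texture features.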
1.3 Wavelets

The term wavelet comes from the notion that it is a localized wave of finite extent [92]. The concept of the wavelet was first mentioned in 1909 by Alfred Haar [48] in his doctoral thesis,
which discussed new orthogonal decompositions. Since then a significant amount of
work has been presented in the area of wavelet analysis by the likes of Levy, Weiss,
Coifman, Morlet, Mallat, Meyer, Vetterli, Daubechies, and Donoho [45], [22]. Like
many other decompositions, the wavelet transform seeks to represent a signal as the
weighted superposition of elementary functions, which in this case are called wavelets.
These wavelets are subject to the admissibility and regularity conditions. The first condition implies that a wavelet must generally have zero mean and therefore be oscillatory in nature. The second condition implies that the wavelet should be concentrated and somewhat smooth [93]. A whole family of basis wavelets can be generated by
translating and scaling a single prototype function known as the analyzing or mother
wavelet. It is this scaling that plays a special role in the wavelet decomposition and
leads to the notion of a time-scale representation (similar to the time-frequency representation associated with the Fourier and Gabor transforms). The scale of the wavelet dictates the bandwidth and center frequency of the wavelet [93], [90].
The discrete wavelet (which is actually continuous in time) is given by

$$\psi_{s,l}(t) = \frac{1}{\sqrt{a_o^{\,s}}}\, \psi\!\left(\frac{t - l\, b_o\, a_o^{\,s}}{a_o^{\,s}}\right) \qquad (11)$$

where the s and l parameters indicate the level of scaling and translation (location) of the mother wavelet, ψ(t). The variable b_o is the translation step size and is usually equal to 1. The scaling step size, a_o, usually equals 2, resulting in dyadic sampling along the frequency axis [3]. The Discrete Wavelet Transform (DWT) and its inverse
then become
$$W_x[s, l] = \frac{1}{\sqrt{2^s}} \int \psi\!\left(\frac{t}{2^s} - l\right) x(t)\, dt \qquad (12)$$

$$x(t) = \sum_{s=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} W_x[s, l]\, \frac{1}{\sqrt{2^s}}\, \psi\!\left(\frac{t}{2^s} - l\right). \qquad (13)$$
The 2D discrete wavelet is typically generated by taking the outer product of two 1D discrete wavelets:

$$\psi_{s,l}(n_1, n_2) = \psi_{s,l}(n_1)\, \psi_{s,l}(n_2). \qquad (14)$$
It is interesting to note that for the DWT representation, wavelet generation and processing is essentially a variation of traditional filter bank based subband processing theory that had been developed long before the introduction of the DWT [91], [22].
[Figure 6: example wavelets at three scales, panels (a), (c), (e), with their frequency responses in panels (b), (d), (f).] Decreasing the scaling parameter, s, reduces the spatial extent of the wavelet while the
frequency extent of the wavelet is indeed expanded. What has made wavelet analysis particularly interesting is the ability to generate sets of orthogonal wavelet basis
functions [21]. The combination of its localization properties and the discoveries of
13
fast, efficient, and nonredundant implementations has made the DWT a useful tool
for image analysis and ltering [75], [3], [90].
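The separable construction of equation (14) and the filter-bank view of the DWT can be illustrated with a one-level 2D Haar transform. This is a hand-rolled NumPy sketch for exposition; practical work would typically use a dedicated wavelet library:

```python
import numpy as np

def haar_pairs(a):
    """One level of the 1D Haar DWT along the last axis (even length):
    orthonormal lowpass (sums) and highpass (differences) coefficients."""
    lo = (a[..., ::2] + a[..., 1::2]) / np.sqrt(2.0)
    hi = (a[..., ::2] - a[..., 1::2]) / np.sqrt(2.0)
    return lo, hi

def haar_dwt2d(img):
    """One level of the separable 2D Haar DWT: 1D transforms along the
    rows, then along the columns, mirroring the product construction of
    equation (14)."""
    L, H = haar_pairs(img)                    # filter the rows
    LL, LH = haar_pairs(L.swapaxes(-1, -2))   # filter the columns of L
    HL, HH = haar_pairs(H.swapaxes(-1, -2))   # filter the columns of H
    return (LL.swapaxes(-1, -2), LH.swapaxes(-1, -2),
            HL.swapaxes(-1, -2), HH.swapaxes(-1, -2))

img = np.arange(16.0).reshape(4, 4)
subbands = haar_dwt2d(img)
# Orthonormal filters preserve energy across the four subbands
total = sum((b ** 2).sum() for b in subbands)
```

Recursing on the LL subband yields the usual dyadic decomposition; the detail subbands (LH, HL, HH) vanish on constant regions, which is why they isolate localized image features.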
1.4 Analysis-by-Synthesis
Systems that perform analysis and synthesis of speech signals have been the focus of
a significant body of research. The introduction of voice coders (vocoders) [29] led
to the development of parametric models based on the acoustic model of speech production. Further development in the area of parametric modeling led to the process
known as Analysis-By-Synthesis (ABS). It is a method that was developed to obtain
valid parameters for the speech production models. This was accomplished by iteratively optimizing some error function (for example between an original speech signal
and the synthetic representation) over the set of available model parameters [87]. It
was first proposed and implemented by Halle and Stephens [49] as a procedure for
estimating the time-dependent Fourier representation of a speech signal by adjusting
parameters for a speech production model that included representations of the vocal
tract transfer function and the glottal waveform. The same ABS approach was used
by Bell et al. [8] in a more refined model of speech spectra, while Pinson [84] adapted
ABS for the time-domain analysis of speech formant frequencies and bandwidths.
One class of vocoders models the vocal tract characteristics and associated spectral shaping using linear predictive coding (LPC). It is assumed that the speech
signal can be classied as either voiced or unvoiced and that the pitch period for
the voiced portion is known. For voiced speech, the excitation for the LPC vocoder
is a quasi-periodic impulse train with delta functions at the pitch period intervals.
The excitation for the unvoiced case is simply white noise [37], [87], [35], [36], [50],
[27].
One of the difficulties associated with this excitation model is accurately detecting the voiced/unvoiced state, since it does not allow for mixed speech segments.
Figure 7: Block diagram of the analysis-by-synthesis procedure for multi-pulse excitation of a low bit rate LPC speech coder [6].
Atal and Remde [6] proposed a multi-pulse excitation model that was independent
of the voiced/unvoiced state and without constraints on the pitch periodicity. Their
ABS procedure determined the locations and amplitudes of the excitation pulses
for an LPC synthesizer by considering the dierence between the synthetic speech
and the original.
15
is minimized at each step. The contributions of the previous pulse are included when
generating an updated error signal for determining the current pulse. It is this key
aspect of the procedure which denes the ABS approach.
ABS is a Gauss-Seidel type of successive approximation that has been successfully applied in a variety of applications under several different names. Juang and Gray [58] investigated this concept for data compression; the method used in the context of compression is widely known as multiple stage vector quantization (VQ) or residual VQ. The primary constraints associated with single stage VQ implementations [13] are the large computation and storage requirements. These constraints grow exponentially with the number of quantization bits when using the full-search algorithm and a single stage quantizer. They can be relaxed by a multistage approach where the subsequent quantization stages are designed around the residual signal from the previous stage. The computational complexity and storage requirements in the multistage approach are simply the sum of the individual stages. As a result, adding more bits to the overall quantization by adding stages results in a significant computational and storage savings over a comparable single stage VQ [58]. The signal reconstruction is accomplished by simply summing the vectors from each stage. Figure 8 shows the process for a residual VQ system designed for speech coding.
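The residual VQ reconstruction just described can be sketched in a few lines; the tiny two-dimensional codebooks and input vector below are hypothetical values chosen only for illustration.

```python
import numpy as np

def nearest(codebook, v):
    """Return the codebook vector closest to v (full search)."""
    d = ((codebook - v) ** 2).sum(axis=1)
    return codebook[d.argmin()]

def residual_vq(x, codebooks):
    """Multistage residual VQ: each stage quantizes the residual left by
    the previous stage; reconstruction is the sum of the stage outputs."""
    residual = x.astype(float)
    stages = []
    for cb in codebooks:
        y = nearest(cb, residual)
        stages.append(y)
        residual = residual - y          # pass the residue to the next stage
    return sum(stages), residual

# Toy 2D codebooks (hypothetical, 4 entries each; stage 2 is finer).
cb1 = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]])
cb2 = 0.25 * cb1
x = np.array([1.2, 0.8])
xhat, e = residual_vq(x, [cb1, cb2])
```

Note how the second stage refines the first: the coarse codebook places the vector near [1, 1], and the fine codebook corrects part of the remaining residue.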
The ABS process was further investigated by George and Smith [42], [41] as an effective tool for performing the decomposition of speech signals in conjunction with sinusoidal modeling. They used an overlap-add (OLA) windowing technique to analyze short quasi-stationary segments of the original speech signal. The ABS process successively extracted sinusoids from the original segments, where the sinusoidal amplitudes, frequencies, and phase terms were chosen to minimize residual error. This was accomplished by performing a search along candidate frequencies for amplitude and phase values that minimize the successive error. The extracted values are the parameters of the resulting sinusoidal representation.
Figure 8: A two-stage residual VQ system: the first quantizer q1 maps the input vector x_i to codebook entry y_j1, the second quantizer q2 maps the residual e1 = x_i - y_j1 to z_j2, and the indices j1 and j2 are transmitted.
The decompositions are accomplished by matching residues and finding the best waveforms in the dictionary to characterize the remaining signal structure.
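The sinusoidal ABS decomposition of George and Smith can be illustrated with a small sketch. The least-squares projection used here to obtain each amplitude and phase, and the candidate frequency grid, are illustrative simplifications rather than the exact search of [42], [41].

```python
import numpy as np

def abs_sinusoid_decompose(x, n_components, candidate_freqs):
    """Successively extract sinusoids from x, choosing at each step the
    candidate frequency whose best-fit sinusoid leaves the least error."""
    n = np.arange(len(x))
    residual = x.astype(float).copy()
    params = []
    for _ in range(n_components):
        best = None
        for f in candidate_freqs:
            # Least-squares amplitude/phase via projection onto cos/sin.
            c, s = np.cos(2 * np.pi * f * n), np.sin(2 * np.pi * f * n)
            a = 2.0 * residual.dot(c) / len(x)
            b = 2.0 * residual.dot(s) / len(x)
            comp = a * c + b * s
            err = ((residual - comp) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, f, comp)
        err, f, comp = best
        params.append(f)
        residual = residual - comp       # ABS: model the new residue next
    return params, residual
```

Applied to a signal built from two of the candidate frequencies, the loop recovers the stronger component first and then the weaker one, leaving a near-zero residual.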
1.5
Objective Statement
In this thesis a new model is presented that represents an image as a sum of weighted generalized Gaussian functions. The fundamental advantage of the Gaussian, compared to other traditional basis functions like sinusoids, is its time-frequency localization property. This allows the Gaussian to easily accommodate local variations and features associated with a variety of image scenes and artifacts. The features that separate this new image model from a traditional Gabor filtering representation are the lack of modulating sinusoids and the addition of shape parameters associated with the generalized Gaussian function. When adapted for a 2D implementation, the ABS procedure provides a tractable method for performing the basis function extraction, modification, and synthesis. The general formulation of the model does not assume any a priori knowledge of the image. However, such knowledge can be exploited to tune the model for a given application. By combining the generalized Gaussian basis functions with the ABS decomposition technique, this model provides a powerful tool for image analysis, modification, and synthesis.
The primary application of the generalized Gaussian ABS model is to perform resolution enhancement on coherent imaging data such as Synthetic Aperture Radar (SAR). This imaging modality suffers from an artifact, known as speckle or speckle noise, inherent in coherent imaging modalities. Speckle can contribute to poor edge and feature definition, which in turn leads to lower perceived resolution. The ultimate objective of any speckle reduction process is to allow one to resolve small objects in the scene accurately. The true image resolution in the SAR application is largely related to the sensor hardware and the signal bandwidth [80]. Ultrasound is another coherent imaging modality that suffers from similar speckle artifacts. In the case of an ultrasound image, resolution is primarily a function of the acoustic signal frequency and the absorption characteristics of the tissue being imaged [89]. While the SAR application has been the primary application for this thesis, preliminary investigations have also been conducted on ultrasound data. Chapter 2 introduces the generalized Gaussian function and the ABS procedure, and then combines the two in a discussion of the proposed model. Chapter 3 explores the application of the model to the resolution enhancement task for SAR images, and touches briefly on some of the preliminary work done on ultrasound data.
CHAPTER II
This chapter lays the foundation and basic structure of the generalized Gaussian model that is the subject of this thesis. Many basis representations have been developed that characterize the complex nature of image structures like edges and textures. In such representations, however, the information pertaining to particular local image features is distributed across many of the basis vectors. Processing these features involves identifying the feature information in each basis vector and then manipulating this information appropriately to effect the desired modification. For example, the speckle noise in a coherent image has frequency components that are distributed throughout the spectrum.
While this approach may work well for texture classification or edge and feature detection, it does not necessarily provide a means for straightforward modification or enhancement of these local features.

In the generalized Gaussian image processing model presented here, local image attributes are represented accurately in a way that permits meaningful modification and enhancement. The adaptation of a redundant decomposition for such a model is attractive because of the potential flexibility it provides in representing local image features and phenomena. This, however, will only be possible if the basis functions are also spatially localized.

As discussed in chapter 1, Gabor functions have been shown to have optimal time-frequency localization properties [39], [18], [33], and as such would make ideal candidates for this model. However, the sinusoidal modulation term limits the Gabor function's ability to represent highly localized image features spatially, which is required to effect certain kinds of enhancement. Removing this sinusoidal modulation term from the Gabor function results in a Gaussian waveform. Another way of viewing this is to let the basis functions be Gabor functions with a modulation frequency of zero. The result is a set of Gaussian functions that can be viewed as a subset of the traditional Gabor functions.

The generalized Gaussian (which we will introduce in section 2.2), of which the standard Gaussian is a special case, further provides a mechanism for parameterizing the waveform in terms of its shape. The result is a basis function with a considerable amount of flexibility.
The result is a model capable of not only representing local image features and phenomena but also providing a means for modification and enhancement [12].
2.1
Analysis-by-Synthesis Decomposition
Linear transforms that represent a signal as a weighted sum of basis vectors have a long history of development in a wide variety of applications. As an example, the discrete Fourier transform discussed in Chapter 1 represents a time/spatial domain signal as a set of weighted orthogonal complex exponentials. Linear expansions of this nature provide the mechanisms for generating closed-form solutions that are manageable and efficient. In many cases, however, such solutions are very computationally burdensome, and they do not afford the flexibility in the signal decomposition necessary to achieve the application objective. For example, the information value of an image is often dependent on the level of local or low-level detail such as edges, texture and structure. While an image is completely characterized by a decomposition over a basis, it is often the case that these low-level image features are distributed across many of the basis elements. This makes analysis and modification by means of linear transforms very difficult and cumbersome to implement. What is needed is a way to perform the signal decomposition that provides the flexibility necessary to extract, analyze, and modify those particular features of interest in the image. The ABS procedure provides this level of versatility by providing a framework for the analysis, modification and synthesis of a signal regardless of the elementary functions used, which in this model are generalized Gaussians.
The presentation of the ABS method follows closely the development of George [41] by providing a concise description of the ABS algorithm using finite-dimensional vector theory. Consider the real p-dimensional vector

x = (x_1, x_2, \ldots, x_p).   (15)

We seek to approximate x as a finite sum of J component vectors,

\hat{x} = \sum_{j=1}^{J} \hat{x}_j.   (16)

Assuming that the first l-1 components have already been obtained, the approximation vector becomes

\hat{x}_{l-1} = \sum_{j=1}^{l-1} \hat{x}_j,   (17)

with the corresponding residual error vector

e_{l-1} = x - \sum_{j=1}^{l-1} \hat{x}_j.   (18)

At the lth iteration the error becomes

e_l = x - \sum_{j=1}^{l} \hat{x}_j,   (19)

which, in terms of the previous residual, is

e_l = e_{l-1} - \hat{x}_l.   (20)

The component vector \hat{x}_l is chosen to minimize the residual error energy

E_l = \sum_{i=1}^{p} \{ e_i^2 \}_l = \| e_{l-1} - \hat{x}_l \|^2.   (21)
When a component vector depends on a nonlinear parameter, the solution is recalculated over an ensemble of values for the nonlinear variable to find the final value that minimizes the overall error.

The ABS process converges in a mean-square sense because, from equation (21), the minimum value of E_l according to the Pythagorean theorem can be given by

E_l = \| e_{l-1} \|^2 - \| \hat{x}_l \|^2 = E_{l-1} - \| \hat{x}_l \|^2.   (22)

This indicates that the approximation error at each iteration is less than the previous one as long as \| \hat{x}_l \|^2 > 0. By trying to minimize E_l we are actually modeling the residual error of the approximation after l-1 iterations in terms of the component vectors \hat{x}_l. While most of the development of the ABS process has been associated with speech and audio signal processing, it will be shown to be a very suitable technique for performing image decompositions as well.
The ABS procedure provides a framework within which to perform the analysis and synthesis necessary to implement localized image feature modeling and enhancement.
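Equations (19)-(22) translate directly into a small greedy loop. The sketch below, with a hypothetical dictionary of unit-norm vectors, projects the residual onto the best dictionary direction at each step, so the recorded energies decrease exactly as equation (22) predicts.

```python
import numpy as np

def abs_decompose(x, dictionary, iters):
    """ABS successive approximation: at each iteration choose the
    component vector that removes the most residual energy."""
    e = x.astype(float).copy()
    energies = [float(e.dot(e))]          # E_0 = ||x||^2
    for _ in range(iters):
        proj = dictionary @ e             # inner products <d_j, e_{l-1}>
        j = int(np.abs(proj).argmax())    # best unit-norm direction
        x_l = proj[j] * dictionary[j]     # component vector x_hat_l
        e = e - x_l                       # e_l = e_{l-1} - x_hat_l   (20)
        energies.append(float(e.dot(e)))  # E_l = E_{l-1} - ||x_hat_l||^2  (22)
    return e, energies

# Hypothetical dictionary: the standard basis of R^3 (rows, unit norm).
D = np.eye(3)
residual, E = abs_decompose(np.array([3.0, 1.0, -2.0]), D, 3)
```

Each iteration strips the largest remaining component, so the energy sequence is strictly decreasing until the residual is exhausted.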
2.2
2D Generalized Gaussians
The 1D Gaussian function can be written as

g(x) = \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right),   (23)

where \mu, typically recognized as the mean, is the center location that corresponds to the peak of the waveform, and \sigma is the standard deviation of the waveform that dictates the lobe width. While the Gaussian function is most commonly recognized as the probability density function of a normal distribution in statistics, we are interested in the Gaussian function as a waveform for signal representation. As such, the constant area constraint associated with the statistical form of the Gaussian function is eliminated by removing the \frac{1}{\sqrt{2\pi}\,\sigma} scale factor.
Figure 9: The 1D generalized Gaussian g(x) for three shape parameter values, \beta = 1, 2, 3.

The generalized Gaussian function,

g(x) = \exp\!\left( -\left( \frac{|x-\mu|}{\eta} \right)^{\beta} \right),   (24)

where

\eta = \sigma \left[ \frac{\Gamma(1/\beta)}{\Gamma(3/\beta)} \right]^{1/2}   (25)

and \Gamma(\cdot) is the gamma function, provides the ability to change the shape through the additional parameter, \beta. The parameter \beta is significant in that it controls the exponential rate of decay (or rolloff) of the function, as illustrated in figure 9 for three distinct values of \beta. It is evident from equations (23) and (24) that the traditional Gaussian function is a special case of the generalized Gaussian function with a shape parameter of \beta = 2.
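Equations (24) and (25) are straightforward to evaluate with the standard library's gamma function; the sketch below also confirms that \beta = 2 recovers the ordinary Gaussian of equation (23).

```python
import math

def eta(sigma, beta):
    """Lobe width term of equation (25)."""
    return sigma * math.sqrt(math.gamma(1.0 / beta) / math.gamma(3.0 / beta))

def gen_gaussian(x, mu, sigma, beta):
    """1D generalized Gaussian of equation (24), unit peak amplitude."""
    return math.exp(-((abs(x - mu) / eta(sigma, beta)) ** beta))

# With beta = 2, eta = sigma*sqrt(2), so g(x) = exp(-(x-mu)^2 / (2 sigma^2)).
```

The width normalization in equation (25) keeps the variance tied to \sigma regardless of \beta, so changing the shape parameter changes the decay rate without changing the nominal lobe width.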
Similar to the 1D generalized Gaussian function, the 2D function is characterized by a center peak location, (\mu_1, \mu_2), a pair of lobe width parameters, \sigma_1 and \sigma_2, that dictate the spatial extent of the Gaussian along the n_1 and n_2 axes, and \beta, the associated shape parameter.

Figure 10: A 2D generalized Gaussian function with a contour plot detailing the center location, lobe widths, and rotation angle.
The rotated 2D generalized Gaussian function is given by

g[n_1, n_2] = \exp\!\left( -\left( \frac{\sqrt{r_1[n_1,n_2]^2 + r_2[n_1,n_2]^2}}{\eta(\theta)} \right)^{\beta} \right),   (26)

where

\eta(\theta) = \frac{\eta_1 \eta_2}{\sqrt{(\eta_1 \cos\theta)^2 + (\eta_2 \sin\theta)^2}},   (27)

\theta = \begin{cases} \arctan(n_2/n_1), & n_1 \neq 0 \\ \pi/2, & n_1 = 0 \end{cases}   (28)

and \eta is given in equation (25). Equations (27) and (28) provide for a circularly smooth transition between the two lobe width values, \eta_1 and \eta_2. The spatial translation and rotation of the 2D Gaussian in equation (26) are accomplished through the functions

r_1[n_1, n_2] = (n_1 - \mu_1)\cos\phi + (n_2 - \mu_2)\sin\phi,
r_2[n_1, n_2] = -(n_1 - \mu_1)\sin\phi + (n_2 - \mu_2)\cos\phi.   (29)
26
An alternative, separable representation of the 2D generalized Gaussian is the product of two 1D functions,

g[n_1, n_2] = g_1[n_1] \, g_2[n_2],   (30)

g_i[n_i] = \exp\!\left( -\left( \frac{|n_i - \mu_i|}{\eta_i} \right)^{\beta_i} \right),   (31)

where i = 1, 2 identifies the axis of the 1D function and \eta_i is given in equation (25). This representation is attractive because it is separable and enables independent consideration along each spatial dimension of the 2D function. If computational constraints are an issue, this could save valuable time in the analysis and synthesis of such functions. The separable form, however, produces functions with major and minor axes that can lie only in the direction of the image coordinate axes, n_1 and n_2.
The lobes of the separable function also change shape as the shape parameters of the associated 1D functions deviate from the standard Gaussian. This is illustrated in figure 12.
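A direct evaluation of the rotated form of equations (26)-(29) is sketched below. Note that the direction angle \theta is computed here in the translated, rotated coordinate frame so that the lobe widths follow the rotation; this is a choice made for the sketch rather than taken verbatim from the text.

```python
import numpy as np
from math import gamma

def gen_gaussian_2d(n1, n2, mu, sigma, beta, phi):
    """Rotated 2D generalized Gaussian, after equations (26)-(29)."""
    scale = np.sqrt(gamma(1.0 / beta) / gamma(3.0 / beta))
    eta1, eta2 = sigma[0] * scale, sigma[1] * scale
    # Translate and rotate the coordinates, equation (29).
    r1 = (n1 - mu[0]) * np.cos(phi) + (n2 - mu[1]) * np.sin(phi)
    r2 = -(n1 - mu[0]) * np.sin(phi) + (n2 - mu[1]) * np.cos(phi)
    # Direction-dependent lobe width, equations (27)-(28).
    theta = np.where(r1 != 0, np.arctan2(r2, r1), np.pi / 2)
    eta_t = eta1 * eta2 / np.sqrt((eta1 * np.cos(theta)) ** 2 +
                                  (eta2 * np.sin(theta)) ** 2)
    r = np.sqrt(r1 ** 2 + r2 ** 2)
    return np.exp(-((r / eta_t) ** beta))
```

With equal lobe widths and \beta = 2 the function is circularly symmetric and reduces to the standard 2D Gaussian, which provides a simple sanity check on the implementation.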
2.3
The Model
The generalized Gaussian ABS model combines the spatial localization and parametric control properties of the generalized Gaussian function with the tractability and flexibility of the ABS procedure. The generalized Gaussians serve as the component functions for the ABS engine which performs the image decomposition, parameter modification, and synthesis. The ABS procedure is unique in that it enables decoupled parameter specifications between the analysis and synthesis stages. We seek to approximate a discrete image i[n_1, n_2] as a finite sum of weighted generalized Gaussian functions, g_k[n_1, n_2], as

\hat{i}[n_1, n_2] = \sum_{k=1}^{N} a_k \, g_k[n_1, n_2],   (32)

where \hat{i}[n_1, n_2] is the reconstructed image, and a_k represents the associated weights.
During the analysis a search is conducted to find the Gaussian function and corresponding parameters that best approximate the residual image or match the model of some feature in that image. In SAR and ultrasound applications, we wish to model coherent imaging speckle, whereas in FLIR complexity assessment applications we target other scene features. The process of determining the best image components (or generalized Gaussian functions) is inherently application dependent.

Once the best generalized Gaussian image component has been identified, it becomes the extracted Gaussian function, \hat{g}_l[n_1, n_2], of the lth iteration. It is interesting to note that the standard implementation of the ABS algorithm would require that this extracted function be subtracted from the previous residual to form the new residual. However, it might be to our advantage in terms of our image processing objectives to modify the extracted function before performing the subtraction. This new, potentially modified version of the extracted function becomes the analysis Gaussian, which is used to form the residual or error function

e_l[n_1, n_2] = e_{l-1}[n_1, n_2] - A\{\hat{g}_l[n_1, n_2]\}.   (33)
The synthesis-stage versions of the extracted functions, called the synthesis Gaussians, are then accumulated and summed to form the new processed output image

\tilde{i}[n_1, n_2] = \sum_{k=1}^{N} S\{\hat{g}_k[n_1, n_2]\}.   (34)

The inclusion of the analysis and synthesis modification functions, A\{\cdot\} and S\{\cdot\}, in the ABS process provides significant modeling and enhancement flexibility that will be discussed in chapter 3. A block diagram of the generalized Gaussian ABS process is shown in figure 13.
This process is repeated recursively until some stop criterion is met. If the analysis and synthesis functions are exact replicas of the extracted function (i.e., no modifications are applied), the procedure reduces to the standard ABS decomposition.
Figure 13: Block diagram of the generalized Gaussian analysis-by-synthesis decomposition and synthesis technique.
As was just presented, one unique aspect of the ABS process is that it enables coupled and decoupled modes of operation in terms of the analysis and synthesis function specifications/modifications. In the coupled operational mode of generalized Gaussian ABS, the modification processes are identical in both the analysis and synthesis stages,

A\{\cdot\} = S\{\cdot\}.   (35)

This reduces computation as the extracted functions are only modified once and then used in both the analysis and synthesis stages.
In the decoupled mode the nature of the modifications to the extracted Gaussian functions performed in the analysis and synthesis stages is quite different. As a result the analysis and synthesis Gaussian functions are not the same. While this requires more overhead in terms of computation, the decoupled operational mode is the key attribute of the model that provides the additional flexibility needed to perform useful image enhancement.
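The coupled and decoupled modes can be made concrete with a sketch of the loop in equations (33) and (34). The extractor below is a deliberately trivial stand-in (a single-pixel component at the residual peak) rather than a real generalized Gaussian search; with identity operators A and S (the coupled mode with no modification), the output plus the residual reconstructs the input exactly.

```python
import numpy as np

def ggabs(image, extract_best, A, S, n_components):
    """Generalized Gaussian ABS loop, after equations (33)-(34).
    extract_best(residual) returns a weighted component image;
    A and S are the analysis/synthesis modification operators."""
    residual = image.astype(float).copy()
    output = np.zeros_like(residual)
    for _ in range(n_components):
        g = extract_best(residual)       # a_l * g_l[n1, n2]
        residual = residual - A(g)       # equation (33)
        output = output + S(g)           # accumulated, equation (34)
    return output, residual

def peak_component(res):
    """Toy extractor: a single-pixel 'component' at the residual peak."""
    out = np.zeros_like(res)
    idx = np.unravel_index(np.abs(res).argmax(), res.shape)
    out[idx] = res[idx]
    return out

identity = lambda g: g                   # coupled mode with A = S
img = np.array([[0.0, 2.0], [1.0, 0.0]])
out, res = ggabs(img, peak_component, identity, identity, 2)
```

Passing different A and S functions to the same loop gives the decoupled mode: the analysis operator shapes what is removed from the residual while the synthesis operator shapes what appears in the output.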
2.3.2
As was presented in the ABS section previously, one approach to identifying suitable analysis components is to find some closed-form solution to the optimization problem. An alternative is to search the entire generalized Gaussian parameter space for a function that minimizes the global residual error. While both approaches can yield accurate representations, they do not exploit any a priori knowledge we may have of the image content that we seek to model.

Instead, an approach is pursued whereby a search is conducted for a set of suitable Gaussian parameters that best model a particular image feature. The search can be conducted across the entire image or in a localized region, depending on the application and the type of image features being modeled. The amplitude, lobe widths, shape factors, and spatial location parameters are extracted for each 2D generalized Gaussian function in the decomposition. Using a priori knowledge, the ranges of these parameters can be reduced to those values that yield meaningful representations of the image features being modeled. This significantly reduces the parameter search space while maximizing the probability of obtaining a set of reasonable parameter values.
Once a reasonable range of expected values has been established, the general approach for estimating the individual parameters is as follows. First, a candidate feature location and amplitude is identified using a priori knowledge of the image features being sought. For bright features a peak-picking approach can be used; if the features of interest are dark or have low image intensity values, then a peak-picking approach on the negative image might be more suitable. Once the amplitude and central spatial location of a feature are established, the search for suitable lobe width and shape factor values can be conducted by searching radially outwards from this position. Once again, the range of values for these lobe width and shape parameters can be limited by a priori knowledge of the image feature. A less complicated optimization can now be done for parameter values within these reduced ranges that minimize the error between the generalized Gaussian representation of the feature and the feature itself. However, even this can potentially be simplified further depending on the nature of the image features being modeled. If, for example, the features are roughly symmetric in shape, then use of the separable generalized Gaussian function could reduce the search to two independent optimizations of only two variables (one lobe width and one shape factor) along two perpendicular axes. This further reduces the computational complexity of the search. Once a complete set of parameters has been determined, the associated generalized Gaussian can be extracted from the original signal. The process is then repeated, resulting in a successive approximation based model.
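The peak-picking and grid-search procedure just described can be sketched as follows. For brevity the search fits only one lobe width and one shape factor along a single axis through the peak, and the parameter grids are hypothetical.

```python
import numpy as np
from math import gamma

def fit_gaussian_at_peak(image, widths, betas):
    """Pick the brightest pixel as the candidate location/amplitude,
    then grid-search the lobe width and shape factor that minimize the
    error along a 1D slice through the peak."""
    n1, n2 = np.unravel_index(image.argmax(), image.shape)
    amp = image[n1, n2]
    row = image[n1, :].astype(float)
    x = np.arange(image.shape[1])
    best = None
    for sigma in widths:
        for beta in betas:
            eta = sigma * np.sqrt(gamma(1.0 / beta) / gamma(3.0 / beta))
            model = amp * np.exp(-((np.abs(x - n2) / eta) ** beta))
            err = float(((row - model) ** 2).sum())
            if best is None or err < best[0]:
                best = (err, sigma, beta)
    return (n1, n2), amp, best[1], best[2]
```

When the feature actually is a generalized Gaussian whose parameters lie on the grid, the search recovers them exactly; in practice the grids would be set from the a priori feature knowledge discussed above.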
2.3.3
The recursive nature of the ABS process requires a mechanism to stop the successive approximation process. Stop thresholds for iterative processes such as this tend to be image scene and feature dependent, and it is difficult to choose an appropriate threshold for all possible image scene scenarios. An error or energy based stop condition could result in incomplete or inadequate image decompositions. Another approach is to specify a Gaussian density requirement that stops the ABS process once a specified number of Gaussians has been used in the decomposition. The number of Gaussians can be set as a percentage of the image size or simply fixed to a set amount.
The peak-picking generalized Gaussian search method can end up causing the
model to neglect dark and low contrast regions of the image regardless of which stop
criterion is used. One solution to this is to apply the model in a non-overlapping block
processing mode. In this implementation the image is segmented into smaller (and
usually more similar) regions, and the generalized Gaussian ABS model is applied
to each region independently. This provides a more uniform spatial distribution of
Gaussians across the image, and it tends to reduce the amount of amplitude variation
presented to the search routine which performs the parameter estimation.
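The non-overlapping block mode can be sketched in a few lines; here model_fn stands in for a full per-region generalized Gaussian ABS pass, and an identity model is used only to show that the blocks tile the image exactly.

```python
import numpy as np

def block_process(image, block, model_fn):
    """Apply model_fn independently to non-overlapping blocks, giving a
    more uniform spatial distribution of extracted Gaussians."""
    h, w = image.shape
    out = np.zeros((h, w), dtype=float)
    for i in range(0, h, block):
        for j in range(0, w, block):
            region = image[i:i + block, j:j + block]
            out[i:i + block, j:j + block] = model_fn(region)
    return out

img = np.arange(16, dtype=float).reshape(4, 4)
reconstructed = block_process(img, 2, lambda region: region)
```

Because each region is modeled independently, the amplitude range the parameter search sees within a block is narrower than over the full image, which is the benefit described above.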
2.4
Applications
CHAPTER III
3.1
Introduction
Synthetic aperture radar (SAR) has, over the past few years, proven to be an invaluable tool in remote sensing applications such as hydrology, vegetation mapping
and monitoring, reconnaissance, geology, glaciology, and tectonic activity assessment
[80]. Image data of the earth can be collected using both airborne and spaceborne
SAR platforms. While earth and atmospheric science applications tend to favor the
spaceborne systems, most military applications rely on airborne platforms with rapid
deployment capabilities. The military reconnaissance objectives consist of the detection and recognition of various vehicles or targets within a scene. Human analysts
and automatic target recognition (ATR) systems that perform these tasks require
SAR image resolutions of better than 1m in order to extract meaningful information
from the image data [80].
Figure 14: General geometry of a side-looking imaging radar: V is the platform velocity, L and W the antenna length and width, h the antenna elevation, \theta the look angle, \theta_{r,a} the range and azimuth beam widths, S_{r,a} the range and azimuth swath widths, and R_{s,g} the slant and ground ranges.
In synthetic aperture radar (SAR) imaging, very narrow beamwidth microwave pulses are used to illuminate an object surface, while the reflected or backscattered microwave energy is measured between the pulses. These measurements are used to establish the position and contrast of the various targets on the object surface. The general geometry for an imaging radar system is shown in figure 14. This configuration is known as a side looking airborne radar (SLAR), where a swath of land is illuminated by the radar beam on just one side of the vehicle track. The distance, known as the slant range, R_s, between the antenna and a point on the scattering surface can be determined by measuring the round trip time delay, \tau, between the transmitted pulse and its received echo,

R_s = \frac{c\tau}{2},   (36)

where the microwave signal travels at the speed of light, c. A time log of backscattered returns allows the signal processor to generate a distance map between surface scatterers and the antenna. By limiting the radar swath to just one side of the airborne platform, left-to-right ambiguities that might occur between two equi-distant points on either side of the platform are eliminated [30].
The slant range resolution is a function of the duration or pulse width, \tau_p, of the illumination pulses. As illustrated in figure 15, the return signals of two point scatterers, A and B, separated by a slant distance of \Delta R_s will have a time delay difference at the receiving antenna of \Delta t = 2\Delta R_s / c. In order to resolve these two point scatterers, the radar pulse width will need to meet the constraint

\tau_p < \Delta t = \frac{2\Delta R_s}{c}.   (37)

Solving for \Delta R_s, the best achievable slant range resolution, \Delta r_s, for a given pulse width becomes

\Delta r_s = \frac{\tau_p c}{2}.   (38)
In the extreme, an infinitely high slant range resolution (small \Delta r_s) appears possible given an infinitely small pulse width, \tau_p. However, with the reduction in pulse width comes a drop in transmitted power that eventually results in very little scattered energy being returned to the sensor. The solution is to transmit a linear frequency modulated (FM, or chirp) signal pulse that has both a large \tau_p and a large bandwidth, B_p. The reflected modulated pulse can be compressed later during image formation to yield higher slant range resolution with more instantaneous signal power [20], [80], [14]. The new slant range resolution then becomes

\Delta r_s = \frac{c}{2 B_p}.   (39)
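Equations (38) and (39) reduce to one-line calculations. As a numeric illustration (using a rounded speed of light), a 10 ns pulse and a 100 MHz chirp bandwidth both give a slant range resolution of 1.5 m.

```python
C = 3.0e8  # speed of light in m/s (rounded for illustration)

def slant_res_pulse(tau_p):
    """Equation (38): resolution from an unmodulated pulse of width tau_p."""
    return tau_p * C / 2.0

def slant_res_chirp(B_p):
    """Equation (39): resolution from a chirp of bandwidth B_p."""
    return C / (2.0 * B_p)
```

The comparison makes the benefit of pulse compression explicit: the chirp achieves the short-pulse resolution while transmitting energy over a much longer pulse.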
Figure 15: (a) The range geometry of a side looking imaging radar with points A and B shown with a separation of \Delta R_g along the ground and \Delta R_s along the slant. (b) The timing of illumination pulses and scattering returns required to eliminate overlap between the two along-range points A and B. This specifies the slant resolution of the SAR.
In real aperture radar (RAR), the resolution of the imaging system in the azimuth direction depends directly on the width of the microwave beam along the azimuth. Referring to figure 16(a), the only way points A and B can be resolved is if they are not in the radar beam at the same time. The beam width, \theta_a = \lambda / L, is in turn a function of the length of the antenna in the azimuth direction, L, and the carrier wavelength, \lambda. The resolution then becomes simply the azimuth swath width, S_a, at a given slant range, R_s, as

r_{RAR,a} = S_a = 2 R_s \sin(\theta_a / 2) \approx R_s \theta_a   (40)

for small \theta_a. For a given carrier wavelength, the two ways to improve resolution are to use a longer antenna aperture or to shorten the slant range distance. Physical constraints on both of these variables put limitations on the practical design and operation of antennae and the achievable resolution for real aperture radar systems [20].
Using a method first proposed by Wiley in the 1950s, SAR establishes the position of targets along the azimuth by looking at the Doppler frequency shift of the returned backscatter [96]. Consider the two point scatterers, A and B, shown in figure 16(b) at equal slant range distances, R_s, from an antenna that is moving at a velocity, V. With a platform velocity of zero, the reflected frequencies at the antenna from both points A and B would be f_o = c/\lambda, the original transmitted frequency. Both of the return signals would reach the antenna at the same time due to the identical slant range distances given a platform velocity of zero. This leads to a front-to-back ambiguity between points A and B. However, if the platform has a known velocity, V, then the return signal frequencies become

f_{A,B} = f_o \pm f_D = f_o \pm \frac{2 V_D}{\lambda},   (41)

where f_D is the Doppler frequency shift, and c is the speed of light.
Figure 16: (a) The azimuth geometry of a real aperture radar (RAR). If the points A and B both fall within the azimuth swath width, S_a, of the radar beam they will not be resolved. (b) Two scatterers, A and B, at equal slant range, R_s, but opposite angular offsets produce Doppler shifts of +f_D and -f_D at the moving antenna.
V_D is the Doppler velocity for the given angular offset, \theta, of the targets and is given by

V_D = V \sin(\theta).   (42)
By frequency filtering the return signals, the SAR processor can distinguish between two point scatterers at equal slant distances based on the Doppler shifts and eliminate the front-to-back position ambiguity. This is known as unfocused SAR processing. Focused SAR processing compensates for the nonlinear phase behavior of the return signals resulting from the relative target motion during the observation interval [20]. This slight shift in target position relative to the antenna platform results in a Doppler bandwidth (as opposed to a fixed Doppler frequency) that must be accounted for in order to achieve the theoretical SAR azimuth resolution limit of

r_{SAR,a} = L/2.   (43)

3.1.2
Consequently, the magnitude of the microwave return signal can be affected by the object's local surface geometry (like the slope), texture (or roughness), and inhomogeneities (like porosity), as illustrated in figure 17. Electrical properties of the surface such as the dielectric constant, absorption, and conductivity will also affect the scattering/reflection of the signal [30], [20], [1].

Natural surfaces can be represented as the sum of large scale (low frequency) straight facets and the small scale (high frequency) surface roughness. The surface roughness can be characterized statistically in terms of the standard deviation relative to the mean of the surface and the correlation length, which is defined as the distance over which the surface heights remain correlated.
Figure 17: Illustration of how various surfaces can affect a radar return signal.
The total field scattered from the large scale facets is the superposition of the individual fields from each of the facets. The fundamental limitation of the facet scattering model is that it is only valid at high angles of incidence [30].
Two other models, the point scatterer model and the Bragg model, are used in conjunction with low angles of incidence. It is at these lower angles that the surface roughness starts to dominate the backscatter return signal. As the name insinuates, the point scatterer model represents the surface as a homogeneously distributed collection of radiating points that are uncoupled. The total backscatter is considered as the sum of the individual point scatterers. The individual responses are derived from the projection of the incident wavefront on the surface of the point scatterers. Elachi [30] presents a simplified model with a backscatter equation given by

\sigma(\theta) = N \sigma_0 \cos^2\theta,   (44)

for N point scatterers, each with a backscatter cross section (also called a radar cross section or RCS) of \sigma_0 and an angle of incidence of \theta.
In the case of the Bragg model, the backscatter is assumed to result from resonance between specific spectral components of the surface and the incident wavefront. Specifically, these are the spectral components that are integer multiples of half the signal wavelength, \lambda, given by

\Lambda = \frac{n \lambda}{2 \sin\theta}, \quad n = 1, 2, \ldots   (45)

where \theta is the angle of incidence [30]. Thus at low angles of incidence the backscatter is also partially a function of the spectral composition of the surface roughness.
Several other factors also affect the magnitude of the returned radar signal. These include absorption losses in both the atmosphere and the reflecting surface and volume scattering (or the transformation of energy into heat/vibration). The polarization (either horizontal or vertical) of the incident waveform also has an effect on the returned signal, resulting in highlighting of different surface features. As mentioned in the discussion of the Bragg scattering model, the signal frequency can have a major impact on the signal backscatter. The signal frequency is also the primary parameter that influences the extent to which a radar signal can penetrate the surface. In general, the penetration depth is linearly proportional to the signal wavelength [30].
Figure 18: Basic elements of a SAR imaging system: the SAR sensor produces complex signal data, the image formation processor (aided by motion compensation derived from SAR platform motion data) produces the SAR imagery, and the image processor and image exploitation stages extract the scene information.
3.1.3
A SAR imaging system is usually comprised of the basic elements shown in figure 18. The image exploitation block then classifies the image data based on the extracted scene information.
Image formation is accomplished by applying matched filters after both the signal data and the filters have been transformed into the frequency domain by means of fast Fourier transforms. In the range compression direction, the matched filter consists of the complex conjugate of the originally transmitted chirped signal. The matched filter used in the azimuth compression is derived from the Doppler history of the received signal and takes into account the pulse repetition frequency of the SAR system as well as sensor platform motion [14], [38]. While the image formation process illustrated in figure 19 involves 1D processing, it is not difficult to realize that the entire process can be accomplished using full 2D matched filters and fast Fourier transform processing. Various windows, such as the Hamming, Taylor, and Kaiser-Bessel windows, may be applied to the matched filters (also known as reference functions) prior to compression as a way to achieve side lobe suppression and control the peak and integrated side lobe ratios. However, this comes at the expense of reduced resolution and blurring from the resulting wider mainlobes [20], [38], [26].
3.1.4
The signal received at the antenna is the superposition of backscattering interactions between the transmitted signal and a multitude of elemental scatterers within a resolution cell, dA, on the target surface. However, even over a nominally homogeneous target, significantly different surface interactions can result due to the natural variability of surface characteristics as described in section 3.1.2. As a result, the cross section or RCS parameter, \sigma, which represents the amount of backscattering from a resolution cell, is really better described as a random variable with a given probability density function. Ideally, a SAR image would represent a collection of the mean cross section values, \bar{\sigma} = E\{\sigma_e\}, of each of the dA resolution cells. In practice, the image consists of a collection of cross section values governed by the target's surface cross section probability density function and some variation about its mean value, \bar{\sigma} [20], [30].
Figure 19: SAR image formation processing: the raw complex signal data from the SAR sensor undergo range compression (FFT and matched filter 1), range migration and patch processing, and azimuth compression (FFT and matched filter 2), where the matched filters depend on the platform velocity, PRF, and absolute range.
This variation is what makes up speckle noise in a SAR image. It manifests itself
in the image as spurious high and low intensity pixel regions (usually only 2 to 3
pixels across) distributed spatially throughout the scene. The statistical nature of
the speckle is not constant across the SAR image as it is an artifact of the scattering
properties of the various surfaces in the scene. As a result, the statistical properties
of speckle noise within an image are closely related to the properties of the various
target cross sections.
3.2
While the Forward Problem described in section 3.1.2 seeks to predict the speckle formation process using complicated models of the scattering medium and the physics of the transmitted wave, the Inverse Problem attempts to derive the radar cross section (RCS) or true reflectivity, \sigma(\theta), of the scattering surface from the resulting SAR intensity image. Despite the rather thorough understanding of speckle characteristics, there are no unique solutions for obtaining \sigma from the intensity image because of the stochastic contribution of the speckle to the image [80]. As a result, several speckle reduction and image enhancement methods have been proposed, and they can be divided into two general categories: processes implemented during the image formation stage and those applied to the image after formation. The following sections 3.2.1, 3.2.2, and 3.2.3 describe some of the traditional post-image-formation speckle reduction techniques, while section 3.2.4 provides a brief overview of the pre-image-formation resolution enhancement methods.
The generalized Gaussian analysis-by-synthesis (GGABS) model proposed in this chapter is also a post-image-formation process. While the traditional post-image-formation processes seek to eliminate speckle from the scene altogether through image smoothing processes, the goal of the GGABS model is to reduce the granularity and size of the speckle without losing the texture in the image. The resulting images take on the appearance of higher resolution image data, and yet they do not compromise the structure and texture of the original SAR image.
3.2.1 Multilook Processing
One of the traditional methods for reducing speckle is multilook processing [34]. Since the occurrence of speckle can be considered a random process, one can conceivably reduce the speckled effect through averaging. One implementation is to apply windowed averaging to the magnitude image (hence the common name of multilook averaging). While this implementation does reduce the amount of speckle and requires only one SAR image, it reduces the image resolution in both the azimuth and slant range directions as a function of the window size.
Another method of multilook processing averages multiple coherent data sets (or looks) of the same target region. These data sets might consist of several independent single-look complex SAR images of the same target region that are spatially aligned and then averaged on a pixel-by-pixel basis. The amount of speckle reduction achieved is a function of the number of independent looks, and the results are often valuable only for the limited region where the alignment of all the data sets is attainable.
A third approach exploits the SAR image formation process and tends to be the most popular. In a SAR system, the azimuth resolution is a function of the antenna length (see section 3.1.1), and it is usually several times higher than the slant range resolution, which is a function of the signal characteristics of the illumination pulses. As a result, more lines of raw data are collected in the slant range direction (or along the azimuth), yielding different sampling rates in each direction. Averaging the appropriate number of the slant range data lines along the azimuth direction
Figure 20: Multilook processing reduces the effects of speckle noise in a SAR image by exploiting the ability of SAR systems to collect higher resolution data in the azimuth direction than in the slant range direction. The higher resolution data is averaged along the azimuth to produce square pixels with reduced speckle.
achieves some speckle reduction, while at the same time producing square pixels that are easier to display. A loss of resolution still exists; however, it is constrained to only a single dimension in the SAR image. Figure 20 illustrates this most common implementation of multilook averaging.
Li et al. were able to show that similar results could be achieved by simply low-pass filtering a single-look image [69]. As a result, the multilook approach is not considered very valuable for enhancing image understanding or preserving detail.
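The averaging operation described above can be sketched in a few lines. The example below is a simplified illustration, not the exact processing chain of any particular SAR system: it averages blocks of adjacent azimuth lines of a simulated single-look intensity image and shows the expected drop in the coefficient of variation.

```python
import numpy as np

rng = np.random.default_rng(1)

def multilook(intensity, looks):
    """Average `looks` adjacent azimuth lines (axis 0) of a single-look
    intensity image, trading azimuth resolution for speckle reduction."""
    rows, cols = intensity.shape
    rows -= rows % looks  # drop any partial block at the edge
    blocks = intensity[:rows].reshape(rows // looks, looks, cols)
    return blocks.mean(axis=1)

# Hypothetical homogeneous single-look scene: exponentially distributed
# intensity (fully developed speckle) around a mean RCS of 5.0.
single_look = rng.exponential(5.0, size=(400, 100))
four_look = multilook(single_look, 4)

cv1 = single_look.std() / single_look.mean()  # ~1.0 for one look
cv4 = four_look.std() / four_look.mean()      # ~0.5, reduced by sqrt(4)
print(cv1, cv4)
```

The square-root-of-N reduction in the coefficient of variation is exactly the tradeoff the text describes: four looks halve the speckle contrast while quartering the azimuth sample count.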
3.2.2 Adaptive Filtering
The Lee filter [65], [66], and the Kuan filter [63] are two adaptive filters commonly used to reduce speckle in SAR imagery. They are based on the multiplicative model of speckle noise

$$I(x, y) = \mathrm{RCS}(x, y)\,u(x, y), \qquad (46)$$

where I is the image intensity, RCS is the radar cross section, and u is the speckle noise.
The Lee filter approximates the multiplicative noise model with a linear model that resembles a Taylor series expansion and then applies the minimum mean square error (MMSE) criterion to the linear model. The Kuan filter transforms the multiplicative model into a signal-based additive model and then in a similar fashion to the Lee filter applies the MMSE criterion to the additive model [88]. The filters have very similar forms as both vary the output by minimizing the mean squared error between the current intensity and the expected fit, which is based on local image statistics.
Neither filter requires an explicit model of the underlying image [80]. The Kuan and Lee filters use weighted linear combinations of the mean pixel intensity, $\bar{I}_w$, of a local window around the center pixel, $I_c$, to generate an output, $\hat{I}$, given by

$$\hat{I} = \bar{I}_w + k(I_c - \bar{I}_w). \qquad (47)$$
Lopes [72] generalized the filtering process by defining a set of coefficients that characterize the nature of the speckle noise and the local texture in the image. These are the coefficients of variation

$$C_u = \frac{\sigma_u}{\bar{u}}, \qquad C_{I_w} = \frac{\sigma_{I_w}}{\bar{I}_w}. \qquad (48)$$
Because the statistics, $\sigma_u$ and $\bar{u}$, of the speckle are not known explicitly, they are approximated as the standard deviation and mean of a representative (relatively large but homogeneous) subregion within the speckled image. The Lee and Kuan filter weights can then be defined in terms of the coefficients of variation as
$$k_L = 1 - \frac{C_u^2}{C_{I_w}^2} \qquad (49)$$

$$k_K = \frac{1 - C_u^2/C_{I_w}^2}{1 + C_u^2} \qquad (50)$$

$$k = \begin{cases} 0, & C_{I_w} \le C_u \\ k_L,\ k_K,\ k_{Lm}, & C_u < C_{I_w} < C_{max} \\ 1, & C_{I_w} \ge C_{max}. \end{cases} \qquad (51)$$
$C_{max}$ is a threshold that is often chosen to be the largest valued local variation coefficient (calculated over small windows) observed within the representative speckle subregion. In addition, a modified Lee weighting function was proposed that provides a smoother transition between the filter weights across all three classes. The modified Lee weight is given by

$$k_{Lm} = 1 - \exp\left(\frac{-0.1\,(C_{I_w} - C_u)}{C_{max} - C_{I_w}}\right). \qquad (52)$$
For each of these three filters (Lee, Kuan, modified Lee), as the pixel intensity variation within the current local window approaches the expected speckle variation, the output value trends toward the mean intensity, $\bar{I}_w$, of the local window. The larger the pixel intensity variation becomes within the window, the more the filter output approaches the current center pixel value, $I_c$. Figure 21 shows a plot of the weighting functions for all three filter implementations using the Lopes criteria in equation (51). The modified Lee filter provides a smooth transition through the entire range, while the standard Lee and Kuan filters result in a discontinuity at $C_{max}$ [72]. As the images in figure 22 illustrate, the processed output of all three filters is very similar. This would be expected given the similarity in the weighting functions.
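A minimal sketch of the Lee filter defined by equations (47), (49), and (51) follows. The window size and the $C_u$/$C_{max}$ values are illustrative, and the handling of window statistics (reflective padding, numerical guards) is an implementation choice rather than part of the original formulation.

```python
import numpy as np

def lee_filter(img, win=5, cu=0.45, cmax=0.705):
    """Sketch of the Lee speckle filter: output = mean + k*(Ic - mean),
    with the weight k switched on the local coefficient of variation."""
    pad = win // 2
    padded = np.pad(img, pad, mode='reflect')
    windows = np.lib.stride_tricks.sliding_window_view(padded, (win, win))
    mean = windows.mean(axis=(-2, -1))
    std = windows.std(axis=(-2, -1))
    c_iw = std / np.maximum(mean, 1e-12)  # local coefficient of variation

    k_lee = 1.0 - (cu ** 2) / np.maximum(c_iw ** 2, 1e-12)
    k = np.where(c_iw <= cu, 0.0, k_lee)  # homogeneous: output the mean
    k = np.where(c_iw >= cmax, 1.0, k)    # strong feature: keep the pixel
    return mean + k * (img - mean)

rng = np.random.default_rng(2)
scene = rng.exponential(1.0, size=(64, 64))  # speckled flat region
out = lee_filter(scene)
print(scene.std(), out.std())  # pixel variability is reduced
```

The Kuan and modified Lee variants differ only in the middle branch of the weight, so swapping equation (50) or (52) into `k_lee` yields the other two filters.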
Figure 21: Weighting functions for the Lee, Kuan, and Modified Lee speckle filters using $C_u = 0.450$ and $C_{max} = 0.705$.
Model based despeckling techniques assume prior knowledge of the statistical characteristics of the radar cross section (RCS) and use this information to generate improved estimates of the same. A common example is the maximum a posteriori (MAP) filter, which assumes that the probability density function (PDF) of the RCS is either known or can be modeled a priori [63], [80].
Figure 22: (a) The original SAR chip image HB03333 of a BMP2 from the MSTAR database. (b) Speckle reduction illustration using the Lee filter. (c) Speckle reduction illustration using the Kuan filter. (d) Speckle reduction illustration using the Modified Lee filter and a 5 × 5 window. The top left 32 × 32 pixel region of the original image was used to establish $C_u = 0.450$ and $C_{max} = 0.705$ for the modified Lee filter.
Bayes' rule expresses the a posteriori probability of the RCS, R, given the observed intensity as

$$P(R|I) = \frac{P(I|R)\,P(R)}{P(I)}, \qquad (53)$$

where I is the observed SAR image intensity that has been corrupted by speckle noise (u) as given in the multiplicative noise model in equation (46).
The logarithm of the conditional probability in (53) is then maximized by taking the partial derivative with respect to the RCS (R) and setting it to zero, resulting in

$$\left.\left[\frac{\partial \ln P(I|R)}{\partial R} + \frac{\partial \ln P(R)}{\partial R}\right]\right|_{R = R_{MAP}} = 0. \qquad (54)$$

This yields the maximum likelihood (first term) of the detected image intensity once the RCS model is given, and the maximum a priori (second term) value of the mean image intensity [71].
The Gaussian MAP filter assumes a nonstationary mean, nonstationary variance model, where the underlying scene has a Gaussian PDF,

$$P(R) = \frac{1}{\sqrt{2\pi\sigma_R^2}}\,\exp\left(\frac{-[R - \bar{R}]^2}{2\sigma_R^2}\right), \qquad (55)$$
where the mean, $\bar{R}$, and variance, $\sigma_R^2$, are estimated from the SAR image based on the local statistics of a moving window (similar to the Lee filter). Solving the maximum a priori component of equation (54) using a Gaussian PDF leads to the cubic MAP equation

$$R_{MAP}^3 - \bar{R}\,R_{MAP}^2 + \sigma_R^2\,(R_{MAP} - I) = 0, \qquad (56)$$

where the real root of the equation, $R_{MAP}$, has a value that lies between the observed intensity I and the local mean $\bar{R}$. The image filtering is achieved by replacing the window center pixel with the real solution/root of the Gaussian MAP equation. An example of a Gaussian MAP processed output image is shown in figure 23.
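The cubic MAP equation (56) can be solved directly for a single pixel. The helper below is an illustrative sketch (the local statistics are hypothetical): it finds the roots numerically and selects the real root lying between the observation and the local mean, as the text prescribes.

```python
import numpy as np

def gaussian_map_pixel(I, r_mean, r_var):
    """Solve the cubic R^3 - r_mean*R^2 + r_var*(R - I) = 0 and return the
    real root lying between the observation I and the local mean r_mean."""
    roots = np.roots([1.0, -r_mean, r_var, -r_var * I])
    real = roots[np.abs(roots.imag) < 1e-8].real
    lo, hi = min(I, r_mean), max(I, r_mean)
    inside = real[(real >= lo - 1e-9) & (real <= hi + 1e-9)]
    # Prefer a real root inside [I, r_mean]; fall back to the nearest one.
    candidates = inside if inside.size else real
    return float(candidates[np.argmin(np.abs(candidates - r_mean))])

# Hypothetical local statistics: observed pixel brighter than its window.
I, r_mean, r_var = 12.0, 8.0, 4.0
r_map = gaussian_map_pixel(I, r_mean, r_var)
print(r_map)  # lies between the local mean 8.0 and the observation 12.0
```

Applying this per-pixel solve over a moving window, with `r_mean` and `r_var` taken from the window statistics, reproduces the Gaussian MAP filtering step described above.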
The Gamma distribution has been presented as an improvement to the Gaussian-based MAP filter. It has been established [55], [57], [81] that the detected intensity in a SAR image can be described by a K-distribution for a wide variety of natural scattering scenes. Lopes et al. [72], [71] claim that the interaction of a coherent wave with a Gamma distributed surface leads to a K-distributed observation, suggesting the use of the Gamma PDF,
$$P(R) = \frac{1}{\Gamma(\alpha)}\left(\frac{\alpha}{\bar{R}}\right)^{\alpha} R^{\alpha-1}\,\exp\left(\frac{-\alpha R}{\bar{R}}\right), \qquad (57)$$

to represent the underlying scene RCS. This provides a more realistic representation of the underlying texture when looking at ecological materials in natural wooded and vegetated regions. The term $\alpha$ is a heterogeneity parameter given by

$$\alpha = \frac{1 + C_u^2}{C_{I_w}^2 - C_u^2}, \qquad (58)$$
that is approximated from the coefficients of variation of the speckle noise, $C_u$, and the filter window, $C_{I_w}$, as given in equation (48). Substituting the Gamma PDF into the Bayesian MAP equation (54) leads to the Gamma MAP filter equation

$$\alpha R_{MAP}^2 + \bar{R}\left(1 + \frac{1}{C_u^2} - \alpha\right) R_{MAP} - \frac{I\,\bar{R}}{C_u^2} = 0, \qquad (59)$$

where I is the current pixel observation and $\bar{R}$ is the mean of the local filter window. The only positive root of the Gamma MAP filter equation is

$$R_{MAP} = \frac{\left(\alpha - \frac{1}{C_u^2} - 1\right)\bar{R} + \sqrt{\bar{R}^2\left(\alpha - \frac{1}{C_u^2} - 1\right)^2 + \frac{4\alpha I \bar{R}}{C_u^2}}}{2\alpha}, \qquad (60)$$

where $\alpha$ and $C_u$ are given in equations (58) and (48) respectively. The Gamma MAP filter is a non-linear solution for the underlying scene as a function of the current observation, I, and the local a priori mean, $\bar{R}$, as dictated by the coefficients of variation, $C_u$ and $C_{I_w}$. Figure 23 shows sample images of the Gaussian and Gamma MAP filter outputs.
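Equations (58) and (60) translate directly into code. The sketch below computes the Gamma MAP estimate for one pixel given hypothetical local window statistics; as expected, the estimate falls between the local mean and the observation.

```python
import numpy as np

def gamma_map_pixel(I, r_mean, c_u, c_iw):
    """Gamma MAP estimate for one pixel: the positive root of the Gamma
    MAP quadratic, with the heterogeneity parameter alpha derived from
    the coefficients of variation."""
    alpha = (1.0 + c_u ** 2) / (c_iw ** 2 - c_u ** 2)  # heterogeneity
    b = (alpha - 1.0 / c_u ** 2 - 1.0) * r_mean
    disc = b ** 2 + 4.0 * alpha * I * r_mean / c_u ** 2
    return (b + np.sqrt(disc)) / (2.0 * alpha)         # positive root

# Hypothetical window statistics on a textured natural surface.
I, r_mean = 6.0, 4.0
c_u, c_iw = 0.45, 0.60
r_map = gamma_map_pixel(I, r_mean, c_u, c_iw)
print(r_map)  # between the local mean 4.0 and the observation 6.0
```

Note the guard implicit in equation (58): the expression is only meaningful when $C_{I_w} > C_u$, i.e. when the window varies more than pure speckle would; a full filter would fall back to the window mean otherwise.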
The correlated neighborhood model uses a similar PDF modeling and Bayesian approach with the difference being that local statistics are used to ensure the reconstruction is as smooth as possible and yet consistent with the local region.

Figure 23: (a) Subregion of the original SAR clutter image HB06211 from the MSTAR database. (b) The speckle reduction results using the Gaussian MAP filter with a 7 × 7 window. (c) The speckle reduction results using the Gamma MAP filter with a 7 × 7 window. The bottom right 50 × 50 pixel region of the original image was used to establish $C_u = 0.496$, while $C_{max} = 0.558$ for the Gamma MAP filter.

Other
derivations and combinations of these methods have also been proposed, such as the correlated neighborhood maximum likelihood and the correlated neighborhood gamma MAP methods [80]. Each method offers a different degree of speckle reduction (either in uniform regions, regions with known texture, or point targets). The selection of a window size presents a tradeoff between the amount of speckle reduction and the preservation of image detail in each of these approaches.
Enhancements to all of these adaptive filters can be achieved by taking into account the scene structure in the SAR image. The simplest way of doing this is to identify edges and varying texture regions within the scene. By choosing an appropriate filter and window size, a reconstruction can be generated that accommodates these various natural features, without over or under smoothing the content. Another approach is to apply the adaptive despeckling techniques iteratively to the same image.
Resolution enhancements beyond those achieved through the traditional post processing techniques have attracted a significant amount of interest in the SAR community. Most of the work has concentrated on the application of spectral estimation techniques to the problem of accurately estimating the power spectral density of SAR phase history data. The field of spectral estimation can be broadly subdivided into two groups: classical or Fourier methods and modern spectral estimation methods. The classical methods are non-parametric and include the Fourier based periodogram, windowed periodogram, and Blackman-Tukey spectral estimation methods. The modern methods, in contrast, use signal models to generate a spectral estimate. In this case, the problem becomes one of accurately estimating the model parameters, and as a result they are often referred to as parametric spectral estimation methods [61].
The motivation for pursuing spectral estimation techniques for the SAR application is the hope of enhancing the resulting SAR images by modifying the phase history data prior to image formation. The true power spectral density (PSD) is defined as

$$P_{xx}(f) = \sum_{k=-\infty}^{\infty} r_{xx}[k]\,\exp(-j2\pi f k), \qquad -\frac{1}{2} \le f \le \frac{1}{2}, \qquad (61)$$

for a wide sense stationary random signal, x[n], where $r_{xx}[k]$ is the autocorrelation function of x[n]. However, by noting that the limits on the summation go to infinity, one can quickly recognize that the true PSD of a signal is not realizable for real world limited duration signals. Instead, the best that can be done is to generate an accurate estimate of the PSD for a signal that is limited in scope. It is the quest for this PSD estimate that forms the basis of spectral estimation theory and practice. Simply put, the goal is to estimate the PSD of an infinite sequence (with an infinite autocorrelation function) using a finite portion of that sequence (and consequently a limited autocorrelation function).
Classical Fourier spectral estimation methods are the most widely used. They only operate on those values within a finite observation range and consider all values outside that range to be zero. The periodogram and the Blackman-Tukey spectral estimators are the most common of the classical PSD estimators, with the periodogram given by

$$\hat{P}_{PER}(f) = \frac{1}{N}\left|\sum_{n=0}^{N-1} x[n]\,\exp(-j2\pi f n)\right|^2. \qquad (62)$$

However, as with most Fourier based techniques, there is a tradeoff in this case as well between the variance and the bias or resolution of the PSD estimate.
In the case of the periodogram, the estimate average does converge to the true value as the length of the data set N grows large, but the variance does not decrease with N. As a result it is considered to be an inconsistent estimator. The averaged periodogram segments the input sequence into blocks, calculates the individual periodograms of the windowed blocks, and then averages them to produce the resulting estimate. This has the effect of reducing the variance of the estimate. The net result is a lower variance estimate at the cost of some spectral resolution, since the individual blocks are shorter than the full sequence [61], [32].
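The periodogram of equation (62) and its averaged variant can be demonstrated with a short simulation. The test signal and segment length below are arbitrary choices for illustration, and no taper window is applied to the blocks for brevity.

```python
import numpy as np

def periodogram(x):
    """Classical periodogram: |DFT(x)|^2 / N."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)

def averaged_periodogram(x, seg_len):
    """Segment the sequence, periodogram each block, and average --
    trading a little resolution for a large drop in variance."""
    n_seg = len(x) // seg_len
    segs = x[: n_seg * seg_len].reshape(n_seg, seg_len)
    return np.mean([periodogram(s) for s in segs], axis=0)

rng = np.random.default_rng(3)
n = 4096
t = np.arange(n)
# Hypothetical signal: one sinusoid buried in white noise.
x = np.sin(2 * np.pi * 0.2 * t) + rng.normal(0.0, 1.0, n)

p_full = periodogram(x)
p_avg = averaged_periodogram(x, 256)
# Both estimates show a peak near f = 0.2, but the noise floor of the
# averaged estimate fluctuates far less from bin to bin.
print(np.argmax(p_avg[:128]) / 256)  # peak near f = 0.2
```

Plotting `p_full` against `p_avg` makes the variance reduction obvious: the full-length periodogram's noise floor swings over an order of magnitude, while the sixteen-segment average is comparatively flat.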
The Blackman-Tukey spectral estimator [10] is also a Fourier based technique and is defined as

$$\hat{P}_{BT}(f) = \sum_{k=-M}^{M} w[k]\,\hat{r}_{xx}[k]\,\exp(-j2\pi f k), \qquad (63)$$

where $\hat{r}_{xx}[k]$ is the autocorrelation estimate of x[n], and w[k] is a windowing function called a lag window with the properties

$$0 \le w[k] \le w[0] = 1, \qquad w[-k] = w[k]. \qquad (64)$$

The Blackman-Tukey estimator exhibits the same tradeoff between variance and resolution; however, the length of the lag window, M, can be adjusted to achieve the best compromise between the two for the given application [61].
In contrast to the classical methods, modern spectral estimation techniques employ models of the PSD, which can be represented by a limited number of variables, to generate the spectral estimates. The model parameters are chosen to be consistent with the known autocorrelation values derived from a finite signal. In general, these models do not put any constraints on the values of the sequence outside the observation range. As a result, the models can be used to extrapolate the PSD sequence outside the observed range. If the model is accurate, then the theory implies that expanding the PSD sequence through extrapolation will lead to higher spatial resolution (superresolution) in the image. However, all of this relies on the selection of an accurate model that does not artificially introduce artifacts into the PSD estimate that are not part of the true PSD. Very little guidance exists on the appropriate selection of models [61]. In addition, these modern spectral estimation techniques eliminate the window functions used in the classical Fourier methods and the distortions introduced by those windows. This alone accounts for some of the realized improvement in terms of resolution and image quality [26].
With this modeling approach, the spectral estimation problem shifts to being more of a parameter estimation problem. As a result, the broad field of statistical parameter estimation can be employed in generating accurate models of the PSD. These models are often based on a priori knowledge of the SAR signal generation process. The only drawback to modeling-based spectral estimation is the lack of a generally accepted definition of what it means to be optimal, and this in turn has led to a proliferation of models with all sorts of claims regarding efficacy.
The suite of rational transfer function models consists of the autoregressive (AR) model, the moving average (MA) model, and the autoregressive moving average (ARMA) model, which is given as

$$x[n] = -\sum_{k=1}^{p} a[k]\,x[n-k] + \sum_{k=0}^{q} b[k]\,u[n-k], \qquad (65)$$

where u[n] is a white noise input sequence. The objective is to find values for a minimum number of a[k] and b[k] parameters that allow an accurate representation of the spectrum. A closer inspection of equation (65) shows the two distinct AR and MA components in the first and second terms respectively.
The all-pole AR term is good for modeling sharp peaks in the spectrum, but it does not represent the valleys very well. In contrast, the all-zero MA term does a good job of modeling the deep valleys in the spectrum, but it is insufficient in representing sharp peaks. The combined ARMA model provides the capability to adequately model both extremes at the cost of a higher number of parameters. Choosing an appropriate model order becomes an iterative process, and while some knowledge of the process being modeled can be helpful, the final choice is typically derived experimentally [61]. Of these three, the AR models remain the most popular, as the model parameters can be found by solving a series of linear equations, known as the Yule-Walker equations. The Levinson algorithm [67] further simplifies the establishment of a suitable AR model by providing an efficient method for recursively solving these equations.
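A sketch of the Levinson recursion solving the Yule-Walker equations follows. The AR(2) process and its coefficients are hypothetical, and the sign convention assumed is the monic prediction-error filter $A(z) = 1 + \sum_k a[k]\,z^{-k}$, matching equation (65).

```python
import numpy as np

def levinson_durbin(r, order):
    """Recursively solve the Yule-Walker equations for AR coefficients
    given autocorrelation values r[0..order] (Levinson's algorithm)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # prediction error power
    for m in range(1, order + 1):
        # Reflection coefficient for this recursion step.
        k = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err
        a[1:m] = a[1:m] + k * a[m - 1:0:-1]
        a[m] = k
        err *= (1.0 - k * k)
    return a, err

# Hypothetical AR(2) process: x[n] = 0.75 x[n-1] - 0.5 x[n-2] + u[n].
rng = np.random.default_rng(4)
n = 200_000
x = np.zeros(n)
u = rng.normal(0.0, 1.0, n)
for i in range(2, n):
    x[i] = 0.75 * x[i - 1] - 0.5 * x[i - 2] + u[i]

# Biased autocorrelation estimate from the finite record.
r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(3)])
a, err = levinson_durbin(r, 2)
print(a)  # close to [1, -0.75, 0.5] under the monic convention
```

Each recursion step costs O(m) work, so the full order-p solve is O(p²) rather than the O(p³) of a general linear solver, which is why the Levinson algorithm is the standard route to AR spectral estimates.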
While the last few paragraphs have focused on some of the more traditional spectral estimation processes, DeGraaf [26] provides an excellent review and comparison of some of the latest spectral estimation techniques and models. Among the methods compared, some reduce speckle by using what is equivalent to a series of fixed filters, but at the expense of resolution and higher sidelobe distortion. Both the MVM and RRMVM methods reduced sidelobe distortion while reducing speckle effects by using adaptive filters. However, the resolution improvement is dependent to some extent on the nature of the image content, and computation complexity starts to become a concern. The AR methods, the EV/MUSIC processes, and the Pisarenko algorithm all produce good resolution images with reduced speckle and limited sidelobe interference. Again, the extent of the goodness of these results is somewhat dependent on the nature of the image scene content, and these methods are computationally burdensome [26].
In addition there are multiple variations to these models that allow them to perform better for certain SAR implementations [97], [70], [46]. One thing that most of them tend to have in common is the goal of trying to produce images that appear to have been obtained from higher resolution sensor hardware while reducing the impact of the speckle noise.
3.3 The GGABS Approach
The approach introduced here does not attempt to derive quality improvement through power spectrum modeling like the superresolution schemes, nor does it attempt to suppress noise based on statistical characteristics as some of the filtering and denoising schemes attempt to do. Rather, the approach taken here is to provide a decomposition where texture, object and terrain features, and speckle can be decoupled (to some extent), modified, and reconstituted, the idea being that with these components isolated, modifications can be tuned to make the enhanced images approximate SAR images of higher resolution.
This is important because SAR images are routinely reviewed by image analysts trained to recognize specific threats. Thus, for them, preserving the familiar appearance of the images is an important property, a property not present in many enhancement and superresolution techniques. The role of experience in image interpretation is summed up as follows.

Ultimately, image understanding is rooted in experience and empiricism, both as regards our broad understanding of how the world is and in experimental findings that guide our perceptions of what aspects of the image should be examined. (pg. 81)
Another measure of effectiveness for the SAR application is automatic target recognition (ATR) performance improvement. ATR performance is significant, because automated screening is often desired as a way to handle massive amounts of acquired SAR images and to assist in detecting threats and identifying areas of interest. ATR performance generally improves with higher resolution SAR imagery [83]. Thus, the extent to which an enhancement algorithm can improve ATR performance can be viewed as a measure of its value.
To facilitate control over the speckle texture, the Gaussian waveforms in the GGABS model are biased to have sizes approximately equal to or less than the size of the speckle grains. The low frequency image component and the accumulated Gaussians are recombined as given in equation (66), where a weighting parameter determines the relative contribution of the two components. A block diagram of the algorithm is given in figure 24. For this particular application, a 9 × 9 median filter is used to perform this low frequency extraction, although a more sophisticated denoising filter could be applied if the need called for it.
[Figure 24: Block diagram of the GGABS enhancement algorithm. The input image i[n1, n2] passes through a denoised image extractor to obtain the low frequency component, which is subtracted to leave a residual. Best-fit Gaussians are extracted from the residual, modified in both the analysis and synthesis stages, accumulated, and recombined with the low frequency component to produce the enhanced output ĩ[n1, n2].]
3.4 Applying the GGABS Model

Applying the GGABS model to the SAR application requires defining the parameter extraction search, the parameter modifications, and the stop condition for the analysis-by-synthesis process. All of these topics are discussed in the following sections.
3.4.1 Parameter Extraction
The GGABS model's search objective for the SAR application is to find generalized Gaussian functions that best represent the speckle in the image. This is accomplished by targeting the higher amplitude, very narrow acircular peaks that characterize the speckle using a peak-picking based search method. While more sophisticated search routines that navigate the parameter space in some optimal way can be employed, for this application, subjective quality is a major motivation. Tying the search process directly to the spatial characteristics of the speckle in a simple yet pragmatic way ensures that the resulting extractions are directly tied to the motivation, and not subject to some arbitrary optimization constraint. While this might not be the most efficient process, it is shown to be sufficient for achieving the desired results.
Reasoning suggests that the largest amplitude peak in the residual image represents a large portion of the overall energy in the image. By extracting this peak first, we can be fairly certain that the residual error is decreased appreciably. While the separable formulation of the 2D generalized Gaussian has what might appear to be some major drawbacks in terms of a circularly smooth representation (see section 2.2), the combination of the successive approximation of the ABS and the very narrow nature of the speckle peaks being modeled makes these negative effects somewhat inconsequential.
A further benefit of the separable formulation is that once a peak has been identified, the remaining lobe width and shape analysis can be performed independently along each axis. This further reduces the computational complexity of the search. The generic search algorithm works as follows:

1. Find the largest peak in the current residual image.

2. Determine the spatial position and maximum value of the peak and store these as the weight ($A_l$) and position ($\mu_{1,l}$, $\mu_{2,l}$) of the current extracted function. This immediately fixes three of the generalized Gaussian parameters and reduces the parameter search space under consideration.

3. Search outward along each axis in both directions and identify candidate lobe width ($\sigma$) and shape ($\beta$) parameters that fall within the expected range of values for speckle.

4. For each axis choose the smallest parameter values and store them as the actual generalized Gaussian lobe width ($\sigma_{1,l}$, $\sigma_{2,l}$) and shape parameters ($\beta_{1,l}$, $\beta_{2,l}$).
The reason for choosing the smallest lobe width and shape parameters is to ensure that the extracted function fits (as much as possible) within the shape of the peak it is modeling. The search criterion for identifying candidate lobe width values entails finding image pixel intensity values that correspond to the associated Gaussian standard deviation values (approximately 0.6 times the peak value, $A_l$) by searching outwardly from the current peak along the row or column paths. Once a pixel is identified that meets this criterion, the spatial offset between the peak location and this pixel is recorded as a candidate lobe width. If the search condition is not met prior to encountering a local minimum, then the local minimum location is used instead. Again the smallest value in each axis direction becomes the extracted Gaussian lobe width. Once the lobe widths in each direction have been identified, the shape parameters, $\beta_1$ and $\beta_2$, are chosen to minimize the error between the generalized Gaussian model and the speckle at that spatial position.
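The search steps above can be sketched in one dimension. The sketch below simplifies the method: it extracts a plain (rather than generalized) Gaussian, uses the 0.6-of-peak criterion with the local-minimum fallback for the lobe width, and omits the shape-parameter fit; the test signal is hypothetical.

```python
import numpy as np

def extract_gg_peak(residual, cu_frac=0.6):
    """One iteration of the peak-picking search (1D sketch): locate the
    largest peak, record its amplitude and position, then walk outward to
    the sample whose value drops to ~0.6 * peak (or to a local minimum)
    and use the smaller offset as the lobe width."""
    pos = int(np.argmax(residual))
    amp = residual[pos]
    target = cu_frac * amp

    widths = []
    for step in (-1, 1):  # search both directions along the axis
        i = pos
        while 0 < i + step < len(residual) - 1:
            nxt = residual[i + step]
            if nxt <= target or nxt > residual[i]:  # criterion or local min
                break
            i += step
        widths.append(max(abs(i + step - pos), 1))
    sigma = min(widths)  # keep the smallest candidate width

    # Subtract the extracted Gaussian from the residual.
    n = np.arange(len(residual))
    gauss = amp * np.exp(-0.5 * ((n - pos) / sigma) ** 2)
    return residual - gauss, (amp, pos, sigma)

# Hypothetical residual: two narrow speckle-like peaks.
n = np.arange(128)
residual = (3.0 * np.exp(-0.5 * ((n - 40) / 2.0) ** 2)
            + 1.5 * np.exp(-0.5 * ((n - 90) / 3.0) ** 2))
residual, (amp, pos, sigma) = extract_gg_peak(residual)
print(amp, pos, sigma)  # the dominant peak at sample 40 is extracted first
```

Iterating this extract-and-subtract loop, and performing the walk along both image axes, is the essence of the successive-approximation ABS search; the second call would pick up the smaller peak near sample 90.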
3.4.2 Parameter Modification
In general, the packing density of the synthesis Gaussians and resulting speckle in the output image is controlled by the size of the analysis function. An analysis Gaussian with lobe widths that are some fraction smaller than the extracted function will generate residual lobes during the subtraction process. These residual lobes are then modeled at some later iteration by the ABS using smaller Gaussian functions that are spatially near the originally extracted Gaussian. In effect, shrinking the analysis Gaussians forces the model to use more Gaussians to approximate the speckle. The result is an image with effectively higher speckle density.
The synthesis Gaussians tend to dictate the size and spatial extent of the speckle granularity in the final image. Accumulating synthesis functions, which are reduced lobe width versions of the extracted functions, will result in a reconstructed image with narrower speckle. This effect can be further enhanced by reducing the shape parameters, which sharpens the synthesis Gaussians. The result is an image where the spatial extent of the speckle has been reduced.

By operating the GGABS model in the decoupled mode and carefully modifying the Gaussian parameters in both the analysis and synthesis stages, one can effectively control the nature of the speckle. This results in a synthesis image that effectively has finer granularity and higher speckle density.
3.5 SAR Results
The two relevant indicators of the generalized Gaussian model's effectiveness for the SAR application are subjective quality assessment and automatic target recognition (ATR) performance improvement. In addition, some insight into the uniqueness of this model and approach can be gained by comparing it to some of the traditional despeckling methods discussed in section 3.2. The subjective quality between processed and unprocessed SAR image scenes was assessed using the formal stimulus-comparison categorical judgement scale as defined in the CCIR Recommendation 500-4 [2]. In a similar fashion, ATR performance improvement was evaluated by comparing the classification results between a set of original unprocessed images and those that had been resolution enhanced by the generalized Gaussian model. The evaluations showed gains in both subjective quality and ATR performance after the images had been processed using the generalized Gaussian model.
A set of general urban image scenes was used during the subjective evaluation, while the public MSTAR database was used in the ATR performance testing. Both sets of images were processed by the generalized Gaussian model in the decoupled mode using a density based stop condition of 75% to terminate the analysis-by-synthesis process.
3.5.1 Subjective Validation
Resolution enhancement was performed on a set of urban scene SAR clutter images. These enhanced images were viewed by 7 different assessors and rated in terms of resolution improvement over the original SAR images using the CCIR Recommendation 500-4 stimulus-comparison subjective test. In this method the assessor is shown a test and reference image and is permitted to switch freely between the two images until a relative grade for the test image is established.
Table 1: Subjective analysis mean opinion score for the top five images (Image 1 through Image 5).
Four one-meter resolution images were used to generate the 20 test images. These test images were generated to have samples distributed throughout the scoring range in terms of enhancement (i.e. they included samples of good and bad enhancement), and they represented a subset of possible reconstruction configurations of the generalized Gaussian ABS model. The test images were presented in random order. The viewing distance was 4 times the image height.
The highest scoring image had a mean opinion score of 2.07. The majority of the images (80%) had a mean score greater than zero, which is parity. The average of the mean scores for the top ten images was 1.41. The results for the five best scoring images are shown in table 1.
These results clearly show that resolution enhancement by means of the generalized Gaussian model does provide significant subjective improvement over the originals. Examples of an original SAR clutter image, a resolution enhanced version of this image, and an image filtered by the Lee filter and a 5 × 5 window are shown in figures 25, 26, and 27. The higher density and finer granularity speckle is clearly evident in the resolution enhanced image, while edge and feature definition associated with the original have been preserved. In contrast, the Lee filtered image is clearly smoothed and much of the texture in the image has been removed.
Figure 26: Resolution enhanced SAR clutter image with modification factors of 0.8, 0.4, 0.6, and 0.4 for the analysis lobe width, synthesis lobe width, analysis shape, and synthesis shape parameters respectively.

Figure 27: SAR clutter image processed using the Lee filter with a 5 × 5 window.
3.5.2 ATR Performance Validation
The public MSTAR database, which contains both training and testing images of the T-72, the BMP-2, and the BTR-70, was used to evaluate the effectiveness of the resolution enhancement for ATR performance. In order to establish a benchmark for comparison, a higher order neural network (HONN) ATR algorithm [52] was trained and tested on the database of images without any resolution enhancement. The same set of images were then processed using the generalized Gaussian model and evaluated by the same ATR algorithm. The generalized Gaussian model parameter modification configuration is given in table 2. In this application the best configuration was established through an empirical process that systematically adjusted the model parameters and observed the classification output. Configurations that improved overall results were kept and recorded, while those that decreased performance were discarded.
Table 2: Multiplication factors used to modify the analysis and synthesis Gaussians that resulted in the best performance for the SAR ATR application.

Gamma: 0.6
Analysis (multiplication factors): 0.6, 0.6
Synthesis (multiplication factors): 0.3, 0.8
The results given in table 3 show the classification performance improved significantly for the BMP2 targets, while the T72 targets achieved a 100% correct classification score. The BTR70 classification results improved by 1.4% with the resolution enhanced data. The overall correct classification rate of 97.2% is the highest reported score to date for this ATR algorithm using the 22.5 degree MSTAR target chip data set.
3.5.3 Comparison with Traditional Despeckling Methods
The primary goal of the traditional despeckling techniques discussed in section 3.2
is to eliminate the speckle from the image by means of various statistically based
Table 3: Classification results for the 22.5 degree MSTAR target chip data classified
by the ATR process with and without resolution enhancement.

                  Without Enhancement          Resolution Enhanced
Test Images       T72     BMP2    BTR70        T72     BMP2    BTR70
582 T72           99.7%   0.15%   0.15%        100%    0%      0%
587 BMP2          1.5%    90.3%   8.2%         0.5%    94.4%   5.1%
196 BTR70         2.7%    1.4%    95.9%        2.0%    0.7%    97.3%
% correct                 95.3%                        97.2%
filtering processes. The objective of the GGABS, however, is to modify (not eliminate)
the existing speckle in such a way that the resulting image assumes the appearance
of higher resolution with smaller speckle while retaining the inherent structure of the
scene. As a result, the images generated by the GGABS process are very different
in appearance from the images produced by the traditional speckle reduction filters.
Comparing and contrasting the effectiveness of these methods is really a
function of the measure used and the attributes of the images that contribute to that
measure.
In the world of image compression, the peak signal-to-noise ratio (PSNR) is an
established measure that has been used extensively. It is easy to compute but has
the drawback that it often does not correlate well with subjective quality. In the
world of SAR speckle suppression, the effective number of looks (ENL) [40] measure
plays a similar role. Consequently, the ENL is examined as a performance measure
for the purposes of comparison.
The ENL is a simple statistical measure given by

ENL = (μ/σ)²,  (67)

where μ and σ are the mean and standard deviation, respectively, of the intensity values
within a relatively uniform region in terms of the image texture.
The assumption
is that the intensity variation observed in this uniform region is due exclusively to
the speckle noise present and not the result of underlying image structure or image
content. The ENL value then gives a result comparable to the effective number of looks that would be required in the
multi-look speckle reduction method. A closer look at the measure, however,
clearly indicates that it simply quantifies the reduction in pixel intensity variability
as a ratio of the mean. This essentially equates to measuring the smoothness of the
image and not the resolution of the image. Figure 28 shows the ENL performance on
two images of the exact same image scene but with different resolutions. A 32 × 32
region in the bottom left corner of each image was used to calculate the measure. The
original resolution image (1 m) had an ENL value of 2.5996, while the low resolution
image (2 m) yielded a much higher ENL value of 7.4372. The higher ENL value would
indicate better speckle suppression performance; however, the higher resolution image
clearly has better feature and edge definition.
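The ENL computation of equation (67) is straightforward to sketch in code. The speckle-like image and patch location below are synthetic stand-ins (not MSTAR data), chosen so that a gamma-distributed patch with shape 4 should score near ENL = 4.

```python
import numpy as np

def enl(region):
    """Effective number of looks: (mean / standard deviation)^2 of the
    intensity values in a (nominally uniform) region."""
    return (region.mean() / region.std()) ** 2

# Synthetic speckle-like image: for gamma-distributed intensities with
# shape L, the ENL estimate should come out near L.
rng = np.random.default_rng(0)
image = rng.gamma(shape=4.0, scale=1.0, size=(128, 128))
patch = image[-32:, :32]   # a 32 x 32 region in the bottom-left corner
print(round(enl(patch), 2))
```

A smoother patch (lower σ relative to μ) drives the ENL up, which is exactly the smoothness-versus-resolution behavior described above.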
Results were generated using the ENL as a criterion for the Lee filter, the enhanced
Lee filter, the Kuan filter, the enhanced Kuan filter, and the Gamma MAP filter.
A uniform region within the SAR clutter image shown in figure 29 was selected
to generate the ENL values for the different filters with various window sizes. In
addition, the GGABS model was used to process the same region of the test image
using a couple of different parameter configurations. In order to generate the best
possible results for the GGABS, a dynamic optimization approach was devised to
choose the Gaussian parameter values that yielded the highest ENL value. The
search proceeded as follows:
Figure 28: The HB03424.0015 SAR chip image from the MSTAR database of a T72
tank shown in (a) the original resolution (1 m) with an ENL = 2.5996 and (b) a low
resolution (2 m) version with an ENL = 7.4372. This illustrates the tendency of the
ENL to measure image smoothness and not resolution.
2. Process the image region and calculate the ENL for the baseline case.
3. For the single Gaussian parameter, increase the value by 0.1 while keeping all
the other values constant, and recalculate the ENL value.
4. Continue to increment the Gaussian parameter value by 0.1 if the ENL increases;
otherwise, decrement the Gaussian parameter value by 0.1. Recalculate the new
ENL value and repeat this entire step.
5. After a fixed number of iterations of the previous step, stop the process and
determine whether a local maximum has been reached in terms of the ENL criterion.
6. If not at a local maximum, repeat step 4.
7. If truly at a local maximum, record the Gaussian parameter value, and then
reset it to the baseline value of 1.
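The steps above amount to a coordinate-wise hill climb, which can be sketched as follows. Here `evaluate` is a placeholder for the full GGABS-processing-plus-ENL computation (not reproduced here), and the toy objective in the demo simply peaks at 1.4; only the ±0.1 step size matches the procedure.

```python
def hill_climb_param(params, name, evaluate, step=0.1, max_iters=20):
    """Coordinate search over a single Gaussian parameter: move in +/-0.1
    steps, keep moves that raise the ENL, and stop at a local maximum."""
    best = evaluate(params)          # step 2: baseline score
    direction = step                 # step 3: try increasing first
    reversed_once = False
    for _ in range(max_iters):       # steps 4-6
        trial = dict(params, **{name: round(params[name] + direction, 10)})
        score = evaluate(trial)
        if score > best:
            params, best = trial, score      # keep the improvement
        elif not reversed_once:
            direction, reversed_once = -direction, True  # reverse once
        else:
            break                    # neither direction improves: local max
    return params, best              # step 7: record the value

# Toy objective standing in for the GGABS + ENL pipeline.
best_params, best_score = hill_climb_param(
    {"alpha": 1.0}, "alpha", lambda p: -(p["alpha"] - 1.4) ** 2)
print(best_params)
```

In the full procedure this search is run for each Gaussian parameter in turn, with the others held at their baseline value of 1.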
Figure 29: (a) Original SAR clutter image and (b) the region of this image used
to evaluate the ENL for the various traditional despeckling filters and the GGABS
model.
Table 4: Resulting parameter and ENL values for the GGABS model when optimizing
the ENL for each parameter independently.

Parameter             Value                            ENL
Analysis Multiplier   0.9                              2.4847
Synthesis Multiplier  the larger, the higher the ENL
Analysis Alpha        1                                2.4297
Synthesis Alpha       1.4                              2.5167
Density               1.1                              2.6608
The synthesis multiplier had a direct and very strong correlation with the ENL value. However, continuous
adjustment of this parameter led to image reconstructions that were meaningless in
terms of image content. The best value found for each of the remaining parameters was recorded during
the experiments. The GGABS model was then configured with these values as the
starting point, and steps 3-8 in the search routine were repeated to arrive at the final
set of optimal Gaussian parameter values based on the ENL. These parameter values,
along with the associated ENL values, are given in table 5 along with the baseline
ENL value and an example of how the synthesis multiplier directly impacts the ENL
value.
The ENL results of processing the same region using the traditional despeckling
filters and various window sizes are given in table 6. By examining the results of tables
5 and 6, it is clear that the GGABS model does a poor job of eliminating the speckle
as measured by the ENL.
3.6
Table 5: Resulting parameter and ENL values for the GGABS model when using
the values from table 4 as the initial starting points for the optimization over ENL.
The first row gives the baseline results, while the last row shows the effect of the
synthesis multiplier on the ENL value.

Analysis    Synthesis   Analysis  Synthesis  Density  ENL
Multiplier  Multiplier  Alpha     Alpha
1           1           1         1          1        2.4182
0.9         1           1         1.4        1.1      2.8110
0.9         1           1         1.1        1.1      2.8586
0.9         1           1.4       1.4        1.1      2.8898
0.9         1           1.6       1.5        1.1      2.8990
1           3           1         1.4        1.1      4.3838
Table 6: The ENL results of processing the image region from figure 29 using the
different traditional despeckling filters with various window sizes.

Filter
Lee
Enhanced Lee
Kuan
Enhanced Kuan
Gamma MAP
GGABS model was configured to operate in the decoupled mode using a basic peak-picking
approach for extracting the Gaussian parameters. The GGABS processing
results were evaluated subjectively by means of the CCIR Recommendation 500-4
stimulus-comparison subjective test, as well as quantitatively in terms of ATR performance
improvement. The results in both cases showed significant improvement over
the unprocessed SAR images. Finally, the GGABS model was compared with some
of the traditional speckle reduction filters by means of the effective number of looks
(ENL) measure. However, as the results indicate, the GGABS model was not intended
to suppress speckle in the traditional sense. Rather, it provides a form of
post processing on SAR data where the speckle could be decoupled from the structure
in the image, modified in some meaningful way, and then resynthesized with the
appearance of an image of higher resolution.
The ability to tune the Gaussian functions to model the speckle in the images
provides a significant amount of flexibility, not only when extracting the speckle
but also when implementing meaningful modifications. Again, the ABS process provided a
framework for iteratively extracting the speckle in an intuitive and straightforward
manner. These key aspects of the generalized Gaussian ABS model make it ideally
suited to SAR image data for the purpose of achieving resolution enhancement.
CHAPTER IV
This chapter presents a unique application of the generalized Gaussian analysis-by-synthesis
model.
4.1
Introduction
The Defense Advanced Research Projects Agency (DARPA) is seeking to develop the
next generation of combat systems for the US Army through the Future Combat Systems
(FCS) program [25]. Its goal is to develop concepts and designs for a network of
lethal manned and unmanned units that are able to provide mobile-networked command,
control, communication, and computer capabilities on the battlefield. These
functions would support multi-mission objectives including, but not limited to, adverse-weather
reconnaissance, surveillance, and targeting and acquisition [5].
A key component to ensuring battlefield superiority is the ability to achieve information
and intelligence domination. This requires the combination of computational
and cognitive data from a variety of sensors and processes, including Automatic Target
Recognition (ATR). However, when assessing the battlefield space and making
command decisions, it is important to place more emphasis on the information from
those subsystems with higher reliability than on the information from the subsystems
with lower reliability. A metric is therefore needed that indicates how
well the individual subsystems perform at the desired tasks given the current inputs.
The resulting metric values could provide commanders with a confidence indication
regarding the outputs from the various subsystems when making command and control
decisions. In the case of ATR systems, a simple metric that correlates well with
performance could be used to assess the ATR contributions to the overall decision
process. This could also serve as a predictor of images that might suffer from potential
target misclassifications in ATR and thus provide an opportunity to implement
an alternative classification process.
A clutter complexity measure could also aid in the development of future ATR
algorithms by enabling researchers to benchmark the complexity of image databases.
Currently, most ATR systems are developed around a single database, and good
performance across a broad variety of databases cannot be assumed. The disparity
across the image databases, which makes some more difficult than others, is the
result of a number of factors including different target types, varying environmental
conditions, and varying levels of clutter.
4.2
One of the main obstacles to comprehending and predicting ATR performance is understanding
the effect that clutter has on the detection process. The characterization
and quantification of clutter in infra-red (IR) ATR has remained an ambiguous task
because of its multifaceted nature. While sensor characteristics can be determined
by measurements in a lab, and man-made (and even some natural) targets can be
quantified by their shape and material compositions, clutter is not as easily defined.
The U.S. Army once described clutter as spurious or extraneous indications that
"can cause the sensor to respond as though a target were present when it is not, can
cause the sensor not to respond when a target is present, or can cause the location
of the target to be sensed with substantial error" [4]. The difficulty lies in that the
real world contains an abundance of indications that are "spurious or extraneous,"
and defining them, even in broad terms, is no easy task.
In IR imagery, the manifestations of clutter result from an innumerable quantity
of sources. Consider all of the potential image scene configurations from a seemingly
infinite array of terrain and target characteristics and geometries that can yield an
incredible amount of very diverse image content and clutter. Other system related
variables that can also impact clutter (albeit to a lesser extent) include the various
target and sensor geometries, such as the target aspect angles, ranges, and poses and
the sensor elevations. All of these factors directly impact image formation and the extent of the clutter that can be a confusor
when detecting a target's IR signature. This can lead to potential misclassifications
by the ATR algorithms. Other examples of clutter include the instance where a hot
target leaves a temperature "shadow" on a cool background, even after the target is
gone, and the case where vehicle exhaust combines with dust to produce a signature
cloud that resembles a target [64]. These simple examples just scratch the surface
of the multitude of possible manifestations of clutter that can lead to poor ATR
performance.
In addition, special attention must be given to ambiguous situations like images
with poor contrast, images of a single target among other non-target traffic, images
with multiple target types, and images that contain high target densities. Each of
these scenarios presents a different view on how one might define clutter complexity.
4.3
The objective of the clutter complexity measure is not to predict the exact performance
of a specific ATR system, but rather to place outer bounds on the ATR
performance potential for a given image. As such, an image with a low clutter complexity
score would suggest that, on average, ATR algorithms will perform with a low
rate of false alarms on the given image. In contrast, a high image clutter complexity
score would indicate that no ATR algorithm will achieve a low false alarm rate.
In addition, the functional requirements of the Future Combat Systems application
dictate that the computational burden must be relatively low.
Kaplan developed a general approach for measuring the clutter complexity of a
FLIR image that collapses several image derived features into one value that correlates
with ATR performance bounds. This involved (1) partitioning the image database to
minimize all other attributes of the images except for the clutter, (2) evaluating the
ground truth clutter complexity by means of the ATR bounds, (3) training a clutter
complexity measure as the weighted sum of the image derived features such that it
correlated with ground truth over the respective database partition, (4) validating
the measure against both real ATR performance results and the ATR performance
bounds, and (5) refining the measure through the evaluation of the effectiveness of
Figure 30: Block diagram of the general approach for generating a FLIR clutter
complexity measure that correlates with ATR performance bounds: partition database,
establish ground truth, train clutter complexity measure, validate measure, refine measure.
the image features being extracted [60].
The bulk of the development and testing of the measure was done using the COMANCHE
FLIR image database; however, it is not possible to include those results
in public documents because of the restricted classification of the data. As a result,
a smaller non-restricted database of FLIR imagery was used to generate the
results presented in this chapter. In order to isolate the relationship between ATR performance
and clutter complexity, it is necessary to hold constant other variables such
as sensor fidelity and target/sensor geometries. This was accomplished by developing
the measure over small partitions of the image database, where the target types,
poses, aspect angles, number of pixels on target, etc. are held relatively fixed. As
much as was possible, the only attribute that was allowed to vary within the database
partition was the extent of the clutter.
4.3.2
Ideally, the clutter complexity ground truth would correlate perfectly with ATR performance
bounds for each image in the partition. Unfortunately, a method
for determining bounds that exactly predicts ATR performance independent of the
ATR structure has not yet been developed. An alternative is to use a finely tuned
ATR and exploit a priori information of the targets to estimate the ATR performance
bound. This is done by training the ATR algorithm using the actual targets in the
scene and adjusting the algorithm parameters to optimize performance.
The wavelet-based learning vector quantization (LVQ) ATR algorithm developed
by the U.S. Army Research Laboratory (ARL) [17], which is one of the top performing
FLIR ATR algorithms [68], was used in this application. The
algorithm computes and compares three subbands of the target chip with a set of
target codebooks, which have been computed off-line for a range of poses. A target
and pose hypothesis is generated by choosing the target codebook with the smallest
weighted mean squared error (MSE) across the three subbands.
In order to establish the upper performance bound of the ATR algorithm for
a given target, an optimal target template is generated. This template
simply consists of the best vector of subbands for this specific target and its given
pose. The correlation of the optimal template at all possible locations in an image
scene is compared to a threshold to determine potential targets. The threshold is
normalized for each scene by making it proportional to the correlation value of the
optimal template with the actual target in the scene. Since the other
parameters associated with the scene (such as geometry and aspect angle) have been
held constant in the database partition, the total number of false alarms that result
Table 7: List of statistical based image processing features used in the FLIR clutter
complexity measure.

Feature Name                Description
fBm Hurst Parameter         Texture roughness
Standard Deviation          Global standard deviation
Schmieder Weathersby        Average local standard deviation
Homogeneity                 Average pixel variations
Energy                      Average histogram energy
Entropy                     Average histogram entropy
Target Interference Ratio   Average contrast
Outlier Ratio               Average percentage of outliers
can be used to characterize the clutter in the scene. Using this procedure, the clutter
complexity ground truth is generated for all the targets under consideration.
4.3.3
The objective of the measure is to derive a single value that represents the extent
of the clutter in the image. This is accomplished by collapsing a number of image
derived features into one value that correlates with ATR performance bounds, which
we use as the ground truth for complexity. The clutter can cause the sensor to yield
both false positives and false negatives with regard to targets in the scene. As a
result, it is important from an ATR standpoint that the derived image features take
into account the target characteristics as well as the background texture. The set
of features consists of the eight statistical based features given in table 7 and five
generalized Gaussian model derived features that are the focus of the research being
presented in this chapter. The statistical features are listed here for completeness
but will not be discussed. Further information on these features can be found in [79]
and [90].
The other five features are derived from generalized Gaussian based decompositions
of the scene. Gaussian functions were chosen for the
analysis primarily because they are local in extent. This allows the decomposition to
easily accommodate local variations and features often associated with natural image
scenes. Each feature is based on a decomposition of
the image given certain constraints. What makes each of the five features unique is
the constraint on the scales of the Gaussians, the stopping strategies, and the nature
(original or negative) of the image.
The clutter complexity measure for each image is formed as the weighted sum of the thirteen features,

C_i = \sum_{n=1}^{13} w(n) F_i(n),  (68)
where w are the weights and F_i are the image features for image i. A training
partition of the image database is used to compute the best set of feature weights.
They are selected to maximize the correlation between the clutter complexity measure
and the ground truth discussed previously. The resulting measure is the
optimal linear predictor of the ground truth false alarm count for this set of image
processing features [60].
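A minimal sketch of equation (68) and the weight fit: the feature matrix and ground-truth false-alarm counts below are synthetic stand-ins, and the "optimal linear predictor" is realized here as an ordinary least-squares fit, which is one standard way to obtain such a predictor.

```python
import numpy as np

def clutter_measure(F, w):
    """C_i = sum_n w(n) * F_i(n): weighted sum of the 13 image features."""
    return F @ w

# Synthetic training partition: 40 images x 13 features, with ground-truth
# false-alarm counts generated from hidden weights plus a little noise.
rng = np.random.default_rng(1)
F = rng.normal(size=(40, 13))
true_w = rng.normal(size=13)
ground_truth = F @ true_w + 0.01 * rng.normal(size=40)

# Least-squares fit: the optimal linear predictor of the ground truth.
w, *_ = np.linalg.lstsq(F, ground_truth, rcond=None)
scores = clutter_measure(F, w)
corr = np.corrcoef(scores, ground_truth)[0, 1]
print(round(corr, 3))
```

On real data the correlation would of course be far from perfect; the point is only the shape of the computation.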
4.3.4
Once the weights have been generated, the measure is tested over each image in the
non-training partitions of the database in two ways. In the first method, the results
of the clutter complexity measure are compared against the ground truth values (or
ATR performance bounds) for the various images. The second process compares the
clutter complexity values to actual ATR performance values, which were generated
by applying the ARL Federated Laboratory baseline FLIR detector [16]. In both
cases, the refinement process involves tuning the individual image features and weights to improve the correlation
between the ATR performance values (or bounds) and the clutter complexity values.
4.4
The primary contribution of this work to the complexity measure is the development
of five Gaussian based image features that correlate with clutter complexity. The
general underlying premise of this approach is that more Gaussians are required to
represent areas of higher image complexity than areas of lower image complexity.
Each feature is derived from the generalized Gaussian ABS decomposition of the
image and the formation of a Gaussian density profile. The profile reflects the
number of Gaussians in the decomposition that meet the specific feature's criteria.
The brighter blocks or regions in the image indicate higher values that should correspond
with regions of higher complexity in the scene. Examples of Gaussian density
profile images are given throughout this section in figures 31 through 33. The FLIR
image data used in this section to illustrate the various Gaussian density profiles was
obtained from the Computer Vision Group at Colorado State University [19].
The flexibility of the ABS process enables the model to constrain the search during
the decomposition to Gaussians that meet certain size and shape criteria. Combining
this with the ability to perform the decomposition either over the entire image
or in local regions leads to a number of possible Gaussian density profiles. From
all of these profiles, the five that best correlated with clutter complexity were retained
and used in the final measure in combination with the aforementioned statistical
features. The fundamental operational parameters of the generalized Gaussian
model that differentiate the five density profiles and associated features are the
constrained/unconstrained Gaussian analysis modes, the full-image/sub-block processing
modes, and the positive/negative input images.
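A sketch of how such a block-wise Gaussian density profile might be tallied. The (position, amplitude, lobe width) representation of the fitted Gaussians and the acceptance ranges below are illustrative assumptions, not the thesis's exact parameterization.

```python
import numpy as np

def density_profile(gaussians, shape, block=16,
                    amp_range=(0.2, 1.0), width_range=(2.0, 8.0)):
    """Count, per non-overlapping block, the extracted Gaussians whose
    amplitude and lobe width fall inside the (constrained-mode) ranges."""
    rows, cols = shape[0] // block, shape[1] // block
    profile = np.zeros((rows, cols))
    for (r, c), amp, width in gaussians:
        if amp_range[0] <= amp <= amp_range[1] and \
           width_range[0] <= width <= width_range[1]:
            profile[min(r // block, rows - 1), min(c // block, cols - 1)] += 1
    return profile

# Two in-range Gaussians in the top-left block; one too wide to count.
g = [((3, 4), 0.5, 4.0), ((10, 12), 0.8, 3.0), ((40, 40), 0.9, 20.0)]
profile = density_profile(g, (64, 64))
print(profile)
```

Displayed as an image, brighter blocks (larger counts) would then mark the regions of higher estimated complexity, as described above.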
4.4.1
The advantage of the constrained analysis mode is the ability to tune the model
to a specific set of Gaussian characteristics that closely represent the signature of a
target. The analysis is constrained
to retain only those Gaussian functions that fall within the specified amplitude and
lobe width ranges. The result is a reconstructed image of Gaussians that identifies
the spatial locations and extent of objects in the scene that are of the expected size
of potential targets. This directly addresses one of the clutter complexity objectives
by identifying the target and target-like features in the image.
However, constraining the analysis process may cause the model to ignore other
signatures that obfuscate the scene and are also definite contributors to image scene
complexity. Examples of this might include when multiple targets blend together in
the image to form a single larger signature, or when a single target is represented by
a signature consisting of multiple smaller hot spots in the image (possibly due to a
partial occlusion of the vehicle). To better accommodate these patterns in the scene,
the image can be processed in the unconstrained Gaussian analysis mode as well. In
this mode no constraints are placed on the scale of the Gaussians during the analysis
of the scene.
The density profiles generated in both the constrained and unconstrained modes
provide useful information in terms of clutter complexity. The constrained mode
provides an indication of the number of potential target-like shapes of the expected
size in the region of interest. The unconstrained mode provides similar information,
but it also models objects that are larger or smaller than the expected target. Figure
31 provides an example of the Gaussian density profile images generated using the
constrained and unconstrained modes of operation.
4.4.2
The ABS process uses image amplitude or intensity characteristics during the analysis
stage to identify and extract the Gaussians. In the global processing mode, the search
space for the next Gaussian becomes the entire image. As a result, the analysis can be
Figure 31: (a) The original FLIR image, (b) the Gaussian density profile image with
the analysis constrained to find target sized objects, and (c) the Gaussian density
profile image with the analysis unconstrained.
biased toward textured regions that contain multiple high intensity peaks. This can
cause the density based stop criterion to be satisfied, which would halt the analysis
before the model has considered all regions in the image. In particular, textured
regions with lower amplitude peaks would be excluded, and they could just as easily
contain clutter that contributes to the overall complexity.
As an alternative, the search process can be made sensitive to local attributes by
dividing the image into non-overlapping sub-blocks and processing each sub-block independently.
The stop criterion for the analysis process is applied to each sub-block
separately rather than to the image as a whole. This forces the model to consider
all regions of the image scene equally when performing the analysis. For image
scenes with fairly uniform intensity distributions, the result will be a more uniform
Gaussian density profile in the reconstruction. Sub-blocks with few suitable
Gaussian matches yield lower Gaussian densities, while sub-blocks with multiple
suitable Gaussian matches yield higher Gaussian density values.
Both the local block processing and the global processing modes of operation
provide useful information. The local method captures the finer, more local image
attributes, while the global method accommodates the larger image characteristics.
Figure 32 provides an example of an original FLIR image with the block processed
and global processed outputs.
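The sub-block mode can be sketched as follows; `analyze_block` is a stand-in for the per-block ABS analysis with its local stop criterion, replaced here by a per-block mean purely for demonstration.

```python
import numpy as np

def process_by_subblocks(image, block, analyze_block):
    """Apply an analysis independently to each non-overlapping sub-block,
    so any stop criterion is evaluated locally rather than image-wide."""
    h, w = image.shape
    out = np.zeros_like(image, dtype=float)
    for r in range(0, h - h % block, block):
        for c in range(0, w - w % block, block):
            out[r:r + block, c:c + block] = \
                analyze_block(image[r:r + block, c:c + block])
    return out

# Stand-in analysis: replace each 2x2 block by its mean (a real analysis
# would extract Gaussians until the local density criterion is met).
img = np.arange(16, dtype=float).reshape(4, 4)
result = process_by_subblocks(img, 2, lambda b: np.full_like(b, b.mean()))
print(result)
```

The key point is that each block's processing sees only its own pixels, so no bright region elsewhere in the image can starve it of analysis effort.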
4.4.3
In the FLIR application, potential targets can manifest themselves in the image as
structure with either high pixel intensities on a darker background (the case of a hot
vehicle in a cool background) or very low pixel intensities in a brighter background
(the case of the cold vehicle in a hot environment). The standard
analysis approach using peak picking is well suited to modeling targets in the original
(positive) image, since the targets are usually brighter than the background.
Figure 32: (a) The original FLIR image, (b) the Gaussian density profile image
with the analysis performed globally, and (c) the Gaussian density profile image with
the analysis performed on independent sub-blocks of the image.
The second scenario, where the targets are darker than the background, is the inverse
problem. An easy way to accommodate this situation is to first generate the negative
version of the original image. The dark target-like features in the original image will
appear bright against a dark background in the negative image. The same standard
analysis approach that is used on the original (positive) image can now be used on
the negative image to extract target-like features. In this way, the darker features in
the image that may contribute to the complexity of the scene are included in the
overall complexity measure by way of the negative image based Gaussian density
profile.
Figure 33 illustrates how processing the positive and negative images will
yield different but complementary results. Processing the positive image results in a
density profile that highlights the "hot" target, while processing the negative image
yields a density profile that includes the "cold" target.
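The negative-image step is a simple intensity inversion; a minimal sketch with a toy 2 × 2 image:

```python
import numpy as np

def negative(image):
    """Invert intensities so dark (cold-target) features become bright
    peaks that the standard peak-picking analysis can extract."""
    return image.max() - image

img = np.array([[0.9, 0.9],
                [0.9, 0.1]])   # cold spot at (1, 1)
neg = negative(img)
print(neg)   # the cold spot is now the brightest pixel
```

The same ABS analysis then runs unchanged on `neg`, and its density profile captures the features that the positive-image pass missed.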
4.4.4
The generalized Gaussian ABS model can be configured using the various modes
presented above, but there are a number of parameters that must be selected for each
mode configuration. For example, consider all the possibilities for choosing the block
size when in the block mode, or choosing the limits on the lobe widths when operating
in the constrained mode. The number of possible parameter and mode combinations
is quite extensive. The computational complexity would quickly become unacceptable
if some form of input selection and reduction were not implemented. As a result,
a neural network sensitivity analysis approach has been adopted to identify mode
configurations and associated parameter settings that provide the best correlation
with scene complexity.
Sensitivity analysis is a common tool in the field of neural network design that
describes how changes to the input variables affect the output variables. It is based
on the concept of pattern informativeness, which states that an input pattern with
little or no influence on the outputs is uninformative.
Figure 33: (a) The original FLIR image, (b) the negative FLIR image, (c) the
constrained Gaussian density profile image computed on the original image, and (d) the
constrained Gaussian density profile image computed on the negative image.
Figure 34: A neural network block diagram with input variables x_i and output variables
y_j, and its corresponding gradient model with input variables x_i and derivative
output variables dy_j/dx_i.
In contrast, an
informative input pattern has a direct influence on the outputs [31]. By observing
the corresponding changes in the output values, y_j, to variations in the input values,
x_i, one can establish the input/output sensitivity relationships for the given neural
network. In other words, the relative importance of an input variable corresponds
to the average size of the output gradient, dy_j/dx_i, over the range of
input values.
This forms the basis of a gradient model, which is generated from the underlying
estimation or classification neural network. The outputs of the gradient model are the
derivatives of the neural network outputs with respect to each of the input variables,
dy_j/dx_i, evaluated for a given set of input values. For a neural network with n inputs and
m outputs, the corresponding gradient model takes the same n inputs and generates
n × m outputs, as shown in figure 34. To establish the overall contributions of each
input variable, one can measure the average of the squared gradient output values
over the entire data set. This yields the sensitivity matrix

S = \frac{1}{N} \sum_{p=1}^{N}
\begin{bmatrix}
\left(\frac{dy_1}{dx_1}(p)\right)^2 & \left(\frac{dy_2}{dx_1}(p)\right)^2 & \cdots & \left(\frac{dy_m}{dx_1}(p)\right)^2 \\
\left(\frac{dy_1}{dx_2}(p)\right)^2 & \left(\frac{dy_2}{dx_2}(p)\right)^2 & \cdots & \left(\frac{dy_m}{dx_2}(p)\right)^2 \\
\vdots & \vdots & \ddots & \vdots \\
\left(\frac{dy_1}{dx_n}(p)\right)^2 & \left(\frac{dy_2}{dx_n}(p)\right)^2 & \cdots & \left(\frac{dy_m}{dx_n}(p)\right)^2
\end{bmatrix},  (69)
where N is the total number of samples in the data set. Summing along the rows
of the sensitivity matrix yields the cumulative sensitivity of each input variable over
all the outputs. The input variables corresponding to the largest cumulative average
squared gradient values can be retained as being the most significant. What is
unique about using sensitivity analysis for input space reduction is that it prunes
input variables based on both the input and output data. Most other methods rely
primarily on the structure of the input data [62].
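The sensitivity matrix of equation (69) can be sketched numerically with finite-difference gradients standing in for the network's analytic gradient model; the toy two-output model below is an assumption for illustration only.

```python
import numpy as np

def sensitivity_matrix(f, X, eps=1e-5):
    """S[i, j] = mean over samples p of (dy_j/dx_i(p))^2, with gradients
    estimated by central finite differences of the model f."""
    N, n = X.shape
    m = f(X[0]).shape[0]
    S = np.zeros((n, m))
    for p in range(N):
        for i in range(n):
            hi, lo = X[p].copy(), X[p].copy()
            hi[i] += eps
            lo[i] -= eps
            grad = (f(hi) - f(lo)) / (2 * eps)   # dy/dx_i at sample p
            S[i] += grad ** 2
    return S / N

# Toy model: y1 = 3*x1, y2 = x2^2; x3 has no effect (uninformative).
f = lambda x: np.array([3 * x[0], x[1] ** 2])
X = np.ones((5, 3))
S = sensitivity_matrix(f, X)
print(S.sum(axis=1))   # cumulative sensitivity per input; x3 scores ~0
```

Summing along the rows, the uninformative input x3 receives a cumulative sensitivity near zero and would be pruned, exactly the selection rule described above.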
4.5
The results section of this chapter can be divided into the following parts. The
neural network configuration and the results of the sensitivity analysis are presented
first, as these provide the foundation for the selection of the generalized Gaussian feature
configurations used in the measure. Next, examples are presented that illustrate
graphically how the features (both the generalized Gaussian and the statistically
based varieties) relate to FLIR image complexity. Finally, the results of the overall
FLIR complexity measure are presented for the testing FLIR image database.
4.5.1
The previously described sensitivity analysis method of performing input feature reduction
was employed to establish which generalized Gaussian model configurations
should be chosen and used in the final FLIR clutter complexity system. To do this,
a representative sample of 78 FLIR images was selected from the public database and
processed using the twelve generalized Gaussian model ABS configurations given in table 8.
Table 8: The twelve generalized Gaussian model ABS configurations tested to determine
the best clutter complexity features.

Configuration  Input Image  Search Space                    Analysis Mode  GGABS Output
1              Negative     Sub-block                       Unconstrained  Sum of Gaussians
2              Negative     Sub-block                       Constrained    Sum of Gaussians
3              Original     Full Image                      Unconstrained  Sum of Gaussians
4              Original     Full Image                      Constrained    Sum of Gaussians
5              Negative     Full Image                      Unconstrained  Sum of Gaussians
6              Negative     Sub-block with Zero Local Mean  Constrained    Sum of Gaussians
7              Negative     Full Image                      Constrained    Sum of Gaussians
8              Original     Sub-block with Zero Local Mean  Constrained    Sum of Gaussians
9              Original     Sub-block                       Constrained    Sum of Gaussians
10             Original     Sub-block                       Constrained    Avg. of Gaussians
11             Original     Sub-block                       Unconstrained  Avg. of Gaussians
12             Original     Sub-block                       Unconstrained  Sum of Gaussians
The outputs consisted of either the sum or the average number of Gaussians extracted
by the ABS process, as listed in the GGABS output column of table 8. Before being
presented to the neural network, each feature value x was normalized with a sigmoid
function of the form

    alpha = (x - xbar) / sigma_x,                        (70)

    f(x) = (1 - e^(-alpha)) / (1 + e^(-alpha)),          (71)

where xbar and sigma_x are the mean and standard deviation of the feature values in
the set. This normalization function maps all the data points in a set that lie within one
standard deviation of the mean of that set to the nearly linear region of the sigmoid
function, while the rest of the data points are compressed into the tails of the sigmoid
function. The sigmoid normalization technique was chosen because it can accommodate outlier data points without sacrificing (or compressing) the dynamic range of
the most commonly occurring data points [62].
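This normalization step can be sketched in a few lines. The sigmoid form below (equivalent to tanh of half the z-score) is chosen to match the described behavior; it is an illustrative assumption, not necessarily the exact constants of the thesis's equations:

```python
import numpy as np

def sigmoid_normalize(values):
    """Map a feature set into (-1, 1): points within one standard
    deviation of the mean land in the near-linear region of the
    sigmoid, while outliers are compressed into its tails."""
    values = np.asarray(values, dtype=float)
    alpha = (values - values.mean()) / values.std()          # z-score
    return (1.0 - np.exp(-alpha)) / (1.0 + np.exp(-alpha))   # saturating map

# An extreme outlier stays in range without crushing the typical points.
data = [480, 500, 510, 520, 9000]
print(np.round(sigmoid_normalize(data), 3))
```

The benefit over plain min-max scaling is that one outlier cannot collapse the dynamic range of the commonly occurring values.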
The neural network was configured with 6 binary outputs (one for each level of
complexity). The benchmark complexity scores, which ranged from zero to five, were
used as the desired outputs and mapped to each output as a binary true or false value.
As a result, only one of the six output values would be true for a given image and the
associated generalized Gaussian model values (which served as inputs to the neural
network). The network was configured with a single hidden layer containing 40 nodes
and trained using a standard back-propagation algorithm. When an RMS error value
of 0.1 or less was attained, the neural network training was halted. The gradient model
was then generated for this neural network, and the resulting sensitivity matrix is
shown in table 9. The last column is the sum of the rows of the sensitivity matrix, and
it indicates the cumulative sensitivity of all the outputs to each input. The five
generalized Gaussian model based configurations with the highest sensitivity values
(which in this case are configurations 1-5 from
Figure 35: A sample sigmoid normalization function that maps all input values to
an output range of -1 to 1.
Table 9: The sensitivity matrix resulting from the neural network with the twelve
generalized Gaussian ABS model configurations as inputs, and the corresponding
benchmark complexity scores as outputs. The cumulative neural network sensitivity
score for each configuration is given in the last column. The top five configurations
were chosen as features for the overall clutter complexity measure.

                          Sensitivity Outputs
Configuration     1        2        3        4        5        6      Row Sum
      1        0.13709  0.24548  0.13763  0.11622  0.09555  0.18567  0.91766
      2        0.03145  0.08087  0.14908  0.24398  0.18487  0.17007  0.86035
      3        0.31755  0.10985  0.12666  0.06617  0.13629  0.07030  0.82685
      4        0.06829  0.17466  0.06678  0.13060  0.11310  0.05119  0.60464
      5        0.17015  0.05310  0.06445  0.08704  0.10553  0.08414  0.56443
      6        0.07157  0.09967  0.08909  0.05638  0.06339  0.09666  0.47679
      7        0.07258  0.01441  0.05828  0.06662  0.02143  0.14828  0.38164
      8        0.04603  0.06700  0.06657  0.05995  0.10599  0.02653  0.37208
      9        0.00866  0.02057  0.07306  0.06457  0.05681  0.05978  0.28347
     10        0.00809  0.02385  0.07484  0.05648  0.04888  0.05472  0.26689
     11        0.03598  0.05400  0.04365  0.02602  0.03586  0.02754  0.22308
     12        0.03251  0.05649  0.04987  0.02590  0.03224  0.02504  0.22207
table 8) were chosen as the best inputs to the overall clutter complexity system.
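The gradient-based pruning procedure can be illustrated as follows. The fixed random network, the finite-difference gradient estimate, and the sample data are all illustrative stand-ins for the thesis's trained back-propagation network and its analytic gradient model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed random weights stand in for the trained network: 12 inputs
# (one per GGABS configuration), 40 hidden nodes, 6 sigmoid outputs
# (one per complexity level).
W1 = rng.normal(size=(12, 40))
W2 = rng.normal(size=(40, 6))

def net(x):
    """Forward pass of the stand-in network."""
    h = np.tanh(x @ W1)
    return 1.0 / (1.0 + np.exp(-h @ W2))

def sensitivity_matrix(samples, eps=1e-4):
    """Average |d output_j / d input_i| over the sample set, estimated
    here by central finite differences."""
    S = np.zeros((samples.shape[1], 6))
    for x in samples:
        for i in range(samples.shape[1]):
            xp, xm = x.copy(), x.copy()
            xp[i] += eps
            xm[i] -= eps
            S[i] += np.abs(net(xp) - net(xm)) / (2.0 * eps)
    return S / len(samples)

samples = rng.normal(size=(78, 12))    # 78 normalized feature vectors
S = sensitivity_matrix(samples)        # analog of table 9 (12 x 6)
row_sum = S.sum(axis=1)                # cumulative sensitivity per input
top5 = np.argsort(row_sum)[::-1][:5]   # inputs kept as features
print("top-5 input configurations:", top5)
```

The row sums play the role of table 9's last column: inputs the outputs are least sensitive to are pruned, which uses both input and output data rather than input structure alone.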
4.5.2
The overall clutter complexity measure combines the top five generalized Gaussian
ABS based features with the eight statistically based features listed in table 7 as a sum
of weighted values, as given by equation 68. Two sets of example images are presented
to illustrate how the features, both the generalized Gaussian and the statistically
based varieties, contribute to the FLIR image clutter complexity measure. Figure 36
shows an original FLIR image, obtained from the Computer Vision Group at Colorado
State University [19], as well as each of the five generalized Gaussian density profile
images. The two block processing based density profile images have been generated
by performing the generalized Gaussian ABS process over sixty-four 32 x 32 pixel
non-overlapping blocks, where the brighter blocks indicate a higher concentration of
Gaussians. The other three density profile images consist of the actual generalized
In the current implementation of the clutter complexity measure, the statistically based features are not
calculated on a block-by-block basis but rather over the entire image. Overall, the
brighter blocks or regions in both the generalized Gaussian model based and statistically based feature images correlate well with regions of higher complexity in the
image.
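Since equation 68 is a weighted sum over the thirteen normalized features, the combination step reduces to a dot product. The feature values and uniform weights below are placeholders for illustration, not the thesis's trained values:

```python
import numpy as np

# Hypothetical normalized feature values for one image: five GGABS
# features followed by eight statistical features (values are made up).
features = np.array([0.4, -0.1, 0.7, 0.2, 0.3,
                     0.5, 0.1, -0.2, 0.6, 0.0, 0.3, 0.2, -0.4])

weights = np.full(13, 1.0 / 13)   # uniform stand-in weights

# Weighted-sum clutter complexity score for this image.
clutter_complexity = float(weights @ features)
print(round(clutter_complexity, 4))
```

With non-uniform weights the same dot product lets the measure emphasize the more sensitive GGABS configurations over the statistical features.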
4.5.3
The clutter complexity measure was evaluated by Kaplan et al. [60] over a two-band
FLIR database that contained fifty images from both the medium-wave and long-wave
bands. The FLIR images contained targets including the HMMWV, M113, M2, M35,
and M60A3. To establish the ground truth false alarm count for each target type, a
correlation filter was applied to each image using an image chip of the corresponding
target. Each target chip was cut from one image in both wave bands and contains a
broadside view of the target. Correlation filtering is a simple and yet reasonable
method for estimating the performance of an ATR given a particular target.
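A minimal sketch of correlation filtering for false alarm estimation is given below. The normalized cross-correlation form, the synthetic scene, and the threshold are assumptions for illustration rather than the evaluation code used in [60]:

```python
import numpy as np

def ncc_map(image, chip):
    """Normalized cross-correlation of a target chip over an image
    (valid positions only); each value is a Pearson r in [-1, 1]."""
    ch, cw = chip.shape
    c = (chip - chip.mean()) / (chip.std() + 1e-12)
    out = np.zeros((image.shape[0] - ch + 1, image.shape[1] - cw + 1))
    for r in range(out.shape[0]):
        for col in range(out.shape[1]):
            win = image[r:r + ch, col:col + cw]
            w = (win - win.mean()) / (win.std() + 1e-12)
            out[r, col] = float((w * c).sum()) / (ch * cw)
    return out

def count_detections(score_map, threshold):
    """Count positions whose correlation exceeds the threshold;
    detections away from the true target are the false alarms."""
    return int((score_map > threshold).sum())

rng = np.random.default_rng(1)
chip = rng.normal(size=(8, 8))          # stand-in target chip
scene = rng.normal(size=(32, 32))       # stand-in clutter background
scene[10:18, 5:13] += 3.0 * chip        # embed the target at (10, 5)
scores = ncc_map(scene, chip)
n_hits = count_detections(scores, threshold=0.5)
print("peak location:", np.unravel_index(scores.argmax(), scores.shape))
```

Raising or lowering the threshold trades missed detections against false alarms, which is why the software package described later exposes it as a user setting.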
Once the ground truth false alarm count was established for all the targets, the
overall clutter complexity measure was generated based on all thirteen input features
using the procedure described in section 4.3. Figure 38 shows some medium-wave
FLIR images with varying degrees of clutter complexity and the corresponding false
alarm counts and clutter complexity scores for the M35 target chip. The plot in
figure 38 shows the significant correlation between the false alarm rates and the
Figure 36: Examples of Gaussian density profile images. (a) Original FLIR image,
(b) Negative image block processed using unconstrained analysis, (c) Negative
image block processed using constrained analysis, (d) Original image processed using
unconstrained analysis, (e) Original image processed using constrained analysis, (f)
Negative image processed using unconstrained analysis.
Figure 37: Example of images from [60] used to compute statistically based complexity features: (a) Original FLIR image, (b) Standard deviation, (c) Schmieder-Weathersby, (d) FBm Hurst, (e) Target interference ratio, (f) Energy, (g) Entropy,
(h) Homogeneity, (i) Outlier.
clutter complexity scores for all fifty images in the medium-wave database.
All of this was encapsulated in a highly configurable clutter complexity software
package that allows a user to select:

- any subset of the thirteen input features (statistical or generalized Gaussian),
- the threshold value for the correlation filter that establishes the ground truth
  false alarm count,
- the target type/chip for the correlation filter,
- and between the long-wave or medium-wave FLIR databases.
In addition to the chosen configurations, the main screen displays a scatter plot of the
false alarms vs. the clutter complexity, the FLIR chip for the chosen target, and each
of the FLIR images showing the locations of the false alarms. The program cycles
through all 50 images in the chosen wave band and displays results for each image as
well as cumulative results. A sample screen shot is shown in figure 39. This software
interface package was written by researchers at Clark Atlanta University (CAU) and
runs in Matlab 5.0 or higher.
One benchmark of performance is the correlation between the clutter complexity
scores and the number of ATR false alarms over a given set of images. Of particular
interest are the results of the measure when using each class of input features (GGABS
and Statistical) independently and then combined. Figure 40 shows the false alarm vs.
clutter complexity data for all three cases along with the correlation coefficients for the
M35 target and the medium-wave FLIR data set. While combining all thirteen input
features (GGABS and Statistical) provides the best overall correlation (r = 0.879),
the GGABS input features perform slightly better (with a correlation coefficient of
r = 0.802) than do the statistically based input features (with a correlation coefficient
of r = 0.767) when used as the sole inputs to the measure.
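This benchmark is an ordinary Pearson correlation coefficient between the two score lists. The data below are synthetic stand-ins, not the thesis's M35 results:

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two score lists."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    return float((x * y).sum() / np.sqrt((x * x).sum() * (y * y).sum()))

# Synthetic stand-ins for per-image false alarm counts and the
# complexity scores produced by one feature subset.
false_alarms = [4, 11, 38, 7, 22, 15, 30, 9]
complexity   = [1.03, 2.27, 3.60, 1.4, 2.9, 2.1, 3.2, 1.8]
print(round(pearson_r(false_alarms, complexity), 3))
```

Running the same computation once per feature subset (GGABS only, statistical only, combined) yields the three coefficients compared in figure 40.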
Figure 38: Sample clutter complexity results over the medium-wave image data in
a dual-band FLIR database. (a) Low complexity image (false alarms: 4, clutter
complexity: 1.0309), (b) Medium complexity image (false alarms: 11, clutter
complexity: 2.273), (c) High complexity image (false alarms: 38, clutter complexity:
3.6028), (d) Plot showing the correlation between false alarm counts for the M35
target and the corresponding clutter complexity scores, (e) Broadside view of M35
target in both visible and medium-wave bands.
Figure 39: Screen shot of the CAU clutter complexity software interface package that
shows the current configuration (input features, false alarm threshold, and database),
the false alarm vs. clutter complexity scatter plot, the target chip, and a sample
image with false alarm locations identified.
Figure 40: The resulting clutter complexity measure scatter plots of false alarms
vs. complexity scores and associated correlation coefficients when using all thirteen
input features (both generalized Gaussian and statistical, r = 0.879), using just the
generalized Gaussian input features (r = 0.802), and using just the statistical features
(r = 0.767), for an M35 target and the medium-wave data set.
The clutter complexity measure was further tested by Kaplan et al. using all
thirteen features over ten partitions of the COMANCHE FLIR image database [60].
Each partition contains high contrast imagery of a single and unique target type
with an aspect angle of between 80 and 100 degrees at a range of 2 kilometers. Three
different sites served as the image data collection locations: (1) Grayling, Michigan,
(2) Hunter Liggett, California, and (3) Yuma, Arizona. The background scenes at all
three locations were very similar, and the camera was stationary. As a result, the
main contributors to variations in the image scene data between the three data sets
were the diverse meteorological conditions. These provided the necessary variations
in the image data that enabled the evaluation of the clutter complexity model. With
the exception of the climate, the other conditions and sensor configurations were
largely constant across all of the partitions. As a result, any differences in the ATR
performance between partitions could be considered primarily due to the different
target types. Within partitions, the number of pixels on target was approximately
the same in all the images, leaving only the clutter as a variable that could affect
ATR performance.
The false alarm counts for the LVQ ATR were computed across all of the partitions
in the database. The clutter complexity model was then trained and tested over all
combinations of the database partitions and correlated with the LVQ ATR rates.
Table 10 shows a confusion matrix of correlation coefficients between the ATR false
alarm rates and clutter complexity scores, where the indices refer to the partitions
used to train and test the clutter complexity model. There is good agreement
between the ATR bounds and the clutter complexity scores, as evidenced by the
relatively strong correlation coefficients along the diagonal, that is, when the clutter
complexity model is trained and tested over the same partition.
This indicates that the clutter complexity measure is target dependent.
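The train/test procedure behind a confusion matrix of correlations can be sketched with synthetic partitions. The least-squares "model" below is a simplified stand-in for the clutter complexity model, used only to show how the matrix is assembled:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in data: per partition, 50 images x 13 features and a
# false-alarm-like response with its own (target-dependent) weights.
n_parts, n_imgs, n_feat = 4, 50, 13
X = [rng.normal(size=(n_imgs, n_feat)) for _ in range(n_parts)]
y = [x @ rng.normal(size=n_feat) + 0.3 * rng.normal(size=n_imgs) for x in X]

def pearson_r(a, b):
    a = a - a.mean(); b = b - b.mean()
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))

conf = np.zeros((n_parts, n_parts))
for tr in range(n_parts):
    w, *_ = np.linalg.lstsq(X[tr], y[tr], rcond=None)    # "train" on tr
    for te in range(n_parts):
        conf[te, tr] = pearson_r(X[te] @ w, y[te])       # "test" on te
print(np.round(np.diag(conf), 2))   # diagonal: train partition == test partition
```

Because each partition's response uses its own weights, the diagonal (same-partition train/test) dominates the off-diagonal entries, mirroring the target-dependent behavior described above.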
Table 10: Confusion matrix of correlation coefficients between the clutter complexity scores and LVQ ATR false alarm rates
for ten partitions of the COMANCHE FLIR image database when the clutter complexity measure is trained over only one of
the partitions [59].

                                               Train
Test      BMP2     2S1      M1      M113      M3      M60      T72     ZSU23    M35     HMMWV
BMP2     0.9726   0.7856   0.0027  -0.0680   0.4438   0.3059  -0.4301   0.0196  -0.3836  -0.1208
2S1      0.6401   0.9689  -0.0094  -0.4684   0.0906   0.4904   0.2940   0.2386  -0.4954  -0.0236
M1       0.0270  -0.1913   0.7655   0.0879  -0.0079   0.0481   0.0597   0.0125  -0.0861  -0.0335
M113     0.3568   0.6082  -0.2818   0.7735   0.2644   0.5359   0.0559   0.2136  -0.1781   0.1439
M3       0.1343   0.4803   0.4188  -0.1672   0.9009   0.4011  -0.1718   0.0353  -0.4561  -0.2699
M60     -0.2023   0.6822  -0.5592  -0.6425  -0.5729   0.8764   0.5362   0.2636   0.1001   0.1301
T72      0.2862   0.2987  -0.2817  -0.4819  -0.2087   0.2115   0.8279  -0.1151   0.0113   0.2834
ZSU23    0.1643   0.0358  -0.2117   0.0662   0.1097  -0.2519  -0.0561   0.6227   0.0850   0.1175
M35      0.0501  -0.3569   0.0156   0.4095   0.3645  -0.3743  -0.4141  -0.3498   0.9793   0.7650
HMMWV    0.5468  -0.5429   0.5103   0.5305   0.5216  -0.5376   0.5848   0.0582  -0.2474   0.8057
4.6
Clutter
complexity scores were derived for 10 different target types (over ten partitions of the
COMANCHE FLIR database) and shown to correlate well with the LVQ ATR results
for the same image data. In this sense the measure could be used as a predictor of
how well an ATR might perform on a given image when looking for a particular target
within clutter.
The generalized Gaussian ABS features were significant contributors to the measure, as evidenced by the improvement in performance when combined with the statistically based features. When using the generalized Gaussian features exclusively
as inputs, the measure performed slightly better than when using only the statistical
features. However, the best correlation between the clutter complexity score and the
ATR bounds (or false alarm counts) was achieved when using all thirteen features
together.

The ability to tune the Gaussian functions to model the expected target shapes and
sizes provided a significant amount of flexibility for extracting features in the image
that were target-like and thus contribute to the scene clutter. The ABS process provided a framework for iteratively extracting the target-like image features/Gaussians
in a straightforward manner. These key aspects of the generalized Gaussian ABS
model make it ideally suited for extracting image content with generally known characteristics from the scene.
Unlike the SAR application, in the clutter complexity measure the generalized
Gaussian ABS model was not used to modify the image content and/or reconstruct
the image scene. The ability to tune the generalized Gaussian functions to model
specific image features was shown to be very valuable in the FLIR clutter complexity
application, and it could also be applied to other applications with similar requirements.
CHAPTER V
The goal of this research has been to explore the use of multidimensional signal
representations for image processing based feature extraction, modeling, and
enhancement.
5.1 Summary of Contributions

An efficient peak-picking approach has been developed to perform the Gaussian parameter
extraction during the image analysis process. The model provides
the ability to perform independent parameter modification in both the analysis and
synthesis stages of the decomposition. The strong localization properties of the generalized Gaussian combined with the flexibility of the ABS decomposition technique
form the basis of the new model and approaches in image analysis and enhancement.
resolution and reducing the obtrusiveness of the speckle while preserving the edges
and the definition of the image features.
The effectiveness of the resolution enhancement in the SAR case was assessed by
formal stimulus-comparison categorical judgement scale subjective testing as defined
by CCIR Recommendation 500-4 and by automatic target recognition (ATR) performance evaluations. In both cases the SAR images were resolution enhanced using
the generalized Gaussian ABS and then evaluated. The results of these assessments
show gains in both the subjective quality of the SAR clutter images and the objective
ATR performance score, which is one of the highest reported to date using the 22.5
degree MSTAR target chip database.
As a consequence of its inherent flexibility, the generalized Gaussian ABS model can
also be applied to image processing applications associated with non-coherent imaging
modalities. One such application is the measurement of clutter complexity in
forward-looking infrared (FLIR) images. Clutter in
the infrared imaging context can have broad connotations, but in this case the term
is used to indicate objects or artifacts in an image that obfuscate the detection of
targets within the scene. The goal of the clutter complexity measure is to serve as
a relative indicator of the amount of clutter in the image scene and of the level of
performance that can be expected when processing the FLIR image with an ATR
system.
The generalized Gaussian ABS model is used to identify clutter related features
within the FLIR image that indicate complexity. The ability to tune the Gaussian
functions to the expected target shapes and sizes and iteratively extract them using
the ABS framework makes this model well suited to this application. Five generalized Gaussian features, along with eight other statistically based features, serve as
the inputs to a system that generates a complexity score that correlates well with
ATR performance bounds. The contribution of the generalized Gaussian based features to the measure is shown to be significant. As a group, they outperformed the
statistically based features in terms of ATR false alarm correlation when tested on
the COMANCHE FLIR image database. The overall best performance is achieved
when all thirteen features are included in the measure. The FLIR clutter complexity application illustrates that the ability to tune the generalized Gaussian ABS
model can also be an effective tool for extracting image content with generally known
characteristics from the scene.
5.2 Future Work
We have successfully explored the SAR and FLIR imaging applications of the generalized Gaussian ABS model. However, there are many other areas where development
and use of the model could be investigated in future work. One such
area is in the coherent imaging field of ultrasound. A preliminary investigation of the
ultrasound application has been conducted, and a summary of the work is included
below.
5.2.1
Brightness mode (B-mode) ultrasound images also suffer from speckle noise degradation that is implicit to coherent imaging modalities. As an impediment to high
image quality, speckle can contribute to poor edge and feature definition, which in
turn leads to lower perceived resolution. The nature of ultrasound speckle is unique
in that the speckle texture appears to be an artifact of both the imaging modality
and the microstructure of the tissue being imaged [89], [94]. The resolution of an
ultrasound image is directly tied to the frequency of the incident signal and the depth
of tissue penetration. The ultimate objective of any speckle reduction process is to
allow a user to accurately resolve small objects in the scene.
Similar to the approach for SAR images, the objective of this enhancement is not
to uncover any hidden structure or features in the ultrasound image, but rather to
provide the sonographer with another tool for improving the visual presentation of
the tissue being imaged.
To match the speckle texture, the Gaussian waveforms are biased to have sizes approximately
equal to or less than the size of the speckle grains.
The speckle granularity in ultrasound is unique in that it appears to have an
orientation and to grow in size as the penetration depth of the ultrasound increases.
Edges in the image associated with tissue boundaries also become somewhat more
important in the representation.
However, in the
ultrasound case the extracted Gaussians are somewhat larger than those associated
with the SAR speckle due to the larger granularity of the ultrasound speckle. The
general notions of the analysis functions dictating the synthesis packing densities, and
the synthesis functions controlling the speckle granularity hold true for the ultrasound
application as they did in the SAR case.
Preliminary results were generated by processing an ultrasound image in the decoupled mode using the separable form of the generalized Gaussian waveform and a
density based stop condition. The stop condition halted the analysis when the number of
Gaussians used in the reconstruction reached 75 percent of the total number of pixels
in the image. The approach described in
section 2.3.3 was used to ensure that dark and low contrast regions of the image were
accurately represented. The analysis and synthesis lobe width and shape parameter
modifications are presented in table 11 in terms of the multiplication factors (mf)
applied to the original extracted Gaussian parameters. The parameter values
that generated the highest subjective quality reconstructions were obtained experimentally by generating samples distributed throughout the parameter space. Figure
41 shows an original unprocessed image, and an example of a processed result is given
in figure 42.
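The multiplication-factor modification can be sketched in one dimension. The parameter names, the separable 1-D lobe, and the reconstruction loop are simplifying assumptions for illustration, not the thesis's implementation:

```python
import numpy as np

def gaussian_1d(x, amp, center, width, shape):
    """Separable generalized Gaussian lobe with exponent 'shape'."""
    return amp * np.exp(-np.abs((x - center) / width) ** shape)

def resynthesize(x, params, width_mf, shape_mf):
    """Rebuild a signal from extracted (amp, center, width, shape)
    parameters, scaling width and shape by multiplication factors in
    the spirit of table 11; smaller factors give finer granularity."""
    out = np.zeros_like(x, dtype=float)
    for amp, center, width, shape in params:
        out += gaussian_1d(x, amp, center, width * width_mf, shape * shape_mf)
    return out

x = np.linspace(0, 100, 1000)
params = [(1.0, 30.0, 6.0, 2.0), (0.7, 55.0, 8.0, 2.0)]  # illustrative lobes
original = resynthesize(x, params, width_mf=1.0, shape_mf=1.0)
modified = resynthesize(x, params, width_mf=0.4, shape_mf=1.0)  # narrower lobes
```

Scaling the synthesis widths down leaves the lobe peaks in place while shrinking their spatial extent, which is the mechanism behind the finer speckle granularity reported for figure 42.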
Observation of figure 42 indicates that there is some perceptual improvement in
spatial extent of the speckle in the processed image over the original given in figure 41.

Figure 42: Generalized Gaussian ABS processed ultrasound image with finer and
higher density granularity.

Table 11: Multiplication factors used to modify the analysis and synthesis Gaussians
for the ultrasound application.

          Analysis     Synthesis
Figure    mf    mf     mf    mf
  42      0.6   0.4    0.4   0.2
generalized Gaussian waveforms while still preserving the basic structure and features associated with the original image. While this preliminary study demonstrates
the potential of the generalized Gaussian ABS model to perform speckle resolution
enhancement in the ultrasound application, there are several areas that could be
examined in future work. These might include the following.
- Investigate the use of a preprocessor to accommodate low frequency variations
  within the image. This is used in the SAR imaging application and could be
  used effectively in ultrasound to improve the representation of the low frequency
  structure in the reconstructed image.

- Explore a more rigorous and better defined process for establishing appropriate
  Gaussian modification parameters. This would lead to a better understanding
  of the impact they have on the perception of the speckle and the content in
  ultrasound images.

- Examine the use of the full circularly smooth 2-D generalized Gaussian formulation to accommodate the more oblong nature of ultrasound speckle. While
  this will increase the computational complexity of the process, it may improve
  the model's ability to manipulate the larger and more oblong artifacts in the
  ultrasound image scene.

- Develop a comprehensive set of tests to evaluate the effectiveness of the approach. One possibility is a subjective testing method similar to the one described in
  section 3.5.1 that was used in the subjective analysis of the SAR data. Regardless of the approach, the goal would
  be to evaluate the effectiveness of the generalized Gaussian based resolution
  enhancement process when applied to ultrasound images.
REFERENCES
[1] Radar imagery, Education Poster Series 6, Jet Propulsion Laboratory, California
Institute of Technology, Pasadena, California 91109, 1998.
[2] CCIR Recommendation 500-4, Method for the subjective assessment of the quality of television
pictures, in Recommendations and Reports of the CCIR, International Radio
Consultative Committee, vol. 11, (Dusseldorf, Germany), pp. 47-61, CCIR, 1990.
[3] Antonini, M., Barlaud, M., Mathieu, P., and Daubechies, I., Image coding using wavelet transform, IEEE Trans. on Image Processing, vol. 1,
pp. 205-220, April 1992.
[4] Army Research Laboratory, Broad agency announcement for fall 1995,
tech. rep., Army Research Laboratory, June 1994.
[5] Army/Defense Advanced Research Projects Agency, Future combat
systems. http://www.darpa.mil/fcs/, July 2003.
[6] Atal, B. S. and Remde, J. R., A new model of LPC excitation for producing
natural-sounding speech at low bit rates, Proc. Int. Conf. on Acoustics, Speech,
and Signal Processing, pp. 614-617, May 1982.
[7] Bastiaans, M. J., Gabor's expansion of a signal into Gaussian elementary
signals, Proceedings of the IEEE, vol. 68, pp. 538-539, April 1980.
[8] Bell, C. G., Fujisaki, H., Heinz, J. M., Stevens, K. N., and House,
A. S., Reduction of speech spectra by analysis-by-synthesis techniques, Journal of the Acoustical Society of America, vol. 33, pp. 1725-1736, December 1961.
[9] Bergeaud, F. and Mallat, S., Matching pursuit of images, in Proc. Int.
Conf. on Image Processing, pp. 53-56, IEEE, 1995.
[10] Blackman, R. B. and Tukey, J. W., The Measurement of Power Spectra
from the Point of View of Communications Engineers. New York, NY, USA:
Dover, 1958.
[11] Bovik, A. C., Clark, M., and Geisler, W. S., Multichannel texture analysis using localized spatial filters, IEEE Trans. on Pattern Analysis and Machine
Intelligence, vol. 12, pp. 55-73, January 1990.
[12] Britton, D. F., Smith, M. J. T., and Mersereau, R. M., Generalized
Gaussian decompositions for image analysis and synthesis, in World Multiconference on Systemics, Cybernetics and Informatics, vol. 6, (Orlando, FL),
pp. 300-305, July 2000.
[13] Buzo, A., Gray, A. H., Jr., Gray, R. M., and Markel, J. D., Speech
coding based upon vector quantization, IEEE Trans. on Acoustics, Speech, and
Signal Processing, vol. ASSP-28, pp. 562-574, October 1980.
[14] Carrara, W. G., Goodman, R. S., and Majewski, R. M., Spotlight Synthetic Aperture Radar: Signal Processing Algorithms. 685 Canton Street, Norwood, MA 01602: Artech House, 1995.
[15] Cassidy, D. C., Heisenberg. Web page: http://www.aip.org/history/heisenberg/,
Center for History of Physics of the American Institute of Physics, November 1998.
[16] Cederquist, J., Dwan, C., Wegrzyn, J., and Rauss, P. J., Spatial spectral ATR, Proceedings of the Third Annual ARL Federated Laboratory Advanced
Sensors Symposium, p. 331, Feb 1999.
[17] Chan, L. A. and Nasrabadi, N. M., An application of wavelet-based vector
quantization in target recognition, International Journal on Artificial Intelligence Tools, vol. 6, pp. 165-178, April 1997.
[18] Cohen, L., Time-Frequency Analysis. New Jersey: Prentice-Hall Inc., 1995.
[19] Computer Vision Group, Colorado State University, Fort Carson
RSTA Data Collection. http://www.cs.colostate.edu/vision/ft carson/, Nov
1993.
[20] Curlander, J. C. and McDonough, R. N., Synthetic Aperture Radar: Systems and Signal Processing. Wiley Series in Remote Sensing, New York, NY:
John Wiley & Sons, Inc., 1991.
[21] Daubechies, I., The wavelet transform, time-frequency localization and signal
analysis, IEEE Trans. on Information Theory, vol. 36, pp. 961-1005, September
1990.
[22] Daubechies, I., Where do wavelets come from? - a personal point of view,
Proc. of the IEEE, vol. 84, pp. 510-513, April 1996.
[23] Daugman, J. G., Two-dimensional spectral analysis of cortical receptive field
profiles, Vision Research, vol. 20, pp. 847-856, 1980.
[24] Daugman, J. G., Complete discrete 2-D Gabor transforms by neural networks
for image analysis and compression, IEEE Transactions on Acoustics, Speech,
and Signal Processing, vol. 36, pp. 1169-1179, July 1988.
[25] Defense Advanced Research Projects Agency, FCS UGCV: Unmanned
ground combat vehicle. http://www.darpa.mil/tto/programs/fcs ugcv.html, July 2003.
[26] DeGraaf, S. R., SAR imaging via modern 2-D spectral estimation methods,
IEEE Transactions on Image Processing, vol. 7, pp. 729-761, May 1998.
[27] Deller, J. R., Jr., Proakis, J. G., and Hansen, J. H. L., Discrete-Time
Processing of Speech Signals. 866 Third Avenue, New York, New York 10022:
Macmillan Publishing Company, 1993.
[28] Dewaele, P., Wambacq, P., Osterlinck, A., Leuven, K. U., and Marchand, J. L., Comparison of some speckle reduction techniques for SAR images, Int. Geoscience and Remote Sensing Symposium, no. 10, pp. 2417-2422,
1990.
[29] Dudley, H., The vocoder, Bell Labs Record, vol. 17, pp. 122-126, 1939.
[30] Elachi, C., Spaceborne Radar Remote Sensing: Applications and Techniques.
345 East 47th Street, New York, NY 10017-2394: IEEE Press, 1988.
[31] Engelbrecht, A. P., Sensitivity analysis for selective learning by feedforward
neural networks, Fundamenta Informaticae, vol. 45, no. 4, pp. 295-328, 2001.
[32] Chen, C. H., et al., Signal Processing Handbook. 270 Madison Avenue, New
York, New York 10016: Marcel Dekker, Inc., 1988.
[33] Feichtinger, H. G. and Strohmer, T., Gabor Analysis and Algorithms:
Theory and Applications. Boston: Birkhauser, 1998.
[34] Fitch, J. P., Synthetic Aperture Radar. 175 Fifth Ave, New York, NY 10010,
USA: Springer-Verlag, 1988.
[35] Flanagan, J. L., Parametric coding of speech signals, Journal of the Acoustical Society of America, vol. 68, pp. 412-419, August 1980.
[36] Flanagan, J. L. and Christensen, S. W., Computer studies on parametric
coding of speech spectra, Journal of the Acoustical Society of America, vol. 68,
pp. 420-430, August 1980.
[37] Flanagan, J. L. and Golden, R. M., Phase vocoder, The Bell System
Technical Journal, pp. 1493-1509, November 1966.
[38] Franceschetti, G. and Lanari, R., Synthetic Aperture Radar Processing.
Electronic Engineering Systems Series, Boca Raton, FL: CRC Press LLC, 1999.
[39] Gabor, D., Theory of communication, Journal of the IEE (London), vol. 93,
pp. 429-457, November 1946.
[40] Gagnon, L. and Jouan, A., Speckle filtering of SAR images - a comparative
study between complex-wavelet-based and standard filters, in Proceedings of
SPIE Wavelet Applications in Signal and Image Processing, vol. 3169, pp. 80-91,
1997.
[41] George, E. B., An Analysis-by-Synthesis Approach to Sinusoidal Modeling Applied to Speech and Music Signal Processing. Ph.D. dissertation, Georgia Institute
of Technology, Atlanta, GA, 1991.
[42] George, E. B. and Smith, M. J. T., A new speech coding model based
on a least-squares sinusoidal representation, in Proc. Int. Conf. on Acoustics,
Speech, and Signal Processing, pp. 1641-1644, IEEE, 1987.
[43] George, E. B. and Smith, M. J. T., Analysis-by-synthesis/overlap-add
sinusoidal modeling applied to the analysis and synthesis of musical tones, J.
Audio Engineering Society, vol. 40, pp. 497-516, June 1992.
[44] George, E. B. and Smith, M. J. T., Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model, IEEE Trans.
on Speech and Audio Processing, vol. 5, pp. 389-406, September 1997.
[45] Graps, A., An introduction to wavelets, IEEE Computational Science &
Engineering, pp. 50-61, summer 1995.
[46] Guglielmi, V., Castine, F., and Piau, P., Super-resolution algorithms for
SAR applications, in Proc. SPIE Conf. Image Reconstruction and Restoration
II, vol. 3170, (San Diego, CA), pp. 195-202, SPIE, July 1997.
[47] Guo, H., Odegard, J., Lang, M., Gopinath, R., Selesnick, I., and Burrus, C., Wavelet based speckle reduction with application to SAR based ATD/R,
in IEEE International Conference on Image Processing, (Austin, TX), November
1994.
[48] Haar, A., Zur Theorie der orthogonalen Funktionensysteme. Mathematische
Annalen 69, pp. 331-371, Georg-August-Universität Göttingen, Germany, 1909.
[49] Halle, M. and Stevens, K. N., Analysis by synthesis, Proc. Seminar on
Speech Compression and Processing, vol. 2, Paper D7, December 1959.
[50] Hedelin, P., A tone-oriented voice-excited vocoder, IEEE Int. Conf. on
Acoustics, Speech, Signal Processing, pp. 205-208, 1981.
[66] Lee, J. S., Rened ltering of image noise using local statistics, Computer
Graphic and Image Processing, vol. 15, pp. 380389, 1980.
[67] Levinson, N., The wiener rms (root mean square) error criterion in lter
design and prediction, Journal of Mathematical Physics, vol. 25, pp. 261178,
1947.
[68] Li, B., Zheng, Q., Der, S., Chellapa, R., Nasrabadi, N. M., Chan,
L. A., and Wang, L. C., Experimental evaluation of neural, statistical and
model-based apporaches to FLIR ATR, Proceedings of the SPIE, vol. 3371,
pp. 388397, April 1998.
[69] Li, F. K., Croft, C., and Held, D. N., Comparison of several techniques to
obtain multiple-look sar imagery, IEEE Trans. Geoscience and Remote Sensing,
no. 21, p. 370, 1983.
[70] Li, J. and Stoica, P., An adaptive lter approach to spectral estimation and
SAR imaging, IEEE Transactions on Signal Processing, vol. 44, pp. 14691484,
June 1996.
[71] Lopes, A., Nezry, E., Touzi, R., and Laur, H., Structure detection and
statistical adaptive speckle ltering in sar images, International Journal of Remote Sensing, vol. 14, no. 9, pp. 17351758, 1993.
[72] Lopes, A., Touzi, R., and Nezry, E., Adaptive speckle lter and scene heterogeneity, IEEE Trans. on Geoscience and Remote Sensing, vol. 28, pp. 992
1000, November 1990.
[73] Macon, M. W. and Clements, M. A., Speech concatenation and synthesis
using an overlap-add sinusoidal model, in Proc. Int. Conf. on Acoustics, Speech,
and Signal Processing, (Atlanta, GA), pp. 361364, IEEE, 1996.
[74] Macon, M. W., Jensen-Link, L., Olivero, J., Clements, M. A., and
George, E. B., A singing voice synthesis system based on sinusoidal modeling,
in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, (Munich,
Germany), pp. 435–438, 1997.
[75] Mallat, S. G., A theory for multiresolution signal decomposition: The
wavelet representation, IEEE Trans. on Pattern Analysis and Machine
Intelligence, vol. 11, pp. 674–693, July 1989.
[76] Mallat, S. G. and Zhang, Z., Matching pursuits with time-frequency
dictionaries, IEEE Trans. on Signal Processing, vol. 41, pp. 3397–3415, December
1993.
[77] Marcelja, S., Mathematical descriptions of the responses of simple cortical
cells, Journal of the Optical Society of America, vol. 70, no. 11, pp. 1297–1300,
1980.
[78] Marques, J. S. and Almeida, L. B., A background for sinusoid-based
representation of voiced speech, in Proc. Int. Conf. on Acoustics, Speech, and Signal
Processing, pp. 1233–1236, 1986.
[79] Meitzler, T., Gerhart, G., and Singh, H., A relative clutter metric,
IEEE Transactions on Aerospace and Electronic Systems, vol. 34, pp. 968–976,
July 1998.
[80] Oliver, C. and Quegan, S., Understanding Synthetic Aperture Radar Images.
Boston: Artech House Inc., 1998.
[81] Oliver, C. J., The interpretation and simulation of clutter textures in
coherent images, Inverse Problems, vol. 2, pp. 481–518, November 1986.
[82] Oppenheim, A. V. and Schafer, R. W., Discrete-Time Signal Processing.
Signal Processing Series, Englewood Cliffs, New Jersey 07632: Prentice Hall,
1st ed., 1989.
[83] Owirka, G. J., Weaver, A. L., and Novak, L. M., Performance of a
multiresolution classifier using enhanced resolution SAR data, in Proc. SPIE Conf.
Radar Sensor Technology II, vol. 3066, (Orlando, FL), SPIE, April 1997.
[84] Pinson, E. N., Pitch synchronous time domain estimation of formant
frequencies and bandwidths, Journal of The Acoustical Society of America, vol. 35,
pp. 1264–1273, August 1963.
[85] Porat, M. and Zeevi, Y. Y., Pattern analysis and texture discrimination in
the Gabor space, 9th International Conference on Pattern Recognition, vol. 2,
pp. 700–702, November 1988.
[86] Qian, S. and Chen, D., Joint time-frequency analysis, IEEE Signal
Processing Magazine, pp. 52–67, March 1999.
[87] Rabiner, L. R. and Schafer, R. W., Digital Processing of Speech Signals.
Digital Signal Processing Series, Englewood Cliffs, New Jersey 07632:
Prentice-Hall, Inc., 1978.
[88] Shi, Z. and Fung, K. B., A comparison of digital speckle filters, in
International Geoscience and Remote Sensing Symposium (IGARSS), vol. 4,
pp. 2129–2133, August 1994.
[89] Shung, K. K., Smith, M. B., and Tsui, B. M. W., Principles of Medical
Imaging. San Diego, CA: Academic Press Inc., 1992.
[90] Smith, M. J. T. and Docef, A., A Study Guide for Digital Image Processing.
Riverdale, Georgia 30274: Scientific Publishers, Inc., early release ed.,
1997.
[91] Smith, M. J. T. and Barnwell, T. P., III, A new filter bank theory for
time-frequency representation, IEEE Trans. Acoustics, Speech, Signal
Processing, vol. 35, pp. 314–327, June 1987.
[92] Strang, G. and Nguyen, T., Wavelets and Filter Banks. Box 812060, Wellesley, MA 02181, USA: Wellesley-Cambridge Press, 2nd ed., 1997.
[93] Valens, C., A really friendly guide to wavelets. Available at
http://perso.wanadoo.fr/polyvalens/clemens/wavelets/wavelets.html,
December 1999.
[94] Wagner, R. F., Insana, M. F., and Brown, D. G., Statistical properties
of radio-frequency and envelope-detected signals with application to medical
ultrasound, Journal of the Optical Society of America, vol. 4, pp. 910–922, May
1987.
[95] Wexler, J. and Raz, S., Discrete Gabor expansions, Signal Processing,
vol. 21, November 1990.
[96] Wiley, C. A., Synthetic aperture radars: a paradigm for technology
evolution, IEEE Trans. Aerospace and Electronic Systems (AES), no. 21,
pp. 440–443, 1985.
[97] Bi, Z., Li, J., and Liu, Z.-S., Super resolution SAR imaging via
parametric spectral estimation methods, IEEE Trans. on Aerospace and
Electronic Systems, vol. 35, pp. 267–281, January 1999.