
GENERALIZED GAUSSIAN DECOMPOSITIONS FOR

IMAGE ANALYSIS AND SYNTHESIS

A Thesis
Presented to
The Academic Faculty
by
Douglas Frank Britton

In Partial Fulfillment
of the Requirements for the Degree
Doctor of Philosophy in the
School of Electrical and Computer Engineering

Georgia Institute of Technology


December 2006
Copyright © 2006 by Douglas Frank Britton

GENERALIZED GAUSSIAN DECOMPOSITIONS FOR


IMAGE ANALYSIS AND SYNTHESIS

Approved by:
Dr. Russell Mersereau,
Committee Chair
School of Electrical and Computer
Engineering
Georgia Institute of Technology

Dr. Paul Benkeser


School of Biomedical Engineering
Georgia Institute of Technology

Dr. Mark J.T. Smith, Advisor


School of Electrical and Computer
Engineering
Purdue University

Dr. Bonnie Heck-Ferri


School of Electrical and Computer
Engineering
Georgia Institute of Technology

Dr. George Vachtsevanos


School of Electrical and Computer
Engineering
Georgia Institute of Technology

Dr. Sheldon Jeter


School of Mechanical Engineering
Georgia Institute of Technology
Date Approved: 14 November 2006

Glory be to Almighty God who is the source of all knowledge, wisdom and
understanding.

Trust in the Lord with all your heart, and lean not on your own understanding, but in all your ways acknowledge Him, and He will make
straight your path. Proverbs 3:5-6

... the wisdom that comes from above leads us to be pure, friendly, gentle,
sensible, kind, helpful, genuine, and sincere. James 3:17


ACKNOWLEDGEMENTS

Life is as much about the journey as it is the destination, and what makes the journey
interesting are the friendships and experiences that dot the path. So many people
have contributed in so many different ways to make my journey at Georgia Tech
what it has become today. Some have walked beside me the entire time, while others
cheered for a season while our paths ran parallel. Some have quietly been praying,
while others have joined my trip to form a caravan. I've been picked up and carried
from time to time when I could not make it on my own, but throughout the trip I
am certain that I have never been alone. So I thank all of you who played a role,
both big and small, in making my journey more than just a trip but a real adventure.
And while my ship's log may not list your name, you know the part you played, and
I want to say thank you from the bottom of my heart.
I would like to thank my advisors, Dr. Mark J. T. Smith and Dr. Russell
Mersereau, for their support and guidance throughout my tenure as a Ph.D. student.
I have learned so much from them and consider it a privilege to have worked under
their guidance. I would also like to thank Dr. George Vachtsevanos, Dr. Paul
Benkeser, Dr. Bonnie Heck-Ferri, and Dr. Sheldon Jeter for serving on my
dissertation committee.
My parents, Skip and Joan Britton, have been a constant source of encouragement,
support, and prayer throughout this entire process. They instilled in me a
love of learning that has enabled me to achieve all that I have accomplished. Their
unconditional love and comfort have been called upon often, and they always freely
gave. But most of all, I want to thank them for leading me into a personal relationship
with my Heavenly Father, and for teaching me His ways from a young age. I
am forever grateful for the home and family that they provided, and I honor them
for being faithful to God's calling. I want to thank all of my family who have been
diligent prayer partners and a constant source of encouragement and support.
A special debt of gratitude goes to Fran LaMattina, who came alongside and
provided the extra spark it took for me to complete this thesis. Her patience is
amazing, and without her steadfast coaching and advice, I would not have finished.
She is an incredible woman whom God has truly blessed with the gift of insight and
encouragement.
I am truly blessed to have friends such as these. One of my closest friends, Peter
Cardillo, has been a part of this adventure from the beginning. He is one of the
brightest and funniest people I know, and his encouragement and friendship have been
unwavering. I have known Jeff and Jenelle Piepmeier since undergrad, and we
became very close friends during our time together at Georgia Tech. They have
consistently cheered me on, and I am thankful for their friendship. I would never
have finished if my good friend Caroline Clower had not insisted on helping me
organize, plan, and schedule my research proposal. It was her confidence in me that
enabled me to take that next step, and for that I am forever grateful.
GTRI has been a fantastic place to work, and I want to thank everyone in the
Food Processing Technology Division for all of their support throughout my Ph.D.
studies. I want to say a special thank you to Wayne Daley, who took a chance
and supported me as a graduate student in the very beginning. Even though he was
ahead of me by a few years, our journeys have taken similar paths. He has shared a
lot with me, and I have learned a lot from him. But most of all, I am thankful for
his genuine friendship and the encouragement he has provided along the way.
To all of my extended family and friends, and to our church small groups who
have been a great source of encouragement and faithful in praying, I want to express
my deep appreciation.

A special thanks to all of the people I have met and worked with in CSIP throughout my time as a student: Cheol Park, Tami Randolph, Gerardo Jose Gonzales, Quoc Pham, Tim Brosnan, Sang Park, and Paul Hong.
Dr. Robert "Bob" Stephens planted this seed almost twenty years ago, and
while he is now with our Heavenly Father, I want to thank him for making a difference
in at least one missionary kid's life.
More than anyone else, I want to thank my wonderful wife Susan for patiently
standing beside me these past several years. It has not been easy, and yet she has
faithfully supported me in this endeavor. Her unwavering confidence in me often
provided the motivation I needed to continue, even when I was ready to quit. She
has been the bedrock of our family and a wonderful mother to our precious gift from
God, Kari Rose. How blessed I am to have such a godly, loving, caring, and
supportive partner in life, and she is my best friend too.


And so, looking back, I have come to realize that what really matters most is
learned not from books and lectures but from the journey itself. And it is through
my journey that I have learned to love, laugh, cry, struggle, trust, and succeed. But
most importantly, I have grown in my relationship with my Heavenly Father, who
has been with me all along. He has taught me to take the long view in life and to
trust that in the end, His ways are better than mine.


TABLE OF CONTENTS

DEDICATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
LIST OF FIGURES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi
SUMMARY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv

I    INTRODUCTION AND BACKGROUND . . . . . . . . . . . . . . . . .
     1.1  Short-Time Fourier Transform . . . . . . . . . . . . . . . . . . . .
     1.2  Gabor & Heisenberg . . . . . . . . . . . . . . . . . . . . . . . . .
     1.3  Wavelets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
     1.4  Analysis-by-Synthesis . . . . . . . . . . . . . . . . . . . . . . . . 14
     1.5  Objective Statement . . . . . . . . . . . . . . . . . . . . . . . . . 18

II   THE GENERALIZED GAUSSIAN MODEL . . . . . . . . . . . . . . . . 20
     2.1  Analysis-by-Synthesis Decomposition . . . . . . . . . . . . . . . . 22
     2.2  2D Generalized Gaussians . . . . . . . . . . . . . . . . . . . . . . 24
     2.3  The Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
          2.3.1  Coupled and Decoupled Modes . . . . . . . . . . . . . . . . 30
          2.3.2  Gaussian Parameter Estimation . . . . . . . . . . . . . . . 31
          2.3.3  ABS Stop Considerations . . . . . . . . . . . . . . . . . . . 32
     2.4  Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

III  SAR RESOLUTION ENHANCEMENT . . . . . . . . . . . . . . . . . . 34
     3.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
          3.1.1  SAR Imaging Geometries . . . . . . . . . . . . . . . . . . . 35
          3.1.2  SAR: Surface Properties and Scattering . . . . . . . . . . . 40
          3.1.3  SAR Imaging Components . . . . . . . . . . . . . . . . . . 43
          3.1.4  SAR Speckle Noise . . . . . . . . . . . . . . . . . . . . . . 44
     3.2  Traditional Speckle Reduction and Resolution Enhancement Methods 46
          3.2.1  Multilook Averaging Speckle Reduction . . . . . . . . . . . 47
          3.2.2  Lee and Kuan Filters . . . . . . . . . . . . . . . . . . . . . 48
          3.2.3  Model Based Image Filtering . . . . . . . . . . . . . . . . . 51
          3.2.4  Spectral Estimation and Superresolution Methods . . . . . . 56
     3.3  Proposed Approach for SAR Imagery . . . . . . . . . . . . . . . . 61
     3.4  Generalized Gaussian Model Considerations . . . . . . . . . . . . 64
          3.4.1  Parameter Extraction . . . . . . . . . . . . . . . . . . . . . 64
          3.4.2  Parameter Modification . . . . . . . . . . . . . . . . . . . . 66
     3.5  SAR Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
          3.5.1  Subjective Validation . . . . . . . . . . . . . . . . . . . . . 67
          3.5.2  MSTAR and ATR Results . . . . . . . . . . . . . . . . . . 72
          3.5.3  Comparisons with Traditional Despeckling Methods . . . . . 72
     3.6  Discussion and Future Work . . . . . . . . . . . . . . . . . . . . . 77

IV   A CLUTTER COMPLEXITY MEASURE FOR ATR CHARACTERIZATION OF FLIR IMAGES . . . . . . . . . . . . . . . . . . . . . . . . . . 80
     4.1  Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
     4.2  The Nature of Clutter . . . . . . . . . . . . . . . . . . . . . . . . 81
     4.3  The Clutter Complexity Measure . . . . . . . . . . . . . . . . . . 83
          4.3.1  Image Databases and Partitions . . . . . . . . . . . . . . . 84
          4.3.2  Clutter Complexity Ground Truth . . . . . . . . . . . . . . 85
          4.3.3  The Complexity Measure . . . . . . . . . . . . . . . . . . . 86
          4.3.4  Validation and Refinement of the Measure . . . . . . . . . . 87
     4.4  Generalized Gaussian Based Features . . . . . . . . . . . . . . . . 88
          4.4.1  Constrained and Unconstrained Gaussian Analysis Modes . . 88
          4.4.2  Global and Block Processing Modes . . . . . . . . . . . . . 89
          4.4.3  Original and Negative Input Image Modes . . . . . . . . . . 91
          4.4.4  Sensitivity Analysis and Neural Networks . . . . . . . . . . 93
     4.5  FLIR Clutter Complexity Results . . . . . . . . . . . . . . . . . . 96
          4.5.1  Neural Network Configuration and Sensitivity Analysis Results 96
          4.5.2  Complexity Feature Images . . . . . . . . . . . . . . . . . . 100
          4.5.3  Complexity Measure Results . . . . . . . . . . . . . . . . . 101
     4.6  Discussion and Future Work . . . . . . . . . . . . . . . . . . . . . 110

V    CONTRIBUTIONS AND FUTURE WORK . . . . . . . . . . . . . . . . 112
     5.1  Summary of Contributions . . . . . . . . . . . . . . . . . . . . . . 112
     5.2  Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
          5.2.1  The Ultrasound Application . . . . . . . . . . . . . . . . . 114

REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121

LIST OF TABLES

1    Subjective analysis mean opinion score for the top five images. . . . . 68

2    Multiplication factors used to modify the analysis and synthesis Gaussians that resulted in the best performance for the SAR ATR application. . . . . 72

3    Classification results for the 22.5 degree MSTAR target chip data classified by the ATR process with and without resolution enhancement. . . . . 73

4    Resulting parameter and ENL values for the GGABS model when optimizing the ENL for each parameter independently. . . . . 77

5    Resulting parameter and ENL values for the GGABS model when using the values from Table 4 as the initial starting points for the optimization over ENL. The first row gives the baseline results, while the last row shows the effect of the synthesis multiplier on the ENL value. . . . . 78

6    The ENL results of processing the image region from Figure 29 using the different traditional despeckling filters with various window sizes. . . . . 78

7    List of statistical based image processing features used in the FLIR clutter complexity measure. . . . . 86

8    The twelve generalized Gaussian model ABS configurations tested to determine the best clutter complexity features. . . . . 97

9    The sensitivity matrix resulting from the neural network with the twelve generalized Gaussian ABS model configurations as inputs, and the corresponding benchmark complexity scores as outputs. The cumulative neural network sensitivity score for each configuration is given in the last column. The top five configurations were chosen as features for the overall clutter complexity measure. . . . . 100

10   Confusion matrix of correlation coefficients between the clutter complexity scores and LVQ ATR false alarm rates for ten partitions of the COMANCHE FLIR image database when the clutter complexity measure is trained over only one of the partitions [59]. . . . . 109

11   Multiplication factors used to modify the analysis and synthesis Gaussians for the ultrasound application. . . . . 119

LIST OF FIGURES

1    (a) Linear chirp signal x[n] = sin(2π · 1.9×10⁻⁵ n²) shown with a Hamming window, N = 400. (b) Magnitude of the discrete short-time Fourier transform using the Hamming window shown in (a). . . . .

2    (a) Rectangular window N = 25, (b) Fourier transform of rectangular window N = 25, (c) Rectangular window N = 50, (d) Fourier transform of rectangular window N = 50, (e) Hamming window N = 25, (f) Fourier transform of Hamming window N = 25, (g) Hamming window N = 50, (h) Fourier transform of Hamming window N = 50. . . . .

3    (a) Time-domain magnitude of the elementary signal centered at t0 with a width of Δt. (b) Frequency-domain magnitude of the Fourier transform of the elementary signal centered at f0 with a width of Δf. . . . .

4    Joint time-frequency domain expansion of a signal into elementary Gabor functions with associated coefficients, cnk. . . . .

5    Gabor filter (a) impulse response and (b) frequency response with u0 = 0.9, n1 = 5, and n2 = 10. . . . . 11

6    2D Symmlet S8 wavelets with location indices n = 2 and scale indices (a) m = 1, (c) m = 3, and (e) m = 5. The corresponding frequency domain responses of each wavelet are shown in (b), (d), and (f). . . . . 13

7    Block diagram of the analysis-by-synthesis procedure for multi-pulse excitation of a low bit rate LPC speech coder [6]. . . . . 15

8    Block diagram of the analysis-by-synthesis procedure for a 2-stage residual vector quantization coder [58]. . . . . 17

9    Plot of 1D generalized Gaussian functions with shape parameters of 1, 2, and 3. A shape parameter of 2 gives the standard Gaussian lobe width and shape. . . . . 25

10   A 2D generalized Gaussian function with a contour plot detailing the center location, lobe widths, and rotation angle. . . . . 26

11   2D generalized Gaussians with equal lobe widths of 7 and shape parameters of (a) 2, (b) 4, and (c) 8. . . . . 27

12   Separable 2D generalized Gaussians with equal lobe widths of 7 and shape parameters of (a) 2, (b) 4, and (c) 8. . . . . 28

13   Block diagram of the generalized Gaussian analysis-by-synthesis decomposition and synthesis technique. . . . . 30

14   Geometry of a side-looking imaging radar. . . . . 35

15   (a) The range geometry of a side-looking imaging radar with points A and B shown with a separation of Rg along the ground and Rs along the slant. (b) The timing of illumination pulses and scattering returns required to eliminate overlap between the two along-range points A and B. This specifies the slant resolution of the SAR. . . . . 37

16   (a) The azimuth geometry of a Real Aperture Radar (RAR). If the points A and B both fall within the azimuth swath width, Sa, of the radar beam, they will not be resolved. (b) The timing of illumination pulses and scattering returns required to eliminate overlap between the two along-range points A and B. This specifies the slant resolution of the SAR. . . . . 39

17   Illustration of how various surfaces can affect a radar return signal. . . . . 41

18   A block diagram of the fundamental components of a SAR imaging system [14]. . . . . 43

19   The block diagram of a traditional Fourier transform based SAR image formation process [38]. . . . . 45

20   Multilook processing reduces the effects of speckle noise in a SAR image by exploiting the ability of SAR systems to collect higher resolution data in the azimuth direction than in the slant range direction. The higher resolution data is averaged along the azimuth to produce square pixels with reduced speckle. . . . . 48

21   Weighting functions for the Lee, Kuan, and Modified Lee speckle filters using Cu = 0.450 and Cmax = 0.705. . . . . 51

22   (a) The original SAR chip image HB03333 of a BMP2 from the MSTAR database. (b) Speckle reduction illustration using the Lee filter. (c) Speckle reduction illustration using the Kuan filter. (d) Speckle reduction illustration using the Modified Lee filter and a 5 × 5 window. The top left 32 × 32 pixel region of the original image was used to establish Cu = 0.450 and Cmax = 0.705 for the Modified Lee filter. . . . . 52

23   (a) Subregion of the original SAR clutter image HB06211 from the MSTAR database. (b) The speckle reduction results using the Gaussian MAP filter with a 7 × 7 window. (c) The speckle reduction results using the Gamma MAP filter with a 7 × 7 window. The bottom right 50 × 50 pixel region of the original image was used to establish Cu = 0.496, while Cmax = 0.558 for the Gamma MAP filter. . . . . 55

24   Block diagram of the SAR resolution enhancement algorithm. . . . . 63

25   Original SAR clutter image of an urban scene. . . . . 69

26   Resolution enhanced SAR clutter image with modification factors of 0.8, 0.4, 0.6, and 0.4 for the analysis lobe width, synthesis lobe width, analysis shape, and synthesis shape parameters, respectively. . . . . 70

27   SAR clutter image processed using the Lee filter with a 5 × 5 window. . . . . 71

28   The HB03424.0015 SAR chip image from the MSTAR database of a T72 tank shown in (a) the original resolution (1 m) with an ENL = 2.5996 and (b) a low resolution (2 m) version with an ENL = 7.4372. This illustrates the ENL tendency to measure image smoothness and not resolution. . . . . 75

29   (a) Original SAR clutter image and (b) the region of this image used to evaluate the ENL for the various traditional despeckling filters and the GGABS model. . . . . 76

30   Block diagram of the general approach for generating a FLIR clutter complexity measure that correlates with ATR performance bounds. . . . . 84

31   (a) The original FLIR image, (b) the Gaussian density profile image with the analysis constrained to find target-sized objects, and (c) the Gaussian density profile image with the analysis unconstrained. . . . . 90

32   (a) The original FLIR image, (b) the Gaussian density profile image with the analysis performed globally, and (c) the Gaussian density profile image with the analysis performed on independent sub-blocks of the image. . . . . 92

33   (a) The original FLIR image, (b) the negative FLIR image, (c) the constrained Gaussian density profile image computed from the original image, and (d) the constrained Gaussian density profile image computed from the negative image. . . . . 94

34   A neural network block diagram with input variables xi and output variables yj, and its corresponding gradient model with input variables xi and derivative output variables dyj/dxi. . . . . 95

35   A sample sigmoid normalization function that maps all input values to an output range of -1 to 1. . . . . 99

36   Examples of Gaussian density profile images. (a) Original FLIR image, (b) Negative image block processed using unconstrained analysis, (c) Negative image block processed using constrained analysis, (d) Original image processed using unconstrained analysis, (e) Original image processed using constrained analysis, (f) Negative image processed using unconstrained analysis. . . . . 102

37   Examples of images from [60] used to compute statistically based complexity features: (a) Original FLIR image, (b) Standard deviation, (c) Schmieder Weathersby, (d) FBm Hurst, (e) Target interference ratio, (f) Energy, (g) Entropy, (h) Homogeneity, (i) Outlier. . . . . 103

38   Sample clutter complexity results over the medium-wave image data in a dual-band FLIR database. (a) Low complexity image, (b) Medium complexity image, (c) High complexity image, (d) Plot showing the correlation between false alarm counts for the M35 target and the corresponding clutter complexity scores, (e) Broadside view of the M35 target in both visible and medium-wave bands. . . . . 105

39   Screen shot of the CAU clutter complexity software interface package that shows the current configuration (input features, false alarm threshold, and database), the false alarm vs. clutter complexity scatter plot, the target chip, and a sample image with false alarm locations identified. . . . . 106

40   The resulting clutter complexity measure scatter plots of false alarms vs. complexity scores and associated correlation coefficients when using all thirteen input features (both generalized Gaussian and statistical), using just the generalized Gaussian input features, and using just the statistical features for an M35 target and the medium-wave data set. . . . . 107

41   Original ultrasound image. . . . . 117

42   Generalized Gaussian ABS processed ultrasound image with finer and higher density granularity. . . . . 118

SUMMARY

This thesis presents a new technique for performing image analysis, synthesis, and
modification using a generalized Gaussian model. The joint time-frequency characteristics
of a generalized Gaussian are combined with the flexibility of the analysis-by-synthesis
(ABS) decomposition technique to form the basis of the model. The
good localization properties of the Gaussian make it an appealing basis function for
image analysis, while the ABS process provides a more flexible representation with
enhanced functionality. ABS was first explored in conjunction with sinusoidal modeling
of speech and audio signals [42], [41]. A 2D extension of the ABS technique is
developed here to perform the image decomposition. This model forms the basis for
new approaches in image analysis and enhancement.

The major contribution is made in the resolution enhancement of images generated
using coherent imaging modalities such as Synthetic Aperture Radar (SAR) and
ultrasound. The ABS generalized Gaussian model is used to decouple natural image
features from the speckle and to facilitate independent control over feature characteristics
and speckle granularity. This has the beneficial effect of increasing the perceived
resolution and reducing the obtrusiveness of the speckle while preserving the edges
and the definition of the image features. As a consequence of its inherent flexibility, the
model does not preclude image processing applications for non-coherent image data.
This is illustrated by its application as a feature extraction tool for a FLIR imagery
complexity measure.


CHAPTER I

INTRODUCTION AND BACKGROUND

The concept of representing a signal as a combination of fundamental elements has
existed for over a century. One early example of this kind of decomposition is the
Fourier series, in which a periodic signal is represented as a linear combination
of weighted complex sinusoids. The frequency of each complex sinusoid is a unique
integer multiple of the periodic signal's fundamental frequency. One major limitation
of the Fourier series is that it can only represent periodic functions. Nevertheless,
this decomposition spawned the development of an extensive body of work known as
Fourier analysis, which includes the Fourier transform (FT) and the discrete Fourier
transform (DFT). Both of these transforms yield frequency domain representations
of any signal (periodic or aperiodic) in terms of the amplitudes and frequencies of the
contributing complex sinusoids. The ability to characterize signals and systems in
this alternate space is a very powerful analytical tool, as evidenced by the substantial
body of work involving Fourier analysis techniques. One key benefit of the DFT is
that it is particularly well suited to the analysis of discrete signals and linear time-invariant
systems. As a result, the DFT has been employed in a wide range of
applications across many engineering and science disciplines.
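This sinusoidal-decomposition view can be checked directly with the DFT. The NumPy sketch below (a generic illustration; the signal, its frequencies, and the length K = 64 are choices made here, not values from the thesis) builds a discrete periodic signal from two sinusoids and confirms that its DFT concentrates all of the energy in exactly those frequency bins, and that summing the weighted sinusoids back together recovers the signal.

```python
import numpy as np

# A discrete periodic signal is exactly a weighted sum of K complex
# sinusoids at integer multiples of the fundamental frequency 1/K.
K = 64
n = np.arange(K)
x = 1.5 * np.sin(2 * np.pi * 3 * n / K) + 0.5 * np.cos(2 * np.pi * 10 * n / K)

X = np.fft.fft(x)                   # DFT coefficients: amplitude and phase per bin
x_rec = np.real(np.fft.ifft(X))     # resum the weighted complex sinusoids

# Only bins 3 and 10 (plus their conjugate-symmetric partners 61 and 54)
# are nonzero, mirroring the two real sinusoids in x; x_rec matches x
# to machine precision.
```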


However, each of the Fourier basis functions, or sinusoids, spans the entire analysis
frame. As a result, small changes in the spatial-domain signal can affect all of the
transform-domain coefficients, and vice versa. The direct connection between the
signal attributes at a particular instant in time or location in space and the associated
transform coefficients is lost during the transformation. This restriction to mutually
exclusive analysis in either the time or the frequency domain becomes a limitation of
classical Fourier analysis. As a result, it becomes less attractive for representing rapidly
varying signals, tracking local phenomena, or analyzing signals of short duration.

In many signal processing applications, such as speech and image processing, it
is precisely the spatial organization of the data that conveys the information in the
signal. As a result, transforms that do not preserve any spatial connection are not
very useful in the analysis and extraction of highly localized signal features. The
ability to localize the analysis of a signal in both the spatial and frequency domains
can be a very powerful tool. To overcome this time-frequency localization
limitation, a number of alternative techniques have been developed to study the time-dependent
spectra of signals. These techniques form the basis of joint time-frequency
analysis and include, among others, the short-time Fourier transform and the Gabor
transform [33], [18], [86].

1.1 Short-Time Fourier Transform

One way to localize a transform spatially is to window the signal and then perform
the decomposition on the contents of each windowed region independently. One
technique to accomplish this is the discrete short-time Fourier transform (DSTFT)
[82], [32], [27], which is given by

    X_n[k] = Σ_{m=−∞}^{∞} x[m] w[n−m] e^{−j(2π/K)km},   k = 0, ..., K−1,   K ≥ M.   (1)

In one interpretation of the DSTFT, the output, X_n[k], can be viewed as the discrete
Fourier transform (DFT) of a section of the input signal, x[m], at index n as viewed
through the sliding analysis window w[n−m], which typically has a finite length,
M. The window serves to limit the extent of the signal being transformed, with the
expectation that the spectral characteristics are approximately stationary over its
duration. The DSTFT keeps track of the window location in time through the index
n. This preserves the linear phase information, or delay, associated with the time
offset of the window that would be lost by simply taking the DFT of the data within
the window.

Figure 1: (a) Linear chirp signal x[n] = sin(2π · 1.9×10⁻⁵ n²) shown with a Hamming
window, N = 400. (b) Magnitude of the discrete short-time Fourier transform using
the Hamming window shown in (a).
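As a concrete illustration, the DSTFT magnitude of the chirp in Figure 1 can be sketched in a few lines of NumPy. This is a minimal hand-rolled version, not the thesis's own code; the hop size, signal length, and DFT size K are illustrative choices. Note that taking the DFT of each extracted frame differs from Eq. (1) only by the window-offset linear phase discussed above, which is harmless here because only the magnitude is kept.

```python
import numpy as np

# Linear chirp from Figure 1; its instantaneous frequency grows linearly with n.
n = np.arange(2048)
x = np.sin(2 * np.pi * 1.9e-5 * n**2)

M, K, hop = 400, 512, 100          # window length M, DFT size K >= M, frame hop
w = np.hamming(M)

# One K-point DFT per window position: a frame-local version of Eq. (1).
# (The frame-local DFT omits the linear phase encoding the window offset;
# the magnitude below is unaffected by that term.)
frames = [x[i:i + M] * w for i in range(0, len(x) - M + 1, hop)]
S = np.abs(np.array([np.fft.fft(f, K) for f in frames]))[:, :K // 2]
# Each row of S is the magnitude spectrum at one window position; the peak
# moves to higher frequency bins as the window slides along the chirp.
```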

The signal can be recovered by using the inverse discrete short-time
Fourier transform (IDSTFT), given as

    x[m] = (1 / (K w[n−m])) Σ_{k=0}^{K−1} X_n[k] e^{j(2π/K)km},   n − M + 1 ≤ m ≤ n,   (2)

where m spans the range of the original indices for all the associated signal segments.
Figure 1 shows a linear chirp signal with time-varying frequency content and
the magnitude plot of its short-time discrete Fourier transform using a Hamming window. The
difficulty becomes choosing an appropriate window for the analysis of the entire signal.
A longer time-domain window has a narrower main lobe in the frequency domain,
and conversely, a shorter time-domain window has a wider main lobe in the frequency
domain. Since windowing a signal in the time domain is equivalent to convolving the
window and signal in the frequency domain, it would be best to have a window with a
very narrow main lobe (an impulse function is ideal) if good frequency-domain
resolution is desired. However, this requires a long time-domain window. In the
case of the DSTFT, good time-domain localization (i.e., a short window) is usually
desired, which results in degraded frequency-domain resolution. So once again there
is an inherent tradeoff between localization in the time and frequency domains.
The Fourier transform of a rectangular window is a sinc function of the form
sin(x)/x. While the sinc function has a narrow main lobe, it also has rather large
side lobe amplitudes. These can introduce undesirable distortion in the form of
leakage (or blurring) when convolved with closely spaced spectral peaks in the signal.
The amount of leakage distortion for a given window depends on the amplitude of
the side lobes relative to the main lobe, with lower side lobes resulting in less leakage.
The amplitude of these side lobes can be significantly reduced by smoothly tapering
the window edges to zero, in contrast to the abrupt transition of the rectangular
window. A number of windows exist that reduce the effect of the edge transitions,
including the Kaiser, Hamming, and Hanning windows. Figure 2 shows both the
rectangular and Hamming windows and their Fourier transforms. While it is clear
from this plot that the Hamming window has lower side lobe amplitudes than the
rectangular window, this is achieved at the cost of having a wider main lobe [82], [32].
Even with the design of special windows to reduce the effects of side lobe distortion,
the inherent tradeoff between time and frequency domain localization remains.
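The sidelobe behavior described above is easy to quantify numerically. The sketch below is an illustrative measurement, not taken from the thesis; the helper name and the zero-padded FFT size are arbitrary choices. It estimates a window's peak sidelobe level from a finely sampled DFT: the rectangular window comes out near −13 dB, while the Hamming window's tapered edges push the sidelobes down to roughly −40 dB at the cost of a wider main lobe.

```python
import numpy as np

def peak_sidelobe_db(w, nfft=4096):
    """Peak sidelobe level of a window, in dB relative to the main-lobe peak."""
    W = np.abs(np.fft.fft(w, nfft))[: nfft // 2]
    W = W / W[0]                            # main-lobe peak is at DC for these windows
    first_null = np.argmax(np.diff(W) > 0)  # first bin where |W| starts rising again
    return 20 * np.log10(W[first_null:].max())

rect_sll = peak_sidelobe_db(np.ones(50))    # ~ -13 dB for a rectangular window
ham_sll = peak_sidelobe_db(np.hamming(50))  # roughly -40 dB: much less leakage
```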

1.2 Gabor & Heisenberg

Dennis Gabor was one of the early pioneers in this field. Intrigued by this time-frequency
relationship for one-dimensional (1D) signals, he began looking for a new
representation in which a continuous signal could be represented as a sum of elementary
functions that are localized in both the time and frequency domains. With
Δt and Δf representing the effective extent of the signal in the time and frequency
domains, respectively, Gabor posed the following question: "What is the shape of the
signal for which the product ΔtΔf actually assumes the smallest possible value . . . ?"


Figure 2: (a) Rectangular window N = 25, (b)Fourier transform of rectangular


window N = 25, (c) Rectangular window N = 50, (d)Fourier transform of rectangular
window N = 50, (e) Hamming window N = 25, (f) Fourier transform of Hamming
window N = 25, (g) Hamming window N = 50, (h) Fourier transform of Hamming
window N = 50.

Gabor's pursuit of an answer to this question parallels the derivation of Heisenberg's uncertainty principle in the field of quantum mechanics. Heisenberg's uncertainty principle simply states that the more precisely the position, x, of a subatomic particle is known, the less precisely the momentum (mass times velocity), p, of the particle is known at that instant, and vice versa [51]. This uncertainty relationship is given by

\Delta x \, \Delta p \geq \frac{1}{2}\hbar    (3)

where ℏ is the reduced Planck's constant, and Δx and Δp are the position and momentum uncertainties, respectively. Heisenberg went on to postulate that the uncertainty in the measurement of the canonically conjugate position and momentum variables at a specific point in time is not a function of the accuracy of the tools for measurement, but an inherent limitation of quantum mechanics [15].
Similarly, Gabor recognized that time and frequency are also canonical conjugates and cannot be precisely determined simultaneously. As an alternative he sought to minimize both the spatial extent, Δt, and the frequency extent, Δf, of these elementary signals in the two-dimensional (2D) time-frequency domain. Following the mathematical developments in quantum mechanics, Gabor derived the uncertainty relationship

\Delta t \, \Delta f \geq \frac{1}{2}    (4)

between the time and frequency resolution of a signal. This relation places a lower bound on the compactness of a signal in both time and frequency. He further determined that a frequency-modulated Gaussian elementary function of the form

\psi(t) = e^{-\alpha^2 (t - t_0)^2} \, e^{j(2\pi f_0 t + \phi)}    (5)

provides the optimal tradeoff between effective spatial duration and frequency bandwidth, with the two being inversely related by equation (4).

Figure 3: (a) Time-domain magnitude, |ψ(t)| = e^{−α²(t−t₀)²}, of the elementary signal centered at t₀ with a width of Δt; (b) frequency-domain magnitude, |Ψ(f)| = e^{−(π/α)²(f−f₀)²}, of the Fourier transform of the elementary signal centered at f₀ with a width of Δf.

The α, t₀, f₀, and φ are constants that dictate the shape of the Gaussian, the location of the peak, and the modulating frequency and phase, respectively [39]. The frequency-domain representation of this elementary function is given by

\Psi(f) = e^{-(\pi/\alpha)^2 (f - f_0)^2} \, e^{-j(2\pi t_0 (f - f_0) + \phi)},    (6)

which is very similar in form to the original. The magnitude plots of the elementary function and its Fourier transform given in figure 3 illustrate how both have Gaussian shapes.
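The Gaussian-in, Gaussian-out behavior of equations (5) and (6) can be confirmed numerically. The sketch below (not part of the original text; the values α = 4, t₀ = 0, and f₀ = 8 are arbitrary) approximates the Fourier transform of the elementary signal with an FFT and compares its magnitude to the closed form:

```python
import numpy as np

# Sample psi(t) = exp(-alpha^2 (t - t0)^2) exp(j 2 pi f0 t), eq. (5) with phi = 0
alpha, t0, f0 = 4.0, 0.0, 8.0
dt = 1e-3
t = np.arange(-2.0, 2.0, dt)
psi = np.exp(-(alpha * (t - t0)) ** 2) * np.exp(1j * 2 * np.pi * f0 * t)

# Numerical Fourier transform (the FFT approximates the continuous integral)
Psi = np.fft.fftshift(np.fft.fft(psi)) * dt
f = np.fft.fftshift(np.fft.fftfreq(len(t), dt))

mag = np.abs(Psi) / np.abs(Psi).max()
closed_form = np.exp(-((np.pi / alpha) * (f - f0)) ** 2)   # |Psi(f)| from eq. (6)

print("peak located near f0:", f[np.argmax(mag)])
print("max deviation from closed form:", np.abs(mag - closed_form).max())
```

The numerically computed magnitude peaks at f₀ and tracks the closed-form Gaussian envelope, as figure 3 depicts.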
Using this Gaussian basis function, Gabor sought to develop a decomposition that would permit the simultaneous description of a signal in both the time and frequency domains. Any signal could be represented in this joint time-frequency domain as a series of localized functions that are translated and modulated versions of the basic elementary function. The amplitude of each function would be represented by the coefficient, c_{nk}, where n and k serve as the indices of the new joint time-frequency space. A one-dimensional time-domain signal could then be represented in a two-dimensional time-frequency space as shown in figure 4, where each oval represents a single elementary function.

Figure 4: Joint time-frequency domain expansion of a signal into elementary Gabor functions with associated coefficients, c_{nk}.
While Gabor's contributions provided the motivation for much of the development in joint time-frequency theory, it wasn't until after his lifetime that the mathematical basis for the Gabor expansion would be fully developed [86]. The Gabor decomposition is a sum of modulated and translated functions that cover the time-frequency domain, where the transform coefficients indicate the contribution of each function to the sum. Mathematically, the Gabor expansion can be represented by

x(t) = \sum_{n=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} c(n,k) \, g(t - n\Delta t) \, e^{j 2\pi k \Delta f t},    (7)

where g(t) is a window function and Δt and Δf are the time and frequency sampling intervals (or translation and modulation parameters). While it has since been shown that several other analysis functions can be used in the Gabor expansion [56], the classical choice for the window function g(t), as first proposed by Gabor, is a normalized Gaussian of the form

g(t) = \left( \frac{a}{\pi} \right)^{\frac{1}{4}} e^{-a t^2 / 2}.    (8)

The parameter a dictates the balance of concentration between time and frequency [18], [86]. As has been stated previously, the optimal compactness in the joint time-frequency domain is the reason for choosing the Gaussian.
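As a concrete illustration of equations (7) and (8), the sketch below (not from the original text; the lattice spacings, the value a = 0.02, and the two coefficients are arbitrary choices) synthesizes a discrete signal from a small set of Gabor coefficients:

```python
import numpy as np

def gabor_atom(n, a, n0, k, df):
    """Translated, normalized Gaussian window (eq. 8) times a complex
    modulation, i.e. one elementary function from the sum in eq. (7)."""
    g = (a / np.pi) ** 0.25 * np.exp(-a * (n - n0) ** 2 / 2)
    return g * np.exp(1j * 2 * np.pi * k * df * n)

N, dt_step, df = 256, 32, 1 / 64           # time step (in samples) and frequency step
n = np.arange(N)
coeffs = {(3, 4): 1.0, (5, 10): 0.5j}      # {(n index, k index): c_nk}, chosen arbitrarily

# Eq. (7): the signal is the coefficient-weighted sum of the elementary functions
x = sum(c * gabor_atom(n, 0.02, m * dt_step, k, df) for (m, k), c in coeffs.items())
print(x.shape)   # (256,)
```

Synthesis is the easy direction; analysis, i.e. recovering c_{nk} from x, is not given by simple inner products because the atoms are not orthogonal, which is the difficulty taken up next.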
Unfortunately, the set of elementary functions in the Gabor representation is not generally orthogonal. Multiple sets of Gabor coefficients can exist to describe the joint time-frequency behavior of a given signal. As a result, solving for the Gabor coefficients is not a trivial task, which is further complicated by the desire to choose the best set [86], [95], [39], [33], [18]. Gabor proposed a successive approximation approach along each time index independently. However, even he concluded that, "The expansion into logons [elementary functions] is, in general, a rather inconvenient process, as the elementary signals are not orthogonal" [39]. It wasn't until 1980, when Bastiaans published a solution involving the biorthogonality condition, that an analytical approach for computing the Gabor coefficients was derived [7]. Even so, the biorthogonal function method only resulted in a unique solution in the critically sampled case (when ΔtΔf = 1/2) or when other analytical constraints were imposed, and even then a solution was not guaranteed [95], [7], [86]. Needless to say, the lack of a simple and efficient method for generating unique transform coefficients has hindered the broad acceptance of the Gabor expansion in signal processing.
The application of Gabor functions to image texture segmentation, extraction, and analysis evolved from research pertaining to the visual cortex. Daugman [23] and Marcelja [77] successfully applied 2D Gabor functions in a model of the response of simple cells of the visual cortex. Using neurophysical measurements, they were able to show that these functions closely represented the 2D receptive field profiles of cortical simple cells in terms of spatial localization, orientation selectivity, frequency selectivity, and quadrature phase relationship [24]. While the Gabor expansion and functions do not yield a comprehensive model of the entire visual system with all of its complexity, they certainly have served as useful tools in understanding biological vision [33].

Based on the effectiveness of the Gabor expansion in describing the biological attributes of mammalian vision, Porat and Zeevi [85] proposed a Gabor-based image analysis tool for the nonuniform segmentation of images. Their goal was to describe a decomposition that mimicked the behavior of the first stages of the visual system. The biological evidence indicated that some level of discrimination was being accomplished in a combined spatial-frequency space. As such they sought a nonlinear representation that was localized in both space and frequency. With the development of a set of tunable Gaussian-modulated filters, they laid the foundation of what would become Gabor filtering.
The primary application of multi-channel Gabor filtering [11], [54], [53] has been the analysis, segmentation, and extraction of local texture features from digital images. The canonical 2D spatial-domain Gabor filter is given by

h(n_1, n_2) = \exp\left\{ -\frac{1}{2} \left[ \frac{\tilde{n}_1^2}{\sigma_{n_1}^2} + \frac{\tilde{n}_2^2}{\sigma_{n_2}^2} \right] \right\} \cos(2\pi u_o \tilde{n}_1 + \phi)    (9)

(\tilde{n}_1, \tilde{n}_2) = (n_1 \cos\theta + n_2 \sin\theta, \; -n_1 \sin\theta + n_2 \cos\theta),    (10)

where u_o and φ are the frequency and phase of the sinusoidal plane wave along the ñ₁-axis, and σ_{n_1} and σ_{n_2} are the spatial constraints on the Gaussian. Examples of a Gabor filter impulse and frequency response are shown in figure 5, where the orientation of the Gabor filter is specified through the parameter θ in the rotation equation (10).
Figure 5: Gabor filter (a) impulse response and (b) frequency response with u_o = 0.9, σ_{n_1} = 5, and σ_{n_2} = 10.

The ability to tune the filters to extract local features at specific frequencies is very attractive. Combinations of features extracted from a bank of Gabor-filtered images can be used to classify and segment various patterns in an image. Gabor filters have proven particularly useful in analyzing both natural and artificially textured images due to the localized nature of the image information. The optimal joint localization in both spatial and frequency domains makes the Gabor filter an excellent choice for this type of analysis.
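A minimal implementation of the filter in equations (9) and (10) might look as follows (a sketch, not the original code; the parameter values are arbitrary, with u₀ kept well below 0.5 cycles/pixel to avoid aliasing):

```python
import numpy as np

def gabor_filter(size, u0, phi, theta, sigma1, sigma2):
    """Spatial-domain Gabor filter, eqs. (9)-(10): a rotated Gaussian
    envelope modulated by a cosine of frequency u0 along the rotated axis."""
    half = size // 2
    n1, n2 = np.meshgrid(np.arange(-half, half + 1),
                         np.arange(-half, half + 1), indexing="ij")
    # Rotated coordinates, eq. (10)
    n1r = n1 * np.cos(theta) + n2 * np.sin(theta)
    n2r = -n1 * np.sin(theta) + n2 * np.cos(theta)
    envelope = np.exp(-0.5 * ((n1r / sigma1) ** 2 + (n2r / sigma2) ** 2))
    return envelope * np.cos(2 * np.pi * u0 * n1r + phi)

h = gabor_filter(size=51, u0=0.1, phi=0.0, theta=np.pi / 4,
                 sigma1=5.0, sigma2=10.0)
print(h.shape)   # (51, 51)
```

Convolving an image with a bank of such filters over several (u₀, θ) pairs and measuring the local response energy is the usual route to the texture features mentioned above.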

1.3 Wavelets

Wavelet analysis is another recently popular representation that also aspires to provide some joint localization capabilities. The term wavelet is coined precisely from the notion that this is a localized wave that has finite extent [92]. The concept of the wavelet was first mentioned in 1909 by Alfred Haar [48] in his doctoral thesis, which discussed new orthogonal decompositions. Since then a significant amount of work has been presented in the area of wavelet analysis by the likes of Levy, Weiss, Coifman, Morlet, Mallat, Meyer, Vetterli, Daubechies, and Donoho [45], [22]. Like many other decompositions, the wavelet transform seeks to represent a signal as the weighted superposition of elementary functions, which in this case are called wavelets. These wavelets are subject to the admissibility and regularity conditions. The first condition implies that a wavelet must generally have zero mean and therefore be oscillatory in nature. The second condition constrains the wavelet to be localized and somewhat smooth [93]. A whole family of basis wavelets can be generated by translating and scaling a single prototype function known as the analyzing or mother wavelet. It is this scaling that plays a special role in the wavelet decomposition and leads to the notion of a time-scale representation (similar to the time-frequency representation associated with the Fourier and Gabor expansions). The scaling operation dictates the bandwidth and center frequency of the wavelet [93], [90].
The discrete wavelet (which is actually continuous in time) is given by

\psi_{s,l}(t) = \frac{1}{\sqrt{a_o^s}} \, \psi\left( \frac{t}{a_o^s} - l b_o \right)    (11)

where the s and l parameters indicate the level of scaling and translation (location) of the mother wavelet, ψ(t). The variable b_o is the translation step size and is usually equal to 1. The scaling step size, a_o, usually equals 2, resulting in dyadic sampling along the frequency axis [3]. The Discrete Wavelet Transform (DWT) and its inverse then become

W_x[s, l] = \frac{1}{\sqrt{2^s}} \int \psi\left( \frac{t}{2^s} - l \right) x(t) \, dt    (12)

x(t) = \sum_{s=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} W_x[s, l] \, \frac{1}{\sqrt{2^s}} \, \psi\left( \frac{t}{2^s} - l \right).    (13)

The 2D discrete wavelet is typically generated by taking the outer product of two 1D discrete wavelets

\psi_{s,l}(n_1, n_2) = \psi_{s,l}(n_1) \, \psi_{s,l}(n_2).    (14)

It is interesting to note that for the DWT representation, wavelet generation and processing is essentially a variation of traditional filter-bank-based subband processing theory that had been developed long before the introduction of the DWT [91], [22].
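The filter-bank view of the DWT can be made concrete with the Haar wavelet, the simplest case. The following sketch (not from the original text) performs one analysis/synthesis level as a two-channel lowpass/highpass split with downsampling:

```python
import numpy as np

def haar_dwt(x):
    """One DWT level with the Haar wavelet, implemented as the two-channel
    filter bank alluded to above: lowpass/highpass averages, then downsample."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)   # scaling (lowpass) coefficients
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)   # wavelet (highpass) coefficients
    return approx, detail

def haar_idwt(approx, detail):
    """Invert one Haar DWT level (perfect reconstruction)."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

x = np.array([4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 2.0])
a, d = haar_dwt(x)
print(np.allclose(haar_idwt(a, d), x))   # True
```

Iterating haar_dwt on the approximation channel produces the dyadic scale hierarchy of equation (13), and applying the split along each axis in turn gives the separable 2D transform of equation (14).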

Figure 6: 2D Symmlet S8 wavelets with location indices n = 2 and scale indices (a) m = 1, (c) m = 3, and (e) m = 5. The corresponding frequency-domain responses of each wavelet are shown in (b), (d), and (f).
The similarities can be readily observed by considering the frequency response of the mother wavelet and its related dilations, as illustrated in figure 6. Of particular interest are the localization properties in the joint time-frequency domain, which were a motivating factor in the development of wavelet analysis. It is clear, however, that Heisenberg's principle of uncertainty has once again not been eluded through the use of wavelets. This becomes evident by observing that any scaling that reduces the spatial extent of the wavelet necessarily expands its frequency extent. What has made wavelet analysis particularly interesting is the ability to generate sets of orthogonal wavelet basis functions [21]. The combination of its localization properties and the discoveries of fast, efficient, and nonredundant implementations has made the DWT a useful tool for image analysis and filtering [75], [3], [90].

1.4 Analysis-by-Synthesis

Systems that perform analysis and synthesis of speech signals have been the focus of a significant body of research. The introduction of voice coders (vocoders) [29] led to the development of parametric models based on the acoustic model of speech production. Further development in the area of parametric modeling led to the process known as Analysis-By-Synthesis (ABS). It is a method that was developed to obtain valid parameters for the speech production models. This was accomplished by iteratively optimizing some error function (for example, between an original speech signal and the synthetic representation) over the set of available model parameters [87]. It was first proposed and implemented by Halle and Stephens [49] as a procedure for estimating the time-dependent Fourier representation of a speech signal by adjusting parameters for a speech production model that included representations of the vocal tract transfer function and the glottal waveform. The same ABS approach was used by Bell et al. [8] in a more refined model of speech spectra, while Pinson [84] adapted ABS for the time-domain analysis of speech formant frequencies and bandwidths.

One class of vocoders models the vocal tract characteristics and associated spectral shaping using linear predictive coding (LPC). It is assumed that the speech signal can be classified as either voiced or unvoiced and that the pitch period for the voiced portion is known. For voiced speech, the excitation for the LPC vocoder is a quasi-periodic impulse train with delta functions at the pitch period intervals. The excitation for the unvoiced case is simply white noise [37], [87], [35], [36], [50], [27]. One of the difficulties associated with this excitation model is accurately detecting the voiced/unvoiced state, since it does not allow for mixed speech segments.

Figure 7: Block diagram of the analysis-by-synthesis procedure for multi-pulse excitation of a low bit rate LPC speech coder [6].
Atal and Remde [6] proposed a multi-pulse excitation model that was independent of the voiced/unvoiced state and without constraints on the pitch periodicity. Their ABS procedure determined the locations and amplitudes of the excitation pulses for an LPC synthesizer by considering the difference between the synthetic speech and the original. Figure 7 shows a block diagram of this process. A perceptually weighted mean-squared error is calculated from the difference between the synthetic and original speech segments. The excitation pulse amplitudes and positions are then determined one at a time such that this error term is minimized at each step. The contributions of the previous pulses are included when generating an updated error signal for determining the current pulse. It is this key aspect of the procedure which defines the ABS approach.
ABS is a Gauss-Seidel type of successive approximation that has been successfully applied in a variety of applications under several different names. Juang and Gray [58] investigated this concept for data compression; the method used in the context of compression is widely known as multiple-stage vector quantization (VQ) or residual VQ. The primary constraints associated with single-stage VQ implementations [13] are the large computation and storage requirements. These constraints grow exponentially with the number of quantization bits when using the full-search algorithm and a single-stage quantizer. As an alternative, Juang and Gray proposed a multistage approach where the subsequent quantization stages are designed around the residual signal from the previous stage. The computational complexity and storage requirements in the multistage approach are simply the sum of those of the individual stages. As a result, adding more bits to the overall quantization by adding stages results in a significant computational and storage savings over a comparable single-stage VQ [58]. The signal reconstruction is accomplished by simply summing the vectors from each stage. Figure 8 shows the process for a residual VQ system designed for speech coding.
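The two-stage residual quantizer just described can be sketched in a few lines (illustrative only: the codebooks here are random rather than trained, and the sizes are arbitrary):

```python
import numpy as np

def quantize(x, codebook):
    """Full-search nearest-neighbour quantizer: return (index, codevector)."""
    j = int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))
    return j, codebook[j]

rng = np.random.default_rng(0)
cb1 = rng.normal(0.0, 1.0, (16, 4))     # 1st-stage codebook (random, not trained)
cb2 = rng.normal(0.0, 0.25, (16, 4))    # 2nd stage models the smaller residual

x = rng.normal(0.0, 1.0, 4)             # input vector
j1, y1 = quantize(x, cb1)               # 1st stage: quantize the input
e1 = x - y1                             # residual passed to the next stage
j2, z2 = quantize(e1, cb2)              # 2nd stage: quantize the residual

x_hat = y1 + z2                         # reconstruction: sum the stage outputs
print(j1, j2, np.sum((x - x_hat) ** 2)) # transmitted indices and distortion
```

With 16 entries per stage, the two stages offer 16² = 256 effective reconstruction points while storing and searching only 32 codevectors, which is the savings Juang and Gray exploited.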
Figure 8: Block diagram of the analysis-by-synthesis procedure for a two-stage residual vector quantization coder [58].

The ABS process was further investigated by George and Smith [42], [41] as an effective tool for performing the decomposition of speech signals in conjunction with sinusoidal modeling. They used an overlap-add (OLA) windowing technique to analyze short quasi-stationary segments of the original speech signal. The ABS process successively extracted sinusoids from the original segments, where the sinusoidal amplitudes, frequencies, and phase terms were chosen to minimize the residual error. This was accomplished by performing a search along candidate frequencies for amplitude and phase values that minimize the successive error. The extracted values are the sinusoidal model parameters used to synthesize perceptually identical reconstructions of the input speech. They also presented the ABS/OLA model as a tool for performing time-scale, frequency-scale, and pitch-scale modification of speech signals [44]. Marques and Almeida [78] also proposed a similar algorithm based on a spectral peak-picking and optimization approach. The ABS procedure has since been incorporated into a number of speech and audio applications such as speech concatenation and musical tone analysis and generation [43], [73], [74].
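The sinusoid-by-sinusoid extraction described above can be sketched as follows (a toy version of the idea, not the published implementation: a plain least-squares fit over a coarse candidate-frequency grid, applied to a synthetic two-component signal):

```python
import numpy as np

def extract_sinusoid(residual, n, freqs):
    """One ABS step: for each candidate frequency, fit amplitude and phase by
    linear least squares against the current residual, and keep the frequency
    that leaves the smallest remaining error."""
    best = None
    for f in freqs:
        basis = np.column_stack([np.cos(2 * np.pi * f * n),
                                 np.sin(2 * np.pi * f * n)])
        coef, *_ = np.linalg.lstsq(basis, residual, rcond=None)
        fit = basis @ coef
        err = float(np.sum((residual - fit) ** 2))
        if best is None or err < best[0]:
            best = (err, f, fit)
    return best  # (residual energy, chosen frequency, fitted component)

n = np.arange(200)
x = 1.5 * np.cos(2 * np.pi * 0.05 * n + 0.3) + 0.7 * np.cos(2 * np.pi * 0.12 * n)
freqs = np.arange(0.01, 0.25, 0.005)          # candidate-frequency grid

residual, picked = x.copy(), []
for _ in range(2):                            # extract two components in turn
    err, f, fit = extract_sinusoid(residual, n, freqs)
    residual = residual - fit                 # subtract the new component
    picked.append(f)

print(picked)   # the stronger component (0.05) is found first, then 0.12
```

The successive subtraction of each fitted component from the residual is exactly the step that distinguishes ABS from a one-shot spectral fit.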
Later, Mallat and Zhang [76] independently developed a very similar algorithm, called matching pursuits, that iteratively decomposes a signal into an expansion of waveforms that belong to a redundant dictionary of possible functions. Similar to wavelet bases, these waveforms have the added functionality of an independent modulation parameter in addition to scaling and translation parameters. The decompositions are accomplished by matching residues and finding the best waveforms in the dictionary to characterize the remaining signal structure.

1.5 Objective Statement

In this thesis a new model is presented that represents an image as a sum of weighted generalized Gaussian functions. The fundamental advantage of the Gaussian, compared to other traditional basis functions like sinusoids, is its time-frequency localization property. This allows the Gaussian to easily accommodate local variations and features associated with a variety of image scenes and artifacts. The features that separate this new image model from a traditional Gabor filtering representation are the lack of modulating sinusoids and the addition of shape parameters associated with the generalized Gaussian function. When adapted for a 2D implementation, the ABS procedure provides a tractable method for performing the basis function extraction, modification, and synthesis. The general formulation of the model does not assume any a priori knowledge of the image. However, such knowledge can be exploited to tune the model for a given application. By combining generalized Gaussian basis functions with the ABS decomposition technique, this model provides a powerful tool for image analysis, modification, and synthesis.
The primary application of the generalized Gaussian ABS model is to perform resolution enhancement on coherent imaging data such as Synthetic Aperture Radar (SAR). This imaging modality suffers from an artifact, known as speckle or speckle noise, inherent in coherent imaging modalities. Speckle can contribute to poor edge and feature definition, which in turn leads to lower perceived resolution. The ultimate objective of any speckle reduction process is to allow one to resolve small objects in the scene accurately. The true image resolution in the SAR application is largely related to the sensor hardware and the signal bandwidth [80]. Ultrasound is another coherent imaging modality that suffers from similar speckle artifacts. In the case of an ultrasound image, resolution is primarily a function of the acoustic signal frequency and the absorption characteristics of the tissue being imaged [89]. While SAR has been the primary application for this thesis, preliminary investigations in the field of ultrasound imaging have shown significant promise.


The approach presented here does not attempt to derive quality improvement from the physics of the image formation process like some superresolution schemes [26], [46], [97], [70], nor does it attempt to suppress noise based on statistical characteristics as some of the filtering and denoising schemes do [47], [28]. Rather, the approach taken here is to provide a decomposition where texture, object-space features, and speckle can be decoupled (to some extent), modified, and reconstituted; the idea being that by having these components isolated, modifications can be tuned to make the enhanced images approximate images of higher resolution.
In a second application, the generalized Gaussian ABS model is used as a preprocessor for a forward-looking infrared (FLIR) image complexity measure process. Unlike the SAR and ultrasound applications, this data is not generated using coherent imaging methods and does not suffer from the same type of speckle degradation. In this case the model is tuned to identify artifacts and clutter in the image that affect the complexity of the scene. These and other features form the input to a system that establishes a complexity score for the scene. The objective of the complexity measure is to provide an analysis aid for performing manual or automatic target recognition [59].
The rest of this thesis is organized as follows. Chapter 2 introduces a general formulation for the 2D generalized Gaussian. It also presents fundamentals of the ABS procedure, and then combines the two in a discussion of the proposed model. Chapter 3 explores the application of the model to the resolution enhancement task for SAR images, and touches briefly on some of the preliminary work done on ultrasound data. The FLIR complexity application of the model is presented in Chapter 4. Conclusions and suggestions for future work are presented in Chapter 5.

CHAPTER II

THE GENERALIZED GAUSSIAN MODEL

This chapter lays the foundation and basic structure of the generalized Gaussian model that is the subject of this thesis. Many image representations have been developed that characterize the complex nature of image structures like edges and textures. Many of these formulations generate complete characterizations of the entire image in a basis. This is very useful if the ultimate goal is a complete and compact image representation. However, it is often the case that the information pertaining to particular local image features is distributed across many of the basis vectors. Processing these features involves identifying the feature information in each basis vector and then manipulating this information appropriately to effect the desired modification. For example, coherent imaging modalities suffer from an inherent, spatially local artifact known as speckle noise. In the Fourier representation the speckle noise in a coherent image has frequency components that are distributed throughout the spectrum. As a result, any attempt to address the speckle noise artifacts in the Fourier domain affects all of the frequency coefficients.

An alternative is to represent the image over a set of highly redundant waveforms, where the appropriate choice of representation becomes task dependent. Bergeaud and Mallat [9] proposed this approach for image decompositions using the argument that the redundancy in the basis functions provides significantly more flexibility in the representation. They generate a subset dictionary of 2D Gabor functions to represent the image by applying the non-optimal greedy strategy of the matching pursuit algorithm. This adaptive choice of waveforms provides a compact representation. While this approach may work well for texture classification or edge and feature detection, it does not necessarily provide a means for straightforward modification or enhancement of these local features.
In the generalized Gaussian image processing model presented here, local image attributes are represented accurately in a way that permits meaningful modification and enhancement. The adaptation of a redundant decomposition for such a model is attractive because of the potential flexibility it provides in representing local image features and phenomena. This, however, will only be possible if the basis functions are also spatially localized.
As discussed in chapter 1, Gabor functions have been shown to have optimal time-frequency localization properties [39], [18], [33], and as such would make ideal candidates for this model. However, the sinusoidal modulation term limits the Gabor function's ability to represent highly localized image features spatially, which is required to effect certain kinds of enhancement. Removing this sinusoidal modulation term from the Gabor function results in a Gaussian waveform. Another way of viewing this is to let the basis functions be Gabor functions with a modulation frequency of zero. The result is a set of Gaussian functions that can be viewed as a subset of the traditional Gabor functions.
The generalized Gaussian (which we will introduce in section 2.2), of which the standard Gaussian is a special case, further provides a mechanism for parameterizing the waveform in terms of its shape. The result is a basis function with a considerable amount of flexibility. This is very attractive when modeling local structure and features associated with natural images.


The combination of a redundant representation of exible and localized generalized Gaussian functions iteratively extracted by means of the analysis-by-synthesis
(ABS) procedure (which is discussed next) forms the foundation of the model. This
generalized Gaussian ABS model is capable of not only modeling highly localized

21

image features and phenomena but also providing a means for modication and enhancement [12].

2.1 Analysis-by-Synthesis Decomposition

Linear transforms that represent a signal as a weighted sum of basis vectors have a long history of development in a wide variety of applications. As an example, the discrete Fourier transform discussed in Chapter 1 represents a time/spatial-domain signal as a set of weighted orthogonal complex exponentials. Linear expansions of this nature provide the mechanisms for generating closed-form solutions that are manageable and efficient. However, there are problems for which clean analytical solutions are very computationally burdensome, and they do not afford the flexibility in the signal decomposition necessary to achieve the application objective. For example, the information value of an image is often dependent on the level of local or low-level detail such as edges, texture, and structure. While an image is completely characterized by a decomposition over a basis, it is often the case that these low-level image features are distributed across many of the basis elements. This makes analysis and modification by means of linear transforms very difficult and cumbersome to implement. An alternative is to use an iterative approximation technique to perform the signal decomposition that provides the flexibility necessary to extract, analyze, and modify those particular features of interest in the image. The ABS procedure provides this level of versatility by providing a framework for the analysis, modification, and synthesis of a signal regardless of the elementary functions used, which in this model are generalized Gaussians.

The presentation of the ABS method follows closely the development of George [41] by providing a concise description of the ABS algorithm using finite-dimensional vector theory. Consider the real p-dimensional vector

x = (x_1, x_2, \ldots, x_p)^T.    (15)

The goal is to generate an approximation to x as a sum of components {x̃_j} given by

\hat{x} = \sum_{j=1}^{J} \tilde{x}_j.    (16)

Assuming that the first l − 1 components have already been obtained, the approximation vector becomes

\hat{x}_{l-1} = \sum_{j=1}^{l-1} \tilde{x}_j.    (17)

The approximation error at this point would be

e_{l-1} = x - \hat{x}_{l-1} = x - \sum_{j=1}^{l-1} \tilde{x}_j.    (18)

These vectors can be updated recursively using the equations

\hat{x}_l = \hat{x}_{l-1} + \tilde{x}_l,    (19)

e_l = e_{l-1} - \tilde{x}_l,    (20)

where the starting conditions are x̂₀ = 0 and e₀ = x for l ≥ 1. The only thing remaining is to determine the vectors x̃_j, which are the individual components that contribute to the approximation sum.

This is traditionally done by minimizing a squared error norm at each step in the recursion,

E_l = \|e_l\|^2 = \sum_{i=1}^{p} \{e_i^2\}_l = \|e_{l-1} - \tilde{x}_l\|^2,    (21)

in terms of the parameters of x̃_l used to model the signal. If these vector components are composed of nonlinear functions, then the optimization in equation (21) will also be nonlinear.

There is a special case in which the solution can be determined in terms of a linear least-squares approximation [41]. This is accomplished by fixing the nonlinear variable and then solving the approximation for a candidate set of function parameters that minimize the error. Using this set of function parameters, the error is then recalculated over an ensemble of values for the nonlinear variable to find the final value that minimizes the overall error.
The ABS process converges in a mean-square sense because, from equation (21), the minimum value of E_l according to the Pythagorean theorem can be given by

E_l = \|e_{l-1}\|^2 - \|\tilde{x}_l\|^2 = E_{l-1} - \|\tilde{x}_l\|^2.    (22)

This indicates that the approximation error at each iteration is less than the previous one as long as ‖x̃_l‖² > 0. By trying to minimize E_l we are actually modeling the residual error of the approximation after l − 1 iterations in terms of the component vectors x̃_l. While most of the development of the ABS process has been associated with speech and audio signal processing, it will be shown to be a very suitable technique for performing image decompositions as well. ABS provides a flexible framework within which to perform the analysis and synthesis necessary to implement localized image feature modeling and enhancement.
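For a dictionary of unit-norm candidate vectors, the recursion of equations (19)-(21) reduces to a simple greedy loop: for a fixed candidate d the best scale is the inner product ⟨e, d⟩, and the error drops by its square, so each step picks the candidate with the largest projection. A sketch follows (not the thesis implementation; the random dictionary and sizes are arbitrary):

```python
import numpy as np

def abs_decompose(x, dictionary, J):
    """Greedy ABS recursion of eqs. (19)-(21) over a unit-norm dictionary:
    each step adds the scaled dictionary vector that minimizes ||e_l||."""
    e = x.copy()                               # e_0 = x
    components = []
    for _ in range(J):
        gains = dictionary @ e                 # <e, d> for every candidate d
        best = int(np.argmax(gains ** 2))      # largest guaranteed error drop
        xl = gains[best] * dictionary[best]    # component x~_l
        components.append(xl)
        e = e - xl                             # eq. (20)
    return components, e

rng = np.random.default_rng(1)
p, n_atoms = 64, 256
D = rng.normal(size=(n_atoms, p))
D /= np.linalg.norm(D, axis=1, keepdims=True)  # unit-norm dictionary rows

x = rng.normal(size=p)
components, e = abs_decompose(x, D, J=10)
print(np.linalg.norm(e) < np.linalg.norm(x))   # residual energy has dropped
```

Equation (22) guarantees that the residual norm is nonincreasing; over a redundant dictionary this loop is exactly the matching-pursuit strategy mentioned in section 1.4.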

2.2 2D Generalized Gaussians

The traditional 1D Gaussian function is given by

    G(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x-\mu}{\sigma} \right)^2 \right) ,    (23)

where μ, typically recognized as the mean, is the center location that corresponds
to the peak of the waveform, and σ is the standard deviation of the waveform that
dictates the lobe width. While the Gaussian function is most commonly recognized
as the probability density function of a normal distribution in statistics, we are interested in the Gaussian function as a waveform for signal representation. As such, the
constant area constraint associated with the statistical form of the Gaussian function
is eliminated by removing the \frac{1}{\sigma\sqrt{2\pi}} scaling factor. Each Gaussian function is simply
constrained to have a unity peak amplitude.


Figure 9: Plot of 1D generalized Gaussian functions with shape parameters β =
1, 2, 3. The case of β = 2 is the standard Gaussian lobe width and shape.
Furthermore, for the purpose of modeling and representing arbitrary signals it
would be advantageous to control the shape of the Gaussian function in addition to
the extent of the lobe width. The generalized Gaussian function, which is given as

    g(x) = \exp\left( -\left( \frac{|x-\mu|}{\eta\sigma} \right)^{\beta} \right) ,    (24)

where

    \eta = \left[ \frac{\Gamma(1/\beta)}{\Gamma(3/\beta)} \right]^{1/2}    (25)

and Γ(·) is the gamma function, provides the ability to change the shape through the
additional parameter, β. This parameter is unique to the generalized Gaussian in
that it controls the exponential rate of decay (or rolloff) of the function, as illustrated
in figure 9 for three distinct values of β. It should be noted from equations (23)
and (24) that the traditional Gaussian function is a special case of the generalized
Gaussian function with a shape parameter of β = 2.
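Equations (24) and (25) can be sketched directly in code. This is a minimal illustration under the normalization above (η chosen so that β = 2 recovers the standard Gaussian), not the thesis implementation:

```python
import numpy as np
from math import gamma

def gen_gaussian_1d(x, mu, sigma, beta):
    """Unit-peak 1D generalized Gaussian, eqs. (24)-(25).

    beta = 2 recovers the standard Gaussian shape; smaller beta gives
    sharper peaks, larger beta flatter, more square-shouldered lobes.
    """
    eta = (gamma(1.0 / beta) / gamma(3.0 / beta)) ** 0.5  # eq. (25)
    return np.exp(-np.abs((x - mu) / (eta * sigma)) ** beta)

x = np.linspace(-10.0, 10.0, 201)
g2 = gen_gaussian_1d(x, 0.0, 2.0, 2.0)  # beta = 2: ordinary Gaussian
```

With β = 2, η = [Γ(1/2)/Γ(3/2)]^{1/2} = √2, so the exponent reduces to -(x-μ)²/(2σ²), matching equation (23) up to the removed area normalization.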
Similar to the 1D generalized Gaussian function, the 2D function is characterized
by a center peak location, (μ1, μ2), a pair of lobe width parameters, σ1 and σ2, that
dictate the spatial extent of the Gaussian along the n1 and n2 axes, and β, the
associated shape parameter.

Figure 10: A 2D generalized Gaussian function with a contour plot detailing the
center location, lobe widths, and rotation angle.

Figure 10 shows a 2D generalized Gaussian function
that can accommodate spatial rotation and translation, given by

    g[n_1, n_2] = \exp\left( -\left( \frac{\sqrt{r_1^2[n_1,n_2] + r_2^2[n_1,n_2]}}{\psi(\phi)} \right)^{\beta} \right) ,    (26)

where

    \psi(\phi) = \frac{\eta\,\sigma_1\sigma_2}{\sqrt{(\sigma_1\cos\phi)^2 + (\sigma_2\sin\phi)^2}} ,    (27)

    \phi = \begin{cases} \arctan(n_2/n_1), & n_1 \neq 0 \\ \pi/2, & n_1 = 0 \end{cases}    (28)

and η is given in equation (25). Equations (27) and (28) provide for a circularly
smooth transition between the two lobe width values, σ1 and σ2. The spatial translation and rotation of the 2D Gaussian in equation (26) is accomplished through the
functions

    r_1[n_1, n_2] = (n_1 - \mu_1)\cos\theta + (n_2 - \mu_2)\sin\theta
    r_2[n_1, n_2] = -(n_1 - \mu_1)\sin\theta + (n_2 - \mu_2)\cos\theta    (29)

where the parameters (μ1, μ2) and the angle θ specify the Gaussian center point
(mean) and the amount of off-axis rotation, as illustrated in figure 10. The shape
parameter, β, maintains the same effect on the 2D generalized Gaussian function as
it did in the 1D case (see figure 9 for the 1D functions), as shown in figure 11.

Figure 11: 2D generalized Gaussians with lobe widths σ1 = σ2 = 7 and shape
parameters (a) β = 2, (b) β = 4, and (c) β = 8.
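Equations (26) through (29) can be sketched as follows. This is an illustrative rendering, not the thesis code; in particular, the piecewise arctangent of equation (28) is replaced here by the quadrant-aware `arctan2` evaluated in the rotated coordinates, an assumption made for a branch-free vectorized sketch:

```python
import numpy as np
from math import gamma

def gen_gaussian_2d(shape, mu, sigmas, theta, beta):
    """Rotatable, translatable 2D generalized Gaussian, eqs. (26)-(29)."""
    eta = (gamma(1.0 / beta) / gamma(3.0 / beta)) ** 0.5  # eq. (25)
    n1, n2 = np.meshgrid(np.arange(shape[0]), np.arange(shape[1]),
                         indexing="ij")
    # Translation and rotation, eq. (29)
    r1 = (n1 - mu[0]) * np.cos(theta) + (n2 - mu[1]) * np.sin(theta)
    r2 = -(n1 - mu[0]) * np.sin(theta) + (n2 - mu[1]) * np.cos(theta)
    # Angle-dependent lobe width, eqs. (27)-(28); arctan2 substitutes
    # for the two-case arctan of eq. (28) (assumed equivalent here).
    phi = np.arctan2(r2, r1)
    s1, s2 = sigmas
    psi = eta * s1 * s2 / np.sqrt((s1 * np.cos(phi)) ** 2
                                  + (s2 * np.sin(phi)) ** 2)
    r = np.sqrt(r1 ** 2 + r2 ** 2)
    return np.exp(-(r / psi) ** beta)

# A 65x65 lobe centered at (32, 32), widths (7, 3), rotated 30 degrees.
g = gen_gaussian_2d((65, 65), (32, 32), (7.0, 3.0), np.pi / 6, 2.0)
```

The function peaks at unity at the center, and ψ(φ) sweeps smoothly between ησ1 and ησ2 as the evaluation angle moves around the lobe.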
A simplified alternative formulation of the 2D generalized Gaussian can be realized
as the outer product of two 1D generalized Gaussian functions,

    g[n_1, n_2] = g_1[n_1]\, g_2[n_2] .    (30)

Each 1D generalized Gaussian assumes the discrete form

    g_i[n_i] = \exp\left( -\left( \frac{|n_i - \mu_i|}{\eta\sigma_i} \right)^{\beta} \right) ,    (31)

where i = 1, 2 identifies the axis of the 1D function and η is given in equation
(25). This representation is attractive because it is separable and enables independent
consideration along each spatial dimension of the 2D function. If computational
constraints are an issue, this could save valuable time in the analysis and synthesis
of such functions. However, further inspection of equations (30) and (31) shows
that this 2D generalized Gaussian is a constrained representation. The flexibility to
assume an arbitrary spatial orientation and the ability to achieve a smooth transition
from one lobe width to the other are no longer present. This results in Gaussian
functions with major and minor axes that can lie only in the direction of the image
coordinate axes, n1 and n2. Artificial squaring of the waveforms becomes evident
as the shape parameters of the associated 1D functions deviate from the standard
Gaussian. This is illustrated in figure 12.

Figure 12: Separable 2D generalized Gaussians with lobe widths σ1 = σ2 = 7 and
shape parameters (a) β = 2, (b) β = 4, and (c) β = 8.
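The separable form of equations (30) and (31) reduces to a single outer product. A minimal sketch, assuming the same η normalization as equation (25):

```python
import numpy as np
from math import gamma

def gg_1d(n, mu, sigma, beta):
    """Discrete 1D generalized Gaussian lobe, eq. (31)."""
    eta = (gamma(1.0 / beta) / gamma(3.0 / beta)) ** 0.5  # eq. (25)
    return np.exp(-np.abs((n - mu) / (eta * sigma)) ** beta)

n1 = np.arange(33)
n2 = np.arange(33)
# Outer product of two 1D lobes, eq. (30): axis-aligned only.
g_sep = np.outer(gg_1d(n1, 16, 7.0, 4.0), gg_1d(n2, 16, 7.0, 4.0))
```

With β = 4 the iso-contours of `g_sep` become visibly square, which is exactly the constraint (and the artificial squaring) discussed above.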

2.3 The Model

The generalized Gaussian ABS model combines the spatial localization and parametric control properties of the generalized Gaussian function with the tractability and
flexibility of the ABS procedure. The generalized Gaussians serve as the component
functions for the ABS engine, which performs the image decomposition, parameter
modification, and synthesis. Unlike other more conventional decompositions, the
ABS procedure is unique in that it enables decoupled parameter specifications between the analysis and synthesis stages. We seek to approximate a discrete image
i[n1, n2] as a finite sum of weighted generalized Gaussian functions, g_k[n1, n2], as

    \tilde{i}[n_1, n_2] = \sum_{k=1}^{N} a_k\, g_k[n_1, n_2] ,    (32)

where \tilde{i}[n1, n2] is the reconstructed image and a_k represents the associated weights.
During the analysis a search is conducted to find the Gaussian function and corresponding parameters that best approximate the residual image or match the model of
some feature in that image. In SAR and ultrasound applications, we wish to model
coherent imaging speckle, whereas in FLIR complexity assessment applications we
seek to model target features of specified sizes. As a result, the process for determining the best image components (or generalized Gaussian functions) is inherently
application dependent.
Once the best generalized Gaussian image component has been identified, it becomes the extracted Gaussian function, \hat{g}_l[n1, n2], of the lth iteration. It is interesting
to note that the standard implementation of the ABS algorithm would require that
this extracted function be subtracted from the previous residual to form the new
residual. However, it might be to our advantage in terms of our image processing objectives to modify the extracted function before performing the subtraction. This new,
potentially modified version of the extracted function becomes the analysis Gaussian,
which is used to form the residual or error function

    e_l[n_1, n_2] = e_{l-1}[n_1, n_2] - A\{\hat{g}_l[n_1, n_2]\} ,    (33)

where A{·} is an analysis modification function.


The extracted Gaussian function is also passed to the synthesis stage, which is
similar in structure to the analysis stage. Once again the ABS process provides the
ability to modify the extracted generalized Gaussian functions during the synthesis
stage. These potentially modified versions of the extracted functions, which are
called the synthesis Gaussians, are then accumulated and summed to form the new
processed output image

    \tilde{i}[n_1, n_2] = \sum_{k=1}^{N} S\{\hat{g}_k[n_1, n_2]\} ,    (34)

where S{·} is a synthesis modification function. Embedding these modification
functions, A{·} and S{·}, in the ABS process provides significant modeling and
enhancement flexibility that will be discussed in chapter 3. A block diagram of the
generalized Gaussian ABS process is shown in figure 13.
This process is repeated recursively until some stop criterion is met. If the
analysis and synthesis functions are exact replicas of the extracted function (i.e., no
modification), then image reconstruction that is perceptually identical to the original
can be achieved using a finite number of Gaussians.

Figure 13: Block diagram of the generalized Gaussian analysis-by-synthesis decomposition and synthesis technique.
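The analysis/synthesis structure of equations (33) and (34) can be sketched as a loop with two modifier callbacks. This is an illustrative skeleton, not the thesis implementation; `extract_best` is a hypothetical placeholder for the application-dependent component search, stood in for by a trivial brightest-pixel extractor:

```python
import numpy as np

def gg_abs(image, extract_best, modify_analysis, modify_synthesis, n_gauss):
    """Decoupled generalized Gaussian ABS, eqs. (33)-(34).

    `extract_best` returns the best-fitting component image for the
    current residual; the two callbacks realize A{.} and S{.}.
    """
    residual = image.astype(float).copy()
    output = np.zeros_like(residual)
    for _ in range(n_gauss):
        g = extract_best(residual)
        residual -= modify_analysis(g)   # eq. (33)
        output += modify_synthesis(g)    # eq. (34)
    return output, residual

def brightest_delta(res):
    # Toy extractor: a single-pixel "component" at the brightest sample.
    out = np.zeros_like(res)
    idx = np.unravel_index(np.argmax(res), res.shape)
    out[idx] = res[idx]
    return out

identity = lambda g: g  # coupled mode with no modification: A{.} = S{.}
img = np.array([[4.0, 0.0], [0.0, 2.0]])
out, res = gg_abs(img, brightest_delta, identity, identity, 2)
```

With identity modifiers the loop reduces to plain ABS, and the accumulated output reproduces the input exactly, consistent with the reconstruction claim above.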
2.3.1 Coupled and Decoupled Modes

As was just presented, one unique aspect of the ABS process is that it enables coupled and decoupled modes of operation in terms of the analysis and synthesis function
specifications/modifications. In the coupled operational mode of generalized Gaussian ABS, the modification processes are identical in both the analysis and synthesis
stages,

    A\{\cdot\} = S\{\cdot\} .    (35)

This reduces computation, as the extracted functions are only modified once and
then used in both the analysis and synthesis stages.
In the decoupled mode the nature of the modifications to the extracted Gaussian
functions performed in the analysis and synthesis stages is quite different. As a result
the analysis and synthesis Gaussian functions are not the same. While this requires
more overhead in terms of computation, the decoupled operational mode is the key
attribute of the model that provides the additional flexibility needed to perform useful
image enhancement.
2.3.2 Gaussian Parameter Estimation

As was presented in the ABS section previously, one approach to identifying suitable
analysis components is to find some closed-form solution to the optimization problem.
An alternative is to search the entire generalized Gaussian parameter space for a
function that minimizes the global residual error. While these may yield compact
representations, they do not exploit any a priori knowledge we may have of the image
content that we seek to model.

Instead, an approach is pursued whereby a search is conducted for a set of suitable
Gaussian parameters that best model a particular image feature. The search can be
conducted across the entire image or in a localized region, depending on the application and the type of image features being modeled. The amplitude, lobe widths,
shape factors, and spatial location parameters are extracted for each 2D generalized
Gaussian function in the decomposition. The range of possible values for each of
these parameters can be reduced to those values that yield meaningful representations
of the image features being modeled. This significantly reduces the parameter search
space while maximizing the probability of obtaining a set of reasonable parameter
values.
Once a reasonable range of expected values has been established, the general
approach for estimating the individual parameters is as follows. First, a candidate
feature location and amplitude is identified using a priori knowledge of the image
features being sought. If the features generally have high image intensities, then
a simple peak-picking approach might suffice. Conversely, if the features are dark
or have low image intensity values, then a peak-picking approach on the negative
image might be more suitable. Once the amplitude and central spatial location of
a feature are established, the search for suitable lobe width and shape factor values
can be conducted by searching radially outwards from this position. Once again, the
range of values for these lobe width and shape parameters can be limited by a priori
knowledge of the image feature. A less complicated optimization can now be done
for parameter values within these reduced ranges that minimize the error between the
generalized Gaussian representation of the feature and the feature itself. However,
even this can potentially be simplified further depending on the nature of the
image features being modeled. For example, if the features are somewhat square
in shape, then use of the separable generalized Gaussian function could reduce the
search to two independent optimizations of only two variables (one lobe width and one
shape factor) along two perpendicular axes. This further reduces the computational
complexity of the search. Once a complete set of parameters has been determined,
the associated generalized Gaussian can be extracted from the original signal. The
process is then repeated, resulting in a successive approximation based model.
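The peak-pick-then-restricted-search procedure above can be sketched as follows. This is a simplified illustration, not the thesis code: an isotropic lobe is assumed, and the bounded grid over (σ, β) stands in for whatever search the application dictates:

```python
import numpy as np
from math import gamma

def gg_lobe(d, sigma, beta):
    """Isotropic generalized Gaussian lobe as a function of radius d."""
    eta = (gamma(1.0 / beta) / gamma(3.0 / beta)) ** 0.5
    return np.exp(-np.abs(d / (eta * sigma)) ** beta)

def estimate_feature(image, sigma_range, beta_range):
    """Peak-pick a candidate location/amplitude, then grid-search a
    restricted (sigma, beta) range for the best radial fit.

    The ranges encode a priori knowledge of the feature size/shape."""
    p1, p2 = np.unravel_index(np.argmax(image), image.shape)
    amp = image[p1, p2]
    n1, n2 = np.indices(image.shape)
    d = np.hypot(n1 - p1, n2 - p2)       # radial distance from the peak
    best = None
    for sigma in sigma_range:
        for beta in beta_range:
            err = np.sum((image - amp * gg_lobe(d, sigma, beta)) ** 2)
            if best is None or err < best[0]:
                best = (err, sigma, beta)
    return (p1, p2), amp, best[1], best[2]

# Synthetic check: a single bright lobe with known parameters.
n1, n2 = np.indices((25, 25))
img = 5.0 * gg_lobe(np.hypot(n1 - 10, n2 - 12), 3.0, 2.0)
loc, amp, sig, bet = estimate_feature(img, [2.0, 3.0, 4.0], [2.0, 4.0])
```

On this synthetic bright feature the estimator recovers the planted location, amplitude, lobe width, and shape factor.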
2.3.3 ABS Stop Considerations

The recursive nature of the ABS process requires a mechanism to stop the successive
approximation process. An error based approach could be employed to stop the
process based on an error threshold. However, image energy based methods such
as this tend to be image scene and feature dependent, and it is difficult to choose
an appropriate threshold for all possible image scene scenarios. An error or energy
based stop condition could result in incomplete or inadequate image decompositions.
Another approach is to specify a Gaussian density requirement that stops the ABS
process once a specified number of Gaussians has been used in the decomposition.
The number of Gaussians can be set as a percentage of the image size or simply fixed
to a set amount. This approach is more deterministic in terms of computational
complexity, and it is much less image scene dependent.


The peak-picking generalized Gaussian search method can end up causing the
model to neglect dark and low contrast regions of the image, regardless of which stop
criterion is used. One solution to this is to apply the model in a non-overlapping block
processing mode. In this implementation the image is segmented into smaller (and
usually more similar) regions, and the generalized Gaussian ABS model is applied
to each region independently. This provides a more uniform spatial distribution of
Gaussians across the image, and it tends to reduce the amount of amplitude variation
presented to the search routine which performs the parameter estimation.
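The non-overlapping block mode amounts to tiling the image and running the model per tile. A minimal sketch, with a copy operation standing in as a hypothetical placeholder for the per-block GGABS model:

```python
import numpy as np

def blockwise_abs(image, block, process):
    """Apply a per-region model independently on non-overlapping blocks,
    so dark and low-contrast regions also receive Gaussians."""
    out = np.zeros_like(image, dtype=float)
    H, W = image.shape
    for r in range(0, H, block):
        for c in range(0, W, block):
            out[r:r + block, c:c + block] = process(
                image[r:r + block, c:c + block])
    return out

img = np.arange(16.0).reshape(4, 4)
# Placeholder per-block model: an identity copy, standing in for GGABS.
recon = blockwise_abs(img, 2, lambda b: b.copy())
```

Each block sees only its own, more homogeneous amplitude range, which is precisely what eases the peak-picking parameter search.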

2.4 Applications

The generalized Gaussian model is applied to a number of applications in the following
chapters. Of particular interest is the ability to model and control the granularity
of speckle in coherent images. The combination of the decoupled ABS process and
the generalized Gaussian basis function forms a powerful tool with the potential of
performing significant enhancement in SAR and ultrasound images. The model
is not limited to coherent modalities. A FLIR clutter complexity measure is presented
in chapter four as an example of how the model can be used as an image feature
extraction tool.


CHAPTER III

SAR RESOLUTION ENHANCEMENT

In this chapter the generalized Gaussian analysis-by-synthesis (ABS) model is used to
perform resolution enhancement on image data generated by synthetic aperture radar
(SAR). Because SAR is a coherent imaging modality, it suffers from an artifact known
as speckle. By modeling and then reducing the size of the speckle, the generalized
Gaussian ABS model can enhance the image, giving it the appearance of a higher
resolution image. The benefits of this resolution enhancement extend beyond the
subjective improvement to include improved performance when used in automatic
target recognition (ATR) systems. The configuration of the model and the ensuing
resolution enhancement results will be thoroughly discussed in the remainder of this
chapter.

3.1 Introduction

Synthetic aperture radar (SAR) has, over the past few years, proven to be an invaluable tool in remote sensing applications such as hydrology, vegetation mapping
and monitoring, reconnaissance, geology, glaciology, and tectonic activity assessment
[80]. Image data of the earth can be collected using both airborne and spaceborne
SAR platforms. While earth and atmospheric science applications tend to favor the
spaceborne systems, most military applications rely on airborne platforms with rapid
deployment capabilities. The military reconnaissance objectives consist of the detection and recognition of various vehicles or targets within a scene. Human analysts
and automatic target recognition (ATR) systems that perform these tasks require
SAR image resolutions of better than 1 m in order to extract meaningful information
from the image data [80]. As a result, efforts to increase the resolution of a SAR
image can lead to improved target detection and recognition capabilities.

Figure 14: Geometry of a side looking imaging radar. (V = velocity, L = antenna
length, W = antenna width, h = antenna elevation, θ = look angle, θr, θa = range and
azimuth beam widths, Sr, Sa = range and azimuth swath widths, Rs, Rg = slant and
ground range.)


3.1.1 SAR Imaging Geometries

In synthetic aperture radar (SAR) imaging, very narrow beamwidth angle microwave
pulses are used to illuminate an object surface, while the reflected or backscattered
microwave energy is measured between the pulses. This backscatter is used to establish the position and contrast of the various targets on the object surface. The
general geometry for an imaging radar system is shown in figure 14. This configuration is known as a side looking airborne radar (SLAR), where a swath of land is
illuminated by the radar beam on just one side of the vehicle track. The distance,
known as the slant range, R_s, between the antenna and a point on the scattering
surface can be determined by measuring the round trip time delay, τ, between the
pulse transmission and the backscattered return. This is given by

    R_s = c\tau/2 ,    (36)

where the microwave signal travels at the speed of light, c. A time log of backscattered returns allows the signal processor to generate a distance map between surface
scatterers and the antenna. By limiting the radar swath to just one side of the airborne platform, left-to-right ambiguities that might occur between two equi-distant
points on either side of the platform are eliminated [30].
The slant range resolution is a function of the duration or pulse width, τ_p, of the
illumination pulses. As illustrated in figure 15, the return signals of the two point
scatterers, A and B, separated by a slant distance of ΔR_s will have a time delay
difference at the receiving antenna of Δt = 2ΔR_s/c. In order to resolve these two
point scatterers, the radar pulse width will need to meet the constraint

    \tau_p < \Delta t = \frac{2\Delta R_s}{c} .    (37)

Solving for ΔR_s, the best achievable slant range resolution, Δr_s, for a given pulse width
becomes

    \Delta r_s = \frac{\tau_p c}{2} .    (38)

In the extreme, an infinitely high slant range resolution (small Δr_s) appears possible
given an infinitely small pulse width (τ_p). However, with the reduction in pulse width
comes a drop in transmitted power that eventually results in very little scattered
energy being returned to the sensor. The alternative is to transmit a long linear
frequency modulated (FM or chirp) signal pulse that has both a large τ_p and a large
bandwidth, B_p. The reflected modulated pulse can be compressed later during image
formation to yield higher slant range resolution with more instantaneous signal power
[20], [80], [14]. The new slant range resolution then becomes

    \Delta r_s = \frac{c}{2B_p} .    (39)
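Equations (38) and (39) can be checked with a quick numeric example; the pulse width and bandwidth below are illustrative values, not parameters from any system described in this chapter:

```python
c = 3.0e8                  # speed of light, m/s

# Uncompressed pulse, eq. (38): a 1 microsecond pulse gives 150 m.
tau_p = 1.0e-6             # pulse width, s (illustrative)
delta_rs_pulse = tau_p * c / 2

# Compressed chirp, eq. (39): 200 MHz of bandwidth gives 0.75 m.
Bp = 200e6                 # chirp bandwidth, Hz (illustrative)
delta_rs_chirp = c / (2 * Bp)
```

The contrast (150 m versus 0.75 m) illustrates why pulse compression, rather than ever-shorter pulses, is the practical route to fine slant range resolution.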

Figure 15: (a) The range geometry of a side looking imaging radar, with points A
and B shown with a separation of ΔR_g along the ground and ΔR_s along the slant. (b)
The timing of illumination pulses and scattering returns required to eliminate overlap
between the two along-range points A and B. This specifies the slant resolution of
the SAR.

In real aperture radar (RAR), the resolution of the imaging system in the azimuth
direction depends directly on the width of the microwave beam along the azimuth.
Referring to figure 16(a), the only way points A and B can be resolved is if they are
not in the radar beam at the same time. The beam width, θ_a = λ/L, is in turn a
function of the length of the antenna in the azimuth direction, L, and the carrier
wavelength, λ. The resolution then becomes simply the azimuth swath width, S_a, at
a given slant range, R_s, as

    r_{RAR_a} = S_a = 2R_s \sin(\theta_a/2) \approx R_s\theta_a    (40)

for small θ_a. For a given carrier wavelength, the two ways to improve resolution are
to use a longer antenna aperture or to shorten the slant range distance. Physical
constraints on both of these variables put limitations on the practical design and
operation of antennae and the achievable resolution for real aperture radar systems
[20].
Using a method first proposed by Wiley in the 1950s, SAR establishes the position
of targets along the azimuth by looking at the Doppler frequency shift of the returned
backscatter [96]. Consider the reflections from the two point scatterers, A and B,
shown in figure 16(b) at equal slant range distances, R_s, from the antenna that is
moving at a velocity, V. With a platform velocity of zero, the reflected frequencies at
the antenna from both points A and B would be f_0 = c/λ, the original transmitted
frequency. Both of the return signals would reach the antenna at the same time due
to the identical slant range distances given a platform velocity of zero. This leads to
a front-to-back ambiguity between points A and B. However, if the platform has a
known velocity, V, then the return signal frequencies become

    f_{A,B} = f_0 \pm f_D , \qquad f_D = \frac{2V_D}{\lambda} ,    (41)

where f_D is the Doppler frequency shift, λ is the signal wavelength, and c is the speed
of light. V_D is the Doppler velocity for the given angular offset, θ, of the targets, and
is given by

    V_D = V\sin(\theta) .    (42)

Figure 16: (a) The azimuth geometry of a real aperture radar (RAR). If the
points A and B both fall within the azimuth swath width, S_a, of the radar beam they
will not be resolved. (b) The azimuth geometry of a synthetic aperture radar, where
the equal slant range points A and B are separated by their Doppler shifts, +f_D
and -f_D.

Point scatterer A will return a slightly higher apparent frequency, f_0 + f_D, while
scatterer B will return a slightly lower apparent frequency, f_0 - f_D. By frequency
filtering the return signals, the SAR processor can distinguish between two point
scatterers at equal slant distances based on the Doppler shifts and eliminate the front-to-back position ambiguity. This is known as unfocused SAR processing. Focused
SAR processing compensates for the nonlinear phase behavior of the return signals
resulting from the relative target motion during the observation interval [20]. This
slight shift in target position relative to the antenna platform results in a Doppler
bandwidth (as opposed to a fixed Doppler frequency) that must be accounted for in
order to achieve the theoretical SAR azimuth resolution limit of

    r_{SAR_a} = L/2 .    (43)
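A quick numeric example makes equations (41) through (43) concrete; the carrier frequency, platform velocity, antenna length, and offset angle below are illustrative values, not parameters of any system in this chapter:

```python
import math

c = 3.0e8                   # speed of light, m/s
f0 = 10e9                   # X-band carrier, 10 GHz (illustrative)
lam = c / f0                # wavelength: 0.03 m

V = 100.0                   # platform velocity, m/s (illustrative)
theta = math.radians(1.0)   # angular offset of the scatterer
VD = V * math.sin(theta)    # Doppler velocity, eq. (42)
fD = 2 * VD / lam           # Doppler shift, eq. (41): ~116 Hz

L = 2.0                     # antenna length, m (illustrative)
r_az = L / 2                # eq. (43): 1 m theoretical azimuth resolution
```

A one-degree offset at these parameters produces a Doppler shift of roughly a hundred hertz, which is what the azimuth filtering must resolve to separate scatterers.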
3.1.2 SAR: Surface Properties and Scattering

Typically, the reflection or absorption of microwaves (with wavelengths between 2
and 15 cm) depends on the properties of the object's surface. Consequently, the
magnitude of the microwave return signal can be affected by the object's local surface
geometry (like the slope), texture (or roughness), and inhomogeneities (like porosity),
as illustrated in figure 17. In addition, electrical properties such as the dielectric
constant, absorption, and conductivity will also affect the scattering/reflection of the
signal [30], [20], [1].
Natural surfaces can be represented as the sum of large scale (low frequency)
straight facets and the small scale (high frequency) surface roughness. The surface
roughness can be characterized statistically in terms of the standard deviation relative to the mean of the surface and the correlation length, which is defined as the
separation distance at which two points become statistically independent. At high
angles of incidence, the large facets tend to dominate the radar backscatter, while at
low angles of incidence the surface roughness tends to become predominant. As a
result, different surface scattering models have been developed to describe the interaction between the microwave radiation and an object's surface. These models are
often used in combination, as each represents the backscattering for a different set of
surface and signal conditions [30].

Figure 17: Illustration of how various surfaces (faceted or sloped, smooth, and
rough) can affect a radar return signal.
Facet scattering models characterize the backscatter from the large scale surface
facets that are relatively devoid of surface roughness. For the sake of modeling, the
geometry of the surface can be broken down into a series of connecting facets with
low slope angles. The backscatter for each facet is then derived given the electrical
properties of the surface material. The overall radar backscatter is represented as
the superposition of the individual fields from each of the facets. The fundamental
limitation of the facet scattering model is that it is only valid at high angles of
incidence [30].
Two other models, the point scatterers model and the Bragg model, are used in
conjunction with low angles of incidence. It is at these lower angles that the surface
roughness starts to dominate the backscatter return signal. As the name insinuates,
the point scatterers model represents the surface as a homogeneously distributed
collection of radiating points that are uncoupled. The total backscatter is considered
as the sum of the individual point scatterers. The individual responses are derived
from the projection of the incident wavefront on the surface of the point scatterers.
Elachi [30] presents a simplified model with a backscatter equation given by

    \sigma(\theta) = N\sigma_0 \cos^2\theta ,    (44)

for N point scatterers, each with a backscatter cross section (also called a radar cross
section or RCS) of σ_0 and an angle of incidence of θ.
In the case of the Bragg model, the backscatter is assumed to result from resonance between specific spectral components of the surface and the incident wavefront.
Specifically, the resonance involves those spectral components of the surface whose
wavelengths, Λ, are integer multiples of half the signal wavelength, λ, given by

    \Lambda = \frac{n\lambda}{2\sin\theta} , \qquad n = 1, 2, \ldots ,    (45)

where θ is the angle of incidence [30]. Thus at low angles of incidence the backscatter
is also partially a function of the spectral composition of the surface roughness.
Several other factors also affect the magnitude of the returned radar signal. These
include absorption losses in both the atmosphere and the reflecting surface, and volume
scattering (or the transformation of energy into heat/vibration). The polarization
(either horizontal or vertical) of the incident waveform also has an effect on the
returned signal, resulting in the highlighting of different surface features. As mentioned
in the discussion of the Bragg scattering model, the signal frequency can have a major
impact on the signal backscatter. In addition to influencing the Bragg resonance,
frequency is the primary parameter that influences the extent to which a radar signal
can penetrate the surface. In general, the penetration depth is linearly proportional
to the signal wavelength [30]. All of these illustrate that multiple factors, both
natural and systemic, contribute to the formation of a radar image.


Figure 18: A block diagram of the fundamental components of a SAR imaging
system (sensor, image formation processor, motion compensation, image processor,
and image exploitation) [14].

3.1.3 SAR Imaging Components

A SAR imaging system is usually comprised of the basic elements shown in figure
18. The sensor subsystem typically receives the scattered electromagnetic return
signal and demodulates it to a lower baseband frequency (usually zero), resulting in a
narrow bandwidth complex digital signal. The image formation processor generates
a complex image that corresponds to the radar cross section returns of the objects
within the scene. The originally transmitted signal as well as radar platform motion
information are captured in the motion compensation subsystem and exploited to
correct for inherent spatial distortions in the resulting image. The image processor
is usually composed of algorithms designed to analyze and detect features within the
image. The image exploitation block then classifies the image data based on the
features extracted in the image processing stage.


The SAR image formation process, as shown in figure 19, typically relies on discrete Fourier transforms to perform range and azimuth compression of the raw complex SAR signal data. This involves multiplying the SAR signal data by matched
filters after both have been transformed into the frequency domain by means of fast
Fourier transforms. In the range compression direction, the matched filter consists
of the complex conjugate of the originally transmitted chirped signal. The matched
filter used in the azimuth compression is derived from the Doppler history of the received signal and takes into account the pulse repetition frequency of the SAR system
as well as sensor platform motion [14], [38]. While the image formation process illustrated in figure 19 involves 1D processing, it is not difficult to realize that the entire
process can be accomplished using full 2D matched filters and fast Fourier transform
processing. Various windows, such as the Hamming, Taylor, and Kaiser-Bessel windows, may be applied to the matched filters (also known as reference functions) prior
to compression as a way to achieve side lobe suppression and control the peak and
integrated side lobe ratios. However, this comes at the expense of reduced resolution
and blurring from the resulting wider mainlobes [20], [38], [26].
3.1.4 SAR Speckle Noise

Speckle noise is an inherent artifact in SAR imaging systems. The signal received
at the antenna is the superposition of backscattering interactions between the transmitted signal and a multitude of elemental scatterers within a resolution cell, dA,
on the target surface. However, even over a nominally homogeneous target, significantly different surface interactions can result due to the natural variability of surface
characteristics, as described in section 3.1.2. As a result, the cross section or RCS
parameter, σ, which represents the amount of backscattering from a resolution cell, is
really better described as a random variable with a given probability density function.
Ideally, a SAR image would represent a collection of the mean cross section values,
\bar{\sigma} = E\{\sigma\}, of each of the dA resolution cells. In reality, however, it is comprised
of a collection of cross section values governed by the target's surface cross section
probability density function and some variation about its mean value, \bar{\sigma} [20], [30].

Figure 19: The block diagram of a traditional Fourier transform based SAR image
formation process (range compression via FFT and matched filtering, range migration,
patch processing, and azimuth compression, using platform velocity, PRF, and absolute
range information) [38].

This variation is what makes up speckle noise in a SAR image. It manifests itself
in the image as spurious high and low intensity pixel regions (usually only 2 to 3
pixels across) distributed spatially throughout the scene. The statistical nature of
the speckle is not constant across the SAR image as it is an artifact of the scattering
properties of the various surfaces in the scene. As a result, the statistical properties
of speckle noise within an image are closely related to the properties of the various
target cross sections.

3.2 Traditional Speckle Reduction and Resolution Enhancement Methods

While the Forward Problem described in section 3.1.2 seeks to predict the speckle
formation process using complicated models of the scattering medium and the physics
of the transmitted wave, the Inverse Problem attempts to derive the radar cross
section (RCS) or true reflectivity, σ(·), of the scattering surface from the resulting SAR
intensity image. Despite the rather thorough understanding of speckle characteristics,
there are no unique solutions for obtaining σ from the intensity image because of
the stochastic contribution of the speckle to the image [80]. As a result, several
speckle reduction and image enhancement methods have been proposed, and they
can be divided into two general categories: processes implemented during the image
formation stage and those applied to the image after formation. The following
sections 3.2.1, 3.2.2, and 3.2.3 describe some of the traditional post-image-formation
speckle reduction techniques, while section 3.2.4 provides a brief overview of the
pre-image-formation resolution enhancement methods.
The generalized Gaussian analysis-by-synthesis (GGABS) model proposed in this
chapter is also a post-image-formation process.

However, while most of the tradi-

tional post-image-formation processes seek to eliminate speckle from the scene altogether through image smoothing processes, the goal of the GGABS model is to reduce
the granularity and size of the speckle without losing the texture in the image. The
46

resulting images take on the appearance of higher resolution image data, and yet they
do not compromise the structure and texture of the original SAR image.
3.2.1 Multilook Averaging Speckle Reduction

One of the traditional methods for reducing speckle is multilook processing [34]. Since the occurrence of speckle can be considered a random process, one can conceivably reduce the speckle effect through averaging. The simplest implementation is to apply windowed averaging to the magnitude image (hence the common name of multilook averaging). While this implementation does reduce the amount of speckle and requires only one SAR image, it reduces the image resolution in both the azimuth and slant range directions as a function of the window size.
Another method of multilook processing averages multiple coherent data sets (or looks) of the same target region. These data sets might consist of several independent single-look complex SAR images of the same target region that are spatially aligned and then averaged on a pixel-by-pixel basis. This approach has the benefit of not inherently degrading the resolution; however, it is very difficult to acquire multiple data sets under the same acquisition conditions, and the alignment process can be cumbersome at best. The ability to reduce the speckle in the image is a function of the number of independent looks, and the results are often valuable only for the limited region where the alignment of all the data sets is attainable.
A third approach exploits the SAR image formation process and tends to be the most popular. The azimuth resolution in a SAR system is fixed as a function of antenna length (see section 3.1.1), and it is usually several times higher than the slant range resolution, which is a function of the signal characteristics of the illumination pulses. As a result, more lines of raw data are collected in the slant range direction (along the azimuth), yielding different sampling rates in each direction. Averaging the appropriate number of the slant range data lines along the azimuth direction achieves some speckle reduction, while at the same time producing square pixels that are easier to display. The inherent resolution loss associated with averaging still exists; however, it is constrained to only a single dimension in the SAR image. Figure 20 illustrates this most common implementation of multilook averaging.

Figure 20: Multilook processing reduces the effects of speckle noise in a SAR image by exploiting the ability of SAR systems to collect higher resolution data in the azimuth direction than in the slant range direction. The higher resolution data is averaged along the azimuth to produce square pixels with reduced speckle.

Li et al. were able to show that similar results could be achieved by simply low-pass filtering a single-look image [69]. As a result, the multilook approach is not considered very valuable for enhancing image understanding or preserving detail.
3.2.2 Lee and Kuan Filters

The Lee filter [65], [66] and the Kuan filter [63] are two adaptive filters commonly used to reduce speckle in SAR imagery. They are based on the multiplicative model of speckle noise

I(x, y) = RCS(x, y) · u(x, y),    (46)

where I is the image intensity, RCS is the radar cross section, and u is the speckle noise.

The Lee filter approximates the multiplicative noise model with a linear model that resembles a Taylor series expansion and then applies the minimum mean square error (MMSE) criterion to the linear model. The Kuan filter transforms the multiplicative model into a signal-based additive model and then, in a similar fashion to the Lee filter, applies the MMSE criterion to the additive model [88]. The filters have very similar forms, as both vary the output by minimizing the mean squared error between the current intensity and the expected fit, which is based on local image statistics. In this way excess pixel intensity fluctuations are reduced without an explicit model of the underlying image [80]. The Kuan and Lee filters use weighted linear combinations of the mean pixel intensity, Ī_w, of a local window around the center pixel, I_c, to generate an output, Î, given by

Î = Ī_w + k(I_c − Ī_w).    (47)

Lopes [72] generalized the filtering process by defining a set of coefficients that characterize the nature of the speckle noise and the local texture in the image. These are the coefficients of variation

C_u = σ_u / ū,    C_Iw = σ_Iw / Ī_w    (48)

of the speckle noise and the image window respectively. Because the statistics, σ_u and ū, of the speckle are not known explicitly, they are approximated as the standard deviation and mean of a representative (relatively large but homogeneous) subregion within the speckled image. The Lee and Kuan filter weights can then be defined in terms of the coefficients of variation as

k_L = 1 − C_u² / C_Iw²                    (Lee weighting function)    (49)

k_K = (1 − C_u² / C_Iw²) / (1 + C_u²)     (Kuan weighting function).  (50)

Lopes also proposed an enhancement to these filters by establishing a set of criteria for choosing the filter weight. In the enhanced filter implementations, the image is divided into three distinct classes:

- Homogeneous regions, in which simple smoothing can be used to reduce speckle

- Heterogeneous regions, in which texture must be preserved while speckle is reduced

- Regions with isolated point targets that should be preserved.
The local image content is classified as one of the three classes using the coefficients of variation, and an appropriate filter weight is assigned based on the criteria

k = 0,                  C_Iw ≤ C_u,
k = k_L, k_K, or k_Lm,  C_u < C_Iw < C_max,    (51)
k = 1,                  C_Iw ≥ C_max.

C_max is a threshold that is often chosen to be the largest local variation coefficient (calculated over small windows) observed within the representative speckle subregion. In addition, a modified Lee weighting function was proposed that provides a smoother transition between the filter weights across all three classes. The modified Lee weighting function is given as

k_Lm = 1 − exp( −0.1 (C_Iw − C_u) / (C_max − C_Iw) ).    (52)

For each of these three filters (Lee, Kuan, modified Lee), as the pixel intensity variation within the current local window approaches the expected speckle variation, the output value trends toward the mean intensity, Ī_w, of the local window. The larger the pixel intensity variation becomes within the window, the more the filter output approaches the current center pixel value, I_c. Figure 21 shows a plot of the weighting functions for all three filter implementations using the Lopes criteria in equation (51). The modified Lee filter provides a smooth transition through the entire range, while the standard Lee and Kuan filters result in a discontinuity at C_max [72].

[Figure: plot of filter weight versus coefficient of variation for the three weighting functions.]

Figure 21: Weighting functions for the Lee, Kuan, and Modified Lee speckle filters using C_u = 0.450 and C_max = 0.705.

As the images in figure 22 illustrate, the processed output of all three filters is very similar. This would be expected given the similarity in the weighting functions. All three filters do a reasonable job of removing the speckle in relatively homogeneous regions, but they sacrifice edge definition in low contrast regions [80].
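For concreteness, the weighting functions (49), (50), and (52) and the Lopes class criteria (51) can be sketched as follows. This is an illustrative example, not the thesis implementation; the per-window statistics (I_c, Ī_w, C_Iw) are assumed to be computed elsewhere:

```python
import numpy as np

def lee_weight(c_iw, c_u):
    """Lee weighting function, equation (49)."""
    return 1.0 - c_u ** 2 / c_iw ** 2

def kuan_weight(c_iw, c_u):
    """Kuan weighting function, equation (50)."""
    return (1.0 - c_u ** 2 / c_iw ** 2) / (1.0 + c_u ** 2)

def modified_lee_weight(c_iw, c_u, c_max):
    """Modified Lee weighting function, equation (52)."""
    return 1.0 - np.exp(-0.1 * (c_iw - c_u) / (c_max - c_iw))

def filter_output(I_c, I_mean, c_iw, c_u, c_max, weight_fn):
    """Equation (47) combined with the Lopes criteria of equation (51)."""
    if c_iw <= c_u:          # homogeneous: smooth toward the window mean
        k = 0.0
    elif c_iw >= c_max:      # isolated point target: preserve the pixel
        k = 1.0
    else:                    # heterogeneous: use the adaptive weight
        k = weight_fn(c_iw, c_u)
    return I_mean + k * (I_c - I_mean)

# Figure 21 values: C_u = 0.450, C_max = 0.705; center pixel 8, window mean 5.
out = filter_output(8.0, 5.0, 0.60, 0.45, 0.705, lee_weight)
```

Substituting `kuan_weight`, or a lambda wrapping `modified_lee_weight` with C_max, switches filters without changing the classification logic.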
3.2.3 Model Based Image Filtering

Model based despeckling techniques assume prior knowledge of the statistical characteristics of the radar cross section (RCS) and use this information to generate improved estimates of the same. One such filter is the maximum a posteriori (MAP) filter, which assumes that the probability density function (PDF) of the RCS is either known or can be modeled a priori [63], [80]. Using Bayes' rule and the simplifying variable substitution RCS = R, the conditional probability

Figure 22: (a) The original SAR chip image HB03333 of a BMP2 from the MSTAR database. (b) Speckle reduction illustration using the Lee filter. (c) Speckle reduction illustration using the Kuan filter. (d) Speckle reduction illustration using the Modified Lee filter and a 5 × 5 window. The top left 32 × 32 pixel region of the original image was used to establish C_u = 0.450 and C_max = 0.705 for the modified Lee filter.

P(RCS|I) = P(R|I) can be written as

P(R|I) = P(I|R) P(R) / P(I),    (53)

where I is the observed SAR image intensity that has been corrupted by speckle noise (u) as given in the multiplicative noise model in equation (46). The logarithm of the conditional probability in (53) is then maximized by taking the partial derivative with respect to the RCS (R) and setting it to zero, resulting in

[ ∂ ln P(I|R)/∂R + ∂ ln P(R)/∂R ] |_(R = R_MAP) = 0.    (54)

This yields the maximum likelihood term (the first term) of the detected image intensity once the RCS model is given, and the maximum a priori term (the second term) of the mean image intensity [71]. Kuan et al. [63] proposed a non-stationary mean and non-stationary variance model, where the underlying scene has a Gaussian PDF,

P(R) = (1 / √(2π σ_R²)) exp( −[R − R̄]² / (2σ_R²) ),    (55)

where the mean, R̄, and variance, σ_R², are estimated from the SAR image based on the local statistics of a moving window (similar to the Lee filter). Solving the maximum a priori component of equation (54) using a Gaussian PDF leads to the cubic MAP equation

R_MAP³ − R̄ R_MAP² + σ_R² (R_MAP − I) = 0,    (56)

where the real root of the equation, R_MAP, has a value that lies between the observed intensity I and the local mean R̄. The image filtering is achieved by replacing the window center pixel with the real root of the Gaussian MAP equation. An example of a Gaussian MAP processed output image is shown in figure 23.
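A hedged sketch of the per-pixel Gaussian MAP update: the cubic (56) is solved numerically and the real root between the observation and the local mean is retained. The helper name and the use of `np.roots` are illustrative choices, not the thesis implementation:

```python
import numpy as np

def gaussian_map_pixel(I, r_mean, r_var):
    """Solve the cubic MAP equation (56),
        R^3 - Rbar R^2 + var (R - I) = 0,
    and return the real root lying between the observed intensity I
    and the local mean Rbar."""
    roots = np.roots([1.0, -r_mean, r_var, -r_var * I])
    real = roots[np.abs(roots.imag) < 1e-9].real
    lo, hi = sorted((I, r_mean))
    inside = [r for r in real if lo - 1e-9 <= r <= hi + 1e-9]
    return float(inside[0]) if inside else float(real[0])
```

In a full filter this function would be applied at every pixel, with r_mean and r_var taken from the local moving-window statistics.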
The Gamma distribution has been presented as an improvement to the Gaussian-based MAP filter. It has been established [55], [57], [81] that the detected intensity in a SAR image can be described by a K-distribution for a wide variety of natural scattering scenes. Lopes et al. [72], [71] claim that the interaction of a coherent wave with a Gamma distributed surface leads to a K-distributed observation, suggesting the use of the Gamma PDF,

P(R) = (ν/R̄)^ν (R^(ν−1) / Γ(ν)) exp(−ν R / R̄),    (57)

to represent the underlying scene RCS. This provides a more realistic representation of the underlying texture when looking at ecological materials in natural wooded and vegetated regions. The term ν is a heterogeneity parameter given by

ν = (1 + C_u²) / (C_Iw² − C_u²),    (58)

that is approximated from the coefficients of variation of the speckle noise, C_u, and the filter window, C_Iw, as given in equation (48). Substituting the Gamma PDF into the Bayesian MAP equation (54) leads to the Gamma MAP filter equation

ν R_MAP² + (1 − ν + 1/C_u²) R̄ R_MAP − I R̄ / C_u² = 0,    (59)

where I is the current pixel observation and R̄ is the mean of the local filter window. The only positive root of the Gamma MAP filter equation is

R_MAP = [ (ν − 1/C_u² − 1) R̄ + √( R̄² (ν − 1/C_u² − 1)² + 4ν I R̄ / C_u² ) ] / (2ν),    (60)

where ν and C_u are given in equations (58) and (48) respectively. The Gamma MAP filter is a non-linear solution for the underlying scene as a function of the current observation, I, and the local a priori mean, R̄, as dictated by the coefficients of variation, C_u and C_Iw. Figure 23 shows sample images of the Gaussian and Gamma MAP filter outputs.
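The closed-form root (60) translates directly into a per-pixel update. The sketch below is illustrative (including the simple fallback to the local mean for homogeneous windows, which is an assumption rather than something prescribed by the text); it computes ν from equation (58) and then R_MAP:

```python
import numpy as np

def gamma_map_pixel(I, r_mean, c_u, c_iw):
    """Per-pixel Gamma MAP update: heterogeneity nu from eq. (58),
    then the positive root of eq. (59) via the closed form of eq. (60).
    Falls back to the window mean when the window looks homogeneous."""
    if c_iw <= c_u:
        return r_mean
    nu = (1.0 + c_u ** 2) / (c_iw ** 2 - c_u ** 2)
    a = nu - 1.0 / c_u ** 2 - 1.0
    disc = (r_mean * a) ** 2 + 4.0 * nu * I * r_mean / c_u ** 2
    return (r_mean * a + np.sqrt(disc)) / (2.0 * nu)
```

As with the Gaussian MAP sketch, a full filter would evaluate this at every pixel using moving-window estimates of R̄ and C_Iw.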
The correlated neighborhood model uses a similar PDF modeling and Bayesian approach, with the difference being that local statistics are used to ensure the reconstruction is as smooth as possible and yet consistent with the local region. Other derivations and combinations of these methods have also been proposed, such as the correlated neighborhood maximum likelihood and the correlated neighborhood gamma MAP methods [80].

Figure 23: (a) Subregion of the original SAR clutter image HB06211 from the MSTAR database. (b) The speckle reduction results using the Gaussian MAP filter with a 7 × 7 window. (c) The speckle reduction results using the Gamma MAP filter with a 7 × 7 window. The bottom right 50 × 50 pixel region of the original image was used to establish C_u = 0.496, while C_max = 0.558 for the Gamma MAP filter.

Each process tends to address one aspect of speckle reduction (either in uniform regions, regions with known texture, or point targets). The selection of a window size presents a tradeoff between the amount of speckle reduction and the preservation of image detail in each of these approaches.

Enhancements to all of these adaptive filters can be achieved by taking into account the scene structure in the SAR image. The simplest way of doing this is to identify edges and varying texture regions within the scene. By choosing an appropriate filter and window size, a reconstruction can be generated that accommodates these various natural features without over- or under-smoothing the content.
Another approach is to apply the adaptive despeckling techniques iteratively to the same image. Each time, the algorithms are adapted to the latest iteration of the image, resulting in a new reconstruction. This is known as nonlinear iterated processing. While the results show considerable improvement in terms of despeckling performance, a similar loss of scene structure tends to occur [80].
3.2.4 Spectral Estimation and Superresolution Methods

Resolution enhancements beyond those achieved through the traditional post-processing techniques have attracted a significant amount of interest in the SAR community. Most of the work has concentrated on the application of spectral estimation techniques to the problem of accurately estimating the power spectral density of SAR phase history data. The field of spectral estimation can be broadly subdivided into two groups: classical or Fourier methods and modern spectral estimation methods. The classical methods are non-parametric and include the Fourier-based periodogram, windowed periodogram, and Blackman-Tukey spectral estimation methods. The modern spectral estimation techniques generally rely on modeling of the spectra to generate a spectral estimate. In this case, the problem becomes one of accurately estimating the model parameters, and as a result they are often referred to as parametric spectral estimation methods [61].
The motivation for pursuing spectral estimation techniques for the SAR application is the hope of enhancing the resulting SAR images by modifying the phase history data prior to image formation. This requires the power spectral density (PSD), which is defined as

P_xx(f) = Σ_{k=−∞}^{∞} r_xx[k] exp(−j2πf k),    −1/2 ≤ f ≤ 1/2,    (61)

for a wide sense stationary random signal, x[n], where r_xx[k] is the autocorrelation function of x[n]. However, by noting that the limits on the summation go to infinity, one can quickly recognize that the true PSD of a signal is not realizable for real-world, limited-duration signals. Instead, the best that can be done is to generate an accurate estimate of the PSD for a signal that is limited in scope. It is the quest for this PSD estimate that forms the basis of spectral estimation theory and practice. Simply put, the goal is to estimate the PSD of an infinite sequence (with an infinite autocorrelation function) using a finite portion of that sequence (and consequently a limited autocorrelation function).
Classical Fourier spectral estimation methods are the most widely used. They only operate on those values within a finite observation range and consider all values outside that range to be zero. The periodogram and the Blackman-Tukey spectral estimators are the most common of the classical PSD estimators, with the periodogram given by

P_PER(f) = (1/N) | Σ_{n=0}^{N−1} x[n] exp(−j2πf n) |².    (62)

However, as with most Fourier-based techniques, there is a tradeoff in this case as well between the variance and the bias (or resolution) of the PSD estimate. In the case of the periodogram, the average of the estimate does converge to the true value as the length of the data set N grows large, but the variance does not decrease with N. As a result it is considered to be an inconsistent estimator. An alternative, called the averaged periodogram, segments the input sequence into blocks, calculates the individual periodograms of the windowed blocks, and then averages them to produce the resulting estimate. This has the effect of reducing the variance, while the use of shorter blocks increases the bias (and coarsens the resolution) of the estimate. The net result is a smoother, more consistent estimate [61], [32].
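The periodogram (62) and its averaged, Bartlett-style variant can be sketched as follows; the block length and the white-noise test signal are illustrative choices:

```python
import numpy as np

def periodogram(x):
    """Classical periodogram, eq. (62): (1/N) |DFT(x)|^2."""
    N = len(x)
    return (np.abs(np.fft.fft(x)) ** 2) / N

def averaged_periodogram(x, block_len):
    """Bartlett-style averaging: split x into non-overlapping blocks and
    average their periodograms, trading resolution for lower variance."""
    nblocks = len(x) // block_len
    blocks = x[: nblocks * block_len].reshape(nblocks, block_len)
    return np.mean([periodogram(b) for b in blocks], axis=0)

# White Gaussian noise has a flat true PSD equal to its variance (here 1);
# the averaged estimate clusters around that level much more tightly.
rng = np.random.default_rng(1)
x = rng.standard_normal(4096)
p_full = periodogram(x)
p_avg = averaged_periodogram(x, block_len=256)
```

Comparing the spread of `p_full` and `p_avg` around 1 illustrates the variance reduction discussed above.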
The Blackman-Tukey spectral estimator [10] is also a Fourier-based technique and is defined as

P_BT(f) = Σ_{k=−M}^{M} w[k] r̂_xx[k] exp(−j2πf k),    (63)

where r̂_xx[k] is the autocorrelation estimate of x[n], and w[k] is a windowing function called a lag window with the properties:

0 ≤ w[k] ≤ w[0] = 1,
w[−k] = w[k],    (64)
w[k] = 0 for |k| > M.

This process estimates the PSD by taking the Fourier transform of the weighted autocorrelation function. The Bartlett window is a common choice for the lag window since it meets the requirements in equation (64), and it does not result in a negative Fourier transform. The inherent tradeoff between bias and variance still remains. However, the length of the lag window, M, can be adjusted to achieve the best compromise between the two for the given application [61].
In contrast to the classical methods, modern spectral estimation techniques employ models of the PSD, which can be represented by a limited number of variables, to generate the spectral estimates. The model parameter values are derived using the known autocorrelation values computed from a finite signal. In general, these models do not put any constraints on the values of the sequence outside the observation range. As a result, the models can be used to extrapolate the PSD sequence outside the observed range. If the model is accurate, then the theory implies that expanding the PSD sequence through extrapolation will lead to higher spatial resolution (superresolution) in the image. However, all of this relies on the selection of an accurate model that does not artificially introduce artifacts into the PSD estimate that are not part of the true PSD. Very little guidance exists on the appropriate selection of models [61]. In addition, these modern spectral estimation techniques eliminate the window functions used in the classical Fourier methods and the distortions introduced by those windows. This alone accounts for some of the realized improvement in terms of resolution and image quality [26].

With this modeling approach, the spectral estimation problem shifts to being more of a parameter estimation problem. As a result, the broad field of statistical parameter estimation can be employed in generating accurate models of the PSD. These models are often based on a priori knowledge of the SAR signal generation process. The only drawback to modeling-based spectral estimation is the lack of a generally accepted definition of what it means to be optimal, and this in turn has led to a proliferation of models with all sorts of claims regarding efficacy.
The suite of rational transfer function models consists of the autoregressive (AR) model, the moving average (MA) model, and the autoregressive moving average (ARMA) model, which is given as

x[n] = −Σ_{k=1}^{p} a[k] x[n−k] + Σ_{k=0}^{q} b[k] u[n−k],    (65)

where u[n] is a driving function. The parameter estimation objective is to obtain values for a minimum number of a[k] and b[k] parameters that allow an accurate representation of the spectrum. A closer inspection of equation (65) shows the two distinct AR and MA components in the first and second terms respectively. The all-pole AR term is good for modeling sharp peaks in the spectrum, but it does not represent the valleys very well. In contrast, the all-zero MA term does a good job of modeling the deep valleys in the spectrum, but it is insufficient for representing sharp peaks. The combined ARMA model provides the capability to adequately model both extremes at the cost of a higher number of parameters. Choosing an appropriate model order becomes an iterative process, and while some knowledge of the process being modeled can be helpful, the final choice is typically derived experimentally [61]. Of these three, the AR models remain the most popular, as the model parameters can be found by solving a series of linear equations, known as the Yule-Walker equations. The Levinson algorithm [67] further simplifies the establishment of a suitable AR model by providing an efficient method for recursively solving these equations.
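The Levinson recursion for the Yule-Walker equations can be sketched as below; the AR(2) test signal is an illustrative choice, and the sign convention follows equation (65):

```python
import numpy as np

def levinson_durbin(r, p):
    """Recursively solve the Yule-Walker equations for an AR(p) model,
    given autocorrelation values r[0..p]. Returns (a, err) with a[0] = 1,
    the convention x[n] = -sum_{k=1..p} a[k] x[n-k] + u[n], and err the
    driving-noise variance estimate."""
    a = np.zeros(p + 1)
    a[0] = 1.0
    err = r[0]
    for m in range(1, p + 1):
        acc = r[m] + np.dot(a[1:m], r[m - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for i in range(1, m):
            a[i] = a_prev[i] + k * a_prev[m - i]
        a[m] = k
        err *= (1.0 - k * k)
    return a, err

# Illustrative AR(2) test signal: x[n] = 0.75 x[n-1] - 0.5 x[n-2] + u[n],
# i.e. a = [1, -0.75, 0.5] in the convention above.
rng = np.random.default_rng(2)
N = 20000
u = rng.standard_normal(N)
x = np.zeros(N)
for n in range(2, N):
    x[n] = 0.75 * x[n - 1] - 0.5 * x[n - 2] + u[n]

# biased autocorrelation estimates for lags 0..2
r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(3)])
a_hat, var_hat = levinson_durbin(r, 2)
```

The recursion solves each order-m system in O(m) work from the order-(m-1) solution, which is the efficiency gain the text attributes to the Levinson algorithm.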
While the last few paragraphs have focused on some of the more traditional spectral estimation processes, DeGraaf [26] provides an excellent review and comparison of some of the latest spectral estimation techniques and models. In addition to some of the superresolution techniques already presented in this section, DeGraaf includes discussions of the adaptive sidelobe reduction (ASR) model, the minimum variance method (MVM), the reduced rank minimum variance method (RRMVM), the space variant apodization (SVA) algorithm, the super space variant apodization (SSVA) algorithm, the signal-clutter subspace decomposition, the eigenvector (EV) and multiple signal classification (MUSIC) methods, Pisarenko's method, the AR spectral averaging model (ARSA), the Tufts-Kumaresan AR (TKAR) variation, and the parametric maximum likelihood (PML) algorithm. In this extensive comparison, the two extremes were the FFT method and the PML, as the least and most computationally intensive respectively.

Both the ASR and SVA methods reduce sidelobe distortion, but suffer from speckle noise artifacts. The periodogram method reduces speckle by using what is equivalent to a series of fixed filters, but at the expense of resolution and higher sidelobe distortion. Both the MVM and RRMVM methods reduce sidelobe distortion while reducing speckle effects by using adaptive filters. However, the resolution improvement is dependent to some extent on the nature of the image content, and computation complexity starts to become a concern. The AR methods, the EV/MUSIC processes, and the Pisarenko algorithm all produce good resolution images with reduced speckle and limited sidelobe interference. Again, the extent of the goodness of these results is somewhat dependent on the nature of the image scene content, and these methods are computationally burdensome [26]. In addition, there are multiple variations of these models that allow them to perform better for certain SAR implementations [97], [70], [46]. One thing that most of them tend to have in common is the goal of trying to produce images that appear to have been obtained from higher resolution sensor hardware while reducing the impact of the speckle noise.

3.3 Proposed Approach for SAR Imagery

The approach introduced here does not attempt to derive quality improvement through power spectrum modeling like the superresolution schemes, nor does it attempt to suppress noise based on statistical characteristics as some of the filtering and denoising schemes do. Rather, the approach taken here is to provide a decomposition in which texture, object and terrain features, and speckle can be decoupled (to some extent), modified, and reconstituted; the idea being that with these components isolated, modifications can be tuned to make the enhanced images approximate SAR images of higher resolution. The generalized Gaussian analysis-by-synthesis (GGABS) model is proposed as a mechanism to do just that.


One indicator of effectiveness in this application is subjective quality. This is important because SAR images are routinely reviewed by image analysts trained to recognize specific threats. Thus, for them, preserving the familiar appearance of the images is an important property, a property not present in many enhancement and superresolution techniques. In the text by Oliver and Quegan [80], this concept is summed up as follows:

"Ultimately, image understanding is rooted in experience and empiricism, both as regards our broad understanding of how the world is and in experimental findings that guide our perceptions of what aspects of the image should be examined." (pg. 81)
Another measure of effectiveness for the SAR application is automatic target recognition (ATR) performance improvement. ATR performance is significant because automated screening is often desired as a way to handle massive amounts of acquired SAR images and to assist in detecting threats and identifying areas of interest. Current target recognition routines perform significantly better on higher resolution SAR imagery [83]. Thus, the extent to which an enhancement algorithm can improve ATR performance can be viewed as a measure of its value.
To facilitate control over the speckle texture, the Gaussian waveforms in the GGABS model are biased to have sizes approximately equal to or less than the size of the speckle grains. Consequently, low frequency spatial variations are captured by the aggregate sum of high frequency Gaussians as opposed to a smaller sum of low frequency Gaussians. The shapes of these Gaussian functions can be modified to provide isolated control over speckle, object boundaries, and contrast.
To provide a more explicit representation of the low frequency information in the images, an initial front-end extraction operation is performed, which results in a smooth denoised representation of the original image. This low frequency version of the original image is combined with the resulting GGABS model representation according to

output = α · process(original) + (1 − α) · median(original),    (66)

where α determines the relative weighting of the two components. A block diagram of the algorithm is given in figure 24. For this particular application, a 9 × 9 median filter is used to perform this low frequency extraction, although a more sophisticated denoising filter could be applied if the need called for it.
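Equation (66) and the 9 × 9 median front end can be sketched directly; here `processed` is only a placeholder for the GGABS model output, which is not reproduced in this sketch:

```python
import numpy as np
from scipy.ndimage import median_filter

def recombine(processed, original, alpha=0.6, size=9):
    """Equation (66): blend the GGABS-processed image with a median-
    filtered (low frequency) version of the original image."""
    low_freq = median_filter(original, size=size)
    return alpha * processed + (1.0 - alpha) * low_freq

rng = np.random.default_rng(4)
original = rng.random((64, 64))
processed = original          # placeholder for the GGABS model output
enhanced = recombine(processed, original, alpha=0.6, size=9)
```

With α = 1 the output is the GGABS representation alone; with α = 0 it is the denoised low-frequency image alone.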

[Figure: block diagram — the input i[n1, n2] feeds a denoised-image extractor and an analysis loop (extract best Gaussian, modify analysis Gaussian, accumulate Gaussians, with the accumulated result subtracted from the input); modified synthesis Gaussians are accumulated to form the modification/enhancement output, which is recombined with the denoised image to produce ĩ[n1, n2].]

Figure 24: Block diagram of the SAR resolution enhancement algorithm.


3.4 Generalized Gaussian Model Considerations

While applying the generalized Gaussian analysis-by-synthesis (GGABS) model to the task of improving the resolution by controlling the speckle in the SAR application may appear straightforward, there are several subtleties that require further explanation. These include the method used for extracting appropriate generalized Gaussian parameters to represent the image speckle, and the various options and alternatives for performing the modifications of these waveforms to achieve the desired resolution enhancement results. Finally, some method must be employed to stop the analysis-by-synthesis process. All of these topics are discussed in the following sections.
3.4.1 Parameter Extraction

The GGABS model's search objective for the SAR application is to find generalized Gaussian functions that best represent the speckle in the image. This is accomplished by targeting the higher amplitude, very narrow acircular peaks that characterize the speckle using a peak-picking based search method. While more sophisticated search routines that navigate the parameter space in some optimal way can be employed, for this application subjective quality is a major motivation. Tying the search process directly to the spatial characteristics of the speckle in a simple yet pragmatic way ensures that the resulting extractions are directly tied to that motivation, and not subject to some arbitrary optimization constraint. While this might not be the most efficient process, it is shown to be sufficient for achieving the desired results.

Reasoning suggests that the largest amplitude peak in the residual image represents a large portion of the overall energy in the image. By modeling this peak first, we can be fairly certain that the residual error is decreased appreciably. While the separable formulation of the 2D generalized Gaussian has what might appear to be some major drawbacks in terms of a circularly smooth representation (see section 2.2), the combination of the successive approximation of the ABS and the very narrow nature of the speckle peaks being modeled makes these negative effects somewhat inconsequential. The separable formulation does have a major advantage in that once a peak has been identified, the remaining lobe width and shape analysis can be performed independently along each axis. This further reduces the computational complexity of the search. The generic search algorithm works as follows:
1. Find the largest peak in the current residual image.

2. Determine the spatial position and maximum value of the peak and store these as the weight, A_l, and position of the current extracted function. This immediately fixes three of the generalized Gaussian parameters and reduces the parameter search space under consideration.

3. Search outward along each axis in both directions and identify candidate lobe width and shape parameters that fall within the expected range of values for speckle.

4. For each axis choose the smallest parameter values and store them as the actual generalized Gaussian lobe width and shape parameters.
The reason for choosing the smallest lobe width and shape parameters is to ensure that the extracted function fits (as much as possible) within the shape of the peak it is modeling. This helps to ensure convergence of the entire process. The search criterion for identifying candidate lobe width values entails finding image pixel intensity values that correspond to the associated Gaussian standard deviation values (approximately 0.6 times the peak value, A_l) by searching outward from the current peak along the row or column paths. Once a pixel is identified that meets this criterion, the spatial offset between the peak location and this pixel is recorded as a candidate lobe width. If the search condition is not met prior to encountering a local minimum, then the local minimum location is used instead. Again, the smallest value in each axis direction becomes the extracted Gaussian lobe width. Once the lobe widths in each direction have been identified, the shape parameters along each axis are chosen to minimize the error between the generalized Gaussian model and the speckle at that spatial position.
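One iteration of the peak-picking search can be sketched as follows. This is an illustrative reading of steps 1 through 4 (the 0.6 amplitude level approximates the one-standard-deviation crossing); the per-axis shape-parameter fit is omitted:

```python
import numpy as np

def extract_peak_params(residual, floor_ratio=0.6):
    """One analysis iteration: find the largest peak, then walk outward
    along each axis until the value drops to ~0.6 of the peak amplitude
    (roughly the one-standard-deviation level) or a local minimum is
    passed; the smaller offset per axis is the candidate lobe width."""
    r0, c0 = np.unravel_index(np.argmax(residual), residual.shape)
    A = residual[r0, c0]

    def lobe_width(line, center):
        widths = []
        for step in (-1, 1):
            i, prev = center, line[center]
            while 0 < i + step < len(line) - 1:
                i += step
                if line[i] <= floor_ratio * A or line[i] > prev:
                    break          # crossed the level, or passed a local minimum
                prev = line[i]
            widths.append(abs(i - center))
        return min(widths)

    s1 = lobe_width(residual[:, c0], r0)   # lobe width along axis 0
    s2 = lobe_width(residual[r0, :], c0)   # lobe width along axis 1
    return A, (int(r0), int(c0)), (int(s1), int(s2))

# Synthetic residual: one separable Gaussian peak (sigmas 2 and 3).
rr = np.arange(41)[:, None]
cc = np.arange(61)[None, :]
residual = 5.0 * np.exp(-((rr - 20) ** 2) / 8.0 - ((cc - 30) ** 2) / 18.0)
A, pos, widths = extract_peak_params(residual)
```

In the full ABS loop this extraction would be followed by the shape-parameter fit, subtraction of the modeled atom from the residual, and repetition until the stop condition is met.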
3.4.2 Parameter Modification

In general, the packing density of the synthesis Gaussians and resulting speckle in
the output image is controlled by the size of the analysis function.

An analysis

Gaussian with lobe widths that are some fraction smaller than the extracted function
will generate residual lobes during the subtraction process. These residual lobes are
then modeled at some later iteration by the ABS using smaller Gaussian functions
that are spatially near the originally extracted Gaussian.

This has the eect of

forcing the model to use more Gaussians to approximate the speckle. The result is
an image with eectively higher speckle density.
The synthesis Gaussians tend to dictate the size and spatial extent of the speckle granularity in the final image. Accumulating synthesis functions, which are reduced lobe width versions of the extracted functions, will result in a reconstructed image with narrower speckle. This effect can be further enhanced by reducing the shape parameters that sharpen the synthesis Gaussians. The result is an image where the spatial extent of the speckle has been reduced.

By operating the GGABS model in the decoupled mode and carefully modifying the Gaussian parameters in both the analysis and synthesis stages one can effectively control the nature of the speckle. This results in a synthesis image that effectively has finer granularity and higher speckle density.
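The decoupled-mode modification amounts to applying separate multiplication factors to the extracted lobe widths and shape parameters before the analysis and synthesis stages. A minimal sketch, using the factors quoted for figure 26 as defaults; the parameter record and all names are assumptions, not the thesis implementation:

```python
from dataclasses import dataclass, replace

@dataclass
class Gaussian2D:
    amplitude: float
    sigma_x: float   # lobe width, x direction
    sigma_y: float   # lobe width, y direction
    alpha_x: float   # generalized-Gaussian shape parameter
    alpha_y: float

def decouple(extracted, a_width=0.8, s_width=0.4, a_shape=0.6, s_shape=0.4):
    """Derive separate analysis and synthesis Gaussians from one extracted
    function by scaling lobe widths and shape parameters independently."""
    analysis = replace(extracted,
                       sigma_x=extracted.sigma_x * a_width,
                       sigma_y=extracted.sigma_y * a_width,
                       alpha_x=extracted.alpha_x * a_shape,
                       alpha_y=extracted.alpha_y * a_shape)
    synthesis = replace(extracted,
                        sigma_x=extracted.sigma_x * s_width,
                        sigma_y=extracted.sigma_y * s_width,
                        alpha_x=extracted.alpha_x * s_shape,
                        alpha_y=extracted.alpha_y * s_shape)
    return analysis, synthesis
```

Narrower synthesis widths pack more, smaller Gaussians into the reconstruction, which is what yields the finer, denser speckle described above.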

3.5 SAR Results

The two relevant indicators of the generalized Gaussian model's effectiveness for the SAR application are subjective quality assessment and automatic target recognition (ATR) performance improvement. In addition, some insight into the uniqueness of this model and approach can be gained by comparing it to some of the traditional despeckling methods discussed in section 3.2. The subjective quality between processed and unprocessed SAR image scenes was assessed using the formal stimulus-comparison categorical judgement scale as defined in the CCIR Recommendation 500-4 [2]. In a similar fashion, ATR performance improvement was evaluated by comparing the classification results between a set of original unprocessed images and those that had been resolution enhanced by the generalized Gaussian model. The testing results showed gains in both subjective quality and ATR performance after the images had been processed using the generalized Gaussian model.
A set of general urban image scenes was used during the subjective evaluation, while the public MSTAR database was used in the ATR performance testing. Both sets of images were processed by the generalized Gaussian model in the decoupled mode using a density based stop condition of 75% to terminate the analysis-by-synthesis process. The best recombination factor was determined experimentally to be γ = 0.6 and was held constant for all of the tests.


3.5.1 Subjective Validation

Resolution enhancement was performed on a set of urban scene SAR clutter images. These enhanced images were viewed by 7 different assessors and rated in terms of resolution improvement over the original SAR images using the CCIR Recommendation 500-4 stimulus-comparison subjective test. In this method the assessor is shown a test and reference image and is permitted to switch freely between the two images until a relative grade for the test image is established. Results are then recorded on the seven-point CCIR comparison scale (3 = Much Better, 2 = Better, 1 = Slightly Better, 0 = Same, -1 = Slightly Worse, -2 = Worse, -3 = Much Worse).

Table 1: Subjective analysis mean opinion score for the top five images.

           Mean Opinion Score
Image 1    2.07
Image 2    1.86
Image 3    1.67
Image 4    1.6
Image 5    1.43

Four one-meter resolution images were used to generate the 20 test images. These test images were generated to have samples distributed throughout the scoring range in terms of enhancement (i.e., it had samples of good and bad enhancement), and they represented a subset of possible reconstruction configurations of the generalized Gaussian ABS model. Each test consisted of 20 different presentations repeated twice in random order. The viewing distance was 4 times the image height.
The highest scoring image had a mean opinion score of 2.07. The majority of the images (80%) had a mean score greater than zero, which is parity. The average of the mean scores for the top ten images was 1.41. The results for the five best scoring images are shown in table 1.

These results clearly show that resolution enhancement by means of the generalized Gaussian model does provide significant subjective improvement over the originals. Examples of an original SAR clutter image, a resolution enhanced version of this image, and an image filtered by the Lee filter with a 5×5 window are shown in figures 25, 26, and 27. The higher density and finer granularity speckle is clearly evident in the resolution enhanced image, while edge and feature definition associated with the original have been preserved. In contrast, the Lee filtered image is clearly smoothed and much of the texture in the image has been removed.

Figure 25: Original SAR clutter image of an urban scene.

Figure 26: Resolution enhanced SAR clutter image with modification factors of 0.8, 0.4, 0.6, and 0.4 for the analysis lobe width, synthesis lobe width, analysis shape, and synthesis shape parameters respectively.

Figure 27: SAR clutter image processed using the Lee filter with a 5×5 window.

3.5.2 MSTAR and ATR Results

The public MSTAR database, which contains both training and testing images of the T-72, the BMP-2, and the BTR-70, was used to evaluate the effectiveness of the resolution enhancement for ATR performance. In order to establish a benchmark for comparison, a higher order neural network (HONN) ATR algorithm [52] was trained and tested on the database of images without any resolution enhancement. The same set of images was then processed using the generalized Gaussian model and evaluated by the same ATR algorithm. The generalized Gaussian model parameter modification configuration is given in table 2. In this application the best configuration was established through an empirical process that systematically adjusted the model parameters and observed the classification output. Modifications that improved overall results were kept and recorded while those that decreased performance were discarded.
Table 2: Multiplication factors used to modify the analysis and synthesis Gaussians that resulted in the best performance for the SAR ATR application.

          Analysis             Synthesis
Gamma     σ_mf      α_mf       σ_mf      α_mf
0.6       0.6       0.6        0.3       0.8

The results given in table 3 show the classification performance improved significantly for the BMP2 targets, while the T72 targets achieved a 100% correct classification score. The BTR70 classification results improved by 1.4% with the resolution enhanced data. The combined score of 97.2% using the resolution enhanced data is the highest reported score to date for this ATR algorithm using the 22.5 degree MSTAR target chip data set.
3.5.3 Comparisons with Traditional Despeckling Methods

The primary goal of the traditional despeckling techniques discussed in section 3.2 is to eliminate the speckle from the image by means of various statistically based filtering processes. The objective of the GGABS, however, is to modify (not eliminate) the existing speckle in such a way that the resulting image assumes an appearance of higher resolution with smaller speckle while retaining the inherent structure in the scene. As a result, the images generated by the GGABS process are very different in appearance from the images produced by the traditional speckle reduction filters. Comparing and contrasting the effectiveness of each of these methods is really a function of the measure used and the attributes of the images that contribute to that measure.

Table 3: Classification results for the 22.5 degree MSTAR target chip data classified by the ATR process with and without resolution enhancement.

              Without Enhancement           Resolution Enhanced
Test Images   T72     BMP2    BTR70         T72     BMP2    BTR70
582 T72       99.7%   0.15%   0.15%         100%    0%      0%
587 BMP2      1.5%    90.3%   8.2%          0.5%    94.4%   5.1%
196 BTR70     2.7%    1.4%    95.9%         2.0%    0.7%    97.3%
% correct             95.3%                         97.2%
In the world of image compression the peak signal-to-noise ratio (PSNR) is an established measure that has been used extensively. It is easy to compute but has the drawback that it often does not correlate well with subjective quality. In the world of SAR speckle suppression, the effective number of looks (ENL) [40] measure plays a similar role. Consequently, the ENL is examined as a performance measure for the purposes of comparison.

The ENL is a simple statistical measure given by

ENL = (μ/σ)²,    (67)

where μ and σ are the mean and standard deviation respectively of the intensity values within a relatively uniform region in terms of the image texture. The assumption is that the intensity variation observed in this uniform region is due exclusively to the speckle noise present and not the result of underlying image structure or image content based texture.
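The measure is straightforward to compute directly from its definition; the sketch below assumes the region is stored as a NumPy array. For fully developed single-look intensity speckle the mean equals the standard deviation, so the ENL comes out near 1:

```python
import numpy as np

def enl(region):
    """Effective number of looks, ENL = (mu / sigma)^2, computed over a
    (presumed) uniform image region so the variation reflects speckle."""
    region = np.asarray(region, dtype=float)
    return (region.mean() / region.std()) ** 2
```

Smoothing a region raises its ENL by shrinking the standard deviation, which is exactly why the measure rewards smoothness rather than resolution, as discussed next.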


The name of this measure is derived from the old multi-look processing where multiple independent realizations of the same image scene were averaged together to remove the effect of the speckle. In this case however, the measure reports a comparable result as the effective number of looks that would be required in the multi-look speckle reduction method. Straightforward inspection of the measure clearly indicates that it simply quantifies the reduction in pixel intensity variability as a ratio of the mean. This essentially equates to measuring the smoothness of the image and not the resolution of the image. Figure 28 shows the ENL performance on two images of the exact same image scene but with different resolutions. A 32×32 region in the bottom left corner of each image was used to calculate the measure. The original resolution image (1m) had an ENL value of 2.5996, while the low resolution image (2m) yielded a much higher ENL value of 7.4372. The higher ENL value would indicate better speckle suppression performance, however the higher resolution image clearly has better feature and edge definition.
Results were generated using the ENL as a criterion for the Lee filter, the enhanced Lee filter, the Kuan filter, the enhanced Kuan filter, and the Gamma MAP filter. A uniform region within the SAR clutter image shown in figure 29 was selected to generate the ENL values for the different filters with various window sizes. In addition, the GGABS model was used to process the same region of the test image using a couple of different parameter configurations. In order to generate the best possible results for the GGABS, a dynamic optimization approach was devised to choose the Gaussian parameter values that yielded the highest ENL value. The basic optimization process consisted of the following steps.

1. Set all parameters to a value of 1. This is the baseline setting that allows for complete reconstruction.

Figure 28: The HB03424.0015 SAR chip image from the MSTAR database of a T72 tank shown in (a) the original resolution (1m) with an ENL = 2.5996 and (b) a low resolution (2m) version with an ENL = 7.4372. This illustrates the ENL tendency to measure image smoothness and not resolution.

2. Process the image region and calculate the ENL for the baseline case.

3. For a single Gaussian parameter, increase the value by 0.1 while keeping all the other values constant, and recalculate the ENL value.

4. Continue to increment the Gaussian parameter value by 0.1 if the ENL increases, otherwise decrement the Gaussian parameter value by 0.1. Recalculate the new ENL value and repeat this entire step.

5. After a fixed number of iterations of the previous step, stop the process and determine if a local maximum has been reached in terms of the ENL criterion.

6. If not at a local maximum, repeat step 4.

7. If truly at a local maximum, record the Gaussian parameter value, and then reset it to the baseline value of 1.

Figure 29: (a) Original SAR clutter image and (b) the region of this image used to evaluate the ENL for the various traditional despeckling filters and the GGABS model.

Table 4: Resulting parameter and ENL values for the GGABS model when optimizing the ENL for each parameter independently.

Parameter              Value                             ENL
Analysis Multiplier    0.9                               2.4847
Synthesis Multiplier   the larger, the higher the ENL
Analysis Alpha         1                                 2.4297
Synthesis Alpha        1.4                               2.5167
Density                1.1                               2.6608

8. Repeat steps 3-7 for the other remaining Gaussian parameters.
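The steps above amount to a per-parameter coordinate search. A compact sketch of the hill-climbing loop for one parameter is given below; `score(params)` stands in for processing the image region with the GGABS model and computing the ENL, and all names and the iteration limit are illustrative rather than taken from the thesis:

```python
def optimize_parameter(score, params, name, step=0.1, max_iter=20):
    """Hill-climb a single Gaussian parameter in increments of 0.1 while
    the ENL-style score improves, mirroring steps 3-7 above."""
    best = score(params)
    direction = step
    for _ in range(max_iter):
        trial = dict(params)
        trial[name] = round(trial[name] + direction, 10)
        s = score(trial)
        if s > best:                 # score increased: keep moving this way
            params, best = trial, s
        elif direction > 0:          # first failure: reverse direction
            direction = -step
        else:                        # failed in both directions: local maximum
            break
    return params[name], best

# Baseline of step 1: every parameter starts at 1 (complete reconstruction);
# steps 3-8 sweep each parameter in turn from this baseline.
baseline = {"analysis_mult": 1.0, "synthesis_mult": 1.0,
            "analysis_alpha": 1.0, "synthesis_alpha": 1.0, "density": 1.0}
```

Sweeping each parameter independently from the baseline, then re-running the sweep from the best values found, reproduces the two-pass search that leads from table 4 to table 5.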


This process resulted in the set of Gaussian parameter values and associated ENL values shown in table 4 that individually maximized the ENL while all other parameters were set to 1. As indicated in the table, the synthesis multiplier parameter had a direct and very strong correlation with the ENL value. However, continuous adjustment of this parameter led to image reconstructions that were meaningless in terms of image content. As a result it was set to the baseline value for most of the experiments. The GGABS model was then configured with these values as the starting point, and steps 3-8 in the search routine were repeated to arrive at the final set of optimal Gaussian parameter values based on the ENL. These parameter values along with the associated ENL values are given in table 5 along with the baseline ENL value and an example of how the synthesis multiplier directly impacts the ENL value.
The ENL results of processing the same region using the traditional despeckling filters and various window sizes are given in table 6. By examining the results of tables 5 and 6, it is clear that the GGABS model does a poor job of eliminating the speckle as measured by the ENL.

3.6 Discussion and Future Work

This chapter presents an application of the generalized Gaussian ABS model to a coherent imaging task in the form of a SAR image resolution enhancement tool. The

Table 5: Resulting parameter and ENL values for the GGABS model when using the values from table 4 as the initial starting points for the optimization over ENL. The first row gives the baseline results, while the last row shows the effect of the synthesis multiplier on the ENL value.

Analysis     Synthesis    Analysis   Synthesis   Density   ENL
Multiplier   Multiplier   Alpha      Alpha
1            1            1          1           1         2.4182
0.9          1            1          1.4         1.1       2.8110
0.9          1            1          1.1         1.1       2.8586
0.9          1            1.4        1.4         1.1       2.8898
0.9          1            1.6        1.5         1.1       2.8990
1            3            1          1.4         1.1       4.3838

Table 6: The ENL results of processing the image region from figure 29 using the different traditional despeckling filters with various window sizes.

                 ENL Values for Various Window Sizes
Filter           3×3      5×5       7×7       9×9
Lee              7.107    13.6055   20.3884   28.0491
Enhanced Lee     6.1597   8.4444    9.801     12.3769
Kuan             7.3213   14.9868   23.8511   34.3656
Enhanced Kuan    6.281    9.0098    10.8267   14.3027
Gamma MAP        6.205    8.8151    10.3812   13.4761

GGABS model was configured to operate in the decoupled mode using a basic peak-picking approach for extracting the Gaussian parameters. The GGABS processing results were evaluated subjectively by means of the CCIR Recommendation 500-4 stimulus-comparison subjective test as well as quantitatively in terms of ATR performance improvement. The results in both cases showed significant improvement over the unprocessed SAR images. Finally, the GGABS model was compared with some of the traditional speckle reduction filters by means of the effective number of looks (ENL) measure.

However, as the results indicate, the GGABS model was not intended as a speckle removal tool. Instead, it was intended as a tool for performing post-processing on SAR data where the speckle could be decoupled from the structure in the image, modified in some meaningful way, and then resynthesized with the appearance of an image of higher resolution.
The ability to tune the Gaussian functions to model the speckle in the images provides a significant amount of flexibility not only when extracting the speckle, but in implementing meaningful modifications. Again the ABS process provided a framework for iteratively extracting the speckle in an intuitive and straightforward manner. These key aspects of the generalized Gaussian ABS model make it ideally suited for SAR image data for the purpose of achieving resolution enhancement.

CHAPTER IV

A CLUTTER COMPLEXITY MEASURE FOR ATR CHARACTERIZATION OF FLIR IMAGES

This chapter presents a unique application of the generalized Gaussian analysis-by-synthesis model. It serves as one of several image analysis tools used to establish a clutter complexity metric. The goal of the clutter complexity measure is to characterize the Automatic Target Recognition (ATR) performance when applied to Forward-Looking Infra-Red (FLIR) image data.

4.1 Introduction

The Defense Advanced Research Projects Agency (DARPA) is seeking to develop the next generation of combat systems for the US Army through the Future Combat Systems (FCS) program [25]. Its goal is to develop concepts and designs for a network of lethal manned and unmanned units that are able to provide mobile-networked command, control, communication, and computer capabilities on the battlefield. These functions would support multi-mission objectives including but not limited to adverse-weather reconnaissance, surveillance, and targeting and acquisition [5].
A key component to ensuring battlefield superiority is the ability to achieve information and intelligence domination. This requires the combination of computational and cognitive data from a variety of sensors and processes including Automatic Target Recognition (ATR). However, when assessing the battlefield space and making command decisions, it is important to place more emphasis on the information from those subsystems with higher reliability than the information from the subsystems with lower reliability. One way to do this is to develop metrics that indicate how well the individual subsystems perform at the desired tasks given the current inputs. The resulting metric values could provide commanders with a confidence indication regarding the outputs from the various subsystems when making command and control decisions. In the case of ATR systems a simple metric that correlates well with performance could be used to assess the ATR contributions to the overall decision process. This could also serve as a predictor of images that might suffer from potential target misclassifications in ATR and thus provide an opportunity to implement an alternative classification process.
A clutter complexity measure could also aid in the development of future ATR algorithms by enabling researchers to benchmark the complexity of image databases. Currently, most ATR systems are developed around a single database, and good performance across a broad variety of databases cannot be assumed. The disparity across the image databases, which makes some more difficult than others, is the result of a number of factors including different target types, varying environmental conditions, and varying levels of clutter. The clutter complexity measure could certainly be employed as a tool to help establish the complexity of an image database.


In a collaborative research effort with Kaplan et al. at Clark Atlanta University, we developed and analyzed a measure that characterizes ATR reliability as a function of clutter complexity [59]. While a brief discussion of the overall system is presented below, the primary interest in this area is the extension of the generalized Gaussian model as an image analysis tool to a non-coherent imaging application.

4.2 The Nature of Clutter

One of the main obstacles to comprehending and predicting ATR performance is understanding the effect that clutter has on the detection process. The characterization and quantification of clutter in infra-red (IR) ATR has remained an ambiguous task because of its multifaceted nature. While sensor characteristics can be determined by measurements in a lab, and man-made (and even some natural) targets can be quantified by their shape and material compositions, clutter is not as easily defined. The U.S. Army once described clutter as "spurious or extraneous indications that can cause the sensor to respond as though a target were present when it is not, can cause the sensor not to respond when a target is present, or can cause the location of the target to be sensed with substantial error" [4]. The difficulty lies in that the real world contains an abundance of indications that are "spurious or extraneous," and defining them, even in broad terms, is no easy task.
In IR imagery, the manifestations of clutter result from an innumerable quantity of sources. Consider all of the potential image scene configurations from a seemingly infinite array of terrain and target characteristics and geometries that can yield an incredible amount of very diverse image content and clutter. Other system related variables that can also impact clutter (albeit to a lesser extent) include the various target and sensor geometries, such as the target aspect angles, ranges, and poses and the sensor elevations. Both the image content and the system related variability directly impact image formation and the extent of the clutter that can be a confusor when detecting a target's IR signature. This can lead to potential misclassifications by the ATR algorithms. Other examples of clutter include the instance where a hot target leaves a temperature "shadow" on a cool background, even after the target is gone, and the case where vehicle exhaust combines with dust to produce a signature cloud that resembles a target [64]. These simple examples just scratch the surface of the multitude of possible manifestations of clutter that can lead to poor ATR performance.
In addition, special attention must be given to ambiguous situations like images with poor contrast, images of a single target among other non-target traffic, images with multiple target types, and images that contain high target densities. Each of these scenarios presents a different view on how one might define clutter complexity. Comparing complexity across all of these situations becomes a subjective process, and the definitions of complexity can change significantly. The net result is that any clutter metric devised is not going to be independent of the target [64].

So while it may be difficult to define all instances of clutter, it is important to at least attempt to impose a definition on the concept of clutter. Thus, for the purposes of the discussion presented here, clutter complexity is defined as the extent to which objects and features in an image scene appear target-like and the density or frequency with which they occur in that scene [59].

4.3 The Clutter Complexity Measure

The objective of the clutter complexity measure is not to predict the exact performance of a specific ATR system, but rather to place outer bounds on the ATR performance potential for a given image. As such, an image with a low clutter complexity score would suggest that on average ATR algorithms will perform with a low rate of false alarms on the given image. In contrast, a high image clutter complexity score would indicate that no ATR algorithm will achieve a low false alarm rate. In addition, the functional requirements of the Future Combat Systems application dictate that the computational burden must be relatively low.
Kaplan developed a general approach for measuring the clutter complexity of a FLIR image that collapses several image derived features into one value that correlates with ATR performance bounds. This involved (1) partitioning the image database to minimize all other attributes of the images except for the clutter, (2) evaluating the ground truth clutter complexity by means of the ATR bounds, (3) training a clutter complexity measure as the weighted sum of the image derived features such that it correlated with ground truth over the respective database partition, (4) validating the measure against both real ATR performance results and the ATR performance bounds, and (5) refining the measure through the evaluation of the effectiveness of

the image features being extracted [60]. A diagram of this approach is shown in figure 30, and a brief description of these five steps is given below.

Figure 30: Block diagram of the general approach for generating a FLIR clutter complexity measure that correlates with ATR performance bounds. The stages are: (1) partition database (according to target type, geometry, aspect angle, etc.); (2) establish ground truth (ATR performance bounds using LVQ and optimal templates); (3) train clutter complexity measure (weighted sum of image processing features); (4) validate measure (compare results of non-training data with ground truth); (5) refine measure (refine features to maximize correlation with ATR bounds).


4.3.1 Image Databases and Partitions

The bulk of the development and testing of the measure was done using the COMANCHE FLIR image database; however, it is not possible to include those results in public documents because of the restricted classification of the data. As a result, a smaller non-restricted database of FLIR imagery was used to generate the results presented in this chapter. To study the correlation between ATR performance and clutter complexity it is necessary to hold constant other variables such as sensor fidelity and target/sensor geometries. This was accomplished by developing the measure over small partitions of the image database, where the target types, poses, aspect angles, number of pixels on target, etc. are held relatively fixed. As much as was possible, the only attribute that was allowed to vary within the database partition was the extent of the clutter.
4.3.2 Clutter Complexity Ground Truth

Ideally the clutter complexity ground truth would correlate perfectly with ATR performance bounds for each image in the partition. However, a satisfactory method for determining bounds that exactly predicts ATR performance independent of the ATR structure has not yet been developed. An alternative is to use a finely tuned ATR and exploit a priori information of the targets to estimate the ATR performance bound. This is done by training the ATR algorithm using the actual targets in the scene and adjusting the algorithm parameters to optimize performance.
The wavelet-based learning vector quantization (LVQ) ATR algorithm developed by the U.S. Army Research Laboratory (ARL) [17], which is one of the top performing FLIR ATR algorithms [68], was used in this application. The wavelet-based LVQ algorithm computes and compares three subbands of the target chip with a set of target codebooks, which have been computed off-line for a range of poses. A target and pose hypothesis is generated by choosing the target codebook with the smallest weighted mean squared error (MSE) across the three subbands.
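The codebook matching rule (smallest weighted MSE across the three subbands) can be sketched as follows. The wavelet decomposition and codebook training are outside this snippet, and the data layout and names are assumptions, not the ARL implementation:

```python
import numpy as np

def classify_chip(subbands, codebooks, weights):
    """Pick the target/pose codebook with the smallest weighted MSE across
    the chip's three subbands.
    subbands:  list of 3 arrays for the target chip
    codebooks: dict mapping a (target, pose) key to a list of 3 arrays
    weights:   per-subband weights for the MSE"""
    best_key, best_err = None, np.inf
    for key, cb in codebooks.items():
        err = sum(w * np.mean((s - c) ** 2)
                  for w, s, c in zip(weights, subbands, cb))
        if err < best_err:
            best_key, best_err = key, err
    return best_key, best_err
```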
In order to establish the upper performance bound of the ATR algorithm for a given target, an optimal target template is generated. The optimal template simply consists of the best vector of subbands for this specific target and its given pose. The correlation of the optimal template at all possible locations in an image scene is compared to a threshold to determine potential targets. The threshold is normalized for each scene by making it proportional to the correlation value of the optimal template with the actual target in the scene. Since all other controllable parameters associated with the scene (such as geometry and aspect angle) have been held constant in the database partition, the total number of false alarms that result can be used to characterize the clutter in the scene. Using this procedure the clutter complexity ground truth is generated for all the targets under consideration.

Table 7: List of statistical based image processing features used in the FLIR clutter complexity measure.

Feature Name                 Description
fBm Hurst Parameter          Texture roughness
Standard Deviation           Global standard deviation
Schmieder Weathersby         Average local standard deviation
Homogeneity                  Average pixel variations
Energy                       Average histogram energy
Entropy                      Average histogram entropy
Target Interference Ratio    Average contrast
Outlier Ratio                Average percentage of outliers
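A simplified sketch of this bounding procedure: the template is correlated (normalized) at every location, the threshold is set proportional to the correlation at the true target location, and remaining detections are counted as false alarms. The proportionality constant `rho` and all names are assumptions:

```python
import numpy as np

def false_alarm_count(scene, template, target_pos, rho=0.9):
    """Count detections other than the true target when the optimal template
    is correlated at every valid offset and thresholded at a fraction `rho`
    of its correlation with the actual target in the scene."""
    th, tw = template.shape
    t = template - template.mean()
    scores = np.empty((scene.shape[0] - th + 1, scene.shape[1] - tw + 1))
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            win = scene[r:r + th, c:c + tw]
            w = win - win.mean()
            denom = np.sqrt((w * w).sum() * (t * t).sum())
            scores[r, c] = (w * t).sum() / denom if denom > 0 else 0.0
    threshold = rho * scores[target_pos]   # per-scene normalization
    hits = np.argwhere(scores >= threshold)
    return sum(1 for r, c in hits if (r, c) != target_pos)
```

Because the threshold is tied to the true target's own correlation score, the count depends only on how target-like the surrounding clutter is, which is what the ground truth is meant to capture.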
4.3.3 The Complexity Measure

The objective of the measure is to derive a single value that represents the extent of the clutter in the image. This is accomplished by collapsing a number of image derived features into one value that correlates with ATR performance bounds, which we use as the ground truth for complexity. The clutter can cause the sensor to yield both false positives and false negatives with regards to targets in the scene. As a result, it is important from an ATR standpoint that the derived image features take into account the target characteristics, as well as the background texture. The set of features consists of the eight statistical based features given in table 7 and five generalized Gaussian model derived features that are the focus of the research being presented in this chapter. The statistical features are mentioned for completeness but will not be discussed. Further information on these features can be found in [79] and [90].
The other five features are derived from generalized Gaussian based decompositions of the scene. Gaussians are attractive basis functions for performing image analysis primarily because they are local in extent. This allows the decomposition to easily accommodate local variations and features often associated with natural image scenes. The feature values are the number of Gaussians required to approximate the image given certain constraints. What makes each of the five features unique is the constraint on the scales of the Gaussians, the stopping strategies, and the nature (original or negative) of the image. A complete discussion of the development of these five features is presented later in section 4.4.


The clutter complexity measure, C, is defined as a weighted sum of all thirteen features given by

C(i) = Σ_{n=1}^{13} w(n) F_i(n),    (68)

where w are the weights, and F_i are the image features for each image, i. A training partition of the image database is used to compute the best set of feature weights. They are selected to maximize the correlation between the clutter complexity measure and the ground truth discussed previously. As such these weights represent the optimal linear predictor of the ground truth false alarm count for this set of image processing features [60].
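Since the weights form the optimal linear predictor of the ground-truth false alarm counts, they can be obtained with an ordinary least-squares fit over the training partition. A minimal sketch; the names are illustrative and the thesis does not specify the fitting routine:

```python
import numpy as np

def fit_weights(features, false_alarms):
    """Fit the weights w in C(i) = sum_n w(n) * F_i(n) as the least-squares
    linear predictor of the ground-truth false alarm counts.
    features: (num_images x 13) matrix of image features F
    false_alarms: ground-truth false alarm count per image"""
    F = np.asarray(features, dtype=float)
    y = np.asarray(false_alarms, dtype=float)
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w

def complexity(w, feature_vector):
    """Evaluate C(i) for one image's feature vector, per equation (68)."""
    return float(np.dot(w, feature_vector))
```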
4.3.4 Validation and Refinement of the Measure

Once the weights have been generated, the measure is tested over each image in the non-training partitions of the database in two ways. In the first method the results of the clutter complexity measure are compared against the ground truth values (or ATR performance bounds) for the various images. The second process compares the clutter complexity values to actual ATR performance values, which were generated by applying the ARL Federated Laboratory baseline FLIR detector [16]. In both of these validation procedures, ATR performance is defined as the number of false alarms generated in each image by the associated process. The refinement step involves tuning the individual image features and weights to improve the correlation between the ATR performance values (or bounds) and the clutter complexity values.

4.4 Generalized Gaussian Based Features

The primary contribution of this work to the complexity measure is the development of five Gaussian based image features that correlate with clutter complexity. The general underlying premise of this approach is that more Gaussians are required to represent areas of higher image complexity than areas of lower image complexity. Each feature is derived from the generalized Gaussian ABS decomposition of the image and the formation of a Gaussian density profile. This profile is simply the number of Gaussians in the decomposition that meet the specific feature's criteria. The brighter blocks or regions in the image indicate higher values that should correspond with regions of higher complexity in the scene. Examples of Gaussian density profile images are given throughout this section in figures 31 through 33. The FLIR image data used in this section to illustrate the various Gaussian density profiles was obtained from the Computer Vision Group at Colorado State University [19].
The flexibility of the ABS process enables the model to constrain the search during the decomposition for Gaussians that meet certain size and shape criteria. Combining this with the ability to perform the decomposition either over the entire image or in local regions leads to a number of possible Gaussian density profiles. From all of these profiles, the five that best correlated with clutter complexity were retained and used in the final measure in combination with the aforementioned statistical features. The fundamental operational parameters of the generalized Gaussian model that differentiate the five density profiles and associated features are the constrained/unconstrained Gaussian analysis modes, the full-image/sub-block processing modes, and the positive/negative input images.
4.4.1 Constrained and Unconstrained Gaussian Analysis Modes

The advantage of the constrained analysis mode is the ability to tune the model to a specific set of Gaussian characteristics that closely represent the signature of a target. This is accomplished by constraining the model during the analysis stage to retain only those Gaussian functions that fall within the specified amplitude and lobe width ranges. The result is a reconstructed image of Gaussians that identifies the spatial locations and extent of objects in the scene that are of the expected size of potential targets. This directly addresses one of the clutter complexity objectives by identifying the target and target-like features in the image.
However, constraining the analysis process may cause the model to ignore other signatures that obfuscate the scene and are also definite contributors to image scene complexity. Examples of this might include when multiple targets blend together in the image to form a single larger signature, or when a single target is represented by a signature consisting of multiple smaller hot spots in the image (possibly due to a partial occlusion of the vehicle). To better accommodate these patterns in the scene, the image can be processed in the unconstrained Gaussian analysis mode as well. In this mode no constraints are placed on the scale of the Gaussians during the analysis of the scene.

The density profiles generated in both the constrained and unconstrained modes provide useful information in terms of clutter complexity. The constrained mode provides an indication of the number of potential target-like shapes of the expected size in the region of interest. The unconstrained mode provides similar information, but it also models objects that are larger or smaller than the expected target. Figure 31 provides an example of the Gaussian density profile images generated using the constrained and unconstrained modes of operation.
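A minimal sketch of the two analysis modes, assuming the decomposition yields (amplitude, lobe width) pairs for candidate Gaussians (the candidate list and range values are illustrative, not taken from the thesis):

```python
def constrain(gaussians, amp_range=None, width_range=None):
    """Retain only Gaussians whose amplitude and lobe width fall within
    the specified ranges; passing None reproduces the unconstrained mode."""
    kept = []
    for amp, width in gaussians:
        if amp_range and not (amp_range[0] <= amp <= amp_range[1]):
            continue
        if width_range and not (width_range[0] <= width <= width_range[1]):
            continue
        kept.append((amp, width))
    return kept

cands = [(0.9, 4.0), (0.2, 3.0), (0.8, 25.0)]            # (amplitude, lobe width)
unconstrained = constrain(cands)                          # keeps all three
constrained = constrain(cands, (0.5, 1.0), (2.0, 10.0))  # keeps the target-sized lobe
print(len(unconstrained), len(constrained))
```

The constrained call retains only the Gaussian whose amplitude and width match the expected target signature; the unconstrained call keeps everything, including larger or smaller structures.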
4.4.2 Global and Block Processing Modes

The ABS process uses image amplitude or intensity characteristics during the analysis stage to identify and extract the Gaussians. In the global processing mode, the search space for the next Gaussian becomes the entire image. As a result, the analysis can be biased toward textured regions that contain multiple high intensity peaks. This can cause the density based stop criterion to be satisfied, which would halt the analysis before the model has considered all regions in the image. In particular, textured regions with lower amplitude peaks would be excluded, and they could just as easily contain clutter that contributes to the overall complexity.

Figure 31: (a) The original FLIR image, (b) the Gaussian density profile image with the analysis constrained to find target sized objects, and (c) the Gaussian density profile image with the analysis unconstrained.
As an alternative, the search process can be made sensitive to local attributes by dividing the image into non-overlapping sub-blocks and processing each sub-block independently. The stop criterion for the analysis process is applied to each sub-block separately rather than to the image as a whole. This forces the model to consider all regions of the image scene equally when performing the analysis. For image scenes with fairly uniform intensity distributions, the result will be a more uniform Gaussian density profile in the reconstruction. Similarly, sub-blocks lacking suitable Gaussian matches yield lower Gaussian densities, while sub-blocks with multiple suitable Gaussian matches yield higher Gaussian density values.

Both the local block processing and the global processing modes of operation provide useful information. The block processing mode is sensitive to local image attributes, while the global method accommodates the larger image characteristics. Figure 32 provides an example of an original FLIR image with the block processed and global processed outputs.
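The per-block analysis can be sketched as follows. The simple peak detector and block size here are stand-ins for the full ABS decomposition; the point is that the stop threshold is computed per sub-block rather than from the global maximum:

```python
import numpy as np

def count_peaks(sub, thresh):
    """Count interior pixels that are 4-neighborhood maxima above `thresh`."""
    n = 0
    for i in range(1, sub.shape[0] - 1):
        for j in range(1, sub.shape[1] - 1):
            v = sub[i, j]
            if (v > thresh and v > sub[i-1, j] and v > sub[i+1, j]
                    and v > sub[i, j-1] and v > sub[i, j+1]):
                n += 1
    return n

def block_density(image, block=8, stop_frac=0.6):
    """Apply the stop threshold per sub-block so low-contrast regions,
    which a global threshold would skip, still contribute Gaussians."""
    rows, cols = image.shape[0] // block, image.shape[1] // block
    profile = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            sub = image[r*block:(r+1)*block, c*block:(c+1)*block]
            profile[r, c] = count_peaks(sub, stop_frac * sub.max())
    return profile

# Toy 16x16 scene: a bright peak in one block and a dim peak in another;
# a single global threshold of 0.6 * 1.0 would miss the dim peak entirely.
img = np.zeros((16, 16))
img[3, 3] = 1.0
img[12, 12] = 0.1
prof = block_density(img, block=8)
print(prof)   # both peaks are found, one per block
```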
4.4.3 Original and Negative Input Image Modes

In the FLIR application, potential targets can manifest themselves in the image as structure with either high pixel intensities on a darker background (the case of a hot vehicle in a cool background) or very low pixel intensities on a brighter background (the case of a cold vehicle in a hot environment). In the first case the standard analysis approach using peak picking is well suited to modeling targets in the original (positive) image since the targets are usually brighter than the background.

Figure 32: (a) The original FLIR image, (b) the Gaussian density profile image with the analysis performed globally, and (c) the Gaussian density profile image with the analysis performed on independent sub-blocks of the image.

The second scenario, where the targets are darker than the background, is the inverse problem. An easy way to accommodate this situation is to first generate the negative version of the original image. The dark target-like features in the original image will appear bright against a dark background in the negative image. The same standard analysis approach that is used on the original (positive) image can now be used on the negative image to extract target-like features. In this way, dark structures in the image that may contribute to the complexity of the scene are included in the overall complexity measure by way of the negative image based Gaussian density profile. Figure 33 illustrates how processing the positive and negative images will yield different but complementary results. Processing the positive image results in a density profile that highlights the hot target, while processing the negative image yields a density profile that includes the cold target.
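For floating point data the negative image is simply the intensity range flipped about its maximum; this short sketch (with an invented 3x3 scene) shows how a cold target becomes a bright peak that the standard peak-picking analysis can find:

```python
import numpy as np

img = np.array([[0.9, 0.8, 0.9],
                [0.8, 0.1, 0.8],   # cold target: a dark pixel on a hot background
                [0.9, 0.8, 0.9]])

neg = img.max() - img              # negative image (255 - img for 8-bit data)
print(int(neg.argmax()))           # flat index 4: the cold pixel is now brightest
```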
4.4.4 Sensitivity Analysis and Neural Networks

The generalized Gaussian ABS model can be configured using the various modes presented above, but there are a number of parameters that must be selected for each mode configuration. For example, consider all the possibilities for choosing the block size when in the block mode or choosing the limits on the lobe widths when operating in the constrained mode. The number of possible parameter and mode combinations is quite extensive. The computational complexity would quickly become unacceptable if some form of input selection and reduction were not implemented. As a result, a neural network sensitivity analysis approach has been adopted to identify mode configurations and associated parameter settings that provide the best correlation with scene complexity.

Figure 33: (a) The original FLIR image, (b) the negative FLIR image, (c) the constrained Gaussian density profile image computed from the original image, and (d) the constrained Gaussian density profile image computed from the negative image.

Sensitivity analysis is a common tool in the field of neural network design that describes how changes to the input variables affect the output variables. It is based on the concept of pattern informativeness, which states that an input pattern with
negligible effect on the neural network outputs is uninformative. In contrast, an informative input pattern has a direct influence on the outputs [31]. By observing the corresponding changes in the output values, y_j, to variations in the input values, x_i, one can establish the input/output sensitivity relationships for the given neural network. In other words, the relative importance of an input variable corresponds to the average size of the output gradient, dy_j/dx_i, evaluated over a set of representative input values.

Figure 34: A neural network block diagram with input variables x_i and output variables y_j, and its corresponding gradient model with input variables x_i and derivative output variables dy_j/dx_i.
This forms the basis of a gradient model, which is generated from the underlying estimation or classification neural network. The outputs of the gradient model are the derivatives of the neural network outputs with respect to each of the input variables, dy_j/dx_i, evaluated for a given set of input values. For a neural network with n inputs and m outputs, the corresponding gradient model takes the same n inputs and generates n × m outputs as shown in figure 34. To establish the overall contributions of each input variable, one can measure the average of the squared gradient output values over the entire data set. This yields the sensitivity matrix


2 
2
2
dy1
dy2
dym
...
dx1 (p)
 dx1 (p)   dx1 (p) 
2

2
2
dy

dy
dy
N
m
1
2

.
.
.
1 
dx2 (p)

dx2 (p)
dx2 (p)
S=

,
..
..
..
..

N p=1
.
.
.
.



2 
2
2
dy1
dy2
dym
...
dxn (p)
dxn (p)
dxn (p)

(69)

where N is the total number of samples in the data set. Summing along the rows of the sensitivity matrix yields the cumulative sensitivity of each input variable over all the outputs. The input variables corresponding to the largest cumulative average squared gradient values can be retained as being the most significant. What is unique about using sensitivity analysis for input space reduction is that it prunes input variables based on both the input and output data. Most other methods rely primarily on the structure of the input data [62].
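Equation (69) and the row-sum ranking can be sketched numerically. The gradients below are synthetic stand-ins for a trained network's gradient-model outputs (no actual network is trained here):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, N = 3, 2, 100                  # n inputs, m outputs, N samples

# Synthetic gradient-model outputs dy_j/dx_i, one (n, m) slice per sample p.
grads = rng.normal(0.0, 1.0, (N, n, m))
grads[:, 0, :] *= 3.0                # make input 0 informative
grads[:, 2, :] *= 0.1                # make input 2 nearly uninformative

S = (grads ** 2).mean(axis=0)        # equation (69): averaged squared gradients
cumulative = S.sum(axis=1)           # row sums: total sensitivity per input

ranking = np.argsort(cumulative)[::-1]
print(ranking)                       # input 0 ranks first, input 2 last
```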

4.5 FLIR Clutter Complexity Results

The results of this chapter are divided into the following sections. The neural network configuration and the results of the sensitivity analysis are presented first, as these provide the foundation for the selection of the generalized Gaussian feature configurations used in the measure. Next, some examples are provided that show graphically how the features (both the generalized Gaussian and the statistically based varieties) relate to FLIR image complexity. Finally, results of the overall FLIR complexity measure are presented for the testing FLIR image database.
4.5.1 Neural Network Configuration and Sensitivity Analysis Results

The previously described sensitivity analysis method of performing input feature reduction was employed to establish which generalized Gaussian model configurations should be chosen and used in the final FLIR clutter complexity system. To do this, a representative sample of 78 FLIR images was selected from the public database and manually scored in terms of clutter complexity on a scale of zero to five, with five being the most complex. This manual score served as a simple ground truth for the corresponding image.

The entire set of sample images was then processed using the twelve different generalized Gaussian ABS model mode and parameter configurations given in table 8.

Table 8: The twelve generalized Gaussian model ABS configurations tested to determine the best clutter complexity features.

Configuration  Input Image  Search Space                    Analysis Mode  GGABS Output
1              Negative     Sub-block                       Unconstrained  Sum of Gaussians
2              Negative     Sub-block                       Constrained    Sum of Gaussians
3              Original     Full Image                      Unconstrained  Sum of Gaussians
4              Original     Full Image                      Constrained    Sum of Gaussians
5              Negative     Full Image                      Unconstrained  Sum of Gaussians
6              Negative     Sub-block with Zero Local Mean  Constrained    Sum of Gaussians
7              Negative     Full Image                      Constrained    Sum of Gaussians
8              Original     Sub-block with Zero Local Mean  Constrained    Sum of Gaussians
9              Original     Sub-block                       Constrained    Sum of Gaussians
10             Original     Sub-block                       Constrained    Avg. of Gaussians
11             Original     Sub-block                       Unconstrained  Avg. of Gaussians
12             Original     Sub-block                       Unconstrained  Sum of Gaussians

The outputs consisted of either the sum or the average number (used in the sub-block search space cases) of Gaussians in the resulting density profiles.

For each of the twelve Gaussian model configurations the ABS process was halted when the search routine detected a new peak magnitude that was less than sixty percent of the maximum amplitude in the search space, which was either an image sub-block or the full image (depending on use of the sub-block or global image analysis mode). When the configuration called for the constrained analysis mode, only Gaussians with lobe widths between two and ten pixels wide were retained in synthesizing the Gaussian density profile. These parameter boundaries were established by observational analysis and found to best correspond with the expected target sizes in the original FLIR image data.
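A greedy sketch of this analysis loop, using isotropic Gaussians of a fixed estimated width rather than the thesis's fitted separable generalized Gaussians (the scene and parameter values are invented):

```python
import numpy as np

def gaussian2d(shape, r0, c0, amp, width):
    """Isotropic Gaussian lobe; a stand-in for the generalized Gaussian."""
    r, c = np.ogrid[:shape[0], :shape[1]]
    return amp * np.exp(-((r - r0)**2 + (c - c0)**2) / (2 * width**2))

def abs_extract(image, stop_frac=0.6, width_range=(2, 10), est_width=3.0):
    """Greedy analysis-by-synthesis sketch: pick the largest residual peak,
    synthesize a Gaussian there, subtract it, and repeat until the next
    peak drops below `stop_frac` of the original search-space maximum."""
    residual = image.astype(float).copy()
    floor = stop_frac * residual.max()
    kept = []
    while residual.max() >= floor:
        r0, c0 = np.unravel_index(residual.argmax(), residual.shape)
        amp = residual[r0, c0]
        if width_range[0] <= est_width <= width_range[1]:   # constrained mode
            kept.append((r0, c0, amp, est_width))
        residual -= gaussian2d(residual.shape, r0, c0, amp, est_width)
    return kept

img = np.zeros((32, 32))
img += gaussian2d(img.shape, 8, 8, 1.0, 3.0)    # target-sized hot spot
img += gaussian2d(img.shape, 24, 24, 0.3, 3.0)  # below the 60% stop threshold
peaks = abs_extract(img)
print(len(peaks))   # only the dominant peak is extracted
```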


To do the data reduction analysis, a Multi-Layer Perceptron (MLP) feed-forward neural network was trained using the 12 generalized Gaussian model output values for each image as the inputs. These values were normalized to a range of -1 to 1 using the sigmoid function shown in figure 35 and given by

\hat{y} = \frac{1 - e^{-\alpha}}{1 + e^{-\alpha}},   (70)

\alpha = \frac{y - \mathrm{mean}(y)}{\mathrm{stdev}(y)}.   (71)

This normalization function maps all the data points in a set that lie within one standard deviation of the mean of that set to the nearly linear region of the sigmoid function, while the rest of the data points are compressed into the tails of the sigmoid function. The sigmoid normalization technique was chosen because it can accommodate outlier data points without sacrificing (or compressing) the dynamic range of the most commonly occurring data points [62].
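Equations (70) and (71) translate directly into code (the mapping is algebraically tanh(α/2)); the sample data here is invented to show the outlier behavior:

```python
import numpy as np

def sigmoid_normalize(y):
    """Equations (70)-(71): z-score each value, then squash into (-1, 1).
    Points within one standard deviation of the mean land in the nearly
    linear region; outliers are compressed into the tails instead of
    stretching the overall dynamic range."""
    alpha = (y - y.mean()) / y.std()
    return (1.0 - np.exp(-alpha)) / (1.0 + np.exp(-alpha))

y = np.array([480.0, 500.0, 520.0, 5000.0])   # one extreme outlier
yhat = sigmoid_normalize(y)
print(np.all(np.abs(yhat) < 1.0))             # True: outlier stays in range
```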
The neural network was configured with 6 binary outputs (one for each level of complexity). To train the neural network, the manually scored complexity values, which ranged from zero to five, were used as the desired outputs and mapped to each output as a binary true or false value. As a result, only one of the six output values would be true for a given image and the associated generalized Gaussian model values (which served as inputs to the neural network). The network was configured with a single hidden layer containing 40 nodes and trained using a standard back propagation algorithm. When an RMS error value of 0.1 or less was attained, the neural network training was halted. The gradient model was then generated for this neural network, and the resulting sensitivity matrix is shown in table 9. The last column is the sum of the rows of the sensitivity matrix, and it indicates the cumulative sensitivity of all the outputs to each input. The five generalized Gaussian model based configurations with the highest sensitivity values (which in this case are configurations 1-5 from

Figure 35: A sample sigmoid normalization function (shown for a mean of 500 and a standard deviation of 100) that maps all input values to an output range of -1 to 1.

table 8) were chosen as the best inputs to the overall clutter complexity system.

Table 9: The sensitivity matrix resulting from the neural network with the twelve generalized Gaussian ABS model configurations as inputs, and the corresponding benchmark complexity scores as outputs. The cumulative neural network sensitivity score for each configuration is given in the last column. The top five configurations were chosen as features for the overall clutter complexity measure.

Configuration  Output 1  Output 2  Output 3  Output 4  Output 5  Output 6  Row Sum
1              0.13709   0.24548   0.13763   0.11622   0.09555   0.18567   0.91766
2              0.03145   0.08087   0.14908   0.24398   0.18487   0.17007   0.86035
3              0.31755   0.10985   0.12666   0.06617   0.13629   0.07030   0.82685
4              0.06829   0.17466   0.06678   0.13060   0.11310   0.05119   0.60464
5              0.17015   0.05310   0.06445   0.08704   0.10553   0.08414   0.56443
6              0.07157   0.09967   0.08909   0.05638   0.06339   0.09666   0.47679
7              0.07258   0.01441   0.05828   0.06662   0.02143   0.14828   0.38164
8              0.04603   0.06700   0.06657   0.05995   0.10599   0.02653   0.37208
9              0.00866   0.02057   0.07306   0.06457   0.05681   0.05978   0.28347
10             0.00809   0.02385   0.07484   0.05648   0.04888   0.05472   0.26689
11             0.03598   0.05400   0.04365   0.02602   0.03586   0.02754   0.22308
12             0.03251   0.05649   0.04987   0.02590   0.03224   0.02504   0.22207
4.5.2 Complexity Feature Images

The overall clutter complexity measure combines the top five generalized Gaussian ABS based features with the eight statistically based features listed in table 7 as a sum of weighted values as given by equation 68. Two sets of example images are presented to illustrate how the features, both the generalized Gaussian and the statistically based varieties, contribute to the FLIR image clutter complexity measure. Figure 36 shows an original FLIR image, obtained from the Computer Vision Group at Colorado State University [19], as well as each of the five generalized Gaussian density profile images. The two block processing based density profile images have been generated by performing the generalized Gaussian ABS process over sixty-four 32 × 32 pixel non-overlapping blocks, where the brighter blocks indicate a higher concentration of Gaussians. The other three density profile images consist of the actual generalized Gaussians used during the synthesis to generate the reconstruction.


Figure 37 shows the same original image along with resulting images for each of the statistically based features. For illustration purposes these images have also been generated by computing the features over the same 64 blocks, where the brighter blocks in the resulting images indicate higher returned values. In the final implementation of the clutter complexity measure the statistically based features are not calculated on a block-by-block basis but rather over the entire image. Overall, the brighter blocks or regions in both the generalized Gaussian model based and statistically based feature images correlate well with regions of higher complexity in the image.
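The combination step itself is a weighted sum over the thirteen feature values, as in equation 68; the feature values and the uniform weights below are placeholders, since the trained weights are not reproduced here:

```python
import numpy as np

# Hedged sketch of the weighted-feature combination: the five GGABS features
# and eight statistical features are stacked into one vector and combined
# with trained weights (all values here are made up for illustration).
features = np.array([3.0, 1.0, 2.0, 0.5, 4.0,                   # five GGABS features
                     0.8, 1.2, 0.3, 0.9, 1.1, 0.4, 0.7, 0.6])  # eight statistical
weights = np.full(13, 1.0 / 13)                                 # placeholder weights
complexity_score = float(weights @ features)
print(complexity_score)
```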
4.5.3 Complexity Measure Results

The clutter complexity measure was evaluated by Kaplan et al. [60] over a two-band FLIR database that contained fifty images from both the medium-wave and long-wave bands. The FLIR images contained the HMMWV, M113, M2, M35, M60A3, and a pickup truck as target types. In order to establish the false alarm ground truth for each target type, a correlation filter was applied to each image using an image chip of the corresponding target. Each target chip was cut from one image in both wave bands and contains a broadside view of the target. Correlation filtering is a simple and yet reasonable method for estimating the performance of an ATR given a particular target.

Once the ground truth false alarm count was established for all the targets, the overall clutter complexity measure was generated based on all thirteen input features using the procedure described in section 4.3. Figure 38 shows some medium-wave FLIR images with varying degrees of clutter complexity and the corresponding false alarm counts and clutter complexity scores for the M35 target chip.

The plot in figure 38 shows the significant correlation between the false alarm rates and the clutter complexity scores for all fifty images in the medium-wave database.

Figure 36: Examples of Gaussian density profile images. (a) Original FLIR image, (b) negative image block processed using unconstrained analysis, (c) negative image block processed using constrained analysis, (d) original image processed using unconstrained analysis, (e) original image processed using constrained analysis, (f) negative image processed using unconstrained analysis.

Figure 37: Examples of images from [60] used to compute statistically based complexity features: (a) Original FLIR image, (b) standard deviation, (c) Schmieder-Weathersby, (d) FBm Hurst, (e) target interference ratio, (f) energy, (g) entropy, (h) homogeneity, (i) outlier.
All of this was encapsulated in a highly configurable clutter complexity software package that allows a user to select:

- any subset of the thirteen input features (statistical or generalized Gaussian),
- the threshold value for the correlation filter that establishes the ground truth false alarm count,
- the target type/chip for the correlation filter,
- and between the long-wave or medium-wave FLIR databases.

In addition to the chosen configurations, the main screen displays a scatter plot of the false alarms vs. the clutter complexity, the FLIR chip for the chosen target, and each of the FLIR images showing the locations of the false alarms. The program cycles through all 50 images in the chosen wave band and displays results for each image as well as cumulative results. A sample screen shot is shown in figure 39. This software interface package was written by researchers at Clark Atlanta University (CAU) and runs in Matlab 5.0 or higher.

One benchmark of performance is the correlation between the clutter complexity scores and the number of ATR false alarms over a given set of images. Of particular interest are the results of the measure when using each class of input features (GGABS and statistical) independently and then combined. Figure 40 shows the false alarm vs. clutter complexity data for all three cases along with the correlation coefficients for the M35 target and the medium-wave FLIR data set. While combining all thirteen input features (GGABS and statistical) provides the best overall correlation (r = 0.879), the GGABS input features perform slightly better (with a correlation coefficient of r = 0.802) than do the statistically based input features (with a correlation coefficient of r = 0.767) when used as the sole inputs to the measure.
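That benchmark is the Pearson correlation coefficient; a small sketch with invented (score, false alarm) pairs:

```python
import numpy as np

# Benchmark sketch: Pearson correlation between clutter complexity scores
# and ATR false alarm counts (the five image pairs below are invented).
scores = np.array([1.0, 1.5, 2.3, 3.1, 3.6])
false_alarms = np.array([4, 7, 11, 25, 38], dtype=float)

r = np.corrcoef(scores, false_alarms)[0, 1]
print(round(r, 3))    # a strong positive correlation, as in figure 40
```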

Figure 38: Sample clutter complexity results over the medium-wave image data in a dual-band FLIR database. (a) Low complexity image (4 false alarms, clutter complexity 1.0309), (b) medium complexity image (11 false alarms, clutter complexity 2.273), (c) high complexity image (38 false alarms, clutter complexity 3.6028), (d) plot showing the correlation between false alarm counts for the M35 target and the corresponding clutter complexity scores, (e) broadside view of the M35 target in both the visible and medium-wave bands.

Figure 39: Screen shot of the CAU clutter complexity software interface package that shows the current configuration (input features, false alarm threshold, and database), the false alarm vs. clutter complexity scatter plot, the target chip, and a sample image with false alarm locations identified.

Figure 40: The resulting clutter complexity measure scatter plots of false alarms vs. complexity scores and associated correlation coefficients when using all thirteen input features (both generalized Gaussian and statistical, r = 0.879), using just the generalized Gaussian input features (r = 0.802), and using just the statistical features (r = 0.767), for an M35 target and the medium-wave data set.

The clutter complexity measure was further tested by Kaplan et al. using all thirteen features over ten partitions of the COMANCHE FLIR image database [60]. Each partition contains high contrast imagery of a single and unique target type with an aspect angle of between 80 and 100 degrees at a range of 2 kilometers. Three different sites served as the image data collection locations: (1) Grayling, Michigan, (2) Hunter Liggett, California, and (3) Yuma, Arizona. The background scenes at all three locations were very similar, and the camera was stationary. As a result, the main contributors to variations in the image scene data between the three data sets were the diverse meteorological conditions. These provided the necessary variations in the image data that enabled the evaluation of the clutter complexity model. With the exception of the climate, the other conditions and sensor configurations were largely constant across all of the partitions. As a result any differences in the ATR performance between partitions could be considered primarily due to the different target types. Within partitions the number of pixels on target was approximately the same in all the images, leaving only the clutter as a variable that could affect ATR performance.

The false alarm counts for the LVQ ATR were computed across all of the partitions in the database. The clutter complexity model was then trained and tested over all combinations of the database partitions and correlated with the LVQ ATR rates. Table 10 shows a confusion matrix of correlation coefficients between the ATR false alarm rates and clutter complexity scores, where the indices refer to the partitions used to train and test the clutter complexity model. A linear correlation exists between the ATR bounds and the clutter complexity scores as evidenced by the relatively strong correlation coefficients along the diagonal. However, this is only true when the clutter complexity model is trained and tested over the same partition. This indicates that the clutter complexity measure is target dependent.

Table 10: Confusion matrix of correlation coefficients between the clutter complexity scores and LVQ ATR false alarm rates for ten partitions of the COMANCHE FLIR image database when the clutter complexity measure is trained over only one of the partitions [59].

                                              Train
Test    BMP2     2S1      M1       M113     M3       M60      T72      ZSU23    M35      HMMWV
BMP2    0.9726   0.7856   0.0027   -0.0680  0.4438   0.3059   -0.4301  0.0196   -0.3836  -0.1208
2S1     0.6401   0.9689   -0.0094  -0.4684  0.0906   0.4904   0.2940   0.2386   -0.4954  -0.0236
M1      0.0270   -0.1913  0.7655   0.0879   -0.0079  0.0481   0.0597   0.0125   -0.0861  -0.0335
M113    0.3568   0.6082   -0.2818  0.7735   0.2644   0.5359   0.0559   0.2136   -0.1781  0.1439
M3      0.1343   0.4803   0.4188   -0.1672  0.9009   0.4011   -0.1718  0.0353   -0.4561  -0.2699
M60     -0.2023  0.6822   -0.5592  -0.6425  -0.5729  0.8764   0.5362   0.2636   0.1001   0.1301
T72     0.2862   0.2987   -0.2817  -0.4819  -0.2087  0.2115   0.8279   -0.1151  0.0113   0.2834
ZSU23   0.1643   0.0358   -0.2117  0.0662   0.1097   -0.2519  -0.0561  0.6227   0.0850   0.1175
M35     0.0501   -0.3569  0.0156   0.4095   0.3645   -0.3743  -0.4141  -0.3498  0.9793   0.7650
HMMWV   0.5468   -0.5429  0.5103   0.5305   0.5216   -0.5376  0.5848   0.0582   -0.2474  0.8057

4.6 Discussion and Future Work

This chapter presents an application of the generalized Gaussian ABS model to a non-coherent imaging task in the form of a FLIR clutter complexity measure. Eight statistically based features and five generalized Gaussian based features served as inputs to the measure. The resulting relative complexity scores were shown to correlate quite well with the ATR false alarm counts. The measure is target dependent in that it achieves optimal performance when the target type is specified. Clutter complexity scores were derived for 10 different target types (over ten partitions of the COMANCHE FLIR database) and shown to correlate well with the LVQ ATR results for the same image data. In this sense the measure could be used as a predictor of how well an ATR might perform on a given image when looking for a particular target within clutter.

The generalized Gaussian ABS features were significant contributors to the measure, as evidenced by the improvement in performance when combined with the statistically based features. When using the generalized Gaussian features exclusively as inputs, the measure performed slightly better than when using only the statistical features. However, the best correlation between the clutter complexity score and the ATR bounds (or false alarm counts) was achieved when using all thirteen features together.

The ability to tune the Gaussian functions to model the expected target shapes and sizes provided a significant amount of flexibility for extracting features in the image that were target-like and thus contribute to the scene clutter. The ABS process provided a framework for iteratively extracting the target-like image features/Gaussians in a straightforward manner. These key aspects of the generalized Gaussian ABS model make it ideally suited for extracting image content with generally known characteristics from the scene.
Unlike the SAR application, in the clutter complexity measure the generalized Gaussian ABS model was not used to modify the image content and/or reconstruct the image scene. It was simply used as an adaptive feature extraction tool. The ability to tune the generalized Gaussian functions to model specific image features was shown to be very valuable in the FLIR clutter complexity application, and it could also be applied to other applications with similar requirements.

CHAPTER V

CONTRIBUTIONS AND FUTURE WORK

The goal of this research has been to explore the use of multidimensional signal representations for use in image processing based feature extraction, modeling, and enhancement. This led to the development of a new model for performing image analysis, synthesis, and modification using 2D generalized Gaussian functions.

5.1 Summary of Contributions

The joint time-frequency characteristics of the generalized Gaussian waveform make it well suited for accurately representing spatially localized image content. The added shape parameter, which distinguishes the generalized Gaussian from the traditional Gaussian waveform, further improves the accuracy of the representation. While a circularly smooth implementation of the 2D generalized Gaussian function that can accommodate off-axis rotation is presented, a much more efficient separable formulation has been adopted that significantly simplifies the Gaussian parameter extraction process.
The analysis-by-synthesis (ABS) decomposition technique, which was first explored in conjunction with the modeling of speech and audio signals, has been extended to the two-dimensional case to perform the image decomposition. An efficient peak-picking approach has been developed to perform the Gaussian parameter extraction during the image analysis process. A powerful attribute of the ABS is the ability to perform independent parameter modification in both the analysis and synthesis stages of the decomposition. The strong localization properties of the generalized Gaussian combined with the flexibility of the ABS decomposition technique form the basis of the new model and approaches in image analysis and enhancement.

A major contribution is made in the resolution enhancement of images generated using coherent imaging modalities such as synthetic aperture radar (SAR) and ultrasound. These images intrinsically suffer from a degradation known as speckle or speckle noise, which is an artifact of the image formation process. The generalized Gaussian ABS model is used to decouple natural image features from the speckle, thereby facilitating independent control over the image feature characteristics and the speckle granularity. This has the beneficial effect of increasing the perceived resolution and reducing the obtrusiveness of the speckle while preserving the edges and the definition of the image features.

The effectiveness of the resolution enhancement in the SAR case was assessed by formal stimulus-comparison categorical judgement scale subjective testing as defined by CCIR Recommendation 500-4 and by automatic target recognition (ATR) performance evaluations. In both cases the SAR images were resolution enhanced using the generalized Gaussian ABS and then evaluated. The results of these assessments show gains in both the subjective quality of the SAR clutter images and the objective ATR performance score, which is one of the highest reported to date using the 22.5 degree MSTAR target chip database.
As a consequence of its inherent flexibility, the generalized Gaussian ABS model can also be applied to image processing applications associated with non-coherent imaging modalities. This is illustrated by its application as a feature extraction tool for a forward-looking infra-red (FLIR) imagery clutter complexity measure. Clutter in the infrared imaging context can have broad connotations, but in this case the term is used to indicate objects or artifacts in an image that obfuscate the detection of targets within the scene. The goal of the clutter complexity measure is to serve as a relative indicator of the amount of clutter in the image scene and of the level of performance that can be expected when processing the FLIR image with an ATR system.

The generalized Gaussian ABS model is used to identify clutter related features within the FLIR image that indicate complexity. The ability to tune the Gaussian functions to the expected target shapes and sizes and iteratively extract them using the ABS framework makes this model well suited to this application. Five generalized Gaussian features, along with eight other statistically based features, serve as the inputs to a system that generates a complexity score that correlates well with ATR performance bounds. The contribution of the generalized Gaussian based features to the measure is shown to be significant. As a group, they outperformed the statistically based features in terms of ATR false alarm correlation when tested on the COMANCHE FLIR image database. The overall best performance is achieved when all thirteen features are included in the measure. The FLIR clutter complexity application illustrates that the tunability of the generalized Gaussian ABS model also makes it an effective tool for extracting image content with generally known characteristics from the scene.
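The combining system itself is not specified in this summary; purely as an illustrative sketch, a thirteen-element feature vector (five generalized Gaussian features plus eight statistical ones) could be mapped to a scalar complexity score by a linear model. The function name and the weights below are hypothetical stand-ins, not the system used in this work:

```python
import numpy as np

def clutter_complexity(gg_features, stat_features, weights, bias=0.0):
    """Map 5 generalized-Gaussian features and 8 statistical features to
    a scalar clutter complexity score with a linear model. Illustrative
    only; the weights are hypothetical, not those of the thesis."""
    x = np.concatenate([gg_features, stat_features])
    assert x.shape == (13,)  # five gg features + eight statistical ones
    return float(np.dot(weights, x) + bias)

# Hypothetical example: uniform weights over all thirteen features.
score = clutter_complexity(np.ones(5), np.zeros(8), weights=np.full(13, 0.1))
```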

5.2 Future Work

We have successfully explored the SAR and FLIR imaging applications of the generalized Gaussian ABS model. However, there are many other areas where development and use of the model could be investigated in future work. The most significant area is in the coherent imaging field of ultrasound. An initial investigation of the ultrasound application has been conducted, and a summary of the work is included below.
5.2.1 The Ultrasound Application

Brightness mode (B-mode) ultrasound images also suffer from speckle noise degradation that is implicit to coherent imaging modalities. As an impediment to high image quality, speckle can contribute to poor edge and feature definition, which in turn leads to lower perceived resolution. The nature of ultrasound speckle is unique in that the speckle texture appears to be an artifact of both the imaging modality and the microstructure of the tissue being imaged [89], [94]. The resolution of an ultrasound image is directly tied to the frequency of the incident signal and the depth of tissue penetration. The ultimate objective of any speckle reduction process is to allow a user to accurately resolve small objects in the scene.
Similar to the approach for SAR images, the objective of this enhancement is not to uncover any hidden structure or features in the ultrasound image, but rather to provide the sonographer with another tool for improving the visual presentation of the tissue being imaged. This is not unlike other tools already available such as contrast or brightness enhancement.


Ultrasound image enhancement is a broad term that could encompass many different aspects related to improved image quality. We constrain the term to represent the perceptual improvement of the image resolution and definition. This is achieved by reducing the speckle granularity while preserving and/or enhancing edge definition. The generalized Gaussian ABS procedure is investigated as a means to model large granularity speckle and convert it into a finer, more dense speckle. The search technique used to extract the Gaussians, the Gaussian parameter modification schemes in both the analysis and synthesis stages, and the stop condition are all adapted specifically to achieve this goal. Subjective quality is the primary indication of effectiveness in this application. It is also important to preserve the familiar appearance of the images, because that is the environment in which the sonographers are trained to recognize specific features or abnormalities.
In this study we did not implement a front end preprocessor to extract the underlying low frequency information. This is because the underlying objective was to investigate the generalized Gaussian model's ability to independently represent the ultrasound speckle. Lower frequency spatial variations in this application are reconstituted as the aggregate sum of high frequency Gaussians. To facilitate control over the speckle texture, the Gaussian waveforms are biased to have sizes approximately equal to or less than the size of the speckle grains.
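The biasing described above can be pictured concretely. A common separable form of the generalized Gaussian (assumed here; the exact parameterization used in the thesis may differ) uses a lobe-width parameter alpha and a shape parameter beta per axis, with beta = 2 reducing to an ordinary Gaussian:

```python
import numpy as np

def generalized_gaussian_2d(shape, center, alphas, betas):
    """Separable 2-D generalized Gaussian: the outer product of two 1-D
    profiles exp(-|(x - c)/alpha|**beta). alpha sets the lobe width and
    beta the shape; both are stand-ins for the thesis's parameters."""
    profiles = []
    for n, c, a, b in zip(shape, center, alphas, betas):
        x = np.arange(n, dtype=float)
        profiles.append(np.exp(-np.abs((x - c) / a) ** b))
    return np.outer(profiles[0], profiles[1])

# Bias the lobe width to be on the order of the speckle grain size
# (a hypothetical grain size of about 3 pixels is used here).
g = generalized_gaussian_2d((32, 32), (16, 16), (3.0, 3.0), (2.0, 2.0))
```

Smaller alpha values yield narrower lobes, which is the sense in which the waveforms are biased toward sizes at or below the grain size.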
The speckle granularity in ultrasound is unique in that it appears to have an orientation and to grow in size as the penetration depth of the ultrasound increases. Edges in the image associated with tissue boundaries also become somewhat more important in the representation. The Gaussian search and extraction technique remains the same as the one presented in the SAR application. However, in the ultrasound case the extracted Gaussians are somewhat larger than those associated with the SAR speckle due to the larger granularity of the ultrasound speckle. The general notions of the analysis functions dictating the synthesis packing densities, and the synthesis functions controlling the speckle granularity, hold true for the ultrasound application as they did in the SAR case.
Preliminary results were generated by processing an ultrasound image in the decoupled mode using the separable form of the generalized Gaussian waveform and a density based stop condition. The process was stopped when the total number of Gaussians used in the reconstruction reached 75 percent of the total number of pixels in the image. The non-overlapping block processing analysis method described in section 2.3.3 was used to ensure that dark and low contrast regions of the image were accurately represented. The analysis and synthesis lobe width and shape parameter modifications are presented in table 11 in terms of the multiplication factors (mf) applied to the original extracted Gaussian parameters. The multiplication factors that generated the highest subjective quality reconstructions were obtained experimentally by generating samples distributed throughout the parameter space. Figure 41 shows an original unprocessed image, and an example of a processed result is given in figure 42.
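Taken together, the density-based stop condition and the multiplication factors can be sketched as a greedy extract-then-resynthesize loop. This is a minimal illustration under stated assumptions, not the thesis's implementation: the peak-picking "fit" and the fixed analysis parameters below are simplified stand-ins for the actual search technique.

```python
import numpy as np

def gg_atom(shape, center, width, power):
    """Separable generalized Gaussian atom exp(-|(x - c)/width|**power)."""
    gy = np.exp(-np.abs((np.arange(shape[0]) - center[0]) / width) ** power)
    gx = np.exp(-np.abs((np.arange(shape[1]) - center[1]) / width) ** power)
    return np.outer(gy, gx)

def abs_analyze(image, density=0.75, width=2.0, power=2.0):
    """Extract atoms at residual peaks until the component count reaches
    density * (number of pixels): the density-based stop condition."""
    residual = image.astype(float).copy()
    components = []
    for _ in range(int(density * residual.size)):
        r, c = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        amp = residual[r, c]
        residual -= amp * gg_atom(residual.shape, (r, c), width, power)
        components.append((r, c, amp))
    return components

def abs_synthesize(shape, components, width=2.0, power=2.0,
                   width_mf=0.4, power_mf=0.2):
    """Resynthesize with the lobe width and shape parameter scaled by
    multiplication factors (mf), here the synthesis values of table 11."""
    out = np.zeros(shape)
    for r, c, amp in components:
        out += amp * gg_atom(shape, (r, c), width * width_mf, power * power_mf)
    return out

img = np.zeros((8, 8))
img[3, 4] = 5.0  # a toy "speckle grain"
comps = abs_analyze(img, density=0.75)
speckle = abs_synthesize(img.shape, comps)
```

Shrinking the synthesis width and shape relative to the analysis values is what converts large-grained speckle into the finer, denser texture discussed above.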
Figure 41: Original ultrasound image.

Figure 42: Generalized Gaussian ABS processed ultrasound image with finer and higher density granularity.

Table 11: Multiplication factors used to modify the analysis and synthesis Gaussians for the ultrasound application.

        Analysis    Synthesis
Figure  mf    mf    mf    mf
42      0.6   0.4   0.4   0.2

Observation of figure 42 indicates that there is some perceptual improvement in the spatial extent of the speckle in the processed image over the original given in figure 41.

This is accomplished by independently modifying the analysis and synthesis generalized Gaussian waveforms while still preserving the basic structure and features associated with the original image. While this preliminary study demonstrates the potential of the generalized Gaussian ABS model to perform speckle resolution enhancement in the ultrasound application, there are several areas that could be examined in future work. These might include the following.
- Investigate the use of a preprocessor to accommodate low frequency variations within the image. This is used in the SAR imaging application and could be used effectively in ultrasound to improve the representation of the low frequency structure in the reconstructed image.

- Explore a more rigorous and better defined process for establishing appropriate Gaussian modification parameters. This would lead to a better understanding of the impact they have on the perception of the speckle and the content in ultrasound images.

- Examine the use of the full circularly smooth 2D generalized Gaussian formulation to accommodate the more oblong nature of ultrasound speckle. While this will increase the computational complexity of the process, it may improve the model's ability to manipulate the larger and more oblong artifacts in the ultrasound image scene.

- Develop a comprehensive set of tests to evaluate the effectiveness of the approach. This would most likely involve implementing a CCIR based testing method similar to the one described in section 3.5.1 that was used in the subjective analysis of the SAR data. Regardless of the approach, the goal would be to evaluate the effectiveness of the generalized Gaussian based resolution enhancement process when applied to ultrasound images.

REFERENCES

[1] Radar imagery, Education Poster Series 6, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California 91109, 1998.
[2] CCIR Recommendation 500-4, Method for the subjective assessment of the quality of television pictures, in Recommendations and Reports of the CCIR International Radio Consultative Committee, vol. 11, (Dusseldorf, Germany), pp. 47–61, CCIR, 1990.
[3] Antonini, M., Barlaud, M., Mathieu, P., and Daubechies, I., Image coding using wavelet transform, IEEE Trans. on Image Processing, vol. 1, pp. 205–220, April 1992.
[4] Army Research Laboratory, Broad agency announcement for fall 1995, tech. rep., Army Research Laboratory, June 1994.
[5] Army/Defense Advanced Research Projects Agency, Future combat systems. http://www.darpa.mil/fcs/, July 2003.
[6] Atal, B. S. and Remde, J. R., A new model of LPC excitation for producing natural-sounding speech at low bit rates, Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 614–617, May 1982.
[7] Bastiaans, M. J., Gabor's expansion of a signal into Gaussian elementary signals, Proceedings of the IEEE, vol. 68, pp. 538–539, April 1980.
[8] Bell, C. G., Fujisaki, H., Heinz, J. M., Stevens, K. N., and House, A. S., Reduction of speech spectra by analysis-by-synthesis techniques, Journal of The Acoustical Society of America, vol. 33, pp. 1725–1736, December 1961.
[9] Bergeaud, F. and Mallat, S., Matching pursuit of images, in Proc. Int. Conf. on Image Processing, pp. 53–56, IEEE, 1995.
[10] Blackman, R. B. and Tukey, J. W., The Measurement of Power Spectra from the Point of View of Communications Engineers. New York, NY, USA: Dover, 1958.
[11] Bovik, A. C., Clark, M., and Geisler, W. S., Multichannel texture analysis using localized spatial filters, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 12, pp. 55–73, January 1990.
[12] Britton, D. F., Smith, M. J. T., and Mersereau, R. M., Generalized Gaussian decompositions for image analysis and synthesis, in World Multiconference on Systemics, Cybernetics and Informatics, vol. 6, (Orlando, FL), pp. 300–305, July 2000.
[13] Buzo, A., Gray, A. H. J., Gray, R. M., and Markel, J. D., Speech coding based upon vector quantization, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-28, pp. 562–574, October 1980.
[14] Carrara, W. G., Goodman, R. S., and Majewski, R. M., Spotlight Synthetic Aperture Radar: Signal Processing Algorithms. 685 Canton Street, Norwood, MA 01602: Artech House, 1995.
[15] Cassidy, D. C., Heisenberg. Web page: http://www.aip.org/history/heisenberg/, Center for History of Physics of the American Institute of Physics, November 1998.
[16] Cederquist, J., Dwan, C., Wegrzyn, J., and Rauss, P. J., Spatial spectral ATR, Proceedings of the Third Annual ARL Federated Laboratory Advanced Sensors Symposium, p. 331, Feb 1999.
[17] Chan, L. A. and Nasrabadi, N. M., An application of wavelet-based vector quantization in target recognition, International Journal on Artificial Intelligence Tools, vol. 6, pp. 165–178, April 1997.
[18] Cohen, L., Time-Frequency Analysis. New Jersey: Prentice-Hall Inc., 1995.
[19] Computer Vision Group, Colorado State University, Fort Carson RSTA Data Collection. http://www.cs.colostate.edu/vision/ft carson/, Nov 1993.
[20] Curlander, J. C. and McDonough, R. N., Synthetic Aperture Radar: Systems and Signal Processing. Wiley Series in Remote Sensing, New York, NY: John Wiley & Sons, Inc., 1991.
[21] Daubechies, I., The wavelet transform, time-frequency localization and signal analysis, IEEE Trans. on Information Theory, vol. 36, pp. 961–1005, September 1990.
[22] Daubechies, I., Where do wavelets come from? A personal point of view, Proc. of the IEEE, vol. 84, pp. 510–513, April 1996.
[23] Daugman, J. G., Two-dimensional spectral analysis of cortical receptive field profiles, Vision Research, vol. 20, pp. 847–856, 1980.
[24] Daugman, J. G., Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, pp. 1169–1179, July 1988.
[25] Defense Advanced Research Projects Agency, FCS UGCV: Unmanned ground combat vehicle. http://www.darpa.mil/tto/programs/fcs ugcv.html, July 2003.

[26] DeGraaf, S. R., SAR imaging via modern 2-D spectral estimation methods, IEEE Transactions on Image Processing, vol. 7, pp. 729–761, May 1998.
[27] Deller, J. R. J., Proakis, J. G., and Hansen, J. H. L., Discrete-Time Processing of Speech Signals. 866 Third Avenue, New York, New York 10022: Macmillan Publishing Company, 1993.
[28] Dewaele, P., Wambacq, P., Osterlinck, A., Leuven, K. U., and Marchand, J. L., Comparison of some speckle reduction techniques for SAR images, Int. Geoscience and Remote Sensing Symposium, no. 10, pp. 2417–2422, 1990.
[29] Dudley, H., The vocoder, Bell Labs Record, vol. 17, pp. 122–126, 1939.
[30] Elachi, C., Spaceborne Radar Remote Sensing: Applications and Techniques. 345 East 47th Street, New York, NY 10017-2394: IEEE Press, 1988.
[31] Engelbrecht, A. P., Sensitivity analysis for selective learning by feedforward neural networks, Fundamenta Informaticae, vol. 45, no. 4, pp. 295–328, 2001.
[32] Chen, C. H., et al., Signal Processing Handbook. 270 Madison Avenue, New York, New York 10016: Marcel Dekker, Inc., 1988.
[33] Feichtinger, H. G. and Strohmer, T., Gabor Analysis and Algorithms: Theory and Applications. Boston: Birkhauser, 1998.
[34] Fitch, J. P., Synthetic Aperture Radar. 175 Fifth Ave, New York, NY 10010, USA: Springer-Verlag, 1988.
[35] Flanagan, J. L., Parametric coding of speech signals, Journal of the Acoustical Society of America, vol. 68, pp. 412–419, August 1980.
[36] Flanagan, J. L. and Christensen, S. W., Computer studies on parametric coding of speech spectra, Journal of the Acoustical Society of America, vol. 68, pp. 420–430, August 1980.
[37] Flanagan, J. L. and Golden, R. M., Phase vocoder, The Bell System Technical Journal, pp. 1493–1509, November 1966.
[38] Franceschetti, G. and Lanari, R., Synthetic Aperture Radar Processing. Electronic Engineering Systems Series, Boca Raton, FL: CRC Press LLC, 1999.
[39] Gabor, D., Theory of communication, Journal of IEE (London), vol. 93, pp. 429–457, November 1946.
[40] Gagnon, L. and Jouan, A., Speckle filtering of SAR images: a comparative study between complex-wavelet-based and standard filters, in Proceedings of SPIE Wavelet Applications in Signal and Image Processing, vol. 3169, pp. 80–91, 1997.

[41] George, E. B., An Analysis-by-Synthesis Approach to Sinusoidal Modeling Applied to Speech and Music Signal Processing. Ph.D. dissertation, Georgia Institute of Technology, Atlanta, GA, 1991.
[42] George, E. B. and Smith, M. J. T., A new speech coding model based on a least-squares sinusoidal representation, in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 1641–1644, IEEE, 1987.
[43] George, E. B. and Smith, M. J. T., Analysis-by-synthesis/overlap-add sinusoidal modeling applied to the analysis and synthesis of musical tones, J. Audio Engineering Society, vol. 40, pp. 497–516, June 1992.
[44] George, E. B. and Smith, M. J. T., Speech analysis/synthesis and modification using an analysis-by-synthesis/overlap-add sinusoidal model, IEEE Trans. on Speech and Audio Processing, vol. 5, pp. 389–406, September 1997.
[45] Graps, A., An introduction to wavelets, IEEE Computational Science & Engineering, pp. 50–61, summer 1995.
[46] Guglielmi, V., Castine, F., and Piau, P., Super-resolution algorithms for SAR applications, in Proc. SPIE Conf. Image Reconstruction and Restoration II, vol. 3170, (San Diego, CA), pp. 195–202, SPIE, July 1997.
[47] Guo, H., Odegard, J., Lang, M., Gopinath, R., Selesnick, I., and Burrus, C., Wavelet based speckle reduction with application to SAR based ATD/R, in IEEE International Conference on Image Processing, (Austin, TX), November 1994.
[48] Haar, A., Zur Theorie der orthogonalen Funktionensysteme. Mathematische Annalen 69, pp. 331–371, Georg-August-Universität Göttingen, Germany, 1909.
[49] Halle, M. and Stevens, K. N., Analysis by synthesis, Proc. Semantic Speech Compression, vol. 2, Paper D7, December 1959.
[50] Hedelin, P., A tone-oriented voice-excited vocoder, IEEE Int. Conf. on Acoustics, Speech, Signal Processing, pp. 205–208, 1981.

[51] Heisenberg, W., Über den anschaulichen Inhalt der quantentheoretischen Kinematik und Mechanik, Zeitschrift für Physik, vol. 43, pp. 172–198, March 1927.
[52] Park, S.-I., New Directional Filter Banks and Their Applications in Image Processing. Ph.D. dissertation, Georgia Institute of Technology, Atlanta, GA, 1999.
[53] Jain, A. K., Bhattacharjee, S. K., and Chen, Y., On texture in document images, in Proc. on Computer Vision and Pattern Recognition, pp. 677–680, IEEE, 1992.

[54] Jain, A. K. and Farrokhnia, F., Unsupervised texture segmentation using Gabor filters, in Proc. Int. Conf. on Systems, Man and Cybernetics, pp. 14–19, IEEE, 1990.
[55] Jakeman, E. and Pusey, P. N., A model for non-Rayleigh sea echo, IEEE Trans. on Antennas and Propagation, vol. 24, pp. 806–814, November 1976.
[56] Janssen, A. J. E. M., Gabor representation of generalized functions, Journal of Mathematical Analysis and Applications, vol. 83, pp. 377–394, October 1981.
[57] Jao, J. K., Amplitude distribution of composite terrain radar clutter and the K-distribution, IEEE Trans. on Antennas and Propagation, vol. 32, pp. 1049–1062, October 1984.
[58] Juang, B.-H. and Gray, A. H. J., Multiple stage vector quantization for speech coding, in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 597–600, IEEE, 1982.
[59] Kaplan, L. M., Namuduri, K. R., Davies, M., Nasrabadi, N. N., Chan, L. A., Britton, D. F., Smith, M. J. T., and Mersereau, R. M., Development and analysis of a clutter complexity measure for ATR characterization, in Proc. of Advanced Sensors Consortium; 5th ARL Federated Labs Symposium, (College Park, MD), pp. 195–199, 2001.
[60] Kaplan, L. M., Namuduri, K. R., Davies, M., Nasrabadi, N. N., Chan, L. A., Britton, D. F., Smith, M. J. T., and Mersereau, R. M., Final report on the development and analysis of a clutter complexity measure for ATR characterization, tech. rep., Clark Atlanta University, September 2002.
[61] Kay, S. M., Modern Spectral Estimation, Theory and Application. Signal Processing Series, Englewood Cliffs, New Jersey 07632: Prentice Hall, 1st ed., 1988.
[62] Kennedy, R., Lee, Y., Reed, C., and Van Roy, B., Solving Pattern Recognition Problems. 60 Birmingham Parkway, Brighton, MA 02135: Unica Technologies, Inc., January 1996.
[63] Kuan, D. T., Sawchuck, A. A., Strand, T. C., and Chavel, P., Adaptive restoration of images with speckle, IEEE Trans. on Acoustics, Speech, and Signal Processing, vol. ASSP-35, pp. 373–383, March 1987.
[64] Lanterman, A. D., O'Sullivan, J. A., and Miller, M. I., Kullback-Leibler distances for quantifying clutter and models, Optical Engineering, vol. 38, pp. 1–13, December 1999.
[65] Lee, J. S., Digital image enhancement and noise filtering by using local statistics, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. PAMI-2, March 1980.

[66] Lee, J. S., Refined filtering of image noise using local statistics, Computer Graphics and Image Processing, vol. 15, pp. 380–389, 1980.
[67] Levinson, N., The Wiener RMS (root mean square) error criterion in filter design and prediction, Journal of Mathematical Physics, vol. 25, pp. 261–278, 1947.
[68] Li, B., Zheng, Q., Der, S., Chellappa, R., Nasrabadi, N. M., Chan, L. A., and Wang, L. C., Experimental evaluation of neural, statistical and model-based approaches to FLIR ATR, Proceedings of the SPIE, vol. 3371, pp. 388–397, April 1998.
[69] Li, F. K., Croft, C., and Held, D. N., Comparison of several techniques to obtain multiple-look SAR imagery, IEEE Trans. Geoscience and Remote Sensing, no. 21, p. 370, 1983.
[70] Li, J. and Stoica, P., An adaptive filter approach to spectral estimation and SAR imaging, IEEE Transactions on Signal Processing, vol. 44, pp. 1469–1484, June 1996.
[71] Lopes, A., Nezry, E., Touzi, R., and Laur, H., Structure detection and statistical adaptive speckle filtering in SAR images, International Journal of Remote Sensing, vol. 14, no. 9, pp. 1735–1758, 1993.
[72] Lopes, A., Touzi, R., and Nezry, E., Adaptive speckle filter and scene heterogeneity, IEEE Trans. on Geoscience and Remote Sensing, vol. 28, pp. 992–1000, November 1990.
[73] Macon, M. W. and Clements, M. A., Speech concatenation and synthesis using an overlap-add sinusoidal model, in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, (Atlanta, GA), pp. 361–364, IEEE, 1996.
[74] Macon, M. W., Jensen-Link, L., Olivero, J., Clements, M. A., and George, E. B., A singing voice synthesis system based on sinusoidal modeling, in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, (Munich, Germany), pp. 435–438, 1997.
[75] Mallat, S. G., A theory for multiresolution signal decomposition: The wavelet representation, IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 11, pp. 674–693, July 1989.
[76] Mallat, S. G. and Zhang, Z., Matching pursuits with time-frequency dictionaries, IEEE Trans. on Signal Processing, vol. 41, pp. 3397–3415, December 1993.
[77] Marcelja, S., Mathematical descriptions of the responses of simple cortical cells, Journal of the Optical Society of America, vol. 70, no. 11, pp. 1297–1300, 1980.

[78] Marques, J. S. and Almeida, L. B., A background for sinusoid based representation of voiced speech, in Proc. Int. Conf. on Acoustics, Speech, and Signal Processing, pp. 1233–1236, 1986.
[79] Meitzler, T., Gerhart, G., and Singh, H., A relative clutter metric, IEEE Transactions on Aerospace and Electronic Systems, vol. 34, pp. 968–976, July 1998.
[80] Oliver, C. and Quegan, S., Understanding Synthetic Aperture Radar Images. Boston: Artech House Inc., 1998.
[81] Oliver, C. J., The interpretation and simulation of clutter textures in coherent images, Inverse Problems, vol. 2, pp. 481–518, November 1986.
[82] Oppenheim, A. V. and Schafer, R. W., Discrete-Time Signal Processing. Signal Processing Series, Englewood Cliffs, New Jersey 07632: Prentice Hall, 1st ed., 1989.
[83] Owirka, G. J., Weaver, A. L., and Novak, L. M., Performance of a multiresolution classifier using enhanced resolution SAR data, in Proc. SPIE Conf. Radar Sensor Technology II, vol. 3066, (Orlando, FL), SPIE, April 1997.
[84] Pinson, E. N., Pitch synchronous time domain estimation of formant frequencies and bandwidths, Journal of The Acoustical Society of America, vol. 35, pp. 1264–1273, August 1963.
[85] Porat, M. and Zeevi, Y. Y., Pattern analysis and texture discrimination in the Gabor space, 9th International Conference on Pattern Recognition, vol. 2, pp. 700–702, November 1988.
[86] Qian, S. and Chen, D., Joint time-frequency analysis, IEEE Signal Processing Magazine, pp. 52–67, March 1999.
[87] Rabiner, L. R. and Schafer, R. W., Digital Processing of Speech Signals. Digital Signal Processing Series, Englewood Cliffs, New Jersey 07632: Prentice-Hall, Inc., 1978.
[88] Shi, Z. and Fung, K. B., A comparison of digital speckle filters, in International Geoscience and Remote Sensing Symposium (IGARSS), vol. 4, pp. 2129–2133, August 1994.
[89] Shung, K. K., Smith, M. B., and Tsui, B. M. W., Principles of Medical Imaging. San Diego, CA: Academic Press Inc., 1992.
[90] Smith, M. J. T. and Docef, A., A Study Guide for Digital Image Processing. Riverdale, Georgia 30274: Scientific Publishers, Inc., early release ed., 1997.

[91] Smith, M. J. T. and Barnwell, T. P., III, A new filter bank theory for time-frequency representation, IEEE Trans. Acoustics, Speech, Signal Processing, vol. 35, pp. 314–327, June 1987.
[92] Strang, G. and Nguyen, T., Wavelets and Filter Banks. Box 812060, Wellesley, MA 02181, USA: Wellesley-Cambridge Press, 2nd ed., 1997.
[93] Valens, C., A really friendly guide to wavelets. Found at http://perso.wanadoo.fr/polyvalens/clemens/wavelets/wavelets.html, December 1999.
[94] Wagner, R. F., Insana, M. F., and Brown, D. G., Statistical properties of radio-frequency and envelope-detected signals with application to medical ultrasound, Journal of the Optical Society of America, vol. 4, pp. 910–922, May 1987.
[95] Wexler, J. and Raz, S., Discrete Gabor expansions, Signal Processing, vol. 21, November 1990.
[96] Wiley, C. A., Synthetic aperture radars: a paradigm for technology evolution, IEEE Trans. Aerospace Electrical Systems (AES), no. 21, pp. 440–443, 1985.
[97] Zhaoqiang, B., Li, J., and Liu, Z.-S., Super resolution SAR imaging via parametric spectral estimation methods, IEEE Trans. on Aerospace and Electronic Systems, vol. 35, pp. 267–281, January 1999.
