Jacob Beutel
Consultant
J. Michael Fitzpatrick
Vanderbilt University
Steven C. Horii
University of Pennsylvania Health Systems
Yongmin Kim
University of Washington
Harold L. Kundel
University of Pennsylvania Health Systems
Milan Sonka
University of Iowa
The images on the front cover are taken from figures in the text. Top row: left,
Fig. 7.13(c), p. 430; middle, Fig. 2.14, p. 104; right, Fig. 6.10(b), p. 363 and
color plate 1. Middle row: left, Fig. 11.27, p. 659; right, Fig. 13.28(d), p. 774.
Bottom row: Fig. 15.10(d), p. 937.
Milan Sonka
J. Michael Fitzpatrick
Editors
SPIE Press
Contents
Extended Contents
COLOR PLATES
Chapter 8. Image Registration
J. Michael Fitzpatrick, Derek L. G. Hill, Calvin R. Maurer, Jr.
Chapter 10. Validation of Medical Image Analysis Techniques
Kevin W. Bowyer
Index
Extended Contents
2 Image Segmentation
2.1 Introduction
2.2 Image preprocessing and acquisition artifacts
2.2.1 Partial volume effect
2.2.2 Intensity nonuniformity (INU)
2.3 Thresholding
2.3.1 Shape-based histogram techniques
2.3.2 Optimal thresholding
2.3.3 Advanced thresholding methods for simultaneous segmentation and INU correction
2.4 Edge-based techniques
2.4.1 Border tracing
2.4.2 Graph searching
2.4.3 Dynamic programming
2.4.4 Advanced border detection methods
5 Feature Extraction
5.1 Introduction
5.1.1 Why features? Classification (formal or informal) almost always depends on them
5.1.2 Review of applications in medical image analysis
5.1.3 Roots in classical methods
5.1.4 Importance of data and validation
5.2 Invariance as a motivation for feature extraction
5.2.1 Robustness as a goal
5.2.2 Problem-dependence is unavoidable
5.3 Examples of features
5.3.1 Features extracted from 2D images
5.4 Feature selection and dimensionality reduction for classification
5.4.1 The curse of dimensionality subset problem
5.4.2 Classification versus representation
5.4.3 Classifier-independent feature analysis for classification
5.4.4 Classifier-independent feature extraction
5.4.5 How useful is a feature: separability between classes
5.4.6 Classifier-independent feature analysis in practice
5.4.7 Potential for separation: nonparametric feature extraction
5.4.8 Finding the optimal subset
5.4.9 Ranking the features
5.5 Features in practice
5.5.1 Caveats
5.5.2 Ultrasound tissue characterization
5.5.3 Breast MRI
5.6 Future developments
5.7 Acknowledgments
5.8 References
COLOR PLATES
8 Image Registration
8.1 Introduction
8.1.1 Operational goal of registration
8.1.2 Classification of registration methods
8.2 Geometrical transformations
8.2.1 Rigid transformations
8.2.2 Nonrigid transformations
8.2.3 Rectification
8.3 Point-based methods
8.3.1 Points in rigid transformations
8.3.2 Points in scaling transformations
8.3.3 Points in perspective projections
8.3.4 Points in curved transformations
8.4 Surface-based methods
8.4.1 Disparity functions
8.4.2 Head and hat algorithm
8.4.3 Distance definitions
8.4.4 Distance transform approach
8.4.5 Iterative closest point algorithm
8.4.6 Weighted geometrical feature algorithm
8.5 Intensity-based methods
8.5.1 Similarity measures
8.5.2 Capture ranges and optimization
8.5.3 Applications of intensitybased methods
8.6 Conclusion
8.7 Acknowledgments
8.8 References
11 Echocardiography
11.1 Introduction
11.1.1 Overview of cardiac anatomy
11.1.2 Normal physiology
11.1.3 Role of ultrasound in cardiology
11.2 The echocardiographic examination
11.2.1 Image acquisition in standard views
11.2.2 M-mode and two-dimensional echocardiography
11.2.3 Doppler echocardiography
11.2.4 Three-dimensional echocardiography
11.3 The ventricles
11.3.1 Ventricular volume
11.3.2 Ventricular mass
11.3.3 Ventricular function
11.3.4 Ventricular shape
11.3.5 Clinical evaluation of the left ventricle
11.4 The valves
11.4.1 Assessment of the valves from echocardiograms
11.4.2 Valve stenosis
11.4.3 Valve regurgitation
11.5 Automated analysis
11.5.1 Techniques for border detection from echocardiographic images
11.5.2 Validation
11.6 Acknowledgments
11.7 References
17.6.3 3D maps of variability and asymmetry
17.6.4 Alzheimer's disease
17.6.5 Gender in schizophrenia
17.7 Cortical modeling and analysis
17.7.1 Cortical matching
17.7.2 Spherical, planar maps of cortex
17.7.3 Covariant field equations
17.8 Cortical averaging
17.8.1 Cortical variability
17.8.2 Average brain templates
17.8.3 Uses of average templates
17.9 Deformation-based morphometry
17.9.1 Deformable probabilistic atlases
17.9.2 Encoding brain variation
17.9.3 Tensor maps of directional variation
17.9.4 Anisotropic Gaussian fields
17.9.5 Detecting shape differences
17.9.6 Tensor-based morphometry
17.9.7 Mapping brain asymmetry
17.9.8 Changes in asymmetry
17.9.9 Abnormal asymmetry
17.9.10 Model-based shape analysis
17.10 Voxel-based morphometry
17.10.1 Detecting changes in stereotaxic tissue distribution
17.10.2 Stationary Gaussian random fields
17.10.3 Statistical flattening
17.10.4 Permutation
17.10.5 Joint assessment of shape and tissue distribution
17.11 Dynamic (4D) brain maps
17.12 Conclusion
17.13 Acknowledgments
17.14 References
18 Tumor Imaging, Analysis, and Treatment Planning
18.1 Introduction
18.2 Medical imaging paradigms
18.3 Dynamic imaging
18.4 Conventional and physiological imaging
18.5 Tissue-specific and physiologic nuclear medicine modalities
18.6 Positron emission tomography
18.7 Dynamic contrast-enhanced MRI
18.8 Functional CT and MRI
Index
Milan Sonka
milansonka@uiowa.edu
J. Michael Fitzpatrick
j.michael.fitzpatrick@vanderbilt.edu
CHAPTER 1
Statistical Image Reconstruction Methods for
Transmission Tomography
Jeffrey A. Fessler
University of Michigan
Contents
1.1 Introduction
1.2 The problem
  1.2.1 Transmission measurements
  1.2.2 Reconstruction problem
  1.2.3 Likelihood-based estimation
  1.2.4 Penalty function
  1.2.5 Concavity
1.3 Optimization algorithms
  1.3.3 Convergence rate
  1.3.4 Parabola surrogate
1.4 EM algorithms
  1.4.1 Transmission EM algorithm
  1.4.2 EM algorithms with approximate M-steps
  1.4.3 EM algorithm with Newton M-step
  1.4.4 Diagonally-scaled gradient-ascent algorithms
1.7 Direct algorithms
  1.7.1 Conjugate gradient algorithm
  1.7.2 Quasi-Newton algorithm
1.8 Alternatives to Poisson models
  1.8.1 Algebraic reconstruction methods
Methods to avoid
1.15 References
1.1 Introduction
The problem of forming cross-sectional or tomographic images of the attenuation characteristics of objects arises in a variety of contexts, including medical x-ray
computed tomography (CT) and nondestructive evaluation of objects in industrial
inspection. In the context of emission imaging, such as positron emission tomography (PET) [1, 2], single photon emission computed tomography (SPECT) [3],
and related methods used in the assay of containers of radioactive waste [4], it is
useful to be able to form attenuation maps, tomographic images of attenuation
coefficients, from which one can compute attenuation correction factors for use in
emission image reconstruction. One can measure the attenuating characteristics of
an object by transmitting a collection of photons through the object along various
paths or rays and observing the fraction that pass unabsorbed. From measurements collected over a large set of rays, one can reconstruct tomographic images of
the object. Such image reconstruction is the subject of this chapter.
In all the above applications, the number of photons one can measure in a transmission scan is limited. In medical x-ray CT, source strength, patient motion, and absorbed dose considerations limit the total x-ray exposure. Implanted objects such as pacemakers also significantly reduce transmissivity and cause severe artifacts [5]. In industrial applications, source strength limitations, combined with the very large attenuation coefficients of metallic objects, often result in a small fraction of photons passing to the detector unabsorbed. In PET and SPECT imaging, the transmission scan only determines a nuisance parameter of secondary interest relative to the object's emission properties, so one would like to minimize the transmission scan duration. All the above considerations lead to low-count transmission scans. This chapter discusses algorithms for reconstructing attenuation images from low-count transmission scans. In this context, we define "low-count" to mean that the mean number of photons per ray is small enough that traditional filtered-backprojection (FBP) images, or even methods based on the Gaussian approximation to the distribution of the Poisson measurements (or logarithm thereof), are inadequate. We focus the presentation in the context of PET and SPECT transmission scans, but the methods are generally applicable to all low-count transmission
studies. See [6] for an excellent survey of statistical approaches for the emission
reconstruction problem.
Statistical methods for reconstructing attenuation images from transmission
scans have increased in importance recently for several reasons. Factors include
the necessity of reconstructing 2D attenuation maps for reprojection to form 3D
attenuation correction factors in septaless PET [7, 8], the widening availability of
SPECT systems equipped with transmission sources [9], and the potential for reducing transmission noise in whole body PET images and in other protocols requiring short transmission scans [10]. An additional advantage of reconstructing
attenuation maps in PET is that if the patient moves between the transmission and emission scans, and if one can estimate this motion, then one can calculate appropriate corrections.
1.2 The problem

[Fig. 1.1: A typical transmission scan geometry: photons from a source pass through the object to a detector, with the source, collimator, and detector translated and rotated to collect measurements along many rays.]
1.2.1 Transmission measurements
where $\delta(\cdot)$ is the Dirac delta function. For simplicity, we assume this monoenergetic case hereafter. (In the polyenergetic case, one must consider effects such as beam hardening [21].)

The absorption and Compton scattering of photons by the object is governed by Beer's law. Let $b_i$ denote the mean number of photons that would be recorded by the detector (for the $i$th source-detector position, hereafter referred to as a "ray") if the object were absent. This $b_i$ depends on the scan duration, the source strength, and the detector efficiency at the source photon energy.¹ The dependence on $i$ reflects the fact that in modern systems there are multiple detectors, each of which

¹Some gamma-emitting radioisotopes produce photons at two or more distinct energies. If the detector has adequate energy resolution, then it can separate photons at the energy of interest from other photons, or bin the various energies separately.
may have a different efficiency. The mean number of photons recorded for the $i$th ray when the object is present is then

$$b_i \exp\Bigl( -\int_{L_i} \mu(\vec x)\, d\ell \Bigr), \qquad (1.1)$$

where $L_i$ is the line or strip between the source and detector for the $i$th ray, and $\mu(\vec x)$ is the linear attenuation coefficient at the source photon energy. The number of photons actually recorded in practice differs from the ideal expression (1.1) in several ways. First, for a photon-counting detector,² the number of recorded photons is a Poisson random variable [12]. Second, there will usually be additional background counts recorded due to Compton scatter [22], room background, random coincidences in PET [23, 24], or emission crosstalk in SPECT [9, 25-27]. Third, the detectors have finite width, so the infinitesimal line integral in (1.1) is an approximation. For accurate image reconstruction, one must incorporate these effects into the statistical model for the measurements, rather than simply using the idealized model (1.1).
Let $Y_i$ denote the random variable representing the number of photons counted for the $i$th ray. A reasonable statistical model³ for these transmission measurements is that they are independent Poisson random variables with means given by

$$E[Y_i] = b_i \exp\Bigl( -\int_{L_i} \mu(\vec x)\, d\ell \Bigr) + r_i, \qquad (1.2)$$

where $r_i$ denotes the mean number of background events (such as random coincidences, scatter, and crosstalk). In many papers, the $r_i$'s are ignored or assumed to be zero. In this chapter, we assume that the $r_i$'s are known, which in practice means that they are determined separately by some other means (such as smoothing a delayed-window sinogram in PET transmission scans [31]). The noise in these estimated $r_i$'s is not considered here, and is a subject requiring further investigation and analysis. In some PET transmission scans, the random coincidences are subtracted from the measurements in real time. Statistical methods for treating this problem have been developed [32-34], and require fairly simple modifications of the algorithms presented in this chapter.
We assume the $b_i$'s are known. In PET and SPECT centers, these are determined by periodic blank scans: transmission scans with nothing but air in the scanner portal. Since no patient is present, these scans can have fairly long durations (typically a couple of hours, run automatically in the middle of the night). Thus the estimated $b_i$'s computed from such a long scan have much less variability than the transmission measurements (the $Y_i$'s). Therefore, we ignore the variability in these estimated $b_i$'s. Accounting for the small variability in these estimates is another open problem (but one likely to be of limited practical significance).

²For a current-integrating detector, such as those used in commercial x-ray CT scanners, the measurement noise is a mixture of Poisson photon statistics and Gaussian electronic noise.
³Due to the effects of detector deadtime in PET and SPECT, the measurement distributions are not exactly Poisson [28-30], but the Poisson approximation seems adequate in practice.
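For intuition, the model (1.2) is easy to simulate for a single ray; such simulations are useful for testing reconstruction code. The values of $b_i$, the line integral, and $r_i$ below are hypothetical, not taken from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ray(b, line_integral, r, n_scans):
    """Draw Poisson transmission counts Y ~ Poisson(b*exp(-l) + r), per model (1.2)."""
    mean = b * np.exp(-line_integral) + r
    return rng.poisson(mean, size=n_scans)

# Hypothetical ray: blank-scan count, true line integral, background mean.
b, ell, r = 1e4, 2.0, 50.0
y = simulate_ray(b, ell, r, n_scans=100_000)
print(y.mean())   # close to b*exp(-2) + 50
```

Repeating the draw many times and averaging recovers the model mean, which is the basic consistency check any likelihood implementation should pass.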
1.2.2 Reconstruction problem
Let

$$\ell_i = \int_{L_i} \mu(\vec x)\, d\ell$$

denote the true line integral along the $i$th ray. Conventionally, one forms an estimate of $\ell_i$ by computing the logarithm of the measured sinogram⁴ as follows:

$$\hat\ell_i = \begin{cases} \log\bigl( b_i / (Y_i - r_i) \bigr), & Y_i - r_i > 0 \\ ?, & \text{otherwise.} \end{cases} \qquad (1.3)$$

One then reconstructs an estimate $\hat\mu$ of the attenuation map from the $\hat\ell_i$'s using FBP [35]. There are several problems with this approach. First, the logarithm is not defined when $Y_i - r_i \le 0$, which can happen frequently in low-count transmission scans. (Typically one must substitute some artificial value (denoted "?" above) for such rays, or interpolate neighboring rays [36], which can lead to biases.) Second, the above procedure yields biased estimates of the line integral. By Jensen's inequality, since the logarithm is a concave function, for any positive random variable $X$,

$$E[\log X] \le \log E[X]. \qquad (1.4)$$

Thus, the logarithm in (1.3) systematically overestimates the line integral on average. This overestimation has been verified empirically [14, 15]. One can show

⁴When the ray measurements are organized as a 2D array according to their radial and angular coordinates, the projection of a point object appears approximately as a sinusoidal trace in the array.
[Fig. 1.2: Piecewise-constant approximation of an attenuation map using the pixel basis, shown as a function of the coordinates $x_1$ and $x_2$.]
analytically that the bias increases as the counts decrease [14], so the logarithm is particularly unsuitable for low-count scans. A third problem with (1.3) is that the variances of the $\hat\ell_i$'s can be quite nonuniform, so some rays are much more informative than other rays. The FBP method treats all rays equally, even those for which $Y_i - r_i$ is nonpositive, which leads to noisy images corrupted by streaks originating from high-variance $\hat\ell_i$'s. Noise is considered only as an afterthought by apodizing the ramp filter, which is equivalent to space-invariant smoothing. (There are a few exceptions where space-variant sinogram filtering has been applied, e.g., [38-40].) Fourth, the FBP method is poorly suited to nonstandard imaging geometries, such as truncated fan-beam or cone-beam scans, e.g., [41-46].
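The overestimation implied by (1.4) is easy to exhibit with a small Monte Carlo experiment. The sketch below uses hypothetical low-count values ($r_i = 0$ for simplicity), and discards the zero-count rays before taking the logarithm, exactly the difficulty noted above:

```python
import numpy as np

rng = np.random.default_rng(1)

b, ell = 50.0, 2.0                      # hypothetical low-count ray: mean counts b*exp(-2) ~ 6.8
y = rng.poisson(b * np.exp(-ell), size=200_000)
y = y[y > 0]                            # zero-count rays must be discarded (or patched)
ell_hat = np.log(b / y)                 # the conventional estimate (1.3), with r_i = 0
print(ell_hat.mean())                   # noticeably larger than the true value 2.0
```

The sample mean of the log estimate exceeds the true line integral by several percent at this count level, and the gap grows as the counts shrink, matching the analytical bias result cited above.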
Since noise is a primary concern, the image reconstruction problem is naturally treated as a statistical estimation problem. Since we only have a finite number of measurements, it is natural to also represent the attenuation map $\mu$ with a finite parameterization. Such parameterizations are reasonable in practice since ultimately the estimate of $\mu$ will be viewed on a digital display with a finite number of pixels. After one has parameterized $\mu$, the reconstruction problem becomes a statistical problem: estimate the parameters from the noisy measurements $\{Y_i\}$.
A general approach to parameterizing the attenuation map is to expand it in terms of a finite basis expansion [47, 48]:

$$\mu(\vec x) \approx \sum_{j=1}^{p} \mu_j\, \chi_j(\vec x), \qquad (1.5)$$

where the $\chi_j$'s are the basis functions and $p$ is the number of parameters. There are many possible choices for the basis functions. We would like to choose basis functions that naturally represent nonnegative functions, since $\mu \ge 0$. We would also like basis functions that have compact support, since such a basis yields a very sparse system matrix in (1.9) below. The conventional basis is just the pixel or voxel basis, which satisfies both of these requirements. The voxel basis $\chi_j$ is 1 inside the $j$th voxel, and is 0 everywhere else. In two-space, one can express the pixel basis by

$$\chi_j(x_1, x_2) = \operatorname{rect}\Bigl( \frac{x_1 - c_{j1}}{\Delta} \Bigr) \operatorname{rect}\Bigl( \frac{x_2 - c_{j2}}{\Delta} \Bigr), \qquad (1.6)$$

where $(c_{j1}, c_{j2})$ is the center of the $j$th pixel and $\Delta$ is the pixel width. This basis gives a piecewise-constant approximation to $\mu$, as illustrated in Fig. 1.2. With any parameterization of the form (1.5), the problem of estimating $\mu(\vec x)$ is reduced to the simpler problem of estimating the parameter vector $\mu = [\mu_1, \dots, \mu_p]'$ from the measurement vector $Y = [Y_1, \dots, Y_N]'$, where $'$ denotes vector and matrix transpose. Under the parameterization (1.5), the line integral in (1.2) becomes the following summation:

$$\int_{L_i} \mu(\vec x)\, d\ell = \sum_{j=1}^{p} a_{ij}\, \mu_j = [A\mu]_i,$$

where

$$a_{ij} = \int_{L_i} \chi_j(\vec x)\, d\ell$$

is the line integral⁵ along the $i$th ray through the $j$th basis function, and $A = \{a_{ij}\}$ is the $N \times p$ system matrix. This simplification yields the following discrete-discrete measurement model:

$$Y_i \sim \text{Poisson}\{\bar y_i(\mu)\}, \quad i = 1, \dots, N, \qquad (1.7)$$

where

$$\bar y_i(\mu) = b_i\, e^{-[A\mu]_i} + r_i \qquad (1.8)$$

and

$$[A\mu]_i = \sum_{j=1}^{p} a_{ij}\, \mu_j. \qquad (1.9)$$

⁵In practice, we use normalized strip integrals [49, 50] rather than line integrals to account for finite detector width [51]. Regardless, the units of the $a_{ij}$'s are length units (mm or cm), whereas the units of the $\mu_j$'s are inverse length.
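As a concrete toy instance of the discrete model (1.7)-(1.9), the sketch below builds a small hypothetical system matrix $A$ by hand and generates one noisy measurement vector. In practice the $a_{ij}$'s would come from ray tracing or strip integrals, not hand-picked values:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy discrete model: p = 4 pixels, N = 3 rays.
# a_ij = length of ray i inside pixel j (hypothetical values, in mm).
A = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.4, 1.4, 0.0],
              [0.0, 0.0, 1.0, 1.0]])
mu = np.array([0.02, 0.01, 0.03, 0.02])   # attenuation coefficients (1/mm)
b = np.array([1e5, 1e5, 1e5])             # blank-scan counts per ray
r = np.array([10.0, 10.0, 10.0])          # mean background events per ray

ybar = b * np.exp(-(A @ mu)) + r          # mean counts, eq. (1.8)
y = rng.poisson(ybar)                     # one noisy realization, eq. (1.7)
```

Note how the compact support of the pixel basis makes each row of $A$ mostly zero; for realistic image sizes $A$ would be stored in a sparse format.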
1.2.3 Likelihood-based estimation

For the Poisson model (1.7), the measurement joint probability mass function is

$$P(Y = y; \mu) = \prod_{i=1}^{N} \frac{ e^{-\bar y_i(\mu)}\, [\bar y_i(\mu)]^{y_i} }{ y_i! }. \qquad (1.10)$$

The ML method seeks the object (as described by the parameter vector $\mu$) that maximizes the probability of having observed the particular measurements that were recorded. The first paper to propose a ML approach for transmission tomography appears to be due to Rockmore and Macovski in 1977 [47]. However, the pseudo-inverse method described in [47] in general does not find the maximizer of the likelihood.

For independent transmission measurements, we can use (1.8) and (1.10) to express the log-likelihood in the following convenient form:

$$L(\mu) \equiv \sum_{i=1}^{N} h_i([A\mu]_i), \qquad (1.11)$$

where we use $\equiv$ hereafter for expressions that are equal up to irrelevant constants independent of $\mu$, and where the marginal log-likelihood of the $i$th measurement is

$$h_i(\ell) = y_i \log\bigl( b_i e^{-\ell} + r_i \bigr) - \bigl( b_i e^{-\ell} + r_i \bigr). \qquad (1.12)$$

A typical $h_i$ is shown in Fig. 1.4 on page 17. For convenience later, we also list the derivatives of $h_i$ here:⁶

$$\dot h_i(\ell) = \Bigl( 1 - \frac{y_i}{b_i e^{-\ell} + r_i} \Bigr) b_i e^{-\ell}, \qquad (1.13)$$

$$\ddot h_i(\ell) = \Bigl( \frac{y_i\, r_i}{(b_i e^{-\ell} + r_i)^2} - 1 \Bigr) b_i e^{-\ell}. \qquad (1.14)$$

⁶The usual rationale for the ML approach is that ML estimators are asymptotically unbiased and asymptotically efficient (minimum variance) under very general conditions [37]. Such asymptotic properties alone would be a questionable justification for the ML approach in the case of low-count transmission scans. However, ML estimators often perform well even in the nonasymptotic regime. We are unaware of any data-fit measure for low-count transmission scans that outperforms the log-likelihood, but there is no known proof of optimality of the log-likelihood in this case, so the question is an open one.
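The marginal log-likelihood (1.12) and its first derivative (1.13) are simple to implement, and the derivative expression can be sanity-checked against a centered finite difference. The values of $y_i$, $b_i$, and $r_i$ below are hypothetical:

```python
import numpy as np

def h(l, y, b, r):
    """Marginal log-likelihood h_i(l) of one transmission measurement, eq. (1.12)."""
    ybar = b * np.exp(-l) + r
    return y * np.log(ybar) - ybar

def h_dot(l, y, b, r):
    """First derivative of h_i, eq. (1.13)."""
    u = b * np.exp(-l)
    return (1.0 - y / (u + r)) * u

# Check (1.13) against a centered finite difference of (1.12).
y, b, r, l, eps = 40.0, 100.0, 5.0, 1.2, 1e-6
numeric = (h(l + eps, y, b, r) - h(l - eps, y, b, r)) / (2 * eps)
print(abs(numeric - h_dot(l, y, b, r)))   # finite-difference error only
```

The same pattern (analytic derivative versus finite difference) is a worthwhile unit test for any objective-function implementation before plugging it into an iterative algorithm.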
The algorithms described in the following sections are based on various strategies for finding the maximizer of the objective function. Several of the algorithms are quite general in the sense that one can easily modify them to apply to many objective functions of the form (1.11), even when $h_i$ has a functional form different from the form (1.12) that is specific to transmission measurements. Thus, even though the focus of this chapter is transmission imaging, many of the algorithms and comments apply equally to emission reconstruction and to other inverse problems.

Maximizing the log-likelihood alone leads to unacceptably noisy images, because tomographic image reconstruction is an ill-conditioned problem. Roughly speaking, this means that there are many choices of attenuation maps that fit the measurements reasonably well. Even when the problem is parameterized, there are many choices of the vector $\mu$ that fit the measurements reasonably well, where the fit is quantified by the log-likelihood $L(\mu)$. Not all of those images are useful or physically plausible. Thus, the likelihood alone does not adequately identify the "best" image. One effective remedy to this problem is to modify the objective function by including a penalty function that favors reconstructed images that are piecewise smooth. This process is called regularization, since the penalty function improves the conditioning of the problem.⁷ In this chapter we focus on methods that form an estimate $\hat\mu$ of the true attenuation map by maximizing a penalized-likelihood objective function of the following form:

$$\hat\mu = \arg\max_{\mu \ge 0} \Phi(\mu), \qquad \Phi(\mu) = L(\mu) - \beta R(\mu), \qquad (1.15)$$

where $R(\mu)$ is a roughness penalty and $\beta$ controls the tradeoff between data fit and smoothness.

⁷For emission tomography, a popular alternative approach to regularization is simply to post-smooth the ML reconstruction image with a Gaussian filter. In the emission case, under the somewhat idealized assumption of a shift-invariant Gaussian blur model for the system, a certain commutability condition ((12) of [52]) holds, which ensures that Gaussian post-filtering is equivalent to Gaussian sieves. It is unclear whether this equivalence holds in the transmission case, although some authors have implied that it does without proof, e.g., [53].
1.2.4 Penalty function

The penalty functions that we consider have the general form

$$R(\mu) = \sum_{k=1}^{K} \psi_k([C\mu]_k), \qquad (1.16)$$

where $C$ is a $K \times p$ penalty matrix and the $\psi_k$'s are potential functions. A typical choice is for each row of $C$ to compute the difference between two neighboring pixel values, so that $R(\mu)$ is small when the image is piecewise smooth. The quadratic potential $\psi_k(t) = t^2/2$ is the simplest choice, while nonquadratic potential functions can better preserve edges.
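Putting (1.11), (1.15), and (1.16) together, a penalized-likelihood objective can be sketched in a few lines. This is a minimal illustration, not the chapter's implementation: it uses the quadratic potential $\psi(t) = t^2/2$ and a hypothetical first-difference penalty matrix $C$ on a 3-pixel "image":

```python
import numpy as np

def log_likelihood(mu, A, y, b, r):
    """L(mu) = sum_i h_i([A mu]_i), eqs. (1.11)-(1.12), up to constants."""
    ybar = b * np.exp(-(A @ mu)) + r
    return np.sum(y * np.log(ybar) - ybar)

def penalty(mu, C):
    """Roughness penalty of the form (1.16) with quadratic psi(t) = t^2 / 2."""
    return 0.5 * np.sum((C @ mu) ** 2)

def objective(mu, A, y, b, r, C, beta):
    """Penalized-likelihood objective Phi(mu) = L(mu) - beta * R(mu), eq. (1.15)."""
    return log_likelihood(mu, A, y, b, r) - beta * penalty(mu, C)

# Tiny hypothetical example: 3 pixels, first-difference penalty matrix C.
C = np.array([[1.0, -1.0, 0.0],
              [0.0, 1.0, -1.0]])
A = np.eye(3)
y = np.array([90.0, 80.0, 85.0])
b = np.full(3, 100.0)
r = np.zeros(3)
mu = np.full(3, 0.1)
print(objective(mu, A, y, b, r, C, beta=10.0))
```

For a perfectly flat image the penalty vanishes and the objective equals the log-likelihood; increasing $\beta$ penalizes rough candidate images more heavily.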
1.2.5 Concavity

From the second derivative expression (1.14), when $r_i = 0$ we have $\ddot h_i(\ell) = -b_i e^{-\ell}$, which is always nonpositive, so $h_i$ is concave over all of $\mathbb{R}$ (and strictly concave if $b_i > 0$). From (1.11) one can easily verify that the Hessian matrix (the matrix of second partial derivatives) of the log-likelihood is:

$$\nabla^2 L(\mu) = A' \operatorname{diag}\bigl\{ \ddot h_i([A\mu]_i) \bigr\}\, A, \qquad (1.20)$$

where $\operatorname{diag}\{\cdot\}$ denotes the $N \times N$ diagonal matrix with $i$th diagonal element $\ddot h_i([A\mu]_i)$. Thus the log-likelihood is concave over all of $\mathbb{R}^p$ when the $r_i$'s are zero. If the $\psi_k$'s are all strictly convex and $L$ is concave, then the objective $\Phi$ is strictly concave under mild conditions on $A$ [69]. Such concavity is central to the convergence proofs of the algorithms described below. In the case $r_i \ne 0$, the likelihood is not necessarily concave. Nevertheless, in our experience it seems to be unimodal. (Initializing monotonic iterative algorithms with different starting images seems to lead to the same final image.) In the nonconcave case, we cannot guarantee global convergence to the global maximum for any of the algorithms described below. For the monotonic algorithms we usually can prove convergence to a local maximum [70]; if in addition the objective function is unimodal, then the only local maximum will be the global maximum, but proving that $\Phi$ is unimodal is an open problem.
1.3 Optimization algorithms

Ideally, one would like to find the maximizer of $\Phi$ analytically by zeroing its partial derivatives:

$$\frac{\partial}{\partial \mu_j} \Phi(\mu) = \sum_{i=1}^{N} a_{ij}\, \dot h_i([A\mu]_i) - \beta \frac{\partial}{\partial \mu_j} R(\mu) = 0, \quad j = 1, \dots, p, \qquad (1.21)$$

where $\dot h_i$ was defined in (1.13). Unfortunately, even disregarding both the nonnegativity constraint and the penalty function, there are no closed-form solutions to the set of equations (1.21) except in trivial cases, and nonseparable penalty functions rule out closed-form solutions as well. Thus iterative methods are required to find the maximizer of such objective functions.
1.3.2 Optimization transfer
Before delving into the details of the many algorithms that have been proposed for maximizing $\Phi$, we first describe a very useful and intuitive general principle that underlies almost all the methods. The principle is called optimization transfer. This idea was described briefly as a "majorization principle" in the limited context of 1D line searches in the classic text by Ortega and Rheinboldt [71, p. 253]. It was rediscovered and generalized to inverse problems in the recent work of De Pierro [72, 73] and Lange [69, 74]. Since the concept applies more generally than just to transmission tomography, we use $\theta$ as the generic unknown parameter here.

The basic idea is illustrated in Fig. 1.3. Since $\Phi$ is difficult to maximize, at the $n$th iteration we can replace $\Phi$ with a surrogate function $\phi(\theta; \theta^{(n)})$ that is easier to maximize, i.e., the next iterate is defined as:

$$\theta^{(n+1)} = \arg\max_{\theta}\ \phi(\theta; \theta^{(n)}). \qquad (1.22)$$

The maximization is restricted to the valid parameter space (e.g., $\theta \ge 0$ for problems with nonnegativity constraints). Maximizing $\phi(\cdot; \theta^{(n)})$ will usually not lead directly to the global maximizer of $\Phi$. Thus one repeats the process iteratively, finding a new surrogate function $\phi$ at each iteration and then maximizing that surrogate function. If we choose the surrogate functions appropriately, then the sequence $\{\theta^{(n)}\}$ should eventually converge to the maximizer [75].

Fig. 1.3 does not do full justice to the problem, since 1D functions are usually fairly easy to maximize. The optimization transfer principle is particularly compelling for problems where the dimension of $\theta$ is large, such as in inverse problems like tomography.

It is very desirable to use algorithms that monotonically increase $\Phi$ each iteration, i.e., for which $\Phi(\theta^{(n+1)}) \ge \Phi(\theta^{(n)})$. Such algorithms are guaranteed to be stable, i.e., the sequence $\{\theta^{(n)}\}$ will not diverge if $\Phi$ is concave. And generally such algorithms will converge to the maximizer if it is unique [70]. If we choose surrogate functions that satisfy

$$\Phi(\theta) - \Phi(\theta^{(n)}) \ \ge\ \phi(\theta; \theta^{(n)}) - \phi(\theta^{(n)}; \theta^{(n)}), \qquad (1.23)$$

then one can see immediately that the algorithm (1.22) monotonically increases $\Phi$. To ensure monotonicity, it is not essential to find the exact maximizer in (1.22). It suffices to find a value $\theta^{(n+1)}$ such that $\phi(\theta^{(n+1)}; \theta^{(n)}) \ge \phi(\theta^{(n)}; \theta^{(n)})$, since that alone will ensure $\Phi(\theta^{(n+1)}) \ge \Phi(\theta^{(n)})$ by (1.23). The various algorithms described in the sections that follow are all based on different choices of the surrogate function $\phi$, and on different procedures for the maximization in (1.22).

Rather than working with (1.23), all the surrogate functions we present satisfy the following conditions:
$$\phi(\theta^{(n)}; \theta^{(n)}) = \Phi(\theta^{(n)}), \qquad (1.24)$$

$$\frac{\partial}{\partial \theta} \phi(\theta; \theta^{(n)}) \Big|_{\theta = \theta^{(n)}} = \frac{\partial}{\partial \theta} \Phi(\theta) \Big|_{\theta = \theta^{(n)}}, \qquad (1.25)$$

$$\phi(\theta; \theta^{(n)}) \le \Phi(\theta). \qquad (1.26)$$

Any surrogate function that satisfies these conditions will satisfy (1.23). (The middle condition follows from the outer two conditions when $\Phi$ and $\phi$ are differentiable.)

[Fig. 1.3: A 1D illustration of optimization transfer: the surrogate lies below the objective and is tangent to it at the current iterate $\theta^{(n)}$.]
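The optimization transfer recipe can be illustrated with a toy 1D problem. The objective below is hypothetical, not from the text: $\Phi(t) = \log(1+t) - t/2$ is concave on $t \ge 0$ with $\Phi''(t) = -1/(1+t)^2 \ge -1$, so a fixed-curvature parabola with curvature $c = 1$ satisfies conditions (1.24)-(1.26), and each surrogate maximization reduces to a simple closed-form step:

```python
import numpy as np

# Hypothetical 1D objective: Phi(t) = log(1+t) - t/2, maximizer t* = 1 on t >= 0.
def phi_obj(t):
    return np.log(1.0 + t) - 0.5 * t

def phi_grad(t):
    return 1.0 / (1.0 + t) - 0.5

c = 1.0        # curvature bound: Phi'' >= -1 on t >= 0, so c = 1 gives a minorizer
t = 0.0        # initial guess
for _ in range(100):
    # Maximizing the surrogate Phi(t_n) + Phi'(t_n)(t - t_n) - (c/2)(t - t_n)^2
    # gives the step t_{n+1} = t_n + Phi'(t_n)/c, projected onto t >= 0 (eq. (1.22)).
    t = max(0.0, t + phi_grad(t) / c)

print(t)   # converges to the maximizer t* = 1
```

Each step maximizes a parabola that touches the objective at the current iterate and lies below it everywhere, so the objective value never decreases, mirroring the monotonicity argument based on (1.23).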
1.3.3 Convergence rate

The convergence rate of an iterative algorithm based on the optimization transfer principle can be analyzed qualitatively by considering Fig. 1.3. If the surrogate function $\phi$ has low curvature, then it appears as a broad graph in Fig. 1.3, which means that the algorithm can take large steps ($\theta^{(n+1)} - \theta^{(n)}$ can be large), which means that it reaches the maximizer faster. Conversely, if the surrogate function has high curvature, then it appears as a skinny graph, the steps are small, and many steps are required for convergence. So in general we would like to find low-curvature surrogate functions, with the caveat that we want to maintain $\phi \le \Phi$ to ensure monotonicity [76]. And of course we would also like the surrogate $\phi$ to be
[Fig. 1.4: A marginal log-likelihood $h_i$ together with parabola surrogates, including one with the optimal curvature.]
easy to maximize for (1.22). Unfortunately, the criteria "low curvature" and "easy to maximize" are often incompatible, so we must compromise.
1.3.4 Parabola surrogate

A natural choice for the surrogate of each marginal log-likelihood $h_i$ is a parabola:

$$q_i(\ell; \ell_i^{(n)}) = h_i(\ell_i^{(n)}) + \dot h_i(\ell_i^{(n)})\, (\ell - \ell_i^{(n)}) - \frac{c_i^{(n)}}{2} (\ell - \ell_i^{(n)})^2, \qquad (1.27)$$

where

$$\ell_i^{(n)} = [A\mu^{(n)}]_i \qquad (1.28)$$

is the $i$th line integral through the estimated attenuation map at the $n$th iteration. The choice (1.27) clearly satisfies conditions (1.24) and (1.25), but we must carefully choose the curvature $c_i^{(n)}$ to ensure that $q_i(\ell; \ell_i^{(n)}) \le h_i(\ell)$ so that (1.26) is satisfied. On the other hand, from the convergence rate description in the preceding section, we would like the curvature $c_i^{(n)}$ to be as small as possible. In other words, we would like the smallest curvature for which the parabola still lies below $h_i$; for $\ell_i^{(n)} > 0$ this optimal curvature is

$$c_i^{(n)} = \Biggl[ \frac{2 \bigl( h_i(0) - h_i(\ell_i^{(n)}) + \dot h_i(\ell_i^{(n)})\, \ell_i^{(n)} \bigr)}{(\ell_i^{(n)})^2} \Biggr]_+, \qquad (1.29)$$

where $[x]_+$ is $x$ for positive $x$ and zero otherwise. Fig. 1.4 illustrates the surrogate parabola $q_i$ in (1.27) with the optimal curvature (1.29).

One small inconvenience with (1.29) is that it changes every iteration since it depends on $\ell_i^{(n)}$. An alternative choice of the curvature that ensures $q_i \le h_i$ is the maximum second derivative of $-h_i$,

$$c_i = \max_{\ell \ge 0}\ -\ddot h_i(\ell), \qquad (1.30)$$

whose closed form is derived in [77].
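The minorization condition (1.26) for the parabola surrogate (1.27) can be checked numerically. The sketch below uses hypothetical values of $y_i$, $b_i$, and $r_i$, and estimates a safe (maximum) curvature from a numerical second derivative on a grid rather than from the closed-form curvatures in the text:

```python
import numpy as np

def h(l, y, b, r):
    """Marginal transmission log-likelihood, eq. (1.12)."""
    ybar = b * np.exp(-l) + r
    return y * np.log(ybar) - ybar

def h_dot(l, y, b, r):
    """First derivative, eq. (1.13)."""
    u = b * np.exp(-l)
    return (1.0 - y / (u + r)) * u

y, b, r = 40.0, 100.0, 5.0                 # hypothetical measurement for one ray
ln = 1.0                                   # current line-integral estimate l_i^(n)
grid = np.linspace(0.0, 6.0, 601)

# Conservative curvature: the maximum of -h'' over the region of interest,
# estimated here numerically (the text gives closed-form choices instead).
h_vals = h(grid, y, b, r)
curv = np.max(-np.gradient(np.gradient(h_vals, grid), grid))

q = h(ln, y, b, r) + h_dot(ln, y, b, r) * (grid - ln) - 0.5 * curv * (grid - ln) ** 2
print(np.all(q <= h_vals + 1e-8))          # the parabola lies below h on the grid
```

The maximum-curvature choice is safe but pessimistic: the optimal curvature (1.29) is typically much smaller here, which is exactly why it yields larger steps and faster convergence.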
1.4 EM algorithms

The emission reconstruction algorithm derived by Shepp and Vardi in [79] and by Lange and Carson in [17] is often referred to as "the" EM algorithm in the nuclear imaging community. In fact, the expectation-maximization (EM) framework is a general method for developing many different algorithms [80]. The appeal of the EM framework is that it leads to iterative algorithms that in principle yield sequences of iterates that monotonically increase the objective function. Furthermore, in many statistical problems one can derive EM algorithms that are quite simple to implement. Unfortunately, the Poisson transmission reconstruction problem does not seem to be such a problem. Only one basic type of EM algorithm has been proposed for the transmission problem, and that algorithm converges very slowly and has other difficulties described below. We include the description of the EM algorithm for completeness, but the reader who is not interested in the historical perspective could safely skip this section, since we present much more efficient algorithms in subsequent sections.
We describe the general EM framework in the context of problems where one observes a realization of a measurement vector and wishes to estimate a parameter vector by maximizing the likelihood or penalized log-likelihood. To develop an EM algorithm, one must first postulate a hypothetical collection of random variables called the complete data space. These are random variables that, in general, were not observed during the experiment, but that might have simplified the estimation procedure had they been observed. The only requirement that the complete data space must satisfy is that one must be able to extract the observed data from it, i.e., there must exist a deterministic function mapping the complete data to the observed data:
(1.31)
This is a trivial requirement, since one can always include the observed measurement vector itself in the collection of random variables.
Having judiciously chosen the complete data space, an essential ingredient of any EM algorithm is the following conditional expectation of the log-likelihood of the complete data, given the observed data and the current parameter estimate:
(1.32)
For penalized-likelihood problems, the EM surrogate combines this conditional expectation with the penalty function:
(1.33)
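To make the framework concrete, here is a minimal sketch of an EM algorithm in this style for a toy problem (an illustrative example, not from this chapter): each measurement is the sum of a Poisson signal with unknown mean and a Poisson background with known mean, and the complete data are the unobserved signal/background splits. All names and values are assumptions for illustration.

```python
import numpy as np

# Toy EM example: observe y_k ~ Poisson(lam + r) with known background mean r;
# estimate lam.  Complete data: the unobserved split y_k = x_k + b_k with
# x_k ~ Poisson(lam), b_k ~ Poisson(r).  The observed y_k is a deterministic
# function (the sum) of the complete data, as condition (1.31) requires.
rng = np.random.default_rng(0)
r = 3.0                      # known background mean
lam_true = 5.0
y = rng.poisson(lam_true + r, size=10_000)

lam = 1.0                    # initial guess
for _ in range(100):
    # E-step: E[x_k | y_k] = y_k * lam / (lam + r)  (Poisson splitting)
    x_hat = y * lam / (lam + r)
    # M-step: the ML estimate of a Poisson mean is the sample average
    lam = x_hat.mean()

print(lam)  # close to lam_true = 5.0
```

The fixed point of this iteration is the sample mean of the data minus the known background mean, and each iteration increases the likelihood, mirroring the monotonicity property described above.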
1.4.1 Transmission EM algorithm
We assume that the numbers of transmitted photons and the background events are Poisson distributed, that the background counts are mutually independent and statistically independent of all the photon counts, and that different rays are statistically independent. However, the complete-data components within a given ray are not independent. (The distributions of the background terms do not depend on the attenuation map, so they are of less importance in what follows.)
For each ray, let the ordering be any permutation of the set of pixel indices. Notationally, the simplest case is the identity permutation, which corresponds to the algorithm considered in [88]. Lange and Carson [17] assign the ordering corresponding to the physical ray connecting the source to the detector. Statistically, any ordering suffices, and it is an open (and probably academic) question whether certain orderings lead to faster convergence. Given such an ordering, we define the remaining complete-data variables recursively by the following conditional distributions:
Binomial:
(1.34)
i.e., given the number of photons entering a pixel along the ray, the number surviving passage through that pixel is Binomial, with a success probability given by Beer's law. The corresponding marginal expression is
(1.35)
Therefore, the observed measurements are related to the complete data space as required, so condition (1.31) is satisfied. As noted in [89], there are multiple orderings of the pixels that can be considered, each of which would lead to a different update, but which would leave the limit unchanged if there is a unique maximizer (and provided this EM algorithm is globally convergent, which has never been established for the case of nonzero background counts).
Figure 1.5 illustrates a loose physical interpretation of the above complete data space. For the ith ray, imagine a sequence of layers of material with the corresponding attenuation coefficients and thicknesses. Suppose a Poisson number of photons is transmitted into the first layer. The number that survive passage through that layer then proceed through the second layer, and so on. The final number of photons exiting the sequence of layers is added to the random coincidences to form the observed counts. This interpretation is most intuitive when the pixels are ordered according to the actual passage of photons from source to detector (as in [17]), but a physical interpretation is not essential for EM algorithm development.
It follows from (1.35) and the Appendix (Section 1.14) that the total number of surviving photons is also Binomial, since a cascade of independent Binomials is Binomial with the product of the success probabilities.
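This cascade-of-Binomials property is easy to check by simulation (a standalone sketch with made-up layer attenuations, not code from the chapter):

```python
import numpy as np

# Check: thinning a photon count through a cascade of independent Binomial
# layers matches a single Binomial thinning with the product of the survival
# probabilities (per-layer survival p_k = exp(-mu_k * d_k), by Beer's law).
rng = np.random.default_rng(1)
mu_d = np.array([0.2, 0.5, 0.1, 0.3])   # attenuation * thickness per layer
p = np.exp(-mu_d)                        # per-layer survival probabilities

n0 = 1_000_000                           # photons entering the first layer
n = n0
for pk in p:                             # cascade: layer-by-layer thinning
    n = rng.binomial(n, pk)

# Direct single-step thinning with the product probability
n_direct = rng.binomial(n0, p.prod())

print(n / n0, n_direct / n0, p.prod())   # all three agree closely
```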
Having specified the complete-data space, the next step in developing an EM algorithm is to find the surrogate function of (1.32). It follows from the above specifications, applying the chain rule for conditional probabilities [90] and using (1.34), that the joint probability mass function (PMF) of the complete data factors into the per-ray conditional terms. Thus, following [17, 88], the EM-based surrogate function for the above complete data space has the following form:
(1.36)
where
(1.37)
(1.38)
(1.39)
To complete the E-step of the EM algorithm, we must find the preceding conditional expectations. Using the law of iterated expectation [90]:
(1.40)
(1.41)
(1.42)
Combining (1.36), (1.37), (1.40), and (1.41) yields an explicit expression for the EM surrogate, completing the E-step.
For the M-step of the EM algorithm, we must find the maximizer of the surrogate. The surrogate is a separable function of the pixel values, as shown in (1.36), so it is easier to maximize than the log-likelihood itself. Thus the M-step reduces to separable 1D maximization problems:
(1.43)
(1.44)
Zeroing the derivative of (1.44) does not yield a closed-form solution, so Lange and Carson [17] proposed the approximate update
(1.45)
This algorithm is very slow to converge [69], and each iteration is very computationally expensive due to the large number of exponentiations required in (1.41): one exponentiation per nonzero system-matrix element.
Lange and Carson [17] also describe an update based on a second-order Taylor series, and they note that one can use their expansion to find upper and lower bounds for the exact value that maximizes the 1D surrogate.
1.4.2 EM algorithms with approximate M-steps
Since the M-step of the transmission EM algorithm of [17] does not yield a closed form for the maximizer, Browne and Holmes [91] proposed a modified EM algorithm that uses an approximate M-step based on image rotations with bilinear interpolation. Kent and Wright made a similar approximation [89]. An advantage of these methods is that (after interpolation) the system coefficients along each ray are all equal, which is the case in which one can solve (1.44) analytically. Specifically, if the coefficients are all equal, then (1.44) simplifies to
where the interpolated (rotated) quantities replace the originals. When solved for the attenuation coefficient, this yields the iteration
(1.46)
which is the logarithm of the ratio of (conditional expectations of) the number of photons entering the jth pixel to the number of photons leaving the jth pixel, divided by the pixel size. However, the interpolations required to form these conditional expectations presumably destroy the monotonicity properties of the EM algorithm. Although bookkeeping is reduced, these methods require the same (very large) number of exponentiations as the original transmission EM algorithm, so they are also impractical algorithms.
1.4.3 EM algorithm with Newton M-step
Ollinger [92, 93] reported that the M-step approximation (1.45) proposed in [17] led to convergence problems, and proposed a 1D Newton's method for maximizing the surrogate in the context of a GEM algorithm for the M-step. Since Newton's method is not guaranteed to converge, the step length was adjusted by a halving strategy:
(1.47)
where the required derivatives of the surrogate are
(1.48)
This leads to the EM-NR update:
(1.49)
From (1.48), the curvature of the surrogate becomes unbounded as the pixel value approaches zero, which appears to preclude the use of parabola surrogates as described in Section 1.4.5.2 to form an intrinsically monotonic M-step (1.43).
Variations on the transmission EM algorithm continue to resurface at conferences, despite its many drawbacks. The endurance of the transmission EM algorithm can only be explained by its having ridden on the coattails of the popular emission EM algorithm. The modern methods described in subsequent sections are entirely preferable to the transmission EM algorithm.
1.4.4 Diagonally-scaled gradient-ascent algorithms
Several authors, e.g., [94], noted that the emission EM algorithm can be expressed as a diagonally-scaled gradient-ascent algorithm, with a particular diagonal scaling matrix that (almost miraculously) ensures monotonicity and preserves nonnegativity. (The EM-NR algorithm (1.49) has a similar form.) Based on an analogy with that emission EM algorithm, Lange et al. proposed a diagonally-scaled gradient-ascent algorithm for transmission tomography [95]. The algorithm can be expressed as the following recursion:
(1.50)
(1.51)
Since the gradient of the objective function is evaluated once per iteration, the number of exponentiations required is roughly one per measurement ray, far fewer than required by the transmission EM algorithm (1.45).
The choice of the diagonal scaling matrix critically affects convergence rate, monotonicity, and nonnegativity. Considering the case of zero background counts, Lange et al. [95] suggested the following diagonal scaling matrix, chosen so that (1.50) could be expressed as a multiplicative update:
(1.52)
The natural generalization of this choice to the case of nonzero background counts is the diagonal matrix with the following expression for the jth diagonal element:
(1.53)
Using (1.51) and (1.53), one can rewrite the diagonally-scaled gradient-ascent (DSGA) algorithm (1.50) as follows:
(1.54)
This is a multiplicative update that preserves nonnegativity, and at least its positive fixed points are stationary points of the log-likelihood. However, the particular choice of diagonal scaling matrix (1.53) does not guarantee intrinsically monotone increases of the objective. In [96], Lange proposed an alternative scaling:
(1.55)
Although the rationale for this choice was not given in [96], Lange was able to show that the algorithm has local convergence properties, but that it may not yield nonnegative estimates. Lange further modified the scaled-gradient algorithm in [97] to include nonseparable penalty functions and a practical approximate line search that ensures global convergence.
Considering the case of zero background counts, Maniawski et al. [98] proposed the following over-relaxed unregularized version of the diagonally-scaled gradient-ascent algorithm (1.50):
(1.56)
where the relaxation parameter was selected empirically in proportion to the total number of measured counts in a SPECT transmission scan. Like (1.54), this is a multiplicative update that preserves nonnegativity. One can also express the above algorithm more generally as a diagonally-scaled ascent with an over-relaxation factor.
1.4.5 Convex algorithm
De Pierro [72] described a nonstatistical derivation of the emission EM algorithm using the concavity properties of the log-likelihood for emission tomography. Lange and Fessler [69] applied a similar derivation to the transmission log-likelihood for the case of zero background counts, yielding a convex⁹ algorithm that, like the transmission EM algorithm, is guaranteed to increase the objective monotonically each iteration. As discussed in Section 1.2.5, the transmission log-likelihood is concave
⁹The algorithm name is unfortunate, since the algorithm itself is not convex; rather, the algorithm is derived by exploiting the concavity of the log-likelihood.
when the background counts are zero, so De Pierro's convexity method could be applied directly in [69]. When the background counts are nonzero, the log-likelihood is not concave, so De Pierro's convexity argument does not directly apply. Fessler [14] noted that even then, the marginal log-likelihood functions in (1.11) are concave over a (typically) large interval of the real line, and thereby developed an approximate convex algorithm. However, the convex algorithm of [14] is not guaranteed to be globally monotonic.
Rather than presenting either the convex algorithm of [69], which is incomplete since it did not consider the case of nonzero background counts, or the algorithm of [14], which is nonmonotone, we derive a new convex algorithm here. The algorithm of [69] falls out as a special case of this new algorithm by setting the background counts to zero. The idea is first to use the EM technique to find a concave surrogate function that eliminates the background terms but is still difficult to maximize directly; we then apply De Pierro's convexity argument to that surrogate to find another surrogate function that is easily maximized. The same idea was developed independently by Kim [99].
Consider a complete data space that is the collection of the following statistically independent random variables:
(1.57)
where
(1.58)
The form of (1.57) is identical to the form that (1.11) would have if all the background terms were zero. Therefore, by this technique we can generalize any algorithm that has been derived for the zero-background case to the realistic nonzero-background case, simply by replacing the measurements in the algorithm with the corresponding conditional expectations. However, in general the convergence of an algorithm derived this way may be slower than methods based on direct maximization, since the curvatures of the surrogate components are smaller than those of the original marginal log-likelihoods:
(1.59)
Applying De Pierro's convexity trick, write the argument of each concave surrogate component as a convex combination over pixels, with nonnegative weights that sum to unity. Since each component is concave,¹⁰ by (1.59):
(1.60)
Thus a separable surrogate function is
(1.61)
where
(1.62)
Since this surrogate is separable, its maximization reduces to simultaneous 1D maximization problems:
(1.63)
¹⁰The EM surrogate component is concave, unlike the original marginal log-likelihood.
1.4.5.1 Convex-NR algorithms
Performing one Newton-Raphson step toward the maximizer of each 1D surrogate component yields the Convex-NR update:
(1.64)
In [69] and [14], the following choice for the convex-combination weights was used, following [72]:
(1.65)
yielding a Convex-NR algorithm of the form
(1.66)
An alternative weighting is
(1.67)
A small advantage of the choice (1.67) over (1.65) is that the denominator factors in (1.67) are independent of the current iterate, so they can be precomputed, unlike the denominator of (1.65).
Substituting the above into (1.50) yields the following Convex-NR-2 algorithm:
(1.68)
Each iteration requires one forward projection (to compute the ray sums) and two backprojections (one each for the numerator and denominator). In general, a line search would be necessary with this algorithm to ensure monotonicity and convergence.
1.4.5.2 Convex-PS algorithm
The surrogate in (1.62) cannot be maximized analytically, but we can apply the optimization transfer principle of Section 1.3.2 to derive the first intrinsically monotonic algorithm presented in this chapter. Using the surrogate parabola (1.27):
(1.69)
where from (1.29) the optimal curvature is
(1.70)
Substituting the parabola surrogates into (1.62) yields an overall separable quadratic surrogate:
(1.71)
Using the choice (1.67) for the convex-combination weights yields the following Convex-PS algorithm:
(1.72)
The maximum-with-zero operation enforces the nonnegativity constraint. Since this is the first intrinsically monotonic algorithm presented in this chapter, we provide the following more detailed description of its implementation.
for each iteration:
    compute the forward projections of the current image;
    compute the optimal curvatures using (1.70);
    update each pixel via
(1.73)
(1.74)
1.4.6 Ordered-subsets EM algorithm
The ordered-subsets principle is to update the image using a sequence of subsets of the measured data, subsampled by projection angle, rather than using all the measurements simultaneously. Manglos et al. [44] applied this concept to the transmission EM algorithm (1.45), yielding the iteration:
(1.75)
All the algorithms described above were given for the ML case (no penalty). What happens if we want to include a nonseparable penalty function for regularization, for example in the Convex-PS algorithm? Considering (1.33), it appears that we should replace (1.71) with
(1.76)
1.5 Coordinate-ascent algorithms
A simple and natural approach to finding the maximizer of the objective is to sequentially maximize it over each element of the parameter vector, using the most recent values for all other elements:
(1.77)
The operation in (1.77) is performed "in place," i.e., the new value of each pixel replaces the old value, so that the most recent values of all elements are always used. An early use of such a method for tomography was in [113].
Sauer and Bouman analyzed such algorithms using clever frequency-domain arguments [13], and showed that sequential algorithms yield iterates whose high-frequency components converge fastest. This is often ideal for tomography, since we can use a low-resolution FBP image as the initial guess and then iterate to improve resolution and reduce noise, which consists mostly of high-frequency errors. (Using a uniform or zero initial image for coordinate ascent is a very poor choice, since low frequencies can converge very slowly.)
The long string of arguments in (1.77) is notationally cumbersome. For the remainder of this section we use
(1.78)
as shorthand for the vector of the most recent parameter values. For simplicity, this notation leaves implicit the dependence of this vector on the iteration and on the pixel index.
The general method described by (1.77) is not exactly an algorithm, since the procedure for performing the 1D maximization is as yet unspecified. In practice it is impractical to find the exact maximizer, even for the 1D problem (1.77), so we settle for methods that increase the objective function.
1.5.1 Coordinate-ascent Newton-Raphson
In the coordinate-ascent Newton-Raphson (CANR) algorithm, we replace (1.77) with the following update:
(1.79)
The maximum-with-zero operation enforces the nonnegativity constraint. The first partial derivative of the objective is given by (1.21), and the second is given by:
(1.80)
where the ray-dependent terms are given by (1.14).
Specifically, using (1.13), (1.14), (1.21), and (1.80), the update (1.79) of the CANR algorithm becomes
(1.81)
Literally interpreted, this form of the CANR algorithm appears to be extremely inefficient computationally, because it appears to require that the forward projections be recomputed after each pixel is updated sequentially, which would lead to an impractical number of flops per iteration.
In the following efficient implementation of CANR, we instead maintain the current forward projections as a state vector, and update that vector after each pixel is updated:
(1.82)
followed by the state-vector update
(1.83)
The denominator of (1.82) can be computed before the backprojection, which saves many flops and nonsequential memory accesses. In contrast, the numerator of (1.82) contains terms that change after each pixel is updated, so that expression cannot be precomputed. During the backprojection step, one must access four arrays (nonsequentially) in addition to the system matrix elements, and one must compute an exponentiation and a handful of additions and multiplications for each nonzero system-matrix element. For these reasons, coordinate ascent is quite expensive computationally per iteration. On the other hand, experience shows that if one considers the number of iterations required for convergence, then CANR is among the best of all algorithms. The PSCA algorithm described in Section 1.6.4 below is an attempt to capture the convergence-rate properties of CANR while guaranteeing monotonicity and greatly reducing the flop counts per iteration.
An alternative approach to ensuring monotonicity would be to evaluate the objective function after updating each pixel, and impose an interval search in the (hopefully relatively rare) cases where the objective function decreases. Unfortunately, evaluating the objective after every pixel adds considerable computational overhead.
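The state-vector bookkeeping can be sketched for a simplified stand-in objective (unweighted least squares rather than the Poisson likelihood; names and sizes are illustrative assumptions):

```python
import numpy as np

# Sequential coordinate updates with a maintained "state vector": minimize
# ||y - A x||^2 subject to x >= 0, keeping the forward projection Ax current
# after each pixel update instead of recomputing A @ x from scratch.
rng = np.random.default_rng(2)
m, n = 60, 20
A = rng.standard_normal((m, n))
x_true = rng.random(n)               # nonnegative ground truth
y = A @ x_true                       # consistent (noiseless) data

x = np.zeros(n)
Ax = A @ x                           # state vector: current forward projection
col_norm2 = (A ** 2).sum(axis=0)

for _ in range(200):                 # full sweeps over the pixels
    for j in range(n):
        aj = A[:, j]
        # exact 1D minimizer over x_j, clipped for nonnegativity
        step = aj @ (y - Ax) / col_norm2[j]
        new_xj = max(x[j] + step, 0.0)
        Ax += aj * (new_xj - x[j])   # cheap update: one column, not A @ x
        x[j] = new_xj

print(np.abs(x - x_true).max())      # small residual error
```

The key line is `Ax += aj * (new_xj - x[j])`: maintaining the forward projection incrementally costs one system-matrix column per pixel update, rather than a full reprojection.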
1.5.2 Monotonic coordinate-ascent algorithms
Besides CPU time, another potential problem with (1.82) is that it is not guaranteed to monotonically increase the objective function, so divergence is possible. One can ensure monotonicity by applying the optimization transfer principle to the 1D maximization problem (1.77). One possible approach is to use a parabolic surrogate for the 1D function. For the fastest convergence rate, the optimal parabolic surrogate has the lowest possible curvature, as discussed in Section 1.6.2 below. The surrogate parabola (1.27) with optimal curvature (1.29) can be applied to (1.77) to yield an algorithm of the form (1.82) but with a different expression in the denominator. Ignoring the penalty function, the ML coordinate-ascent parabola surrogate (CAPS) algorithm is
(1.84)
where the optimal curvature was defined in (1.29). Unfortunately, this algorithm suffers from the same high CPU demands as (1.82), so it is impractical. To incorporate a penalty function, one could follow a procedure similar to that in Section 1.6.2 below.
Another approach to applying optimization transfer to (1.77) was proposed by Saquib et al. [115] and Zheng et al. [116], called functional substitution. That method also yields a monotonic algorithm for the zero-background case, since the derivation exploits the concavity of the marginal log-likelihoods. The required flops are comparable to those of CANR. We can generalize the functional substitution algorithm of [116] to the nonzero-background case by exploiting the EM surrogate described in Section 1.4.5, thereby deriving a new monotonic algorithm. Essentially, one simply replaces the measurements with their conditional expectations in the curvature terms of [116], yielding an algorithm that is identical to (1.82) but with a different denominator.
1.6 Paraboloidal surrogates algorithms
Coordinate-ascent algorithms are sequential-update algorithms: the pixels are updated in sequence. This leads to fast convergence, but requires column access of the system matrix and makes parallelization quite difficult. In contrast, simultaneous-update algorithms can update all pixels independently in parallel, such as the EM algorithms (1.45), (1.46), and (1.49), the scaled gradient-ascent algorithms (1.50), (1.54), and (1.56), and the Convex algorithms (1.66), (1.68), and (1.72). However, a serious problem with all the simultaneous algorithms described above, except Convex-PS (1.72), is that they are not intrinsically monotonic. (They can all be forced to be monotonic by adding line searches, but this is somewhat inconvenient.) In this section we describe an approach based on the optimization transfer principle of Section 1.3.2 that leads to a simultaneous-update algorithm that is intrinsically monotonic, as well as a sequential algorithm that is intrinsically monotonic like CAPS (1.84) but much more computationally efficient.
As mentioned in Section 1.3.4, a principal difficulty with maximizing (1.15) is the fact that the marginal log-likelihood terms in (1.12) are nonquadratic. Maximization is much easier for quadratic functions, so it is natural to use the surrogate parabola described in (1.27) to construct a paraboloidal surrogate function for the log-likelihood in (1.11).
Using (1.29), define the optimal-curvature parabola surrogate for each ray:
(1.85)
Summing over rays (and including the penalty) yields the paraboloidal surrogate
(1.86)
whose gradient and Hessian follow from the per-ray parabolas:
(1.87)
Maximizing this quadratic surrogate directly would yield a Newton-like update:
(1.88)
There are three problems with this algorithm. The matrix inverse is impractical, the method appears to apply only to quadratic penalty functions, and nonnegativity is not enforced. Fortunately, all three of these limitations can be overcome, as we describe next.
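The optimization-transfer principle behind these parabola surrogates can be illustrated in one dimension with a made-up concave function (an assumption for illustration, not the chapter's marginal log-likelihood):

```python
import math

# Maximize a concave f by repeatedly maximizing a parabola surrogate that
# matches f's value and slope at the current point and lies below f because
# its curvature c bounds |f''|.  Each surrogate maximization cannot decrease f.
def f(x):  return math.log(1.0 + x) - 0.3 * x
def df(x): return 1.0 / (1.0 + x) - 0.3

c = 1.0                    # |f''(x)| = 1/(1+x)^2 <= 1 for x >= 0
x = 0.0
values = [f(x)]
for _ in range(100):
    x = max(x + df(x) / c, 0.0)   # maximizer of the parabola surrogate
    values.append(f(x))

print(x)  # approaches the true maximizer 1/0.3 - 1
```

The run exhibits exactly the monotonicity guarantee discussed in the text: the sequence of objective values never decreases, because each step maximizes a minorizing parabola.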
1.6.2 Separable paraboloidal surrogates
To obtain a parallelizable update, we apply De Pierro's convex-combination trick to the paraboloidal surrogate, writing each ray's argument as a convex combination over pixels:
(1.89)
(1.90)
An analogous convexity decomposition applies to the quadratic penalty term:
(1.91)
(1.92)
Combining the likelihood and penalty surrogates yields an overall separable paraboloidal surrogate function:
(1.93)
Since the surrogate is separable, the optimization transfer algorithm (1.22) becomes
(1.94)
One can easily form an ordered-subsets version (cf. Section 1.4.6) of the SPS algorithm (1.94) by replacing the sums over rays with sums over subsets of the rays, yielding the ordered-subsets transmission (OSTR) algorithm described in [109]. Since ordered-subsets algorithms are not guaranteed to converge, one may as well further abandon monotonicity and replace the denominator in the ordered-subsets version of (1.94) with something that can be precomputed. Specifically, in [109] we recommend replacing the optimal curvatures in (1.94) with precomputed approximations:¹¹
(1.95)
¹¹This trick is somewhat similar in spirit to the method of Fisher scoring [119, 120], in which one replaces the Hessian with its expectation (the Fisher information matrix) to reduce computation in nonquadratic optimization problems.
where the factors can be precomputed from the measurements. For this fast denominator approximation, the OSTR algorithm becomes:
(1.96)
The results in [109] show that this algorithm does not quite find the maximizer of the objective function, but the images are nearly as good as those produced by convergent algorithms in terms of mean squared error and segmentation accuracy.
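The subset idea itself can be sketched generically, with rays interleaved into subsets by "angle" and a gradient step per subset (a toy least-squares stand-in with assumed sizes, not the OSTR algorithm):

```python
import numpy as np

# Ordered-subsets sketch: cycle through angle-subsampled subsets of the rays,
# taking one gradient step per subset instead of one step per full-data pass.
rng = np.random.default_rng(3)
m, n, n_subsets = 120, 15, 6
A = rng.standard_normal((m, n))
x_true = rng.random(n)
y = A @ x_true                                 # consistent (noiseless) data

subsets = [np.arange(s, m, n_subsets) for s in range(n_subsets)]
x = np.zeros(n)
step = 1.0 / np.linalg.norm(A, 2) ** 2         # conservative fixed step size
for _ in range(300):
    for idx in subsets:
        As, ys = A[idx], y[idx]
        x = x + step * (As.T @ (ys - As @ x))  # subset gradient step

print(np.abs(x - x_true).max())                # near zero for consistent data
```

For noisy (inconsistent) data such a scheme approaches a limit cycle rather than a fixed point, which is why the text stresses that ordered-subsets algorithms are not guaranteed to converge.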
1.6.4 Paraboloidal surrogates coordinate-ascent (PSCA) algorithm
A disadvantage of simultaneous updates like (1.94) is that they typically converge slowly, since separable surrogate functions have high curvature and hence slow convergence rates (cf. Section 1.3.3). Thus, in [77, 121] we proposed to apply coordinate ascent to the quadratic surrogate function (1.86). (We focus on the quadratic penalty case here; the extension to the nonquadratic case is straightforward, following an approach similar to that in Section 1.6.2.) To apply coordinate ascent to (1.86), we sequentially maximize the surrogate over each pixel, using the most recent values for all other pixels, as in Section 1.5. We again adopt the shorthand (1.78) here. In its simplest form, this leads to a paraboloidal surrogates coordinate ascent (PSCA) algorithm having a similar form as (1.82), but with the inner update being:
(1.97)
where the curvatures
(1.98)
can be precomputed before each backprojection step.
This precomputation saves many flops per iteration, yet still yields an intrinsically monotonic algorithm. Even greater computational savings are possible via a fast denominator trick similar to (1.95), although one should then check for monotonicity after each iteration and redo the iteration using the monotonicity-preserving denominators (1.98) in those rare cases where the objective function decreases. There are several details that are essential for efficient implementation; see [77].
1.6.5
1.7 Direct algorithms
The algorithms described above have all been developed, to some degree, by considering the specific form of the log-likelihood (1.11). It is reasonable to hypothesize that algorithms tailor-made for the form of the objective function (1.15) in tomography should outperform (converge faster than) general-purpose optimization methods, which usually treat the objective function as a black box in the interest of greatest generality. Nevertheless, general-purpose optimization is a very active research area, and it behooves developers of image reconstruction algorithms to keep abreast of progress in that field. General-purpose algorithms that are natural candidates for image reconstruction include the conjugate gradient algorithm and the quasi-Newton algorithms, described next.
1.7.1 Conjugate gradient algorithm
For unconstrained quadratic optimization problems, the preconditioned conjugate gradient (CG) algorithm [125] is particularly appealing because it converges rapidly for suitably chosen preconditioners, e.g., [67]. For nonquadratic objective functions, modifications such as line searches are required.
1.7.2 Quasi-Newton algorithm
The ideal preconditioner for the conjugate gradient algorithm would be the inverse of the Hessian matrix, which would lead to superlinear convergence [131]. Unfortunately, in tomography the Hessian matrix is a large nonsparse matrix, so its inverse is impractical to compute and store. The basic idea of the quasi-Newton family of algorithms is to form low-rank approximations to the inverse of the Hessian matrix as the iterations proceed [85]. This approach has been applied by Kaplan et al. [132] to simultaneous estimation of SPECT attenuation and emission distributions, using the public-domain software for limited-memory, bound-constrained minimization (L-BFGS-B) [133]. Preconditioning has been found to accelerate such algorithms [132].
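As a concrete illustration of limited-memory bound-constrained optimization, here is a sketch using SciPy's interface to the L-BFGS-B code on a simple nonnegativity-constrained least-squares stand-in (the problem and sizes are assumptions, not the SPECT problem of [132]):

```python
import numpy as np
from scipy.optimize import minimize

# Nonnegativity-constrained quadratic objective solved with L-BFGS-B:
# low-rank inverse-Hessian approximations plus simple bound constraints.
rng = np.random.default_rng(4)
m, n = 80, 12
A = rng.standard_normal((m, n))
x_true = rng.random(n)               # nonnegative ground truth
y = A @ x_true

def cost(x):
    r = A @ x - y
    return 0.5 * (r @ r)

def grad(x):
    return A.T @ (A @ x - y)

res = minimize(cost, x0=np.zeros(n), jac=grad, method="L-BFGS-B",
               bounds=[(0.0, None)] * n)
print(res.success, np.abs(res.x - x_true).max())
```

The `bounds` argument enforces the nonnegativity constraint directly, which is the feature that makes L-BFGS-B attractive for attenuation and emission estimates.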
1.8
Some of the algorithms described above are fairly complex, and this complexity derives from the nonconvex, nonquadratic form of the transmission Poisson log-likelihood (1.11) and (1.12). It is natural, then, to ask whether there are simpler approaches that would give adequate results in practice. Every simpler approach that we are aware of begins by using the logarithmic transformation (1.3), which compensates for the nonlinearity of Beer's law (1.2) and leads to a linear problem:
(1.99)
One approach is to treat (1.99) as a system of equations in the unknown attenuation values and try to solve it directly. This was the motivation for the algebraic reconstruction technique (ART) family of algorithms [11]. For noisy measurements the equations (1.99) are usually inconsistent, and ART converges to a limit cycle for inconsistent problems. One can force ART to converge by introducing appropriate strong underrelaxation [134]. However, the limit is the minimum-norm weighted least-squares solution for a particular norm that is unrelated to the measurement statistics. The Gauss-Markov theorem [90] states that estimator variance is minimized when the least-squares norm is chosen to be the inverse of the covariance matrix, so it seems preferable to approach (1.99) by first finding a statistically motivated cost function and then finding algorithms that minimize that cost function, rather than trying to fix up algorithms that were derived under the unrealistic assumption that (1.99) is a consistent system of equations.
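The Gauss-Markov point is easy to demonstrate numerically (a generic heteroscedastic linear model with assumed sizes, not the chapter's transmission model):

```python
import numpy as np

# Gauss-Markov in action: with heteroscedastic noise, weighting by the
# inverse variances gives a lower-variance estimator than unweighted LS.
rng = np.random.default_rng(5)
m, n, trials = 100, 3, 2000
A = rng.standard_normal((m, n))
x_true = np.array([1.0, -2.0, 0.5])
sigma = rng.uniform(0.1, 3.0, size=m)        # per-measurement noise std
W = np.diag(1.0 / sigma ** 2)                # inverse-covariance weights

est_ols, est_wls = [], []
for _ in range(trials):
    y = A @ x_true + sigma * rng.standard_normal(m)
    est_ols.append(np.linalg.lstsq(A, y, rcond=None)[0])
    est_wls.append(np.linalg.solve(A.T @ W @ A, A.T @ W @ y))

var_ols = np.var(est_ols, axis=0).sum()
var_wls = np.var(est_wls, axis=0).sum()
print(var_wls < var_ols)   # WLS has lower total estimator variance
```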
1.8.2 Methods to avoid
The iteration looks like an upside-down emission EM algorithm. The convergence properties of this algorithm are unknown.
Zeng and Gullberg [136] proposed a steepest-ascent method with a fixed step-length parameter.
1.8.3 Weighted least-squares methods
Rather than simply treating (1.99) as a system of equations, we can use (1.99) as the rationale for a weighted least-squares cost function. There are several choices for the weights.
1.8.3.1 Model-weighted LS
A natural weighting uses the variance implied by the statistical model:
(1.100)
where the mean was defined in (1.8). A natural model-weighted least-squares cost function is then
(1.101)
This type of cost function has been considered in [137]. Unfortunately, the above cost function is nonquadratic, so finding its minimizer is virtually as difficult as maximizing (1.15).
1.8.3.2 Data-weighted LS
(1.102)
1.8.3.3 Reweighted LS
The two cost functions given above represent two extremes. In (1.102), the weights are fixed once and for all prior to minimization, whereas in (1.101), the weights vary continuously as the estimate changes. A practical alternative is to first run any inexpensive algorithm (such as OSTR) for a few iterations and then reproject the estimated image to form estimated line integrals. Then perform a second-order Taylor expansion of the log-likelihood (1.12) around these estimates to find a quadratic approximation that can be minimized easily. This approach should avoid the biases of data-weighted least squares, and if iterated it is known as reweighted least squares [140, 141].
1.9 Emission reconstruction
In emission tomography, the measurement model is
(1.103)
where $a_{ij}$ represents the probability that an emission from the jth voxel is recorded by the ith detector, and again $r_i$ denotes additive background counts such as random coincidences and scatter. (Accurate system models can lead to significant improvements in image spatial resolution and accuracy, e.g., [142, 143].) The log-likelihood has a form similar to (1.11):
(1.104)
1.9.1 EM algorithm
One can derive the classical EM algorithm for the emission problem by a formal complete-data exposition [17], which is less complicated than the transmission case but still somewhat mysterious to many readers, or by fixed-point considerations [79] (which do not fully illustrate the monotonicity of the emission EM algorithm). Instead, we adopt the simple concavity-based derivation of De Pierro [72], which reinforces the surrogate function concepts woven throughout this chapter.
The key to the derivation is the following multiplicative trick, which applies when the current estimates are strictly positive:
(1.105)
The terms in parentheses are nonnegative and sum to unity, so we can apply the concavity inequality: since the logarithm is concave on the positive reals, the log-likelihood is bounded below by the resulting separable surrogate function:
(1.106)
Thus the following parallelizable maximization step is guaranteed to monotonically increase the log-likelihood each iteration:
(1.107)
Equating the derivative to zero and solving yields the famous ML-EM update, which in terms of the system elements $a_{ij}$, measurements $y_i$, and background means $r_i$ reads
$$\lambda_j^{(n+1)} = \frac{\lambda_j^{(n)}}{\sum_i a_{ij}} \sum_i a_{ij}\, \frac{y_i}{\sum_k a_{ik}\lambda_k^{(n)} + r_i}. \qquad (1.108)$$
(1.109)
For any pixels converging toward zero, the curvatures (1.109) of the EM surrogate grow without bound. This leads to very slow convergence; even sublinear convergence rates are possible [76].
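A minimal sketch of this update, including background counts, on a toy system (sizes and data are illustrative assumptions):

```python
import numpy as np

# Classical emission ML-EM update with additive background r_i:
#   lam_j <- lam_j / (sum_i a_ij) * sum_i a_ij * y_i / (sum_k a_ik lam_k + r_i)
# The log-likelihood sum_i [y_i log(ybar_i) - ybar_i] never decreases.
rng = np.random.default_rng(6)
m, n = 50, 10
A = rng.random((m, n))                   # nonnegative system matrix
lam_true = rng.uniform(1.0, 5.0, size=n)
r = np.full(m, 0.5)                      # background means
y = rng.poisson(A @ lam_true + r)

def loglik(lam):
    ybar = A @ lam + r
    return float(y @ np.log(ybar) - ybar.sum())

lam = np.ones(n)
lls = [loglik(lam)]
sens = A.sum(axis=0)                     # sensitivity: sum_i a_ij
for _ in range(200):
    ybar = A @ lam + r
    lam = lam / sens * (A.T @ (y / ybar))
    lls.append(loglik(lam))

print(lls[-1] - lls[0])   # positive: the likelihood increased
```

The recorded log-likelihood values illustrate the monotonicity property of the EM update; they also show the slow late-iteration progress mentioned in the text.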
1.9.2 An improved EM algorithm
One can choose a slightly better decomposition than (1.105) to get slightly faster-converging EM algorithms [75]. First find any set of nonnegative constants that satisfy
(1.110)
Shifting the decomposition (1.105) by these constants yields
(1.111)
Applying the concavity inequality as before gives a surrogate whose maximization yields the update
(1.112)
This algorithm was derived by a more complicated EM approach in [75], where it was called ML-EM-3. The surrogate-function derivation is simpler to present and understand, and is more readily generalizable to alternative surrogates.
1.10 Advanced topics
In this section we provide pointers to the literature for several additional topics, all of which are active research areas.
1.10.1
In addition to the variety of methods for choosing the regularization parameter, there is an even larger variety of possible choices for the potential functions in (1.17), ranging from quadratic to nonquadratic to nonconvex and even nondifferentiable. See [155] for a recent discussion.
The absolute-value potential is particularly appealing in problems with piecewise-constant attenuation maps. However, its nondifferentiability greatly complicates optimization [156-158].
1.10.2
In PET and SPECT imaging, the attenuation map is a nuisance parameter; the
emission distribution is of greatest interest. This has spawned several attempts
to estimate the attenuation map from the emission sinograms, without a separate
transmission scan. See e.g., [132, 159, 160].
1.10.3 Overlapping beams
(1.113)
1.11 Example results
Figure 1.6: Reconstructed images of a thorax phantom from a 12-minute PET transmission scan (panels: FBP, ML-OSEM-8, PL-OSTR-16, PL-PSCA).
Figure 1.7 shows images of a patient injected with FDG and scanned with PET. In this case, a 2-minute transmission scan was emulated by binomial thinning of a 12-minute transmission scan [109]. For the subfigures labeled T-PL and T-FBP, the ACFs were computed from attenuation maps reconstructed by penalized-likelihood methods or by FBP, respectively. For the subfigures labeled E-PL and E-FBP, the emission data were reconstructed by penalized-likelihood methods or by FBP, respectively. The best image (upper left) is formed when both the emission and transmission images are reconstructed by statistical approaches. The second best image (upper right) is formed by using statistical reconstruction of the attenuation map, but ordinary FBP for the emission data. Clearly for such low-count transmission scans, reducing the noise in the ACFs is at least as important as how the emission images are reconstructed.
Figure 1.7: FDG PET emission images (panels: E-PL,T-PL; E-FBP,T-PL; E-PL,T-FBP; E-FBP,T-FBP), reconstructed by both FBP (E-FBP) and penalized-likelihood (E-PL) methods. Attenuation correction was performed using attenuation maps generated either by transmission FBP (T-FBP) or transmission penalized-likelihood (T-PL) reconstructions. The use of statistical reconstruction methods significantly reduces image noise.
1.12 Summary
We have summarized a wide variety of algorithms for statistical image reconstruction from transmission measurements. Most of the ideas underlying these
algorithms are applicable to emission tomography, as well as to image recovery
problems in general.
There is a wide variety of algorithms in part because no algorithm has yet been
found that has all the desirable properties listed in Section 1.3.1.
In cases where the system matrix can easily be precomputed and stored, and a
nonparallel computer is to be used, we recommend the PSCA algorithm of Section 1.6.4. For parallel computing, the conjugate gradient algorithm [31] is a reasonable choice, particularly if exact nonnegativity constraints can be relaxed. If
an inexact maximum is acceptable, the OSTR algorithm of Section 1.6.3 is a very
practical choice, and is likely to be widely applied given the popularity of the emission OSEM algorithm. Meanwhile, the search continues for an algorithm with the
simplicity of OSTR that is parallelizable, monotone and fast converging, and can
accommodate any form of system matrix.
1.13 Acknowledgements
The ideas in this chapter were greatly influenced by the dissertation research
of Hakan Erdogan [181], who also prepared Figs. 1.6 and 1.7. The author also
gratefully acknowledges ongoing collaboration with Neal Clinthorne, Ed Ficaro,
Ken Lange, and Les Rogers. The author thanks Ken Hanson for his careful reading
of this chapter. This work was supported in part by NIH grants CA60711 and
CA54362.
1.14 Appendix

Consider a single ray, and suppose that the number N of photons transmitted toward the detector is a Poisson random variable with mean b. Each of the
transmitted photons may either pass unaffected (survive passage)
or may interact with the object. These are Bernoulli trials, since the photons interact
independently. From Beer's law we know that the probability of surviving passage
is given by

    p = exp( - ∫ μ(z) dz ),

where the line integral of the attenuation map μ is taken along the ray. The number of photons Y that pass unaffected through the object is a random
variable, and conditioned on N = n it has a binomial distribution:

    P[Y = y | N = n] = ( n! / (y! (n-y)!) ) p^y (1-p)^(n-y),   y = 0, ..., n.

Marginalizing over N:

    P[Y = y] = Σ_{n=y}^∞ P[Y = y | N = n] P[N = n]
             = Σ_{n=y}^∞ ( n! / (y! (n-y)!) ) p^y (1-p)^(n-y) e^{-b} b^n / n!
             = ( (b p)^y / y! ) e^{-b} Σ_{m=0}^∞ ( b (1-p) )^m / m!
             = ( (b p)^y / y! ) e^{-b p}.

Therefore the distribution of photons that survive passage is also Poisson, with
mean

    E[Y] = b p = b exp( - ∫ μ(z) dz ).

By the same argument, the number of photons that interact, N - Y, is Poisson with
mean b (1-p) and is independent of Y, so that

    E[N | Y = y] = y + b (1-p),

which is useful in deriving the transmission EM algorithm proposed in [17].
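This thinning property is easy to verify numerically: binomial selection applied to Poisson draws yields counts whose sample mean and variance both match b p. A minimal sketch (NumPy assumed; the constants b and mu_line are illustrative, not taken from the chapter):

```python
import numpy as np

rng = np.random.default_rng(0)

b = 50.0                      # mean number of transmitted photons per ray (Poisson)
mu_line = 0.7                 # illustrative line integral of the attenuation map
p = float(np.exp(-mu_line))   # Beer's-law survival probability

# N ~ Poisson(b) photons per ray; each survives independently with
# probability p, so Y | N = n ~ Binomial(n, p).
n = rng.poisson(b, size=200_000)
y = rng.binomial(n, p)

# Poisson thinning: Y should itself be Poisson with mean (= variance) b*p.
mean_y = y.mean()
var_y = y.var()
```

Both sample moments approach b p as the number of simulated rays grows, consistent with the derivation above.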
1.15 References
[1] M. M. Ter-Pogossian, M. E. Raichle, and B. E. Sobel, "Positron-emission tomography," Scientific American, vol. 243, pp. 171-181, Oct. 1980.
[2] J. M. Ollinger and J. A. Fessler, "Positron emission tomography," IEEE Sig. Proc. Mag., vol. 14, pp. 43-55, Jan. 1997.
[3] T. F. Budinger and G. T. Gullberg, "Three dimensional reconstruction in nuclear medicine emission imaging," IEEE Tr. Nuc. Sci., vol. 21, no. 3, pp. 2-20, 1974.
[4] T. H. Prettyman, R. A. Cole, R. J. Estep, and G. A. Sheppard, "A maximum-likelihood reconstruction algorithm for tomographic gamma-ray nondestructive assay," Nucl. Instr. Meth. Phys. Res. A., vol. 356, pp. 407-52, Mar. 1995.
[5] G. Wang, D. L. Snyder, J. A. O'Sullivan, and M. W. Vannier, "Iterative deblurring for CT metal artifact reduction," IEEE Tr. Med. Im., vol. 15, p. 657, Oct. 1996.
[6] R. M. Leahy and J. Qi, "Statistical approaches in quantitative positron emission tomography," Statistics and Computing, 1998.
[7] K. Wienhard, L. Eriksson, S. Grootoonk, M. Casey, U. Pietrzyk, and W. D. Heiss, "Performance evaluation of a new generation positron scanner ECAT EXACT," J. Comp. Assisted Tomo., vol. 16, pp. 804-813, Sept. 1992.
[8] S. R. Cherry, M. Dahlbom, and E. J. Hoffman, "High sensitivity, total body PET scanning using 3D data acquisition and reconstruction," IEEE Tr. Nuc. Sci., vol. 39, pp. 1088-1092, Aug. 1992.
[9] E. P. Ficaro, J. A. Fessler, W. L. Rogers, and M. Schwaiger, "Comparison of Americium-241 and Technetium-99m as transmission sources for attenuation correction of Thallium-201 SPECT imaging of the heart," J. Nuc. Med., vol. 35, pp. 652-63, Apr. 1994.
[10] S. R. Meikle, M. Dahlbom, and S. R. Cherry, "Attenuation correction using count-limited transmission data in positron emission tomography," J. Nuc. Med., vol. 34, pp. 143-150, Jan. 1993.
[11] G. T. Herman, Image reconstruction from projections: The fundamentals of computerized tomography. New York: Academic Press, 1980.
[12] A. Macovski, Medical imaging systems. New Jersey: Prentice-Hall, 1983.
[13] K. Sauer and C. Bouman, "A local update strategy for iterative reconstruction from projections," IEEE Tr. Sig. Proc., vol. 41, pp. 534-548, Feb. 1993.
[14] J. A. Fessler, "Hybrid Poisson/polynomial objective functions for tomographic image reconstruction from transmission scans," IEEE Tr. Im. Proc., vol. 4, pp. 1439-50, Oct. 1995.
[15] D. S. Lalush and B. M. W. Tsui, "MAP-EM and WLS-MAP-CG reconstruction methods for transmission imaging in cardiac SPECT," in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., vol. 2, pp. 1174-1178, 1993.
[16] J. A. Fessler, "Mean and variance of implicitly defined biased estimators (such as penalized maximum likelihood): Applications to tomography," IEEE Tr. Im. Proc., vol. 5, pp. 493-506, Mar. 1996.
[17] K. Lange and R. Carson, "EM reconstruction algorithms for emission and transmission tomography," J. Comp. Assisted Tomo., vol. 8, pp. 306-316, Apr. 1984.
[18] C. Bouman and K. Sauer, "Fast numerical methods for emission and transmission tomographic reconstruction," in Proc. 27th Conf. Info. Sci. Sys., Johns Hopkins, pp. 611-616, 1993.
[19] C. A. Bouman and K. Sauer, "A unified approach to statistical tomography using coordinate descent optimization," IEEE Tr. Im. Proc., vol. 5, pp. 480-92, Mar. 1996.
[20] J. A. Fessler and W. L. Rogers, "Spatial resolution properties of penalized-likelihood image reconstruction methods: Space-invariant tomographs," IEEE Tr. Im. Proc., vol. 5, pp. 1346-58, Sept. 1996.
[21] P. M. Joseph and R. D. Spital, "A method for correcting bone induced artifacts in computed tomography scanners," J. Comp. Assisted Tomo., vol. 2, pp. 100-8, 1978.
[22] B. Chan, M. Bergstrom, M. R. Palmer, C. Sayre, and B. D. Pate, "Scatter distribution in transmission measurements with positron emission tomography," J. Comp. Assisted Tomo., vol. 10, pp. 296-301, Mar. 1986.
[23] E. J. Hoffman, S. C. Huang, M. E. Phelps, and D. E. Kuhl, "Quantitation in positron emission computed tomography: 4 Effect of accidental coincidences," J. Comp. Assisted Tomo., vol. 5, no. 3, pp. 391-400, 1981.
[24] M. E. Casey and E. J. Hoffman, "Quantitation in positron emission computed tomography: 7 A technique to reduce noise in accidental coincidence measurements and coincidence efficiency calibration," J. Comp. Assisted Tomo., vol. 10, no. 5, pp. 845-850, 1986.
[40] M. N. Wernick and C. T. Chen, "Superresolved tomography by convex projections and detector motion," J. Opt. Soc. Am. A, vol. 9, pp. 1547-1553, Sept. 1992.
[41] S. H. Manglos, "Truncation artifact suppression in cone-beam radionuclide transmission CT using maximum likelihood techniques: evaluation with human subjects," Phys. Med. Biol., vol. 37, pp. 549-562, Mar. 1992.
[42] E. P. Ficaro and J. A. Fessler, "Iterative reconstruction of truncated fan beam transmission data," in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., vol. 3, 1993.
[43] J. A. Case, T. S. Pan, M. A. King, D. S. Luo, B. C. Penney, and M. S. Z. Rabin, "Reduction of truncation artifacts in fan beam transmission imaging using a spatially varying gamma prior," IEEE Tr. Nuc. Sci., vol. 42, pp. 2260-5, Dec. 1995.
[44] S. H. Manglos, G. M. Gagne, A. Krol, F. D. Thomas, and R. Narayanaswamy, "Transmission maximum-likelihood reconstruction with ordered subsets for cone beam CT," Phys. Med. Biol., vol. 40, pp. 1225-41, July 1995.
[45] T.-S. Pan, B. M. W. Tsui, and C. L. Byrne, "Choice of initial conditions in the ML reconstruction of fan-beam transmission with truncated projection data," IEEE Tr. Med. Im., vol. 16, pp. 426-38, Aug. 1997.
[46] G. L. Zeng and G. T. Gullberg, "An SVD study of truncated transmission data in SPECT," IEEE Tr. Nuc. Sci., vol. 44, pp. 107-11, Feb. 1997.
[47] A. J. Rockmore and A. Macovski, "A maximum likelihood approach to transmission image reconstruction from projections," IEEE Tr. Nuc. Sci., vol. 24, pp. 1929-1935, June 1977.
[48] Y. Censor, "Finite series expansion reconstruction methods," Proc. IEEE, vol. 71, pp. 409-419, Mar. 1983.
[49] R. Schwinger, S. Cool, and M. King, "Area weighted convolutional interpolation for data reprojection in single photon emission computed tomography," Med. Phys., vol. 13, pp. 350-355, May 1986.
[50] S. C. B. Lo, "Strip and line path integrals with a square pixel matrix: A unified theory for computational CT projections," IEEE Tr. Med. Im., vol. 7, pp. 355-363, Dec. 1988.
[51] J. A. Fessler, "ASPIRE 3.0 user's guide: A sparse iterative reconstruction library," Tech. Rep. 293, Comm. and Sign. Proc. Lab., Dept. of EECS, Univ. of Michigan, Ann Arbor, MI, 48109-2122, July 1995. Available from http://www.eecs.umich.edu/~fessler.
[52] D. L. Snyder, M. I. Miller, L. J. Thomas, and D. G. Politte, "Noise and edge artifacts in maximum-likelihood reconstructions for emission tomography," IEEE Tr. Med. Im., vol. 6, pp. 228-238, Sept. 1987.
[53] J. A. Browne and T. J. Holmes, "Maximum likelihood techniques in X-ray computed tomography," in Medical imaging systems techniques and applications: Diagnosis optimization techniques (C. T. Leondes, ed.), vol. 3, pp. 117-46, Amsteldijk, Netherlands: Gordon and Breach, 1997.
[69] K. Lange and J. A. Fessler, "Globally convergent algorithms for maximum a posteriori transmission tomography," IEEE Tr. Im. Proc., vol. 4, pp. 1430-8, Oct. 1995.
[70] R. R. Meyer, "Sufficient conditions for the convergence of monotonic mathematical programming algorithms," J. Comput. System. Sci., vol. 12, pp. 108-21, 1976.
[71] J. M. Ortega and W. C. Rheinboldt, Iterative solution of nonlinear equations in several variables. New York: Academic Press, 1970.
[72] A. R. De Pierro, "On the relation between the ISRA and the EM algorithm for positron emission tomography," IEEE Tr. Med. Im., vol. 12, pp. 328-333, June 1993.
[73] A. R. De Pierro, "A modified expectation maximization algorithm for penalized likelihood estimation in emission tomography," IEEE Tr. Med. Im., vol. 14, pp. 132-137, Mar. 1995.
[74] K. Lange, Numerical analysis for statisticians. New York: Springer-Verlag, 1999.
[75] J. A. Fessler and A. O. Hero, "Penalized maximum-likelihood image reconstruction using space-alternating generalized EM algorithms," IEEE Tr. Im. Proc., vol. 4, pp. 1417-29, Oct. 1995.
[76] J. A. Fessler, N. H. Clinthorne, and W. L. Rogers, "On complete data spaces for PET reconstruction algorithms," IEEE Tr. Nuc. Sci., vol. 40, pp. 1055-61, Aug. 1993.
[77] H. Erdogan and J. A. Fessler, "Monotonic algorithms for transmission tomography," IEEE Tr. Med. Im., vol. 18, pp. 801-14, Sept. 1999.
[78] J. A. Fessler and H. Erdogan, "A paraboloidal surrogates algorithm for convergent penalized-likelihood emission image reconstruction," in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., vol. 2, pp. 1132-5, 1998.
[79] L. A. Shepp and Y. Vardi, "Maximum likelihood reconstruction for emission tomography," IEEE Tr. Med. Im., vol. 1, pp. 113-122, Oct. 1982.
[80] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. Royal Stat. Soc. Ser. B, vol. 39, no. 1, pp. 1-38, 1977.
[81] C. F. J. Wu, "On the convergence properties of the EM algorithm," Ann. Stat., vol. 11, no. 1, pp. 95-103, 1983.
[82] X. L. Meng and D. B. Rubin, "Maximum likelihood estimation via the ECM algorithm: A general framework," Biometrika, vol. 80, no. 2, pp. 267-278, 1993.
[83] J. A. Fessler and A. O. Hero, "Space-alternating generalized expectation-maximization algorithm," IEEE Tr. Sig. Proc., vol. 42, pp. 2664-77, Oct. 1994.
[84] C. H. Liu and D. B. Rubin, "The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence," Biometrika, vol. 81, no. 4, pp. 633-48, 1994.
[85] K. Lange, "A quasi-Newton acceleration of the EM algorithm," Statistica Sinica, vol. 5, pp. 1-18, Jan. 1995.
[86] D. A. van Dyk, X. L. Meng, and D. B. Rubin, "Maximum likelihood estimation via the ECM algorithm: computing the asymptotic variance," Statistica Sinica, vol. 5, pp. 55-76, Jan. 1995.
[102] H. M. Hudson and R. S. Larkin, "Accelerated image reconstruction using ordered subsets of projection data," IEEE Tr. Med. Im., vol. 13, pp. 601-609, Dec. 1994.
[103] C. L. Byrne, "Block-iterative methods for image reconstruction from projections," IEEE Tr. Im. Proc., vol. 5, pp. 792-3, May 1996.
[104] C. L. Byrne, "Convergent block-iterative algorithms for image reconstruction from inconsistent data," IEEE Tr. Im. Proc., vol. 6, pp. 1296-1304, Sept. 1997.
[105] C. L. Byrne, "Accelerating the EMML algorithm and related iterative algorithms by rescaled block-iterative methods," IEEE Tr. Im. Proc., vol. 7, pp. 100-9, Jan. 1998.
[106] J. Nuyts, B. D. Man, P. Dupont, M. Defrise, P. Suetens, and L. Mortelmans, "Iterative reconstruction for helical CT: A simulation study," Phys. Med. Biol., vol. 43, pp. 729-37, Apr. 1998.
[107] C. Kamphuis and F. J. Beekman, "Accelerated iterative transmission CT reconstruction using an ordered subsets convex algorithm," IEEE Tr. Med. Im., vol. 17, pp. 1001-5, Dec. 1998.
[108] H. Erdogan, G. Gualtieri, and J. A. Fessler, "An ordered subsets algorithm for transmission tomography," in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., 1998. Inadvertently omitted from proceedings; available from the author's web page.
[109] H. Erdogan and J. A. Fessler, "Ordered subsets algorithms for transmission tomography," Phys. Med. Biol., vol. 44, pp. 2835-51, Nov. 1999.
[110] T. Hebert and R. Leahy, "A Bayesian reconstruction algorithm for emission tomography using a Markov random field prior," in Proc. SPIE 1092, Med. Im. III: Im. Proc., pp. 458-66, 1989.
[111] T. Hebert and R. Leahy, "A generalized EM algorithm for 3D Bayesian reconstruction from Poisson data using Gibbs priors," IEEE Tr. Med. Im., vol. 8, pp. 194-202, June 1989.
[112] T. J. Hebert and R. Leahy, "Statistic-based MAP image reconstruction from Poisson data using Gibbs priors," IEEE Tr. Sig. Proc., vol. 40, pp. 2290-2303, Sept. 1992.
[113] G. Gullberg and B. M. W. Tsui, "Maximum entropy reconstruction with constraints: iterative algorithms for solving the primal and dual programs," in Proc. Tenth Intl. Conf. on Information Processing in Medical Im. (C. N. de Graaf and M. A. Viergever, eds.), pp. 181-200, New York: Plenum Press, 1987.
[114] C. A. Bouman, K. Sauer, and S. S. Saquib, "Tractable models and efficient algorithms for Bayesian tomography," in Proc. IEEE Conf. Acoust. Speech Sig. Proc., vol. 5, pp. 2907-10, 1995.
[115] S. Saquib, J. Zheng, C. A. Bouman, and K. D. Sauer, "Provably convergent coordinate descent in statistical tomographic reconstruction," in Proc. IEEE Intl. Conf. on Image Processing, vol. 2, pp. 741-4, 1996.
[116] J. Zheng, S. Saquib, K. Sauer, and C. Bouman, "Functional substitution methods in optimization for Bayesian tomography," IEEE Tr. Im. Proc., Mar. 1997.
[133] R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, "A limited memory algorithm for bound constrained optimization," SIAM J. Sci. Comp., vol. 16, pp. 1190-1208, 1995.
[134] Y. Censor, P. P. B. Eggermont, and D. Gordon, "Strong underrelaxation in Kaczmarz's method for inconsistent systems," Numerische Mathematik, vol. 41, pp. 83-92, 1983.
[135] Z. Liang and J. Ye, "Reconstruction of object-specific attenuation map for quantitative SPECT," in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., vol. 2, pp. 1231-1235, 1993.
[136] G. L. Zeng and G. T. Gullberg, "A MAP algorithm for transmission computed tomography," in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., vol. 2, pp. 1202-1204, 1993.
[137] J. M. M. Anderson, B. A. Mair, M. Rao, and C. H. Wu, "A weighted least-squares method for PET," in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., vol. 2, pp. 1292-6, 1995.
[138] C. Bouman and K. Sauer, "Nonlinear multigrid methods of optimization in Bayesian tomographic image reconstruction," in SPIE Neural and Stoch. Methods in Image and Signal Proc., 1992.
[139] C. Bouman and K. Sauer, "A generalized Gaussian image model for edge-preserving MAP estimation," IEEE Tr. Im. Proc., vol. 2, pp. 296-310, July 1993.
[140] P. J. Green, "Iteratively reweighted least squares for maximum likelihood estimation, and some robust and resistant alternatives," J. Royal Stat. Soc. Ser. B, vol. 46, no. 2, pp. 149-192, 1984.
[141] M. B. Dollinger and R. G. Staudte, "Influence functions of iteratively reweighted least squares estimators," J. Am. Stat. Ass., vol. 86, pp. 709-716, Sept. 1991.
[142] J. Qi, R. M. Leahy, C. Hsu, T. H. Farquhar, and S. R. Cherry, "Fully 3D Bayesian image reconstruction for the ECAT EXACT HR+," IEEE Tr. Nuc. Sci., vol. 45, pp. 1096-1103, June 1998.
[143] J. Qi, R. M. Leahy, S. R. Cherry, A. Chatziioannou, and T. H. Farquhar, "High resolution 3D Bayesian image reconstruction using the microPET small-animal scanner," Phys. Med. Biol., vol. 43, pp. 1001-14, Apr. 1998.
[144] J. A. Browne and A. R. De Pierro, "A row-action alternative to the EM algorithm for maximizing likelihoods in emission tomography," IEEE Tr. Med. Im., vol. 15, pp. 687-99, Oct. 1996.
[145] A. M. Thompson, J. C. Brown, J. W. Kay, and D. M. Titterington, "A study of methods of choosing the smoothing parameter in image restoration by regularization," IEEE Tr. Patt. Anal. Mach. Int., vol. 13, no. 4, pp. 326-339, 1991.
[146] J. W. Hilgers and W. R. Reynolds, "Instabilities in the optimal regularization parameter relating to image recovery problems," J. Opt. Soc. Am. A, vol. 9, pp. 1273-1279, Aug. 1992.
[147] Y. Pawitan and F. O'Sullivan, "Data-dependent bandwidth selection for emission computed tomography reconstruction," IEEE Tr. Med. Im., vol. 12, pp. 167-172, June 1993.
[163] P. Sukovic and N. H. Clinthorne, "Penalized weighted least-squares image reconstruction in single and dual energy X-ray computed tomography," IEEE Tr. Med. Im., 1999. Submitted.
[164] J. A. Fessler, D. F. Yu, and E. P. Ficaro, "Maximum likelihood transmission image reconstruction for overlapping transmission beams," in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., 1999.
[165] D. F. Yu, J. A. Fessler, and E. P. Ficaro, "Maximum likelihood transmission image reconstruction for overlapping transmission beams," IEEE Tr. Med. Im., 1999. Submitted.
[166] J. E. Bowsher, M. P. Tornai, D. R. Gilland, D. E. G. Trotter, and R. J. Jaszczak, "An EM algorithm for modeling multiple or extended TCT sources," in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., 1999.
[167] A. Krol, J. E. Bowsher, S. H. Manglos, D. H. Feiglin, and F. D. Thomas, "An EM algorithm for estimating SPECT emission and transmission parameters from emission data only," Phys. Med. Biol., 1999. Submitted.
[168] D. J. Rossi and A. S. Willsky, "Reconstruction from projections based on detection and estimation of objects, Parts I and II: Performance analysis and robustness analysis," IEEE Tr. Acoust. Sp. Sig. Proc., vol. 32, pp. 886-906, Aug. 1984.
[169] D. J. Rossi, A. S. Willsky, and D. M. Spielman, "Object shape estimation from tomographic measurements: a performance evaluation," Signal Processing, vol. 18, pp. 63-88, Sept. 1989.
[170] Y. Bresler, J. A. Fessler, and A. Macovski, "Model based estimation techniques for 3D reconstruction from projections," Machine Vision and Applications, vol. 1, no. 2, pp. 115-26, 1988.
[171] Y. Bresler, J. A. Fessler, and A. Macovski, "A Bayesian approach to reconstruction from incomplete projections of a multiple object 3D domain," IEEE Tr. Patt. Anal. Mach. Int., vol. 11, pp. 840-58, Aug. 1989.
[172] J. A. Fessler and A. Macovski, "Object-based 3D reconstruction of arterial trees from magnetic resonance angiograms," IEEE Tr. Med. Im., vol. 10, pp. 25-39, Mar. 1991.
[173] S. P. Müller, M. F. Kijewski, S. C. Moore, and B. L. Holman, "Maximum-likelihood estimation: a mathematical model for quantitation in nuclear medicine," J. Nuc. Med., vol. 31, pp. 1693-1701, Oct. 1990.
[174] C. K. Abbey, E. Clarkson, H. H. Barrett, S. P. Müller, and F. J. Rybicki, "A method for approximating the density of maximum likelihood and maximum a posteriori estimates under a Gaussian noise model," Med. Im. Anal., vol. 2, no. 4, pp. 395-403, 1998.
[175] P. C. Chiao, W. L. Rogers, N. H. Clinthorne, J. A. Fessler, and A. O. Hero, "Model-based estimation for dynamic cardiac studies using ECT," IEEE Tr. Med. Im., vol. 13, pp. 217-26, June 1994.
[176] Y. Amit and K. Manbeck, "Deformable template models for emission tomography," IEEE Tr. Med. Im., vol. 12, pp. 260-268, June 1993.
CHAPTER 2
Image Segmentation
Benoit M. Dawant
Vanderbilt University
Alex P. Zijdenbos
McGill University
Contents
2.1 Introduction 73
2.2 Image preprocessing and acquisition artifacts 73
  2.2.1 Partial volume effect 74
  2.2.2 Intensity nonuniformity (INU) 74
2.3 Thresholding 78
  2.3.1 Shape-based histogram techniques 78
  2.3.2 Optimal thresholding 79
  2.3.3 Advanced thresholding methods for simultaneous segmentation and INU correction 84
2.4 Edge-based techniques 88
  2.4.1 Border tracing 88
  2.4.2 Graph searching 89
  2.4.3 Dynamic programming 91
  2.4.4 Advanced border detection methods 93
  2.4.5 Hough transforms 94
2.5 Region-based segmentation 98
  2.5.1 Region growing 98
  2.5.2 Region splitting and merging 99
  2.5.3 Connected component labeling 100
2.6 Classification 101
  2.6.1 103
  2.6.2 109
  2.6.3 110
  2.6.4 111
  2.6.5 116
2.7 119
2.8 120
2.9 120
2.1 Introduction
Image segmentation, defined as the separation of the image into regions, is one
of the first steps leading to image analysis and interpretation. The goal is to separate the image into regions that are meaningful for a specific task. This may, for
instance, involve the detection of organs such as the heart, the liver, or the lungs
from MR or CT images. Other applications may require the calculation of white
and gray matter volumes in MR brain images, the labeling of deep brain structures
such as the thalamus or the hippocampus, or quantitative measurements made from
ultrasound images. Image segmentation approaches can be classified according to
both the features and the type of technique used. Features include pixel intensities,
gradient magnitudes, or measures of texture. Segmentation techniques applied to
these features can be broadly classified into one of three groups [1]: region-based,
edge-based, or classification. Typically, region-based and edge-based segmentation
techniques exploit, respectively, within-region similarities and between-region differences between features, whereas a classification technique assigns class labels
to individual pixels or voxels based on feature values.
Because of issues such as spatial resolution, poor contrast, ill-defined boundaries, noise, or acquisition artifacts, segmentation is a difficult task, and it is illusory
to believe that it can be achieved by using gray-level information alone. A priori
knowledge has to be used in the process, and so-called low-level processing algorithms have to cooperate with higher-level techniques such as deformable and
active models or atlas-based methods. These high-level techniques are described
in great detail in Chapters 3 and 17 of this handbook, and they will only be mentioned briefly in this chapter to provide the appropriate links. This chapter focuses
on the segmentation of the image into regions based only on gray-level information. Methods relying on a single image, also called monomodality methods, will
be presented, as well as multimodal methods that take advantage of several co-registered images (see Chapter 8). Because image segmentation is a broad field
that can only be touched upon in a single chapter, the reader will be introduced to
basic segmentation methods. A number of pointers to the pertinent literature will
be provided for more detailed descriptions of these methods or for more advanced
techniques that build upon the concepts being introduced.
2.2 Image preprocessing and acquisition artifacts
Image quality and acquisition artifacts directly affect the segmentation process.
These artifacts, their cause, and possible solutions are discussed in volume 1 of
this series, Physics and Psychophysics. Here, two leading artifacts that affect MR
images, namely partial volume and intensity nonuniformity, are discussed because
several segmentation methods have been proposed by the medical image processing
community to take these effects into consideration in the segmentation process.
Figure 2.1: A 2D view of an image slice cutting through the border between two tissues,
with intensities I(tissue 1) and I(tissue 2) (top). The resulting intensity profile along the line
is also shown (bottom).
2.2.1 Partial volume effect
Strictly speaking, the so-called partial volume effect (PVE), i.e., the mixing of
different tissue types in a single voxel, is not an artifact, but it is caused by the
finite spatial resolution of the images. PVE causes edge blurring between different
tissue types and reduces the accuracy and reliability of measurements taken on the
images, as is shown in Figure 2.1.
This figure shows how a slice image cuts through the region that separates
two tissues; it also shows the intensity profile that would result in the slice. The figure clearly illustrates that, depending on the angle with which the slice image cuts
the tissue boundary, it will be more or less difficult, if not impossible, to localize
the true location of the edge from the image data. Without increasing the spatial
resolution of the data, PVE is not correctable. It can, however, be modeled, and
the model can be incorporated into the design of a segmentation technique. This
is, for instance, the case in fuzzy approaches such as the fuzzy c-means classifiers
described in Section 2.6.
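The mixing of tissue types within a voxel is commonly written as a convex combination of the pure-tissue intensities, which is the model exploited by the fuzzy approaches mentioned above. A minimal sketch of this mixture model (the tissue intensities and mixing fractions below are illustrative, not taken from the chapter):

```python
import numpy as np

# Illustrative pure-tissue intensities (hypothetical values).
i_tissue1, i_tissue2 = 100.0, 200.0

# Fraction of tissue 1 occupying each voxel along a line crossing the
# boundary: pure tissue 1, three partial-volume voxels, pure tissue 2.
alpha = np.array([1.0, 0.75, 0.5, 0.25, 0.0])

# PVE model: a voxel's intensity is the volume-weighted mixture of the
# intensities of the tissues it contains.
profile = alpha * i_tissue1 + (1.0 - alpha) * i_tissue2
```

The resulting profile ramps smoothly between the two tissue intensities, which is exactly the edge blurring visible in Figure 2.1.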
2.2.2 Intensity nonuniformity (INU)

Figure 2.2: An MR image that exhibits intensity nonuniformity in the vertical direction (left)
and the intensity profile along line AB (right).

Figure 2.2 shows an example of an MR image affected by intensity nonuniformity. This figure also shows an intensity profile, taken along the indicated line, which clearly indicates the spatial variation of
the MRI signal.
Correction of intensity nonuniformity is typically based on a multiplicative
model of the artifact:

    v(x) = u(x) f(x).    (2.1)

In this equation, v(x) and u(x) are, respectively, the observed and true (artifact-free)
signal at spatial location x, and f(x) is the distortion factor at the same position.
Noise effects are ignored in this simple model. Using (2.1), the true image intensities are obtained by multiplying the observed image with the reciprocal of the
estimated nonuniformity field f(x).
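In this multiplicative model, correction reduces to a pointwise division by the (estimated) field. A minimal sketch on synthetic data (NumPy assumed; the smooth ramp field is an arbitrary illustration, and the exact field is used in place of an estimate):

```python
import numpy as np

# Synthetic "true" image u(x): two flat tissue classes.
true_img = np.full((64, 64), 100.0)
true_img[:, 32:] = 150.0

# Smooth multiplicative INU field f(x): an illustrative low-frequency ramp.
yy, xx = np.mgrid[0:64, 0:64]
field = 0.8 + 0.4 * xx / 63.0          # varies smoothly from 0.8 to 1.2

# Equation (2.1), noise ignored: observed = true * field.
observed = true_img * field

# Correction: divide by the (here, exactly known) field estimate.
corrected = observed / field
err = float(np.abs(corrected - true_img).max())
```

In practice the field is, of course, unknown and must be estimated by one of the methods discussed below; the sketch only illustrates the forward model and its inversion.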
A number of methods for estimating the INU field (often called bias field)
have been proposed in the literature, based on a) specialized acquisition protocols [7-9], b) images acquired of a homogeneous phantom (the correction matrix
approach) [3, 10-12], and c) analysis of the image data itself [10, 13-22]. Recently,
it has been shown that most of the intensity nonuniformity is caused by the geometry and electromagnetic properties of the subject in the scanner [5, 6], which
implies that the correction matrix approach only has limited applicability. Since a
discussion of MRI physics and acquisition sequences is beyond the scope of this
chapter, we will limit ourselves to the discussion of datadriven methods.
Two main approaches have been proposed: either the nonuniformity field is
estimated from the image data directly, or in conjunction with a segmentation algorithm. A number of authors have proposed spatial filtering (often homomorphic
filtering [23]) as a means to estimate the INU field [10, 14, 16, 17, 19]. The disadvantage of this technique is that the frequency spectrum of the INU field and
that of the true image are assumed to be separable, which is typically not the
case. As such, spatial filtering methods tend to introduce severe, undesirable filtering artifacts, the effects of which some authors have tried to reduce using heuristic
approaches [14, 16, 17, 19].
Dawant and Zijdenbos [15] have proposed a method for MRI brain scans which
estimates the INU field by interpolating, using a thin-plate spline, the intensity values at a collection of manually labeled white matter locations. The authors subsequently proposed a modified, semi-automated version of this algorithm where the
white matter reference locations were identified using an intermediate classification step [22]. Another approach that relies on an initial segmentation of the image
has been proposed by Meyer et al. [18]. In this work, segmentation of the image
into homogeneous regions is obtained using a method known as LCJ segmentation [24, 25], and the field is modeled by a polynomial. Brechbühler et al. [13]
also use a polynomial model of the INU field, but rather than relying on an initial
segmentation of the image, they develop a cost function related to the sharpness
of the histogram peaks. Using taboo search [26], they compute the polynomial
coefficients that minimize this function.
A robust method, called N3 (Nonparametric Nonuniform intensity Normalization) has been proposed by Sled et al. [20]. This method is fully automatic and
does not rely on an explicit segmentation of the image, nor on prior knowledge
of tissue class statistics. Instead, the N3 algorithm relies on the observation that
the INU field widens the range of intensities occupied by one tissue class. This
can be viewed as a blurring of the probability density function (pdf) of the image
(approximated by the image histogram).
In order to correct for intensity nonuniformity, N3 aims to iteratively sharpen
the image histogram by removing spatially smooth fields from the image. Since the
space of possible fields that fulfill this requirement is extremely large, reasonable
field estimates are proposed by smoothing a field of local correction factors derived
from an active histogram sharpening operation:
Algorithm 2.1: N3 algorithm for INU correction

1. Perform a logarithmic transform of the image intensities. This permits writing Equation 2.1 as an additive model, log v(x) = log u(x) + log f(x) (with v the observed signal, u the true signal, and f the field), so that the multiplicative field becomes additive.

2. Iterate: sharpen the histogram of the log image, derive a field of local correction factors from the sharpened histogram, smooth this field with a B-spline approximation, and remove the smoothed field estimate from the image, until the field estimate stabilizes.
Figure 2.3: Example of N3 correction: (a, b) image before correction, (c, d) estimated INU field, (e, f) image after correction.
N3 requires only two parameters: the expected pdf of the INU field and a parameter governing the smoothness of the B-spline field approximation. Sled et
al. [20] have shown that N3 is robust with respect to these parameters, largely because of the iterative nature of the algorithm. Figure 2.3 shows an example of an
MR image suffering from intensity nonuniformity, the N3estimated INU field, and
the corrected image.
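The observation that motivates N3, namely that a smooth multiplicative field widens the intensity distribution of a single tissue class, can be demonstrated directly; the sketch below illustrates only that effect on synthetic data, not the N3 estimator itself (NumPy assumed):

```python
import numpy as np

yy, xx = np.mgrid[0:64, 0:64]
tissue = np.full((64, 64), 100.0)       # one perfectly uniform tissue class
field = 0.9 + 0.2 * xx / 63.0           # smooth INU field, 0.9 to 1.1

# In the log domain the multiplicative field becomes additive, so the
# biased image's (log-)histogram is a blurred copy of the clean one.
clean = np.log(tissue)
biased = np.log(tissue * field)

spread_clean = float(clean.std())       # zero: a single sharp histogram peak
spread_biased = float(biased.std())     # positive: the peak has widened
```

N3 works in the opposite direction: it looks for the smooth field whose removal makes the histogram sharp again.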
Other methods, detailed in sections 2.3.2 and 2.6, have been proposed which
couple the INU field estimation to a segmentation technique. This includes the
Expectation-Maximization (EM) segmenter developed by Wells et al. [21], which
was subsequently improved by Guillemaud et al. [27]; an adapted version of the EM
algorithm by Van Leemput et al. [28]; and the fuzzy clustering method described
by Pham and Prince [29].
The remainder of this chapter describes methods and techniques for the segmentation and the classification of images. First, we discuss methods based on the
analysis of the graylevel histogram and we present some methods that have been
used to both segment the images and correct for the INU field. Next, edge-based
and region-based methods are introduced. This is followed by a section on classification methods in which techniques for simultaneous classification and intensity
correction are also described.
2.3 Thresholding
detection operation and to eliminate from the histogram computation all pixels that
have been labeled as an edge. Another approach [31, 32] is to create a modified
histogram in which each pixel is weighted by a decreasing function of its edge
strength, for instance

    h(g) = Σ_{(i,j): f(i,j) = g} 1 / (1 + e(i,j))^p,    (2.2)

where e(i,j) is the edge magnitude at pixel (i,j). This weighting is designed to
reduce the contribution of pixels close to the edge to the overall histogram. By
varying the value of p, the weight associated with each edge pixel
while building the histogram can be adjusted.
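One plausible reading of this weighting scheme (the exact form of (2.2) varies between authors) assigns each pixel the weight 1/(1 + e)^p, where e is the local edge magnitude; a sketch (NumPy assumed, synthetic data):

```python
import numpy as np

def weighted_histogram(img, edge_mag, p=2.0, nbins=256):
    """Histogram in which each pixel contributes 1 / (1 + e)**p, so pixels
    lying on strong edges (large gradient magnitude e) count less."""
    weights = 1.0 / (1.0 + edge_mag) ** p
    hist, _ = np.histogram(img, bins=nbins, range=(0, 256), weights=weights)
    return hist

# Two flat regions with a single partial-volume column at the border.
img = np.full((32, 32), 64.0)
img[:, 16:] = 192.0
img[:, 15] = 128.0

gy, gx = np.gradient(img)
edge_mag = np.hypot(gx, gy)

h_plain, _ = np.histogram(img, bins=256, range=(0, 256))
h_weighted = weighted_histogram(img, edge_mag)
# The border bin (intensity 128) is strongly suppressed in h_weighted.
```

Suppressing border pixels in this way deepens the valley between the two histogram modes, which makes the subsequent threshold selection easier.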
2.3.2 Optimal thresholding
Algorithm 2.2: Iterative Computation of Intensity Thresholds

1. Select two initial thresholds t1 and t2; these can, for instance, be spaced evenly over the intensity range [0, G], with G the maximum intensity value.

2. Compute the mean intensity of each of the classes defined by the current thresholds, and set each new threshold midway between the means of the two adjacent classes.

3. Repeat step 2 until the thresholds no longer change.
This algorithm is fast and despite its simplicity has good convergence properties, but it is only one example of the many algorithms of this class that have been
proposed over the years. Good reviews and comparisons of several alternatives can
be found in [35] and [36].
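A common single-threshold variant of this iterative scheme, often attributed to Ridler and Calvard, starts from the global mean and repeatedly moves the threshold to the midpoint of the two class means; a minimal sketch (NumPy assumed, synthetic bimodal data):

```python
import numpy as np

def iterative_threshold(img, tol=0.5, max_iter=100):
    """Start from the global mean; repeatedly set the threshold to the
    midpoint of the means of the two classes it defines."""
    t = img.mean()
    for _ in range(max_iter):
        below, above = img[img <= t], img[img > t]
        t_new = 0.5 * (below.mean() + above.mean())
        if abs(t_new - t) < tol:
            return t_new
        t = t_new
    return t

rng = np.random.default_rng(1)
# Bimodal test data: "background" near 50, "object" near 180.
img = np.concatenate([rng.normal(50, 10, 5000), rng.normal(180, 15, 5000)])
t = iterative_threshold(img)
```

For well-separated modes the scheme converges in a handful of iterations to a threshold near the midpoint of the two class means.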
Parametric optimal thresholding
Assuming again a two-class problem, and assuming that the distribution of gray levels for each class can be modeled by a normal distribution with mean μi and variance σi², the overall normalized intensity histogram can be written as the following mixture probability density function:

p(z) = P1 p1(z) + P2 p2(z),  with  pi(z) = (1/(√(2π) σi)) exp(−(z − μi)²/(2σi²)),

with P1 and P2 the a priori probability for class 1 and class 2, respectively. It can be shown [23] that the optimal threshold, i.e., the threshold that minimizes the probability of labeling a pixel pertaining to class 1 as being a pixel pertaining to class 2 and vice versa, is a root of the following quadratic equation:

AT² + BT + C = 0,

with

A = σ1² − σ2²,
B = 2(μ1σ2² − μ2σ1²),
C = σ1²μ2² − σ2²μ1² + 2σ1²σ2² ln(σ2P1/(σ1P2)).

In the case when the two variances can be assumed to be equal, this expression simplifies to

T = (μ1 + μ2)/2 + (σ²/(μ1 − μ2)) ln(P2/P1).

The distribution parameters can also be estimated by minimizing the fit error

E = Σj [p(zj) − h(zj)]²,

in which h and p are the observed and hypothesized histograms, respectively, and in which we have assumed histograms with G gray levels. Unfortunately, carrying out the derivations required to determine the analytical solution of this equation, even in the case of normal distributions, leads to a set of simultaneous transcendental equations that have to be solved numerically. Numerical methods proposed to solve these equations include conjugate gradient, Newton, Levenberg-Marquardt [38], and tree annealing [39]. In [39] this approach was used for the quantification of MR brain images into white matter, gray matter, and CSF, as well as three partial volume classes. These are modeled as mixtures of CSF and gray matter, CSF and white matter, and gray and white matter, respectively.
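Given estimated class parameters, the optimal threshold can be computed directly from the quadratic above. The following sketch is ours (the function name is hypothetical); it assumes the two class means satisfy μ1 < μ2 and picks the root lying between them.

```python
import math

def optimal_threshold(mu1, s1, p1, mu2, s2, p2):
    """Threshold minimizing misclassification between two Gaussian
    classes (means mu1 < mu2, std devs s1, s2, priors p1 + p2 = 1).
    Solves the quadratic A T^2 + B T + C = 0 given in the text, with a
    fall-back to the closed-form expression when the variances match."""
    if abs(s1 - s2) < 1e-12:
        # equal-variance special case
        return (mu1 + mu2) / 2 + (s1 ** 2 / (mu1 - mu2)) * math.log(p2 / p1)
    a = s1 ** 2 - s2 ** 2
    b = 2 * (mu1 * s2 ** 2 - mu2 * s1 ** 2)
    c = (s1 ** 2 * mu2 ** 2 - s2 ** 2 * mu1 ** 2
         + 2 * s1 ** 2 * s2 ** 2 * math.log((s2 * p1) / (s1 * p2)))
    disc = math.sqrt(b ** 2 - 4 * a * c)
    roots = ((-b - disc) / (2 * a), (-b + disc) / (2 * a))
    # keep the root that lies between the two class means
    return next(t for t in roots if mu1 < t < mu2)
```

At the returned threshold the two weighted class densities P1 p1(T) and P2 p2(T) are equal, which is exactly the condition the quadratic encodes.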
Maximum likelihood methods do not try to estimate the distribution parameters by fitting the intensity histogram. Rather, they aim at estimating a set of parameters that maximizes the probability of observing the pixel intensity distribution. Suppose the pixels come from a known number of classes K, each with a parametric density p(zi | k, θk), so that the density of a pixel intensity zi is the mixture

p(zi | Θ) = Σk P(k) p(zi | k, θk),

in which Θ is the complete parameter vector. Assuming independence between pixels, their joint density function is given by

p(z1, ..., zN | Θ) = Πi p(zi | Θ),

with N the total number of pixels. When, as is the case here, the joint density function is viewed as a function of Θ for fixed values of the observations, it is called the likelihood function. The maximum likelihood estimator for the set of parameters is the set of parameters that maximizes this likelihood function or, equivalently, that maximizes the logarithm of this function, called the log-likelihood function. The parameters can thus be obtained by computing the partial derivatives of the log-likelihood function and solving these equations with respect to the parameters. Performing this differentiation and using Bayes rule

P(k | zi, Θ) = P(k) p(zi | k, θk) / p(zi | Θ),    (2.3)
one finds

P(k) = (1/N) Σi P(k | zi, Θ),    (2.4)

μk = Σi P(k | zi, Θ) zi / Σi P(k | zi, Θ),    (2.5)

σk² = Σi P(k | zi, Θ) (zi − μk)² / Σi P(k | zi, Θ).    (2.6)
Figure 2.4: EM algorithm for mixture parameter estimation; histogram computed from simulated data. The curves show the original estimate, the final estimate, and the true distribution (probability versus gray level).
Figures 2.4 and 2.5 show examples of results obtained with the EM algorithm on simulated and real data, respectively. The histogram in the second figure has been obtained from an MR image volume of the brain. The three peaks correspond to CSF, gray matter, and white matter, respectively. In both cases, the dotted line shows the mixture probability density function obtained with the initial parameters. The curves labeled with the diamonds and the crosses are the true distributions and the distributions obtained with the parameters estimated with the EM algorithm, respectively. Note that for the simulated case, the true distribution and the distribution obtained with the EM algorithm are indistinguishable.
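The EM updates of Equations 2.3-2.6 can be applied directly to a gray-level histogram rather than to individual pixels, since the histogram counts act as weights. The sketch below is our own compact illustration (names are ours), assuming a K-class univariate Gaussian mixture.

```python
import numpy as np

def em_mixture(hist, mu, sigma, prior, n_iter=50):
    """EM updates for a K-class Gaussian mixture fitted to a gray-level
    histogram. `hist` holds pixel counts per gray level; mu, sigma, and
    prior are length-K initial estimates."""
    z = np.arange(len(hist), dtype=float)
    w = np.asarray(hist, dtype=float)
    mu, sigma, prior = (np.asarray(v, dtype=float) for v in (mu, sigma, prior))
    for _ in range(n_iter):
        # E-step: posterior P(k | z) via Bayes' rule (Eq. 2.3)
        dens = (np.exp(-0.5 * ((z[None, :] - mu[:, None]) / sigma[:, None]) ** 2)
                / (sigma[:, None] * np.sqrt(2 * np.pi)))
        post = prior[:, None] * dens
        post /= post.sum(axis=0, keepdims=True) + 1e-300
        # M-step: re-estimate priors, means, variances (Eqs. 2.4-2.6)
        n_k = (post * w).sum(axis=1)
        prior = n_k / w.sum()
        mu = (post * w * z).sum(axis=1) / n_k
        sigma = np.sqrt((post * w * (z[None, :] - mu[:, None]) ** 2).sum(axis=1) / n_k)
    return mu, sigma, prior
```

Starting from rough initial estimates, the parameters converge toward the mixture underlying the histogram, as in Figures 2.4 and 2.5.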
2.3.3
Advanced thresholding methods for simultaneous segmentation and INU correction

To model the effect of the INU field, the class-conditional intensity distribution is written as

p(zi | k, βi) = gσk(zi − μk − βi),    (2.7)

with βi the value of the field at location i and gσk a zero-mean Gaussian with standard deviation σk. To capture the a priori knowledge that the field is slowly varying, it is modeled as an M-dimensional (with M the total number of voxels in the image) zero-mean Gaussian probability density function p(β) = N(β; 0, Ψ), with Ψ the covariance matrix of this distribution. The posterior probability of the field given the observed intensity values can be written
Figure 2.5: EM algorithm for mixture parameter estimation; histogram computed from real
MR volume.
as

p(β | z) ∝ p(z | β) p(β).

The INU field is then taken as the one with the largest posterior probability. In practice, however, the covariance matrix of the distribution used to model the field is huge and impractical to compute. The following scheme is introduced to solve the problem. First, the weights Wik are introduced:
Wik = P(k) p(zi | k, βi) / Σm P(m) p(zi | m, βi).    (2.8)
These express the probability that a bias-corrected voxel with gray-level value zi belongs to class k. Next, the mean residual values are computed as

ri = Σk Wik (zi − μk)/σk².    (2.9)

Again, this equation can be best understood if one supposes that the weights are binary, i.e., Wik = 1 if voxel i belongs to class k and zero otherwise. The residual ri is then simply the difference in intensity value between the intensity value of the particular voxel and the mean of its class. If the classification is correct, this difference is a good estimator of the field value at this point. The difference is further divided by the variance of the class to capture the confidence in this
estimator. A residue image R is created by computing the residual value at every voxel. The field is finally obtained by applying a low-pass filter H, derived from the mean variance of the tissue class density functions and the covariance of the field, to the residue image. The algorithm iterates between the computation of the weights, the computation of the residue image, and the estimation of the field until convergence.
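One possible realization of this iteration is sketched below for a 2-D image. It is a rough illustration under stated assumptions, not the published implementation: the class means and variances are assumed known, all names are ours, and a separable box filter stands in for the low-pass operator H derived in the text.

```python
import numpy as np

def correct_inu(image, mu, sigma, prior, n_iter=5, kernel=15):
    """EM-style INU correction sketch: alternate between class
    posteriors for the bias-corrected intensities (weights, Eq. 2.8),
    a variance-weighted mean residual image (Eq. 2.9), and low-pass
    filtering of the residue to obtain a smooth field estimate."""
    mu = np.asarray(mu, dtype=float)
    var = np.asarray(sigma, dtype=float) ** 2
    prior = np.asarray(prior, dtype=float)
    image = np.asarray(image, dtype=float)
    bias = np.zeros_like(image)
    box = np.ones(kernel) / kernel                      # separable box low-pass
    for _ in range(n_iter):
        zc = image - bias                               # bias-corrected intensities
        dens = prior * np.exp(-0.5 * (zc[..., None] - mu) ** 2 / var) / np.sqrt(var)
        w = dens / (dens.sum(axis=-1, keepdims=True) + 1e-300)    # Eq. 2.8
        resid = (w * (image[..., None] - mu) / var).sum(axis=-1)  # Eq. 2.9
        resid /= (w / var).sum(axis=-1)                 # normalize by mean inverse variance
        # low-pass filter the residue image along both axes
        sm = np.apply_along_axis(lambda r: np.convolve(r, box, 'same'), 0, resid)
        bias = np.apply_along_axis(lambda r: np.convolve(r, box, 'same'), 1, sm)
    return image - bias, bias
```

On a piecewise-constant image corrupted by a slowly varying additive field, the recovered field tracks the true one except near the image borders, where the filter support is truncated.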
This approach suffers from two main weaknesses. First, the means and the variances of the various tissue classes are assumed to be known a priori and are not re-estimated from iteration to iteration. Second, in regions where the partial volume effect is important, and when such partial volume regions are not modeled explicitly in the original intensity probability density function, residual values can be erroneously large. This, in turn, affects the accuracy of the field estimator. Guillemaud and Brady [27] have proposed an extension to this algorithm in which the overall intensity probability density function is modeled as a sum of Gaussians with small variances plus a non-Gaussian distribution. They have shown that this approach increases the robustness of the algorithm. But this approach still requires knowing the distribution parameters a priori. These are typically estimated by manually identifying representative voxels for each tissue class in the image. This may result in segmentations that are not fully reproducible.
Van Leemput et al. [28] have proposed a generalization of the EM algorithm that is fully automatic and in which the distribution parameters are re-estimated from iteration to iteration. Rather than modeling the field using a zero-mean Gaussian distribution, they use a parametric model. The field is expressed as a linear combination Σj cj φj of smooth basis functions φj. In their approach, Equation 2.7 is rewritten as follows:

p(zi | k, c) = gσk(zi − μk − Σj cj φj(xi)).

Following the same procedure that was used to derive Equations 2.4-2.6, the
mean and variance parameters for the distributions are computed as:

μk = Σi P(k | zi, Θ, c) (zi − Σj cj φj(xi)) / Σi P(k | zi, Θ, c),    (2.10)

σk² = Σi P(k | zi, Θ, c) (zi − Σj cj φj(xi) − μk)² / Σi P(k | zi, Θ, c),    (2.11)

i.e., as weighted averages over the bias-corrected intensities. The bias field coefficients follow from a weighted least-squares fit,

c = (ΦᵀWΦ)⁻¹ ΦᵀWR,    (2.12)

in which Φ is the matrix holding the basis functions φj evaluated at every voxel, W is a diagonal matrix of weights derived from the class posteriors and the class variances, and R is the vector of residuals between the observed intensities and the means of the classes to which the voxels are assigned.
The weights and the residual matrix R are similar to those used by Wells et al. Here, the field is computed by fitting a polynomial to the residual image using a weighted least-squares fit (Equation 2.12). The weights used in this equation are inversely proportional to the weighted variance of the tissue distributions. If a pixel is classified into a class with large variance, its contribution to the fit is reduced. Conversely, pixels that have been classified into classes with small variances, such as white matter or gray matter, will have a large impact on the values of the coefficients.
Algorithm 2.5: EM algorithm for tissue class parameters and bias field estimation

1. Compute the a posteriori probability P(k | zi, Θ, c) using estimated values for the parameter vector Θ and the bias field coefficient vector c (at the first iteration, an estimate for the parameter vector can be obtained, for instance, from the histogram using a shape-based technique, and the field can be assumed to be uniform).

2. Compute the parameter vector Θ using Equations 2.10 and 2.11 and the a posteriori probabilities computed in step 1.

3. Using Equation 2.12 and the estimators computed in step 2, compute the bias field coefficient vector c.

4. If the parameter and coefficient vectors did not change from one iteration to the next, stop; otherwise, go to step 1.
Extensions of this formalism to multimodal images and corrections for inter-slice intensity variations in 2D MR image acquisition sequences have been presented in [28]. These authors also propose modeling some of the classes in the image with non-Gaussian distributions and using a priori class information derived from an atlas (see Chapter 17 for more information on this topic). This procedure allows them to use both intensity information and spatial information in the segmentation process.
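The weighted least-squares fit that plays the role of Equation 2.12 can be sketched compactly. The choice of a low-order 2-D polynomial basis and all names below are ours; the weight image would come from the class posteriors and variances as described above.

```python
import numpy as np

def fit_bias_polynomial(resid, weights, order=2):
    """Weighted least-squares fit of a 2-D polynomial bias field to a
    residue image: c = (Phi^T W Phi)^-1 Phi^T W r, with Phi holding the
    polynomial basis functions evaluated at every pixel."""
    h, wd = resid.shape
    y, x = np.mgrid[0:h, 0:wd]
    x = x.ravel() / wd                                  # normalized coordinates
    y = y.ravel() / h
    cols = [x ** i * y ** j for i in range(order + 1)
            for j in range(order + 1 - i)]              # basis functions phi_j
    Phi = np.stack(cols, axis=1)
    W = weights.ravel()
    PhiTW = Phi.T * W                                   # Phi^T W without forming diag(W)
    coeff = np.linalg.solve(PhiTW @ Phi, PhiTW @ resid.ravel())
    return (Phi @ coeff).reshape(h, wd), coeff
```

Normalizing the coordinates keeps the normal equations well conditioned; a residue image that is itself polynomial of the chosen order is reproduced exactly.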
2.4
Edge-based techniques

2.4.1
Border tracing
(4- or 8-connected) for which the following inequalities hold:

|e(xi) − e(xj)| ≤ Tmag,  (φ(xi) − φ(xj)) mod 2π ≤ Tdir,

i.e., neighboring edge pixels are linked when their edge magnitudes e and edge directions φ are sufficiently similar.
2.4.2
Graph searching
Figure 2.6: Left panel: edge map (magnitude and direction); right panel: directed graph derived from the edge map.
costs, this algorithm does require an estimate of the remaining distance to the end node. For boundary detection, such an estimate can be computed as the difference between the current length of the boundary and its expected total length. Defining the cost of a node ni along a particular path n1, ..., ni as

g(ni) = Σj c(nj, nj+1),

with c(nj, nj+1) combining the local and transition costs, the total cost at a particular node can be written as f(ni) = g(ni) + h(ni), with h(ni) the lower bound estimate. The A* algorithm can then be described as follows.
Algorithm 2.6: Algorithm for optimal path search with lower bound estimate (adapted from [43])

1. Select the starting node, expand it, put all its successors on an OPEN list, and set a pointer back from each node to its predecessor.

2. If no node is on the OPEN list, stop; otherwise continue.

3. From the OPEN list, select the node n with the smallest cost f(n). Remove it from the OPEN list, and mark it CLOSED.

4. If n is a goal node, backtrack from this node to the start node following the pointers to find the optimum path, and stop.

5. Expand n, generating all of its successors.

6. If a successor n′ is not CLOSED or on the OPEN list, set g(n′) = g(n) + c(n, n′), put it on the OPEN list, and set a pointer to its predecessor.

7. If a successor n′ is already on the OPEN list or CLOSED, update it using g(n′) = min(g(n′), g(n) + c(n, n′)), mark OPEN the CLOSED nodes whose cost was lowered, and redirect the pointers of the nodes for which the cost was lowered to n. Go to step 2.
In general this algorithm does not lead to an optimal path, but if h(n) is truly a lower bound estimate on the cost from node n to an end node, the path is optimal [44].
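The steps of Algorithm 2.6 can be sketched on a 2-D grid of local costs. This is our own minimal illustration (names are ours): the lower bound h(n) is the Manhattan distance to the goal, which is admissible here because every step costs at least one.

```python
import heapq

def a_star(cost, start, goal):
    """A* search over a 2-D grid of per-cell costs (Algorithm 2.6
    style); returns the optimal path and its cost g(goal)."""
    rows, cols = len(cost), len(cost[0])
    h = lambda n: abs(n[0] - goal[0]) + abs(n[1] - goal[1])   # lower bound
    g = {start: 0}
    parent = {}
    open_list = [(h(start), start)]
    closed = set()
    while open_list:
        _, n = heapq.heappop(open_list)        # node with smallest f = g + h
        if n in closed:
            continue                            # stale OPEN entry
        closed.add(n)
        if n == goal:                           # backtrack along the pointers
            path = [n]
            while n in parent:
                n = parent[n]
                path.append(n)
            return path[::-1], g[goal]
        r, c = n
        for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nb[0] < rows and 0 <= nb[1] < cols:
                new_g = g[n] + cost[nb[0]][nb[1]]   # cost of entering the cell
                if new_g < g.get(nb, float('inf')):
                    g[nb] = new_g
                    parent[nb] = n
                    heapq.heappush(open_list, (new_g + h(nb), nb))
    return None, float('inf')
```

Because duplicate heap entries are simply skipped when a node is already CLOSED, no explicit re-opening step is needed in this variant.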
2.4.3
Dynamic programming
Figure 2.7: Left panel: static cost matrix (small number indicates highly likely edge location); middle panel: cumulative cost matrix; right panel: pointer array and optimum path (shaded circles).
the figure. Entries in this matrix simply point to the node from which the optimum
path reaching a particular node originates. The optimum path is thus determined
by starting at the end node with the lowest value and following the pointers back to
the first node.
In this example, the search for an optimal path is greatly simplified because
the boundary is elongated and the search is, in fact, only a 1D search performed
on the columns of the matrix for each row. Typical applications for this technique
include the detection of vessels in medical images. But, dynamic programming has
also been used successfully for the detection of closed contours. The only thing required to do so is to apply a spatial transformation (such as a polar transformation
as proposed by Gerbrands [46] for the detection of the ventricle in scintigraphic
images) to the image prior to boundary detection. The purpose of this geometric
transformation is to transform a 2D search problem into a 1D problem. Figure 2.8
illustrates a possible approach applied to the detection of the brain in MR images.
In this case, an approximate contour of the brain can be obtained either by manual delineation or by applying a series of image processing operators [30]. Lines
perpendicular to this first approximation are computed (second panel) and the transformed image (third panel) is created row by row by interpolating intensity values
in the original image along each perpendicular line. The optimum path is computed
in the transformed image, and mapped back to the original image (fourth panel).
To create a closed contour, the first row of the transformed matrix is copied to the
bottom. The last point in the path is then forced to be the same as the first one.
If the starting node is not known, each of the voxels in the first row is chosen as a starting point and a closed contour is computed for each of these. The closed contour with the lowest overall cost is then retained.
Figure 2.8: Example of a geometric transform: a) the original image with an approximation
of the true boundary; b) perpendicular lines computed along the original contour; c) spatially
transformed image; d) optimum contour.
2D dynamic programming has also been proposed [49, 50] for rapid semi-automatic segmentation. In this approach, called live-wire, a starting point is specified and the cumulative cost and pointer matrices are computed for every pixel in the search region. The user clicks on one pixel located on a boundary of interest, and the entire boundary between this point and the starting point is returned as the minimum-cost path between these points. Later, a best-first graph search method was used to speed up the process [51].
Graph searching has also been used to detect borders in sequences of images [52], despite the fact that this technique does not extend readily from 2D contour detection to 3D surface detection. The problem was solved by transforming the sequence of images into a data structure suitable for graph searching. But the complexity of the algorithm was such that, in practice, optimal paths could not be computed. Heuristics were used to find a suboptimal, yet satisfactory, solution.
Another suboptimal, yet efficient, method for surface detection has been proposed by Frank [53], a good description of which can be found in [54]. The algorithm works in two passes. First, a 3D cumulative cost matrix that has the same dimensions as the original image is created. For a plane-like surface (i.e., a surface that is not closed) this is done using a method referred to as surface growing. The image is traversed in a fixed coordinate order and partial costs are accumulated from voxel to voxel. Once the cost matrix is created, it is traversed in reverse order. The surface is computed by choosing, for each voxel, predecessors that have minimum cost and meet connectivity constraints. The method has been applied, for instance, to the detection of the mid-sagittal plane in MR images. This technique has also been used for the detection of surfaces with cylindrical shapes, such as the arterial lumen in ultrasound images.
A good source for C code implementations of several of the algorithms discussed in this section, as well as in subsequent sections, is the book by Pitas [55].
2.4.5
Hough transforms
The Hough transform [56] permits the detection of parametric curves (e.g., circles, straight lines, ellipses, spheres, ellipsoids, or more complex shapes) in a binary image produced by thresholding the output of an edge detector operator. Its major strength is its ability to detect object boundaries even when low-level edge detector operators produce sparse edge maps. The Hough transform will be introduced first for the simplest case involving the detection of straight lines in a binary image. To do so, define the parametric representation of a line in the image as y = ax + b. In the parameter space (a, b), any straight line in image space is represented by a single point. Any line that passes through a point (x1, y1) in image space corresponds to the line b = −ax1 + y1 in parameter space. If two points are collinear in image space and located on the line y = a0x + b0, their corresponding lines in parameter space intersect at (a0, b0). This is illustrated in Figure 2.9 and it suggests a very
Figure 2.9: Left panel: straight line in the image space; right panel: loci in parameter space of the straight lines passing through two image points.
simple procedure to detect straight lines in an image. First, discretize the parameter space (a, b) and create a two-dimensional accumulator array. Each dimension in this array corresponds to one of the parameters. For every "on" pixel (x, y) in the binary image (i.e., every pixel that has been retained as a possible boundary pixel), compute b = −ax + y for every value of the discrete parameter a and increment the value of the entry (a, b) in the accumulator array by one. At the end of the procedure, the count in each entry A(a, b) of the accumulator array corresponds to the number of points lying on the straight line y = ax + b. The accumulator array is then thresholded above a predefined value to detect lines above a minimum length and to eliminate spurious line segments.
Although convenient for explanation purposes, the parametric model used before is inadequate to represent vertical lines, a case for which a approaches infinity. To address this problem, the normal representation of a line can be used:

ρ = x cos θ + y sin θ.

This equation describes a line having orientation θ at a distance ρ from the origin, as shown on the left panel of Figure 2.10. Here, a line passing through the point (x1, y1) in the image corresponds to a sinusoidal curve ρ = x1 cos θ + y1 sin θ in the (ρ, θ) parameter space. Points located on the line passing through (x1, y1) and (x2, y2) in the original image are located at the intersection of the two sinusoidal curves ρ = x1 cos θ + y1 sin θ and ρ = x2 cos θ + y2 sin θ in the parameter space. The right panel of Figure 2.10 shows three sinusoidal curves corresponding to three points P1, P2, and P3 located on the line shown on the left panel of this figure. The intersection of the sinusoidal curves is located at (θ0, ρ0). The angle θ ranges from 0 to π, measured from the x axis. The parameter ρ varies from −√(M² + N²) to √(M² + N²), with M and N the dimensions of the image. Negative values of ρ correspond to lines with a negative intercept while positive values of ρ correspond to lines with a positive intercept.
The Hough transform can be used for the detection of more complex parametric curves.

Figure 2.10: Left panel: straight line in the image space; right panel: loci of the sinusoidal curves corresponding to P1, P2, and P3 in the parameter space.

For example, the detection of a circle

(x − a)² + (y − b)² = r²    (2.13)

needs three parameters: the radius r and the coordinates (a, b) of its center (a sphere
would require four parameters). Figure 2.11 illustrates how the Hough transform
works in this case. In both panels, the dotted line is the circle with radius r0, centered at (a0, b0), that is to be detected in the image. For a fixed radius and for an edge pixel (x, y), the locus of points in parameter space is a circle centered at (x, y), i.e.,

a = x − r cos θ,  b = y − r sin θ.    (2.14)

The left panel in the figure shows the loci of points (solid lines) in parameter space for a number of edge pixels when the radius r is chosen smaller than the true radius r0. The right panel shows the loci of points in parameter space for four edge pixels when the radius r is equal to r0. In this case all the circles intersect at (a0, b0). The accumulator cell at (a0, b0, r0) will thus be larger than any other accumulator cell in the array.
The amount of computation required to build the accumulator array can be greatly reduced if the direction of the edge can also be obtained from the edge detector operator [55, 57]. Consider again the problem of detecting a circle in the image. If the edge direction of an edge pixel is known, this edge pixel can only be part of one of the two circles of a given radius that are tangent to the edge. For a
Figure 2.11: Left panel: loci of the parameters a and b for a fixed radius smaller than r0; right panel: loci for a radius equal to r0.
given r, these correspond to only two points in parameter space. This would require incrementing only two cells for each radius, i.e., limiting the value of θ in Eq. (2.14) to the two values φ and φ + π, with φ the direction of the edge. In practice, however, the accurate estimation of an edge direction is difficult and several accumulator cells are incremented. This is done by allowing φ to vary within an interval, the width of which is dictated by the reliability of the edge direction estimator. The Hough transform can be generalized further for any curve with a known parametric expression using the same procedure. It should be noted, however, that the rapid increase in the size of the accumulator arrays limits its use to curves with only a few parameters.
Often, the parametric representation of a shape of interest is not known. In this case, the generalized Hough transform [57, 58] can be used. This technique builds a parametric representation of a structure boundary from examples, and it permits the detection of this boundary, possibly rotated and scaled, in new images. The Hough transform has been and is being used as part of segmentation procedures in a wide variety of applications, such as the detection of the longitudinal fissure in tomographic head images [59], the registration of sequences of retinal images [60], the detection of the left ventricle boundary in echocardiographic images [61], the classification of parenchymal patterns in mammograms [62], and the tracking of guide wires in the coronary arteries in x-ray images [63]. A good description and comparison of different varieties of the Hough transform can be found in [64]. Several implementations of the generalized Hough transform are compared in [65].
2.5
Region-based segmentation
Region-based techniques segment the image I into regions Ri based on some homogeneity property. This process can be formally described as follows:

I = ∪i=1..S Ri,
Ri ∩ Rj = ∅ for i ≠ j,
P(Ri) = TRUE for i = 1, ..., S,
P(Ri ∪ Rj) = FALSE for i ≠ j, with Ri adjacent to Rj,

in which P is a logical predicate. These equations state that the regions R1, ..., RS need to cover the entire image and that two regions are disjoint sets. The predicate captures the set of conditions that must be satisfied by every pixel in a region, usually homogeneity criteria such as average intensity value in the region, texture, or color. The last equation states that adjacent regions Ri and Rj are different, according to the set of rules expressed by the predicate P. Region-based segmentation algorithms fall into one of the following broad categories: region growing, region splitting, and split-and-merge.
2.5.1
Region growing
The simplest region-based segmentation algorithms start with at least one seed (a starting point) per region. Neighbors of the seed are visited, and the neighbors that satisfy the predicate (a simple predicate compares the intensity value of the pixel to the average intensity value of the region) are added to the region. Pixels that satisfy the predicate of more than one region are allocated to one of these arbitrarily. A good seeded region growing algorithm is the one proposed by Adams and Bischof [66]. Suppose there are n regions R1, ..., Rn. After m steps of the algorithm, the set of all pixels that have not yet been allocated to any region and which are neighbors of at least one of the regions that have been created is

H = { x ∉ ∪i Ri : N(x) ∩ ∪i Ri ≠ ∅ },

with N(x) the set of immediate neighbors of pixel x. These pixels are kept in a sequentially sorted list (SSL), ordered by the difference between their intensity and the mean of the adjacent region.
Algorithm 2.7: Seeded region growing (SRG) algorithm for region segmentation

1. Label seed points using a manual or automatic method.

2. Put the neighbors of the seed points in the SSL.

3. Remove the first pixel x from the top of the SSL.

4. Test the neighbors of x. If all neighbors of x that are already labeled have the same label, assign this label to x, update the statistics of the corresponding region, and add the neighbors of x that are not yet labeled to the SSL according to their value. Else, label x with the boundary label.

5. If the SSL is not empty, go to 3; otherwise stop.
In this implementation, pixels that touch more than one region are labeled as
boundary pixels. This information can be used for display purposes or for contour
detection. It should be noted that this algorithm does not require parameter adjustments and that it can easily be extended to 3D. Despite its simplicity, it was found
to be robust and reliable for applications such as the extraction of the brain in 3D
MR image volumes [67].
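A compact 2-D rendering of Algorithm 2.7 is sketched below. This is our own simplified version, not the reference implementation of [66]: names are ours, a heap serves as the SSL, and region statistics are a running mean.

```python
import heapq
import numpy as np

def seeded_region_growing(image, seeds):
    """Simplified SRG. `seeds` maps a positive label to a (row, col)
    seed. The SSL is a priority queue ordered by the difference
    between a pixel's intensity and the mean of the adjacent region."""
    img = np.asarray(image, dtype=float)
    labels = np.zeros(img.shape, dtype=int)          # 0 = unlabeled
    BOUNDARY = -1
    stats = {}                                        # label -> [sum, count]
    ssl = []

    def push_neighbors(r, c):
        lab = labels[r, c]
        mean = stats[lab][0] / stats[lab][1]
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                    and labels[nr, nc] == 0):
                heapq.heappush(ssl, (abs(img[nr, nc] - mean), nr, nc))

    for lab, (r, c) in seeds.items():
        labels[r, c] = lab
        stats[lab] = [img[r, c], 1]
    for lab, (r, c) in seeds.items():
        push_neighbors(r, c)
    while ssl:
        _, r, c = heapq.heappop(ssl)
        if labels[r, c] != 0:
            continue                                  # already decided
        neigh = {labels[nr, nc]
                 for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                 if 0 <= nr < img.shape[0] and 0 <= nc < img.shape[1]
                 and labels[nr, nc] > 0}
        if len(neigh) == 1:                           # unambiguous: join the region
            lab = neigh.pop()
            labels[r, c] = lab
            stats[lab][0] += img[r, c]
            stats[lab][1] += 1
            push_neighbors(r, c)
        else:                                         # touches several regions
            labels[r, c] = BOUNDARY
    return labels
```

As in the text, pixels touching more than one region receive the boundary label, and the only inputs are the seeds themselves.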
2.5.2
Region splitting methods take the opposite approach to region growing. These methods start from the entire image. If it does not meet the homogeneity criteria, it is split into 4 subimages (or 8 in 3D). This procedure is applied recursively on each subimage until every subimage meets the uniformity criteria. When this is done, the image can be represented as a quadtree, which is a data structure in which each parent node has four children (in 3D each parent node has eight children and the structure is called an octree). These structures can be used for efficient storage and comparison between images [68]. The main drawback of the region splitting approach is that the final image partition may contain adjacent regions with identical properties. The simplest way to address this issue is to add a merging step to the region splitting algorithm, leading to a split-and-merge approach [69]. One possibility is to first split an inhomogeneous region until homogeneous regions are created. When a homogeneous region is created, its neighboring regions are checked and the newly created region is merged with an existing one if they have identical properties. If the similarity criteria are met by more than one adjacent region, the new region is merged with the most similar one. This procedure
and left neighbors have different labels, note that these labels are equivalent and assign one of them to the pixel. The same scheme can be used for 8-connected neighborhoods if the two upper diagonal neighbors are also examined. After the image has been scanned, a single label is assigned to each set of equivalent classes. This can be done efficiently by computing the transitive closure of the binary matrix capturing the class equivalence information [23]. The transitive closure of this matrix is itself a C × C matrix, where C is the number of classes. The elements of this matrix are one if the corresponding classes are equivalent and zero otherwise. The blob coloring algorithm can be extended to 3D [70]; this is done by first labeling pixels in 2D, and subsequently relabeling them by identifying equivalent classes along the third dimension.

Labeling objects in an image is an important step for image interpretation, and the topic has been studied extensively. In addition to the basic methods described here, a large number of algorithms have been proposed over the years. These include, among others, approaches based on split-and-merge algorithms (see for instance [71]), parallel algorithms designed for multiprocessor machines [72], and algorithms based on the watershed transformation (see Chapter 4) [73].
2.6
Classification
Figure 2.12: Example MRI images. From top to bottom: T1-weighted, T2-weighted, and proton-density-weighted MRI acquisitions of the same subject, in, from left to right, transverse, sagittal, and coronal cross-sections.
Figure 2.13: Two-dimensional histogram (left) and scatter plot (right) of a multimodal T1/T2-weighted MRI scan. A (linear) decision boundary that roughly discriminates between brain parenchyma and CSF is also shown.
Figure 2.14: 3-feature, 4-class classification of the example image volumes shown in Figure 2.12. The four classes are background (black), white matter (white), gray matter (light gray), and CSF (dark gray). The image set was classified using a supervised artificial neural network classifier.
Parallelepiped

The parallelepiped classifier [80, 82] is essentially a multidimensional thresholding technique: the user specifies lower and upper bounds on each feature value, for each class. This is usually done based on the class mean and variance in each dimension, parameters which can be estimated from the sampling set provided by the user. In this case, given the estimated means μkj and standard deviations σkj for class k and modality j, a data vector x is assigned to class ck if, for every modality j,

μkj − λσkj ≤ xj    (2.15)

and

xj ≤ μkj + λσkj,    (2.16)

with λ a user-selected constant.
The class mean vectors are calculated from the set of data samples. This classifier is essentially one pass of the k-means algorithm (see below), where the initial
Figure 2.15: Parallelepiped classifier. The left panel shows all data points, including the ones labeled by the user (indicated by a dot). The right panel shows the cluster centroids and the decision regions (based on the sampled points). Four points are left unclassified.
cluster centroids are determined from a set of user-supplied samples. Figure 2.16 shows the minimum distance class assignments of the data points shown in Figure 2.15.
One should be aware that the use of a Euclidean distance measure favors hyperspherical clusters of approximately the same size, conditions that may not be satisfied in practical situations. Without loss of generality, the Euclidean distance can be replaced with another distance measure, such as the Mahalanobis distance [76, 78], which favors a hyperellipsoidal cluster geometry:

d(x, μk) = (x − μk)ᵀ Σk⁻¹ (x − μk),    (2.17)

where Σk is the covariance matrix of the multivariate normal density function, estimated from the data points.
k-means and ISODATA

Common clustering methods are formulated around a criterion function or objective function that is used to express the quality of the clusters. A variety of criterion functions can be designed to fit a particular problem, but a popular function is the sum-of-squared-error criterion [75], also denoted J [76, 78]:

J = Σi=1..C Σj=1..n uij d(xj, vi),    (2.18)
Figure 2.16: Minimum distance classifier. Left: data points and the user-labeled samples; right: membership assignments based on the calculated cluster centroids.
where C and n are the number of clusters and data points, respectively, and the parameters uij, vi, xj, and d are described below. The vector vi is the cluster centroid for cluster Xi, calculated from:

vi = Σj uij xj / Σj uij.    (2.19)

The matrix U partitions the n data vectors into the C clusters:

U = [uij], i = 1, ..., C, j = 1, ..., n,    (2.20)

with

uij ∈ {0, 1} and Σi=1..C uij = 1 for every j,    (2.21)

where the columns correspond to the n data vectors xj, and each row corresponds to one of the clusters Xi. d(xj, vi) is a distance, or similarity, measure between xj and vi. The generalization of U for fuzzy membership functions is described in the section about the fuzzy c-means algorithm below.
Using the Euclidean distance as a distance measure, we have:

d(xj, vi) = ||xj − vi||²,    (2.22)

so that

J = Σi=1..C Σj=1..n uij ||xj − vi||².    (2.23)
Here again, as for the minimum distance classifier, it should be noted that the Euclidean distance measure, which favors a hyperspherical cluster geometry, can be replaced by a Mahalanobis distance measure favoring hyperellipsoidal clusters. A common way to minimize J is the k-means, also called hard c-means [75], clustering algorithm. This is an iterative approach in which, starting from an initial estimate of the cluster centers, each data point is assigned to the nearest centroid and the centroids are then recomputed as the means of their assigned points, until the assignments no longer change. Weaknesses of this method are its sensitivity to the locations of the initial cluster centroids and to the choice of a distance measure. The initial cluster centroids are important because the k-means algorithm is not guaranteed to find a global minimum of J. Instead, it tends to find a local minimum that is close to the initial cluster centroids.
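The alternating assignment and update steps of k-means can be sketched as follows; this minimal version (names are ours) uses the Euclidean distance of Equation 2.22.

```python
import numpy as np

def kmeans(X, centroids, n_iter=100):
    """Hard k-means minimizing the sum-of-squared-error criterion J:
    alternate nearest-centroid assignment and centroid re-estimation."""
    C = np.asarray(centroids, dtype=float)
    X = np.asarray(X, dtype=float)
    for _ in range(n_iter):
        # assignment step: u_ij = 1 for the nearest centroid
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        assign = d.argmin(axis=1)
        # update step: each centroid moves to the mean of its members
        newC = np.array([X[assign == i].mean(axis=0) if (assign == i).any()
                         else C[i] for i in range(len(C))])
        if np.allclose(newC, C):
            break                                    # assignments stable
        C = newC
    return C, assign
```

The dependence on the initial centroids noted in the text is visible here: the loop only ever refines the starting configuration, so a poor initialization can settle in a poor local minimum of J.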
ISODATA, essentially an adaptation of k-means, differs from the k-means algorithm in that the number of clusters is not fixed: clusters are split or merged if certain conditions are met. These conditions are based on distance, the number of patterns in a cluster, and within-cluster and between-cluster variance measures.
(2.24)

(2.25)
"
"
" /
"
(2.26)
where " is the conditional probability density function of given class " ,
and " is the a priori probability for class " . In this context it is equivalent to
take the logarithm of " " on both sides of Equation 2.26, resulting in the
decision functions
,
"
"
(2.27)
The Bayes classifier is often used when it is reasonable to assume that the conditional probability density functions p(x \mid \omega_j) are multivariate Gaussian. In this case
Equation 2.27 can be rewritten as:

    d_j(x) = \ln P(\omega_j) - \frac{1}{2} \ln |\Sigma_j|
             - \frac{1}{2} (x - \mu_j)^T \Sigma_j^{-1} (x - \mu_j),

with \mu_j and \Sigma_j the mean vector and covariance matrix for class \omega_j, respectively. Since
the multivariate Gaussian probability density function is described completely by
its mean vector and covariance matrix, it is often referred to as a parametric density
function. All that is required to perform the classification is to estimate the mean
vector and covariance matrix for each class using the following equations:

    \mu_j = \frac{1}{n_j} \sum_{x \in \omega_j} x,                                (2.28)

    \Sigma_j = \frac{1}{n_j} \sum_{x \in \omega_j} (x - \mu_j)(x - \mu_j)^T,      (2.29)

where n_j is the number of training samples for class \omega_j. A pattern is assigned to the class for
which the value of the decision function is the largest. If a priori statistical information is available on the mean and covariance matrices (it could, for instance,
be known that the mean vectors are themselves distributed normally with known
mean and covariance matrices), iterative procedures can be designed to refine these
estimates using the training data, a procedure known as Bayesian learning [76].
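A minimal sketch of this parametric classifier, assuming Gaussian class-conditional densities as above (the function names are illustrative, not from the chapter):

```python
import numpy as np

def fit_gaussian_classes(X, y):
    """Estimate per-class mean vector and covariance matrix (Eqs. 2.28-2.29)
    plus a priori probabilities from class frequencies."""
    params, priors = {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        params[c] = (Xc.mean(axis=0), np.cov(Xc, rowvar=False))
        priors[c] = len(Xc) / len(X)
    return params, priors

def bayes_decision(x, params, priors):
    """Evaluate the log decision function
    d_j(x) = ln P(w_j) - 0.5 ln|Sigma_j| - 0.5 (x-mu_j)^T Sigma_j^{-1} (x-mu_j)
    for every class and return the class with the largest value."""
    best, best_d = None, -np.inf
    for c, (mu, S) in params.items():
        diff = x - mu
        d = (np.log(priors[c]) - 0.5 * np.log(np.linalg.det(S))
             - 0.5 * diff @ np.linalg.inv(S) @ diff)
        if d > best_d:
            best, best_d = c, d
    return best
```

In practice the covariance matrix may be near-singular for small training sets, in which case a regularized or pooled estimate is substituted.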
Parzen window, k-nearest neighbor

Rather than estimating parametric density functions, the Parzen window and
k-nearest neighbor methods are used to estimate nonparametric density functions
from the data [76–78]. Formulated as a classification technique, the Parzen window
method can be used to directly estimate the a posteriori probabilities P(\omega_j \mid x) from
the given samples. In effect, x is labeled with \omega_j if a majority of the samples in a
hypercubic volume V centered around x has been labeled \omega_j. In this approach,
the volume V is often chosen as a specific function of the total number of samples
n, such as V = 1/\sqrt{n}.

The k-nearest neighbor classifier is very similar to the Parzen window classifier,
only in this case the majority vote for the class labeling of x is taken over the k
nearest neighbors (according to a predefined, usually Euclidean, distance measure)
of x in the sampling set, rather than in a fixed volume of the feature space centered
around x. See [76] for details.
The advantage of these nonparametric classifiers over parametric approaches
such as the Bayes classifier, is that no assumptions about the shape of the class
probability density functions are made. The disadvantage is that the error rate of
the classifier strongly depends on the number of data samples that are provided.
The more samples that are available, the lower the classification error rate will be.
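The k-nearest-neighbor vote can be sketched in a few lines of NumPy (names are illustrative):

```python
import numpy as np

def knn_classify(x, X_train, y_train, k=3):
    """k-nearest-neighbor rule: majority vote among the labels of the k
    training samples closest to x in Euclidean distance."""
    d = np.linalg.norm(X_train - x, axis=1)     # distance to every sample
    nearest = y_train[np.argsort(d)[:k]]        # labels of the k closest
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[counts.argmax()]                # majority label
```

Note that the full training set is scanned for every query, which reflects the dependence on sample count discussed above; tree- or hash-based neighbor searches are commonly used to speed this up.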
2.6.2
Pham and Prince [29] have developed an adaptive fuzzy c-means (AFCM) algorithm that incorporates estimation of the INU field. In order to achieve this, the
distance measure in the FCM objective function is modified to include a multiplicative gain (bias) field g:

    d(y_i, v_k) = \| y_i - g_i v_k \|^2,                                          (2.30)

and regularization terms are added to the objective function:

    J = \sum_{i=1}^{n} \sum_{k=1}^{K} (u_{ki})^q \, \| y_i - g_i v_k \|^2
        + \lambda_1 \sum_{i=1}^{n} \sum_{r=1}^{R} \big( (D_r * g)_i \big)^2
        + \lambda_2 \sum_{i=1}^{n} \sum_{r=1}^{R} \sum_{s=1}^{R}
          \big( (D_r * D_s * g)_i \big)^2,                                        (2.31)

where g is the unknown INU field to be estimated, R is the number of spatial dimensions
in the images, and D_r is a known finite difference operator along the rth image
dimension. The notation D_r * g refers to the convolution of g with difference
kernel D_r, which effectively acts as a derivative operation. Note that for simplicity
reasons the INU field is assumed to be scalar, i.e., the same for each image feature.
The last two terms in Equation 2.31 are the first and second order regularization
terms that force the bias field g to be spatially smooth and slowly varying. With
\lambda_1 = \lambda_2 = 0, i.e., without these regularization terms, one could always find a
bias field that results in J = 0. When \lambda_1 and \lambda_2 are set sufficiently large,
the estimated bias field becomes very smooth and slowly varying.
Decision trees
Decision tree classifiers, originating from the field of artificial intelligence, represent classification rules in the form of a symbolic decision tree, where each node
describes a test on a feature (attribute) value and the branches leaving that node
represent the possible outcomes of the test. The leaves at the bottom of the tree
correspond to the various classes. A decision tree is a supervised classifier; the tree
is constructed based on (induced from) a set of known data samples. The advantage of a decision tree classifier over other types of classifiers is that it results in a
description of the classification process as a set of rules, which are relatively easy
to interpret and may provide more insight into the structure of the data set. The
description of decision trees given here is based on the ID3 algorithm [83, 84], a
member of the TDIDT (Top-Down Induction of Decision Trees) family. Although
a variety of other decision tree algorithms exist, the discussion of ID3 is illustrative
because many algorithms are based on it, and because it has been used in medical
imaging applications [85].
Initially, the tree starts as a single node, containing all training samples. Then,
the tree is iteratively grown by adding branches at every step, until all training
samples contained in each node belong to the same class and no more branches can
be grown. Central to this algorithm is an entropy measure, used to select the most
discriminating feature to partition the data samples at a given node. The entropy
measure used in the ID3 algorithm is defined as [83]:
    \text{Entropy} = - \sum_{c=1}^{C} p_c \log_2 p_c,                            (2.32)

    w_b = \frac{\#\text{ samples in branch } b}
               {\#\text{ samples at the parent node}},                            (2.33)

    E = \sum_{b} w_b \, \text{Entropy}_b,                                        (2.34)

where p_c is the probability of class c in the branch, estimated from the number
of class-c training samples in the branch, w_b weights each branch b of a candidate
partitioning, and E is the resulting weighted entropy of the partitioning.
from more than one class, that node must be expanded into a subtree. To do this,
all possible expansions (tests on a feature) of the node into branches are examined
and their entropy values calculated. Then, the test on the feature yielding the least
entropy in the resulting partitioning of the data samples is associated with that node
and used to create new branches.
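The feature selection step of Equations 2.32–2.34 can be sketched as follows, assuming categorical feature values (function names are illustrative):

```python
import numpy as np

def entropy(labels):
    """Entropy = -sum_c p_c log2 p_c, with p_c estimated from class counts."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_split(X, y):
    """Examine a test on every (categorical) feature and return the feature
    whose partitioning of the samples has the lowest weighted branch entropy."""
    best_f, best_e = None, np.inf
    for f in range(X.shape[1]):
        e = 0.0
        for val in np.unique(X[:, f]):
            branch = y[X[:, f] == val]
            e += len(branch) / len(y) * entropy(branch)  # w_b * Entropy_b
        if e < best_e:
            best_f, best_e = f, e
    return best_f, best_e
```

A node whose best split already has zero entropy in every branch becomes a set of pure leaves; otherwise the branches are expanded recursively.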
2.6.4
Figure 2.17: Feedforward ANN topology. Note that the number of nodes may differ between
layers; the superscript indicating the layer number is omitted where it is clear from the context.
The operation of the feedforward network can be written compactly as

    o^{(l)} = f\big( W^{(l)} o^{(l-1)} \big), \qquad l = 1, 2, \ldots, L,         (2.35)

where the superscript in parentheses indicates the layer number, and the operator
f represents the activation function f(\cdot), applied to all elements of the vector:

    f(z) = \big[ f(z_1), f(z_2), \ldots, f(z_{n_l}) \big]^T,                      (2.36)
and weight matrix

    W^{(l)} = \begin{bmatrix}
                w^{(l)}_{1,1}   & \cdots & w^{(l)}_{1,n_{l-1}} \\
                \vdots          & \ddots & \vdots              \\
                w^{(l)}_{n_l,1} & \cdots & w^{(l)}_{n_l,n_{l-1}}
              \end{bmatrix},                                                      (2.37)
where the varying number of nodes in each layer is indicated by the subscripts of n.
The dummy nodes and the corresponding constant inputs (= 1) serve as bias values;
their use originates in the number of coefficients (weights) that is needed to describe
the discriminant hyperplanes [102]. It should be noted that for each hidden layer,
the dummy output o^{(l)}_{n_l} should be set to 1 in the forward pass (2.35) through the network.
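The forward pass of Equation 2.35, including the dummy bias units, can be sketched as follows (the sigmoid is one common choice for the activation function f; the function name is ours):

```python
import numpy as np

def forward(x, weights):
    """Forward pass o^(l) = f(W^(l) o^(l-1)), with a dummy unit fixed at 1
    appended to the input vector and to every hidden-layer output."""
    f = lambda z: 1.0 / (1.0 + np.exp(-z))      # sigmoid activation
    o = np.append(x, 1.0)                        # dummy input o_n = 1
    for i, W in enumerate(weights):
        o = f(W @ o)                             # Eq. 2.35 for layer i+1
        if i < len(weights) - 1:                 # hidden layer: reset dummy to 1
            o = np.append(o, 1.0)
    return o
```

Here `weights` is a list of matrices W^{(1)}, ..., W^{(L)}, each with one more column than the previous layer has (non-dummy) nodes.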
The objective is to determine the weight values in the matrices W^{(l)} such that
an input vector x that belongs to class \omega_j results in a high value of output node
o_j. An algorithm known as the generalized delta rule is usually used to train the
network (i.e., adjust the weight values) by error backpropagation based on a set of
samples for each class. This algorithm implements a gradient descent technique,
in which weight values are changed in order to minimize the square error between
the output vector and a target output vector , representing the known class
label of the training input pattern . During the training phase, the network cycles
through the training set until a stopping criterion is met. This stopping criterion
usually depends on the mean square error between the output and target vectors,
taken over the entire training set. There are different factors that will affect the
convergence of the network, such as when the weight values are changed, in which
order the training patterns are presented to the net, or whether a momentum factor
is used. These, and other practical considerations, are discussed in detail in [102].
Assuming that the weights are adjusted after each training pattern, the output
error vector is propagated back through the network, adjusting the weights in the
following manner:
    \Delta w^{(l)}_{ij} = \eta \, \delta^{(l)}_i o^{(l-1)}_j,                     (2.38)

where \eta is a learning constant and \delta is the error signal of the node the weight
feeds into. For the output layer,

    \delta^{(L)}_i = \big( t_i - o^{(L)}_i \big) f'\big( net^{(L)}_i \big),       (2.39)

and for the hidden layers,

    \delta^{(l)}_j = f'\big( net^{(l)}_j \big)
                     \sum_i \delta^{(l+1)}_i w^{(l+1)}_{ij},                      (2.40)

where

    net^{(l)} = W^{(l)} o^{(l-1)}                                                 (2.41)

denotes the node activations before application of the activation function f.
In the following, the feedforward ANN architecture trained with the generalized
delta rule will be denoted backpropagation ANN (BP-ANN). As mentioned already, the number of hidden nodes in the BP-ANN is related to the complexity of
the discriminant functions that the network implements. In practical applications, this
number is determined in an empirical fashion.

A modification of the BP-ANN that exhibits a dynamical structure is called the
cascade-correlation ANN [103]. This network is initially trained without any hidden nodes while monitoring the error at the output nodes. If this error is sufficiently
low, the training is halted; otherwise a new node, fully connected to all other nodes
in the network, is added and the procedure is repeated.
Kohonen ANN
Clearly, both the backpropagation and cascade-correlation networks are supervised, i.e., the desired behavior is learned by example. Another type of network,
known as the Kohonen ANN, operates in an unsupervised fashion. This network
is a single-layer feedforward network, trained with the so-called winner-take-all
learning rule, that essentially performs an unsupervised clustering in the feature
space. As with the BP-ANN, each output node corresponds to an output class. The
first step in the training phase is to normalize the weight vectors:

    \hat{w}_i = \frac{w_i}{\| w_i \|}, \qquad i = 1, \ldots, K,                   (2.42)

where K is the number of classes (i.e., the number of nodes in the output layer), and
the weight vectors w_i are the rows of the weight matrix W:

    W = \big[ w_1, w_2, \ldots, w_K \big]^T.                                     (2.43)

Now the winner-take-all learning rule dictates that only the weight vector is
updated that most closely approximates the input vector x. That means that only the
weight vector w_m is updated for which

    \| x - w_m \| = \min_i \| x - w_i \|.                                        (2.44)

When the winning neuron is identified, its weight vector w_m is
modified (in the direction of the gradient in weight space) using:

    w_m \leftarrow w_m + \eta \, ( x - w_m ).                                    (2.45)
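One winner-take-all training step, covering Equations 2.42 through 2.45, can be sketched as follows (for brevity the row normalization is repeated on every call, which a real implementation would do only once at initialization):

```python
import numpy as np

def kohonen_step(W, x, eta=0.1):
    """One winner-take-all update: normalize the weight rows (Eq. 2.42),
    find the row closest to x (Eq. 2.44), and move only that row toward x
    (Eq. 2.45). Returns the updated matrix and the winner's index."""
    W = W / np.linalg.norm(W, axis=1, keepdims=True)   # Eq. 2.42
    m = np.argmin(np.linalg.norm(W - x, axis=1))       # winner, Eq. 2.44
    W[m] = W[m] + eta * (x - W[m])                     # Eq. 2.45
    return W, m
```

Iterating this step over the training set pulls each weight vector toward the center of the cluster of inputs for which it keeps winning.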
Hopfield ANN
Another type of ANN, the Hopfield ANN, has a single-layer feedback architecture as illustrated in Figure 2.18. This type of network is a dynamical system that evolves toward a stable state.
Figure 2.18: Hopfield ANN topology (modified from [102]).
Starting from an initial state, the node outputs are repeatedly updated according to

    o_i = f\Big( \sum_{j \neq i} w_{ij} o_j + i_i \Big),
    \qquad \text{for } i = 1, 2, \ldots, n,                                       (2.46)

until a stable state is reached; such a state corresponds to a local minimum of the
energy function

    E = -\frac{1}{2} \sum_{i} \sum_{j \neq i} w_{ij} o_i o_j
        - \sum_{i} i_i o_i.                                                      (2.47)
2.6.5
Contextual classifiers
The classification techniques described so far typically label each pixel or voxel
individually, without taking the spatial relationships between neighbors into account. As a result, the classified images are often noisy. Classification noise can
be reduced by using spatial information in the approach. This can be done either
retrospectively, for instance by applying morphological filtering operations to the
classified image, or by incorporating it in the design of the classifier. This type
of classifier is referred to as a contextual classifier. These are usually formulated
as an optimization procedure, in which a set of constraints must be satisfied or a
cost function must be minimized. Relaxation labeling and stochastic relaxation are
examples of this type of approach.
Relaxation labeling (RL) assigns labels to objects under a set of constraints,
typically describing the interaction between neighboring objects. The term relaxation labeling was first introduced by Rosenfeld, Hummel and Zucker [104].
Later, the theory was further developed by Hummel and Zucker [105], who gave a
formal description of RL and proposed an algorithm to solve the RL problem. The
following summary of RL is based on their theory.
A labeling problem is based on:
1. A set of n objects \{ b_1, b_2, \ldots, b_n \};
2. A set of m labels \Lambda = \{ \lambda_1, \lambda_2, \ldots, \lambda_m \} for each object b_i.
A solution to the labeling problem is a label assignment for all objects that is
consistent with the given labeling constraints. This notion of consistency is important in RL, and it is described by the compatibility functions r_{ij}(\lambda, \lambda'), capturing
the relative support for label \lambda at object b_i given that object b_j has label \lambda'. Generally, the magnitude of r_{ij}(\lambda, \lambda') is proportional to the strength of the constraint,
whereas the sign indicates locally consistent (r > 0) or inconsistent (r < 0)
labelings. The compatibility r_{ij} is zero when objects b_i and b_j are not neighbors, or
when there is no interaction between labels. Labels are also assigned to objects
independently of their neighbors using weights satisfying the following properties:

    p_i(\lambda) \geq 0 \;\text{ for all } \lambda \in \Lambda, \qquad
    \sum_{\lambda \in \Lambda} p_i(\lambda) = 1.                                  (2.49)
Note that these properties are essentially identical to the fuzzy membership functions used for the FCM classifier (Equation 2.24). Using these fuzzy label assignments, the complete labeling of object b_i
can be represented by the m-dimensional
vector p_i = \big( p_i(\lambda_1), \ldots, p_i(\lambda_m) \big)^T. Collecting these assignment vectors for all
objects, the complete labeling of all objects can be described by the nm-dimensional
vector formed by the concatenation p = ( p_1, \ldots, p_n ). This vector is equivalent to the
assignment matrix U described for the FCM classifier.
K is the space of weighted labeling assignments, containing all assignment vectors p
that satisfy the constraints (2.49). The support s_i(\lambda) for label \lambda at object b_i
by the assignment p is now defined as

    s_i(\lambda) = s_i(\lambda; p)
                 = \sum_{j=1}^{n} \sum_{\lambda' \in \Lambda}
                   r_{ij}(\lambda, \lambda') \, p_j(\lambda').                    (2.50)

The average local consistency A(p) of an assignment is

    A(p) = \sum_{i=1}^{n} \sum_{\lambda \in \Lambda}
           p_i(\lambda) \, s_i(\lambda),                                         (2.51)

and an assignment p \in K is consistent if

    \sum_{\lambda \in \Lambda} p_i(\lambda) \, s_i(\lambda) \geq
    \sum_{\lambda \in \Lambda} v_i(\lambda) \, s_i(\lambda)
    \quad \text{for all } v \in K, \; i = 1, \ldots, n.                           (2.52)

Relaxation labeling algorithms are designed to convert a given, initial labeling into
a consistent one by solving the variational inequality (2.52). There are different
ways to solve such an inequality. Together with their theory summarized here,
Hummel and Zucker [105] present an algorithm to solve (2.52), a method which is
essentially a gradient ascent on the average local consistency, projected onto K.
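Hummel and Zucker's projected-gradient method is beyond the scope of a short sketch, but the earlier heuristic update of Rosenfeld et al. [104], in which each label weight is scaled by one plus its support and then renormalized, conveys the flavor of relaxation labeling. The sketch below assumes the supports have been scaled to lie in [-1, 1]:

```python
import numpy as np

def rl_step(p, r):
    """One Rosenfeld-style relaxation step.
    p: (n, m) array of label weights p_i(lam), each row summing to 1;
    r: (n, m, n, m) array of compatibilities r_ij(lam, lam').
    Computes the supports s_i(lam) (Eq. 2.50), assumed scaled to [-1, 1],
    then rescales and renormalizes the label weights."""
    s = np.einsum('iljm,jm->il', r, p)        # s[i, l] = sum_j sum_m r*p
    q = p * (1.0 + s)                          # boost supported labels
    return q / q.sum(axis=1, keepdims=True)    # restore constraints (2.49)
```

Iterating `rl_step` sharpens the initial fuzzy labeling toward an assignment in which neighboring labels support one another.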
Stochastic relaxation methods model the image of class labels f as a Markov
random field (MRF) [107–109]. The prior probability of a labeling f is then given
by a Gibbs distribution

    P(f) = \frac{1}{Z} \, e^{-U(f)/T},                                            (2.53)

where Z is a normalizing constant, T is a temperature parameter, and the energy
U(f) is a sum of clique potentials V_c(f), each depending only on the labels within
a local neighborhood (clique) c:

    U(f) = \sum_{c \in C} V_c(f).                                                 (2.54)

For the estimation of the labeling f from the observations g, Bayes theory dictates that the MAP estimation \hat{f}
maximizes the a posteriori probability

    P(f \mid g) \propto P(g \mid f) \, P(f).                                      (2.55)
This is a global maximization, which is computationally extremely expensive. Simulated annealing [107, 110], which models the behavior of a physical system when
cooling down, has been used to find good local solutions; another technique used
for this purpose is a modification of simulated annealing called mean field annealing [111].
Acknowledgements
Parts of this chapter have been adapted from an article published in Critical
Reviews in Biomedical Engineering 22(5/6):401–465 (1994). The authors thank
Begell House for their permission. The authors also thank Milan Sonka for his
bibliography file, which is a very precious resource.
2.9 References
References 121
[12] D. A. G. Wicks, G. J. Barker, and P. S. Tofts, "Correction of intensity nonuniformity in MR images of any orientation," Magnetic Resonance Imaging, vol. 11, no. 2, pp. 183–196, 1993.
[13] C. Brechbühler, G. Gerig, and G. Székely, "Compensation of spatial inhomogeneity in MRI based on a parametric bias estimate," in Proceedings of the Fourth International Conference on Visualization in Biomedical Computing (VBC) (K. H. Höhne and R. Kikinis, eds.), (Hamburg, Germany), pp. 141–146, Springer, 1996.
[14] W. W. Brey and P. A. Narayana, "Correction for intensity falloff in surface coil magnetic resonance imaging," Medical Physics, vol. 15, pp. 241–245, Mar./Apr. 1988.
[15] B. M. Dawant, A. P. Zijdenbos, and R. A. Margolin, "Correction of intensity variations in MR images for computer-aided tissue classification," IEEE Transactions on Medical Imaging, vol. 12, pp. 770–781, Dec. 1993.
[16] J. Haselgrove and M. Prammer, "An algorithm for compensation of surface-coil images for sensitivity of the surface coil," Magnetic Resonance Imaging, vol. 4, no. 6, pp. 469–472, 1986.
[17] K. O. Lim and A. Pfefferbaum, "Segmentation of MR brain images into cerebrospinal fluid spaces, white and gray matter," Journal of Computer Assisted Tomography, vol. 13, pp. 588–593, July/Aug. 1989.
[18] C. R. Meyer, P. H. Bland, and J. Pipe, "Retrospective correction of intensity inhomogeneities in MRI," IEEE Transactions on Medical Imaging, vol. 14, pp. 36–41, Mar. 1995.
[19] P. A. Narayana and A. Borthakur, "Effect of radio frequency inhomogeneity correction on the reproducibility of intracranial volumes using MR image data," Magnetic Resonance in Medicine, vol. 33, pp. 396–400, Mar. 1995.
[20] J. G. Sled, A. P. Zijdenbos, and A. C. Evans, "A nonparametric method for automatic correction of intensity nonuniformity in MRI data," IEEE Transactions on Medical Imaging, vol. 17, Feb. 1998.
[21] W. M. Wells III, W. E. L. Grimson, R. Kikinis, and F. A. Jolesz, "Adaptive segmentation of MRI data," IEEE Transactions on Medical Imaging, vol. 15, pp. 429–442, Aug. 1996.
[22] A. P. Zijdenbos, B. M. Dawant, and R. A. Margolin, "Inter- and intra-slice intensity correction in MRI," in Information Processing in Medical Imaging (IPMI) (Y. Bizais, C. Barillot, and R. Di Paola, eds.), (France), pp. 349–350, Kluwer, June 1995.
[23] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Reading, MA: Addison-Wesley, 1993.
[24] S.-P. Liou, A. H. Chiu, and R. C. Jain, "A parallel technique for signal-level perceptual organization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 317–325, Apr. 1991.
[25] S.-P. Liou and R. C. Jain, "An approach to three-dimensional image segmentation," Computer Vision, Graphics, and Image Processing, vol. 53, no. 3, pp. 237–252, 1991.
[42] N. J. Nilsson, Principles of Artificial Intelligence. Berlin: Springer-Verlag, 1982.
[43] P. H. Winston, Artificial Intelligence. Reading, MA: Addison-Wesley, 3rd ed., 1992.
[44] P. Hart, N. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum-cost paths," IEEE Transactions on Systems, Man, and Cybernetics, vol. SMC-4, pp. 100–107, 1968.
[45] R. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[46] J. J. Gerbrands, Segmentation of Noisy Images. Ph.D. thesis, ETN8995461, Technische Universiteit Delft, Delft, The Netherlands, 1988.
[47] M. Sonka, M. D. Winniford, and S. M. Collins, "Robust simultaneous detection of coronary borders in complex images," IEEE Transactions on Medical Imaging, vol. 14, no. 1, pp. 151–161, 1995.
[48] D. Geiger, A. Gupta, L. A. Costa, and J. Vlontzos, "Dynamic programming for detecting, tracking, and matching deformable contours," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 3, pp. 294–302, 1995.
[49] A. Falcão, J. Udupa, S. Samarasekera, S. Sharma, B. Hirsch, and R. de A. Lotufo, "User-steered image segmentation paradigms: live wire and live lane," Graphical Models and Image Processing, vol. 60, no. 4, pp. 233–260, 1998.
[50] E. Mortensen, B. Morse, W. Barrett, and J. Udupa, "Adaptive boundary detection using 'live-wire' two-dimensional dynamic programming," in Computers in Cardiology, (Los Alamitos, CA), pp. 635–638, IEEE Computer Society Press, 1992.
[51] W. A. Barrett and E. N. Mortensen, "Interactive live-wire boundary detection," Medical Image Analysis, vol. 1, no. 4, pp. 331–341, 1996.
[52] D. R. Thedens, D. J. Skorton, and S. R. Fleagle, "Methods of graph searching for border detection in image sequences with application to cardiac magnetic resonance imaging," IEEE Transactions on Medical Imaging, vol. 14, pp. 42–55, 1995.
[53] R. J. Frank, "Optimal surface detection using multidimensional graph search: Applications to intravascular ultrasound," Master's thesis, University of Iowa, 1996.
[54] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision. New York: PWS Publishing, 1998.
[55] I. Pitas, Digital Image Processing Algorithms. Hemel Hempstead, UK: Prentice-Hall, 1993.
[56] P. V. C. Hough, A Method and Means for Recognizing Complex Patterns. US Patent 3,069,654, 1962.
[57] D. H. Ballard and C. M. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[58] D. H. Ballard, "Generalizing the Hough transform to detect arbitrary shapes," Pattern Recognition, vol. 13, pp. 111–122, 1981.
[59] M. E. Brummer, "Hough transform detection of the longitudinal fissure in tomographic head images," IEEE Transactions on Medical Imaging, vol. 10, no. 1, pp. 74–81, 1991.
[75] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981.
[76] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: John Wiley and Sons, 1973.
[77] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic Press, 1972.
[78] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall, 1988.
[79] M. James, Classification Algorithms. New York: John Wiley, 1985.
[80] I. L. Thomas, V. M. Benning, and N. P. Ching, Classification of Remotely Sensed Images. Bristol: Adam Hilger, 1987.
[81] T. Y. Young and K.-S. Fu, eds., Handbook of Pattern Recognition and Image Processing. Academic Press, 1986.
[82] J. A. Richards, Remote Sensing Digital Image Analysis. Berlin: Springer-Verlag, 1986.
[83] A. R. Mirzai, ed., Artificial Intelligence: Concepts and Applications in Engineering. Cambridge, MA: MIT Press, 1990.
[84] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, pp. 81–106, 1986.
[85] M. Kamber, R. Shinghal, D. L. Collins, G. S. Francis, and A. C. Evans, "Model-based 3-D segmentation of multiple sclerosis lesions in magnetic resonance brain images," IEEE Transactions on Medical Imaging, vol. 14, pp. 442–453, Sept. 1995.
[86] S. Aleynikov and E. Micheli-Tzanakou, "Classification of retinal damage by a neural network based system," Journal of Medical Systems, vol. 22, pp. 129–136, June 1998.
[87] S. C. Amartur, D. Piraino, and Y. Takefuji, "Optimization neural networks for the segmentation of magnetic resonance images," IEEE Transactions on Medical Imaging, vol. 11, pp. 215–220, June 1992.
[88] M. Binder, H. Kittler, A. Seeber, A. Steiner, H. Pehamberger, and K. Wolff, "Epiluminescence microscopy-based classification of pigmented skin lesions using computerized image analysis and an artificial neural network," Melanoma Research, vol. 8, pp. 261–266, June 1998.
[89] S. Cagnoni, G. Coppini, M. Rucci, D. Caramella, and G. Valli, "Neural network segmentation of magnetic resonance spin echo images of the brain," Journal of Biomedical Engineering, vol. 15, pp. 355–362, Sept. 1993.
[90] M. S. Gebbinck, J. T. Verhoeven, J. M. Thijssen, and T. E. Schouten, "Application of neural networks for the classification of diffuse liver disease by quantitative echography," Ultrasonic Imaging, vol. 15, pp. 205–217, July 1993.
[91] L. O. Hall, A. M. Bensaid, L. P. Clarke, R. P. Velthuizen, M. S. Silbiger, and J. C. Bezdek, "A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain," IEEE Transactions on Neural Networks, vol. 3, pp. 672–682, Sept. 1992.
[93] M. Ozkan, B. M. Dawant, and R. J. Maciunas, "Neural-network-based segmentation of multi-modal medical images: A comparative and prospective study," IEEE Transactions on Medical Imaging, vol. 12, pp. 534–544, Sept. 1993.
[94] D. Pantazopoulos, P. Karakitsos, A. Iokim-Liossi, A. Pouliakis, E. Botsoli-Stergiou, and C. Dimopoulos, "Back propagation neural network in the discrimination of benign from malignant lower urinary tract lesions," Journal of Urology, vol. 159, pp. 1619–1623, May 1998.
[95] W. E. Polakowski, D. A. Cournoyer, S. K. Rogers, M. P. DeSimio, D. W. Ruck, J. W. Hoffmeister, and R. A. Raines, "Computer-aided breast cancer detection and diagnosis of masses using difference of Gaussians and derivative-based feature saliency," IEEE Transactions on Medical Imaging, vol. 16, pp. 811–819, Dec. 1997.
[96] W. E. Reddick, J. O. Glass, E. N. Cook, T. D. Elkin, and R. J. Deaton, "Automated segmentation and classification of multispectral magnetic resonance images of brain using artificial neural networks," IEEE Transactions on Medical Imaging, vol. 16, pp. 911–918, Dec. 1997.
[97] H. Sujana, S. Swarnamani, and S. Suresh, "Application of artificial neural networks for the classification of liver lesions by image texture parameters," Ultrasound in Medicine & Biology, vol. 22, no. 9, pp. 1177–1181, 1996.
[98] G. D. Tourassi and C. E. Floyd Jr., "Lesion size quantification in SPECT using an artificial neural network classification approach," Computers & Biomedical Research, vol. 28, pp. 257–270, June 1995.
[99] O. Tsujii, M. T. Freedman, and S. K. Mun, "Automated segmentation of anatomic regions in chest radiographs using an adaptive-sized hybrid neural network," Medical Physics, vol. 25, pp. 998–1007, June 1998.
[100] A. J. Worth, S. Lehar, and D. N. Kennedy, "A recurrent cooperative/competitive field for segmentation of magnetic resonance brain images," IEEE Transactions on Knowledge and Data Engineering, vol. 4, pp. 156–161, Apr. 1992.
[101] A. P. Zijdenbos, B. M. Dawant, R. A. Margolin, and A. C. Palmer, "Morphometric analysis of white matter lesions in MR images: Method and validation," IEEE Transactions on Medical Imaging, vol. 13, pp. 716–724, Dec. 1994.
[102] J. M. Zurada, Introduction to Artificial Neural Systems. St. Paul, MN: West Publishing Company, 1992.
[103] S. E. Fahlman and C. Lebiere, "The cascade-correlation learning architecture," tech. rep., School of Computer Science, Carnegie Mellon University, Feb. 1990.
[104] A. Rosenfeld, R. A. Hummel, and S. W. Zucker, "Scene labeling by relaxation operations," IEEE Transactions on Systems, Man, and Cybernetics, vol. 6, pp. 420–433, June 1976.
[105] R. A. Hummel and S. W. Zucker, "On the foundations of relaxation labeling processes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 5, no. 3, pp. 259–288, 1983.
[106] S. Peleg, "A new probabilistic relaxation scheme," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, pp. 362–369, July 1980.
[107] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721–741, 1984.
[108] J. Besag, "On the statistical analysis of dirty pictures," Journal of the Royal Statistical Society, Series B, vol. 48, no. 3, pp. 259–302, 1986.
[109] J. Besag, "Spatial interaction and the statistical analysis of lattice systems," Journal of the Royal Statistical Society, Series B, vol. 36, no. 2, pp. 192–236, 1974.
[110] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671–680, 1983.
[111] G. Bilbro, R. Mann, T. K. Miller, W. E. Snyder, D. E. Van den Bout, and M. White, "Optimization by mean field annealing," in Advances in Neural Information Processing Systems (D. S. Touretzky, ed.), vol. I, (San Mateo), Morgan Kaufmann, 1989.
[112] R. C. Dubes, A. K. Jain, S. G. Nadabar, and C. C. Chen, "MRF model-based algorithms for image segmentation," in Proceedings of the 10th International Conference on Pattern Recognition, vol. 1, pp. 808–814, 1990.
[113] H. S. Choi, D. R. Haynor, and Y. Kim, "Partial volume tissue classification of multichannel magnetic resonance images: a mixel model," IEEE Transactions on Medical Imaging, vol. 10, pp. 395–407, Sept. 1991.
[114] Z. Kato, J. Zerubia, and M. Berthod, "Unsupervised parallel image classification using Markovian models," Pattern Recognition, vol. 32, pp. 591–604, Apr. 1999.
[115] J. C. Rajapakse, J. N. Giedd, and J. L. Rapoport, "Statistical approach to segmentation of single-channel cerebral MR images," IEEE Transactions on Medical Imaging, vol. 16, pp. 176–186, Apr. 1997.
[116] Z. Wu, H. W. Chung, and F. W. Wehrli, "A Bayesian approach to subvoxel tissue classification in NMR microscopic images of trabecular bone," Magnetic Resonance in Medicine, vol. 31, pp. 302–308, Mar. 1994.
[117] M. X. H. Yan and J. S. Karp, "An adaptive Bayesian approach to three-dimensional MR brain segmentation," in Information Processing in Medical Imaging (IPMI) (Y. Bizais, C. Barillot, and R. Di Paola, eds.), pp. 201–213, Kluwer, June 1995.
CHAPTER 3
Image Segmentation Using Deformable Models
Chenyang Xu
The Johns Hopkins University
Dzung L. Pham
National Institute on Aging
Jerry L. Prince
The Johns Hopkins University
Contents
3.1 Introduction 131
3.2 Parametric deformable models 133
    3.2.1 Energy minimizing formulation 134
    3.2.2 Dynamic force formulation 136
    3.2.3 External forces 138
    3.2.4 Numerical implementation 144
    3.2.5 Discussion 145
3.3 Geometric deformable models 146
    3.3.1 Curve evolution theory 146
    3.3.2 Level set method 147
    3.3.3 Speed functions 150
    3.3.4 Relationship to parametric deformable models 152
    3.3.5 Numerical implementation 153
    3.3.6 Discussion 154
3.4 Extensions of deformable models 154
    3.4.1 Deformable Fourier models 155
    3.4.2 Deformable models using modal analysis 157
    3.4.3 Deformable superquadrics 159
    3.4.4 Active shape models 161
    3.4.5 Other models 167
3.5 Conclusion and future directions 167
3.6 Further reading 168
3.7 Acknowledgments 168
3.8 References 168
Introduction 131
3.1 Introduction
In the past four decades, computerized image segmentation has played an increasingly important role in medical imaging. Segmented images are now used
routinely in a multitude of different applications, such as the quantification of tissue
volumes [1], diagnosis [2], localization of pathology [3], study of anatomical structure [4, 5], treatment planning [6], partial volume correction of functional imaging
data [7], and computerintegrated surgery [8, 9]. Image segmentation remains a
difficult task, however, due to both the tremendous variability of object shapes and
the variation in image quality (see Fig. 3.1). In particular, medical images are often
corrupted by noise and sampling artifacts, which can cause considerable difficulties when applying classical segmentation techniques such as edge detection and
thresholding. As a result, these techniques either fail completely or require some
kind of postprocessing step to remove invalid object boundaries in the segmentation
results.
To address these difficulties, deformable models have been extensively studied and widely used in medical image segmentation, with promising results. Deformable models are curves or surfaces defined within an image domain that can
move under the influence of internal forces, which are defined within the curve or
surface itself, and external forces, which are computed from the image data. The
internal forces are designed to keep the model smooth during deformation. The external forces are defined to move the model toward an object boundary or other desired features within an image. By constraining extracted boundaries to be smooth
and incorporating other prior information about the object shape, deformable models offer robustness to both image noise and boundary gaps and allow integrating
boundary elements into a coherent and consistent mathematical description. Such
a boundary description can then be readily used by subsequent applications. Moreover, since deformable models are implemented on the continuum, the resulting
boundary representation can achieve subpixel accuracy, a highly desirable property for medical imaging applications. Figure 3.2 shows two examples of using
deformable models to extract object boundaries from medical images. The result is
a parametric curve in Fig. 3.2(a) and a parametric surface in Fig. 3.2(b).
Although the term "deformable models" first appeared in the work by Terzopoulos and his collaborators in the late eighties [12–15], the idea of deforming a template for extracting image features dates back much farther, to the work on Fischler and Elschlager's spring-loaded templates [16] and Widrow's rubber mask
technique [17]. Similar ideas have also been used in the work by Blake and Zisserman [18], Grenander et al. [19], and Miller et al. [20]. The popularity of deformable models is largely due to the seminal paper "Snakes: Active Contour Models" by
Kass, Witkin, and Terzopoulos [13]. Since its publication, deformable models have
grown to be one of the most active and successful research areas in image segmentation. Various names, such as snakes, active contours or surfaces, balloons,
and deformable contours or surfaces, have been used in the literature to refer to
Figure 3.1: Variability of object shapes and image quality. (a) A 2D MR image of the heart
left ventricle and (b) a 3D MR image of the brain.
Figure 3.2: Examples of using deformable models to extract object boundaries from medical
images. (a) An example of using a deformable contour to extract the inner wall of the left
ventricle of a human heart from a 2D MR image. The circular initial deformable contour is
plotted in gray and the final converged result is plotted in white [10]. (b) An example of using
a deformable surface to reconstruct the brain cortical surface from a 3D MR image [11].
deformable models.
There are basically two types of deformable models: parametric deformable models (cf. [13, 21–23]) and geometric deformable models (cf. [24–27]).
In this section, we first describe two different types of formulations for parametric deformable models: an energy minimizing formulation and a dynamic force
formulation. Although these two formulations lead to similar results, the first formulation has the advantage that its solution satisfies a minimum principle whereas
the second formulation has the flexibility of allowing the use of more general types
of external forces. We then present several commonly used external forces that can
effectively attract deformable models toward the desired image features. A numerical implementation of 2D deformable models or deformable contours is described
at the end of this section. Since the implementation of 3D deformable models or
deformable surfaces is more sophisticated than those of deformable contours, we
provide several references in Section 3.2.4 for additional reading rather than presenting an actual implementation.
3.2.1 Energy minimizing formulation
The basic premise of the energy minimizing formulation of deformable contours is to find a parameterized curve that minimizes the weighted sum of internal energy and potential energy. The internal energy specifies the tension or the
smoothness of the contour. The potential energy is defined over the image domain
and typically possesses local minima at the image intensity edges occurring at object boundaries (see Fig. 3.3). Minimizing the total energy yields internal forces and potential forces. Internal forces hold the curve together (elasticity forces) and keep it from bending too much (bending forces), while potential forces attract the curve toward the desired object boundaries. To find the object boundary, parametric curves are initialized within the image domain and forced to move toward the potential energy minima under the influence of both these forces.
Mathematically, a deformable contour is a curve $\mathbf{X}(s) = (X(s), Y(s))$, $s \in [0, 1]$, which moves through the spatial domain of an image to minimize the following energy functional:
\[ \mathcal{E}(\mathbf{X}) = \mathcal{S}(\mathbf{X}) + \mathcal{P}(\mathbf{X}). \tag{3.1} \]
The first term is the internal energy
\[ \mathcal{S}(\mathbf{X}) = \frac{1}{2} \int_0^1 \alpha \left| \frac{\partial \mathbf{X}}{\partial s} \right|^2 + \beta \left| \frac{\partial^2 \mathbf{X}}{\partial s^2} \right|^2 ds. \tag{3.2} \]
The first-order derivative discourages stretching and makes the model behave like an elastic string. The second-order derivative discourages bending and makes the model behave like a rigid rod. The weighting parameters $\alpha$ and $\beta$ control the strength of the contour's tension and rigidity, respectively. The second term is the potential energy, computed by integrating a potential function $P(x, y)$ along the contour:
\[ \mathcal{P}(\mathbf{X}) = \int_0^1 P(\mathbf{X}(s))\, ds. \tag{3.3} \]
Given a gray-level image $I(x, y)$, a typical potential function designed to lead a deformable contour toward intensity edges is
\[ P(x, y) = -w_e \left| \nabla \left[ G_\sigma(x, y) * I(x, y) \right] \right|^2, \tag{3.4} \]
where $w_e$ is a positive weighting parameter, $G_\sigma(x, y)$ is a two-dimensional Gaussian function with standard deviation $\sigma$, $\nabla$ is the gradient operator, and $*$ is the 2D image convolution operator. If the desired image features are lines, then the appropriate potential energy function can be defined as follows:
\[ P(x, y) = w_l \left[ G_\sigma(x, y) * I(x, y) \right], \tag{3.5} \]
where $w_l$ is a weighting parameter. A deformable contour that minimizes $\mathcal{E}$ must satisfy the Euler–Lagrange equation
\[ \frac{\partial}{\partial s}\left( \alpha \frac{\partial \mathbf{X}}{\partial s} \right) - \frac{\partial^2}{\partial s^2}\left( \beta \frac{\partial^2 \mathbf{X}}{\partial s^2} \right) - \nabla P(\mathbf{X}) = 0. \tag{3.6} \]
To gain some insight into the physical behavior of deformable contours, we can view Eq. (3.6) as a force balance equation
\[ \mathbf{F}_{\mathrm{int}}(\mathbf{X}) + \mathbf{F}_{\mathrm{pot}}(\mathbf{X}) = 0, \tag{3.7} \]
where the internal force is given by
\[ \mathbf{F}_{\mathrm{int}}(\mathbf{X}) = \frac{\partial}{\partial s}\left( \alpha \frac{\partial \mathbf{X}}{\partial s} \right) - \frac{\partial^2}{\partial s^2}\left( \beta \frac{\partial^2 \mathbf{X}}{\partial s^2} \right) \tag{3.8} \]
and the potential force is given by
\[ \mathbf{F}_{\mathrm{pot}}(\mathbf{X}) = -\nabla P(\mathbf{X}). \tag{3.9} \]
The internal force $\mathbf{F}_{\mathrm{int}}$ discourages stretching and bending while the potential force $\mathbf{F}_{\mathrm{pot}}$ pulls the contour toward the desired object boundaries. In this chapter, we define the forces derived from the potential energy function given in either Eq. (3.4) or Eq. (3.5) as Gaussian potential forces.
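As a concrete illustration of Eqs. (3.4) and (3.9), the Gaussian potential and its force field can be computed on a pixel grid with finite differences. The following is a minimal NumPy sketch, not the chapter's implementation; the synthetic disk image and the values of sigma and w_e are arbitrary illustrations:

```python
import numpy as np

def gaussian_smooth(img, sigma):
    """Separable Gaussian convolution, approximating G_sigma * I."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, out)

def gaussian_potential_force(img, sigma=2.0, w_e=1.0):
    """P = -w_e |grad(G_sigma * I)|^2 (Eq. 3.4); F_pot = -grad P (Eq. 3.9)."""
    smoothed = gaussian_smooth(img.astype(float), sigma)
    gy, gx = np.gradient(smoothed)          # image gradient components
    P = -w_e * (gx**2 + gy**2)              # potential: minima at edges
    fy, fx = np.gradient(P)
    return P, -fx, -fy                      # potential and force components

# Synthetic image: bright disk (radius 15) on a dark background.
yy, xx = np.mgrid[0:64, 0:64]
img = ((xx - 32)**2 + (yy - 32)**2 < 15**2).astype(float)
P, fx, fy = gaussian_potential_force(img)
```

Because the smoothed gradient decays within a few sigma of an edge, the resulting force field has a small capture range; this limitation motivates the pressure, distance, and GVF forces discussed later in the section.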
To find a solution to Eq. (3.6), the deformable contour is made dynamic by treating $\mathbf{X}$ as a function of time $t$ as well as $s$, i.e., $\mathbf{X}(s, t)$. The partial derivative of $\mathbf{X}$ with respect to $t$ is then set equal to the left-hand side of Eq. (3.6) as follows:
\[ \gamma \frac{\partial \mathbf{X}}{\partial t} = \mathbf{F}_{\mathrm{int}}(\mathbf{X}) + \mathbf{F}_{\mathrm{pot}}(\mathbf{X}). \tag{3.10} \]
The coefficient $\gamma$ is introduced to make the units on the left side consistent with the right side. When the solution $\mathbf{X}(s, t)$ stabilizes, the left side vanishes and we
achieve a solution of Eq. (3.6). We note that this approach of making the time
derivative term vanish is equivalent to applying a gradient descent algorithm to find
the local minimum of Eq. (3.1) [34]. Thus, the minimization is solved by placing
an initial contour on the image domain and allowing it to deform according to
Eq. (3.10). Figure 3.4 shows an example of recovering the left ventricle wall using
Gaussian potential forces.
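The gradient descent evolution of Eq. (3.10) can be sketched with an explicit time discretization: the internal force is approximated by finite differences along the closed contour, and the potential force is sampled at the contour points. This is an illustrative sketch under simplifying assumptions, not the chapter's implementation; the toy force field, step sizes, and weights are arbitrary:

```python
import numpy as np

def evolve_contour(X, force_field, alpha=0.5, beta=0.1, gamma=1.0,
                   tau=0.2, n_iter=200):
    """Explicit Euler integration of gamma dX/dt = F_int + F_pot (Eq. 3.10).

    X           : (n, 2) array of points on a closed contour.
    force_field : function mapping (n, 2) points to (n, 2) potential forces.
    """
    X = X.astype(float).copy()
    for _ in range(n_iter):
        # Finite-difference internal forces on a closed (cyclic) contour.
        d2 = np.roll(X, -1, axis=0) - 2 * X + np.roll(X, 1, axis=0)       # X''
        d4 = np.roll(d2, -1, axis=0) - 2 * d2 + np.roll(d2, 1, axis=0)    # X''''
        f_int = alpha * d2 - beta * d4
        X += (tau / gamma) * (f_int + force_field(X))
    return X

# Toy potential force pulling every point toward the circle of radius 10
# centered at the origin (a stand-in for an image-derived force field).
def toward_circle(X, r=10.0):
    d = np.linalg.norm(X, axis=1, keepdims=True)
    return (r - d) * X / np.maximum(d, 1e-9)

t = np.linspace(0, 2 * np.pi, 50, endpoint=False)
X0 = np.stack([20 * np.cos(t), 20 * np.sin(t)], axis=1)  # oversized initial circle
Xf = evolve_contour(X0, toward_circle)
```

At convergence the contour settles slightly inside the target circle, because the tension term keeps contracting the curve until it balances the external force, illustrating the shrinking bias of the internal energy.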
3.2.2 Dynamic force formulation
In the previous section, the deformable model was formulated as a static problem, and an artificial time variable was introduced to minimize the energy. It is sometimes
more convenient, however, to formulate the deformable model directly from a dynamic problem using a force formulation. Such a formulation permits the use of
more general types of external forces that are not potential forces, i.e., forces that
cannot be written as the negative gradient of potential energy functions. According to Newton's second law, the dynamics of a contour $\mathbf{X}(s, t)$ must satisfy
\[ \mu \frac{\partial^2 \mathbf{X}}{\partial t^2} = \mathbf{F}_{\mathrm{damp}}(\mathbf{X}) + \mathbf{F}_{\mathrm{int}}(\mathbf{X}) + \mathbf{F}_{\mathrm{ext}}(\mathbf{X}), \tag{3.11} \]
where $\mu$ is a coefficient that has a mass unit and $\mathbf{F}_{\mathrm{damp}} = -\gamma\, \partial \mathbf{X} / \partial t$ is the damping force, with $\gamma$ being the damping coefficient.
Figure 3.4: An example of recovering the left ventricle wall using Gaussian potential forces. (a) Gaussian potential forces and (b) the result of applying Gaussian potential forces to a deformable contour, with the circular initial contour shown in gray and the final deformed contour in white.
In image segmentation, the mass coefficient $\mu$ is often set to zero, since the inertial behavior it introduces is not desired; the dynamics then reduce to
\[ \gamma \frac{\partial \mathbf{X}}{\partial t} = \mathbf{F}_{\mathrm{int}}(\mathbf{X}) + \mathbf{F}_{\mathrm{ext}}(\mathbf{X}). \tag{3.12} \]
The internal forces are the same as specified in Eq. (3.8). The external forces can
be either potential forces or nonpotential forces. We note, however, that nonpotential forces cannot be derived from the variational energy formulation of the previous section. An alternate variational principle does exist (see [36]); however, it is not
physically intuitive.
External forces are often expressed as the superposition of several different forces:
\[ \mathbf{F}_{\mathrm{ext}}(\mathbf{X}) = \mathbf{F}_1(\mathbf{X}) + \mathbf{F}_2(\mathbf{X}) + \cdots + \mathbf{F}_N(\mathbf{X}), \tag{3.13} \]
where $N$ is the total number of external forces. One commonly used external force is the pressure force,
\[ \mathbf{F}_p(\mathbf{X}) = w_p \mathbf{N}(\mathbf{X}), \tag{3.14} \]
where $\mathbf{N}(\mathbf{X})$ is the unit normal of the contour at the point $\mathbf{X}$ and $w_p$ is a weighting parameter whose sign determines whether the contour is inflated or deflated. In the parametric formulation of deformable models, the normal direction is sometimes assumed to be outward. Here we assume an inward direction for consistency with the geometric formulation of deformable models introduced in Section 3.3.
Figure 3.5: An example of pressure force driven deformable contours. (a) Intensity CT image slice of the left ventricle. (b) Edge detected image. (c) Initial deformable contour. (d)–(f) Deformable contour moving toward the left ventricle boundary, driven by an inflating pressure force. Images courtesy of McInerney and Terzopoulos [23], The University of Toronto.
that at the concavity, distance potential forces point horizontally in opposite directions, thus preventing the contour from converging into the boundary concavity. To
address this problem, Xu and Prince [10, 43] employed a vector diffusion equation that diffuses the gradient of an edge map in regions distant from the boundary,
yielding a different force field called the gradient vector flow (GVF) field. The
amount of diffusion adapts according to the strength of edges to avoid distorting
object boundaries.
A GVF field $\mathbf{v}(x, y)$ is defined as the equilibrium solution to the following vector partial differential equation:
\[ \frac{\partial \mathbf{v}}{\partial t} = g(|\nabla f|)\, \nabla^2 \mathbf{v} - h(|\nabla f|)\, (\mathbf{v} - \nabla f), \qquad \mathbf{v}(x, y, 0) = \nabla f(x, y), \tag{3.15} \]
where $\nabla^2$ is the Laplacian operator applied to each component of $\mathbf{v}$ separately, and $f(x, y)$ is an edge map that has a higher value at the desired object
Figure 3.6: An example of a distance potential force field. (a) A U-shaped object, a close-up of its (b) boundary concavity, and (c) the distance potential force field within the concavity.
boundary and can be derived using any edge detector. The definition of the GVF field is valid for any dimension. Two examples of $g(\cdot)$ and $h(\cdot)$ are
\[ g(|\nabla f|) = \mu, \qquad h(|\nabla f|) = |\nabla f|^2, \]
and
\[ g(|\nabla f|) = \exp\!\left[ -\left( \frac{|\nabla f|}{K} \right)^2 \right], \qquad h(|\nabla f|) = 1 - g(|\nabla f|), \]
where $\mu$ and $K$ are positive scalars. GVF has been shown to have a large attraction range
and improved convergence for deforming contours into boundary concavities [10,
43]. An example of using a GVF force field is shown in Fig. 3.7.
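The GVF diffusion of Eq. (3.15), with the first choice $g = \mu$ and $h = |\nabla f|^2$, can be iterated with an explicit scheme. The sketch below is illustrative (the edge map, $\mu$, step size, and iteration count are arbitrary choices): it diffuses the edge-map gradient of a square object into the surrounding flat regions.

```python
import numpy as np

def gvf(f, mu=0.2, n_iter=800, dt=0.5):
    """Gradient vector flow of edge map f:
    dv/dt = mu * Laplacian(v) - |grad f|^2 (v - grad f)."""
    fy, fx = np.gradient(f)
    mag2 = fx**2 + fy**2
    u, v = fx.copy(), fy.copy()
    def lap(a):  # 5-point Laplacian with periodic boundaries
        return (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
                np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4 * a)
    for _ in range(n_iter):
        u += dt * (mu * lap(u) - mag2 * (u - fx))
        v += dt * (mu * lap(v) - mag2 * (v - fy))
    return u, v

# Edge map of a centered square object: grad f is nonzero only at the
# border, but the GVF field (u, v) extends into the homogeneous regions.
f = np.zeros((64, 64))
f[24:40, 24:40] = 1.0
u, v = gvf(f)
```

Near the object border the reaction term keeps the field close to $\nabla f$; far from it the diffusion term dominates, which is what extends the capture range into regions where the raw gradient vanishes.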
Dynamic distance force
An external force that is similar to distance potential force but does not possess the boundary concavity problem has been proposed [44, 45]. This approach
derives an external force by computing a signed distance at each point on the deformable contour or surface. This signed distance is calculated by determining the
closest boundary point or other image feature along the model's normal direction.
The distance values are recomputed each time the model is deformed. Several criteria can be used to define the desired boundary point to be searched. The most
common one is to use image pixels that have a high image intensity gradient magnitude or edge points generated by an edge detector. A threshold is specified for
the maximum search distance to avoid confusion with outliers and to reduce the
computation time. The resulting force, which we refer to as the dynamic distance
Figure 3.7: An example of gradient vector flow driven deformable contours. (a) A gradient vector flow force field and (b) the result of applying the gradient vector flow force to a deformable contour, with the circular initial contour shown in gray and the final deformed contour in white.
force, is proportional to the signed distance and acts along the normal:
\[ \mathbf{F}_{\mathrm{ddf}}(\mathbf{X}) = w_{\mathrm{ddf}}\, d(\mathbf{X})\, \mathbf{N}(\mathbf{X}), \tag{3.16} \]
where $d(\mathbf{X})$ is the signed distance from the model point $\mathbf{X}$ to the nearest boundary point along the normal direction and $w_{\mathrm{ddf}}$ is a weighting parameter. Another type of external force, typically specified interactively, is the spring force
\[ \mathbf{F}_{\mathrm{spr}}(\mathbf{q}) = k\, (\mathbf{p} - \mathbf{q}), \tag{3.17} \]
which attaches the model point $\mathbf{q}$ to a user-specified point $\mathbf{p}$ with spring constant $k$. Spring forces act to pull the model toward $\mathbf{p}$. The further away the model is from $\mathbf{p}$, the stronger the pulling force. The point $\mathbf{q}$ is selected by finding the closest point on the model to $\mathbf{p}$ using a heuristic search around a local neighborhood of $\mathbf{p}$.
(3.18)
to
(3.19)
Figure 3.8: Example of interactive forces. (a) A CT image slice of a canine left ventricle. (b) A deformable contour moves toward high gradients in the edge detected image, influenced by landmark points near the center of the image and a spring force that pulls the contour toward an edge at the bottom right. Image courtesy of McInerney and Terzopoulos [23], The University of Toronto.
3.2.4
Numerical implementation
Discretizing the contour into $n$ sample points and approximating the derivatives in the internal force with finite differences, we can rewrite Eq. (3.12) as
\[ \gamma \frac{d\mathbf{X}}{dt} = \mathbf{A}\mathbf{X} + \mathbf{F}_{\mathrm{ext}}(\mathbf{X}), \tag{3.20} \]
where $\mathbf{X}$ and $\mathbf{F}_{\mathrm{ext}}(\mathbf{X})$ are $n \times 2$ matrices whose rows are the contour samples and the external forces evaluated at them, and $\mathbf{A}$ is an $n \times n$ pentadiagonal banded matrix, with $n$ being the number of sample points. Applying a semi-implicit time discretization with step size $\tau$, in which the internal forces are evaluated at the unknown time step $t$ and the external forces at the known time step $t - 1$, gives
\[ \gamma\, \frac{\mathbf{X}^{(t)} - \mathbf{X}^{(t-1)}}{\tau} = \mathbf{A}\mathbf{X}^{(t)} + \mathbf{F}_{\mathrm{ext}}\!\left( \mathbf{X}^{(t-1)} \right). \tag{3.21} \]
Equation (3.21) can then be solved iteratively by matrix inversion using the following equation:
\[ \mathbf{X}^{(t)} = \left( \gamma\mathbf{I} - \tau\mathbf{A} \right)^{-1} \left( \gamma\, \mathbf{X}^{(t-1)} + \tau\, \mathbf{F}_{\mathrm{ext}}\!\left( \mathbf{X}^{(t-1)} \right) \right). \tag{3.22} \]
The inverse of the matrix $(\gamma\mathbf{I} - \tau\mathbf{A})$ can be calculated efficiently by LU decomposition, and the factorization needs to be computed only once if $\alpha$, $\beta$, $\gamma$, and $\tau$ are kept fixed during the deformation.
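The semi-implicit update of Eq. (3.22) can be sketched by assembling the cyclic pentadiagonal matrix A for a closed contour and solving the banded linear system at each step. This is a minimal NumPy sketch with illustrative parameter values (NumPy's solver performs an LU factorization internally; for repeated steps the factorization could be cached since A is fixed):

```python
import numpy as np

def pentadiagonal_A(n, alpha, beta):
    """Cyclic pentadiagonal matrix so that (A @ X) approximates
    alpha * X'' - beta * X'''' at the n contour samples (unit spacing)."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = -2 * alpha - 6 * beta
        A[i, (i - 1) % n] = A[i, (i + 1) % n] = alpha + 4 * beta
        A[i, (i - 2) % n] = A[i, (i + 2) % n] = -beta
    return A

def snake_step(X, F_ext, A, gamma=1.0, tau=0.1):
    """One semi-implicit step, Eq. (3.22):
    X_t = (gamma I - tau A)^(-1) (gamma X_{t-1} + tau F_ext)."""
    n = len(X)
    M = gamma * np.eye(n) - tau * A
    return np.linalg.solve(M, gamma * X + tau * F_ext)  # LU solve inside

n = 40
A = pentadiagonal_A(n, alpha=0.5, beta=0.2)
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
X = np.stack([np.cos(t), np.sin(t)], axis=1)            # unit circle
X_new = snake_step(X, np.zeros_like(X), A)              # no external force
```

With no external force the step contracts the circle very slightly, reflecting the smoothing action of the internal forces; the semi-implicit treatment keeps this stable even for large $\tau$.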
Discussion
LU decomposition stands for Lower and Upper triangular decomposition, a well-known technique in linear algebra.
\[ \frac{\partial \mathbf{X}}{\partial t} = V(\kappa)\, \mathbf{N}, \tag{3.23} \]
where $\mathbf{N}$ is the inward unit normal, $\kappa$ is the curvature, and the speed function $V(\kappa)$ typically combines curvature deformation with a constant deformation term $V_0$, whose sign determines the speed and direction of deformation. Constant deformation plays the same role as the pressure force in parametric deformable
models. The properties of curvature deformation and constant deformation are
complementary to each other. Curvature deformation removes singularities by
smoothing the curve, while constant deformation can create singularities from an
initially smooth curve.
The basic idea of the geometric deformable model is to couple the speed of
deformation (using curvature and/or constant deformation) with the image data, so
that the evolution of the curve stops at object boundaries. The evolution is implemented using the level set method. Thus, most of the research in geometric
deformable models has been focused on the design of speed functions. We review
several representative speed functions in Section 3.3.3.
3.3.2 Level set method
We now review the level set method for implementing curve evolution. The
level set method is used to account for automatic topology adaptation, and it also
provides the basis for a numerical scheme that is used by geometric deformable
models. The level set method for evolving curves is due to Osher and Sethian [32,
58, 59].
In the level set method, the curve is represented implicitly as a level set of a 2D
scalar function, referred to as the level set function, which is usually defined
on the same domain as the image. The level set is defined as the set of points that
have the same function value. Figure 3.9 shows an example of embedding a curve
as a zero level set. It is worth noting that the level set function is different from the
level sets of images, which are sometimes used for image enhancement [60]. The
sole purpose of the level set function is to provide an implicit representation of the
evolving curve.
Figure 3.9: An example of embedding a curve as a level set. (a) A single curve. (b) The level set function where the curve is embedded as the zero level set (in black). (c) The height map of the level set function with its zero level set depicted in black.
Figure 3.10: From left to right, the zero level set splits into two curves while the level set
function still remains a valid function.
Instead of tracking a curve through time, the level set method evolves a curve by
updating the level set function at fixed coordinates through time. This perspective
is similar to that of an Eulerian formulation of motion as opposed to a Lagrangian
formulation, which is analogous to the parametric deformable model. A useful
property of this approach is that the level set function remains a valid function
while the embedded curve can change its topology. This situation is depicted in
Fig. 3.10.
We now derive the level set embedding of the curve evolution equation (3.23). Given a level set function $\phi(x, y, t)$ with the contour $\mathbf{X}(s, t)$ as its zero level set, we have
\[ \phi(\mathbf{X}(s, t), t) = 0. \]
Differentiating the above equation with respect to $t$ and using the chain rule, we obtain
\[ \frac{\partial \phi}{\partial t} + \nabla\phi \cdot \frac{\partial \mathbf{X}}{\partial t} = 0. \tag{3.24} \]
We assume that $\phi$ takes negative values inside the zero level set and positive values outside; accordingly, the inward unit normal of the level set curve is
\[ \mathbf{N} = -\frac{\nabla\phi}{|\nabla\phi|}. \tag{3.25} \]
Using this fact and Eq. (3.23), we can rewrite Eq. (3.24) as
\[ \frac{\partial \phi}{\partial t} = V(\kappa)\, |\nabla\phi|, \tag{3.26} \]
where the curvature $\kappa$ at the zero level set is given by
\[ \kappa = \nabla \cdot \frac{\nabla\phi}{|\nabla\phi|} = \frac{\phi_{xx}\phi_y^2 - 2\phi_x\phi_y\phi_{xy} + \phi_{yy}\phi_x^2}{\left( \phi_x^2 + \phi_y^2 \right)^{3/2}}. \tag{3.27} \]
The relationship between Eq. (3.23) and Eq. (3.26) provides the basis for performing curve evolution using the level set method.
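Equation (3.26) with a curvature-driven speed, $V(\kappa) = \kappa$, can be sketched on a grid by updating $\phi$ at every pixel, with the curvature computed from Eq. (3.27) by central differences. The example below (grid size, shape, and step values are illustrative choices) evolves an ellipse-like level set function under curvature flow, which shrinks and rounds the zero level set:

```python
import numpy as np

def curvature_flow_step(phi, dt=0.1, eps=1e-9):
    """One explicit step of d(phi)/dt = kappa * |grad phi| (Eq. 3.26),
    with kappa from Eq. (3.27) via central differences."""
    py, px = np.gradient(phi)
    pyy, pyx = np.gradient(py)
    pxy, pxx = np.gradient(px)
    grad2 = px**2 + py**2
    kappa = (pxx * py**2 - 2 * px * py * pxy + pyy * px**2) \
            / (grad2**1.5 + eps)
    return phi + dt * kappa * np.sqrt(grad2)

# Signed-distance-like function of an ellipse (negative inside).
yy, xx = np.mgrid[0:80, 0:80]
phi = np.sqrt(((xx - 40) / 1.5)**2 + (yy - 40)**2) - 15.0
area0 = int(np.sum(phi < 0))
for _ in range(200):
    phi = curvature_flow_step(phi)
area = int(np.sum(phi < 0))
```

Because the update is applied to the whole function $\phi$, no explicit contour tracking is needed, and a topology change of the zero level set would be handled automatically, which is exactly the Eulerian advantage described above.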
Three issues need to be considered in order to implement geometric deformable
contours:
1. An initial function $\phi(x, y, t = 0)$ must be constructed such that its zero level set corresponds to the position of the initial contour. A common choice is to set $\phi(x, y, 0) = D(x, y)$, where $D(x, y)$ is the signed distance from each grid point to the zero level set. The computation of the signed distance for an arbitrary initial curve is expensive. Recently, Sethian and Malladi developed a method called the fast marching method, which can construct the signed distance function in $O(N \log N)$ time, where $N$ is the number of grid points. Certain situations may arise, however, where the distance may be computed much more efficiently. For example, when the zero level set can be described by the exterior boundary of the union of a collection of disks, the signed distance function can be computed in $O(nN)$ time as
\[ D(x, y) = \min_i \left( \sqrt{(x - x_i)^2 + (y - y_i)^2} - r_i \right), \qquad i = 1, \ldots, n, \]
where $(x_i, y_i)$ and $r_i$ are the center and radius of the $i$th disk, and $n$ is the number of initial disks.

2. Since the evolution equation (3.26) is derived for the zero level set only, the speed function $V(\kappa)$, in general, is not defined on other level sets. Hence, we need a method to extend the speed function $V(\kappa)$ to all of the level sets.

3. The level set function can develop flat or steep regions as the evolution proceeds, which degrades the accuracy of the numerical approximations; a common remedy is to periodically reinitialize the level set function to a signed distance function.
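For the special case in item 1 above, the signed distance of a union of disks is just a pointwise minimum over the per-disk signed distances. A small sketch (the grid size, disk centers, and radii are arbitrary illustrations):

```python
import numpy as np

def signed_distance_disks(shape, centers, radii):
    """phi(x) = min_i ( |x - c_i| - r_i ): negative inside the union of
    disks, zero on its exterior boundary, positive outside. Cost is
    O(nN) for n disks and N grid points."""
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    phi = np.full(shape, np.inf)
    for (cy, cx), r in zip(centers, radii):
        d = np.sqrt((xx - cx)**2 + (yy - cy)**2) - r
        phi = np.minimum(phi, d)
    return phi

phi = signed_distance_disks((64, 64),
                            centers=[(32, 20), (32, 44)],
                            radii=[10.0, 10.0])
```

Note that the minimum over per-disk distances is exact outside the union and on its boundary; inside overlapping disks it only bounds the true distance to the union boundary, which is still adequate for initialization.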
3.3.3 Speed functions
In this section, we provide a brief overview of three examples of speed functions used by geometric deformable contours.
The geometric deformable contour formulation proposed by Caselles et al. [24] and Malladi et al. [25] takes the following form:
\[ \frac{\partial \phi}{\partial t} = c\, (\kappa + V_0)\, |\nabla\phi|, \tag{3.28} \]
where the stopping term is derived from the image as
\[ c = \frac{1}{1 + \left| \nabla (G_\sigma * I) \right|}. \tag{3.29} \]
Positive $V_0$ shrinks the curve, and negative $V_0$ expands the curve. The curve evolution is coupled with the image data through the multiplicative stopping term $c$. This scheme can work well for objects that have good contrast. However, when the object boundary is indistinct or has gaps, the geometric deformable contour may leak out because the multiplicative term only slows down the curve near the boundary rather than completely stopping the curve. Once the curve passes the boundary, it will not be pulled back to recover the correct boundary.
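The stopping term of Eq. (3.29) is straightforward to compute from the image. The sketch below (a simple separable Gaussian smoother with an illustrative sigma; the synthetic step-edge image is likewise arbitrary) shows that $c \approx 1$ in homogeneous regions and drops sharply at edges:

```python
import numpy as np

def stopping_term(img, sigma=1.5):
    """c = 1 / (1 + |grad(G_sigma * I)|), Eq. (3.29)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    k /= k.sum()
    sm = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1,
                             img.astype(float))
    sm = np.apply_along_axis(lambda col: np.convolve(col, k, mode="same"), 0, sm)
    gy, gx = np.gradient(sm)
    return 1.0 / (1.0 + np.sqrt(gx**2 + gy**2))

img = np.zeros((64, 64))
img[:, 32:] = 100.0          # vertical step edge at column 32
c = stopping_term(img)
```

Since $c$ never reaches exactly zero, the evolving front is only slowed, never fully stopped, at the edge; this is precisely the leakage mechanism described in the paragraph above.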
Figure 3.11: Contour extraction of a cyst from an ultrasound breast image via merging multiple initial level sets. Images courtesy of Yezzi [63], Georgia Institute of Technology.
To remedy the latter problem, Caselles et al. [26, 64] and Kichenassamy et al. [63, 65] used an energy minimization formulation to design the speed function. This leads to the following geometric deformable contour formulation:
\[ \frac{\partial \phi}{\partial t} = c\, (\kappa + V_0)\, |\nabla\phi| + \nabla c \cdot \nabla\phi. \tag{3.30} \]
Note that the resulting speed function has an extra stopping term $\nabla c \cdot \nabla\phi$ that can pull back the contour if it passes the boundary. This term behaves in a similar fashion to the Gaussian potential force in the parametric formulation. An example of using this type of geometric deformable contour is shown in Fig. 3.11.
The latter formulation can still generate curves that pass through boundary gaps. Siddiqi et al. [66] partially address this problem by altering the constant speed term through energy minimization, leading to the following geometric deformable contour:
\[ \frac{\partial \phi}{\partial t} = \alpha \left( c\, \kappa\, |\nabla\phi| + \nabla c \cdot \nabla\phi \right) + \beta \left( c + \tfrac{1}{2}\, \mathbf{x} \cdot \nabla c \right) |\nabla\phi|. \tag{3.31} \]
In this case, the constant speed term $V_0$ in Eq. (3.30) is replaced by the second term, and the term $\tfrac{1}{2}\, \mathbf{x} \cdot \nabla c$ provides additional stopping power that can prevent the geometric contour from leaking through small boundary gaps. The second term can be used alone as the speed function for shape recovery as well. Figure 3.12 shows an example of this deformable contour model. Although this model is robust to small gaps, large boundary gaps can still cause problems.
Figure 3.12: Segmentation of the brain using only the second term in (3.31). Left to right
and top to bottom: iterations 1, 400, 800, 1200, and 1600. Images courtesy of Siddiqi [66],
McGill University.
3.3.4 Relationship between parametric and geometric deformable models

In the previous section, we described three types of geometric deformable contours that behave similarly to the parametric deformable contours but have the advantage of being able to change their topology automatically. The relationship between parametric deformable contours and geometric deformable contours can be
formulated more precisely. Through an energy minimization formulation, Caselles
et al. [64] showed that the geometric deformable contour in Eq. (3.30) is equivalent to the parametric deformable contour without the rigidity term. This derivation
only permits the use of a speed function induced by a potential force, a property
shared by almost all the geometric deformable models. In this section, we derive an
explicit mathematical relationship between a dynamic force formulation of parametric deformable models and a geometric deformable model formulation, thus
permitting the use of speed functions derived from nonpotential forces, i.e., forces
that cannot be expressed as the negative gradient of potential energy functions.
To derive this relationship, we consider the dynamic force formulation of Eq. (3.12) with an internal force consisting of the tension term only:
\[ \gamma \frac{\partial \mathbf{X}}{\partial t} = \alpha \frac{\partial^2 \mathbf{X}}{\partial s^2} + \mathbf{F}_{\mathrm{ext}}(\mathbf{X}). \tag{3.32} \]
Note that since the use of a pressure force can cause singularities during the deformation, we do not use a pressure force here. Keeping only the normal components of the forces, which are the only components that affect the geometry of the contour, Eq. (3.32) can be rewritten as
\[ \frac{\partial \mathbf{X}}{\partial t} = a\, \kappa\, \mathbf{N} + \frac{1}{\gamma} \left( \mathbf{F}_{\mathrm{ext}} \cdot \mathbf{N} \right) \mathbf{N}, \tag{3.33} \]
where $a = \alpha/\gamma$, $\kappa$ is the curvature, and $\mathbf{N}$ is the inward unit normal. Here, we have divided through by $\gamma$ so that both sides have units of velocity. If we let $\mathbf{N} = -\nabla\phi / |\nabla\phi|$, where $\mathbf{N}$ is given by Eq. (3.25), and substitute into Eq. (3.26), we obtain the following level set evolution equation:
\[ \frac{\partial \phi}{\partial t} = a\, \kappa\, |\nabla\phi| - \frac{1}{\gamma}\, \mathbf{F}_{\mathrm{ext}} \cdot \nabla\phi. \tag{3.34} \]
If we allow both $a$ and $\mathbf{F}_{\mathrm{ext}}$ to be functions defined on the image domain, then Eq. (3.34) generalizes Eq. (3.31) and can be used to implement almost any parametric deformable model as a geometric deformable model.
3.3.5
Numerical implementation
The curvature term of the speed function, which behaves like a diffusion term, can be discretized using central differences, whereas the constant deformation term must be discretized using an upwind scheme to maintain numerical stability. One explicit update then takes the form
\[ \phi_{i,j}^{n+1} = \phi_{i,j}^{n} + \Delta t \left[ a\, \kappa_{i,j}\, \big| \nabla_0 \phi_{i,j} \big| + \max(b_{i,j}, 0)\, \nabla^{+}\phi_{i,j} + \min(b_{i,j}, 0)\, \nabla^{-}\phi_{i,j} \right], \tag{3.35} \]
where $\nabla_0$ denotes a central finite-difference approximation of the gradient magnitude, and $\nabla^{+}$ and $\nabla^{-}$ denote upwind finite-difference approximations of the gradient magnitude, selected according to the sign of the constant speed $b$ (see [33] for their precise definitions).
A detailed description of the principle behind this numerical method is given in [33]. We note that more efficient implementations of geometric deformable models have been developed, including the particularly noteworthy narrow-band level set method described in [25, 67].
3.3.6
Discussion
Although topological adaptation can be useful in many applications, it can sometimes lead to undesirable results. When applied to noisy images with significant boundary gaps, geometric deformable models may generate shapes whose topology is inconsistent with that of the actual object. In these situations, ensuring a correct topology is often a necessary condition for subsequent applications. For example, in brain functional studies using fMRI or
PET data, it is necessary to unfold the extracted cortical surface and create a flat or
spherical map so that a user can visualize the functional activation in deep buried
cortical regions (see [68, 69]). Parametric deformable models are better suited to
these applications because of their strict control on topology.
3.4 Extensions of deformable models

3.4.1 Deformable Fourier models

Staib and Duncan [70] proposed a deformable model based on a Fourier representation of the contour, in which a closed contour $\mathbf{X}(s) = (X(s), Y(s))$ is expressed in terms of an orthonormal trigonometric basis:
\[ X(s) = a_0 + \sum_{k=1}^{\infty} \left[ a_k \cos 2\pi k s + b_k \sin 2\pi k s \right], \qquad Y(s) = c_0 + \sum_{k=1}^{\infty} \left[ c_k \cos 2\pi k s + d_k \sin 2\pi k s \right], \tag{3.36} \]
where $a_0$, $c_0$ and $a_k$, $b_k$, $c_k$, $d_k$, $k = 1, 2, \ldots$, are the Fourier coefficients. In practice, the series is truncated and only the first $K$ harmonics are used as the model parameters; the coefficients of the low-order harmonics capture the global shape of the contour, while those of the high-order harmonics capture local shape detail.
Figure 3.13: Segmenting the corpus callosum from an MR midbrain sagittal image using a deformable Fourier model. Top left: MR image (146×106). Top right: positive magnitude of the Laplacian of the Gaussian. Bottom right: final contour on the corpus callosum of the brain. Images courtesy of Staib and Duncan [70], Yale University.
Staib and Duncan apply a Bayesian approach to incorporating prior information into their model. A prior probability function is defined by first manually or
semiautomatically delineating structures of the same class as the structure to be
extracted. Next, these structures are parameterized using the Fourier coefficients,
or using the converted parameter set based on ellipses. Mean and variance statistics
are finally computed for each of the parameters.
Assuming independence between the parameters, the multivariate Gaussian prior probability of the parameter vector $\mathbf{p} = (p_1, p_2, \ldots, p_M)$ is
\[ \Pr(\mathbf{p}) \propto \prod_{i=1}^{M} \exp\!\left[ -\frac{(p_i - m_i)^2}{2\sigma_i^2} \right], \tag{3.37} \]
where $m_i$ and $\sigma_i^2$ are the mean and variance of the $i$th parameter. Boundary finding can then be formulated as a maximum a posteriori estimate that balances this prior with the agreement between the model and the image data.
3.4.2 Deformable models using modal analysis
Another way to restrict the mostly unstructured motion associated with the standard deformable model is to use modal analysis (Pentland and Horowitz [72], Nastar and Ayache [53]). This approach is similar to the deformable Fourier model
except that both the basis functions and the nominal values of their coefficients are
derived from a template object shape.
Deformable models based on modal analysis use the theory of finite elements
[73]. An object is assumed to be represented by a finite set of elements whose positions are defined by the positions of nodes, which are points in $d$-dimensional space. The node positions can be stacked into a vector $\mathbf{x}$, which has length $n$, and element interpolation characterizes the complete object shape on the continuum. If the object moves or deforms, its new position is given by $\mathbf{x} + \mathbf{u}$, where $\mathbf{u}$ is a vector of length $n$ representing the collection of nodal displacements.
The equation governing the object's motion can be written as a collection of ordinary differential equations constraining the nodal displacements. This is compactly written as
\[ \mathbf{M}\ddot{\mathbf{u}} + \mathbf{C}\dot{\mathbf{u}} + \mathbf{K}\mathbf{u} = \mathbf{f}, \]
where $\mathbf{M}$, $\mathbf{C}$, and $\mathbf{K}$ are the mass, damping, and stiffness matrices of the system, and $\mathbf{f}$ is an $n$-dimensional vector of external forces acting on the nodes. Both $\mathbf{u}$ and $\mathbf{f}$ are assumed to be functions of time. Derivations of $\mathbf{M}$, $\mathbf{C}$, and $\mathbf{K}$ are described in the literature (cf. Pentland and Horowitz [72], Terzopoulos and Metaxas [47]).
Solution of the generalized eigenvalue problem
\[ \mathbf{K}\boldsymbol{\phi}_i = \lambda_i \mathbf{M}\boldsymbol{\phi}_i \]
yields the modes $\boldsymbol{\phi}_i$ and eigenvalues $\lambda_i$, $i = 1, \ldots, n$. Writing the nodal displacements in the modal basis as $\mathbf{u} = \boldsymbol{\Phi}\tilde{\mathbf{u}}$, where $\boldsymbol{\Phi}$ is the (orthogonal) matrix whose columns comprise the modes and $\tilde{\mathbf{u}}$ is a vector of motion coefficients, the governing equation can then be written as
\[ \tilde{\mathbf{M}}\ddot{\tilde{\mathbf{u}}} + \tilde{\mathbf{C}}\dot{\tilde{\mathbf{u}}} + \tilde{\mathbf{K}}\tilde{\mathbf{u}} = \tilde{\mathbf{f}}. \tag{3.38} \]
Discarding the high-frequency modes, which are associated with the largest eigenvalues, gives the approximation
\[ \mathbf{u} \approx \boldsymbol{\Phi}_m \tilde{\mathbf{u}}_m, \tag{3.39} \]
where $\boldsymbol{\Phi}_m$ is the matrix consisting of the first $m$ columns of $\boldsymbol{\Phi}$, and $\tilde{\mathbf{u}}_m$ is the vector comprising the first $m$ motion coefficients from $\tilde{\mathbf{u}}$. The governing equations become
\[ \tilde{\mathbf{M}}_m\ddot{\tilde{\mathbf{u}}}_m + \tilde{\mathbf{C}}_m\dot{\tilde{\mathbf{u}}}_m + \tilde{\mathbf{K}}_m\tilde{\mathbf{u}}_m = \tilde{\mathbf{f}}_m, \tag{3.40} \]
where
\[ \tilde{\mathbf{M}}_m = \boldsymbol{\Phi}_m^{\mathsf{T}}\mathbf{M}\boldsymbol{\Phi}_m, \quad \tilde{\mathbf{C}}_m = \boldsymbol{\Phi}_m^{\mathsf{T}}\mathbf{C}\boldsymbol{\Phi}_m, \quad \tilde{\mathbf{K}}_m = \boldsymbol{\Phi}_m^{\mathsf{T}}\mathbf{K}\boldsymbol{\Phi}_m, \quad \text{and} \quad \tilde{\mathbf{f}}_m = \boldsymbol{\Phi}_m^{\mathsf{T}}\mathbf{f}. \tag{3.41} \]
Two modified definitions for the parameter vector were also proposed in [70].
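The modal truncation of Eq. (3.39) can be sketched with a symmetric eigensolver. With unit masses, $\mathbf{M} = \mathbf{I}$ and the generalized problem reduces to a standard one; the fixed-end spring chain below is an illustrative toy system, not from the text:

```python
import numpy as np

n = 20
# Stiffness matrix of a fixed-end spring chain; unit masses give M = I,
# so K phi = lambda M phi becomes a standard symmetric eigenproblem.
K = 2*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
lam, Phi = np.linalg.eigh(K)       # ascending eigenvalues, orthonormal modes

def truncate(u, m):
    """Approximate displacements u by their first m modal coefficients
    (Eq. 3.39): u ~ Phi_m @ (Phi_m.T @ u)."""
    Pm = Phi[:, :m]
    return Pm @ (Pm.T @ u)

# A smooth displacement pattern is captured well by a few low modes.
u = np.arange(1, n + 1) / n        # ramp displacement
err3 = np.linalg.norm(u - truncate(u, 3))
err10 = np.linalg.norm(u - truncate(u, 10))
```

Keeping more modes monotonically reduces the reconstruction error, and using all $n$ modes reproduces the displacement exactly; discarding the high-frequency modes is what gives modal deformable models their smoothness constraint.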
3.4.3
Deformable superquadrics
Another extension of deformable models that has been used for incorporating local and global shape features is the deformable superquadric, proposed by
Terzopoulos and Metaxas [47]. This is essentially a hybrid technique where a superquadric surface, which can be defined with a relatively small number of parameters, is allowed to deform locally for reconstructing the shape of an object.
Although the fitting of global and local deformations is performed simultaneously,
the global deformation is forced to account for as much of the object shape as
possible. The estimated superquadric therefore captures the global shape characteristics and can readily be used in object recognition applications, while the local
deformations capture the details of the object shape.
Terzopoulos and Metaxas consider models that are closed surfaces, denoted by $\mathbf{X}(\mathbf{u})$, where the parametric coordinates $\mathbf{u} = (u, v)$ are defined over a fixed domain. This surface can be expressed as
\[ \mathbf{X} = \mathbf{c} + \mathbf{R}\mathbf{p}, \tag{3.42} \]
where $\mathbf{c}$ is a translation vector, and $\mathbf{R}$ is a rotation matrix. The vector function $\mathbf{p}(\mathbf{u})$ denotes the model shape irrespective of pose and can further be expressed as
\[ \mathbf{p} = \mathbf{s} + \mathbf{d}, \tag{3.43} \]
where $\mathbf{s}$ is a reference shape consisting of the low-parameter global shape model, and $\mathbf{d}$ is a displacement function consisting of the local deformations.
The reference shapes in this case are superquadrics, which are an extension of standard quadric surfaces. These surfaces have been used in a variety of applications for computer graphics and computer vision, because of their ability to accommodate a large number of shapes with relatively few parameters. The kind of superquadric of interest here is the superellipsoid, which can be expressed implicitly as [74]
\[ \left[ \left( \frac{x}{a_1} \right)^{2/\epsilon_2} + \left( \frac{y}{a_2} \right)^{2/\epsilon_2} \right]^{\epsilon_2/\epsilon_1} + \left( \frac{z}{a_3} \right)^{2/\epsilon_1} = 1, \tag{3.44} \]
where $a_1$, $a_2$, and $a_3$ are aspect ratio parameters, and $\epsilon_1$ and $\epsilon_2$ control the squareness of the shape. Using a spherical coordinate reference frame, Terzopoulos and Metaxas express the reference shape as
\[ \mathbf{s} = a_0 \begin{pmatrix} a_1\, C_u^{\epsilon_1} C_v^{\epsilon_2} \\ a_2\, C_u^{\epsilon_1} S_v^{\epsilon_2} \\ a_3\, S_u^{\epsilon_1} \end{pmatrix}, \tag{3.45} \]
where $-\pi/2 \le u \le \pi/2$, $-\pi \le v < \pi$, $S_w^{\epsilon} = \operatorname{sign}(\sin w)\, |\sin w|^{\epsilon}$, and $C_w^{\epsilon} = \operatorname{sign}(\cos w)\, |\cos w|^{\epsilon}$. The parameter $a_0$ controls the scale of the shape. Thus, the global reference shape is fully specified by a small parameter vector
\[ \mathbf{q}_s = \left( a_0, a_1, a_2, a_3, \epsilon_1, \epsilon_2 \right)^{\mathsf{T}}. \tag{3.46} \]
The local deformations $\mathbf{d}$ are expressed in terms of basis functions as
\[ \mathbf{d} = \mathbf{S}\mathbf{q}_d, \tag{3.47} \]
where $\mathbf{S}$ is a matrix of the basis functions and $\mathbf{q}_d$ is a vector of the local deformation parameters.
The pose, global, and local parameters can be assembled into a single vector $\mathbf{q}$, whose evolution is governed by the first-order dynamic equation
\[ \mathbf{D}\dot{\mathbf{q}} + \mathbf{K}\mathbf{q} = \mathbf{f}_q, \tag{3.48} \]
where the first term represents damping forces controlled by the damping matrix $\mathbf{D}$, the second term represents internal forces of the model controlled by the stiffness matrix $\mathbf{K}$, and $\mathbf{f}_q$ are the external forces. As with the parametric deformable model, the model deforms according to Eq. (3.48) until these forces reach equilibrium.
An important aspect in such a hybrid model is that the global reference shape
should account for as much of the shape to be reconstructed as possible. This is
accomplished in Eq. (3.48) by appropriately defining the stiffness matrix $\mathbf{K}$. In particular, all entries of $\mathbf{K}$ that do not correspond to local deformations are set to zero. This amounts to imposing no penalty on the evolution of the rotation, translation, and superquadric parameters. On the other hand, entries corresponding to
the local deformation parameters are selected such that their evolution is restricted
with respect to their magnitude and first derivative.
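The superellipsoid parameterization of Eq. (3.45) can be sampled directly, and the generated points satisfy the implicit form of Eq. (3.44). The sketch below uses illustrative parameter values (with $a_0 = 1$ for simplicity):

```python
import numpy as np

def sgn_pow(w, eps):
    """sign(w) * |w|^eps, the signed power used in Eq. (3.45)."""
    return np.sign(w) * np.abs(w)**eps

def superellipsoid(a, eps1, eps2, nu=20, nv=40, a0=1.0):
    """Sample points s(u, v) of Eq. (3.45):
    s = a0 * (a1 Cu^eps1 Cv^eps2, a2 Cu^eps1 Sv^eps2, a3 Su^eps1)."""
    u = np.linspace(-np.pi/2, np.pi/2, nu)
    v = np.linspace(-np.pi, np.pi, nv, endpoint=False)
    uu, vv = np.meshgrid(u, v, indexing="ij")
    Cu, Su = sgn_pow(np.cos(uu), eps1), sgn_pow(np.sin(uu), eps1)
    Cv, Sv = sgn_pow(np.cos(vv), eps2), sgn_pow(np.sin(vv), eps2)
    return np.stack([a0 * a[0] * Cu * Cv,
                     a0 * a[1] * Cu * Sv,
                     a0 * a[2] * Su], axis=-1)

def implicit_residual(p, a, eps1, eps2):
    """Left side of Eq. (3.44) minus 1 (zero on the surface)."""
    x, y, z = p[..., 0]/a[0], p[..., 1]/a[1], p[..., 2]/a[2]
    return (np.abs(x)**(2/eps2) + np.abs(y)**(2/eps2))**(eps2/eps1) \
           + np.abs(z)**(2/eps1) - 1.0

pts = superellipsoid(a=(2.0, 1.0, 0.5), eps1=0.5, eps2=1.0)
res = implicit_residual(pts, (2.0, 1.0, 0.5), 0.5, 1.0)
```

Varying only the six scalars in Eq. (3.46) moves the surface through ellipsoids, box-like, and pinched shapes, which is why a superellipsoid makes an economical global reference shape.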
3.4.4 Active shape models

Active shape models (ASMs) proposed by Cootes et al. [78, 79] use a different
approach to incorporate prior shape information. Their prior models are not based
on the parameterization, but are instead based on a set of points defined at various
features in the image. In the following, we summarize how the prior model is
constructed and used to enhance the performance of a deformable model and how
the ASM paradigm can be extended to incorporate prior information on the image
intensity rather than on the shape alone.
Construction of the ASM prior model
The ASM prior model is constructed by first establishing a set of labeled point
features, or landmarks, within the class of images to be processed [see Figs. 3.14(a)
and (b)]. These points are manually selected on each of the images in the training
set.⁴ Once selected, the set of points for each image is aligned to one another
with respect to translation, rotation, and scaling. This is accomplished using an
iterative algorithm based on the Procrustes method [80]. This linear alignment
allows studying the object shape in a common coordinate frame, which we will
refer to as the model space of the ASM. After the alignment, there is typically still
a substantial amount of variability in the coordinates of each point. To compactly
describe this variability as a prior model, Cootes and Taylor developed the Point
Distribution Model (PDM), which we now describe.
Given $N$ aligned shapes $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N$ in the model space, where $\mathbf{x}_i$ is a $2n$-dimensional vector describing the coordinates of the $n$ points from the $i$th shape, the mean shape, $\bar{\mathbf{x}}$, is defined to be
\[ \bar{\mathbf{x}} = \frac{1}{N} \sum_{i=1}^{N} \mathbf{x}_i. \tag{3.49} \]
⁴See the remarks at the end of this section for recent work on automated landmark labeling.
Figure 3.14: An example of constructing Point Distribution Models. (a) An MR brain image, transaxial slice, with 114 landmark points of deep neuroanatomical structures superimposed. (b) A 114-point shape model of 10 brain structures. (c) Effect of simultaneously varying the model's parameters corresponding to the two largest eigenvalues (on a bidimensional grid). Images courtesy of Duta and Sonka [81], The University of Iowa.
The covariance matrix of the aligned training shapes is then computed as
\[ \mathbf{C} = \frac{1}{N} \sum_{i=1}^{N} \left( \mathbf{x}_i - \bar{\mathbf{x}} \right) \left( \mathbf{x}_i - \bar{\mathbf{x}} \right)^{\mathsf{T}}. \tag{3.50} \]
The eigenvectors corresponding to the $t$ largest eigenvalues of the covariance matrix describe the most significant modes of variation. Because almost all of the variability in the model can be described using these eigenvectors, only the $t$ such eigenvectors are selected to characterize the entire variability of the training set. Note that in general, $t$ is significantly smaller than the number of points in the model.
Using a principal component analysis (PCA), any shape $\mathbf{x}$ in the training set can be approximated by
\[ \mathbf{x} \approx \bar{\mathbf{x}} + \mathbf{P}\mathbf{b}, \tag{3.51} \]
where $\mathbf{P} = (\mathbf{p}_1, \mathbf{p}_2, \ldots, \mathbf{p}_t)$ is the matrix of the first $t$ eigenvectors and $\mathbf{b}$ is a vector of shape parameters, computed as
\[ \mathbf{b} = \mathbf{P}^{\mathsf{T}} \left( \mathbf{x} - \bar{\mathbf{x}} \right). \tag{3.52} \]
To ensure that a generated shape remains similar to those in the training set, each shape parameter is constrained to limits derived from the corresponding eigenvalue $\lambda_k$, typically
\[ -3\sqrt{\lambda_k} \le b_k \le 3\sqrt{\lambda_k}. \tag{3.53} \]
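Eqs. (3.49)–(3.52) amount to a PCA over aligned shape vectors. A minimal sketch with synthetic training shapes (generated here by perturbing a circle along one known mode plus noise; the data are purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def pdm(shapes, t):
    """Point Distribution Model: mean shape (Eq. 3.49), covariance
    (Eq. 3.50), and the first t eigenvectors/eigenvalues (Eq. 3.51)."""
    xbar = shapes.mean(axis=0)
    D = shapes - xbar
    C = D.T @ D / len(shapes)
    lam, vecs = np.linalg.eigh(C)              # ascending order
    order = np.argsort(lam)[::-1]              # largest eigenvalues first
    return xbar, vecs[:, order[:t]], lam[order[:t]]

# Synthetic training set: 32-point circles whose radius varies (one true
# mode of variation), plus small coordinate noise.
n_pts, n_shapes = 32, 50
ang = 2*np.pi*np.arange(n_pts)/n_pts
base = np.concatenate([np.cos(ang), np.sin(ang)])
shapes = np.array([(10 + rng.normal(0, 2.0))*base
                   + rng.normal(0, 0.05, 2*n_pts)
                   for _ in range(n_shapes)])

xbar, P, lam = pdm(shapes, t=1)
b = P.T @ (shapes[0] - xbar)                   # shape parameters, Eq. (3.52)
recon = xbar + P @ b                           # approximation, Eq. (3.51)
err = np.linalg.norm(recon - shapes[0])
```

A single mode suffices here because the training set has a single true degree of freedom; the remaining eigenvalues are at noise level, which illustrates why $t$ can be much smaller than the number of landmark coordinates.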
Figure 3.15: An example of Active Shape Models. (a) An echocardiogram image. (b) The initial position of the heart chamber boundary model. The location of the model after (c) 80 and (d) 200 iterations. Images courtesy of Cootes et al. [79], The University of Manchester.
where $s$ is the scaling factor, $\theta$ is the rotation angle, $M(s, \theta)$ is a linear transformation that performs scaling and rotation on $\mathbf{x}$, and $\mathbf{X}_c$ is the center of the model instance.
First, a global fit is performed by adjusting the pose parameters so that the generated model instance $\mathbf{y}$ aligns best with the expected model instance $\hat{\mathbf{y}}$. The proper pose parameter adjustments, $\Delta s$, $\Delta\theta$, and $\Delta\mathbf{X}_c$, can be estimated efficiently using a standard least-squares approach (see [79] for details).
Next, the residual displacement is mapped back into the model space. Solving
\[ M\!\left( s(1 + \Delta s),\, \theta + \Delta\theta \right)\![\mathbf{x} + \Delta\mathbf{x}] + \left( \mathbf{X}_c + \Delta\mathbf{X}_c \right) = \mathbf{y} + \Delta\mathbf{y} \tag{3.54} \]
for the shape displacement $\Delta\mathbf{x}$ yields
\[ \Delta\mathbf{x} = M\!\left( \left( s(1 + \Delta s) \right)^{-1},\, -(\theta + \Delta\theta) \right)\!\left[ M(s, \theta)[\mathbf{x}] + \Delta\mathbf{y} - \Delta\mathbf{X}_c \right] - \mathbf{x}, \tag{3.55} \]
where $M(s, \theta)^{-1} = M(s^{-1}, -\theta)$. Note that to derive Eq. (3.55), both Eq. (3.52) and the linearity of $M(s, \theta)$ are used.
Having solved for $\Delta\mathbf{x}$, we next find the adjustments, $\Delta\mathbf{b}$, to the shape parameters by projecting the displacement onto the modes of variation:
\[ \mathbf{x} + \Delta\mathbf{x} \approx \bar{\mathbf{x}} + \mathbf{P}\left( \mathbf{b} + \Delta\mathbf{b} \right), \tag{3.56} \]
which gives
\[ \Delta\mathbf{b} = \mathbf{P}^{\mathsf{T}} \Delta\mathbf{x}. \tag{3.57} \]
Note that $\mathbf{P}^{\mathsf{T}}\mathbf{P} = \mathbf{I}$.
To summarize, an iteration step of the ASM consists of first finding a displacement of the model instance in the image space, then calculating the corresponding
adjustments to both the pose and shape parameters, and updating the parameters
accordingly. Note that in practice, weighted adjustments are usually used to update
both the pose and shape parameters [79]. When the shape parameters are updated,
their values are limited within a specified range so that the shape of the model
instance remains similar to the shapes of the training examples.
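The shape-parameter part of one ASM iteration, as summarized above, reduces to projecting the residual displacement onto the modes and clamping each parameter to the ±3√λ limits of Eq. (3.53). A reduced sketch with an illustrative toy model (the matrices and numbers are hypothetical, not from the text):

```python
import numpy as np

def update_shape_params(x, dx, xbar, P, lam):
    """Project a residual displacement dx onto the modes (Eq. 3.57) and
    clamp each parameter to +/- 3 sqrt(lambda_k) (Eq. 3.53)."""
    b = P.T @ (x + dx - xbar)                  # new parameters, Eq. (3.52)
    limits = 3.0 * np.sqrt(lam)
    b = np.clip(b, -limits, limits)
    return xbar + P @ b, b                     # constrained shape, Eq. (3.51)

# Tiny 2-mode toy model.
xbar = np.zeros(6)
P = np.eye(6)[:, :2]                           # orthonormal modes
lam = np.array([4.0, 1.0])
x = xbar.copy()
dx = np.array([10.0, 1.0, 0.3, 0.0, 0.0, 0.0])  # large suggested move
x_new, b = update_shape_params(x, dx, xbar, P, lam)
```

Two effects of the update are visible: the out-of-subspace component of the suggested move is discarded by the projection, and the first parameter is clamped from 10 to 6 (its 3√λ limit), keeping the shape plausible with respect to the training set.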
Active appearance models
A limitation of the ASM is that its prior model does not consider graylevel variation of the object instance across images. To overcome this difficulty, Edwards,
Cootes, and Taylor [8486] proposed an extension to the ASM, called active appearance models (AAM). In AAM, a new prior model is constructed using both
shape and greylevel information. Because the objects represented by AAMs are
more specific than those represented by ASMs, in many applications, AAMs can
lead to more robust results than ASMs.
We will now describe how AAMs are constructed. First, the shape difference
of each object instance is compensated by warping the instance image in such a
way that the warped instance shape matches the mean shape obtained through the
PDM procedure of the ASM. The warping step is implemented using a triangulation
algorithm. A PCA applied to the shape-normalized gray-level patterns then yields the linear model
\[ \mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{b}_g, \tag{3.58} \]
where $\bar{\mathbf{g}}$ is the mean normalized gray-level vector, $\mathbf{P}_g$ denotes the significant modes of gray-level variation, and $\mathbf{b}_g$ is a vector of gray-level parameters. Here, for consistency with Eq. (3.58), $\mathbf{P}_s$ and $\mathbf{b}_s$ in the shape model
\[ \mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s \mathbf{b}_s \tag{3.59} \]
are used to denote the significant modes of shape variation and the shape parameters, respectively. Thus, given any instance image of the object of interest, its shape and gray-level pattern can be represented compactly using the vectors $\mathbf{b}_s$ and $\mathbf{b}_g$.
Because the shape and gray-level parameters may be correlated, a further PCA is applied to the combined vectors $\mathbf{b} = \left( \mathbf{W}_s \mathbf{b}_s,\, \mathbf{b}_g \right)^{\mathsf{T}}$, where $\mathbf{W}_s$ is a diagonal matrix of weights that compensates for the difference in units between the shape and gray-level parameters. The PCA yields another linear model
\[ \mathbf{b} = \mathbf{Q}\mathbf{c}, \tag{3.60} \]
where $\mathbf{Q}$ is a set of orthogonal modes, $\mathbf{Q}_s$ and $\mathbf{Q}_g$ are the corresponding submatrices for the shape and gray-level parameters, respectively, and $\mathbf{c}$ is referred to as the appearance parameters that regulate the variations of both the shape and gray-level pattern of the model.
The final representation of the shape and gray levels in terms of $\mathbf{c}$ is given by
\[ \mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s \mathbf{W}_s^{-1} \mathbf{Q}_s \mathbf{c}, \tag{3.61} \]
\[ \mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g \mathbf{Q}_g \mathbf{c}. \tag{3.62} \]
Despite the fact that the number of the appearance parameters is less than the total
number of the parameters in the original graylevel vector, matching the appearance
model to an unseen image can be a time-consuming task. In [85], Cootes, Edwards,
and Taylor proposed a fast matching algorithm that first learns a linear relationship
between matching errors and desired parameter adjustments from training examples, then uses this information to predict the parameter adjustments in the real
matching process.
Remarks
In addition to the AAM extension to the ASM, there are many other extensions.
Duta and Sonka [81] applied the ASM to segment subcortical structures from MR
Other models
Additional extensions have also been proposed to use global shape information
or prior shape information. For example, Ip and Shen [91] incorporated prior shape
information by using an affine transformation to align a shape template with the
deformable model and guide the model's deformation to produce a shape consistent
with the template.
The deformable Fourier model, active shape model, and other extensions we
discussed so far are all parametric deformable models. Guo and Vemuri [92] have proposed a framework for incorporating global shape prior information into geometric deformable models. Like the deformable superquadric, their hybrid geometric deformable model uses a combination of an underlying, low-parameter generator shape and local deformations that are allowed to evolve. Their model thus retains the advantages
of traditional geometric deformable models, such as topological adaptivity.
External forces for deformable models are typically defined from edges in the image. Fritsch et al. [93] have developed a technique called deformable shape loci, which uses information on the medial loci or cores of the shapes to be extracted (see Section 14.3.11). The incorporation of cores provides greater robustness to image disturbances such as noise and blurring than purely edge-based models. This allows their model to be fairly robust to initialization as well as to imaging artifacts. They also employed a probabilistic prior model for important shape features as well as for the spatial relationships between these features.
3.5 Summary

In this chapter, we have described the fundamental formulation of both parametric and geometric deformable models and shown that they can be used in recovering shape boundaries. We have also derived an explicit mathematical relationship between these two formulations that allows one to share the design of external forces and speed functions. This may lead to new, improved deformable models. Finally, we gave a brief overview of several important extensions of deformable models that use application-specific prior knowledge and/or global shape information.

3.6 Further reading
Several current texts deal with deformable models. The book by Blake and Yuille [95] contains an excellent collection of papers on the theory and practice of deformable models. Application of deformable models in motion tracking is covered in great depth in two recent books, by Metaxas [96] and by Blake and Isard [97]. The book edited by Singh, Goldgof, and Terzopoulos [98] consists of a valuable collection of papers on deformable models and their application in medical image analysis. The book by Sethian [33] on level set methods is a comprehensive resource for geometric deformable models. A recent survey paper by McInerney and Terzopoulos [99] provides an excellent source for learning about the application of deformable models in medical image analysis.
3.7
Acknowledgments
The authors would like to thank Milan Sonka, Michael Fitzpatrick, and David Hawkes for reading and commenting upon the draft of this chapter. The work was supported in part by an NSF Presidential Faculty Grant (MIP-9350336) and an NIH Grant (R01-NS37747).
3.8
References
References 169
[4] A. J. Worth, N. Makris, V. S. Caviness, and D. N. Kennedy, "Neuroanatomical segmentation in MRI: technological objectives," Intl J. Patt. Recog. Artificial Intell., vol. 11, pp. 1161–1187, 1997.
[5] C. A. Davatzikos and J. L. Prince, "An active contour model for mapping the cortex," IEEE Trans. Med. Imag., vol. 14, pp. 65–80, 1995.
[6] V. S. Khoo, D. P. Dearnaley, D. J. Finnigan, A. Padhani, S. F. Tanner, and M. O. Leach, "Magnetic resonance imaging (MRI): considerations and applications in radiotherapy treatment planning," Radiother. Oncol., vol. 42, pp. 1–15, 1997.
[7] H. W. Muller-Gartner, J. M. Links, J. L. Prince, R. N. Bryan, E. McVeigh, J. P. Leal, C. Davatzikos, and J. J. Frost, "Measurement of radiotracer concentration in brain gray matter using positron emission tomography: MRI-based correction for partial volume effects," J. Cereb. Blood Flow Metab., vol. 12, pp. 571–583, 1992.
[8] N. Ayache, P. Cinquin, I. Cohen, L. Cohen, F. Leitner, and O. Monga, "Segmentation of complex three-dimensional medical objects: a challenge and a requirement for computer-assisted surgery planning and performance," in Computer-Integrated Surgery: Technology and Clinical Applications (R. H. Taylor, S. Lavallee, G. C. Burdea, and R. Mosges, eds.), pp. 59–74, MIT Press, 1996.
[9] W. E. L. Grimson, G. J. Ettinger, T. Kapur, M. E. Leventon, W. M. Wells, et al., "Utilizing segmented MRI data in image-guided surgery," Intl J. Patt. Recog. Artificial Intell., vol. 11, pp. 1367–1397, 1997.
[10] C. Xu and J. L. Prince, "Generalized gradient vector flow external forces for active contours," Signal Processing - An International Journal, vol. 71, no. 2, pp. 131–139, 1998.
[11] C. Xu, D. L. Pham, M. E. Rettmann, D. N. Yu, and J. L. Prince, "Reconstruction of the human cerebral cortex from magnetic resonance images," IEEE Trans. Med. Imag., vol. 18, pp. 467–480, 1999.
[12] D. Terzopoulos, "On matching deformable models to images," Technical Report 60, Schlumberger Palo Alto Research, 1986. Reprinted in Topical Meeting on Machine Vision, Technical Digest Series, vol. 12, pp. 160–167, 1987.
[13] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: active contour models," Intl J. Comp. Vis., vol. 1, no. 4, pp. 321–331, 1987.
[14] D. Terzopoulos and K. Fleischer, "Deformable models," The Visual Computer, vol. 4, pp. 306–331, 1988.
[15] D. Terzopoulos, A. Witkin, and M. Kass, "Constraints on deformable models: recovering 3D shape and nonrigid motion," Artificial Intelligence, vol. 36, no. 1, pp. 91–123, 1988.
[16] M. A. Fischler and R. A. Elschlager, "The representation and matching of pictorial structures," IEEE Trans. on Computers, vol. 22, no. 1, pp. 67–92, 1973.
[17] B. Widrow, "The rubber-mask technique," Pattern Recognition, vol. 5, pp. 175–211, 1973.
[18] A. Blake and A. Zisserman, Visual Reconstruction. Cambridge, MA: MIT Press, 1987.
[34] I. Cohen, L. D. Cohen, and N. Ayache, "Using deformable surfaces to segment 3-D images and infer differential structures," CVGIP: Imag. Under., vol. 56, no. 2, pp. 242–263, 1992.
[35] R. Courant and D. Hilbert, Methods of Mathematical Physics, vol. 1. New York: Interscience, 1953.
[36] J. L. Prince and C. Xu, "Nonconservative force models in active geometry," in Proc. IEEE Image and Multidimensional Signal Processing Workshop (IMDSP '98), pp. 139–142, 1998.
[37] R. Ronfard, "Region-based strategies for active contour models," Intl J. Comp. Vis., vol. 13, no. 2, pp. 229–251, 1994.
[38] C. S. Poon and M. Braun, "Image segmentation by a deformable contour model incorporating region analysis," Phys. Med. Biol., vol. 42, pp. 1833–1841, 1997.
[39] H. Tek and B. B. Kimia, "Volumetric segmentation of medical images by three-dimensional bubbles," Comp. Vis. Imag. Under., vol. 65, pp. 246–258, 1997.
[40] L. D. Cohen and I. Cohen, "Finite-element methods for active contour models and balloons for 2-D and 3-D images," IEEE Trans. Patt. Anal. Mach. Intell., vol. 15, no. 11, pp. 1131–1147, 1993.
[41] P. E. Danielsson, "Euclidean distance mapping," Comp. Graph. Imag. Proc., vol. 14, pp. 227–248, 1980.
[42] G. Borgefors, "Distance transformations in arbitrary dimensions," Comp. Vis. Graph. Imag. Proc., vol. 27, pp. 321–345, 1984.
[43] C. Xu and J. L. Prince, "Snakes, shapes, and gradient vector flow," IEEE Trans. Imag. Proc., vol. 7, no. 3, pp. 359–369, 1998.
[44] H. Delingette, "Simplex meshes: a general representation for 3-D shape reconstruction," Tech. Rep. TR-2214, INRIA, Sophia-Antipolis, France, 1994.
[45] D. MacDonald, D. Avis, and A. C. Evans, "Multiple surface identification and matching in magnetic resonance images," in SPIE Proc. Visualization in Biomedical Computing, vol. 2359, pp. 160–169, 1994.
[46] D. J. Williams and M. Shah, "A fast algorithm for active contours and curvature estimation," CVGIP: Imag. Under., vol. 55, no. 1, pp. 14–26, 1992.
[47] D. Terzopoulos and D. Metaxas, "Dynamic 3D models with local and global deformations: deformable superquadrics," IEEE Trans. Patt. Anal. Mach. Intell., vol. 13, pp. 703–714, 1991.
[48] A. Gupta, L. von Kurowski, A. Singh, D. Geiger, C.-C. Liang, M.-Y. Chiu, L. P. Adler, M. Haacke, and D. L. Wilson, "Cardiac MR image segmentation using deformable models," in Proc. IEEE Conf. Computers in Cardiology, pp. 747–750, 1993.
[49] H. Delingette, "Adaptive and deformable models based on simplex meshes," in Proc. IEEE Workshop on Motion of Non-Rigid and Articulated Objects, pp. 152–157, 1994.
[50] S. Kumar and D. Goldgof, "Automatic tracking of SPAMM grid and the estimation of deformation parameters from cardiac MR images," IEEE Trans. Med. Imag., vol. 13, pp. 122–132, 1994.
[68] P. C. Teo, G. Sapiro, and B. A. Wandell, "Creating connected representations of cortical gray matter for functional MRI visualization," IEEE Trans. Med. Imag., vol. 16, pp. 852–863, 1997.
[69] C. Xu, D. L. Pham, and J. L. Prince, "Finding the brain cortex using fuzzy segmentation, isosurfaces, and deformable surface models," in Proc. Information Processing in Medical Imaging (IPMI '97), pp. 399–404, 1997.
[70] L. H. Staib and J. S. Duncan, "Boundary finding with parametrically deformable models," IEEE Trans. Patt. Anal. Mach. Intell., vol. 14, no. 11, pp. 1061–1075, 1992.
[71] K. Delibasis, P. E. Undrill, and G. G. Cameron, "Designing Fourier descriptor-based geometric models for object interpretation in medical images using genetic algorithms," Comp. Vis. Imag. Under., vol. 66, pp. 286–300, 1997.
[72] A. Pentland and B. Horowitz, "Recovery of nonrigid motion and structure," IEEE Trans. Patt. Anal. Mach. Intell., vol. 13, pp. 730–742, 1991.
[73] K. H. Huebner, E. A. Thornton, and T. G. Byrom, The Finite Element Method for Engineers. New York: John Wiley & Sons, 3rd ed., 1994.
[74] E. Bardinet, L. D. Cohen, and N. Ayache, "A parametric deformable model to fit unstructured 3D data," Comp. Vis. Imag. Under., vol. 71, pp. 39–54, 1998.
[75] D. Metaxas and D. Terzopoulos, "Shape and nonrigid motion estimation through physics-based synthesis," IEEE Trans. Patt. Anal. Mach. Intell., vol. 15, pp. 580–591, 1993.
[76] B. C. Vemuri and A. Radisavljevic, "Multiresolution stochastic hybrid shape models with fractal priors," ACM Trans. Graph., vol. 13, pp. 177–207, 1994.
[77] D. Metaxas, E. Koh, and N. J. Badler, "Multi-level shape representation using global deformations and locally adaptive finite elements," Intl J. Comp. Vis., vol. 25, pp. 49–61, 1997.
[78] T. F. Cootes, A. Hill, C. J. Taylor, and J. Haslam, "Use of active shape models for locating structures in medical images," Imag. Vis. Computing J., vol. 12, no. 6, pp. 355–366, 1994.
[79] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models - their training and application," Comp. Vis. Imag. Under., vol. 61, no. 1, pp. 38–59, 1995.
[80] J. C. Gower, "Generalized Procrustes analysis," Psychometrika, vol. 40, pp. 33–51, 1975.
[81] N. Duta and M. Sonka, "Segmentation and interpretation of MR brain images: an improved active shape model," IEEE Trans. Med. Imag., vol. 17, pp. 1049–1062, 1998.
[82] Y. Wang and L. H. Staib, "Boundary finding with correspondence using statistical shape models," in Proc. IEEE Conf. Comp. Vis. Patt. Recog., pp. 338–345, 1998.
[83] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: The Johns Hopkins University Press, 3rd ed., 1996.
CHAPTER 4
Morphological Methods for Biomedical Image
Analysis
John Goutsias
The Johns Hopkins University
Sinan Batman
The Johns Hopkins University
Contents

4.1 Introduction 177
4.2 Binary morphological operators 182
…
Examples 237
…
Geodesic SKIZ 242
Watershed-based segmentation of overlapping particles 245
Grayscale segmentation 246
4.7.7 Examples 253
4.8 Conclusions and further discussion 260
4.9 Acknowledgments 262
4.10 References 263
Introduction 177
This chapter is an introduction to an image processing and analysis tool known as mathematical morphology. Our purpose is to provide an easy-to-read overview of basic aspects of this area of research and to illustrate its applicability in image processing and analysis problems related to biomedical imaging.
4.1 Introduction
$g(x) = \int f(y)\, k(x - y)\, dy \qquad (4.1)$

$g(x) = \sum_{y} f(y)\, k(x - y) \qquad (4.2)$

If $F$ is a binary image and $B$ is a structuring element, then

$\Psi(F)(x) = 1$, for $F(x + b) = 1$ for every $b \in B$; $\Psi(F)(x) = 0$, otherwise, $\qquad (4.3)$
Figure 4.1: Two binary images (sets) $F_1$ and $F_2$: (a) the union $F_1 \cup F_2$; (b) the set difference $F_2 \setminus F_1 = F_1^c \cap F_2$.
or

$\Phi(F)(x) = 1$, for $F(x + b) = 1$ for some $b \in B$; $\Phi(F)(x) = 0$, otherwise. $\qquad (4.4)$
Any set operator that satisfies Eq. (4.3) is called a binary erosion, whereas any set operator that satisfies Eq. (4.4) is called a binary dilation. Erosions and dilations are the most elementary operators of mathematical morphology [4–7]. We may therefore argue that mathematical morphology is a natural algebraic framework for binary image processing and analysis.
Mathematical morphology was first introduced in a seminal book by Georges Matheron, entitled Random Sets and Integral Geometry [4]. This book laid down the foundation of mathematical morphology and introduced it as a novel technique for image processing and analysis. Mathematical morphology was subsequently enriched and popularized by the highly inspiring book Image Analysis and Mathematical Morphology by Jean Serra [5]. Today, mathematical morphology is considered to be a powerful tool for image processing and analysis, and it has been used in numerous applications, including industrial inspection, automatic target detection, biomedical imaging, and remote sensing, to mention just a few.
From a theoretical point of view, mathematical morphology studies operators between complete lattices (i.e., nonempty sets furnished with a partial order relationship for which every subset has an infimum and a supremum); see [6, 8–10] for a lattice-theoretic approach to mathematical morphology. Reasons why the theory of complete lattices is the right algebraic framework for mathematical morphology
can be found in [11]. The power of mathematical morphology stems from the fact that any translation invariant operator between complete lattices can be represented by means of elementary morphological operators (e.g., see [12, 13]). An image operator can be built by composing elementary morphological operators. However, this approach is not practical, due to the need for a prohibitively large number of elementary operators. Fortunately, most applications can be satisfactorily handled by compositions of only a few elementary operators.
Similar definitions apply to grayscale images. An operator $\psi$ for which

$\psi(f_1 \wedge f_2) = \psi(f_1) \wedge \psi(f_2) \qquad (4.5)$

or an operator $\phi$ for which

$\phi(f_1 \vee f_2) = \phi(f_1) \vee \phi(f_2) \qquad (4.6)$

holds for every pair of grayscale images can be defined. Any operator that satisfies Eq. (4.5) is called a grayscale erosion, whereas any operator that satisfies Eq. (4.6) is called a grayscale dilation.
Later in this chapter, we will see that, for a grayscale translation invariant erosion and dilation, we have

$(f \ominus b)(x) = \inf_{y} \{ f(y) - b(y - x) \} \qquad (4.7)$

$(f \oplus b)(x) = \sup_{y} \{ f(y) + b(x - y) \} \qquad (4.8)$

where $b$ is a grayscale image known as the structuring function. Notice the striking similarity between Eq. (4.1) and Eqs. (4.7), (4.8). For example, by comparing Eq. (4.1) with Eq. (4.8), it is clear that Eq. (4.8) is obtained by replacing the integral in Eq. (4.1) with the supremum, the point spread function with the structuring function, and the product with an addition. For this reason, Eq. (4.7) and Eq. (4.8) are sometimes called morphological convolutions.
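A minimal sketch of the morphological convolutions (4.7) and (4.8) on 1-D signals follows; representing signals and structuring functions as sparse dicts is an illustrative choice, not the chapter's.

```python
def gray_erode(f, b):
    # (f ⊖ b)(x) = inf over z of { f(x + z) − b(z) }, z in the support of b
    out = {}
    for x in f:
        vals = []
        for z, bz in b.items():
            if (x + z) not in f:
                vals = None  # b shifted to x does not fit inside dom(f)
                break
            vals.append(f[x + z] - bz)
        if vals is not None:
            out[x] = min(vals)
    return out

def gray_dilate(f, b):
    # (f ⊕ b)(x) = sup over y of { f(y) + b(x − y) }
    out = {}
    for y, fy in f.items():
        for z, bz in b.items():
            x = y + z
            out[x] = max(out.get(x, float('-inf')), fy + bz)
    return out
```

With a flat structuring function (all values zero), erosion and dilation reduce to running minimum and maximum filters, which makes the behavior easy to verify by hand.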
Our purpose in this chapter is to provide an easy-to-read overview of basic aspects of mathematical morphology and to illustrate its use in image processing and analysis problems related to biomedical imaging. The reader is referred to [2, 3, 14, 15] for an elementary treatment and to [4–7, 16–19] for a more advanced exposition. The number of publications available on mathematical morphology has grown substantially. Due to space limitations, we limit our references here to publications that complement our exposition. For works using mathematical morphology in biomedical imaging applications, the reader is referred to [20–53], among many other publications.
In this chapter, we first choose to discuss mathematical morphology on binary
images. This is the subject of Sections 4.2 and 4.3. We then extend our exposition
to mathematical morphology on grayscale images. This is the subject of Sections
4.4 and 4.5. Although such a route produces some redundancy, we believe that it is more pleasing to the reader for a very simple reason: binary morphology can be easily visualized by means of geometry, making the exposition very intuitive. In
Section 4.6, we discuss the problem of binary and grayscale morphological image
reconstruction. In mathematical morphology, image reconstruction is the process
of extracting desirable parts from a given image, which have been marked by a
set of markers. Image reconstruction turns out to be very effective in problems of
object detection and image segmentation. The problem of binary and grayscale
image segmentation by means of morphological operators and, in particular, by
means of the watershed transform is discussed in Section 4.7. Finally, Section 4.8
summarizes our concluding remarks and provides a brief discussion on some recent
developments in mathematical morphology. Moreover, this section gives a list of
web sites where mathematical morphology software and toolboxes can be found.
Throughout this chapter, we illustrate basic concepts by means of examples
derived from biomedical imaging applications. Most of these examples are used for
illustration purposes only and they should not be thought of as validated solutions
to the biomedical imaging problems associated with them.
Before we proceed, we would like to make some remarks regarding our notation. Grayscale images (functions) are denoted with small letters $f, g, \ldots$. Capital letters $F, G, \ldots$ denote binary images (sets). For simplicity of presentation, we limit ourselves to images defined over the two-dimensional Euclidean plane $\mathbb{R}^2$ or the two-dimensional discrete plane $\mathbb{Z}^2$. However, most of the concepts discussed here can be extended to higher dimensions (e.g., see [31]). The pair $(x, y)$ will denote a point (pixel) in $\mathbb{R}^2$ or $\mathbb{Z}^2$; however, we also use a single letter (e.g., $h$) to denote such points. Finally, set operators (i.e., operators that apply on binary images and produce binary images) will be denoted with capital Greek letters $\Psi, \Phi, \ldots$, whereas grayscale operators (i.e., operators that apply on grayscale images and produce grayscale images) will be denoted with small Greek letters $\psi, \phi, \ldots$.
4.2 Binary morphological operators
Mathematical morphology is a tool for extracting geometric information from binary and grayscale images. A shape probe (known as a structuring element) is used
to build an image operator whose output depends on whether or not this probe fits
inside a given image. Clearly, the nature of the extracted information depends on
the shape and size of the probe used. To illustrate this concept, we initially restrict
our discussion to the case of binary images.
4.2.1
The most elementary set operators of interest to mathematical morphology are operators that are increasing. A set operator $\Psi$ is increasing if $F_1 \subseteq F_2$ implies $\Psi(F_1) \subseteq \Psi(F_2)$, and it is translation invariant if $\Psi(F + h) = \Psi(F) + h$ for every translation $h$. A set operator may be increasing but not translation invariant and vice versa. Most frequently, however, we are interested in operators that are both increasing and translation invariant.
4.2.2

A set operator $\Psi$ for which

$\Psi(F_1 \cap F_2) = \Psi(F_1) \cap \Psi(F_2) \qquad (4.9)$

for every pair of binary images, is a binary erosion, whereas a set operator $\Phi$, for which

$\Phi(F_1 \cup F_2) = \Phi(F_1) \cup \Phi(F_2) \qquad (4.10)$

for every pair of binary images, is a binary dilation. As a direct consequence of Eq. (4.9) and Eq. (4.10), both erosion and dilation are increasing operators (e.g., if $F_1 \subseteq F_2$, then Eq. (4.9) implies $\Psi(F_1) \subseteq \Psi(F_2)$).
Every translation invariant erosion is of the form

$F \ominus B = \{ h : B + h \subseteq F \}$

for some structuring element $B$.

Figure 4.2: The effect of translation invariant erosion, in (a), and translation invariant dilation, in (b), on a shape $F$, using a disk structuring element $B$. Notice that $F \ominus B \subseteq F$ in (a) and $F \subseteq F \oplus B$ in (b).
This formula provides a geometric interpretation for the translation invariant erosion. It suggests that the translation invariant erosion of a set (shape) $F$ by a structuring element $B$ comprises all points $h$ such that the structuring element located at $h$ fits entirely inside $F$.
The effect of the translation invariant erosion, using a disk structuring element, is illustrated in Fig. 4.2(a). Notice that $F$ is shrunk (i.e., $F \ominus B \subseteq F$) in a manner determined by the shape and size of the structuring element $B$. This is always true when the structuring element contains the origin. In general, a set operator $\Psi$ for which $\Psi(F) \subseteq F$, for every image $F$, is called antiextensive.
Similarly, every translation invariant dilation is of the form

$F \oplus B = \bigcup_{b \in B} (F + b)$

for some structuring element $B$. In set theory, the translation invariant dilation is known as the Minkowski addition. It can be shown that

$F \oplus B = \{ h : (\hat{B} + h) \cap F \neq \emptyset \},$

where $\hat{B} = \{ -b : b \in B \}$ is the reflection of $B$ with respect to the origin. This formula suggests that the dilation of a set (shape) $F$ by a structuring element $B$
Table 4.1: Properties of the translation invariant erosion.
- Translation invariance: $(F + h) \ominus B = (F \ominus B) + h$
- Monotonicity: $B_1 \subseteq B_2 \Rightarrow F \ominus B_2 \subseteq F \ominus B_1$
- Parallel composition: $F \ominus (B_1 \cup B_2) = (F \ominus B_1) \cap (F \ominus B_2)$
- Distributivity of intersection: $(F_1 \cap F_2) \ominus B = (F_1 \ominus B) \cap (F_2 \ominus B)$
- $F \ominus (B_1 \cap B_2) \supseteq (F \ominus B_1) \cup (F \ominus B_2)$
- $(F_1 \cup F_2) \ominus B \supseteq (F_1 \ominus B) \cup (F_2 \ominus B)$
- Serial composition: $(F \ominus B_1) \ominus B_2 = F \ominus (B_1 \oplus B_2)$
- Increasing: $F_1 \subseteq F_2 \Rightarrow F_1 \ominus B \subseteq F_2 \ominus B$
- Homogeneity: $rF \ominus rB = r(F \ominus B)$
- Antiextensivity: $F \ominus B \subseteq F$, when $B$ contains the origin
- Duality: $F \ominus B = (F^c \oplus \hat{B})^c$
comprises all points $h$ such that the reflected structuring element $\hat{B}$ translated to $h$ hits (intersects) $F$. Moreover, $F \oplus B = (F^c \ominus \hat{B})^c$, which is a form of duality between erosion and dilation. This relationship simply says that the dilation of a shape $F$ by a structuring element $B$ is the set complement of the erosion of the set complement of $F$ with structuring element $\hat{B}$.
The effect of the translation invariant dilation, using a disk structuring element, is illustrated in Fig. 4.2(b). Notice that $F$ is expanded (i.e., $F \subseteq F \oplus B$) in a manner determined by the shape and size of $B$. This is always true when $B$ contains the origin. In general, a set operator $\Psi$ for which $F \subseteq \Psi(F)$, for every image $F$, is called extensive.
Some properties of erosion and dilation are listed in Tables 4.1 and 4.2. In these tables, $rB = \{ rb : b \in B \}$ is the scaled version of $B$ (and similarly for $rF$). Throughout this chapter, we simply refer to translation invariant erosion and dilation as erosion and dilation, respectively.
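The set formulas $F \ominus B = \{h : B + h \subseteq F\}$ and $F \oplus B = \bigcup_{b \in B}(F + b)$ translate directly into code. The following pure-Python sketch represents images as sets of (x, y) pixels, with a small cross standing in for a disk structuring element; both choices are illustrative, not the chapter's implementation.

```python
def erode(F, B):
    # F ⊖ B = {h : B + h ⊆ F}; candidates h must satisfy h + b0 ∈ F
    b0 = next(iter(B))
    cand = {(x - b0[0], y - b0[1]) for (x, y) in F}
    return {h for h in cand
            if all((h[0] + bx, h[1] + by) in F for (bx, by) in B)}

def dilate(F, B):
    # F ⊕ B = union of the translates of F by the points of B
    return {(x + bx, y + by) for (x, y) in F for (bx, by) in B}

CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}
SQUARE = {(x, y) for x in range(5) for y in range(5)}
```

On the 5x5 square, erosion by the cross keeps only the inner 3x3 block (antiextensive), while dilation adds a one-pixel city-block border (extensive).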
The effect of erosion and dilation on a binarized image of a histological breast
sample is shown in Fig. 4.3. Notice that erosion by a disk structuring element
with a diameter of pixels, effectively removes debris, producing the clean image
depicted in Fig. 4.3(c). However, when a larger disk is used, significant portions of
the image are removed as well. This is depicted in Fig. 4.3(d). Clearly, the erosion
is very sensitive to pepper noise (i.e., small black dots) in the image, which tends
Table 4.2: Properties of the translation invariant dilation.
- Commutativity: $F \oplus B = B \oplus F$
- Translation invariance: $(F + h) \oplus B = F \oplus (B + h) = (F \oplus B) + h$
- Monotonicity: $B_1 \subseteq B_2 \Rightarrow F \oplus B_1 \subseteq F \oplus B_2$
- Parallel composition: $F \oplus (B_1 \cup B_2) = (F \oplus B_1) \cup (F \oplus B_2)$
- Distributivity of union: $(F_1 \cup F_2) \oplus B = (F_1 \oplus B) \cup (F_2 \oplus B)$
- $F \oplus (B_1 \cap B_2) \subseteq (F \oplus B_1) \cap (F \oplus B_2)$
- Serial composition: $(F \oplus B_1) \oplus B_2 = F \oplus (B_1 \oplus B_2)$
- Increasing: $F_1 \subseteq F_2 \Rightarrow F_1 \oplus B \subseteq F_2 \oplus B$
- Homogeneity: $rF \oplus rB = r(F \oplus B)$
- Extensivity: $F \subseteq F \oplus B$, when $B$ contains the origin
- Duality: $F \oplus B = (F^c \ominus \hat{B})^c$
to get larger in size. Reciprocally, dilation effectively expands the main features
in an image, as depicted in Figs. 4.3(e),(f).
4.2.3

Any increasing and translation invariant set operator $\Psi$ can be represented as a union of erosions by the structuring elements of its basis $\mathrm{Bas}(\Psi)$:

$\Psi(F) = \bigcup_{B \in \mathrm{Bas}(\Psi)} F \ominus B. \qquad (4.11)$
Figure 4.3: Binary erosion and dilation: (a) original grayscale image of the histologic appearance of fibrocystic changes in breast; (b) original image after binarization; (c) erosion
where the underlined digit denotes the center of the structuring element. Notice that the cross mask contains five elements. The output of the median filter, with the mask placed at a pixel $h$ of an image $F$, is 1 if at least three of the pixels in the mask have value 1, and is zero otherwise. This is equivalent to at least one of the structuring elements in the basis, shifted to pixel $h$, being a subset of $F$. According to Eq. (4.11), the median filter over the cross mask can therefore be implemented as the union of ten erosions with the structuring elements listed above. Notice that the erosion of $F$ by any of these structuring elements can be implemented by an AND operation, whereas the union of erosions can be implemented by an OR operation. Therefore, the representation of Eq. (4.11) on one hand provides a link between erosions and median filters, and on the other hand it provides a way of implementing median filters by means of AND and OR operations.
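The union-of-erosions implementation of the binary median filter can be checked directly: the ten 3-element subsets of the cross mask play the role of the basis elements. A pure-Python sketch on pixel sets (an illustrative representation only):

```python
from itertools import combinations

CROSS = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))

def erode(F, B):
    # F ⊖ B = {h : B + h ⊆ F}
    b0 = B[0]
    cand = {(x - b0[0], y - b0[1]) for (x, y) in F}
    return {h for h in cand
            if all((h[0] + bx, h[1] + by) in F for (bx, by) in B)}

def median_via_erosions(F):
    # union of erosions by the ten 3-element subsets of the cross mask
    out = set()
    for B in combinations(CROSS, 3):
        out |= erode(F, B)
    return out

def median_direct(F):
    # output is 1 at h iff at least 3 of the 5 mask pixels are 1
    cand = {(x - dx, y - dy) for (x, y) in F for (dx, dy) in CROSS}
    return {h for h in cand
            if sum((h[0] + dx, h[1] + dy) in F for (dx, dy) in CROSS) >= 3}
```

Both routines compute the same operator; the first mirrors the AND/OR structure described in the text.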
4.2.4
In general, the erosion and dilation operators do not enjoy an often desirable property known as idempotence. An idempotent operator extracts all information from a single application; consecutive applications of the same operator have no further effect. The ideal bandpass filter is an example of a linear operator that shares this property. The erosion and dilation operators, on the other hand, keep modifying the image at each successive application. However, when a translation invariant erosion $\varepsilon$ is composed with a translation invariant dilation $\delta$, the resulting operator is idempotent, regardless of the order of composition. The composition $\delta\varepsilon$ is called an opening, whereas the composition $\varepsilon\delta$ is called a closing.
The notions of opening and closing are, however, more general: a set operator $\Psi$ that is increasing (i.e., $F_1 \subseteq F_2 \Rightarrow \Psi(F_1) \subseteq \Psi(F_2)$), antiextensive (i.e., $\Psi(F) \subseteq F$), and idempotent (i.e., $\Psi\Psi = \Psi$) is called an opening, whereas a set operator that is increasing, extensive (i.e., $F \subseteq \Psi(F)$), and idempotent is called a closing. It can be shown that openings and closings are closed with respect to union and intersection, respectively. This means that a union of openings is an opening, whereas an intersection of closings is a closing.
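The compositions $\delta\varepsilon$ and $\varepsilon\delta$ can be sketched directly on pixel sets. This sketch uses a symmetric cross structuring element, an illustrative assumption (for a non-symmetric element the second operator would use the reflected element), and verifies idempotence and (anti)extensivity.

```python
def erode(F, B):
    b0 = next(iter(B))
    cand = {(x - b0[0], y - b0[1]) for (x, y) in F}
    return {h for h in cand
            if all((h[0] + bx, h[1] + by) in F for (bx, by) in B)}

def dilate(F, B):
    return {(x + bx, y + by) for (x, y) in F for (bx, by) in B}

def opening(F, B):
    # δε: erosion followed by dilation
    return dilate(erode(F, B), B)

def closing(F, B):
    # εδ: dilation followed by erosion (B assumed symmetric here)
    return erode(dilate(F, B), B)

CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}  # symmetric
F = {(x, y) for x in range(5) for y in range(5)} | {(10, 10)}
```

The isolated pixel at (10, 10) is too small to contain a translate of the cross, so the opening removes it while leaving the large square largely intact.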
The operator

$F \circ B = (F \ominus B) \oplus B \qquad (4.13)$

is known as the structural opening of $F$ by $B$. It can be shown that

$F \circ B = \bigcup \{ B + h : B + h \subseteq F \}.$
Figure 4.4: The effect of structural opening, in (a), and structural closing, in (b), on a shape $F$.
This formula provides a geometric interpretation for the structural opening in Eq. (4.13). It suggests that $F \circ B$ is the union of all translated structuring elements $B + h$ that fit inside $F$. The effect of the structural opening (using a disk structuring element) is illustrated in Fig. 4.4(a). Notice that the opening attempts to undo the effect of the erosion $F \ominus B$ by applying the associated dilation.

Opening a shape $F$ with a structuring element $B$ removes all components of $F$ that are smaller than $B$, in the sense that they cannot contain any translated replica of $B$. It therefore acts as a smoothing filter. The amount and type of smoothing is determined by the shape and size of the structuring element used.
Similarly, the operator

$F \bullet B = (F \oplus B) \ominus B$

is known as the structural closing of $F$ by $B$. It can be shown that $F \bullet B$ is the collection of all pixels $h$ such that every translated structuring element $\hat{B} + z$ that contains $h$ intersects $F$. The effect of the structural closing (using a disk structuring element) is illustrated in Fig. 4.4(b). Notice that the closing attempts to undo the effect of the dilation $F \oplus B$ by
Table 4.3: Properties of the structural opening.
- Translation invariance: $F \circ (B + h) = F \circ B$
- Increasing: $F_1 \subseteq F_2 \Rightarrow F_1 \circ B \subseteq F_2 \circ B$
- Homogeneity: $rF \circ rB = r(F \circ B)$
- Antiextensivity: $F \circ B \subseteq F$
- Idempotence: $(F \circ B) \circ B = F \circ B$
- Duality: $F \circ B = (F^c \bullet \hat{B})^c$
applying the associated erosion. Closing a shape $F$ with a structuring element $B$ fills in all components of the background $F^c$ that are smaller than $\hat{B}$. This is a direct consequence of the duality between structural openings and closings, which says that $F \bullet B = (F^c \circ \hat{B})^c$.
Some properties of structural openings and closings are listed in Tables 4.3 and
4.4, respectively. The effect of these operators on a binarized image of a histological breast sample is depicted in Fig. 4.5 (see also Fig. 4.3).
Another useful opening operator is the so-called binary area opening. This operator removes from a binary image every grain whose area is less than a given value. By a grain of an image $F$, we mean a connected component of $F$. A component $G \subseteq F$ is connected if, given two pixels in $G$, there exists at least one path that connects these two pixels and lies entirely in $G$. Mathematically, the binary area opening is expressed by

$\alpha_\lambda(F) = \bigcup \{ F_i : \mathrm{area}(F_i) \geq \lambda \},$

where $F_i$, $i = 1, 2, \ldots$, are the grains of $F$ and $\mathrm{area}(F_i)$ denotes the area of $F_i$ (when $F_i$ is a discrete set, $\mathrm{area}(F_i)$ denotes the number of its elements, or the so-called cardinality). It is not difficult to see that, for a fixed value of $\lambda$, this operator is increasing, antiextensive, and idempotent and, therefore, is an opening. The area opening is a morphological filter that filters out grains of area less than a specified value, whereas it passes grains of area larger than this value.

By duality, the binary area closing is defined as follows:

$\alpha_\lambda^c(F) = (\alpha_\lambda(F^c))^c.$

The area closing fills in the holes in a binary image whose area is strictly smaller than $\lambda$.
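Grains and the binary area opening can be sketched with a breadth-first search over 4-connected neighbors; the connectivity choice is an assumption for illustration (the chapter does not fix one here).

```python
from collections import deque

def grains(F):
    # connected components (grains) under 4-connectivity
    seen, comps = set(), []
    for p in F:
        if p in seen:
            continue
        comp, queue = set(), deque([p])
        seen.add(p)
        while queue:
            x, y = queue.popleft()
            comp.add((x, y))
            for q in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if q in F and q not in seen:
                    seen.add(q)
                    queue.append(q)
        comps.append(comp)
    return comps

def area_opening(F, lam):
    # keep only grains of area >= lam
    out = set()
    for g in grains(F):
        if len(g) >= lam:
            out |= g
    return out
```

For lam = 1 the operator is the identity, and for lam larger than the biggest grain it returns the empty set, which matches the increasing/antiextensive/idempotent behavior described above.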
Table 4.4: Properties of the structural closing.
- Translation invariance: $F \bullet (B + h) = F \bullet B$
- Increasing: $F_1 \subseteq F_2 \Rightarrow F_1 \bullet B \subseteq F_2 \bullet B$
- Homogeneity: $rF \bullet rB = r(F \bullet B)$
- Extensivity: $F \subseteq F \bullet B$
- Idempotence: $(F \bullet B) \bullet B = F \bullet B$
- Duality: $F \bullet B = (F^c \circ \hat{B})^c$

4.2.5
As we mentioned in the previous subsection, the union of openings is still an opening, whereas the intersection of closings is a closing. This simple observation is very useful in practice, since it allows the design of openings and closings by taking the union or intersection of elementary openings or closings, respectively. This observation, however, is more general. It can be shown (e.g., see [6]) that any translation invariant opening $\gamma$ (i.e., a translation invariant operator that is increasing, antiextensive, and idempotent) can be written as a union of structural openings, whereas any translation invariant closing $\phi$ (i.e., a translation invariant operator that is increasing, extensive, and idempotent) can be written as an intersection of structural closings, i.e.,

$\gamma(F) = \bigcup_{i} F \circ B_i \quad \text{and} \quad \phi(F) = \bigcap_{i} F \bullet B_i \qquad (4.14)$

for some collections $\{B_i\}$ of structuring elements.
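The union-of-openings construction of Eq. (4.14) can be illustrated with two line structuring elements, in the spirit of Fig. 4.6; the element shapes and lengths below are illustrative choices.

```python
def erode(F, B):
    b0 = next(iter(B))
    cand = {(x - b0[0], y - b0[1]) for (x, y) in F}
    return {h for h in cand
            if all((h[0] + bx, h[1] + by) in F for (bx, by) in B)}

def dilate(F, B):
    return {(x + bx, y + by) for (x, y) in F for (bx, by) in B}

def opening(F, B):
    return dilate(erode(F, B), B)

def union_of_openings(F, Bs):
    # Eq. (4.14): a translation invariant opening built as a union
    # of structural openings
    out = set()
    for B in Bs:
        out |= opening(F, B)
    return out

H_LINE = {(-1, 0), (0, 0), (1, 0)}   # horizontal 3-pixel line
V_LINE = {(0, -1), (0, 0), (0, 1)}   # vertical 3-pixel line
```

Each single opening keeps only the structures in which its line fits, while the union keeps both orientations, and the combined operator is itself idempotent.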
Figure 4.5: An example of binary structural opening and closing: (a) original grayscale image of the histologic appearance of fibrocystic changes in breast; (b) original image after
binarization; (c) structural opening by a disk structuring element with a diameter of pixels;
(d) structural opening by a disk structuring element with a diameter of
Instead of only probing the inside, or the outside, of a given binary image with a structuring element, it may be fruitful in certain applications to probe both the background and the foreground at the same time. The hit-or-miss operator formalizes this idea. It is defined by

$F \circledast (A, B) = (F \ominus A) \cap (F^c \ominus B),$

where $A$ and $B$ are two structuring elements. Therefore, the hit-or-miss transformed set contains all points that simultaneously belong to the erosion of the foreground $F$ by the structuring element $A$ and the erosion of the background $F^c$ by the structuring element $B$. Some properties of this operator are listed in Table 4.5. It is required that $A \cap B = \emptyset$ for the hit-or-miss operator not to result in an empty set. The hit-or-miss operator is well suited to the task of locating points inside an object with certain (local) geometric properties; e.g., isolated points, edge points, corner points, etc. (e.g., see [6, 56]).
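A sketch of the hit-or-miss operator on a finite grid follows. Taking the complement relative to an explicit bounded domain is an implementation convenience, not part of the definition, and it truncates results within one pixel of the border.

```python
def hit_or_miss(F, A, B, domain):
    # (F ⊖ A) ∩ (F^c ⊖ B), with F^c taken relative to a finite domain
    Fc = domain - F
    return {h for h in domain
            if all((h[0] + ax, h[1] + ay) in F for (ax, ay) in A)
            and all((h[0] + bx, h[1] + by) in Fc for (bx, by) in B)}
```

As an example of locating points with a local geometric property, choosing A as the origin and B as the four neighbors detects isolated foreground pixels.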
The hit-or-miss operator leads to an extraordinary result, due to Banon and Barrera [12]: any translation invariant set operator can be represented as a union of hit-or-miss operators. This is clearly a more powerful result than the representation by a union of erosions (or an intersection of dilations), which is limited to the case of increasing and translation invariant operators.

To be formal, we first need two definitions. Given two sets $A$ and $B$, the interval $[A, B]$ is the collection of all sets $C$ such that $A \subseteq C \subseteq B$. Given a set operator $\Psi$,
Figure 4.6: Filtering via a union of openings: (a) original grayscale image of aortic atheromatous plaque; (b) original image after binarization; (c) the result of applying a union of
openings, with linear structuring elements (a horizontal, a left diagonal, and a right diagonal) of length , on the image in (b); (d) filter residue (the set difference between images (b)
and (c)); (e) the result in (c) overlaid on the original grayscale image in (a). Data courtesy
of E. C. Klatt, Department of Pathology, University of Utah. Used with permission.
Table 4.5: Properties of the hit-or-miss operator.
- Translation invariance: $F \circledast (A + h, B + h) = [F \circledast (A, B)] - h$
- Duality: $F^c \circledast (A, B) = F \circledast (B, A)$
- Homogeneity: $rF \circledast (rA, rB) = r[F \circledast (A, B)]$
- $A = \emptyset \Rightarrow F \circledast (A, B) = F^c \ominus B$ (a decreasing operator)
- $B = \emptyset \Rightarrow F \circledast (A, B) = F \ominus A$ (an increasing operator)
- $A \cap B \neq \emptyset \Rightarrow F \circledast (A, B) = \emptyset$
where $\mathrm{Ker}(\Psi)$ is the kernel of the operator $\Psi$, defined by Eq. (4.12).
For any translation invariant set operator $\Psi$, we have that (e.g., see [6])

$\Psi(F) = \bigcup \{ F \circledast (A, B) : [A, B] \subseteq \mathrm{Ker}(\Psi) \}. \qquad (4.15)$
Morphological gradients

The operator

$(F \oplus B) \setminus (F \ominus B), \qquad (4.16)$

where $B$ is a structuring element that contains the origin, is known as the morphological gradient. It is a boundary peeler that estimates the boundary of an object $F$. This operator is often used to obtain surface area estimates in 3-D binary images. The result depends on the size and shape of the structuring element and is less affected by boundary noise when compared to differential edge detectors.
By using variants of Eq. (4.16), it is possible to detect external or internal boundaries. Indeed, the external morphological gradient operator

$(F \oplus B) \setminus F \qquad (4.17)$

extracts the external boundary of $F$, whereas the internal morphological gradient operator $F \setminus (F \ominus B)$ extracts the internal boundary.
The morphological gradient, and its external and internal variants, are always positive because dilations and erosions with structuring elements that contain the origin are extensive and antiextensive, respectively. An illustration of the morphological gradient operators is depicted in Fig. 4.7. The cross structuring element $B = \{(0, 0), (\pm 1, 0), (0, \pm 1)\}$, centered at the origin, is used in this case. Notice that the internal gradient operator detects cellular boundaries correctly, whereas the gradient and the external gradient operators may connect the boundaries of some cells that are in close proximity to each other. For more information on morphological gradients the reader is referred to [58, 59].
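The three gradient variants follow directly from Eq. (4.16) and Eq. (4.17); the pixel-set sketch below, with a cross structuring element, is illustrative.

```python
def erode(F, B):
    b0 = next(iter(B))
    cand = {(x - b0[0], y - b0[1]) for (x, y) in F}
    return {h for h in cand
            if all((h[0] + bx, h[1] + by) in F for (bx, by) in B)}

def dilate(F, B):
    return {(x + bx, y + by) for (x, y) in F for (bx, by) in B}

def gradient(F, B):
    # Eq. (4.16): (F ⊕ B) \ (F ⊖ B)
    return dilate(F, B) - erode(F, B)

def external_gradient(F, B):
    # Eq. (4.17): (F ⊕ B) \ F
    return dilate(F, B) - F

def internal_gradient(F, B):
    # F \ (F ⊖ B)
    return F - erode(F, B)
```

Because the erosion sits inside F and the dilation contains it, the full gradient is exactly the disjoint union of the internal and external variants.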
4.2.8
Conditional dilation
If an image F is dilated by a structuring element B that contains the origin, it is expanded. If this type of dilation is repeated indefinitely, the original image will grow out of bounds. One way to avoid this problem is to restrict the dilation to within a mask element A. This defines a new type of (non-translation-invariant) dilation, known as the conditional dilation, given by

δ¹_B(F | A) = (F ⊕ B) ∩ A.
It will become clear in the following (see Section 4.6) that the conditional dilation
plays a key role in defining a new morphological operator, known as opening by reconstruction, which turns out to be very useful in object detection and segmentation
problems.
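A minimal sketch of the conditional dilation, under the same set-of-pixels representation used above (names are illustrative):

```python
# Illustrative sketch: binary images as sets of (x, y) integer coordinates.
CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

def dilate(F, B):
    return {(x + dx, y + dy) for (x, y) in F for (dx, dy) in B}

def conditional_dilate(F, B, A):
    """One step of conditional dilation: dilate F by B, then clip to the
    mask A, so that repeated dilations can never grow past A."""
    return dilate(F, B) & A
```

Iterating conditional_dilate until the result stops changing yields the reconstruction operator discussed in Section 4.6.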
4.2.9
The operator F ↦ (F ⊕ B) ∩ F, where the structuring element B is a symmetric annulus (ring) around the origin, defines an opening (i.e., it is increasing, antiextensive, and idempotent). This is known as the annular opening. The dual operator defines, by complementation, the corresponding annular closing.

Figure 4.8: The annular opening: (a) a 2D grain F and the result (F ⊕ B) ∩ F; (b) the residue F \ [(F ⊕ B) ∩ F], for an annular structuring element B.
Figure 4.9: Annular opening: (a) original grayscale image of a blood smear from a patient with a diagnosis of acute promyelocytic leukemia, which demonstrates many fragmented red blood cells due to disseminated intravascular coagulation; (b) original image after binarization and area closing; (c) annular opening by a circular structuring element; (d) normal red blood cell markers extracted from (c); (e) marked normal red blood cells overlaid on the original grayscale image in (a) (dark shapes); (f) marked fragmented, overlapping, and suspicious red blood cells overlaid on the original grayscale image in (a) (dark shapes). Data courtesy of E. C. Klatt, Department of Pathology, University of Utah. Used with permission.
Morphological filters

A binary morphological filter is an increasing and idempotent set operator. Openings and closings are binary morphological filters, and so are suitable compositions of openings and closings. Based on these compositions, the operators (writing α_B for the opening and β_B for the closing by a structuring element B)

m_B(F) = β_B(α_B(F)),    (4.18)
n_B(F) = α_B(β_B(F)),    (4.19)

are binary morphological filters. In the literature, these filters are known as alternating filters (AF), since they alternate between opening and closing. The operators
Figure 4.10: Morphological filtering of the binary image depicted in Fig. 4.3(b): (a) original binary image; (b) filtering by the AF; (c) filtering by the morphological filter; (d) filtering by the ASF.
are binary morphological filters as well. Finally, we can compose alternating filters to form another class of binary morphological filters, known as alternating sequential filters (ASF).

In practice, the ASFs are usually preferable to the AFs. This is primarily due to the fact that, for a given value of n, an ASF filters out shape components by gradually increasing the size of the structuring element, from B up to nB, whereas an AF filters out shape components by only applying the structuring element nB. Figure 4.10 depicts the results obtained by filtering the binary image of Fig. 4.3(b) with an AF, in (b), a morphological filter, in (c), and an ASF, in (d). In all cases, B is taken to be the cross structuring element. For more information on binary morphological filters, the reader is referred to [6, 54, 61–63].
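The gradual growth of the structuring element in an ASF can be sketched as follows, under the set-of-pixels representation used earlier; the helper nB, which builds the size-n structuring element by n-fold dilation of B with itself, is an illustrative name:

```python
# Illustrative sketch: binary images as sets of (x, y) integer coordinates.
CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

def dilate(F, B):
    return {(x + dx, y + dy) for (x, y) in F for (dx, dy) in B}

def erode(F, B):
    return {(x, y) for (x, y) in F
            if all((x + dx, y + dy) in F for (dx, dy) in B)}

def opening(F, B):
    return dilate(erode(F, B), B)

def closing(F, B):
    return erode(dilate(F, B), B)

def nB(B, n):
    """Size-n structuring element: n-fold dilation of B with itself."""
    out = {(0, 0)}
    for _ in range(n):
        out = dilate(out, B)
    return out

def asf(F, B, n):
    """Alternating sequential filter: apply the open-close AF with
    structuring elements of gradually increasing size, from B up to nB."""
    out = F
    for k in range(1, n + 1):
        Bk = nB(B, k)
        out = closing(opening(out, Bk), Bk)
    return out
```

Even at n = 1, the filter removes isolated specks smaller than the structuring element while leaving large grains essentially intact.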
where the components are given by differences of successive openings, for positive sizes, and of successive closings, for negative sizes. The DST is an orthogonal shape decomposition scheme, in the sense that it decomposes a binary image into disjoint components (the reader is referred to [66] for more details on this subject). Notice that the notion of size is directly related to the structuring element B used: a grain of F is of size n if there exists at least one translated replica of nB that fits inside it, whereas no translated replica of (n + 1)B fits inside it. Similar remarks hold for the negative sizes, as they pertain to the holes of F.

Based on these remarks, it is now clear that the DST is a multiresolution image decomposition scheme in terms of successive differences of openings and closings with structuring elements of increasing size. For n ≥ 0, the nth component of the DST contains only grains that are of size n. On the other hand, for n < 0, the nth component of the DST contains only holes that are of size |n|.
4.3.2 Pattern spectrum
The DST can be thought of as the morphological analogue of the Fourier transform. Both transforms decompose an image into orthogonal components that are
sufficient for reconstructing the image under consideration. It is quite common, in
Fourierbased image processing and analysis techniques, to characterize images by
means of the magnitude of the Fourier transform, known as the Fourier spectrum,
while discarding phase information. A similar approach applies in the morphological case as well. An image is characterized by the magnitude of the DST, known as the pattern spectrum P_{F;B}(k).

Figure 4.11: The (normalized) pattern spectrum P_{F;B}(k) of a binary image, for sizes k between −20 and 20. Notice that the large peak at a single positive size indicates the prominent presence of particles of that size.
where |F| denotes the area (or cardinality) of the set F. Notice that the pattern spectrum depends on the particular structuring element used (i.e., for a given image, different pattern spectra can be obtained for different structuring elements).

As in the case of the Fourier spectrum, the information conveyed by the pattern spectrum of an image F is not sufficient for reconstructing F. However, the pattern spectrum conveys some useful information regarding the shape/size content of a binary image. For example:

(a) The boundary roughness of F, relative to the structuring element B, appears in the lower part of the pattern spectrum for a rough boundary, and in the higher part of the pattern spectrum for a smooth boundary.

(b) Long capes or bulky protruding parts of F show up as isolated impulses or jumps at the positive sizes of the pattern spectrum.

(c) Big jumps at negative sizes indicate the existence of prominent intruding gulfs or holes in F.
All these properties are the morphological analogues of similar properties of
the Fourier spectrum. Table 4.6 summarizes similarities between the Fourier and
the pattern spectrum. Figure 4.11 depicts the (normalized) pattern spectrum of the
binary image in Fig. 4.9(b), with the structuring element being a disk. Normalization produces a pattern spectrum whose values sum to one, obtained by dividing P_{F;B}(k) by the total area of the image. Notice that the large peak at a single positive size indicates the prominent presence of particles of that size.
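Under the same set-of-pixels conventions used earlier, the positive half of the pattern spectrum can be computed from the areas of successive openings; the helper nB for the size-n structuring element is an illustrative name:

```python
# Illustrative sketch: binary images as sets of (x, y) integer coordinates.
CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

def dilate(F, B):
    return {(x + dx, y + dy) for (x, y) in F for (dx, dy) in B}

def erode(F, B):
    return {(x, y) for (x, y) in F
            if all((x + dx, y + dy) in F for (dx, dy) in B)}

def opening(F, B):
    return dilate(erode(F, B), B)

def nB(B, n):
    """Size-n structuring element: n-fold dilation of B with itself."""
    out = {(0, 0)}
    for _ in range(n):
        out = dilate(out, B)
    return out

def pattern_spectrum(F, B, kmax):
    """Positive half of the pattern spectrum: P(k) is the area removed when
    the opening size grows from kB to (k+1)B, for k = 0, ..., kmax."""
    areas = [len(opening(F, nB(B, k))) for k in range(kmax + 2)]
    return [areas[k] - areas[k + 1] for k in range(kmax + 1)]
```

When the image is fully decomposed, the spectrum values sum to the total area of the image, which is the orthogonality property discussed above.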
Figure 4.12: A point h of F, the maximal disk D(h) centered at h, with D(h) ⊆ F, and the skeleton of F.

The morphological skeleton of a binary image F, with respect to a structuring element B, can be computed by means of erosions and openings. Define the skeleton subsets

S_n(F) = (F ⊖ nB) \ [(F ⊖ nB) ∘ B],  for n = 0, 1, …, N,    (4.20)

where nB denotes the n-fold dilation of B with itself (n times) and N is the largest integer such that F ⊖ NB ≠ ∅. The set S_n(F) is called the nth skeleton subset of F; it contains the centers of the maximal replicas of nB that fit inside F. The morphological skeleton of F is then given by

S(F) = ⋃_{n=0}^{N} S_n(F),    (4.21)

and F can be exactly reconstructed from its skeleton subsets, since F = ⋃_{n=0}^{N} [S_n(F) ⊕ nB].
Figure 4.13: Binary morphological skeleton using the cross structuring element B: (a) skeleton of the binary image F, superimposed on F; (b) skeleton of the complement F^c, superimposed on F^c.

Notice that partial reconstructions are possible as well: summing from level k upward recovers the opening of F,

F ∘ kB = ⋃_{n=k}^{N} [S_n(F) ⊕ nB].    (4.22)
Figure 4.14: Two cross sections F(t₁) and F(t₂), for (a) a low threshold t₁ and (b) a high threshold t₂, of a 1D signal f.
Threshold decomposition

The collection {F(t)} of all cross sections of a grayscale image f is known as the threshold decomposition of f. An image f is uniquely characterized by its threshold decomposition, since

f(x) = sup{ t : x ∈ F(t) }

at every pixel x. Figure 4.14 depicts two cross sections of a 1D signal for a high and a low value of the threshold t. Notice that, as t increases, F(t) decreases. In particular, if t₁ ≤ t₂, then F(t₂) ⊆ F(t₁). This is known as the stacking property.

An image f has a unique threshold decomposition {F(t)}. However, given a collection {G(t)} of nonempty sets, there might not exist an image with threshold decomposition {G(t)}. This is because the cross sections of an image must satisfy the stacking property.

Threshold decomposition provides a useful link between grayscale and binary images. This link is used as a way to combine grayscale images that is compatible with mathematical morphology, by means of the following three steps:
1. Compute the cross sections F(t) and G(t) of two grayscale images f and g, respectively.

2. Set H(t) = F(t) ∪ G(t) [or H(t) = F(t) ∩ G(t)] for every t.

3. Set h(x) = sup{ t : x ∈ H(t) }.

In this case, h is the pixelwise supremum (respectively, infimum) of f and g. Evidently, the grayscale analogues of union and intersection are the pixelwise supremum and infimum, respectively. It is also true that H(t) is the cross section of h at level t, for every t.
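The same three-step recipe extends to any increasing set operator, which is how flat grayscale operators are built from binary ones. A minimal sketch, with the grayscale image represented as a dict from pixels to small nonnegative integers (names are illustrative):

```python
# Illustrative sketch: a grayscale image as a dict from (x, y) pixels to
# small nonnegative integer values.

def cross_section(f, t):
    """Cross section F(t) = {x : f(x) >= t}."""
    return {p for p, v in f.items() if v >= t}

def flat_operator(set_op, f):
    """Build a flat grayscale operator from an increasing set operator:
    apply set_op to every cross section and stack the results back,
    g(x) = max{t : x in set_op(F(t))} (0 if x is in no processed section)."""
    tmax = max(f.values())
    g = {p: 0 for p in f}
    for t in range(1, tmax + 1):
        for p in set_op(cross_section(f, t)):
            if p in g:  # keep results inside the original domain
                g[p] = max(g[p], t)
    return g
```

Because the cross sections satisfy the stacking property and set_op is increasing, the processed sections stack as well, so the reconstructed image g is well defined.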
As we said before, the most elementary operators of interest to mathematical morphology are increasing and translation invariant. A grayscale image operator ψ is said to be increasing if f ≤ g implies that ψ(f) ≤ ψ(g). On the other hand, if f_h denotes the spatially translated image f_h(x) = f(x − h), then a grayscale image operator ψ is spatially translation invariant if ψ(f_h) = [ψ(f)]_h, and grayscale translation invariant if ψ(f + v) = ψ(f) + v.

Notice that, here, we are dealing with two types of translation invariance, spatial and grayscale, as opposed to the binary case, where we only deal with spatial translation invariance. In the grayscale case, when we speak about translation invariance, we mean both spatial and grayscale translation invariance.

When f is real-valued, the negative image f* of f is defined by f*(x) = −f(−x). When f takes finite values in {0, 1, …, R}, the negative image f* of f is defined by

f*(x) = R − f(−x).    (4.23)

Notice that when R = 1 (i.e., in the case of binary images), the negative image corresponds to the set complement. Finally, a grayscale image operator ψ is antiextensive if ψ(f) ≤ f, extensive if f ≤ ψ(f), and idempotent if ψ(ψ(f)) = ψ(f).
4.4.3

As in the binary case, where one opts for operators that distribute over unions and intersections, it is desirable to deal with grayscale image operators that distribute over suprema and infima. Any image operator ε such that ε(f ∧ g) = ε(f) ∧ ε(g), for every pair of grayscale images, is called a grayscale erosion. Any image operator δ such that δ(f ∨ g) = δ(f) ∨ δ(g), for every pair of grayscale images, is called a grayscale dilation. As a direct consequence of these definitions, both grayscale erosion and dilation are increasing operators.
where b is a grayscale image known as the structuring function (e.g., see [6]). In the special case when

b(x) = 0, for x ∈ B, and b(x) = −∞, otherwise,    (4.24)

the erosion becomes (f ⊖ B)(x) = inf{ f(x + y) : y ∈ B }. This is called the flat (grayscale) erosion and is denoted (with a slight abuse of notation) by f ⊖ B. A structuring function of the form of Eq. (4.24) is usually referred to as a flat structuring function. The flat erosion replaces the value of an image f at a pixel x by the infimum of the values of f over the structuring element B translated at x. Some properties of the grayscale erosion are summarized in Table 4.7. In this table, b*(x) = −b(−x) is the reflection of the structuring function around the origin. An example of a flat grayscale erosion is depicted in Fig. 4.15.

When the grayscale dilation is translation invariant, it can be written as f ⊕ b for some structuring function b. In the special case when b is flat, then (f ⊕ B)(x) = sup{ f(x − y) : y ∈ B }. This is called the flat (grayscale) dilation and is denoted by f ⊕ B. The flat dilation replaces the value of an image f at a pixel x by the supremum of the values of f over the reflected structuring element translated at x.
Figure 4.15: Flat grayscale erosion: (a) original grayscale image of a lateral pulmonary artery angiogram; (b)–(d) flat erosions by disk structuring elements of increasing diameter. Data courtesy of the University of Washington Digital Anatomist Project.
Figure 4.16: Flat grayscale dilation: (a) original grayscale image of a lateral pulmonary artery angiogram; (b)–(d) flat dilations by disk structuring elements of increasing diameter. Data courtesy of the University of Washington Digital Anatomist Project.
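The flat erosion and dilation can be sketched on a dict-based grayscale image; the boundary handling chosen here (skipping pixels whose neighborhood leaves the domain, for the erosion) is one possible convention, not the text's:

```python
# Illustrative sketch: a grayscale image as a dict from (x, y) pixels to values.
CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

def flat_erode(f, B):
    """Flat erosion: f(x) is replaced by the minimum of f over B translated
    at x; pixels whose neighborhood leaves the domain are skipped here."""
    out = {}
    for (x, y) in f:
        vals = [f.get((x + dx, y + dy)) for (dx, dy) in B]
        if all(v is not None for v in vals):
            out[(x, y)] = min(vals)
    return out

def flat_dilate(f, B):
    """Flat dilation: f(x) is replaced by the maximum of f over the reflected
    B translated at x; values outside the domain are simply ignored."""
    out = {}
    for (x, y) in f:
        vals = [f[(x - dx, y - dy)] for (dx, dy) in B if (x - dx, y - dy) in f]
        out[(x, y)] = max(vals)
    return out
```

For symmetric structuring elements such as the cross, the reflection in the dilation has no visible effect, which is why it is often omitted in practice.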
Table 4.7: Properties of the grayscale erosion f ⊖ b.

Translation invariance: (f ⊖ b_{(x₀,y₀)})(x, y) = (f ⊖ b)(x − x₀, y − y₀)
Grayscale invariance: (f + v) ⊖ b = f ⊖ (b − v) = (f ⊖ b) + v
Decreasing in b: f ⊖ b₁ ≥ f ⊖ b₂, if b₁ ≤ b₂
Parallel composition: f ⊖ (b₁ ∨ b₂) = (f ⊖ b₁) ∧ (f ⊖ b₂)
Distributivity of minimum: (f₁ ∧ f₂) ⊖ b = (f₁ ⊖ b) ∧ (f₂ ⊖ b)
f ⊖ (b₁ ∧ b₂) ≥ (f ⊖ b₁) ∨ (f ⊖ b₂)
Serial composition: (f ⊖ b₁) ⊖ b₂ = f ⊖ (b₁ ⊕ b₂)
Increasing in f: f₁ ≤ f₂ ⟹ f₁ ⊖ b ≤ f₂ ⊖ b
Anti-extensivity: f ⊖ b ≤ f, if b(0) ≥ 0
Duality: f ⊖ b = (f* ⊕ b*)*
Table 4.8: Properties of the grayscale dilation f ⊕ b.

Commutativity: f ⊕ b = b ⊕ f
Translation invariance: (f ⊕ b_{(x₀,y₀)})(x, y) = (f ⊕ b)(x − x₀, y − y₀)
Grayscale invariance: (f + v) ⊕ b = f ⊕ (b + v) = (f ⊕ b) + v
Increasing in b: f ⊕ b₁ ≤ f ⊕ b₂, if b₁ ≤ b₂
Parallel composition: f ⊕ (b₁ ∨ b₂) = (f ⊕ b₁) ∨ (f ⊕ b₂)
Distributivity of maximum: (f₁ ∨ f₂) ⊕ b = (f₁ ⊕ b) ∨ (f₂ ⊕ b)
f ⊕ (b₁ ∧ b₂) ≤ (f ⊕ b₁) ∧ (f ⊕ b₂)
Serial composition: (f ⊕ b₁) ⊕ b₂ = f ⊕ (b₁ ⊕ b₂)
Increasing in f: f₁ ≤ f₂ ⟹ f₁ ⊕ b ≤ f₂ ⊕ b
Extensivity: f ⊕ b ≥ f, if b(0) ≥ 0
Duality: f ⊕ b = (f* ⊖ b*)*

In these tables, ψ* denotes the dual of an operator ψ, given by ψ*(f) = [ψ(f*)]*.
4.4.5

The grayscale opening f ∘ b = (f ⊖ b) ⊕ b and closing f • b = (f ⊕ b) ⊖ b enjoy the properties summarized in Table 4.9.

Table 4.9: Properties of the grayscale opening f ∘ b and closing f • b.

Translation invariance: (f ∘ b_{(x₀,y₀)})(x, y) = (f ∘ b)(x − x₀, y − y₀), and similarly for the closing
Grayscale invariance: (f + v) ∘ b = (f ∘ b) + v, and (f + v) • b = (f • b) + v
Increasing: f₁ ≤ f₂ ⟹ f₁ ∘ b ≤ f₂ ∘ b, and f₁ • b ≤ f₂ • b
Anti-extensivity (opening): f ∘ b ≤ f
Extensivity (closing): f ≤ f • b
Idempotence: (f ∘ b) ∘ b = f ∘ b, and (f • b) • b = f • b
Duality: f ∘ b = (f* • b*)*
Another useful opening is the so-called grayscale area opening. This operator removes grains from the cross sections of a grayscale image with area below a given value λ. Mathematically, the grayscale area opening is expressed by

(α_λ f)(x) = sup{ t : x ∈ α_λ(F(t)) },    (4.25)

where F(t) are the cross sections of f and α_λ is the binary area opening, which removes the grains of a binary image whose area is strictly smaller than λ. It is not difficult to see that, for a fixed value of λ, this operator is increasing, antiextensive, and idempotent; therefore, it is an opening.

By duality, the grayscale area closing is defined as follows:

(β_λ f) = [α_λ(f*)]*,

where f* is defined in Eq. (4.23). This operator fills in the holes of the cross sections F(t) of image f whose area is strictly smaller than λ. Efficient algorithms for the implementation of grayscale area openings and closings can be found in [89].
An example, illustrating the grayscale area opening operator, is depicted in
Fig. 4.19. The blood smear, depicted in the first row of Fig. 4.19(a), is to be binarized in order to obtain the regions occupied by individual cells. The second row
of Fig. 4.19(a) depicts the result of such a binarization. Due to the fact that blood
cells have a zone of central pallor, which produces the bright regions within each
cell, the result of binarization is not satisfactory: most cells produce a region filled
with a black hole. Area opening can be effectively used to ameliorate this problem.
This is evident from the results depicted in Figs. 4.19(b) and (c). As the value of λ in Eq. (4.25) increases, the bright regions within each cell are effectively suppressed and binarization produces a more acceptable result.
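The binary area opening that underlies Eq. (4.25) can be sketched with a connected-component traversal (4-connectivity is assumed here; the names are illustrative):

```python
from collections import deque

def area_opening(F, lam):
    """Binary area opening: keep only the 4-connected grains of F that
    contain at least `lam` pixels."""
    F = set(F)
    seen, out = set(), set()
    for start in F:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:  # breadth-first traversal of one grain
            x, y = queue.popleft()
            for nbr in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nbr in F and nbr not in seen:
                    seen.add(nbr)
                    comp.add(nbr)
                    queue.append(nbr)
        if len(comp) >= lam:
            out |= comp
    return out
```

Applying this operator to every cross section and stacking the results, as in the flat-operator construction, yields the grayscale area opening.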
Figure 4.17: Flat structural opening: (a) original grayscale image of a lateral pulmonary artery angiogram; (b)–(d) structural openings by disk structuring elements of increasing diameter.
Figure 4.18: Flat structural closing: (a) original grayscale image of a lateral pulmonary artery angiogram; (b)–(d) structural closings by disk structuring elements of increasing diameter.
where Inv(ψ) denotes the invariance domain of the grayscale operator ψ, given by Inv(ψ) = { f : ψ(f) = f }.

4.4.7

Flat image operators are increasing. This follows directly from the way these operators are constructed. Moreover, they enjoy properties directly induced from the set operator Ψ applied on each cross section. For example, if Ψ is an erosion, then the flat operator ψ it generates is an erosion as well. The same is true when Ψ is a dilation, opening, or closing; then, ψ is a dilation, opening, or closing, respectively. Moreover, if Ψ(F) = F ⊖ B, then ψ(f) = f ⊖ B. Thus, a flat (grayscale) erosion with
Figure 4.19: Grayscale area opening: (a) original grayscale image of a blood smear (first row) and the result obtained after binarization by means of thresholding (second row); (b) the result of area opening (first row), applied on the grayscale image in (a), obtained by means of Eq. (4.25) with a small value of λ, and the result after thresholding (second row); (c) the same with a larger value of λ.
Figure 4.20: An example of flat grayscale erosion, in (a), and of flat opening, in (b): the flat erosion f ⊖ B and opening f ∘ B are obtained by eroding and opening each cross section F(t) by B.
Morphological gradients

The grayscale analogue of the morphological gradient is the operator

grad(f) = (f ⊕ B) − (f ⊖ B),

where B is a structuring element that contains the origin. Provided that f is continuously differentiable, the morphological gradient with a disk structuring element of radius r, normalized by 2r, converges to the magnitude ‖∇f‖ of the gradient of f, as r decreases to zero.

Similar gradient operators are the external and internal gradients, given by

grad⁺(f) = (f ⊕ B) − f  and  grad⁻(f) = f − (f ⊖ B).

Figure 4.21 depicts an example of applying the grayscale morphological gradient, and its variants, on a magnetic resonance (MR) image of the human brain. In this case, the 3 × 3 square structuring element B = {(−1, −1), (−1, 0), (−1, 1), (0, −1), (0, 0), (0, 1), (1, −1), (1, 0), (1, 1)}, centered at the origin, has been used. Notice that, in this example, the internal gradient provides the best separation between anatomical structures.
4.4.9

The opening f ∘ B removes the peaks and ridges of the topographic surface of f that cannot accommodate the structuring element; the difference f − (f ∘ B) therefore produces such peaks and ridges. This is known as the opening top-hat operator. Dually, the operator (f • B) − f produces the hollows and ravines of the topographic surface of f. This is known as the closing top-hat operator. Figure
4.22 illustrates the use of this operator to extract and visualize the sulci from a lateral view of the right brain hemisphere. The procedure is shown for a 2D grayscale
image. However, it can be used for a 3D representation of the brain as well. Correct extraction and visualization of sulci is important for identifying landmarks to
achieve functional segmentation of the brain and to guide the surgeon during an
operation.
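The opening top-hat can be sketched on a 1-D signal with a flat window; the clipped-window boundary convention used below is an assumption, and the function names are illustrative:

```python
# Illustrative sketch on a 1-D signal (a list of ints); the window is
# clipped at the borders, which is one possible boundary convention.

def erode1d(f, r):
    return [min(f[max(0, i - r):i + r + 1]) for i in range(len(f))]

def dilate1d(f, r):
    return [max(f[max(0, i - r):i + r + 1]) for i in range(len(f))]

def open1d(f, r):
    return dilate1d(erode1d(f, r), r)

def opening_tophat(f, r):
    """f minus its opening: keeps peaks narrower than the window of
    half-width r and removes the slowly varying background."""
    return [a - b for a, b in zip(f, open1d(f, r))]
```

A narrow peak survives the subtraction intact, while a flat plateau is removed entirely, which is exactly the peak-and-ridge extraction behavior described above.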
Figure 4.21: Grayscale morphological gradient: (a) an MR image of the human brain; (b) histogram-equalized grayscale morphological gradient with the square structuring element; (c) external gradient; (d) internal gradient.
Figure 4.22: Morphological closing top-hat: (a) lateral view of the right brain hemisphere; (b) closing top-hat by a disk structuring element. Data courtesy
4.4.10
Conditional dilation
If a grayscale image f is dilated with a structuring element that contains the origin, its subgraph grows. If this type of dilation is successively repeated, the subgraph of the original image grows without bound. One way to restrict this growth is to restrict the dilation of an image f by a structuring element B to within a mask image m. This defines the so-called grayscale conditional dilation, given by

δ¹_B(f | m) = (f ⊕ B) ∧ m.
It will become clear in the following (see Section 4.6) that the grayscale conditional dilation plays a key role in defining a new morphological operator, known
as grayscale opening by reconstruction, which turns out to be very useful in image
segmentation problems.
4.4.11
Morphological filters
Figure 4.23: Grayscale morphological filtering: (a) a grayscale image with regions of interest (from [90]); (b) detected edges, obtained by means of thresholding the image in (a) and applying the internal morphological gradient operator [Eq. (4.17)] on the binary result; (c) the result of applying the AF on the image in (a); (d) detected edges, obtained by means of thresholding the image in (c) and applying the internal morphological gradient operator [Eq. (4.17)] on the binary result.
The compositions of the grayscale opening and closing,

m_B(f) = (f ∘ B) • B  and  n_B(f) = (f • B) ∘ B,    (4.26)

in direct analogy with Eq. (4.19), are grayscale morphological filters. These filters are known as grayscale alternating filters (AF), since they alternate between opening and closing. The grayscale AFs can, in turn, be composed with structuring elements of increasing size to form grayscale alternating sequential filters (ASF), in direct analogy with the binary case.
In this section, we discuss the grayscale analogues of the discrete size transform
and the associated pattern spectrum. The reader is referred to [66] for more details
on these subjects. Although the skeleton transform can be defined for grayscale
images as well, the resulting image representation is not very useful in practice.
Therefore, we will not be discussing this representation here.
Given a grayscale image f, its discrete size transform (DST) is defined by the differences of successive openings, for positive sizes, and of successive closings, for negative sizes, with the opening and closing as defined above. We limit our presentation here to flat openings and closings, although extension to more general structural openings and closings is possible [66]. Notice that each component of the DST is
always nonnegative. Moreover, f can be recovered from its DST components, which are obtained by successive approximations of f by means of structural openings and closings. Notice that the first positive-size component is the opening top-hat transform of f, whereas the first negative-size component is the closing top-hat transform. In general, the component at size n, for n > 0, contains a layer scraped from the subgraph of f by means of the difference (f ∘ nB) − (f ∘ (n + 1)B), whereas the component at size −n, for n > 0, contains a layer scraped from the subgraph of f by means of the difference (f • (n + 1)B) − (f • nB). Since both opening and closing are smoothing (lowpass) filters, the DST can be thought of as the output of a filterbank comprising a collection of bandpass filters. However, the term band is not associated here with the frequency content of f, as is customary in linear filterbank techniques (e.g., see [64]), but with the particular layer scraped from the subgraph of f.
The pattern spectrum of a grayscale image f, in terms of a structuring element B, is given by the volumes of the DST components: for positive sizes, the volume of the difference of successive openings, and, for negative sizes, the volume of the difference of successive closings.
In this section, we discuss a powerful tool for object extraction from binary and
grayscale images, known as morphological image reconstruction. This is an iterative tool that extracts regions of interest from an image marked by a set of markers.
We start by limiting our exposition to the binary case. We then extend our discussion to the grayscale case by means of threshold decomposition.
4.6.1
Consider a shape F, like the one depicted in Fig. 4.24(a), comprising several (nonoverlapping) grains. Suppose that we are interested in an operator that automatically extracts all grains of F that are marked by a marker M (i.e., a set M such that M ∩ F ≠ ∅). In practice, M may mark important targets of interest (objects) that need to be extracted from image F.

Let {F₁, F₂, …} be the collection of all grains of F and let F_M denote the portion of F that contains all grains marked by M. Under certain conditions, F_M
can be computed by means of elementary morphological operators. Indeed, let

δⁿ_B(M | F) = δ¹_B(δ¹_B(⋯ δ¹_B(M | F) ⋯ | F) | F)   (n times)    (4.27)

denote n successive conditional dilations of M by B within F. It can be shown that if B is a structuring element that contains the origin, chosen such that each grain of F is connected with respect to B, then

F_M = R_B(M | F) = ⋃_{n ≥ 1} δⁿ_B(M | F).    (4.28)
Figure 4.24: (a) The problem of binary morphological image reconstruction: a shape F, a marker F_m, and the reconstructed shape F̂ = R_B(F_m | F). (b) Binary morphological image reconstruction implemented by means of conditional dilations δ¹_B(F_m | F).
In practice, the union in Eq. (4.28) is computed iteratively,

δⁿ_B(M | F) = δ¹_B(δⁿ⁻¹_B(M | F) | F),  for n ≥ 2,    (4.29)

until no further change occurs. See Fig. 4.24(b) for an illustration. The operator that reconstructs F from its opening, R_B(F ∘ B | F), is an opening (increasing, antiextensive, and idempotent), known as opening by reconstruction; its dual,

[R_B(F^c ∘ B | F^c)]^c,

is a closing (i.e., an operator that is increasing, extensive, and idempotent) and is called closing by reconstruction.

The binary conditional reconstruction operator can be used to define another useful morphological operator, known as the binary close-hole operator. This operator is given by

[R_B(M | F^c)]^c,    (4.30)

where the binary marker M is the boundary of the image window. As illustrated in Fig. 4.25, the close-hole operator fills in all holes in a binary image that do not touch the image window boundary.
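Binary reconstruction and a close-hole operator in the spirit of Eq. (4.30) can be sketched by iterating the conditional dilation until stability; 4-connectivity via the cross structuring element and the set-of-pixels representation are assumptions of this sketch:

```python
# Illustrative sketch: binary images as sets of (x, y) integer coordinates,
# with 4-connectivity given by the cross structuring element.
CROSS = {(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)}

def dilate(F, B):
    return {(x + dx, y + dy) for (x, y) in F for (dx, dy) in B}

def reconstruct(M, F, B=CROSS):
    """Binary morphological reconstruction: iterate the conditional
    dilation (dilate by B, clip to F) until stability."""
    prev, cur = None, set(M) & set(F)
    while cur != prev:
        prev = cur
        cur = dilate(cur, B) & set(F)
    return cur

def close_hole(F, width, height, B=CROSS):
    """Close-hole operator: reconstruct the background from the window
    border, then take the complement within the window."""
    window = {(x, y) for x in range(width) for y in range(height)}
    border = {(x, y) for (x, y) in window
              if x in (0, width - 1) or y in (0, height - 1)}
    background = reconstruct(border, window - F, B)
    return window - background
```

Only background grains that touch the window border are reconstructed, so interior holes are absent from the background and become part of the complemented result.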
Figure 4.25: The binary close-hole operator: (a) original binary image F; (b) the marker, the boundary of the image window; (c) reconstruction of the complement of F from the marker: notice that only grains that touch the image window boundary are reconstructed; (d) the set complement of the image in (c): all holes in F that do not touch the image window boundary have been filled.
4.6.2
Figure 4.26: Grayscale morphological image reconstruction by means of threshold decomposition: a marker signal f_m and an image f; the conditional dilations δ¹_B(f_m | f) act on the cross sections as δ¹_B(F_m(t) | F(t)); the reconstructed signal is f̂ = r_B(f_m | f), with cross sections F̂(t) = R_B(F_m(t) | F(t)).

The grayscale reconstruction r_B(f_m | f) of an image f from a marker f_m ≤ f is defined through the cross sections of the marker:

r_B(f_m | f)(x) = sup{ t : x ∈ R_B(F_m(t) | F(t)) }.

Because R_B is an increasing operator, given f_m, the reconstruction is a flat image operator generated by R_B. Under certain conditions, r_B(f_m | f) can be computed by means of elementary grayscale morphological operators. Indeed, let

δⁿ_B(f_m | f) = δ¹_B(δ¹_B(⋯ δ¹_B(f_m | f) ⋯ | f) | f)   (n times)    (4.31)

denote n successive grayscale conditional dilations. It can be shown that if B is a structuring element that contains the origin, chosen such that each grain F_i(t) of every cross section F(t) of f is connected with respect to B, then (grayscale) morphological image reconstruction can be achieved by taking the supremum of all δⁿ_B(f_m | f); i.e.,

r_B(f_m | f) = ⋁_{n ≥ 1} δⁿ_B(f_m | f).    (4.32)
Notice that the sequence δⁿ_B(f_m | f) is increasing with n, so Eq. (4.32) is the limit of the iterative reconstruction process

δⁿ_B(f_m | f) = δ¹_B(δⁿ⁻¹_B(f_m | f) | f),  for n ≥ 2.    (4.33)

The grayscale analogue of the close-hole operator is given by

[r_B(f_m | f*)]*,    (4.34)

where the grayscale marker f_m takes the value of f* at pixels on the boundary of the image window, and the minimum grayscale value otherwise. Clearly, this operator fills in all holes in the cross sections of image f that do not touch the image window boundary.
Morphological image reconstruction is a time-consuming process. Sequential implementation, by means of Eq. (4.29) or Eq. (4.31), is inefficient. However, a
number of alternative algorithms have been proposed in the literature that result in
faster implementation. For more information on this subject, the reader is referred
to [95].
4.6.3
Examples
Morphological image reconstruction is a powerful tool for extracting objects of interest from a given image. In the following, we illustrate this by means of two
examples: detection of the lateral ventricle in an MR image of the brain and extraction of filarial worms in a microscopic image of blood stream. Additional examples
may be found in [95].
Feature detection in MR imaging.
This example illustrates the use of grayscale morphological image reconstruction
for detecting the lateral ventricle in an MR image of the brain, depicted in Fig. 4.27(a).

Figure 4.27: Detection of the lateral ventricle in an MR image of the brain: (a) original image (from [96]); (b) grayscale structural opening of the image in (a) by a disk structuring element; (c) grayscale morphological reconstruction of that part of the image in (a) marked by the image in (b); (d) the difference between the images in (a) and (c); (e) the result of thresholding the image in (d); (f) the boundary of the result of applying a binary area opening on the image in (e), overlaid on the original data in (a).
Figure 4.28: Extraction of filarial worms: (a) a microscopic image of filarial worms in the bloodstream, indicated by the two white arrows; (b) the difference between the original image in (a) and a grayscale closing by reconstruction operator applied on (a); (c) the grayscale structural opening of the image in (b) by the cross structuring element B; (d) the grayscale area opening of the image in (c); (e) the result of thresholding the image in (d); (f) the morphological skeleton of the image in (e); (g) the result of a binary area opening applied on the image in (f); (h) binary morphological reconstruction of the filarial worms from the image in (e), using the image in (g) as a marker; (i) the reconstruction result in (h), overlaid on the original image in (a). Images courtesy of SDC Information Systems. Used with permission.

The image in Fig. 4.28(b) is the difference between the original image and a grayscale closing by reconstruction applied on the image depicted in (a). This is known as the closing-by-reconstruction top-hat operator.
4.7
The distance transform is a basic tool for the construction of morphological segmentation operators. It will soon become apparent that the distance transform is
intrinsically related to many morphological set operators, like translation invariant
erosions, dilations, and skeletons, to mention a few.
Let us consider images defined over the two-dimensional Euclidean space ℝ². A function d(u, v) from ℝ² × ℝ² into the set of nonnegative real numbers is called a distance function if the following three properties are satisfied:

1. d(u, v) ≥ 0, with d(u, v) = 0 if and only if u = v;
2. d(u, v) = d(v, u);
3. d(u, w) ≤ d(u, v) + d(v, w).

One calls d(u, v) the distance between points u and v. Examples of common distance functions between two points u = (u₁, u₂) and v = (v₁, v₂) in ℝ² are

d₁(u, v) = |u₁ − v₁| + |u₂ − v₂|,  the city-block distance,
d₂(u, v) = [(u₁ − v₁)² + (u₂ − v₂)²]^{1/2},  the Euclidean distance,
d_∞(u, v) = max{ |u₁ − v₁|, |u₂ − v₂| },  the chessboard distance.

Given a distance function d, the distance transform Δ_F of a binary image F at a point u ∈ F is defined by

Δ_F(u) = inf{ d(u, v) : v ∈ F^c }.    (4.35)
The relationship between the distance transform and erosion, where D_r is a disk structuring element with radius r centered at the origin, is given by

F ⊖ D_r = { u : Δ_F(u) > r }.    (4.36)
Figure 4.29: The distance transform: (a) a grayscale microscopic image of red blood cells from a patient with hereditary spherocytosis; (b) binarized version F of the image in (a); the remaining panels depict the distance transform of F and the level lines of its successive erosions.
Figure 4.30: Disk structuring elements D_r with increasing radius r, for three choices of the distance function: (a) city-block distance; (b) Euclidean distance; (c) chessboard distance.
This explains the fact that the boundaries of successive erosions with a disk structuring element coincide with the level lines in Fig. 4.29(e). Notice that Eq. (4.35)
and Eq. (4.36) depend on the particular choice for the distance function , which
leads to a particular choice for the disk structuring element . Figure 4.30 depicts disk structuring elements , with increasing value of , for three different
choices of the distance function.
By using the duality between erosions and dilations, the level lines of the distance transform can be associated with the boundaries of erosions with a disk structuring element. If F is a closed and bounded subset of ℝ², we have that

Δ_F(u) = sup{ r : u ∈ F ⊖ D_r },    (4.37)

for every u ∈ F. Now, let F = F₁ ∪ F₂ ∪ ⋯ be composed of disjoint grains F₁, F₂, …. The influence zone Z(Fᵢ) of the grain Fᵢ is the set of all points that are strictly closer to Fᵢ than to any other grain:

Z(Fᵢ) = { u : d(u, Fᵢ) < d(u, Fⱼ), for every j ≠ i }.
The set complement of the union of the influence zones of all grains of F forms the SKIZ (skeleton by influence zones) of F; i.e.,

SKIZ(F) = [ ⋃ᵢ Z(Fᵢ) ]^c.

Clearly, the SKIZ is the collection of all points in ℝ² that do not belong to any influence zone. Moreover, because each influence zone Z(Fᵢ) is an open subset of ℝ², the SKIZ is a closed subset of ℝ².
Interestingly, it can be shown that the SKIZ of a shape follows the ridges, or
crest lines, of the distance transform . Roughly speaking, a ridge, or crest
line, of a grayscale image is a curve on the topographic surface of , such that, as
we walk along this line, the points to the right and to the left are lower than the ones
we are on. A more precise mathematical characterization, based on elementary
concepts such as gradient, directional derivative, and vector inner product, can be
found in [97]. A detailed analysis of the crest points (i.e., the points forming a crest
line) in a discrete setting can be found in [98], with emphasis on the identification
of the crest points of the distance function.
An example of binary image segmentation based on the SKIZ is depicted in
Fig. 4.31. The distance transform of the binary image in Fig. 4.29(b)
is depicted in Fig. 4.31(a), plotted as a grayscale function and overlaid on image
. The crest lines, extracted from , produce the SKIZ, which is depicted in
Fig. 4.31(b), overlaid on (refer to [99] for methods on how to extract crest lines
and ravines in grayscale images). Clearly, the SKIZ forms a continuous net that
effectively segments the image into disjoint partitions, each partition containing at
most one red blood cell.
4.7.3
Figure 4.31: Binary segmentation using the SKIZ: (a) the distance transform Δ_F of the image F depicted in Fig. 4.29(b), plotted as a grayscale function and overlaid on F; (b) the SKIZ, extracted from the crest lines of Δ_F, overlaid on F.
The catchment basin associated with a regional minimum m of the topographic surface of a function g is the set of points such that a drop of water falling at any of them slides along the surface until it reaches m. A small hole is now punched at the bottom of each catchment basin of g, and the subgraph of g is slowly immersed into water. While the subgraph is flooded, by water passing through the holes, the water level rises uniformly over the subgraph; see Fig. 4.32(b). At some moment, water filling a catchment basin starts merging with water coming from an adjacent catchment basin. At this particular moment, a dam is erected to prevent this from happening. When no new dams need to be constructed, the procedure is stopped. The collection of all erected dams is depicted in Fig. 4.32(c). When the subgraph of g is totally immersed into water, the only visible structures at the water surface will be the tops of the dams. These form the so-called watershed lines, which are depicted in Fig. 4.32(d). It turns out that the crest lines of the distance function, or equivalently the SKIZ, coincide with these watershed lines.
4.7.4
Geodesic SKIZ
Although the SKIZ is an effective tool for segmenting binary images of nonoverlapping particles, what happens when particles overlap? We answer this question in this subsection. However, we first need to modify the notion of distance between two points u and v in ℝ².

Consider two points u, v inside a binary image F. A path in F between u and v is any curve joining these two points that lies entirely in F. If the points u, v can be connected by a path in F, then there exists a path joining u and v whose length is not greater than the length of any other path with the same endpoints. This path is called a geodesic path. See Fig. 4.33 for an example. Given two points u, v, the length of the geodesic path connecting these two points is called the geodesic distance between u and v and is denoted by d_F(u, v). If there exists no geodesic
Figure 4.32: The binary watershed transform: (a) the distance transform Δ_F of the image F depicted in Fig. 4.29(b); (b) the subgraph is immersed into water: water fills in from holes punched through the bottom of the catchment basins; (c) the dams constructed to prevent water from merging from two adjacent catchment basins; (d) the tops of the dams in (c) define the watershed lines, which are overlaid on F.
Figure 4.33: A geodesic path between two points u and v in a binary image F, and a path that is not a geodesic.
path between u and v, then we set d_F(u, v) = ∞. Given a collection of markers M₁, M₂, … inside F, the geodesic influence zone Z_F(Mᵢ) of the marker Mᵢ is the set of all points of F that are geodesically closer to Mᵢ than to any other marker:

Z_F(Mᵢ) = { u ∈ F : d_F(u, Mᵢ) < d_F(u, Mⱼ), for every j ≠ i }.

Subtracting from F the union of the geodesic influence zones of all markers in F, we obtain the geodesic SKIZ, given by

SKIZ_F = F \ ⋃ᵢ Z_F(Mᵢ).
It is not difficult to see that the geodesic SKIZ can be computed by extracting the
ridges, or crest lines, of the geodesic distance transform
. Clearly, the
geodesic SKIZ can be used to segment partially overlapping particles, as illustrated
in Fig. 4.34(b). However, the segmentation result depends strongly on the choice
for the markers.
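On a pixel grid, the geodesic influence zones can be approximated by a multi-source breadth-first search inside F; note that this simplified sketch assigns tie pixels to whichever marker reaches them first instead of leaving a crest line unlabeled, and all names are illustrative:

```python
from collections import deque

def geodesic_zones(F, markers):
    """Label each pixel of F (a set of (x, y) coordinates) with the index of
    the geodesically nearest marker, using 4-connected multi-source BFS.
    `markers` is a list of pixel sets contained in F. Ties go to whichever
    marker arrives first, so no crest line is left unlabeled here."""
    F = set(F)
    label, queue = {}, deque()
    for i, M in enumerate(markers):
        for p in M:
            if p in F:
                label[p] = i
                queue.append(p)
    while queue:  # BFS inside F only, so distances are geodesic
        x, y = queue.popleft()
        for nbr in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nbr in F and nbr not in label:
                label[nbr] = label[(x, y)]
                queue.append(nbr)
    return label
```

Because the search never leaves F, paths are forced to go around concavities, which is exactly what distinguishes the geodesic distance from the ordinary one.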
To illustrate a method for choosing appropriate markers, let us consider the simple case of two partially overlapping disks, F₁ and F₂, of the same radius, like
Figure 4.34: (a) Two overlapping grains, F = F₁ ∪ F₂, have been marked by two markers M₁ and M₂; (b) the geodesic influence zones Z_F(M₁) and Z_F(M₂), separated by the geodesic SKIZ.
the ones depicted in Fig. 4.35(a). We can mark these two disks by their centers.
Then, the geodesic SKIZ, based on these markers, provides the desirable segmentation. The centers of the two disks can be easily obtained by means of the distance
transform: we first calculate the distance transform of F, which assigns to each
point of F its distance to the background; the desirable disk centers can then be
computed as the two points in F where the distance transform assumes its maximum value. This simple example suggests that appropriate markers for segmenting
a binary image of overlapping particles based on the geodesic SKIZ can be obtained
by determining the peaks, or regional maxima, of the distance transform of F.
Recall that a peak, or regional maximum, of a grayscale image is a connected
component of pixels with a given grayscale value h, such that every pixel in the
neighborhood of this component has a value strictly smaller than h. However, keep
in mind that this approach may produce misleading results. This is clear from the
example depicted in Fig. 4.35(b).
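This marker-selection recipe is easy to prototype. The sketch below is our own illustration, not the chapter's algorithm (city-block distances and a plateau-aware maximum test are assumptions):

```python
from collections import deque

def distance_transform(F):
    """City-block distance from each foreground pixel to the background,
    by multi-source BFS started from the pixels that touch the background."""
    dist, q = {}, deque()
    for (x, y) in F:
        if any(nb not in F for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))):
            dist[(x, y)] = 1
            q.append((x, y))
    while q:
        x, y = q.popleft()
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in F and nb not in dist:
                dist[nb] = dist[(x, y)] + 1
                q.append(nb)
    return dist

def regional_maxima(dist):
    """Regional maxima: connected plateaus with no strictly greater neighbor."""
    seen, peaks = set(), set()
    for p in dist:
        if p in seen:
            continue
        h, plateau, frontier, is_max = dist[p], {p}, [p], True
        while frontier:
            x, y = frontier.pop()
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in dist:
                    if dist[nb] > h:
                        is_max = False
                    elif dist[nb] == h and nb not in plateau:
                        plateau.add(nb)
                        frontier.append(nb)
        seen |= plateau
        if is_max:
            peaks |= plateau
    return peaks

F = {(x, y) for x in range(5) for y in range(5)}  # a single 5x5 square grain
dt = distance_transform(F)
print(dt[(2, 2)], regional_maxima(dt))  # → 3 {(2, 2)}
```

The single peak at the center is exactly the marker that the geodesic SKIZ needs for this grain.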
4.7.5
Figure 4.35: (a) Two overlapping disks with the same radius, marked by their centers, and
the segmentation obtained by means of the geodesic SKIZ. (b) The geodesic SKIZ produces
the wrong segmentation result when the radius of one of the two disks is reduced.
To prevent the water from two adjacent catchment basins from merging, we erect
dams. Once the surface is totally immersed into water, the top of the erected dams
provides the watershed lines, which, in turn, segment the image into the desirable
regions; see Fig. 4.36(c). Notice that the watershed transform is not applied on the
distance transform, as was done in the case of nonoverlapping particles, but on the
negative distance transform. The watershed lines thus produced do not necessarily
coincide with the geodesic SKIZ.
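The immersion can be simulated with a priority queue instead of literal dam building. The following is a simplified, Meyer-style flooding sketch of ours (it assigns every pixel to the first flood that reaches it, so the watershed line is the boundary between differently labeled regions rather than a set of labeled dam pixels):

```python
import heapq

def watershed(height, markers):
    """Marker-driven flooding of a topographic surface.
    `height` maps pixel -> gray value (e.g., the negative distance
    transform); `markers` is a list of disjoint seed sets.  Each pixel
    receives the label of the flood that reaches it first."""
    label, heap, n = {}, [], 0
    for i, seeds in enumerate(markers, start=1):
        for p in seeds:
            label[p] = i
            n += 1
            heapq.heappush(heap, (height[p], n, p))  # n breaks ties deterministically
    while heap:
        _, _, (x, y) = heapq.heappop(heap)
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if nb in height and nb not in label:
                label[nb] = label[(x, y)]
                n += 1
                heapq.heappush(heap, (height[nb], n, nb))
    return label

# Two basins (around x = 1 and x = 5) separated by a ridge at x = 3, standing
# in for the negative distance transform of two merged particles.
height = {(x, y): min(abs(x - 1), abs(x - 5)) for x in range(7) for y in range(3)}
labels = watershed(height, [{(1, 1)}, {(5, 1)}])
```

The boundary between the two labeled regions runs along the ridge at x = 3, which is where the dams of the immersion construction would be erected.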
4.7.6 Grayscale segmentation
Figure 4.36: Binary watershed segmentation: (a) a binary image of overlapping particles;
(b) the negative distance transform, whose regional minima mark the particles; (c) the
watershed lines, overlaid on the original image.
Figure 4.37: Mapping regions and region contours in f to catchment basins and crest
lines of the gradient of f.
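The gradient referred to in Fig. 4.37 can be computed in several ways; a common morphological choice, sketched here in one dimension as our own illustration, is the difference between a local dilation (maximum) and a local erosion (minimum), which is zero on flat regions and peaks exactly at region contours.

```python
def morphological_gradient(sig):
    """1-D morphological gradient: dilation minus erosion over a
    centered 3-sample window."""
    n = len(sig)
    return [max(sig[max(0, i - 1):i + 2]) - min(sig[max(0, i - 1):i + 2])
            for i in range(n)]

# Two flat regions (future catchment basins) separated by a step contour:
print(morphological_gradient([1, 1, 1, 5, 5, 5]))  # → [0, 0, 4, 4, 0, 0]
```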
Figure 4.38: Oversegmentation as a result of watershed-based segmentation using the
morphological gradient: (a) original image f; (b) the result of the morphological gradient
applied on f; (c) the resulting watershed lines, overlaid on f.
Figure 4.39: The effect of prefiltering on watershed-based segmentation, using the morphological gradient: (a) original image f; (b) the result of simplifying f by means of an
opening by reconstruction followed by a closing by reconstruction; (c) the result of the morphological gradient applied on the image in (b); (d) the resulting watershed lines, overlaid
on f.
in (b) and by immersing its subgraph into water; (d) the combined internal and external
markers; (e) the combined internal and external markers overlaid on the original image;
(f) the watershed lines obtained by piercing the topographic surface of the morphological
gradient of the original image, depicted in Fig. 4.38(b), at the points of the combined
marker in (d), and by immersing the subgraph of the morphological gradient into water.
4.7.7 Examples
The watershed transform is a powerful tool for extracting regions of interest from
a given image. In the following, we illustrate this by means of two examples:
segmentation of MR images of the prostate and segmentation of the left ventricle
in tagged MR images of the heart.
Segmentation of MR images of the prostate.
This example illustrates the use of the watershed transform for segmenting MR
images of the prostate, like the one depicted in Fig. 4.41(a). Here, we are interested in segmenting major anatomical features, such as the prostate, the rectum,
and the Denonvilliers' fascia (which separates the prostate from the anterior surface
of the rectum). In this application, segmentation is very important for pre- and
postoperative assessment of the prostate. For more information on this imaging
technique, the reader is referred to [109]. Figure 4.41(a) depicts an MR image of
the prostate with labeling indicating the three regions of interest. Our main objective is to automatically determine a set of internal and external markers that can
be used for a successful watershedbased segmentation. Figure 4.41(b) depicts the
negative of the image in (a). Notice that, in this image, the regions of interest are
characterized by high graylevel values, as compared to the gray values associated
with the tissues surrounding the prostate and rectum. Thresholding will produce
appropriate markers for these regions. However, before thresholding is applied, we
need to reduce the graylevel variation within individual regions. To accomplish
this goal, the image in (b) is subjected to the opening by reconstruction operator
[Eq. (4.32)], followed by the closing by reconstruction operator [Eq. (4.33)]. The
structuring element in Eq. (4.32) and Eq. (4.33) is taken to be a disk with a diameter of pixels. The result is depicted in Fig. 4.41(c). Thresholding now produces
the binary image depicted in Fig. 4.41(d), from which the required markers will
be extracted. Notice that this image consists of connected components, labeled
5 , 5 and 5 . First, the holes in 5 are closed by means of the binary closehole
operator [Eq. (4.30)]. Then, the resulting image is slightly shrunk, by means of a
binary erosion by a disk structuring element with a diameter of pixels, and the result is smoothed out by means of a structural opening by a disk structuring element
with a diameter of pixels. Notice that the smoothed component 5 can serve as
an internal marker for the prostate, the smoothed component 5 can serve as an
external marker for the prostate, whereas the smoothed component 5 can serve as
an internal marker for the rectum. What is left to do is to obtain an external marker
Figure 4.41: Segmentation of an MR image of the prostate: (a) original image with three
labeled anatomical features; (b) the negative of the image in (a); (c) the result of applying
an opening by reconstruction followed by a closing by reconstruction on the image in (b); (d)
the result of thresholding the image in (c); (e) the combined internal and external markers (in
white) obtained from the image in (d), overlaid on the original image in (a); (f) the watershed
lines, overlaid on the original image in (a). Data courtesy of Clare Tempany, MD, Director
of Clinical MRI, Brigham and Women's Hospital, Harvard University. Used with permission.
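The reconstruction operators used in step (c) can be sketched compactly. The code below is our own illustration (a 3x3 square structuring element, zero padding at the border, and a toy image are assumptions; Eqs. (4.32) and (4.33) in the chapter are the reference definitions): opening by reconstruction erodes the image and then iteratively dilates the result under the original image until stability, so surviving objects are restored exactly while small ones vanish.

```python
def dilate(img, w, h):
    # gray-scale dilation by a 3x3 square; out-of-range pixels count as 0
    return {(x, y): max(img.get((x + dx, y + dy), 0)
                        for dx in (-1, 0, 1) for dy in (-1, 0, 1))
            for x in range(w) for y in range(h)}

def erode(img, w, h):
    # gray-scale erosion by a 3x3 square; out-of-range pixels count as 0
    return {(x, y): min(img.get((x + dx, y + dy), 0)
                        for dx in (-1, 0, 1) for dy in (-1, 0, 1))
            for x in range(w) for y in range(h)}

def reconstruct_by_dilation(marker, mask, w, h):
    # geodesic dilation of the marker under the mask, iterated to stability
    while True:
        d = dilate(marker, w, h)
        nxt = {p: min(d[p], mask[p]) for p in mask}
        if nxt == marker:
            return nxt
        marker = nxt

def opening_by_reconstruction(f, w, h):
    return reconstruct_by_dilation(erode(f, w, h), f, w, h)

# Toy image: a 3x3 bright block plus an isolated bright speck.
f = {(x, y): 0 for x in range(7) for y in range(7)}
f.update({(x, y): 5 for x in range(1, 4) for y in range(1, 4)})
f[(5, 5)] = 5
g = opening_by_reconstruction(f, 7, 7)  # block restored exactly, speck removed
```

The dual closing by reconstruction follows by exchanging erosion with dilation and min with max.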
Figure 4.42: (a) A slice of a tagged MR image of the heart; tag lines appear in black.
(b) The result of applying a grayscale structural closing by a vertically oriented pixel-wide
linear structuring element to the image in (a); all tag lines have been successfully removed.
Data courtesy of E. R. McVeigh, Department of Biomedical Engineering and Radiology,
The Johns Hopkins University. Used with permission.
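The tag-removal step can be illustrated in one dimension, since the structuring element is a vertical line applied along each image column. This is our own sketch with invented gray levels: a gray-scale closing, i.e., a running maximum followed by a running minimum, fills any dark gap narrower than the structuring element.

```python
def closing_1d(col, length):
    """Gray-scale closing of a 1-D signal (an image column) by a centered
    line structuring element: dilation (max) followed by erosion (min)."""
    n, r = len(col), length // 2
    dil = [max(col[max(0, i - r):i + r + 1]) for i in range(n)]
    return [min(dil[max(0, i - r):i + r + 1]) for i in range(n)]

# Bright tissue (9) crossed by dark, one-pixel-wide tag lines (2):
print(closing_1d([9, 9, 2, 9, 9, 9, 2, 9, 9], 3))  # → [9, 9, 9, 9, 9, 9, 9, 9, 9]
```

A dark run wider than the element survives the closing, which is why the line element must be chosen longer than the tag-line width.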
(4.38)
The resulting image, depicted in Fig. 4.43(a), for a slice close to the base of the
heart, is processed by the grayscale area opening operator [Eq. (4.25)], which
eliminates unwanted debris. The result is depicted in Fig. 4.43(b). Notice that
the only remaining object resides within the left ventricle cavity. Simple
thresholding, followed by an application of the binary alternating filter in
Eq. (4.18), with a pixel-wide square structuring element, produces the binary
image depicted in Fig. 4.43(c), overlaid on the image depicted
in (b). This binary image serves as an initial estimate of the endocardial marker. It
will be used to provide the actual internal, the intermediate, and the external markers needed by the watershed transform. The inner marker is obtained via eroding
this image by a disk structuring element with a diameter of pixels. The intermediate and outer markers are obtained by extracting the boundaries of the dilations of
the same image by a pixel-wide cross structuring element and by a pixel-wide
disk structuring element, respectively. The sizes of the structuring elements are
chosen such that the internal marker always exists, the intermediate marker always
stays within the myocardium, and the external marker does not intersect with the
epicardium. The resulting markers are depicted in Fig. 4.43(d) as an overlay. The
watershed transform is now applied on the internal morphological gradient of the
image depicted in Fig. 4.43(e) (and not on the gradient of the image in (b) to avoid
oversegmentation) and produces the result depicted in Fig. 4.43(f). The image in (e)
is obtained by applying a variant of the grayscale alternating sequential filter
discussed in Subsection 4.4.11. The main difference between this filter and the traditional alternating sequential filter is that the opening is replaced by opening by
reconstruction, and the closing is replaced by closing by reconstruction. The filter
effectively reduces grayscale variation within homogeneous regions, as is evident
by comparing Figs. 4.42(b) and 4.43(e). We use a filter of order and a disk
structuring element with a diameter of pixels. Notice that the watershed transform does a satisfactory job at locating the endocardium and epicardium. However,
the resulting segmentation starts to deteriorate for slices closer to the heart's apex
Figure 4.43: (a) The result of the close-hole top-hat operator [Eq. (4.38)], applied on the
image depicted in Fig. 4.42(b). (b) The result of grayscale area opening applied on the image in (a). (c) The result of applying a binary alternating filter on the thresholded version of
the image in (b). (d) The internal, intermediate, and external left ventricle markers obtained
from the binary image in (c). (e) A simplified version of the image depicted in Fig. 4.42(b)
obtained by means of a grayscale alternating sequential filter based on opening and closing
by reconstruction operators. (f) The watershed lines obtained by applying the watershed
transform on the internal morphological gradient of the image in (e) marked by the markers in (d). (g) A tighter set of markers obtained by a morphological manipulation of the
result in (f). (h) The watershed lines obtained by applying the watershed transform on the
internal morphological gradient of the image in (e) marked by the markers in (g). (i) Final
segmentation obtained by smoothing the watershed lines in (h).
Figure 4.44: A close-up of final segmentation results for selected tagged MR slices of the
heart (slices 1, 3, and 5), at different time frames (frames 1, 5, and 9). Data courtesy of
E. R. McVeigh, Department of Biomedical Engineering and Radiology, The Johns Hopkins
University. Used with permission.
Acknowledgments
This work was supported by the Office of Naval Research, Mathematical, Computer, and Information Sciences Division, under ONR Grant N00014901345, and
by the National Science Foundation, under NSF Award #9729576.
The authors would like to thank Professor Edward C. Klatt, Department of
Pathology, University of Utah, for kindly permitting use of the images depicted
in Figs. 4.3, 4.5–4.7, 4.9, and 4.29, and Clare Tempany, MD, Director of Clinical
MRI, Brigham and Women's Hospital, Harvard University, for kindly permitting
use of the prostate image depicted in Fig. 4.41. The authors also thank Professor Elliot R. McVeigh, Department of Biomedical Engineering and Radiology, The Johns
Hopkins University, for permitting use of the data associated with the segmentation of tagged MR images of the heart example of Subsection 4.7.7. Special thanks
to SDC Information Systems for permitting use of the images depicted in Figs. 4.19
References 263
and 4.28. The example depicted in Fig. 4.28 is a slightly modified version of the
mmdfila demonstration of the SDC Morphology Toolbox for MatLab and has
been replicated here with permission. All simulations have been implemented in
MatLab 5.3 using the SDC Morphology Toolbox for MatLab, version 0.9.
Finally, the authors are indebted to Henk Heijmans, Roberto Lotufo, and Milan
Sonka for providing suggestions on how to improve the manuscript.
4.10
References
[3] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision. Pacific Grove, California: PWS Publishing, second ed., 1999.
[4] G. Matheron, Random Sets and Integral Geometry. New York City, New York: John
Wiley, 1975.
[5] J. Serra, Image Analysis and Mathematical Morphology. London, England: Academic Press, 1982.
[6] H. J. A. M. Heijmans, Morphological Image Operators. Boston, Massachusetts:
Academic Press, 1994.
[7] P. Soille, Morphological Image Analysis: Principles and Applications. Berlin, Germany: Springer, 1999.
[8] H. J. A. M. Heijmans and C. Ronse, The algebraic basis of mathematical morphology I. Dilations and erosions, Computer Vision, Graphics, and Image Processing,
vol. 50, pp. 245–295, 1990.
[9] C. Ronse and H. J. A. M. Heijmans, The algebraic basis of mathematical morphology II. Openings and closings, Computer Vision, Graphics, and Image Processing:
Image Understanding, vol. 54, pp. 74–97, 1991.
[10] H. J. A. M. Heijmans, Theoretical aspects of gray-level morphology, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 568–582, 1991.
[11] C. Ronse, Why mathematical morphology needs complete lattices, Signal Processing, vol. 21, pp. 129–154, 1990.
[12] G. J. F. Banon and J. Barrera, Minimal representations for translation-invariant set
mappings by mathematical morphology, SIAM Journal of Applied Mathematics,
vol. 51, pp. 1782–1798, 1991.
[13] G. J. F. Banon and J. Barrera, Decomposition of mappings between complete
lattices by mathematical morphology, Part I. General lattices, Signal Processing,
vol. 30, pp. 299–327, 1993.
[14] C. R. Giardina and E. R. Dougherty, Morphological Methods in Image and Signal
Processing. Englewood Cliffs, New Jersey: Prentice Hall, 1988.
[30] J. S. J. Lee, W. I. Bannister, L. C. Kuan, P. H. Bartels, and A. C. Nelson, A processing strategy for automated Papanicolaou smear screening, Analytical and Quantitative Cytology and Histology, vol. 14, pp. 415–425, 1992.
[31] F. Meyer, Mathematical morphology: from two dimensions to three dimensions,
Journal of Microscopy, vol. 165, pp. 5–28, 1992.
[32] J. Pladellorens, J. Serrat, A. Castell, and M. J. Yzuel, Using mathematical morphology to determine left ventricular contours, Physics in Medicine and Biology,
vol. 37, pp. 1877–1894, 1992.
[33] M. E. Brummer, R. M. Mersereau, R. L. Eisner, and R. R. J. Lewine, Automatic detection of brain contours in MRI data sets, IEEE Transactions on Medical Imaging,
vol. 12, pp. 153–166, 1993.
[34] Y. Chen, E. R. Dougherty, S. M. Totterman, and J. P. Hornak, Classification of
trabecular structure in magnetic resonance images based on morphological granulometries, Magnetic Resonance in Medicine, vol. 29, pp. 358–370, 1993.
[35] A. Moragas, C. Castells, and M. Sans, Mathematical morphologic analysis of
aging-related epidermal changes, Analytical and Quantitative Cytology and Histology, vol. 15, pp. 75–82, 1993.
[36] M. Sans and A. Moragas, Mathematical morphologic analysis of the aortic medial
structure: Biomechanical implications, Analytical and Quantitative Cytology and
Histology, vol. 15, pp. 93–100, 1993.
[37] B. D. Thackray and A. C. Nelson, Semi-automatic segmentation of vascular network images using a rotating structuring element (ROSE) with mathematical morphology and dual feature thresholding, IEEE Transactions on Medical Imaging,
vol. 12, pp. 385–392, 1993.
[38] J. Cardillo and M. A. Sid-Ahmed, An image processing system for locating craniofacial landmarks, IEEE Transactions on Medical Imaging, vol. 13, pp. 275–289,
1994.
[39] F. Moreso, D. Seron, J. Vitria, J. M. Grinyo, F. M. Colome-Serra, N. Pares, and
J. Serra, Quantification of interstitial chronic renal damage by means of texture
analysis, Kidney International, vol. 46, pp. 1721–1727, 1994.
[40] W. Bocker, W.-U. Muller, and C. Streffer, Image processing algorithms for the automated micronucleus assay in binucleated human lymphocytes, Cytometry, vol. 19,
pp. 283–294, 1995.
[41] J. A. Gimenez-Mas, M. P. Sanz-Moncasi, L. Remón, P. Gambo, and M. P. Gallego-Calvo, Automated textural analysis of nuclear chromatin: A mathematical morphology approach, Analytical and Quantitative Cytology and Histology, vol. 17,
pp. 39–47, 1995.
[42] C. Tsai, B. S. Manjunath, and R. Jagadeesan, Automated segmentation of brain MR
images, Pattern Recognition, vol. 28, pp. 1825–1837, 1995.
[43] G. Wolf, M. Beil, and H. Guski, Chromatin structure analysis based on a hierarchic
texture model, Analytical and Quantitative Cytology and Histology, vol. 17, pp. 25–34, 1995.
[58] J. S. J. Lee, R. M. Haralick, and L. G. Shapiro, Morphologic edge detection, IEEE
Journal of Robotics and Automation, vol. 3, pp. 142–156, 1987.
[59] J.-F. Rivest, P. Soille, and S. Beucher, Morphological gradients, Journal of Electronic Imaging, vol. 2, pp. 326–336, 1993.
[60] H. J. A. M. Heijmans and C. Ronse, Annular filters for binary images, IEEE Transactions on Image Processing, vol. 8, pp. 1330–1340, 1999.
[61] H. J. A. M. Heijmans, Composing morphological filters, IEEE Transactions on
Image Processing, vol. 6, pp. 713–723, 1997.
[62] P. Maragos and R. W. Schafer, Morphological filters, Part II: Their relations to
median, order-statistic, and stack filters, IEEE Transactions on Acoustics, Speech,
and Signal Processing, vol. 35, pp. 1170–1184, 1987.
[63] J. Serra and L. Vincent, An overview of morphological filtering, Circuits, Systems
and Signal Processing, vol. 11, pp. 47–108, 1992.
[64] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Englewood Cliffs,
New Jersey: Prentice Hall, 1995.
[65] P. Maragos, Morphological skeleton representation and coding of binary images,
IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 1228–1244, 1986.
[66] P. Maragos, Pattern spectrum and multiscale shape representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 701–716, 1989.
[67] I. Pitas and A. N. Venetsanopoulos, Morphological shape decomposition, IEEE
Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 38–45,
1990.
[68] J. Goutsias and D. Schonfeld, Morphological representation of discrete and binary
images, IEEE Transactions on Signal Processing, vol. 39, pp. 1369–1379, 1991.
[69] J. M. Reinhardt and W. E. Higgins, Efficient morphological shape representation,
IEEE Transactions on Image Processing, vol. 5, pp. 89–101, 1996.
[70] R. Kresch and D. Malah, Skeleton-based morphological coding of binary images,
IEEE Transactions on Image Processing, vol. 7, pp. 1387–1399, 1998.
[71] D. Schonfeld and J. Goutsias, Optimal morphological pattern restoration from
noisy binary images, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 14–29, 1991.
[72] E. R. Dougherty, R. M. Haralick, Y. Chen, C. Agerskov, U. Jacobi, and P. H. Sloth,
Estimation of optimal morphological opening parameters based on independent
observation of signal and noise pattern spectra, Signal Processing, vol. 29, pp. 265–281, 1992.
[73] R. M. Haralick, P. L. Katz, and E. R. Dougherty, Model-based morphology: The
opening spectrum, Graphical Models and Image Processing, vol. 57, pp. 1–12,
1995.
[88] F. Y.-C. Shih and O. R. Mitchell, Threshold decomposition of grayscale morphology into binary morphology, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 11, pp. 31–42, 1989.
[89] L. Vincent, Morphological area openings and closings for greyscale images, in
Shape in Picture: Mathematical Description of Shape in Greylevel Images (Y.-L.
O, A. Toet, D. Foster, H. J. A. M. Heijmans, and P. Meer, eds.), pp. 197–208, New
York City, New York: Springer-Verlag, 1994.
[90] J. C. Russ, The Image Processing Handbook. Second Edition. Boca Raton, Florida:
CRC Press, 1995.
[91] Y. Chen and E. R. Dougherty, Grayscale morphological granulometric texture classification, Optical Engineering, vol. 33, pp. 2713–2722, 1994.
[92] K. Sivakumar and J. Goutsias, Morphologically constrained GRFs: Applications to
texture synthesis and analysis, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 21, pp. 99–113, 1999.
[93] L. Vincent, Fast grayscale granulometry algorithms, in Mathematical Morphology
and its Applications to Image Processing (J. Serra and P. Soille, eds.), pp. 265–272,
Dordrecht, The Netherlands: Kluwer, 1994.
[94] K. Sivakumar, M. J. Patel, N. Kehtarnavaz, B. Yoganand, and E. R. Dougherty, A
constant-time algorithm for erosions/dilations with applications to morphological
texture feature computation, Real-Time Imaging, To Appear, 2000.
[95] L. Vincent, Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms, IEEE Transactions on Image Processing, vol. 2,
pp. 176–201, 1993.
[96] S. E. Umbaugh, Computer Vision and Image Processing: A Practical Approach
using CVIPtools. Upper Saddle River, New Jersey: Prentice Hall, 1998.
[97] R. M. Haralick and L. G. Shapiro, Computer and Robot Vision. Volume I. Reading,
Massachusetts: Addison-Wesley, 1992.
[98] F. Meyer, Skeletons in digital spaces, in Image Analysis and Mathematical Morphology. Volume 2: Theoretical Advances (J. Serra, ed.), pp. 257–296, London, England: Academic Press, 1988.
[99] A. M. Lopez, F. Lumbreras, J. Serrat, and J. J. Villanueva, Evaluation of methods
for ridge and valley detection, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 21, pp. 327–335, 1999.
[100] J. B. T. M. Roerdink and A. Meijster, The watershed transform: Definitions, algorithms and parallelization strategies, Fundamenta Informaticae, To Appear, 2000.
[101] F. Meyer and S. Beucher, Morphological segmentation, Journal of Visual Communication and Image Representation, vol. 1, pp. 21–46, 1990.
[102] L. Vincent and P. Soille, Watersheds in digital spaces: An efficient algorithm based
on immersion simulations, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 13, pp. 583–598, 1991.
[119] J. Goutsias, H. J. A. M. Heijmans, and K. Sivakumar, Morphological operators for
image sequences, Computer Vision and Image Understanding, vol. 62, pp. 326–346, 1995.
[120] P. Salembier and J. Serra, Flat zones filtering, connected operators, and filters by
reconstruction, IEEE Transactions on Image Processing, vol. 4, pp. 1153–1160,
1995.
[121] C. Ronse, Set-theoretical algebraic approaches to connectivity in continuous or
digital spaces, Journal of Mathematical Imaging and Vision, vol. 8, pp. 41–58,
1998.
[122] P. Salembier, A. Oliveras, and L. Garrido, Antiextensive connected operators for
image and sequence processing, IEEE Transactions on Image Processing, vol. 7,
pp. 555–570, 1998.
[123] J. Serra, Connectivity on complete lattices, Journal of Mathematical Imaging and
Vision, vol. 9, pp. 231–251, 1998.
[124] H. J. A. M. Heijmans, Connected morphological operators for binary images,
Computer Vision and Image Understanding, vol. 73, pp. 99–120, 1999.
[125] J. Serra, Connections for sets and functions, Fundamenta Informaticae, vol. 41,
pp. 147–186, 2000.
[126] A. Toet, A morphological pyramidal image decomposition, Pattern Recognition
Letters, vol. 9, pp. 255–261, 1989.
[127] X. Kong and J. Goutsias, A study of pyramidal techniques for image representation and compression, Journal of Visual Communication and Image Representation,
vol. 5, pp. 190–203, 1994.
[128] A. Morales, R. Acharya, and S.-J. Ko, Morphological pyramids with alternating
sequential filters, IEEE Transactions on Image Processing, vol. 4, pp. 965–977,
1995.
[129] J. Goutsias and H. J. A. M. Heijmans, Nonlinear multiresolution signal decomposition schemes. Part 1: Morphological pyramids, IEEE Transactions on Image
Processing, To Appear, 2000.
[130] R. L. de Queiroz, D. A. F. Florencio, and R. W. Schafer, Nonexpansive pyramid for
image coding using a nonlinear filterbank, IEEE Transactions on Image Processing,
vol. 7, pp. 246–252, 1998.
[131] H. J. A. M. Heijmans and J. Goutsias, Nonlinear multiresolution signal decomposition schemes. Part 2: Morphological wavelets, IEEE Transactions on Image
Processing, To Appear, 2000.
[132] M.-H. Chen and P.-F. Yan, A multiscaling approach based on morphological filtering, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11,
pp. 694–700, 1989.
[133] R. van den Boomgaard and A. Smeulders, The morphological structure of images:
The differential equations of morphological scale-space, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 1101–1113, 1994.
[147] P. Maragos and M. A. Butt, Curve evolution, differential morphology, and distance
transforms applied to multiscale and eikonal problems, Fundamenta Informaticae,
vol. 41, pp. 91–129, 2000.
[148] V. Caselles, F. Catte, T. Coll, and F. Dibos, A geometric model for active contours,
Numerische Mathematik, vol. 66, pp. 1–31, 1993.
[149] J. Barrera, G. J. F. Banon, R. A. Lotufo, and R. Hirata Jr., MMach: a mathematical morphology toolbox for the KHOROS system, Journal of Electronic Imaging,
vol. 7, pp. 174–210, 1998.
CHAPTER 5
Feature Extraction
Murray H. Loew
George Washington University
Contents
5.1 Introduction 275
   5.1.1 Why features? Classification (formal or informal) almost always depends on them 275
   5.1.2 Review of applications in medical image analysis 276
   5.1.3 Roots in classical methods 278
   5.1.4 Importance of data and validation 279
5.2 Invariance as a motivation for feature extraction 279
   5.2.1 Robustness as a goal 279
   5.2.2 Problem-dependence is unavoidable 280
5.3 Examples of features 280
   5.3.1 Features extracted from 2D images 280
5.4 Feature selection and dimensionality reduction for classification 286
   5.4.1 The curse of dimensionality subset problem 286
   5.4.2 Classification versus representation 286
   5.4.3 Classifier-independent feature analysis for classification 287
   5.4.4 Classifier-independent feature extraction 291
   5.4.5 How useful is a feature: separability between classes 291
   5.4.6 Classifier-independent feature analysis in practice 295
   5.4.7 Potential for separation: nonparametric feature extraction 296
   5.4.8 Finding the optimal subset 304
   5.4.9 Ranking the features 306
5.5 Features in practice 308
   5.5.1 Caveats 308
Future developments 325
Acknowledgments 335
References 335
Introduction 275
5.1 Introduction
This chapter describes the need for image features, categorizes them in several
ways, presents the constraints that may determine which are used in a given application, defines some of them mathematically, and gives examples of their use in
research and in clinical settings. Features can be based on individual pixels (e.g.,
the number having an intensity greater than x; the distance between two points),
on areas (the detection of regions having specific shapes), on time (the flow in a
vessel, the change in an image since the last examination), and on transformations
(wavelet, Fourier, and many others) of the original data.
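Two of the pixel-based features just mentioned, the count of pixels brighter than an intensity x and the entropy of the gray-level histogram, fit in a few lines. The sketch and its toy data are ours:

```python
import math
from collections import Counter

def pixel_features(pixels, x):
    """Count of pixels with intensity greater than x, and the Shannon
    entropy (in bits) of the gray-level histogram."""
    n = len(pixels)
    count_above = sum(1 for v in pixels if v > x)
    hist = Counter(pixels)
    entropy = -sum((c / n) * math.log2(c / n) for c in hist.values())
    return count_above, entropy

# A two-level toy image: half background (0), half foreground (255).
count, ent = pixel_features([0] * 8 + [255] * 8, 128)
print(count, ent)  # → 8 1.0
```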
Excluded from this chapter are discussions of the many methods of image enhancement and preprocessing (for example, noise removal, contrast improvement,
edge detection) used in improving human visual understanding. A large literature
exists, and reference to it for those methods will be made as needed.
5.1.2 Review of applications in medical image analysis
Features play a number of roles in medical image analysis. Model-driven features, which incorporate knowledge of the anatomy and condition in which we are
interested, are introduced in Chapter 7. Such a top-down approach is useful in
many imaging applications. The categories below aim to describe briefly the tasks
in which features, many of which are model-driven, are used. Other chapters in
this volume contain numerous examples of the applications of features; this chapter
provides a brief taxonomy and some explanation of the origins of the features.
5.1.2.1 By purpose
a. Screening (detection)
The goal in screening is to detect conditions that may be disease or evidence
of disease, with the intent of conducting a detailed follow-up of suspicious
findings. Consequently, the emphasis usually is on sensitivity (the detection
of abnormalities when they exist) at the cost of an increased false-positive
rate (decreased specificity). And, because screening is performed on large
numbers of people, it is important that the procedures be low in cost and
provide results quickly. The features should therefore be easy to extract,
contribute to the sensitivity of the procedure, and require minimal user intervention.
Examples of images used in screening include x-ray mammograms for breast
cancer, structured visible light images for childhood scoliosis (curvature of
the spine), and fundus photography for diseases of the retina of the eye.
b. Diagnosis (classification)
Diagnosis aims to make a specific identification of
a problem: is the suspicious region in the breast a fibroadenoma, a cyst, or a
carcinoma? Are there microaneurysms in the retina? In cases such as those,
where screening preceded diagnosis, additional features may be required. In
many applications, however, diagnosis is the first step; examples include the
classification and counting of blood cells, the analysis of tissue cells, and the
characterization of chromosomes in the pathology laboratory.
Another application is comparison: of an image with an earlier version of
the same region (to describe changes in a condition), or of a given image to
an atlas or other standard, for diagnosis or reporting. The Visible Human
Project [1] provides data sets that could be useful in that respect.
c. Therapy and treatment planning
In radiation oncology it is often necessary to identify and align structures in
two images: one is the prescription image used to indicate the areas to be
treated; the other is the portal image, taken just prior to treatment. The portal image is intended to confirm that the treatment beam is aimed correctly.
Typically, however, image quality is low, because of greatly reduced contrast.
Various enhancement methods [2, 3] have been employed, but it is nevertheless often necessary to extract features (e.g., shapes, areas) to provide the
basis for identification of the treatment areas and their boundaries.
A solution of the more general problem of multimodality image registration (as used, for example, in image-guided surgery [4, 5]) often depends on
the ability to recognize correspondence between equivalent structures in the
separate modalities. In cases where external fiducials are not used, successful registration may rely on the comparison of features extracted from each
modality [6].
5.1.2.2 By specialty
The features chosen will depend in part on the modality and intended use; the
clinical specialty will dictate that and impose its own requirements and conventions.
For that reason, the features must be capable of being shown to fit the clinician's
understanding of the pathology or disease process; in many cases, of course, it will
be the clinician's experience that suggests features in the first place.
Radiology (grayscale images only): x-ray plain films (of chest, extremities),
computerized tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET), single-photon emission computerized tomography (SPECT), nuclear medicine; fluoroscopic methods for
angiography and interventional techniques
Pathology (color plays an important role for clinicians): optical and electron
microscopy and effect of using stains of various kinds; fluorescence
Dermatology (color is important): still and video images of skin
5.1.2.3 By representation
The techniques employed in feature extraction generally are motivated by well-understood methods from image processing (in turn, based largely on communication theory, e.g., [7], and signal processing [8]) and statistics [9]. The extension to
higher dimensionality of classical methods usually is straightforward in principle
(though the computational cost can be high), but even then, many methods lack the
capability to deal with geometric and topologic structure. The definitions of, and
attempts to characterize, connectivity, texture, and boundary, for example, and the
effect of scale changes, all required extensions and new developments. Part of that
effort derived from the goal of incorporating lessons from human vision, given that
it is so successful at integrating local features [10, 11].
When considering individual pixels, it is useful to look for neighbors, connectivity, gradients, and other one-dimensional descriptions (perhaps computed in 2D); examples are the histogram and measures of entropy. But much human image understanding depends on the description of regions, and feature extraction can assume the regions are given or can attempt to find them. The latter task often is called segmentation and is considered in Chapter 2. Examples of the ways regions can be described include the following: statistics of amplitude; boundary description (including fractal); edge detection; topological measures; shape descriptors; texture aggregation; co-occurrence; moments; correlation; skeletons and other medial measures; and the results of morphological operations.
Many kinds of transforms are usefully applied for feature extraction. The Fourier transform, for example, provides information about spatial frequency for the image or some subimage. But questions of scale, and the detection of properties that are global or local (perhaps without knowing a priori which are needed), have led to the use of transforms that preserve locality in space (and in time, for sequences of images).
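A one-level 2D Haar decomposition illustrates the point: unlike Fourier coefficients, its detail coefficients respond only near the image structure that produced them. The following is a minimal NumPy sketch (function and variable names are mine, for illustration):

```python
import numpy as np

def haar2d_level(f):
    """One level of the 2D Haar transform: a half-size average band (ll)
    plus horizontal (lh), vertical (hl), and diagonal (hh) detail bands.
    Each output coefficient depends only on one 2x2 block of the input,
    so the detail bands are spatially local."""
    f = np.asarray(f, dtype=float)
    a = f[0::2, 0::2]; b = f[0::2, 1::2]
    c = f[1::2, 0::2]; d = f[1::2, 1::2]
    ll = (a + b + c + d) / 4
    lh = (a - b + c - d) / 4   # horizontal detail (left-right differences)
    hl = (a + b - c - d) / 4   # vertical detail (top-bottom differences)
    hh = (a - b - c + d) / 4   # diagonal detail
    return ll, lh, hl, hh

# A vertical edge shows up only in the detail coefficients that straddle it.
img = np.zeros((8, 8))
img[:, 3:] = 1.0               # edge between columns 2 and 3
ll, lh, hl, hh = haar2d_level(img)
```

Here `lh` is nonzero only in the column of blocks containing the edge, while the Fourier transform of the same image would spread the edge's energy over all frequencies.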
Ideally, features extracted from medical images should be robust; they should be capable of providing the same information irrespective of noise, artifact, intrinsic variation in the underlying image, and parameter settings in the extraction algorithms. In practice, this is difficult both to achieve and to demonstrate. Noise models exist for the major imaging modalities [12, 13], so it is possible to simulate (and sometimes to model) feature extraction under a variety of noise conditions and thus to evaluate performance. Image artifacts (e.g., inhomogeneities of field in MR, scintillation-screen variations in x-ray) are quite varied [12]. Intrinsic variation (the range of anatomic and physiologic variation in health and in disease) has some quantification [14, 15], but rare cases exist and cannot be considered comprehensively. Algorithms for feature extraction preferably should not require user-defined parameter settings; where such settings are unavoidable, the sensitivity of the results to variations in those parameters should be evaluated.
Problem-dependence is unavoidable
The nature of the problem should be taken into account when features are being considered. The clinical user's knowledge of and experience with the data, in the context of the application, will suggest features that should be evaluated. Although many case studies in the literature offer examples of successful features, most new problems benefit from individualized consideration of their characteristics. It is unrealistic to expect that some general set of features will always contain a useful subset; rather, the analyst should enlist the aid of the clinician and investigate the problem carefully. It will then be necessary to convert the (often) qualitative expression of the features into quantitative and repeatable measures. The techniques in the following sections are likely to be useful, but should be considered as starting points only.
5.3
Examples of features
Certain assumptions are embodied in the examples and techniques that follow.
In some practical cases, for example, it will be important to employ preprocessing
operations to remove noise from the images. A great variety of methods exists
[7, 9, 16] and will not be examined here. It should be noted, however, that the
choice of technique may depend on whether the image is to then be analyzed by a
human, a machine, or both. Similarly, it may be necessary to segment the image
into regions of interest prior to applying some of the approaches described here.
5.3.1
5.3.1.1
b. Central moments
The nth central moment of the normalized gray-level histogram p(z), with mean m, is mu_n = Σ_i (z_i − m)^n p(z_i).
d. Entropy
H = − Σ_i p(z_i) log2 p(z_i).
Descriptions of regions
Regions typically are defined on the basis of their internal homogeneity in some characteristic(s). Scale often is important in defining that homogeneity. And, because there is often self-similarity in the homogeneity, fractal features can provide information that other kinds of features cannot. The following sections illustrate those points.
Shape The shape of a subimage may be described in terms of its boundary (contour-based) and/or its interior (region-based).
Region-based
a. Effective diameter (diameter of the circle having the same area A): d_e = 2 (A/π)^(1/2)
c. Compactness, defined as P^2/A, where P is perimeter.
d. Projections
Though useful mostly in binary image processing, projections can serve as a basis for defining related region descriptors. They can be defined in all directions; the most common, however, are the horizontal and vertical projections:
p_h(x) = Σ_y f(x, y),    p_v(y) = Σ_x f(x, y).
They are also useful for measuring homogeneity in gray-scale images and for estimating the height and width of regions.
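As a concrete illustration, both projections reduce to sums along the image axes. The following NumPy sketch (with the image stored as rows y and columns x, an indexing convention assumed here) also shows how the projections of a binary region yield its bounding-box width and height:

```python
import numpy as np

def projections(f):
    """Horizontal and vertical projections of an image f.

    p_h has one value per column x (sum over rows y);
    p_v has one value per row y (sum over columns x)."""
    f = np.asarray(f, dtype=float)
    p_h = f.sum(axis=0)   # horizontal projection
    p_v = f.sum(axis=1)   # vertical projection
    return p_h, p_v

# For a binary region, the nonzero extent of each projection gives the
# width and height of the region's bounding box.
region = np.zeros((6, 8))
region[2:5, 1:4] = 1            # a 3-row by 3-column block
p_h, p_v = projections(region)
width = np.count_nonzero(p_h)   # 3
height = np.count_nonzero(p_v)  # 3
```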
e. Moments
When comparing images or their regions to one another or to a standard, the set of moments derived by Hu [20] can be quite useful, as it is invariant to translation, rotation, and scale change. For the image f(x, y), we define the moments of order (p + q),
m_pq = Σ_x Σ_y x^p y^q f(x, y),
the central moments,
mu_pq = Σ_x Σ_y (x − x̄)^p (y − ȳ)^q f(x, y),
where x̄ = m_10/m_00 and ȳ = m_01/m_00, and the normalized central moments,
η_pq = mu_pq / mu_00^γ,
where γ = (p + q)/2 + 1 for p + q = 2, 3, .... Hu's seven invariant moments are polynomial combinations of the η_pq; the first, for example, is φ_1 = η_20 + η_02.
Other sets of moments have been described by Flusser and Suk [21-23] (yielding descriptors that are invariant under general affine transformations) and by Gupta and Srinath [24] (whose descriptors use only the boundary and can treat spiral and nonconvex contours).
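Hu's moments translate directly into code. The following NumPy sketch (function names are mine) computes the normalized central moments η_pq and the first invariant φ1 = η_20 + η_02, and checks its invariance to translation and to 90-degree rotation:

```python
import numpy as np

def eta(f, p, q):
    """Normalized central moment: eta_pq = mu_pq / mu_00**gamma,
    with gamma = (p + q)/2 + 1."""
    f = np.asarray(f, dtype=float)
    y, x = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    m00 = f.sum()
    xbar, ybar = (x * f).sum() / m00, (y * f).sum() / m00
    mu_pq = ((x - xbar) ** p * (y - ybar) ** q * f).sum()
    return mu_pq / m00 ** ((p + q) / 2 + 1)

def phi1(f):
    """Hu's first invariant moment: eta_20 + eta_02."""
    return eta(f, 2, 0) + eta(f, 0, 2)

img = np.zeros((12, 12))
img[2:5, 2:7] = 1.0                                   # a 3 x 5 block
moved = np.roll(np.roll(img, 4, axis=0), 3, axis=1)   # same block, translated
# phi1(img), phi1(moved), and phi1(np.rot90(img)) agree to machine precision
```

Production implementations (e.g., in OpenCV) compute all seven invariants; the sketch shows only the mechanics.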
Texture, and the co-occurrence matrix An example of the use of texture was provided by Thiele et al. [25], who used texture analysis of the breast tissue surrounding microcalcifications, on digitally acquired images obtained during stereotactic biopsy, to predict malignant versus benign outcomes. The analysis calculated statistical features from gray-level co-occurrence matrices and fractal geometry for equal-probability and linear quantizations of the image data. That preliminary study, using 54 cases, obtained a sensitivity of 89% and a specificity of 83%; it was expected that the approach would be useful in resolving discordance between pathological and mammographic findings and might ultimately reduce the number of benign biopsies.
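A gray-level co-occurrence matrix of the kind underlying such texture features can be sketched in a few lines (a minimal implementation for a single displacement; libraries such as scikit-image provide production versions):

```python
import numpy as np

def cooccurrence(img, dx, dy, levels):
    """Normalized gray-level co-occurrence matrix for displacement (dx, dy).

    P[i, j] is the relative frequency with which a pixel of level i has a
    pixel of level j at offset (dx, dy)."""
    img = np.asarray(img)
    rows, cols = img.shape
    P = np.zeros((levels, levels))
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                P[img[y, x], img[y2, x2]] += 1
    return P / P.sum()

def energy(P):
    """Angular second moment: sum of squared entries."""
    return (P ** 2).sum()

def contrast(P):
    """Gray-level contrast: (i - j)^2 weighted by P[i, j]."""
    i, j = np.indices(P.shape)
    return ((i - j) ** 2 * P).sum()

img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])
P = cooccurrence(img, dx=1, dy=0, levels=4)   # right-neighbor relation
```

Statistics such as energy, contrast, correlation, and entropy computed from P (and from several displacements) then serve as the texture features.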
One way to describe relationships among pixels is to choose a spatial relationship and examine the image to determine the ways in which that relationship appears. Let d be a displacement (a distance and direction); the co-occurrence matrix P_d(i, j) counts the pairs of pixels separated by d whose gray levels are i and j. Normalized by the total number of such pairs, it estimates the joint probability of gray levels i and j at that displacement, from which statistics such as energy, contrast, correlation, and entropy can be computed.
In any effort at designing a method for classifying images (e.g., into normal
and disease), it is essential that there be a training set of images of known classification. As illustrated above, many features could be computed and used in the
classifier. But misclassification probability tends to increase with the number of
features, and classifier structure is more difficult to interpret [32]. Further, the prediction variability tends to increase, and the classifier is sensitive to outliers. There
is no guarantee that the classifier will perform as well on a new set of samples.
Cover [33] has shown that if there are too many features and too small a training
set, a perfect classification may result, which is nevertheless meaningless in the
sense that performance on a test set may be very poor.
How, then, should a best subset of a set of candidate features be chosen? Only an exhaustive search over all subsets of features can provide the answer [34]. For dimensionality D, the number of subsets is 2^D, implying a large computing task. Further, for the two-class case, it was found that for large dimensionality D and small probability of error ε, the required sample size grows rapidly, both with D and as ε is reduced. The remainder of this section examines the question of feature selection as a practical problem.
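The combinatorial growth is easy to make concrete. For D features there are 2^D − 1 nonempty subsets to evaluate:

```python
from itertools import combinations

D = 10
subsets = [s for r in range(1, D + 1)
           for s in combinations(range(D), r)]
# Already over a thousand subsets for only ten candidate features,
# and the count doubles with each feature added.
n_subsets = len(subsets)   # 2**10 - 1 = 1023
```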
5.4.2
The many applications and types of image features offer a number of opportunities to define and select those that will be most useful in a specific situation. A pathologist who knows that nuclei of certain sizes are important may wish to have the computer examine a series of cell images and display only those containing the desired nuclei. Similarly, a radiologist comparing several ultrasound images might want to see only those exhibiting a particular characteristic.
Feature analysis for classification is based on the discriminatory power of features: a measure of the usefulness of the feature(s) in determining the class of an object. Traditional feature analysis for classification addresses only classifier-specific discriminatory power: first a classifier is selected; then the discriminatory power of a feature is proportional to the accuracy of the classifier when it uses that feature. Traditional feature analysis for classification is thus classifier-driven. This section presents recent work [36] that provides an alternative, data-driven approach to feature analysis, called classifier-independent feature analysis (CIFA). It is based on the nonparametric discriminatory power of features, defined as the relative usefulness of a feature within a subset in the absence of classifier-specific assumptions. CIFA ranks features by the amount of separation each feature induces between classes.
This approach is taken because the classifier-specific approach to feature analysis does not adequately address problems such as medical image analysis and diagnosis. Classifier-independent feature analysis is especially critical when the goal is assisted classification rather than automated classification. Most of the classification work to date has focused on automated classification, i.e., replacing a human expert. Automated classification is usually intended for applications such as assembly-line quality control, in which an automated classifier can be used to identify defective (or potentially defective) items. The role of a diagnostic system, however, should be to assist rather than replace the diagnostician. An example of a system for assisted diagnosis would be the use of image enhancement techniques to aid in medical imaging.
Feature analysis for problems such as diagnosis must necessarily be based on classification, since the goal is still to classify objects. Assumptions specific to a particular automated classifier should not be made, because the diagnostician serves as the classifier in such a system, and the classification information can be captured only via a learning sample, i.e., a set of correctly classified objects.
[Figure 5.1: The flow of feature analysis. A candidate feature set and a learning sample feed classifier-independent feature analysis, which yields the nonparametric discriminatory power of the features; that may be the end result (e.g., in medical diagnosis) or may guide the choice of a classifier. Classifier-specific feature analysis, data collection and measurement, feature extraction, and assignment of class labels follow in the production classifier system, ending with evaluation of classifier performance.]
A new metric for nonparametric discriminatory power has been developed, called relative feature importance (RFI). RFI uses nonparametric distribution estimation to avoid classifier-specific assumptions. RFI ranks features by their relative contribution to the potential for separation between class-conditional joint feature distributions. The fundamental assumption underlying RFI is thus that proximity in feature space can be used to determine class membership. Note that this assumption does not mean that proximity in feature space can be used as given to determine class membership, but rather that it is possible to extract information from the given features that can then be used to determine class membership based on proximity.
Only those features within an optimal subset are ranked, to eliminate noise
and redundant features. The rankings within the optimal subset take into account
interactions between features, since the features are not assumed to be independent.
RFI assigns features outside the optimal subset a discriminatory power of zero.
Directly calculating RFI requires several forms of estimation: estimating the shape of the class-conditional joint and marginal distributions, and estimating the contributions of the initial set of features to the separation between the classes. Accuracy of estimation is balanced against minimizing the assumptions required to calculate the metric. RFI is always based on the within-class and between-class scatter matrices of the learning sample; there are alternatives, however, for the distance measures, weighting algorithms, separation criteria, and scatter matrix formats [36].
5.4.3.3
The idea of the nonparametric scatter matrices used by RFI to measure the separation between classes is based on an extension of the nonparametric local mean of Fukunaga and Mantock [37], called the local out-of-class mixture mean. Though that work was limited to two classes, the local out-of-class mixture mean is used here to permit extrapolation of their technique to multiclass problems.
5.4.3.4
While the nonparametric scatter matrices used by RFI are as proposed in [37], with the exception of the local out-of-class mixture mean, the algorithm used to calculate the matrices here is quite different. Fukunaga and Mantock calculated within-class scatter first, whitened the data, and went on to calculate between-class scatter using the whitened data. While this technique is theoretically sound for parametric scatter matrices, the calculations do not hold for nonparametric scatter. RFI calculates both within-class scatter and between-class scatter using the original data, leading to a successful measure of separation.
The potential for separation between the classes present in a subset of features is measured via nonparametric discriminant analysis using the local out-of-class mixture mean and the calculation technique mentioned above. This first step in calculating RFI is useful as a standalone classifier-independent feature extraction technique. Features are processed and extracted in assisted classification as well as in automated classification; almost all measurements are quantized or processed to some degree. As a final step in assisted classification, reducing the number of features presented to the human expert can be helpful.
The algorithm used to estimate the contribution of each original feature to the potential separation between the class-conditional joint feature distributions, the Weighted Absolute Weight Size (WAWS), derives from the work of Mucciardi and Gose [38]. Given the eigenvectors and eigenvalues used in discriminant analysis (nonparametric or parametric), WAWS can be used to rank features within any given set of features, not just the optimal subset of those features.
At the heart of classifier-independent feature analysis is the calculation of nonparametric discriminatory power. The metric relative feature importance (RFI) combines the classifier-independent feature extraction algorithm with WAWS to rank features by their nonparametric discriminatory power.
5.4.5
The goal of classifier-independent feature analysis for classification is to measure the usefulness of the features in the candidate feature set. Nonetheless, classification performance on the learning sample cannot by itself serve as a basis for analyzing the features, for several reasons. First, as noted above, it has been shown that, in the general case, features that optimize classification performance for one classifier may perform poorly in another classifier [33]. Indeed, because one use of classifier-independent feature analysis is to guide the choice of automated classifier, classification performance is not a good measure, since it would require a search through some candidate set of automated classifiers. More fundamentally, classifier-independent feature analysis tries to measure the potential of the features in the candidate feature set for discrimination between classes, a potential that may not be realizable in practice.
Once classification performance has been eliminated as a measure of usefulness, what remains is the separability between the classes. Separability is not subject to the theoretical constraints of classification performance. When expressed as the Bayes error, the separation between class-conditional joint feature distributions places a lower bound on classification error that is classifier-independent. Unfortunately, the Bayes error is not calculable for many problems. Nonetheless, separation between class-conditional joint feature distributions gives rise to the potential for classification. Issues of calculation aside, classifier-independent feature analysis uses separability between classes as the basis for the usefulness of a feature.
Traditional feature analysis techniques have either searched for the optimal subset of features (generally called feature selection) or measured the usefulness of the features independently of one another (generally called feature ranking). A principal difficulty with the traditional approach is that the information given by feature selection and feature ranking algorithms can be contradictory [33].
In measuring discriminatory power, the goal is to measure the relative usefulness of each feature within a subset (whether the whole candidate feature set or some proper subset of it), given the use of the other features. Figure 5.2 illustrates a series of related problems in which the relative usefulness (separability) of the individual features changes. Note that while these problems can be solved using the marginal distributions of the features, problems that cannot be solved using the marginals can be constructed; Fig. 5.3 illustrates such a problem. Because of the difficulty of determining the correct ranks for the features, however, problems such as that in Fig. 5.3 are not useful in designing and comparing metrics for nonparametric discriminatory power.
Discriminatory power aims at revealing the underlying structure of the data, rather than simply optimizing classifier performance. A researcher in medical diagnosis, knowing that features based on blood tests generally rank high in diagnosing a certain condition, may want to focus limited resources on searching for new, better features based on blood tests. Such decisions are currently made on the basis of human interpretation of low-dimensional projections and mapping techniques [39]. Measuring discriminatory power provides a powerful tool for managing high-dimensional feature spaces.
5.4.5.2
Classification error is fundamentally a function of the separation between class-conditional joint feature distributions. Increasing separation provides the opportunity to improve performance. A classifier can realize this opportunity only if its specific structure fully exploits the increased separation, which makes classifier-specific discriminatory power dependent on that structure. For example, a Bayes linear classifier will not be able to exploit the difference in discriminatory power
[Figure 5.2(a) and (b): scatter plots of feature 1 versus feature 2 for classes 1 and 2; each cluster contains 300 samples.]
Figure 5.2: A problem with two features with multicluster uniform distributions. (a) Classes 1 and 2 are completely separable under feature 1. Feature 2 is redundant, because it does not contribute any further discriminatory information to the feature set. (b) The center clusters are ambiguous under feature 1; therefore, perfect discrimination between classes is no longer possible. Feature 2 still does not contribute any information to the feature set not already captured by feature 1, and thus remains redundant. (Continued on next page.)
[Figure 5.2(c) and (d): scatter plots of feature 1 versus feature 2; each cluster contains 300 samples, with center-cluster overlaps of 25%, 50%, and 75% as described in the caption.]
Figure 5.2: (Continued.) (c) The center clusters are ambiguous under both features, and
each feature adds some separability. Thus both features are useful, but feature 1 is more
useful (has higher nonparametric discriminatory power) than feature 2 since it induces more
separability. (d) The center clusters are ambiguous under both features, and each feature
adds some separability. Feature 2 now provides more separability than feature 1. The
discriminatory power of the features is a function of the relative percentage overlap of the
center cluster for each feature.
[Figure 5.3: scatter plot of feature 2 versus feature 3 for the two clusters of class 1 and the two clusters of class 2.]
Figure 5.3: A relatively simple problem that cannot be solved using methods based on the marginal distributions of the features. There are two classes of two clusters each, with three features (feature 1, not shown, is noise, N(0, 1), for all four clusters). Features 2 and 3 are Gaussian within each cluster, with means as shown. Feature 1 should be discarded. When features 2 and 3 have equal variance, they are equally useful; when they have different variances, their usefulness differs.
between features 1 and 2 in Fig. 5.3. In contrast, the nonparametric discriminatory power of a feature is defined as the amount it contributes to the potential for separation between the class-conditional joint feature distributions.
5.4.6
In practice, classifier-independent feature analysis has a number of applications. The first step is to identify a set of candidate features. A learning sample is collected using those features, and the nonparametric discriminatory power of the features is measured. In some applications, the discriminatory power is the desired end result (e.g., the use of focus-group information in product development). New features may be generated iteratively on the basis of the discriminatory power of the old features. In applications requiring an automatic classifier, once the theoretical lower limit on classification error implied by the separation between class-conditional joint feature distributions is reduced to an acceptable point, a classifier that best exploits the useful features is chosen. The features may be further processed using classifier-specific feature selection and extraction techniques. A trial application of the production classifier system is implemented, and classification performance is estimated. If the performance of the system is not satisfactory, a different classifier may be tried, or the whole process may iterate.
Because classifier-independent feature analysis makes no classifier-specific assumptions, problems with a wide range of characteristics should be considered. Features may have mixed distributions and multiple clusters. Noise features (features that contribute no classification information) and redundant features (features whose classification information is already provided by other features) may be present.
Discriminant analysis can be used to extract features that maximize the ratio of the separation between classes to the spread within classes, as measured by the between-class and within-class scatter matrices. Within-class scatter is a measure of the scatter of a class relative to its own mean. Between-class scatter is a measure of the distance from each class to the mean(s) of the other classes. Within-class and between-class scatter can be defined parametrically or nonparametrically. Parametric scatter matrices use the learning sample to estimate the distributions of the features through estimation of parameters for an assumed distributional structure. Nonparametric scatter matrices use the learning sample to perform local density estimation around individual samples, and then measure scatter using the local density estimates.
Parametric scatter matrices The parametric versions of the within-class and between-class scatter matrices estimate the means of the classes from the entire learning sample, under the assumption that a distribution can be characterized by its mean and covariance. Let P_i be the a priori probability of class ω_i, C_i the covariance matrix and M_i the mean of class ω_i, and L the total number of classes; let M_0 = Σ_i P_i M_i be the mixture mean. The parametric within-class scatter matrix is
S_w = Σ_{i=1..L} P_i C_i,
and the parametric between-class scatter matrix is
S_b = Σ_{i=1..L} P_i (M_i − M_0)(M_i − M_0)^T.
The components of the between-class scatter matrix are estimated using the learning sample in the same manner as those of the within-class scatter matrix.
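The parametric scatter matrices can be estimated from a learning sample as follows (a sketch; function names are mine, priors P_i are taken as class frequencies, and the biased covariance estimate is used so that S_w + S_b equals the total mixture scatter):

```python
import numpy as np

def parametric_scatter(X, y):
    """Parametric within-class (Sw) and between-class (Sb) scatter matrices.

    X: (n_samples, n_features) learning sample; y: class labels."""
    classes = np.unique(y)
    n, d = X.shape
    m0 = X.mean(axis=0)                   # mixture mean M_0
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in classes:
        Xc = X[y == c]
        P = len(Xc) / n                   # prior P_i as class frequency
        Sw += P * np.cov(Xc, rowvar=False, bias=True)
        diff = (Xc.mean(axis=0) - m0)[:, None]
        Sb += P * diff @ diff.T
    return Sw, Sb

# Sanity check: Sw + Sb equals the total scatter of the whole sample.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = np.repeat([0, 1, 2], [10, 12, 8])
Sw, Sb = parametric_scatter(X, y)
```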
Nonparametric scatter matrices The nonparametric versions replace the class means with local means computed from the k nearest neighbors of each sample. The local mean of a sample is the average of its k nearest neighbors from its own class; the local out-of-class mixture mean is the average of its k nearest neighbors drawn from all the other classes combined. Nonparametric within-class scatter is measured as the scatter of the samples around their local means. When k = N, the local mean reduces to the parametric mean, and therefore the nonparametric within-class scatter matrix reduces to the parametric version. Nonparametric between-class scatter is measured as the scatter around the local out-of-class mixture means. The between-class nonparametric scatter matrix does not reduce to its parametric form as the within-class matrix does, because the out-of-class mixture means necessarily exclude same-class samples, but the relationship is close when k = N.
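The two local means can be sketched directly (an illustrative NumPy implementation; function and variable names are mine, not the chapter's):

```python
import numpy as np

def local_means(X, y, i, k):
    """Local k-NN mean (own class, excluding sample i itself) and local
    out-of-class mixture mean (k nearest neighbors among all other
    classes combined) for sample i."""
    d = np.linalg.norm(X - X[i], axis=1)
    idx = np.arange(len(y))
    same = idx[(y == y[i]) & (idx != i)]
    other = idx[y != y[i]]
    nn_same = same[np.argsort(d[same])[:k]]
    nn_other = other[np.argsort(d[other])[:k]]
    return X[nn_same].mean(axis=0), X[nn_other].mean(axis=0)

X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [12.0]])
y = np.array([0, 0, 0, 1, 1, 1])
own_mean, mixture_mean = local_means(X, y, i=0, k=2)
# own_mean -> 1.5 (average of 1 and 2)
# mixture_mean -> 10.5 (average of 10 and 11)
```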
Choosing a distance metric The use of k-nearest-neighbor local density estimates introduces the need to choose a distance metric for determining the distance between a sample and its neighbors. Many distance measures have been proposed for use with kNN error estimation [43]. Two commonly used metrics are the Euclidean distance,
d(x, y) = [(x − y)^T (x − y)]^{1/2},
and the Mahalanobis distance,
d(x, y) = [(x − y)^T C^{-1} (x − y)]^{1/2},
where C is a covariance matrix. Fukunaga and Mantock used the Euclidean distance in their original work. The Mahalanobis distance should also be considered (especially with Fukunaga and Mantock's original algorithm), since it incorporates information concerning the relative variances of the features. Both metrics are considered as candidates in the design of RFI.
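The two metrics can be sketched in a few lines; with the identity covariance they coincide, and unequal feature variances are what set them apart:

```python
import numpy as np

def euclidean(x, y):
    return float(np.sqrt((x - y) @ (x - y)))

def mahalanobis(x, y, C):
    """Mahalanobis distance with covariance matrix C."""
    diff = x - y
    return float(np.sqrt(diff @ np.linalg.inv(C) @ diff))

x = np.array([1.0, 2.0])
y = np.array([4.0, 6.0])
d_e = euclidean(x, y)                      # 5.0
d_m = mahalanobis(x, y, np.eye(2))         # 5.0: identical for identity C
# A larger variance in feature 1 shrinks that feature's contribution:
d_m2 = mahalanobis(x, y, np.diag([4.0, 1.0]))
```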
Weighting factor Each sample's contribution to the nonparametric between-class scatter is weighted according to how close the sample lies to the class boundary. Following Fukunaga and Mantock, the weight is a function of the distances from the sample to its kth nearest neighbor in its own class and in the other classes: samples near the boundary, where those distances are comparable, receive weights near one half, while samples far from the boundary receive weights near zero.
Unlike Fukunaga and Mantock's algorithm, RFI always calculates between-class scatter using the original data. Fukunaga and Mantock whiten the data first, using the eigenvalues and eigenvectors of S_w, and then calculate between-class scatter. For parametric discriminant analysis, calculating between-class scatter using the original data and calculating it using whitened data are equivalent: the whitening transform carries through the parametric scatter matrices, so the resulting discriminant features are the same. For nonparametric between-class scatter (even without weights), however, the local k-nearest-neighbor means depend on the space in which distances are measured, so calculating in the whitened space rather than in the original space changes the results.
Thus, for both the parametric and the nonparametric forms, the eigenvectors form the linear transform that maximizes J, the ratio of between-class to within-class scatter. The extracted features are optimal in the sense that they maximize the separation between the class-conditional joint feature distributions in the rotated space. Using the nonparametric scatter matrices, the extraction is based on local density estimation; the results are thus a compromise among the information provided by the various clusters or regions belonging to a class.
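The extraction step itself reduces to an eigenproblem on the scatter matrices. A sketch (Sw and Sb denote the within-class and between-class scatter matrices; np.linalg.eig is used because their product is not symmetric in general):

```python
import numpy as np

def extract(Sw, Sb):
    """Eigenvalues and eigenvectors of inv(Sw) @ Sb, ordered by
    decreasing eigenvalue. Projecting the data onto the leading
    eigenvectors maximizes the ratio of between-class to
    within-class scatter."""
    vals, vecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(vals.real)[::-1]
    return vals.real[order], vecs.real[:, order]

# Toy example: with whitened within-class scatter, the first original
# feature carries almost all of the between-class separation, so the
# leading eigenvector aligns with it.
Sw = np.eye(2)
Sb = np.diag([4.0, 0.1])
vals, vecs = extract(Sw, Sb)
```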
5.4.7.6
Although developed in the context of a classifier-independent approach to feature analysis, the feature extraction algorithm described in this section is useful and valuable in its own right. When used as an extraction technique, the eigenvectors corresponding to the lowest eigenvalues are dropped, reducing computational cost and potentially increasing classification performance, depending on the classifier chosen. As illustrated in Fig. 5.1, a second round of classifier-specific feature optimization may still be desirable once a classifier is chosen.
5.4.7.7
Invariance
Because nonparametric discriminatory power measures the potential of the features for inducing separability between classes, it is desirable that measures of nonparametric discriminatory power be invariant with regard to rotation, scaling, and shift of the features. Rotational and shift invariance eliminate the impact of irrelevant details of the measurement method for the features. Scale invariance eliminates the need for normalization of the features while preserving the critical information of the ratio of between-class to within-class scatter. The invariance considered here is in feature space, not in image space. And, although we seek image features that are themselves invariant to changes in the image, it is equally valuable that they maintain separability of image classes even when modifications are made in feature space.
RFI is a function of the eigenvalues and eigenvectors of the parametric and nonparametric scatter matrices. While the nonparametric scatter matrices are not as well understood as the parametric ones, the nonparametric forms are still symmetric; therefore, functions of their eigenvectors and eigenvalues retain the same properties for both. Rotational invariance results from the extraction technique: since the optimal features are extracted from the original features, rotation in the original feature space has no impact. Scale invariance results from the use of the ratio of between-class to within-class scatter: since both are equally affected by scaling a feature, the ratio removes the effects of scaling. Shift invariance results from the use of scatter around the means; the technique is therefore self-centering.
All three forms of invariance reduce to the issue of preserving class separability, which is invariant under any nonsingular transformation (including rotation,
scaling, and shift) [44]. Those transformations affect separability in the individual features (i.e., in the marginal feature distributions), but not between the classes
themselves. Thus, so long as none of the extracted features is discarded, RFI is
invariant.
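The invariance can be checked numerically: the eigenvalues of S_w^{-1} S_b (the quantities underlying the separation criterion) are unchanged by any nonsingular linear transform A of the features, since the transformed product A^{-T} S_w^{-1} S_b A^T is similar to the original. A sketch with arbitrary (assumed, randomly generated) scatter matrices:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(3, 3))
Sw = M @ M.T + np.eye(3)          # a positive-definite within-class scatter
N = rng.normal(size=(3, 3))
Sb = N @ N.T                      # a between-class scatter

A = np.array([[2.0, 1.0, 0.0],    # an arbitrary nonsingular transform
              [0.0, 1.0, 1.0],    # (rotation, scaling, and shear combined)
              [1.0, 0.0, 3.0]])

def criterion_eigs(Sw, Sb):
    """Sorted eigenvalues of inv(Sw) @ Sb."""
    return np.sort(np.linalg.eigvals(np.linalg.inv(Sw) @ Sb).real)

before = criterion_eigs(Sw, Sb)
after = criterion_eigs(A @ Sw @ A.T, A @ Sb @ A.T)
# before and after agree: the separation criterion is transform-invariant
```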
5.4.8
The next step in calculating RFI is to find the optimal subset of the original features. Finding the optimal subset is necessary because rankings are meaningful only within such a subset.
Given the presence of redundant features, more than one subset of the same size
may produce the same amount of separation. When two or more smallest subsets
produce the same amount of separation, and that separation is the maximum separation found, then more than one optimal subset exists. The presence of more than
one optimal subset is not a problem; in both assisted and automatic classification,
it offers more options in the design of the classification system.
5.4.8.2
The criteria commonly used in parametric discriminant analysis to find the optimal subset of features are not generally applicable in the nonparametric case. Criteria such as the trace of the ratio of the between-class to within-class scatter matrices are based on the same simplifying assumptions as the parametric scatter matrices. The trace, when calculated on parametric scatter matrices, is monotonic as a function of subset size, reflecting the theoretical assumption that the Bayes error also decreases monotonically as a function of subset size.
Under conditions of limited sample size, however, the monotonicity assumption does not hold even for well-behaved data sets with unimodal Gaussian distributions, if the true distributions are not known and must be estimated. As the number of features increases for a fixed sample size, so does the error in the estimation. A second concern is the cost of including each feature, in computer time, in complexity, and sometimes in degree of invasiveness or risk, as can be the case in medical diagnosis. In practice, whether for automatic classification or assisted classification, having more features is not always better.
A nonparametric approach is to select the optimal subset based on the k-nearest-neighbor
(kNN) error in the extracted space. The kNN error does not introduce any new
classifier-specific assumptions. Moreover, because kNN error is asymptotically at
most twice the Bayes error, calculating kNN error in the extracted space estimates
the theoretical lower limit on the potential classification error [45]. Using kNN
introduces a new parameter (k, the number of nearest neighbors used to calculate
it). Fortunately, as discussed above, the behavior of kNN is well understood. Note that
while the value of k used to calculate the kNN error does not need to be the same
as the value of k used to calculate the local out-of-class mixture mean, the experiments
presented in this research all have both k set to the same value. Thus the
kNN error e is defined as

e = (1/N) * sum over classes c of e_c,

where the summation is over all classes, N is the number of samples, and e_c is
defined as the number of misclassified samples from class c
based on the voting k-nearest-neighbor procedure, using the transformed data. This
formulation allows for the introduction of a cost factor if errors in all classes are
not of equal consequence.
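A minimal sketch of this error estimate, assuming Euclidean distance and leave-one-out voting (the helper name and these implementation details are illustrative, not taken from the chapter):

```python
import numpy as np

def knn_error(X, y, k=3):
    """Voting kNN error: e = (1/N) * sum over classes of e_c, where e_c
    counts the misclassified samples of class c. Each sample is
    classified by a majority vote of its k nearest neighbors, with the
    sample itself excluded (leave-one-out)."""
    N = len(X)
    errors = 0
    # pairwise squared Euclidean distances
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(D, np.inf)      # a sample cannot be its own neighbor
    for i in range(N):
        nearest = np.argsort(D[i])[:k]
        votes = np.bincount(y[nearest])
        if votes.argmax() != y[i]:
            errors += 1
    return errors / N
```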
5.4.8.4
Exhaustive search
Finding the optimal subset requires exhaustive search, since any non-exhaustive
technique can do arbitrarily poorly in the general case [46]. The assumption
of monotonicity, necessary for branch-and-bound algorithms to guarantee performance,
is extremely restrictive and rarely justified in real problems [47]. Whenever
possible, exhaustive search should be done. For the purposes of evaluating different
configurations of RFI, or for comparing estimators of nonparametric discriminatory
power, exhaustive search is required. When applying RFI directly to real
problems that are too large for exhaustive search, suboptimal techniques
must be used. Both genetic algorithms and floating search offer promising paths
for suboptimal search [48]. The estimator as first proposed [49] uses a genetic
algorithm.
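The exhaustive enumeration itself is straightforward; the sketch below assumes a caller-supplied scoring function (such as an estimated kNN error) and is not RFI's actual implementation:

```python
from itertools import combinations

def exhaustive_subset_search(features, subset_size, score):
    """Evaluate every subset of the given size with the caller-supplied
    `score` callable (lower is better, e.g. an estimated kNN error) and
    return the best subset found. The optimality guarantee comes only
    from visiting all C(n, k) subsets."""
    best_subset, best_score = None, float("inf")
    for subset in combinations(features, subset_size):
        s = score(subset)
        if s < best_score:
            best_subset, best_score = subset, s
    return best_subset, best_score
```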
5.4.9
Feature ranking
The final step of RFI is to rank the features within the optimal subset. Only the
portion of the learning sample contained within the optimal subset is used. If more
than one optimal subset exists, ranking is done separately for each optimal subset.
RFI ranks features based on the contribution of the original features to the separation in the rotated space.
5.4.9.1
Average Absolute Weight Size (AAWS)
A technique for ranking features using only the eigenvectors was proposed
by Mucciardi and Gose [38]. The technique, the Average Absolute Weight Size
(AAWS), averages the magnitudes of the eigenvectors to estimate the contribution
of the original features to separation. For the jth of d original features, where
w_ij is the weight for feature j in the ith extracted feature, AAWS_j is given by:

AAWS_j = (1/d) * sum_{i=1}^{d} |w_ij|.
Mucciardi and Gose also tried sorting the eigenvectors by eigenvalue, retaining
only those extracted features which accounted for some threshold amount of separation in the extracted space and recomputing AAWS using only the retained extracted features. Both techniques, however, use only the eigenvector information.
Thus, AAWS measures the contribution of each original feature to the extracted
space, rather than to separation in the extracted space. Eliminating the extracted
features which contribute the least separation in the extracted space includes separation information, but introduces a tuning parameter which must be set ad hoc.
5.4.9.2
Weighted Absolute Weight Size (WAWS)
The contribution of the original features to the separation in the extracted space
can be estimated without tuning parameters by using the Weighted Absolute Weight
Size (WAWS). WAWS uses the normalized eigenvalues to measure the contributions
of the original features to the extracted features by the proportion of separation
the extracted features contribute to separation in the extracted space. The
WAWS of feature j is:

WAWS_j = sum_{i=1}^{d} lambda_i * |w_ij|,

where lambda_i is the eigenvalue of the ith extracted feature normalized by the sum
of all the eigenvalues.
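Assuming a d x d matrix W whose entry W[i, j] is the weight of original feature j in the ith extracted feature (this array layout is an assumption), the two rankings can be sketched as:

```python
import numpy as np

def aaws(W):
    """Average Absolute Weight Size: average of |W[i, j]| over the
    extracted features i, for each original feature j."""
    return np.abs(W).mean(axis=0)

def waws(W, eigenvalues):
    """Weighted Absolute Weight Size: |W[i, j]| weighted by the
    normalized eigenvalue of extracted feature i, i.e. its share of the
    separation in the extracted space."""
    lam = np.asarray(eigenvalues, dtype=float)
    lam = lam / lam.sum()            # normalized eigenvalues
    return (lam[:, None] * np.abs(W)).sum(axis=0)
```

With an identity rotation and eigenvalues (3, 1), AAWS treats both original features equally while WAWS favors the feature aligned with the dominant eigenvalue.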
Deriving the ranks from the raw WAWS values: statistical model
Features with statistically distinct WAWS values are given different ranks. To
determine whether WAWS values are distinct, a randomized block analysis of variance
(ANOVA) is performed, and intervals are constructed around the differences between
treatment means using the multiple-comparisons formula. Each feature is
thus a treatment, and each data set (optimal subset only), a block.
The null hypothesis, that there is no difference between treatments, is tested
[36].
Features whose intervals around the differences from all other features exclude
zero are given distinct ranks. Groups of features in which some features have distinct WAWS
values, but others do not, are given a single rank. For example, if features 2 and 3
have distinct WAWS values, but feature 4 overlaps both features 2 and 3, all three
features are assigned a single rank. Features not in the optimal subset have rank
zero. Features (or groups of features) with distinct ranks are ranked based on their
treatment means, with the largest distinct treatment mean being assigned the highest
rank. Higher ranks indicate greater discriminatory power. Additional details of the
process appear in [36], along with the results of extensive experimentation.
5.5
Features in practice
5.5.1
Caveats
Much of this section is taken from recent work [50] that has made clear the very
real possibility of identifying tissue types reliably using the observed ultrasound
radiofrequency (RF) signal (see also Chapter 9). A candidate set of features was
selected, based in part on knowledge of the scattering properties of various kinds
of tissue; the features were extracted from a set of labeled samples and evaluated
in several classifier structures. The specific objective was to provide, in data from
the liver, reliable discrimination between hepatitis and normal tissue; a secondary
goal was to reduce the dependence of the features on the imaging system used to
acquire the data.
This section introduces and describes several new parameters that represent
specific characteristics of ultrasound scattering from soft tissues. In the spirit of
Texture parameters
0.04) at the task of distinguishing normal from hepatitis livers. For the current
texture analysis approach, the ENT feature has been retained and the correlation
of the cooccurrence matrix (COR) feature has been added based on the indirect
relationship to physical parameters as reported by Thijssen et al. [53].
Using the transducer to define the relationship operator. The distances, Δx and Δy
(that define the relationship operator), to the neighbor pixel should be chosen to
reduce the effects that the transducer characteristics have on the extracted parameters.
At the same time the chosen values of Δx and Δy must produce parameters that
distinguish various textures. Previous researchers have used fixed values of Δx and
Δy. Kadah et al. [56] defined the neighbor as Δx = 4 and Δy = 0. Raeth et al. [57]
defined a set of neighbors using all combinations of Δx = 2, 3, and 4 and Δy = 2, 3,
and 4. Mia et al. [55] defined the fixed neighbor distance as Δx = 10 and Δy = 3.
The work by Valckx and Thijssen [58] provides some guidance on how to
choose an effective neighbor distance. They found that two factors affected the
optimal choice of the displacement used in computing the cooccurrence matrix
parameters. As the distance separating the neighbor pixels decreased, the features
provided better discrimination between various textures. This suggests that a small
displacement would be optimal. The texture parameters, however, are very sensitive to the characteristics of the transducer at small displacements. The parameters
tend to become independent of those factors when the displacement is greater than
a resolution cell size (6 dB width) of the transducer. This suggests that a large
displacement would be optimal. Given these competing requirements, Valckx and
Thijssen [58] suggest that the optimal displacement would be the smallest displacement at which the parameters are independent of transducer effects.
Following this guideline, the definition of neighbor used in this work was Δx =
resolution cell size in the axial direction and Δy = resolution cell size in the lateral
direction. The resolution cell size of the transducer is measured by computing the
autocorrelation function of the envelope-detected signal. The maximum of this
function occurs at the zero-lag point. The full-width at half-max (FWHM) distance
of the autocorrelation function is used as the estimate of the transducer resolution
cell size. This is illustrated in Fig. 5.4. The autocorrelation function is computed
at various depths to account for changes in the resolution cell size due to diffraction
effects. The resulting values of Δx and Δy at each depth are used to compute the
entries in the co-occurrence matrix associated with pixel pairs at that depth.
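The FWHM-of-autocorrelation measurement can be sketched as follows. This is a simplified, single-depth version; the sample spacing argument and the absence of windowing or mean removal are assumptions, not details from the chapter.

```python
import numpy as np

def resolution_cell_fwhm(envelope, dx=1.0):
    """Estimate the resolution cell size as the full-width at half-max
    of the autocorrelation of an envelope-detected signal; `dx` is the
    sample spacing. The autocorrelation peaks at zero lag, and the
    width of the region above half that maximum is returned."""
    ac = np.correlate(envelope, envelope, mode="full")
    ac = ac / ac.max()                       # zero-lag peak -> 1
    above = np.flatnonzero(ac >= 0.5)        # samples above half maximum
    return (above[-1] - above[0]) * dx
```

For a Gaussian envelope of standard deviation sigma, the autocorrelation is Gaussian with standard deviation sigma * sqrt(2), so the returned width is about 2.355 * sigma * sqrt(2).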
Simulation data described in [50] were used to validate the hypothesis that using
the transducer resolution cell size as the definition of neighbor when computing
the co-occurrence matrix improves the reproducibility of the texture features ENT
and COR.
5.5.2.2
Cepstral parameters
Previous work by Fellingham and Sommer [59] and Suzuki et al. [60] suggests
that the mean scatterer spacing is a feature of the scattering media that may be
useful in distinguishing diffuse diseases. The cepstrum (specifically, the power
cepstrum) has been used by Suzuki et al. [60], Wear et al. [61], and Kadah et al. [56]
to detect regularly spaced echoes in the RF signal. Those echoes were attributed to
regularly spaced scatterers in the tissue. Thus, the spacing (in time) of those echoes
in the RF signal is related to the spacing (in distance) of scatterers in the tissue.
The power cepstrum, first developed by Bogert et al. [62], is defined as the
inverse Fourier transform of the logarithm of the magnitude spectrum:

c(n) = F^{-1}{ log |X(ω)| },  where X(ω) = F{x(n)},   (5.1)

where F{·} is the Fourier transform operator and F^{-1}{·} is the inverse Fourier
transform operator. If x(n) is a real sequence then log |X(ω)| is a real even sequence,
and c(n) is a real sequence. So the power cepstrum is a real sequence when the
input is a real sequence (the term "power" refers to the fact that the logarithm is taken of
the power spectrum). When the magnitude of X(ω) is found prior to computing the
logarithm, the phase information in the original signal is discarded. A consequence
of discarding the phase is that the power cepstrum is not invertible. Details of the
calculation are contained in [50].
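A minimal FFT-based sketch of Eq. (5.1); the `eps` guard and the discrete FFT formulation are implementation assumptions:

```python
import numpy as np

def power_cepstrum(x, eps=1e-12):
    """Power cepstrum: inverse Fourier transform of the log-magnitude
    spectrum. The phase of X is discarded, so the transform is not
    invertible; `eps` guards against log(0)."""
    X = np.fft.fft(x)
    return np.real(np.fft.ifft(np.log(np.abs(X) + eps)))
```

An echo of the signal at a delay d produces a cepstral peak at quefrency d, which is how regularly spaced reflections show up.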
One drawback of the power cepstrumbased approach to estimating scatterer
spacing is that it disregards any information that is contained in the phase of the
signal. The success of Varghese and Donohue [63, 64] in estimating scatterer spacing using the autocorrelation of the frequency spectrum, which does utilize the
phase information, suggests that there is some advantage to retaining the phase
information when processing the RF signal.
Mia et al. [65] suggested using the complex cepstrum to identify periodic scattering.
The complex cepstrum is an approach in which both the magnitude and
the phase of the time-domain signal are considered.
If the input signal x(n) is real, then the magnitude of the Fourier transform, |X(ω)|
[and log |X(ω)|], is even and the phase, θ(ω), is odd. So the complex cepstrum, x̂(n), is
real. The term "complex" refers to the fact that the logarithm is performed
on a complex value. An advantage of performing the complex logarithm is that
the phase information of the original signal is retained and the complex cepstrum,
x̂(n), is invertible.
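A sketch of the complex cepstrum using phase unwrapping to keep the complex logarithm continuous. Linear-phase removal, which practical implementations also perform, is omitted here; the `eps` guard is an assumption.

```python
import numpy as np

def complex_cepstrum(x, eps=1e-12):
    """Complex cepstrum: inverse FFT of the complex logarithm of the
    spectrum. The unwrapped phase is retained, so (unlike the power
    cepstrum) the transform is invertible."""
    X = np.fft.fft(x)
    log_X = np.log(np.abs(X) + eps) + 1j * np.unwrap(np.angle(X))
    return np.real(np.fft.ifft(log_X))
```

For a minimum-phase echo pair, x = δ(n) + a·δ(n − d) with |a| < 1, the series expansion of log(1 + a e^{-jωd}) predicts cepstral values a at quefrency d and −a²/2 at 2d.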
While the presence of regularly spaced scatterers is a sufficient condition to
cause peaks in the cepstrum, it is not a necessary condition. Kuc et al. [67] showed
that various distributions of scatterer spacing can result in peaks in the cepstrum.
Even when regularly spaced scatterers are present, they are expected to be relatively
weak, and thus, not easily detected. This detection can be improved by signal
averaging.
Consider the RF signal from a single A-line scan as shown in Fig. 5.5. The cepstrum
of this signal is shown in Fig. 5.6. The periodic component of the scattering
is not evident in this figure. But when N of these cepstra, from adjacent A-lines
within an ROI, are summed, the signal due to the periodic scatterers increases
by a factor of N while those due to the random scatterers increase by a factor of √N.
This provides a gain of √N for the periodic component. The averaged cepstrum
(N = 27) is shown in Fig. 5.7. The effect of the periodic scatterers is clearly visible
as a peak in the cepstrum.
Simulation and phantom data [50] were used to quantify the performance of the
complexcepstrum approach to scatterer spacing estimation. The simulation data
can be controlled well and the actual values are known precisely. The phantom
data studied by Wear et al. [61] were used to provide a more realistic test.
To compare the relative effectiveness of the power and complex cepstra at the
task of estimating scatterer spacing, an objective performance measure is needed
to judge the effectiveness of a given approach. The task is to detect the main peak
(harmonics are also present) in the cepstrum shown in Fig. 5.7. The detectability
of those peaks under various conditions will serve as a measure of effectiveness for
each approach. One objective measure of the detectability of a peak in the presence
of noise is the number of standard deviations separating the peak from the mean
value. This is the basis of the constant false alarm rate (CFAR) detection technique,
as described by Skolnik [68], that is commonly used in radar applications. The
objective measure of performance is the signal excess (SE = [peak - mean]/standard
deviation) in units of standard deviations. This is illustrated in Fig. 5.8.
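The signal-excess measure itself is a one-liner; the search region over which the mean and standard deviation are taken is assumed to be supplied by the caller:

```python
import numpy as np

def signal_excess(cepstrum, search_region):
    """SE = (peak - mean) / standard deviation, in units of standard
    deviations, computed over the searched portion of the cepstrum."""
    region = np.asarray(cepstrum)[search_region]
    return (region.max() - region.mean()) / region.std()
```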
Several parameters that are not directly influenced by characteristics of the
transducer are extracted from the complex cepstrum. The first parameter is the
weighted average of the peaks.

Figure 5.5: The RF signal of a single A-line from within an ROI of a liver scan.

It is more likely that components of biological tissues, such as the portal triads
in the liver, have a range or several ranges of scatterer
spacing rather than a single dominant spacing. For this reason, the cepstrum is
searched in the range 0.75 to 2.5 mm for all peaks that have signal excess greater
than 1.5 standard deviations. The magnitude-weighted average of all such peak
locations is a cepstral parameter called PCEP. The average magnitude of all such
peaks, normalized by the mean of the cepstrum, is a second cepstral parameter
called MCEP. The final parameter extracted from the complex cepstrum is the ratio
of the energy in the low-quefrency portion of the cepstrum to the entire cepstrum.
This parameter, called RCEP, is a measure of the proportion of the backscattered
power that is due to unresolvable scatterers. The low quefrency portion of the
cepstrum represents reflections that are close together spatially, and thus are not
resolvable by the imaging system.
5.5.2.3
Phase coherence
Figure 5.6: The complex cepstrum of the RF signal of the single A-line scan shown in
Fig. 5.5.
In the case of purely diffuse scattering, there are a large number of scatterers
within the resolution cell of the transducer. The resulting RF signal is the accumulation of all the random reflections from within a resolution cell. When the number
of scatterers per resolution cell is sufficiently large and the phases of the individual
reflections are randomly distributed between -π and π, the phase of the resulting
(accumulated) signal is uniformly distributed between -π and π. If, however, there
is structure in the scattering medium, then the reflections from those sites will have
some nonrandom phase relationship. The presence of long-range order results in
some phases occurring more frequently than others do and will be evident in a
histogram of phase.
Structures with long-range order will result in coherent scattering of certain frequency
components of the ultrasound pulse. The wavelength, λ, of each frequency
component, f, is related to the speed of sound, c, in the tissue (λ = c/f), and structures that
are located at integral multiples of a wavelength will produce coherent scattering of
frequencies associated with that wavelength. The relative amount of coherent scattering present in the RF signal is characterized by analyzing the phase distribution
of the RF signal at various frequencies. This analysis can be performed only for
frequencies that are within the usable bandwidth of the transducer.
The first step in this process is to use a Hilbert transform to produce the analytic
signal (real and imaginary components) of the RF signal. The analytic signal then
is demodulated at the frequency of interest. This is accomplished by multiplying
the analytic signal by a complex phasor at the demodulation frequency. Next, the
Figure 5.7: The complex cepstrum averaged from the RF signal of 27 adjacent A-line scans.
Figure 5.8: Illustration of the computation of SE = [Peak - Mean]/Std Dev. The shaded region
in the image represents 1 Std Dev about the mean.
5.5.2.4
Figure 5.9: The normalized phase profile of a simulated incoherent signal resulting from
the presence of diffuse scattering components only.
a sparse co-occurrence matrix or phase profile may not provide valid measures.
Similarly, the small subregions do not provide enough data for the prediction filters
in the Wold decomposition. The point SNR (mean/standard deviation) of the
envelope signal of each subregion is used as a simple substitute for a texture measure.
Wagner et al. [52] showed that the point SNR takes on a value of 1.91 for
purely diffuse Rayleigh scattering and deviates from that value as other scattering
components are introduced.
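The point SNR and the quoted Rayleigh value can be checked numerically; the sampling setup below is illustrative:

```python
import numpy as np

def point_snr(envelope):
    """Point SNR: mean divided by standard deviation of the envelope."""
    e = np.asarray(envelope, dtype=float)
    return e.mean() / e.std()
```

For a Rayleigh-distributed envelope (purely diffuse scattering), the theoretical point SNR is sqrt(π / (4 − π)) ≈ 1.91, the figure quoted from Wagner et al. [52].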
The coefficient of variation of the three cepstral parameters and the point SNR
are computed to yield, respectively, the following four intrapatient variability features: VPCEP, VMCEP, VRCEP, and VSNR. The simulation experiments used to
analyze the parameters described previously in this chapter are not useful in evaluating these intrapatient variability features.
The new parameter extraction methods produced a set of 12 features that were
used in the analysis of the clinical data. Simulation experiments indicate that using
the resolution cell size of the transducer to define the distance to the neighbor pixel
in the co-occurrence matrix computation indeed does increase the reproducibility
of the texture features ENT and COR, as compared to using a fixed distance.
For the cepstral features, it was shown that peaks in the cepstrum, resulting from
repetitive structures in the RF signal, are more detectable, under a variety of signal
conditions, when the complex cepstrum is used instead of the power cepstrum.
Simulation [50] showed that the parameters extracted from the cepstral analysis
(PCEP, MCEP, and RCEP) exhibited different levels of correlation between two
Figure 5.10: The normalized phase profile of a simulated signal containing coherent and
incoherent components resulting from the presence of diffuse and periodic scattering components.
transducer settings (a useful first step in the quest for transducer independence);
PCEP was fairly highly correlated, RCEP was moderately correlated, and MCEP
was weakly correlated.
The phase profile was used to compute the two phase coherency features MPRF
and MVPRF. The RWLD feature was computed from the Wold decomposition of
the scattering field estimated via the deconvolution of the RF signal. The deconvolution of the RF signal was performed using an approach modified from the cepstral
mean subtraction technique commonly used in speech processing applications [50].
5.5.2.5
Results: feature selection
The parameters described above were extracted from the RF ultrasound signals
collected from each subject in the clinical data sets. The first step in evaluating
the parameters' performance in the task of distinguishing normal livers from those
with hepatitis is to determine which parameters to use in the several designs of
the classification system. The subset selection techniques described earlier (and
in [36]) were not available at the time that these experiments were performed, but,
in light of the Hughes phenomenon, it was nevertheless essential to choose subsets.
The clinical data sets used in this work were relatively small. The data set from
machine D includes 36 normals and 50 cases of hepatitis. The machine A data
set includes 37 normals and 19 cases of hepatitis.

Figure 5.11: The coefficient of variation of the phase profile, as a function of the demodulation frequency, for a simulated image containing diffuse scatterers only.

Figure 5.12: The coefficient of variation of the phase profile, as a function of the demodulation frequency, for a simulated image containing diffuse and periodic scattering components.

The feature combinations that provided good classification performance with data
set D and had reasonable correlation between the two imaging systems then were
evaluated on data set A to identify feature combinations that provided good classification
performance with both data sets.
5.5.2.6
Results: classifier performance
The parameters described above were extracted from all of the regions of interest (ROIs) or subregions associated with each subject in data set D. The values of
the first eight of the twelve parameters were averaged over the six ROIs available
for each subject to produce a single feature value for each of the eight parameters.
For the remaining four parameters, the coefficient of variation was computed of the
feature values obtained from all of the subregions associated with each patient. A
correlation matrix was computed for the set of twelve features.
Using a correlation value of 0.75 as the threshold for redundant features, there
are two pairs of features that meet that criterion. The ENT and COR features have
a correlation coefficient of 0.86; the COR and MPRF features have a correlation
coefficient of 0.90. One or more of those three features can be eliminated without
significant loss of information. The decision of which feature(s) to eliminate can
be aided by computing the Mahalanobis distance for each feature.
The Mahalanobis distance is a measure of the separation between the means
of a feature (normalized by the standard deviations) computed for the two classes.
Feature   Method                    Description
ENT       Texture                   Entropy of co-occurrence matrix
COR       Texture                   Correlation of co-occurrence matrix
PCEP      Cepstral                  Weighted average of locations of cepstral peaks
MCEP      Cepstral                  Weighted average of magnitudes of cepstral peaks
RCEP      Cepstral/Coherence        Ratio of low-to-high portion of cepstrum
MPRF      Phase coherence           Normalized magnitude of peak of phase profile
MVPRF     Phase coherence           Max of coefficient of variation of phase profile
RWLD      Coherence                 Ratio of predictable and random components of
                                    Wold decomposition of deconvolved RF signal
VPCEP     Intrapatient variability  Intrapatient variability of PCEP
VMCEP     Intrapatient variability  Intrapatient variability of MCEP
VRCEP     Intrapatient variability  Intrapatient variability of RCEP
VSNR      Intrapatient variability  Intrapatient variability of point SNR of envelope
While a low value does not necessarily mean a feature provides no separation between the two classes (separation may still be provided using a quadratic or other
more complex classifier), a high value is a good indication that the feature will
provide good separation. Of the three features identified as having high mutual intercorrelation, the MPRF feature has the largest Mahalanobis distance. This feature
is retained for further analysis. The feature COR, highly correlated with MPRF, is
eliminated from the analysis. The ENT feature, also correlated with COR, is retained for further analysis. This leaves eleven features from which to select feature
combinations that perform well.
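The chapter does not spell out the exact normalization of this per-feature distance; the sketch below uses one common univariate form, pooling the two class variances, so the function name and formula are assumptions:

```python
import numpy as np

def feature_separation(x1, x2):
    """Separation of one feature's class means normalized by the pooled
    standard deviation -- a univariate Mahalanobis-style distance for
    screening individual features."""
    pooled_var = (np.var(x1, ddof=1) + np.var(x2, ddof=1)) / 2.0
    return abs(np.mean(x1) - np.mean(x2)) / np.sqrt(pooled_var)
```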
The leave-one-out (design with N-1 samples and test with the Nth; perform N
times) and resubstitution (design and test with all samples) test methods [44] were
used to measure the classification performance of linear, quadratic, and k-nearest-neighbor
(kNN) classifiers at the task of discriminating between the normal and
Classifier   Best three-feature   Leave-one-out   Resubstitution
design       combination          performance     performance
Linear       MPRF, MVPRF, VRCEP   0.89 ± 0.03     0.91 ± 0.03
Quadratic    PCEP, MPRF, VRCEP    0.90 ± 0.03     0.92 ± 0.03
kNN          RCEP, MPRF, RWLD     0.93 ± 0.03     Not Applicable
hepatitis cases in data set D. An exhaustive search of the 11 single-feature, 55 two-feature,
and 165 three-feature combinations, resulting from the eleven remaining
candidate features, was performed to identify specific combinations that provided
good performance.
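The two test protocols can be sketched generically; the `fit`/`predict` callables stand in for any of the classifier designs, and nothing here is the chapter's actual implementation:

```python
import numpy as np

def loo_and_resub_accuracy(X, y, fit, predict):
    """Leave-one-out: design with N-1 samples, test the held-out sample,
    repeat N times. Resubstitution: design and test with all N samples.
    Returns (leave-one-out accuracy, resubstitution accuracy)."""
    N = len(X)
    loo_correct = 0
    for i in range(N):
        keep = np.arange(N) != i
        model = fit(X[keep], y[keep])
        loo_correct += int(predict(model, X[i:i + 1])[0] == y[i])
    model = fit(X, y)
    resub_correct = int((predict(model, X) == y).sum())
    return loo_correct / N, resub_correct / N
```

A nearest-class-mean classifier, for instance, can be plugged in by supplying `fit` that stores the class means and `predict` that assigns each sample to the closest mean.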
The classification performance as measured by the area under the receiver operating
characteristic (ROC) curve, Az, was computed for all the feature combinations
discussed above. (See Chapter 10 of this volume for a discussion of ROC.) A
threshold level was set at Az = 0.82 to identify feature combinations that yielded
good classification performance. No single feature provided classification performance
above this level. Almost half of the three-feature combinations (80 out of
165) provided classification performance of Az ≥ 0.82 using one or more of the
classifier designs. The best three-feature performance was dependent on the classifier
design that was used. The best performing three-feature combination for each
classifier design is listed in Table 5.5.2.6. The ROC curves resulting from the three
leave-one-out test conditions are shown in Fig. 5.13.
While the best-performing feature combinations selected above may be biased
due to the selection process,2 the fact that almost half the three-feature combinations
resulted in classification performance of Az ≥ 0.82 indicates that data set D
is indeed separable using a three-feature classifier.
Even though many of the three-feature combinations provided good classification
performance with data set D, to design classifiers that will work across imaging
2 Performing exhaustive search of all feature combinations results in selecting the optimal subset
but can also result in a selection bias as described by Raudys and Jain [72]. The bias results from the
fact that the measured classification performance of each feature combination is an estimate of the
actual performance of that feature combination. Each estimate deviates from the actual performance
by some estimation error that can be assumed to be Gaussian distributed without loss of generality.
Some estimates will underestimate the performance; others will overestimate the actual performance.
Figure 5.13: ROC curves for a linear (solid line), quadratic (dashed line), and kNN (dotted
line) classifier resulting from leave-one-out testing of the D data using the best three-feature
combination for each classifier design. There are no significant differences between the
three ROC curves.
systems using the limited data available required that the dimensionality be reduced
to two-feature combinations. This made it possible to identify feature combinations
that also provided good classification performance for the much smaller data
set A. Only 13 out of the possible 55 two-feature subsets produced classification
performance exceeding the threshold level of Az = 0.82 for one or more classifier
designs. Those were evaluated in the same way as the three-feature sets, yielding
the results shown in Table 5.5.2.6.
5.5.2.7
Conclusions
Classifier   Best two-feature   Leave-one-out   Resubstitution
design       combination        performance     performance
Linear       MPRF, MVPRF        0.88 ± 0.04     0.85 ± 0.04
Quadratic    PCEP, VRCEP        0.86 ± 0.04     0.89 ± 0.04
kNN          PCEP, RWLD         0.85 ± 0.04     Not Applicable
the matched data set, was quite poor. Some of the features did exhibit a moderate
level of correlation that was significantly higher than other features. None of the
features, however, was highly correlated across the two imaging systems, despite
the fact that the characteristics (center frequency and bandwidth) were similar for
the two transducers. Other characteristics of the transducers, though, were quite
different. It is not clear how much of the lack of correlation can be attributed to
differences in the imaging systems and how much to the difficulty in acquiring the
same regions of interest, in the same image plane, for successive in vivo ultrasound
scans in a clinical environment.
Despite the lack of good correlation between imaging systems, several of the
two-feature combinations provided reasonable classification performance for both
data sets. One feature combination in particular, RWLD and VRCEP, produced
very good classification performance, using a simple linear classifier design,
for both the D (Az = 0.86 ± 0.04) and A (Az = 0.86 ± 0.06) data sets. While that
feature combination provides good classification performance for both data sets,
it does require the classifier to be trained separately with those two features from
each data set.
All the classification analysis indicates that a simple linear classifier design performs
as well as or better than the more complex quadratic and kNN classifier
designs at the task of distinguishing normal livers from cases of hepatitis using features
extracted from ultrasound RF signals. No significant improvement is achieved
through the use of the more complex classifier designs. The features were the key
element, and the search continues for ultrasound-system-independent criteria.
5.5.3
Breast MRI
This section is based on recent work that has shown that fractal-based features,
properly defined, can add substantially to the characterization of shape and its
value in classification of breast masses [73]. The motivation again is the nature of
the tissue and what it suggests as basic descriptive information.
Figure 5.14: ROC curves resulting from leave-one-out testing of the D (solid line) and A
(dashed line) data sets using a two-feature (RWLD and VRCEP) linear classifier.
5.5.3.1
Background
5.5.3.2
Test data
Test data were regions of interest (ROIs) from 16-bit MR images of focal
masses of the breast. All images were acquired with a three-dimensional, fat-suppressed,
radio-frequency-spoiled gradient-echo sequence on a 1.5-T system. The
images used in this study were obtained during the first 90 seconds after the delivery
of a 20-ml bolus of a gadolinium-based contrast agent, gadopentetate dimeglumine. The resulting
images consist of 512x512x28 pixels and were obtained in the
sagittal plane from an acquisition matrix of size 512x256x32. For each test case,
we used a single, representative two-dimensional section. The field of view ranged
from 16 to 22 cm, and slice thicknesses ranged from 1.5 to 4.0 mm, depending on
the size of the breast. The ROIs were identified and defined to be rectangular, with
the focal mass approximately in the center of the rectangle. The smallest ROI was
24x27 pixels; the largest was 89x102 pixels.
The test cases, which included 20 benign and 32 malignant masses, were obtained
by essentially the same procedures as in the Nunes-Schnall study [81]. Forty-eight
cases were selected on the basis of the availability of recorded expert architectural
features and suitability of the ROI size for fractal analysis. Four other cases
were identified as having border characteristics that made them difficult to diagnose
(one case of a spiculated benign lesion, two cases of smooth cancers, one case
of a lobulated cancer) and were added to the test data. The border characteristics of
the masses in the study are given in Table 5.5.
5.5.3.3
Algorithms that are used to estimate the fractal dimension generally operate on
three-dimensional surfaces derived from gray-scale images. The (x, y) coordinates
correspond to the spatial location of each pixel; the (z) coordinate represents the
gray level at that location.
Table 5.5: Border characteristics of the masses in the study.

                   Smooth   Lobulated   Irregular   Spiculated
Malignant masses   2        2           13          15
Benign masses      4        11          4           1
Discrimination analysis
Feature            Descriptor      Numerical assignment
Border             Smooth          0
                   Lobulated       0.33
                   Irregular       0.66
                   Spiculated      1.00
Rim enhancement    Negative        0
                   Probable        0.5
                   Definite        1.0
Septation          Definite        0
                   Probable        0.5
                   Negative        1.0
Signal intensity   None            0
                   Minimum         0.33
                   Moderate        0.5
                   Marked          1.0
Density            Homogeneous     0
                   Heterogeneous   1.0
refers to the subset consisting of all five Nunes-Schnall features and BRID refers to
the subset of features consisting of border, rim enhancement, signal intensity, and
density.
An ROC was constructed for each subset of features. Operating points that determine the ROC are established by selecting thresholds that are assumed to separate benign lesions from malignant ones and by plotting the corresponding computed false-positive fraction (FPF) and true-positive fraction (TPF) values. As an example, for a threshold of 0.5, any mass with a malignancy measure of 0.5 or more is called malignant, and any mass with a malignancy measure of less than 0.5 is called benign. Then the true benign or malignant states, in conjunction with the called states, are used to compute the FPF and TPF values that determine a single ROC operating point. A set of threshold values over the range [0,1] is used to generate the set of operating points that define the ROC curve. Performance was evaluated in two ways: (1) from FPF values for ROC operating points with TPF levels in the range (0.90, 1.00), values generally regarded as being clinically important, and (2) from the area under the ROC curves.
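The threshold-sweep construction of operating points described above can be sketched as follows; the function and variable names are illustrative, not taken from the study.

```python
def roc_points(scores, labels, thresholds):
    """Empirical ROC operating points.  A mass whose malignancy measure
    (score) is >= the threshold is called malignant; labels use
    1 = malignant, 0 = benign.  Returns one (FPF, TPF) pair per threshold."""
    n_pos = sum(labels)              # truly malignant masses
    n_neg = len(labels) - n_pos      # truly benign masses
    points = []
    for thr in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        points.append((fp / n_neg, tp / n_pos))
    return points
```

Sweeping the threshold over [0, 1] traces the empirical ROC curve; the area under it can then be obtained by trapezoidal integration over the resulting points.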
To test improvement in discrimination, the fractal dimension was combined with the Nunes-Schnall features, and that set's discrimination was compared to the discrimination obtained by using only the Nunes-Schnall features. At most four Nunes-Schnall features were used in conjunction with the fractal dimension, to limit the size of the combined feature set.
Sensitivity analysis

The robustness of the discrimination generated by the BRIDF features was evaluated when small changes were made to algorithm parameters that affect numerical processing but not the underlying theory. The tested parameters are as follows (see [89] for details):
a. Max(MN): The algorithm for the fractal interpolation function model (FIFM) evaluates segments having length MN. The parameter Max(MN) is the largest segment size that is evaluated and is a determinant of the size of the family of fractal-dimension estimates. The nominal value was 80 pixels; sensitivity analysis was performed over the range of 70 to 90 pixels.

b. Min(Mod): For each evaluated boundary segment, the FIFM method generates a set of fractal models. Min(Mod) is the minimum number of acceptable models that a segment must generate for the segment to be used. The nominal value was 10 models; sensitivity analysis included values from 8 to 12 models.
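The FIFM algorithm itself is described in [89]. As a simpler point of reference for readers unfamiliar with fractal-dimension estimates, a generic box-counting estimator (not the FIFM method, and with illustrative parameter choices) can be sketched as:

```python
import numpy as np

def box_counting_dimension(mask, sizes=(2, 4, 8, 16)):
    """Rough box-counting estimate of the fractal dimension of a binary
    boundary image: count the occupied s-by-s boxes at several scales and
    fit the slope of log N(s) versus log(1/s)."""
    counts = []
    for s in sizes:
        h, w = mask.shape
        n = 0
        for i in range(0, h, s):
            for j in range(0, w, s):
                if mask[i:i + s, j:j + s].any():   # box contains boundary?
                    n += 1
        counts.append(n)
    slope, _ = np.polyfit(np.log(1.0 / np.asarray(sizes, float)),
                          np.log(np.asarray(counts, float)), 1)
    return slope
```

A straight line yields a dimension near 1, while a rough, space-filling boundary drives the estimate toward 2; the FIFM approach instead fits families of fractal interpolation models to boundary segments.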
5.5.3.6 Results
The 32 cancers produced three clinically important (0.9 or greater) TPF levels: 0.969, 0.938, and 0.906, corresponding to one, two, and three false-negative results, respectively. (Allowing TPF to be equal to 1.000 often generates an unacceptably high number of false-positive results.) The three TPF levels were tested with both the ANN and the logistic regression discrimination models, yielding a total of six combinations, each of which defined an evaluation criterion. In five of the six analyses, the optimum discrimination available from the Nunes-Schnall features was obtained by using either the full set (BRSID) or the subset (BRID) that omitted septation. To assess the contribution of the fractal feature, discrimination was computed for the BRIDF feature set. Table 5.5.3.6 shows the results of the discrimination analysis. Smaller values of FPF are preferred; note that the performance of the set that contained the fractal feature was better than that of either of the other two sets, under all conditions.

The TPF levels shown in Table 5.5.3.6 are inferior to those reported by Nunes et al. [81]. This is because of the presence of an exaggerated proportion of difficult-to-analyze cases in this study. When, however, those four specially selected cases are removed from the data set and the analysis rerun, the FPF levels are comparable to those reported by Nunes et al. [81].
[Table 5.5.3.6: FPF values of the feature sets BRIDF, BRSID, and BRID for each of the six evaluation criteria.]
Table 5.5.3.6 shows the importance of the evaluation criteria when determining which set of expert-observer features should be used to establish the baseline discrimination for comparative analysis. The set BRSID outperformed BRID in three of the runs; BRID outperformed BRSID in the other three. In all cases, however, BRIDF performed better than both BRID and BRSID.

The BRIDF improvement in discrimination was also evaluated using areas under both the empirical-data ROC curve and the smoothed ROC curve as computed by the LABROC1 software [94]. In this analysis only the ANN model was used. While the area measures use a large number of clinically unimportant operating points, they present an overall picture of discrimination. Table 5.5.3.6 shows that the feature set BRIDF provides an improvement in discrimination over each of the Nunes-Schnall feature sets with either of the two ROC curves.

CLABROC software was used to evaluate the statistical significance of the difference in smoothed ROC areas between the curves generated by BRIDF and BRSID, and also between the curves generated by BRIDF and BRID [95]. Table 5.5.3.6 shows the results of pairwise analysis using both the area test and the bivariate chi-square test. It is clear that the addition of the fractal feature led to a significantly greater ROC area than that provided by either of the two sets of original features.

The robustness of the fractal-dimension feature was evaluated by comparing the discrimination of BRIDF when the fractal-dimension algorithm used nominal parameter settings with the discrimination of BRIDF when selected parameters of the fractal-dimension algorithm were perturbed, as described above. Max(MN) had nominal value 80 and was tested at values 70 and 90; Min(Mod) had nominal value
10 and was tested at values 8 and 12. Table 5.5.3.6 shows the discrimination analysis result using the ANN model and the TPF values obtained from Table 5.5.3.6.

    Feature combination    Area test                Bivariate chi-square test
                           Value    Two-tailed p    Value      p
    BRIDF-BRSID            2.58     0.010            7.20      0.027
    BRIDF-BRID             3.54     0.004           14.39      0.001

[Table: FPF values for BRIDF under the nominal parameter settings and under each perturbation: BRIDF70 and BRIDF90 (Max(MN) = 70 and 90), and BRIDF8 and BRIDF12 (Min(Mod) = 8 and 12).]
The small changes in FPF values resulting from the perturbed parameters should be compared to the much larger differences between BRIDF and the Nunes-Schnall sets shown in Table 5.5.3.6.
5.5.3.7 Conclusions

5.6 Future developments
Better features mean better screening and better diagnosis. Ideally, features are easy to compute, perform well in the presence of noise, artifact, and anatomic and physiologic variation, and are in accord with clinician understanding. As medical knowledge grows, we can expect that the resulting improved models of anatomic shape and shape change, and of physiologic function, will lead to better understanding of the overall process of characterizing form and function from first principles. Better features will reduce the cost and risk of imaging as speeds and diagnostic accuracies increase. Larger databases, properly constructed from gold-standard data, would give investigator and user alike increased confidence that the techniques are indeed generally applicable. A concomitant growth of interest in, and implementation of, standards for feature measurement and use would ensure that real reproducibility is achieved in a variety of clinical settings.
5.7 Acknowledgments
References
[1] M. J. Ackerman, "Visible human project," Proceedings of the IEEE, vol. 86, pp. 504-511, Mar 1998.
[2] C. W. Chen, W. Lai, F. Y. Fang, and L. Chen, "Portal image feature extraction by hierarchical region processing technique," in Proc. 1995 IEEE International Conference on Systems, Man and Cybernetics: Intelligent Systems for the 21st Century, vol. 4, pp. 3561-3566, 1995.
[3] D. S. Fritsch, E. L. Chaney, A. Boxwala, M. J. McAuliffe, S. Raghavan, A. Thall, and J. R. D. Earnhart, "Core-based portal image registration for automatic radiotherapy treatment verification," International Journal of Radiation Oncology Biology Physics, vol. 33, no. 5, 1995.
[4] A. Gueziec, P. Kazanzides, B. Williamson, and R. H. Taylor, "Anatomy-based registration of CT-scan and intraoperative X-ray images for guiding a surgical robot," IEEE Transactions on Medical Imaging, vol. 17, pp. 715-728, Oct 1998.
[5] R. H. Taylor, J. Funda, L. Joskowicz, A. D. Kalvin, S. H. Gomory, A. P. Gueziec, and L. M. G. Brown, "Overview of computer-integrated surgery at the IBM Thomas J. Watson Research Center," IBM Journal of Research and Development, vol. 40, pp. 163-183, Mar 1996.
References 337
[22] J. Flusser and T. Suk, "Character recognition by affine moment invariants," in Computer Analysis of Images and Patterns: 5th International Conference, CAIP '93 Proceedings, (Berlin, Germany), pp. 572-577, Springer-Verlag, 1993.
[23] J. Flusser and T. Suk, "Affine moment invariants: a new tool for character recognition," Pattern Recognition Letters, vol. 15, pp. 433-436, April 1994.
[24] L. Gupta and M. D. Srinath, "Contour sequence moments for the classification of closed planar shapes," Pattern Recognition, vol. 20, no. 3, pp. 267-272, 1987.
[25] D. L. Thiele, C. Kimme-Smith, T. D. Johnson, M. McCombs, and L. W. Bassett, "Using tissue texture surrounding calcification clusters to predict benign vs. malignant outcomes," Medical Physics, vol. 23, pp. 549-555, April 1996.
[26] R. M. Haralick, "Statistical and structural approaches to texture," Proceedings of the IEEE, vol. 67, pp. 786-804, May 1979.
[27] G. E. Carlson and W. J. Ebel, "Co-occurrence matrix modification for small region texture measurement and comparison," in IGARSS '88, Remote Sensing: Moving Towards the 21st Century, (Piscataway, NJ), pp. 519-520, IEEE, 1988.
[28] F. Argenti, L. Alparone, and G. Benelli, "Fast algorithms for texture analysis using co-occurrence matrices," IEE Proceedings, Part F: Radar and Signal Processing, vol. 137, no. 6, pp. 443-448, 1990.
[29] L. Alparone, F. Argenti, and G. Benelli, "Fast calculation of co-occurrence matrix parameters for image segmentation," Electronics Letters, vol. 26, pp. 23-24, January 1990.
[30] G. E. Carlson and W. J. Ebel, "Co-occurrence matrices for small region texture measurement and comparison," Intl. Journal of Remote Sensing, vol. 16, no. 8, pp. 1417-1423, 1995.
[31] M. Oberholzer, M. Ostreicher, H. Christen, and M. Bruhlmann, "Methods in quantitative image analysis," Histochemistry and Cell Biology, pp. 333-355, Springer, 1996.
[32] M. Nadler and E. Smith, Pattern Recognition Engineering. New York: John Wiley, 1993.
[33] T. M. Cover, "The best two independent measurements are not the two best," IEEE Transactions on Systems, Man, and Cybernetics, vol. 4, pp. 116-117, January 1974.
[34] T. M. Cover and J. M. van Campenhout, "On the possible orderings in the measurement selection problem," IEEE Transactions on Systems, Man, and Cybernetics, vol. 7, pp. 657-661, Sept 1977.
[35] R. Duda, P. Hart, and D. Stork, Pattern Classification. New York: Wiley, 2000.
[36] H. J. Holz, Classifier-independent feature analysis. D.Sc. thesis, The George Washington University, May 1999.
[37] H. J. Holz and M. H. Loew, "Multi-class classifier-independent feature analysis," Pattern Recognition Letters, vol. 18, pp. 1219-1224, November 1997.
[54] M. Insana, R. Wagner, B. Garra, D. Brown, and T. Shawker, "Analysis of ultrasound image texture via generalized Rician statistics," Optical Engineering, vol. 25, pp. 743-748, June 1986.
[55] R. S. Mia, M. H. Loew, K. Wear, and R. Wagner, "Quantitative ultrasound tissue characterization using texture and cepstral features," in Proceedings of SPIE: Medical Imaging 1998, vol. 3338, pp. 211-219, 1998.
[56] M. Kadah, A. Farag, J. Zurada, A. Badawi, and A. Youssef, "Classification algorithms for quantitative tissue characterization of diffuse liver disease from ultrasound images," IEEE Transactions on Medical Imaging, vol. 15, pp. 466-478, August 1996.
[57] U. Raeth, D. Schlaps, B. Limberg, I. Zuna, A. Lorenz, G. Kaick, W. Lorenz, and B. Kommerell, "Diagnostic accuracy of computerized b-scan texture analysis and conventional ultrasonography in diffuse parenchymal and malignant liver disease," Journal of Clinical Ultrasound, vol. 13, pp. 87-99, February 1985.
[58] F. Valckx and J. Thijssen, "Characterization of echographic image texture by co-occurrence matrix parameters," Ultrasound in Medicine and Biology, vol. 23, no. 4, pp. 559-571, 1997.
[59] L. Fellingham and F. Sommer, "Ultrasonic characterization of tissue structure in the in vivo human liver and spleen," IEEE Transactions on Sonics and Ultrasonics, vol. 31, pp. 418-428, July 1984.
[60] K. Suzuki, N. Hayashi, Y. Sasaki, M. Kono, Y. Imai, H. Fusamoto, and T. Kamada, "Ultrasonic tissue characterization of chronic liver disease using cepstral analysis," Gastroenterology, vol. 101, pp. 1325-1331, November 1991.
[61] K. Wear, R. Wagner, M. Insana, and T. Hall, "Application of autoregressive spectral analysis to cepstral estimation of mean scatterer spacing," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 40, pp. 50-58, January 1993.
[62] B. Bogert, M. Healy, and J. Tukey, "The quefrency alanysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking," in Proceedings of the Symposium on Time Series Analysis (M. Rosenblatt, ed.), (New York), pp. 209-243, Wiley, 1963.
[63] T. Varghese and K. Donohue, "Mean-scatterer spacing estimates with spectral autocorrelation," Journal of the Acoustical Society of America, vol. 96, pp. 3504-3515, December 1994.
[64] T. Varghese and K. Donohue, "Estimating mean scatterer spacing with the frequency-smoothed spectral autocorrelation function," IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 42, pp. 451-463, May 1995.
[65] R. S. Mia, M. H. Loew, K. Wear, and R. Wagner, "Quantitative estimation of scatterer spacing from backscattered ultrasound signals using the complex cepstrum," in Proceedings of the 15th International Conference, Information Processing in Medical Imaging, pp. 513-518, 1997.
[66] A. Oppenheim and R. Schafer, Discrete-Time Signal Processing, ch. 12. Englewood Cliffs, NJ: Prentice Hall, 1989.
[82] J. Shea, "From missiles to mammograms," PENN Health, vol. 910, 1966.
[83] C. Burdett, H. Longbotham, M. Desai, W. Richardson, and J. Stoll, "Nonlinear indicators of malignancy," in SPIE: Biomedical Image Processing and Biomedical Visualization, Part 2, vol. 1905, pp. 853-860, 1993.
[84] C. Burdett and M. Desai, "Localized fractal dimension measurement in digital mammographic images," in SPIE: Visual Communications and Image Processing, Part 1, vol. 2094, pp. 141-151, 1993.
[85] S. Pohlman, K. Powell, N. Obuchowski, W. Chilcote, and S. Grundfest-Broniatowski, "Quantitative classification of breast tumors in digitized mammograms," Medical Physics, vol. 23, pp. 1337-1345, August 1996.
[86] C. Priebe, J. Solka, R. Lorey, G. Rogers, W. Poston, M. Kallergi, W. Qian, L. Clarke, and R. Clark, "The application of fractal analysis to mammographic tissue classification," Cancer Letters, vol. 77, pp. 183-189, 1994.
[87] V. Velanovich, "Fractal analysis of mammographic lesions: A feasibility study quantifying the difference between benign and malignant masses," American Journal of the Medical Sciences, vol. 311, pp. 211-214, May 1996.
[88] C. B. Caldwell, S. J. Stapleton, D. W. Holdsworth, R. A. Jong, W. J. Weiser, G. Cooke, and M. J. Yaffe, "Characterisation of mammographic parenchymal pattern by fractal dimension," Physics in Medicine and Biology, vol. 35, no. 2, pp. 235-247, 1990.
[89] A. I. Penn and M. H. Loew, "Estimating fractal dimension of medical images with fractal interpolation function models," IEEE Transactions on Medical Imaging, vol. 16, pp. 930-937, Dec 1997.
[90] M. F. Barnsley, Fractals Everywhere. Academic Press, 1993.
[91] M. H. Loew, "A diffusion-based description of shape," in Pattern Recognition Theory and Application (P. Devijver and J. Kittler, eds.), NATO ASI Series, pp. 501-508, Berlin: Springer-Verlag, 1987.
[92] J. A. Baker, P. J. Kornguth, J. Y. Lo, M. E. Williford, and C. E. Floyd Jr., "Breast cancer: Prediction with artificial neural network based on BI-RADS standardized lexicon," Radiology, vol. 196, pp. 817-822, 1995.
[93] Logistic Regression, Examples, Version 6. Cary, NC: SAS Institute, Inc., first ed., 1995.
[94] D. Dorfman, C. Metz, B. Herman, P. Wang, J. Shen, and H. B. Kronman, LABROC program for the IBM PC. http://www-radiology.uchicago.edu/sections/roc/software.cgi, 1993.
[95] J. Shen, B. Herman, H. B. Kronman, P. Wang, and C. Metz, CLABROC program, IBM-PC version 1.2.1. http://www-radiology.uchicago.edu/sections/roc/software.cgi, 1993.
[96] S. E. Harms, D. P. Flamig, K. L. Hesley, M. D. Meiches, R. A. Jensen, W. P. Evans, D. A. Savino, and R. V. Wells, "MR imaging of the breast with rotating delivery of excitation off resonance: Clinical experience with pathologic correlation," Radiology, vol. 187, pp. 493-501, 1993.
CHAPTER 6
Extracting Surface Models of the Anatomy
from Medical Images
Andre Gueziec
Consultant
Contents

6.1 Introduction 345
6.2 Surface representations 345
    6.2.1 Point set 346
6.3 Isosurface extraction
    6.3.2 Tetrahedral decomposition
    6.3.3 A lookup procedure to replace the determinant test
6.6 Optimization 371
    6.6.1 Smoothing 371
6.9 References 390
6.1 Introduction

6.2 Surface representations
Various geometric objects may be used to represent a surface. The first object,
in Section 6.2.1, specifies a surface with a set of samples (and hence does not define
a surface per se). A powerful piecewise linear representation is introduced in
Section 6.2.2. Smooth curved surface representations (Section 6.2.3) are (in theory)
more compact and allow the full exploitation of curvature and normal information;
however, their topology is less flexible.
6.2.1 Point set
The simplest representation for a surface is probably a point set, each point, or vertex, being defined by three Cartesian coordinates (x, y, z). Contrary to a curve, for which a set of sample points may be naturally ordered according to a progression along the curve, there is no such natural ordering for surface sample points.
Point sets have been used successfully for registration purposes, as in the Head in the Hat scheme [7]. More recent surface registration methods using sample points are described in [8, 9]. Registration between point sets or between a point set and a surface is generally performed by associating a corresponding point (respectively, surface location) to each point of a set and by determining a geometric transformation that best aligns the corresponding pairs.
6.2.2 Triangular mesh
Figure 6.1: Triangular meshes: (a) triangular mesh modeling the boundary of a proximal portion of a human femur; (b) singular edge and vertex; (c) regular vertices; (d) a non-manifold mesh having a singular vertex.
Figure 6.2: Triangular meshes with different Euler numbers: (a) 2; (b) 0 (one handle); (c) -2 (two handles).
enumerating the edges by visiting in turn each triangle). It can be seen that E = 3F/2, since each triangle has three edges and, in a closed mesh, each edge is shared by two triangles. Combined with Euler's formula V - E + F = 2 - 2g, this gives F = 2(V - 2 + 2g). Hence, the number of triangles is approximately twice the number of vertices, provided that V is large compared to the Euler number.
Most of the applications listed in the present chapter utilize a triangular mesh.
If the triangles are sufficiently small, it is possible to approximate surfaces closely
enough in most cases. Also, as discussed further in Section 6.6.1, a signal processing framework may be applied to triangular meshes, allowing such operations as
smoothing.
One drawback of a triangular mesh is the amount of data that is required to
represent it. To address this issue, methods have been developed to approximate
triangular meshes using fewer triangles (Section 6.6.2). Methods for compressing
triangular meshes were also developed, initially in the computer graphics community [14, 15]; these methods facilitate storage, transmission, and rendering.
6.2.3 Curved surfaces

6.3 Isosurface extraction

6.3.1 Hexahedral decomposition
Marching Cubes (Lorensen and Cline, 1987) [27] is probably the best-known variant of the hexahedral decomposition method. However, the resulting tiling may be inconsistent across hexahedra, resulting in non-manifold surfaces [24, 28, 29]. (This problem may be resolved in some implementations.) The Wyvill et al. [30] and Kalvin [28] methods, for instance, do not have this problem. For brevity, we concentrate on the tetrahedral decomposition method in Section 6.3.2. A complete description of an implementation of the Wyvill et al. method is also available in [26]. The results presented in [26] indicate that, when subjected to the same simplification process, the final outputs of the hexahedral and tetrahedral decompositions are virtually identical. It is also argued that implementing the tetrahedral decomposition is perhaps simpler. Additional perspectives on isosurface
2 Even for CT-scans of dry bone, isosurfaces may yield models that are not anatomically faithful in regions of thin bone.

[Figure: a hexahedral cell connecting 8 isosurface voxels; each volume element, or voxel, is a sample of the lattice.]
6.3.2 Tetrahedral decomposition
This section describes the method of Gueziec and Hummel [43], which is an extension of the method of Doi, Koide, et al. [44, 45]. The differences between the methods described in [43] and in [44, 45] are related to (1) the specifics of the decomposition into tetrahedra and the resulting orientation of the tetrahedra, (2) the lookup procedure for oriented triangles, and (3) the addition in [43] of a simplification process following the extraction process. The simplification process is described in detail in Section 6.6.2. The hexahedral lattice is decomposed into a tetrahedral lattice as shown in Fig. 6.4.

In addition to the examples provided in Section 6.3.6, Gueziec and Hummel's method has been applied to compare schizophrenic and normal ventricles by Dean et al. [46] and is used for surface extraction before registration in [9, 47]. The method has also been used during simulation studies by Gencer et al. on the optimal placement of electrodes for electric source imaging [48]. The effect of the simplification procedure on the accuracy of the detection of rib lines (which are discussed in detail in Section 6.3.5) has been studied by Gueziec and Dean [49].
Figure 6.4: Triangulation of a hexahedral cell producing five tetrahedra. Four are isosceles and isomorphic to one another; they are shown at the left and center. The fifth tetrahedron (right) is equilateral and occupies the center of the hexahedral cell.
6.3.2.1 Building the tetrahedra
The following procedure is used to build a tetrahedron as an (oriented) quadruple of vertices. The actual implementation may either apply this procedure for every tetrahedron or apply it once and insert all ten possible tetrahedra in a table. This table may also be built manually. The first two methods are probably less error-prone.
For any given hexahedral cell, two tetrahedral decompositions are possible. One of these is shown in Fig. 6.4; the other is mirror-symmetric to the one shown. In order to be consistent between neighboring hexahedral cells, i.e., in order that faces and edges of tetrahedra in one cell match faces and edges of tetrahedra in the neighboring cells, we must alternate between the two decompositions from cell to cell, in a three-dimensional checkerboard fashion.

We next describe the use of binary operations to perform the decomposition easily. (This is not used in the Doi/Koide version.) Each of the vertices of the hexahedral cell is numbered from 0 to 7 (see Fig. 6.5). Moving along an edge of the hexahedral cell is performed by inverting one of the three bits of the vertex number. The three possible 1-bit inversions will be denoted by 001, 010, and 100 (see Fig. 6.5).
Each hexahedral cell has an integer coordinate location (i, j, k) for its local origin. We use i, j, and k to measure the row, column, and height in the array of cells. In order to identify each tetrahedron within a cell, we perform the following steps:

1. If i + j + k is even, we call the cell an even cell, and we say that the parity of the cell is even. If the sum is odd, the cell's parity is odd. To determine tetrahedron number 1 within the cell, we select apex A1 as vertex 000 in an even cell, and as 001 in an odd cell. We then obtain three other apices from A1 by applying the motion operators 001, 010, and 100, resulting in a11, a12, and a13.
[Figure 6.5: numbering of the eight vertices of the hexahedral cell, 0 (000) through 7 (111); the three 1-bit motions 001, 010, and 100 move along cell edges.]

Figure 6.6: To determine tetrahedron number 1 within the cell, we select apex A1 as vertex 000 in an even cell and as vertex 001 in an odd cell. We then obtain three other vertices a11, a12, and a13 from A1 by the motions 001, 010, and 100.
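The parity-based decomposition can be tabulated with a few bit operations. The sketch below assumes, consistently with Fig. 6.4, that the four corner apices share the bit parity of A1 and that the central (equilateral) tetrahedron takes the four remaining vertices; it is an illustration, not the chapter's implementation.

```python
def cell_tetrahedra(i, j, k):
    """Five-tetrahedra decomposition of the hexahedral cell at (i, j, k).
    Vertices are numbered 0-7 by three bits; XOR with 001, 010, or 100
    moves along a cell edge.  The cell parity selects one of the two
    mirror-symmetric decompositions."""
    apex0 = 0b000 if (i + j + k) % 2 == 0 else 0b001
    tets = []
    for m in (0b000, 0b011, 0b101, 0b110):        # parity-preserving offsets
        a = apex0 ^ m
        # corner tetrahedron: apex plus its three edge neighbors
        tets.append((a, a ^ 0b001, a ^ 0b010, a ^ 0b100))
    apices = {t[0] for t in tets}
    # central tetrahedron: the four vertices of opposite parity
    tets.append(tuple(v for v in range(8) if v not in apices))
    return tets
```

Adjacent cells get mirror decompositions automatically, since their parities differ, so tetrahedron faces match across cell boundaries.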
The next step consists in determining whether a portion of the isosurface will intersect a given tetrahedron. The voxel values v1, v2, v3, v4 corresponding to the four tetrahedron apices are retrieved from the three-dimensional lattice. For each tetrahedron edge that exhibits an intensity sign change, a vertex of the polygonal approximation of the isosurface is created. The exact position of the vertex is determined by the zero-crossing of a function interpolating voxel values along the edge. Because of the issue illustrated in Fig. 6.7, we use linear interpolation on an edge of the hexahedral cell and bilinear interpolation on a diagonal edge (which amounts to using bilinear interpolation overall). Specifically, if v00, v01, v10, v11 denote the four intensity values on the face of an 8-cell, then the intensity value along the diagonal edge is given by:

    f(t) = (1 - t)^2 v00 + t(1 - t)(v01 + v10) + t^2 v11.

Here, t varies from 0 to 1 linearly along the diagonal. Provided v00 v11 < 0, f will have exactly one zero in the range (0, 1), which can easily be determined from the quadratic formula (the other zero will fall outside this range).
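Finding the zero-crossing along the diagonal then reduces to the quadratic formula; a short sketch (names illustrative):

```python
import math

def diagonal_zero(v00, v01, v10, v11):
    """Zero-crossing parameter t in [0, 1] of the bilinear interpolant
    f(t) = (1-t)^2 v00 + t(1-t)(v01+v10) + t^2 v11 along a face diagonal.
    Assumes v00 and v11 have opposite signs."""
    a = v00 - (v01 + v10) + v11        # quadratic coefficient of t^2
    b = (v01 + v10) - 2.0 * v00        # coefficient of t
    c = v00                            # constant term
    if abs(a) < 1e-12:                 # interpolant degenerates to a line
        return -c / b
    disc = math.sqrt(b * b - 4.0 * a * c)
    for t in ((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)):
        if 0.0 <= t <= 1.0:
            return t
    raise ValueError("no zero-crossing in [0, 1]")
```

With the face values of Fig. 6.7 (one negative corner, three corners at 1), the bilinear interpolant crosses zero near t = 0.29, matching the figure.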
6.3.2.3 Defining one or two oriented triangles
Figure 6.7: Using a linear interpolation along the diagonal edge results in a severe difference in position for the polygonal surface when the diagonal edge is swapped (left and right), assuming a square face. Instead, we use a bilinear interpolant (middle). The numbers .5, .29, and .25 indicate the relative position of the isosurface, where it intersects the top-left-to-bottom-right diagonal.
The three sign configurations, Cases I, II, and III, are illustrated in Fig. 6.8. For Cases I and III, three of the values have the same sign. For Case II, two vertices are positive and two are negative. In this case, the surface will intersect all four faces, and we have a quadrilateral. By arbitrarily choosing a diagonal of the quadrilateral, we obtain two triangles within the tetrahedron. Combining all cases, we have a patch of the surface represented as either one or two triangles. The triangles can be oriented.
Cases I and III For Cases I and III, exactly one value among v1, v2, v3, v4 has a sign opposite to the others. Using this value, we compute intersections along the edges connecting its vertex to the other three vertices, preserving order among the other vertices. So, for example, if v4 has the different sign, we compute the intersection along (p1, p4), then (p2, p4), and then (p3, p4). These three intersections determine an ordering which is either the correct direction, or opposite to the correct direction. One way to determine if the orientation is correct is to check it by examining the sign of a determinant, as done by Doi/Koide, as follows.

First, we reorder the vertices to obtain (q1, q2, q3, q4) such that the vertices with negative values precede the vertices with positive values, without otherwise disturbing relative order. Viewing the vertices as coordinates in three-space, we can then compute the determinant det(q2 - q1, q3 - q1, q4 - q1). If this determinant is negative, then the ordering of the triangle vertices is correct; otherwise, the ordering must be reversed. The procedure can be verified by carefully considering the cross product (see Fig. 6.8).
Case II In this case, exactly two vertices are negative, and two are positive. We reorder the vertices as above, to give vertices (q1, q2, q3, q4), where the values associated with q1 and q2 are negative, and the values associated with q3 and q4 are positive. We compute the vertices of the quadrilateral as follows. We first find the zero along (q1, q3). Next, we find the zero along (q1, q4), then (q2, q4), and finally (q2, q3). This sequence of four points establishes a cycle, which is either the correct ordering, or the incorrect ordering, which can be easily checked.

Figure 6.8: Defining one or two oriented triangles. Depending upon the number of vertices outside the solid (marked with the minus sign) we define three or four vertices of the surface approximation in the ordering specified in the text. The condition for having to reverse the ordering is det(q2 - q1, q3 - q1, q4 - q1) > 0. Note that this value is six times the volume, positive or negative, of the tetrahedron.
In this case, the verification procedure is exactly the same as in the previous case. That is, viewing the vertices as vectors in three-space, we compute det(q2 - q1, q3 - q1, q4 - q1). The ordering of the interpolation points defined above is correct if the determinant is negative; if the determinant is positive, then the order should be reversed.
If the resulting ordering of the interpolation points is given by (m1, m2, m3, m4), then there are two possible triangulations: one is given by the pair of ordered triangles (m1, m2, m3), (m1, m3, m4), and the other is given by the pair (m2, m3, m4), (m2, m4, m1). The triangulation may be chosen arbitrarily. However, by always choosing the first one in an even cell, and the second one in an odd cell, the degree of vertices in the triangulation can be shown to be always less than or equal to nine [43].
6.3.3 A lookup procedure to replace the determinant test

[Table: the sixteen sign configurations of (v1, v2, v3, v4), encoded as bit codes 0000 through 1111 with indices 0 through 15, together with the sign of det() for each configuration.]
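Such a lookup table can be generated once per tetrahedron shape by running the determinant test for each of the sixteen sign configurations. The sketch below is an illustration of that idea; the bit-order convention (bit i set means the value at vertex i is positive) is an assumption, not necessarily the chapter's.

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def build_sign_table(tet):
    """For each sign configuration, reorder the four vertices of `tet`
    (a tuple of (x, y, z) coordinates) so that negatives precede
    positives, then record the sign of det(q2-q1, q3-q1, q4-q1)."""
    table = {}
    for code in range(16):
        neg = [tet[i] for i in range(4) if not (code >> i) & 1]
        pos = [tet[i] for i in range(4) if (code >> i) & 1]
        q = neg + pos
        d = det3(*(tuple(q[i][k] - q[0][k] for k in range(3))
                   for i in (1, 2, 3)))
        table[code] = (d > 0) - (d < 0)   # sign: -1, 0, or +1
    return table
```

Since every tetrahedron produced by the decomposition is congruent to one of a fixed set of shapes, the table need only be built once per shape, replacing the per-tetrahedron determinant evaluation with an index lookup.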
Monga et al. [50] provide a method for computing principal curvature directions and curvature values of an isosurface directly from voxel data and spatial derivatives of the voxel data. Approaches based on triangular patches, rather than voxel data, are given in [52, 53] and [54]. If a differentiable explicit representation is available (e.g., using spline tensor products), the method of [18] may also be used.
In order to evaluate the surface curvatures using volume data, we consider a curve C, with tangent vector t, along a normal section of the isosurface. Since C is contained in a plane that contains the surface normal n, the normal of C is also n = grad f / |grad f|, where f designates a continuous function modeling the volume (voxel) data. The differentiation of n in the direction of t yields

    kappa_n = -(t^T H t) / |grad f|,    (6.1)

where kappa_n denotes the normal curvature of C, which is also by definition the surface curvature in direction t, and H is the Hessian:

    H = ( d2f/dx2    d2f/dxdy   d2f/dxdz
          d2f/dxdy   d2f/dy2    d2f/dydz
          d2f/dxdz   d2f/dydz   d2f/dz2 ).    (6.2)
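Equations (6.1) and (6.2) can be evaluated directly on voxel data with central differences; a NumPy sketch (names and the finite-difference scheme are illustrative, not the method of [50]):

```python
import numpy as np

def normal_curvature(vol, idx, t, spacing=1.0):
    """Normal curvature kappa_n = -(t^T H t) / |grad f| of the isosurface
    of voxel volume `vol` at voxel index `idx` (a tuple), in the unit
    tangent direction `t`, which is assumed orthogonal to the gradient.
    Recomputing the full gradient per call is wasteful but keeps the
    sketch short."""
    g = np.gradient(vol.astype(float), spacing)           # f_x, f_y, f_z
    grad = np.array([g[a][idx] for a in range(3)])
    H = np.array([[np.gradient(g[a], spacing)[b][idx]     # Hessian H_ab
                   for b in range(3)] for a in range(3)])
    t = np.asarray(t, dtype=float)
    return -float(t @ H @ t) / float(np.linalg.norm(grad))
```

For a sampled sphere the estimate recovers the expected value -1/r (with the normal oriented along the gradient), since central differences are exact for quadratics at interior voxels.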
Rib (or ridge, or crest) lines of a surface are the loci of the surface where the principal curvatures reach a local extremum. Rib lines have been studied in medical imaging by Bookstein and Cutting [55, 56], the Epidaure team at INRIA [51, 57-61], as well as Dean et al. [46, 49]. While Epidaure's main application has been to extract (few) salient and robust features for registering 3D image data sets, Cutting, Bookstein, Dean, et al. have been pursuing the goal of using rib lines, combined with other networks of (geodesic, or minimum-distance) lines, to build templates of the anatomy, particularly of the face, skull, or ventricles.

Several characterizations of rib lines have been proposed. We refer to the above-referenced publications as well as textbooks on the geometry of surfaces [62-64]. Referring to Section 6.3.4 above, rib lines may be related to ribs (discontinuity curves) of the focal surfaces of a surface, which are the loci of the centers of curvature of the surface. 5 There is one center for the minimum principal curvature, another center for the maximum principal curvature, and hence two focal surfaces, except at an umbilic point. Ribs may thus be colored differently, depending on which focal surface they relate to [64].
In this section, we use and illustrate Markatis's characterization of a rib line.

5 A center of curvature is located along the normal, at an offset equal to the radius of curvature, which is the inverse of the curvature.
6.3.6 Isosurface examples
The first example in Fig. 6.10 is a cortex model composed of 362,000 triangles (courtesy of Henry Rusinek and Gregoire Malandain, who performed the segmentation [66]) that was constructed in 59 seconds on an IBM RS6000 computer using the tetrahedral decomposition method. Surface simplification (see [43] and Section 6.6.2) reduced the number of triangles from 362,000 to 52,000. The result is shown in Fig. 6.10(b). Brain sulci appear in red, and gyri appear in green and blue, color coded relative to the magnitude of the largest principal curvature.

The next example is a surface model obtained from a CT scan of a cranium from the Cleveland Museum of Natural History Collection (courtesy of Bruce Latimer, Court Cutting, and David Dean), using the tetrahedral decomposition method followed by surface simplification. The model, shown in Fig. 6.11, comprises 66,000 triangles and 129,000 triangles, simplified from an original isosurface containing 3,450,000 triangles.
6 Markatis's ridge equation was brought to the author's attention through personal communication with I. Porteous, of the University of Liverpool.
Figure 6.9: Ribs (in white) drawn on an isosurface of a cortex segmented in an MRI image. The ribs were characterized using Markatis's equation. The ribs appear to follow the centerlines of the sulci.
6.4
Figure 6.10: Example of isosurface extraction and curvature computation. (a) Cortex model represented as an isosurface (362,000 triangles). (b) Simplified cortex (52,000 triangles); the vertices are color-coded (with Gouraud shading) according to the magnitude of the largest surface principal curvature. (For a color version of this Figure see Plate 1 in the color section of this book.)
while limiting the x-ray dosage. Direct extraction of bone surfaces from such CT
data using software that builds an isosurface produces staircase artifacts: bone
contours are located several voxels apart from slice to slice, and the gap is modeled
by the isosurfacing method as a flat surface portion parallel to the slices. Aside
from the problems created by the irregular spacing between slices, bone has very
different Hounsfield numbers in the CT data because of varying bone densities
(within the same femur bone). Selecting a different isovalue for each slice may
not be sufficient to obtain a correct segmentation of each slice. This problem is
illustrated in Fig. 6.12.
6.4.1
Figure 6.11: Isosurface extracted from a CT-scan image. (a) Model of cranium extracted as an isosurface and simplified. (b) The surface is color-coded (with Gouraud shading) according to the curvature as in Fig. 6.10. (For a color version of this Figure see Plate 2 in the color section of this book.)
Figure 6.12: Isosurface contouring produces inadequate results, even if the isovalue is chosen differently for each slice. We show here a CT slice taken in the distal femur region (region of the condyles) along with a best attempt (yet widely unsuccessful) at segmenting the bone using isocontours (gray almost-fractal-like curves). A segmentation using our deformable model implementation is shown using the white curve (another gray curve next to the white curve represents a preliminary result obtained using a smoothed low-resolution version of the image).
Another avenue of research seeks to automate changes of topology. We briefly study some surface modeling aspects of this issue in Section 6.5.
Footnote 10: The methods could be extended to nonparallel slices, but these do not seem to occur in practice.
Figure 6.13: Using deformable contours for surface extraction. A hierarchical deformable
contour can cope with incorrect user input (e.g., rightmost point marked with a square), or
reuse such input for multiple slices.
Figure 6.14: Femur models obtained by tiling contours. The models have been simplified as explained in Section 6.6.2. (a) Proximal femur (1,664 triangles). (b) Distal femur (4,199 triangles).
To produce the pictures shown in Fig. 6.14, we have maximized the sum of triangle compactnesses [84]. In [84], the compactness of a triangle is defined with the following formula:

c = 4√3 A / (l_1^2 + l_2^2 + l_3^2),   (6.5)

where A is the (positive) area of the triangle and l_1, l_2, l_3 are the lengths of the three sides. This formula defines a dimensionless measure similar to the area-to-perimeter ratio; this measure can be computed inexpensively, without evaluating square roots (except √3, which is precomputed or read from a table).
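For illustration, the compactness above might be computed as follows. This is a sketch with names of our choosing (not from [84]); unlike the scheme alluded to in the text, it takes the straightforward route of evaluating one square root for the 3D triangle area.

```python
import math

def triangle_compactness(p0, p1, p2):
    """Dimensionless compactness c = 4*sqrt(3)*A / (l1^2 + l2^2 + l3^2):
    1 for an equilateral triangle, approaching 0 for a sliver."""
    # Edge vectors from p0; the norm of their cross product is twice the area.
    ax, ay, az = (p1[i] - p0[i] for i in range(3))
    bx, by, bz = (p2[i] - p0[i] for i in range(3))
    cx = ay * bz - az * by
    cy = az * bx - ax * bz
    cz = ax * by - ay * bx
    area = 0.5 * math.sqrt(cx * cx + cy * cy + cz * cz)
    # Sum of squared side lengths (no square roots needed here).
    def sq(a, b):
        return sum((a[i] - b[i]) ** 2 for i in range(3))
    return 4.0 * math.sqrt(3.0) * area / (sq(p0, p1) + sq(p1, p2) + sq(p2, p0))
```

Maximizing the sum of such compactnesses over candidate tessellations favors well-shaped, nearly equilateral triangles.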
Following [67, 83], we consider a rectangular graph (grid) whose nodes represent edges between the m and n vertices in the respective adjacent contours. Edge 00 (which, because the contours are closed, is the same as Edge mn) is chosen as a closest point pair between the contours. A path in the graph from node (0,0) to node (m,n) represents a surface connecting the two contours. This graph is shown in Fig. 6.15(a). Starting with Edge 00, we may decide to construct a triangle by connecting the edge with the next vertex on the top contour (with m vertices), which is represented in the graph by drawing a horizontal arrow starting from vertex (0,0) and pointing left. Alternatively, we may decide to connect Edge 00 with the next vertex on the bottom contour (with n vertices), which is represented in the graph by drawing a vertical arrow starting from vertex (0,0) and pointing down. Starting from the graph node pointed to by the arrow (this node represents an edge in the tessellation,
Figure 6.15: Tiling between two contours. When a pair of contours comprising m and n vertices is given, and a first edge connecting the two contours is chosen (Edge 00), an optimum of a user-defined surface quality criterion is built using dynamic programming. (a) Rectangular graph whose nodes represent edges between the m and n vertices in the respective adjacent contours. Each node has an associated cost, representative of an optimum path from (0,0) to that node. (b) Edges drawn in gray map the gray path in (a) (see text).
either Edge 10 or Edge 01), we may again decide to connect with a vertex from either the top or the bottom contour. Both choices may be recorded in the graph. By examining Fig. 6.15 carefully, the reader will be able to convince him- or herself that all possible tessellations starting with Edge 00 may be represented as a path in the toroidal graph of Fig. 6.15(a).
For each triangle, we compute a cost function (e.g., area or volume) as determined by the user. The cost of a particular tessellation is obtained by adding the costs of all triangles. The costs for all possible paths may be computed by filling an m by n table, wherein each table entry corresponds to a vertex of the graph of Fig. 6.15(a), and represents the cost of the optimum path leading to that vertex. The cost of Entry (m, n) thus represents the cost of the best tessellation. This process follows a general method called dynamic programming [85].
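The dynamic programming just described might be sketched as follows, assuming the per-triangle cost is the triangle area and that Edge 00 has already been chosen; all names are ours, and the full methods of [67, 83] also handle the choice of the starting edge, which is omitted here.

```python
import math

def tri_area(a, b, c):
    """Triangle area in 3D, used as the per-triangle cost."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cr = [u[1] * v[2] - u[2] * v[1],
          u[2] * v[0] - u[0] * v[2],
          u[0] * v[1] - u[1] * v[0]]
    return 0.5 * math.sqrt(sum(x * x for x in cr))

def tile_contours(top, bottom, cost=tri_area):
    """Minimum-cost tiling between two closed contours: node (i, j) of an
    (m+1) x (n+1) table stands for the edge top[i % m] -- bottom[j % n],
    and each step adds one triangle by advancing along one contour."""
    m, n = len(top), len(bottom)
    INF = float("inf")
    D = [[INF] * (n + 1) for _ in range(m + 1)]      # optimum path costs
    back = [[None] * (n + 1) for _ in range(m + 1)]  # backtracking pointers
    D[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            if i > 0:  # horizontal move: triangle with two top vertices
                c = D[i - 1][j] + cost(top[(i - 1) % m], top[i % m], bottom[j % n])
                if c < D[i][j]:
                    D[i][j], back[i][j] = c, (i - 1, j)
            if j > 0:  # vertical move: triangle with two bottom vertices
                c = D[i][j - 1] + cost(top[i % m], bottom[(j - 1) % n], bottom[j % n])
                if c < D[i][j]:
                    D[i][j], back[i][j] = c, (i, j - 1)
    # Backtrack from (m, n): the best tessellation starting with Edge 00.
    tris, node = [], (m, n)
    while back[node[0]][node[1]] is not None:
        i, j = node
        pi, pj = back[i][j]
        if pi == i - 1:
            tris.append((("top", (i - 1) % m), ("top", i % m), ("bot", j % n)))
        else:
            tris.append((("top", i % m), ("bot", (j - 1) % n), ("bot", j % n)))
        node = (pi, pj)
    return D[m][n], tris[::-1]
```

A path from (0,0) to (m, n) always contains m + n moves, so the tiling always produces m + n triangles.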
6.4.2.2
Figure 6.16: Capping of a distal femur model by extrapolating areas and centroids of a few contours: (a) before; (b) after.
The areas of the last few contours are fitted with a linear function of the slice coordinate z: A(z) = az + b, with a < 0 and b > 0 (the area must decrease from a positive value). After fitting to the data, at the slice coordinate that predicts an area of zero, we create a new surface vertex (its x and y coordinates may be predicted with the centroid data) and link it with triangles to all edges of the last contour: the surface is thus capped. This operation is illustrated in Fig. 6.16.
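The extrapolation used for capping might be sketched as follows, assuming a least-squares linear fit of the contour areas and centroids (the function and its argument names are illustrative, not from the text):

```python
def cap_apex(zs, areas, centroids):
    """Fit A(z) = a*z + b to the areas of the last few contours (a must be
    negative: the area decreases), solve A(z0) = 0 for z0, and predict the
    apex (x, y) at z0 by linear extrapolation of the centroids."""
    n = len(zs)
    mz = sum(zs) / n
    var = sum((z - mz) ** 2 for z in zs)
    ma = sum(areas) / n
    a = sum((z - mz) * (A - ma) for z, A in zip(zs, areas)) / var
    if a >= 0.0:
        raise ValueError("the area must decrease toward the cap")
    b = ma - a * mz
    z0 = -b / a                     # slice coordinate where the area vanishes
    apex = []
    for d in range(2):              # extrapolate centroid x then y to z0
        md = sum(c[d] for c in centroids) / n
        slope = sum((z - mz) * (c[d] - md) for z, c in zip(zs, centroids)) / var
        apex.append(md + slope * (z0 - mz))
    return apex[0], apex[1], z0
```

The new vertex (x, y, z0) is then linked by triangles to all edges of the last contour.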
6.5
In previous sections we provided the complete details of methods for extracting surfaces from segmented volume data (Section 6.3) and from a collection of contours (Section 6.4).
Tensor-product B-splines
B-spline basis functions are piecewise polynomials with a finite support and a recursive definition. The spline basis functions of degree zero are the characteristic functions of the variable u for the intervals between real values u_i called knots:

B_i^0(u) = 1 if u_i <= u < u_{i+1}, and 0 otherwise;

B_i^k(u) = [(u - u_i)/(u_{i+k} - u_i)] B_i^{k-1}(u) + [(u_{i+k+1} - u)/(u_{i+k+1} - u_{i+1})] B_{i+1}^{k-1}(u).   (6.6)

The B_i^k functions have degree k. They are globally C^{k-1}. The evaluation of (6.6) is especially efficient, i.e., it can be computed with divided differences, and it can be used to implement splines of all orders. We plot in Fig. 6.17 a quadratic B-spline function. In order to model a curve, we associate a control vertex with each function. With the shape of the functions in Fig. 6.17(a), endpoints are interpolated. It is also possible to obtain a closed curve, with the help of functions as in Fig. 6.17(b).
Cubic spline functions (degree k = 3) are particularly useful because of their property of minimizing the bending energy among C^2 interpolants (see for instance [88, 89]).
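The recursive definition of the basis functions can be evaluated directly. The sketch below follows the standard Cox-de Boor recursion (names are ours): degree-0 functions are characteristic functions of the knot intervals, and each degree-k function blends two degree-(k-1) functions.

```python
def bspline_basis(i, k, u, knots):
    """B-spline basis function B_i^k(u) over the given knot sequence,
    computed with the Cox-de Boor recursion."""
    if k == 0:
        # Characteristic function of the knot interval [u_i, u_{i+1}).
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    out = 0.0
    d1 = knots[i + k] - knots[i]
    if d1 > 0:
        out += (u - knots[i]) / d1 * bspline_basis(i, k - 1, u, knots)
    d2 = knots[i + k + 1] - knots[i + 1]
    if d2 > 0:
        out += (knots[i + k + 1] - u) / d2 * bspline_basis(i + 1, k - 1, u, knots)
    return out

def spline_curve_point(u, ctrl, k, knots):
    """Evaluate a spline curve: each control vertex weighs one basis function."""
    return [sum(bspline_basis(i, k, u, knots) * ctrl[i][d]
                for i in range(len(ctrl)))
            for d in range(len(ctrl[0]))]
```

Inside the fully supported parameter range, the basis functions sum to one, so a curve point is an affine combination of the control vertices.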
In order to model a surface patch, we construct a tensor product of spline functions, i.e., a surface point will be written as

S(u, v) = Σ_{i,j} B_i(u) B_j(v) c_{ij},

where the c_{ij} are control vertices and we have fixed and suppressed the degree k. When using the B-spline functions of the type of Fig. 6.17(a), if none of the control vertices is repeated, the surface has a planar topology. When using the B-spline functions of the type of Fig. 6.17(b) for one of the variables (say u), setting
Figure 6.17: B-spline functions. (a) Uniform quadratic singular B-spline functions. If a control point is attached to each function, a curve that interpolates endpoints is obtained. (b) Uniform quadratic periodic B-splines. The last two control points are set equal to the first two in order to obtain a closed curve.
the last k control points along u equal to the first k for each row of control points, we obtain a cylindrical topology. When using the B-spline functions of the type of Fig. 6.17(b) for the two variables u and v, and setting the first k control points equal to the last k for each row and each column of control points, a toroidal topology is obtained.
The spherical topology is slightly more difficult to obtain than the planar, cylindrical, and toroidal topologies. The control vertices may be organized along an even number of meridians, aligned along one parametric direction. As in the toroidal case, B-spline functions of the type of Fig. 6.17(b) are used for both parametric directions. Control points should be repeated, and one must distinguish between odd-order and even-order splines. For instance, in the case of biquadratic B-splines illustrated in Fig. 6.19, the two terminal control vertices on each meridian are set equal to the first two control vertices on the opposite meridian (which is why the number of meridians must be even).
When using one B-spline surface patch, only the four above-mentioned topologies may be obtained (planar, cylindrical, toroidal, and spherical). This is a significant limitation for a large number of anatomical structures.
In order to deform the surfaces, the same mechanisms that were developed for snakes may be used [76]. An energy, combining surface tension, bending, and external forces defined using the data to be segmented (often depending upon the gradient of image data), is minimized. As the energy term is generally too complex to be minimized in closed form (it depends upon local voxel values and their derivatives), the minimization is generally performed using an iterative process, for instance an iterative solution of a partial differential equation, or a sequence of least-squares approximations [18]. Each step of the minimization corresponds to a new position for the surface. Such an iterative deformation process
Optimization 371
is illustrated in Fig. 6.18. An initial B-spline surface approximating a cylinder is positioned inside a three-dimensional volume of MRI data. Selected slices of the MRI data set are shown in Fig. 6.18(a). The surface then evolves and converges to approximate the epidermis [Fig. 6.18(b)-(d)]. A more detailed study of deformable models and their convergence is available in Chapter 3.
Splines of order 3 and higher are particularly useful for computing surface curvatures. The ability to compute surface curvatures is one of the justifications for using smooth surface models (see also Cohen et al. [90]). Curvatures are visualized in Figs. 6.18 and 6.20. In order to perform these curvature computations, since the spline surfaces are of the form S(u, v), with derivatives with respect to u and v easily available, it is possible to use standard formulae. These formulae can be found in textbooks of differential geometry [91] or in [92]. Recent work has demonstrated that high-curvature features are very useful for registration purposes [93, 94]. However, several methods also exist for computing curvatures directly on triangular meshes [53, 54].
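As an illustration of these "standard formulae", Gaussian and mean curvature follow from the first and second fundamental forms. The sketch below approximates the parametric derivatives by central finite differences (for a spline surface they would instead be available in closed form); all names are ours.

```python
import math

def surface_curvatures(S, u, v, h=1e-4):
    """Gaussian (K) and mean (H) curvature of a parametric surface S(u, v),
    computed from the first and second fundamental forms."""
    def dot(a, b):
        return sum(a[i] * b[i] for i in range(3))
    # First and second derivatives by central finite differences.
    Su = [(S(u + h, v)[i] - S(u - h, v)[i]) / (2 * h) for i in range(3)]
    Sv = [(S(u, v + h)[i] - S(u, v - h)[i]) / (2 * h) for i in range(3)]
    Suu = [(S(u + h, v)[i] - 2 * S(u, v)[i] + S(u - h, v)[i]) / (h * h)
           for i in range(3)]
    Svv = [(S(u, v + h)[i] - 2 * S(u, v)[i] + S(u, v - h)[i]) / (h * h)
           for i in range(3)]
    Suv = [(S(u + h, v + h)[i] - S(u + h, v - h)[i]
            - S(u - h, v + h)[i] + S(u - h, v - h)[i]) / (4 * h * h)
           for i in range(3)]
    E, F, G = dot(Su, Su), dot(Su, Sv), dot(Sv, Sv)   # first fundamental form
    n = [Su[1] * Sv[2] - Su[2] * Sv[1],
         Su[2] * Sv[0] - Su[0] * Sv[2],
         Su[0] * Sv[1] - Su[1] * Sv[0]]
    nn = math.sqrt(dot(n, n))
    n = [c / nn for c in n]                            # unit normal
    L, M, N = dot(Suu, n), dot(Suv, n), dot(Svv, n)    # second fundamental form
    K = (L * N - M * M) / (E * G - F * F)
    H = (E * N + G * L - 2.0 * F * M) / (2.0 * (E * G - F * F))
    return K, H
```

For a sphere of radius R, this yields K = 1/R^2 and |H| = 1/R at any non-degenerate parameter value.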
6.5.2
Few methods using curved surface models and allowing dynamic changes of topology have been proposed. The method of Leitner and Cinquin [95] does so using cubic B-spline surfaces. This method starts with a spherical topology. A hole is created (evolving to a torus) if the method detects a self-intersection of the surface at some point. To create more complex topologies, several smoothly connected B-spline patches are used.
Recent work indicates that when using a triangular mesh to represent a surface, there is more flexibility in changing the topology. Lachaud and Montanvert [96] propose a method in which the topology can be changed at each iteration. After the vertices have been moved using rules similar to other deformable models, simple proximity constraints determine whether vertices should be created or removed, or whether a connection between surface bodies should be created or removed. Figure 6.21 illustrates the operation of the method of [96] for extracting brain vessels from an MRI scan.
McInerney and Terzopoulos [97] tessellate volume space using a tetrahedral
mesh. After each evolution of the surface, the vertices that are crossed in the mesh
are marked, and the surface is retiled (from the tetrahedral mesh data) as explained
in Section 6.3. Owing to the mechanism used for marking the tetrahedral mesh
vertices, the surface can only expand everywhere or retract everywhere.
6.6 Optimization
The surface models, either spline or polygonal, that were constructed in the previous sections may benefit from various optimizations. During isosurface construction or tiling from contours, for instance, the difference between intraslice and interslice sampling ratios may create artifacts, which can be purely visual (such as a staircase effect) or which can affect further processing such as registration.
Figure 6.18: Segmenting an MRI scan using a B-spline deformable surface model. (a) MRI slices used as input to the B-spline deformable model. (b)-(d) Successive iterations of the model with regions of higher curvature highlighted. Dark curves are crest lines, extracted with an algorithm described in [18].
Figure 6.19: Network of control vertices for a biquadratic spline surface with spherical topology. Each pair of opposite meridians shares three vertices at each pole (north, south), including the pole vertex itself.
6.6.1 Smoothing
We describe here a method for smoothing polygonal surfaces that was developed by Taubin [98]. A straightforward technique for smoothing a polygonal mesh consists of replacing each vertex with an average of its neighbor vertices. Taubin noted that this technique resulted in shrinkage of the objects bounded by the polygonal mesh. His solution involves two steps. The set of vertices connected to a vertex v_i by an edge is denoted N(i). For each of the two steps, a vertex v_i is updated according to the following equation:

v_i <- v_i + λ Σ_{j in N(i)} w_ij (v_j - v_i)   (first step),
v_i <- v_i + μ Σ_{j in N(i)} w_ij (v_j - v_i)   (second step),   (6.7)

where the weights w_ij sum to one, the scale factor λ is positive and applied during the first step, and the scale factor μ is negative and applied during the second step. Note that for a given step the displacements are first computed for each vertex and then applied all together at the same time. (Otherwise the displacement applied to a particular vertex would influence the displacements applied to its neighbors.)
The first step of this procedure thus corresponds to applying the traditional
neighbor averaging step (albeit using a scale factor). While it has been observed
that this first step yields shrinkage, the second step moves each vertex in the opposite direction, thereby compensating for the shrinkage effect.
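The two-step procedure might be implemented as follows, with uniform weights w_ij = 1/|N(i)| (a common choice, assumed here) and illustrative scale factors:

```python
def taubin_smooth(vertices, neighbors, lam=0.33, mu=-0.34, steps=100):
    """Two-step smoothing in the spirit of [98]: a positive factor (lam)
    shrinks in the first step, a negative factor (mu) inflates back in the
    second. Displacements are computed for all vertices, then applied
    simultaneously. Assumes every vertex has at least one neighbor."""
    V = [list(v) for v in vertices]
    for _ in range(steps):
        for factor in (lam, mu):            # first step, then second step
            disp = []
            for i, vi in enumerate(V):
                nb = neighbors[i]
                w = 1.0 / len(nb)           # equal weights summing to one
                disp.append([factor * sum(w * (V[j][d] - vi[d]) for j in nb)
                             for d in range(3)])
            for i, d3 in enumerate(disp):   # apply all displacements at once
                for d in range(3):
                    V[i][d] += d3[d]
    return V
```

Applying only the first half (mu = 0) would reproduce the shrinking neighbor-averaging scheme that the second step is designed to compensate.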
Figure 6.20: Curvature of a tensorproduct spline surface. Maximum surface principal curvature represented using a colormap that varies from blue to red. (For a color version of this
Figure see Plate 3 in the color section of this book.)
Figure 6.21: Extraction of brain vessels from an MRI scan using a deformable model represented as a triangular mesh. The model evolves inside an image pyramid; several iterations are performed for a given level [96]: (a) level 3 of the image pyramid; (b) level 1; (c) level 0. (Scan courtesy of UMDS Group, London. Images courtesy of J.O. Lachaud and A. Montanvert.)
Figure 6.22: Illustration of surface smoothing using the method of [98]: (a) before smoothing; (b) after smoothing (200 smoothing steps).
Figure 6.22 illustrates the operation of this method on a femur model comprising 180,854 triangles, extracted from a CT scan. The spacing between slices of the CT data is about 6 times the size of a pixel within a slice. The staircase effect resulting from this difference in sampling is illustrated in Fig. 6.22(a). The result of smoothing is visualized in Fig. 6.22(b).
The smoothing affects the vertex coordinates and, thus, the accuracy of the model. For some applications in which the geometric accuracy cannot be compromised, the difference between the surfaces before and after smoothing may be tracked using the methods in Section 6.6.2.
Smoothing techniques are not limited to the ones presented here. For instance, a technique, probably more complex than Taubin's, was recently introduced in [99].
6.6.2
different levels of detail. For this application, the simplification process should preserve the visual appearance of surfaces as much as possible.
A technique for reducing the number of patches in a spline model was proposed by Gopi and Manocha [100]. This method merges adjacent triangular Bézier patches according to several patterns, to form larger triangular Bézier patches.
To guarantee the accuracy of the approximations, and the faithfulness of visualizations, it is very useful to be able to bound the deviation between the original model and the approximation. This operation is difficult because the maximum distance between two polygonal meshes is not in general attained at a pair of vertices. Methods for bounding the maximum surface deviation during simplification are described in [101-105]. Surface simplification methods are reviewed in [84]; additional references include [106-122].
We next focus on the variable tolerance method, which is described fully in [84]. This method, along with several of the methods referenced above, relies on the atomic operation represented in Fig. 6.24 to reduce the resolution of the surface: the edge contraction brings together the two endpoints of an edge, thereby eliminating one vertex and two triangles.
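On an indexed triangle list, the edge contraction can be sketched in a few lines (a minimal version, with names of our choosing, that ignores the geometric placement of the merged vertex):

```python
def contract_edge(triangles, keep, remove):
    """Contract the edge (keep, remove): vertex `remove` is merged into
    `keep`; triangles that contained both endpoints become degenerate and
    are dropped (two of them for an interior edge of a closed mesh)."""
    result = []
    for tri in triangles:
        merged = tuple(keep if v == remove else v for v in tri)
        if len(set(merged)) == 3:       # drop degenerate triangles
            result.append(merged)
    return result
```

In a full simplification method, the position of the merged vertex is also optimized, and a candidate contraction is accepted only if the tracked error bound remains acceptable.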
The variable tolerance method further relies on two main processes. The first process, called the subdivision process, measures the incremental deviation between two corresponding portions of a polygonal surface before and after a simplification operation (i.e., an edge contraction, but this would work with other operations such as vertex or triangle removals [124]). Given the two graphs representing the two surface portions, the subdivision process builds subdivisions of both surfaces in piecewise planar polygons such that there is a one-to-one correspondence between the elements and a direct comparison may be performed. Such subdivisions are shown in Fig. 6.25. For each corresponding pair of planar polygons, the maximum distance between the two must occur for a pair of corresponding vertices of the polygons. The error bounding process described below uses this information to keep track of a bound on the deviation between the simplified surface and the original surface.
In practice, the subdivision process only generates pairs of corresponding vertices. As illustrated in Fig. 6.25, for each vertex of one graph that does not appear in the other graph, a representative must be found in the other graph. This is done by projecting the vertex on the closest triangle. Pairs of edges belonging to different graphs must also be tested for potential bridges, which occur when the shortest segment that can be drawn between the two edges has its endpoints inside the edges: this is the case for one pair of edges in Fig. 6.25. The full detail of this process is described in [84].
The second process, or error bounding process, is used for keeping track of the
deviation after an arbitrary number of simplification operations. The error bounding
process uses an error volume for reporting an error for each surface point (not
Figure 6.23: Simplification of 6 surfaces (lung, external, tumor, spinal cord, vertebrae,
bolus) extracted and visualized using a clinical radiotherapy visualization system: (a) high
resolution: 46,469 vertices; (b) low resolution: 4,688 vertices; (c) very low resolution: 1,811
vertices. Here, the difference between the surfaces can hardly be seen, which is the ideal
outcome of surface simplification. The simplification method of [84] was used. (Courtesy of
Mike Zeleznik, RAHD Oncology Products.) (For a color version of this Figure see Plate 4 in
the color section of this book.)
Figure 6.24: An edge contraction consists of bringing together the two endpoints of an edge, until they become a single vertex.
Figure 6.25: Subdivisions of two corresponding surface portions. For each vertex of one graph that does not appear in the other graph, a representative must be inserted in the other graph. This is done by projecting the vertex on the closest triangle of the other graph. In addition, for each pair of edges, a test determines whether the shortest segment that can be drawn between the two edges has its endpoints inside the edges. When this is the case, a pair of corresponding vertices is inserted in each respective graph. (For a color version of this Figure see Plate 5 in the color section of this book.)
of a triangle incident to the vertex (on the opposite edge), with appropriate weights. Then, to each constraint we associate a vector and a ball of a given radius centered at the tip of the vector, as shown in Fig. 6.27(c).
We then consider the set of vectors and balls corresponding to all the constraints, as illustrated in Fig. 6.27(d). The error value may be computed by determining the smallest ball centered at the origin enclosing this set of balls. (For a ball of negative radius, the enclosing ball is not required to enclose it, but to contain at least one of its points, potentially a single point where both balls are tangent.)
By allowing the center to move, we can reduce the radius of the enclosing ball. Given an initial ball enclosing a set of balls, we wish to obtain a smaller enclosing ball. We are not particularly interested in the smallest possible ball because after moving the center, the constraints become stale. A simple solution is to construct the bounding box of the set of balls, take its center, and determine the ball of smallest radius with that center.
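The bounding-box heuristic of the last paragraph is straightforward; the sketch below (names are ours) handles balls of positive radius only, so the negative-radius constraints discussed above would need the separate tangency test:

```python
def bbox_center_enclosing_ball(balls):
    """Given (center, radius) balls with positive radii, take the center of
    their bounding box and return the smallest radius that encloses every
    ball from that center."""
    los = [min(c[d] - r for c, r in balls) for d in range(3)]
    his = [max(c[d] + r for c, r in balls) for d in range(3)]
    center = [(lo + hi) / 2.0 for lo, hi in zip(los, his)]
    radius = max(
        sum((c[d] - center[d]) ** 2 for d in range(3)) ** 0.5 + r
        for c, r in balls)
    return center, radius
```

This is cheaper than computing the exact smallest enclosing ball and, as noted above, adequate because the constraints become stale after the center moves anyway.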
The method can be extended for preservation of data attached to surface vertices, assuming a linear variation across triangles. This is illustrated by the example
of Fig. 6.30.
6.6.2.1 Examples
Figure 6.26: Error volume obtained by sweeping a ball across the simplified surface. The radius of the ball varies linearly over a surface triangle, interpolating the ball radii at the triangle vertices. (a) Error volume, in green (A1), centered on the simplified surface (A2). The radii of the balls are such that the original surface (A3 or A4) is not only contained in the error volume, in dashed red (A3: incorrect), but also intersects all the spheres, in blue (A4: correct). (b) After several edge contractions, the resulting error volume contains all the intermediate error volumes. (For a color version of this Figure see Plate 6 in the color section of this book.)
and very low resolution (25-fold reduction factor). The difference between the surfaces can hardly be seen in Fig. 6.23, which is a very desirable goal in surface simplification. Statistics about the simplification of these surfaces are provided in Table 6.2. The availability of several levels of detail allows for real-time interaction with the surfaces on a computer display.
Table 6.2:
Vertex counts and timings (CPU seconds measured on a DEC Alpha) for the