
Editorial Board

Jacob Beutel
Consultant

J. Michael Fitzpatrick
Vanderbilt University

Steven C. Horii
University of Pennsylvania Health Systems

Yongmin Kim
University of Washington

Harold L. Kundel
University of Pennsylvania Health Systems

Milan Sonka
University of Iowa

Richard L. Van Metter
Eastman Kodak Company

The images on the front cover are taken from figures in the text. Top row: left,
Fig. 7.13(c), p. 430; middle, Fig. 2.14, p. 104; right, Fig. 6.10(b), p. 363 and
color plate 1. Middle row: left, Fig. 11.27, p. 659; right, Fig. 13.28(d), p. 774.
Bottom: Fig. 15.10(d), p. 937.

Milan Sonka
J. Michael Fitzpatrick
Editors

SPIE PRESS

A Publication of SPIE, The International Society for Optical Engineering
Bellingham, Washington USA

Library of Congress Cataloging-in-Publication Data


Handbook of medical imaging / [edited by] Jacob Beutel, Harold L. Kundel,
and Richard L. Van Metter.
p. cm.
Includes bibliographical references and index.
Contents: v. 1. Progress in medical physics and psychophysics.
ISBN 0-8194-3621-6 (hardcover: alk. paper)
1. Diagnostic imaging--Handbooks, manuals, etc. I. Beutel, Jacob. II. Kundel, Harold L.
III. Van Metter, Richard L.
[DNLM: 1. Diagnostic Imaging--Handbooks. 2. Health Physics--Handbooks.
3. Image Processing, Computer-Assisted--Handbooks. 4. Psychophysics--Handbooks.
5. Technology, Radiologic--Handbooks. WN 39 H2363 2000]
RC78.7.D53 H36 2000
616.07'54--dc21
99-054487
CIP
Vol. 2 ISBN 0-8194-3622-4
Medical Image Processing and Analysis, edited by Milan Sonka and J. Michael Fitzpatrick.
Published by
SPIE
P.O. Box 10
Bellingham, Washington 98227-0010 USA
Phone: +1 360 676 3290
Fax: +1 360 647 1445
Email: Books@SPIE.org
Web: http://spie.org
Copyright © 2000, 2004, 2009 Society of Photo-Optical Instrumentation Engineers
All rights reserved. No part of this publication may be reproduced or distributed in any form or by
any means without written permission of the publisher.
The content of this book reflects the thought of the authors. Every effort has been made to publish
reliable and accurate information herein, but the publisher is not responsible for the validity of the
information or for any outcomes resulting from reliance thereon.
Printed in the United States of America.

Contents
Extended Contents

Preface to the Handbook of Medical Imaging

Introduction to Volume 2: Medical Image Processing and Analysis

Chapter 1. Statistical Image Reconstruction Methods
Jeffrey A. Fessler

Chapter 2. Image Segmentation
Benoit M. Dawant, Alex P. Zijdenbos

Chapter 3. Image Segmentation Using Deformable Models
Chenyang Xu, Dzung L. Pham, Jerry L. Prince

Chapter 4. Morphological Methods for Biomedical Image Analysis
John Goutsias, Sinan Batman

Chapter 5. Feature Extraction
Murray H. Loew

Chapter 6. Extracting Surface Models of the Anatomy from Medical Images
Andre Gueziec

Chapter 7. Medical Image Interpretation
Matthew Brown, Michael McNitt-Gray

COLOR PLATES

Chapter 8. Image Registration
J. Michael Fitzpatrick, Derek L. G. Hill, Calvin R. Maurer, Jr.

Chapter 9. Signal Modeling for Tissue Characterization
Michael F. Insana, Kyle J. Myers, Laurence W. Grossman

Chapter 10. Validation of Medical Image Analysis Techniques
Kevin W. Bowyer

Chapter 11. Echocardiography
Florence Sheehan, David C. Wilson, David Shavelle, Edward A. Geiser

Chapter 12. Cardiac Image Analysis: Motion and Deformation
Xenophon Papademetris, James S. Duncan

Chapter 13. Angiography and Intravascular Ultrasound
Johan H. C. Reiber, Gerhard Koning, Jouke Dijkstra, Andreas Wahle,
Bob Goedhart, Florence H. Sheehan, Milan Sonka

Chapter 14. Vascular Imaging and Analysis
Milan Sonka, Weidong Liang, Robert M. Stefancik, Alan Stolpen

Chapter 15. Computer-Aided Diagnosis in Mammography
Maryellen L. Giger, Zhimin Huo, Matthew A. Kupinski, Carl J. Vyborny

Chapter 16. Pulmonary Imaging and Analysis
Joseph M. Reinhardt, Renuka Uppaluri, William E. Higgins, Eric A. Hoffman

Chapter 17. Brain Image Analysis and Atlas Construction
Paul M. Thompson, Michael S. Mega, Katherine L. Narr,
Elizabeth R. Sowell, Rebecca E. Blanton, Arthur W. Toga

Chapter 18. Tumor Imaging, Analysis, and Treatment Planning
Michael W. Vannier

Chapter 19. Soft Tissue Analysis via Finite Element Modeling
Leonid V. Tsap, Dmitry B. Goldgof, Sudeep Sarkar

Index

Extended Contents

Preface to the Handbook of Medical Imaging

Introduction to Volume 2: Medical Image Processing and Analysis

1 Statistical Image Reconstruction Methods
  1.1 Introduction
  1.2 The problem
    1.2.1 Transmission measurements
    1.2.2 Reconstruction problem
    1.2.3 Likelihood-based estimation
    1.2.4 Penalty function
    1.2.5 Concavity
  1.3 Optimization algorithms
    1.3.1 Why so many algorithms?
    1.3.2 Optimization transfer principle
    1.3.3 Convergence rate
    1.3.4 Parabola surrogate
  1.4 EM algorithms
    1.4.1 Transmission EM algorithm
    1.4.2 EM algorithms with approximate M-steps
    1.4.3 EM algorithm with Newton M-step
    1.4.4 Diagonally-scaled gradient-ascent algorithms
    1.4.5 Convex algorithm
    1.4.6 Ordered-subsets EM algorithm
    1.4.7 EM algorithms with nonseparable penalty functions
  1.5 Coordinate-ascent algorithms
    1.5.1 Coordinate-ascent Newton-Raphson
    1.5.2 Variation 1: Hybrid Poisson/polynomial approach
    1.5.3 Variation 2: 1D parabolic surrogates
  1.6 Paraboloidal surrogates algorithms
    1.6.1 Paraboloidal surrogate with Newton-Raphson
    1.6.2 Separable paraboloidal surrogates algorithm
    1.6.3 Ordered subsets revisited
    1.6.4 Paraboloidal surrogates coordinate-ascent (PSCA) algorithm
    1.6.5 Grouped coordinate ascent algorithm
  1.7 Direct algorithms
    1.7.1 Conjugate gradient algorithm
    1.7.2 Quasi-Newton algorithm
  1.8 Alternatives to Poisson models
    1.8.1 Algebraic reconstruction methods
    1.8.2 Methods to avoid
    1.8.3 Weighted least-squares methods
  1.9 Emission reconstruction
    1.9.1 EM Algorithm
    1.9.2 An improved EM algorithm
    1.9.3 Other emission algorithms
  1.10 Advanced topics
    1.10.1 Choice of regularization parameters
    1.10.2 Source-free attenuation reconstruction
    1.10.3 Dual energy imaging
    1.10.4 Overlapping beams
    1.10.5 Sinogram truncation and limited angles
    1.10.6 Parametric object priors
  1.11 Example results
  1.12 Summary
  1.13 Acknowledgements
  1.14 Appendix: Poisson properties
  1.15 References

2 Image Segmentation
  2.1 Introduction
  2.2 Image preprocessing and acquisition artifacts
    2.2.1 Partial volume effect
    2.2.2 Intensity nonuniformity (INU)
  2.3 Thresholding
    2.3.1 Shape-based histogram techniques
    2.3.2 Optimal thresholding
    2.3.3 Advanced thresholding methods for simultaneous segmentation and INU correction
  2.4 Edge-based techniques
    2.4.1 Border tracing
    2.4.2 Graph searching
    2.4.3 Dynamic programming
    2.4.4 Advanced border detection methods
    2.4.5 Hough transforms
  2.5 Region-based segmentation
    2.5.1 Region growing
    2.5.2 Region splitting and merging
    2.5.3 Connected component labeling
  2.6 Classification
    2.6.1 Basic classifiers and clustering algorithms
    2.6.2 Adaptive fuzzy c-means with INU estimation
    2.6.3 Decision trees
    2.6.4 Artificial neural networks
    2.6.5 Contextual classifiers
  2.7 Discussion and Conclusion
  2.8 Acknowledgements
  2.9 References

3 Image Segmentation Using Deformable Models
  3.1 Introduction
  3.2 Parametric deformable models
    3.2.1 Energy minimizing formulation
    3.2.2 Dynamic force formulation
    3.2.3 External forces
    3.2.4 Numerical implementation
    3.2.5 Discussion
  3.3 Geometric deformable models
    3.3.1 Curve evolution theory
    3.3.2 Level set method
    3.3.3 Speed functions
    3.3.4 Relationship to parametric deformable models
    3.3.5 Numerical implementation
    3.3.6 Discussion
  3.4 Extensions of deformable models
    3.4.1 Deformable Fourier models
    3.4.2 Deformable models using modal analysis
    3.4.3 Deformable superquadrics
    3.4.4 Active shape models
    3.4.5 Other models
  3.5 Conclusion and future directions
  3.6 Further reading
  3.7 Acknowledgments
  3.8 References

4 Morphological Methods for Biomedical Image Analysis
  4.1 Introduction
  4.2 Binary morphological operators
    4.2.1 Increasing and translation invariant operators
    4.2.2 Erosion and dilation
    4.2.3 Representational power of erosions and dilations
    4.2.4 Opening and closing
    4.2.5 Representational power of structural openings and closings
    4.2.6 The hit-or-miss operator
    4.2.7 Morphological gradients
    4.2.8 Conditional dilation
    4.2.9 Annular openings and closings
    4.2.10 Morphological filters
  4.3 Morphological representation of binary images
    4.3.1 The discrete size transform
    4.3.2 The pattern spectrum
    4.3.3 The morphological skeleton transform
  4.4 Grayscale morphological operators
    4.4.1 Threshold decomposition
    4.4.2 Increasing and translation invariant operators
    4.4.3 Erosion and dilation
    4.4.4 Representational power of erosions and dilations
    4.4.5 Opening and closing
    4.4.6 Representational power of structural openings and closings
    4.4.7 Flat image operators
    4.4.8 Morphological gradients
    4.4.9 Opening and closing top-hat
    4.4.10 Conditional dilation
    4.4.11 Morphological filters
  4.5 Grayscale discrete size transform
  4.6 Morphological image reconstruction
    4.6.1 Reconstruction of binary images
    4.6.2 Reconstruction of grayscale images
    4.6.3 Examples
  4.7 Morphological image segmentation
    4.7.1 The distance transform
    4.7.2 Skeleton by influence zones (SKIZ)
    4.7.3 Watershed-based segmentation of nonoverlapping particles
    4.7.4 Geodesic SKIZ
    4.7.5 Watershed-based segmentation of overlapping particles
    4.7.6 Grayscale segmentation
    4.7.7 Examples
  4.8 Conclusions and further discussion
  4.9 Acknowledgments
  4.10 References

5 Feature Extraction
  5.1 Introduction
    5.1.1 Why features? Classification (formal or informal) almost always depends on them
    5.1.2 Review of applications in medical image analysis
    5.1.3 Roots in classical methods
    5.1.4 Importance of data and validation
  5.2 Invariance as a motivation for feature extraction
    5.2.1 Robustness as a goal
    5.2.2 Problem-dependence is unavoidable
  5.3 Examples of features
    5.3.1 Features extracted from 2D images
  5.4 Feature selection and dimensionality reduction for classification
    5.4.1 The curse of dimensionality subset problem
    5.4.2 Classification versus representation
    5.4.3 Classifier-independent feature analysis for classification
    5.4.4 Classifier-independent feature extraction
    5.4.5 How useful is a feature: separability between classes
    5.4.6 Classifier-independent feature analysis in practice
    5.4.7 Potential for separation: nonparametric feature extraction
    5.4.8 Finding the optimal subset
    5.4.9 Ranking the features
  5.5 Features in practice
    5.5.1 Caveats
    5.5.2 Ultrasound tissue characterization
    5.5.3 Breast MRI
  5.6 Future developments
  5.7 Acknowledgments
  5.8 References

6 Extracting Surface Models of the Anatomy from Medical Images
  6.1 Introduction
  6.2 Surface representations
    6.2.1 Point set
    6.2.2 Triangular mesh
    6.2.3 Curved surfaces
  6.3 Iso-surface extraction
    6.3.1 Hexahedral decomposition
    6.3.2 Tetrahedral decomposition
    6.3.3 A look-up procedure to replace the determinant test
    6.3.4 Computing surface curvatures
    6.3.5 Extracting rib (or ridge, or crest) lines
    6.3.6 Iso-surface examples
  6.4 Building surfaces from two-dimensional contours
    6.4.1 Extracting two-dimensional contours from an image
    6.4.2 Tiling contours into a surface portion
  6.5 Some topological issues in deformable surfaces
    6.5.1 Tensor-product B-splines
    6.5.2 Dynamic changes of topology
  6.6 Optimization
    6.6.1 Smoothing
    6.6.2 Simplification and levels of detail
  6.7 Exemplary algorithms operating on polygonal surfaces
    6.7.1 Apparent contours and perspective registration
    6.7.2 Surface projection for x-ray simulations
  6.8 Conclusion and perspective
  6.9 References

7 Medical Image Interpretation
  7.1 Introduction
  7.2 Image segmentation
    7.2.1 Algorithms for extracting image primitives
    7.2.2 Knowledge-based segmentation
  7.3 Feature-based labeling/classification
    7.3.1 Feature extraction from image primitives
    7.3.2 Labeling/classification of image primitives
  7.4 Knowledge representations and high-level image analysis
    7.4.1 Knowledge representations
    7.4.2 Classification using a knowledge base
    7.4.3 High-level feature representation
    7.4.4 High-level image interpretation
  7.5 Image interpretation systems
    7.5.1 Rule-based
    7.5.2 Semantic network-based
    7.5.3 Atlas-based
  7.6 Applications
    7.6.1 Visualization
    7.6.2 Image measurements
    7.6.3 Characterization
    7.6.4 Content-based image archiving/retrieval
  7.7 Discussion
  7.8 References

COLOR PLATES

8 Image Registration
  8.1 Introduction
    8.1.1 Operational goal of registration
    8.1.2 Classification of registration methods
  8.2 Geometrical transformations
    8.2.1 Rigid transformations
    8.2.2 Nonrigid transformations
    8.2.3 Rectification
  8.3 Point-based methods
    8.3.1 Points in rigid transformations
    8.3.2 Points in scaling transformations
    8.3.3 Points in perspective projections
    8.3.4 Points in curved transformations
  8.4 Surface-based methods
    8.4.1 Disparity functions
    8.4.2 Head and hat algorithm
    8.4.3 Distance definitions
    8.4.4 Distance transform approach
    8.4.5 Iterative closest point algorithm
    8.4.6 Weighted geometrical feature algorithm
  8.5 Intensity-based methods
    8.5.1 Similarity measures
    8.5.2 Capture ranges and optimization
    8.5.3 Applications of intensity-based methods
  8.6 Conclusion
  8.7 Acknowledgments
  8.8 References

9 Signal Modeling for Tissue Characterization
  9.1 Introduction
  9.2 Continuous-to-discrete transformations
  9.3 Ultrasonic waveform models
    9.3.1 Object models
    9.3.2 Pulse models
    9.3.3 Sensitivity function
    9.3.4 Object deformation model
    9.3.5 Echo model for elasticity imaging
  9.4 Ultrasonic applications
  9.5 Magnetic resonance waveform models
    9.5.1 Object description
    9.5.2 The MR data set
  9.6 Continuous-to-discrete transformations revisited
  9.7 MR applications
    9.7.1 T1 and T2 as tissue parameters
    9.7.2 Synthetic imaging
    9.7.3 Contrast uptake studies
    9.7.4 Spectroscopy
  9.8 Summary
  9.9 Acknowledgements
  9.10 References

10 Validation of Medical Image Analysis Techniques
  10.1 Introduction
  10.2 Types of image analysis problems
  10.3 Definitions of basic performance metrics
    10.3.1 Performance metrics for measurement problems
    10.3.2 Performance metrics for detection problems
    10.3.3 Performance metrics for image segmentation
  10.4 Methodologies for training and testing
    10.4.1 Half-half
    10.4.2 Leave-one-out
    10.4.3 N-way cross-validation
  10.5 Statistical tests
    10.5.1 Test for difference between two sample means
    10.5.2 Test for difference between two ROCs
    10.5.3 Test for difference between two FROCs
  10.6 Practical pitfalls in estimating performance
    10.6.1 Use of synthetic data
    10.6.2 Data sets of varying difficulty
    10.6.3 Inadvertently cleaned data sets
    10.6.4 Sensor-specific artifacts in images
    10.6.5 Subjectively-defined detection criteria
    10.6.6 Poor train/test methodology
    10.6.7 Problems related to the ground truth
  10.7 Conclusions
  10.8 Discussion
  10.9 Acknowledgments
  10.10 References

11 Echocardiography
  11.1 Introduction
    11.1.1 Overview of cardiac anatomy
    11.1.2 Normal physiology
    11.1.3 Role of ultrasound in cardiology
  11.2 The echocardiographic examination
    11.2.1 Image acquisition in standard views
    11.2.2 M-mode and two-dimensional echocardiography
    11.2.3 Doppler echocardiography
    11.2.4 Three-dimensional echocardiography
  11.3 The ventricles
    11.3.1 Ventricular volume
    11.3.2 Ventricular mass
    11.3.3 Ventricular function
    11.3.4 Ventricular shape
    11.3.5 Clinical evaluation of the left ventricle
  11.4 The valves
    11.4.1 Assessment of the valves from echocardiograms
    11.4.2 Valve stenosis
    11.4.3 Valve regurgitation
  11.5 Automated analysis
    11.5.1 Techniques for border detection from echocardiographic images
    11.5.2 Validation
  11.6 Acknowledgments
  11.7 References

12 Cardiac Image Analysis: Motion and Deformation
  12.1 Introduction
  12.2 Invasive approaches to measuring myocardial deformation
  12.3 Approaches to obtaining estimates of cardiac deformation from 4D images
    12.3.1 Methods relying on magnetic resonance tagging
    12.3.2 Methods relying on phase contrast MRI
    12.3.3 Computer-vision-based methods
  12.4 Modeling used for interpolation and smoothing
  12.5 Case study: 3D cardiac deformation
    12.5.1 Obtaining initial displacement data
    12.5.2 Modeling the myocardium
    12.5.3 Integrating the data and model terms
    12.5.4 Results
  12.6 Validation of results
  12.7 Conclusions and further research directions
  12.8 Appendix A: Comparison of mechanical models to regularization
  12.9 References

13 Angiography and Intravascular Ultrasound
  13.1 Introduction
  13.2 X-ray angiographic imaging
    13.2.1 Image acquisition
    13.2.2 Basic principles of quantitative coronary arteriography
    13.2.3 Complex vessel morphology
    13.2.4 Densitometry
    13.2.5 QCA validation
    13.2.6 Off-line and on-line QCA
    13.2.7 Digital QCA
    13.2.8 Geometric corrections of angiographic images
    13.2.9 Image compression
    13.2.10 Edge enhancement
    13.2.11 Future QCA directions
  13.3 Biplane angiography and 3D reconstruction of coronary trees
    13.3.1 Assessment of 3D geometry for biplane angiographic systems
    13.3.2 3D reconstruction of vessel trees from biplane angiography
    13.3.3 Medical applications of 3D reconstructions
  13.4 Introduction to intravascular ultrasound
    13.4.1 Quantitative intravascular ultrasound
    13.4.2 Three-dimensional analysis of IVUS image sequences
  13.5 Fusion of biplane angiography and IVUS
    13.5.1 3D reconstruction of the IVUS catheter trajectory
    13.5.2 Relative orientation of the IVUS frames: Catheter twisting
    13.5.3 Absolute orientation of the IVUS frames: A non-iterative approach
    13.5.4 Visualization techniques for fusion results
    13.5.5 In-vitro validation
    13.5.6 Application and validation in vivo
  13.6 Left ventriculography
    13.6.1 Visual assessment
    13.6.2 Quantitative assessment of ventricular volume
    13.6.3 Quantitative assessment of regional wall motion
    13.6.4 Quantitative x-ray left ventriculography
    13.6.5 Future expectations on fully automated assessment of the left ventricular outlines
  13.7 Acknowledgments
  13.8 References

14 Vascular Imaging and Analysis
  14.1 Introduction
    14.1.1 Vascular imaging approaches
  14.2 Ultrasound analysis of peripheral artery disease
    14.2.1 Vascular ultrasound imaging
    14.2.2 Intima-media thickness: carotid ultrasound image analysis
    14.2.3 Vascular reactivity and endothelial function: brachial ultrasound image analysis
  14.3 Magnetic resonance angiography
    14.3.1 Vascular disease
    14.3.2 Vascular imaging in clinical practice
    14.3.3 Principles of MRA
    14.3.4 Spatial encoding, spatial resolution, and k-space
    14.3.5 MR properties of blood and MR contrast agents
    14.3.6 Black blood MRA
    14.3.7 Bright blood MRA without exogenous contrast
    14.3.8 Contrast-enhanced bright blood MRA
    14.3.9 MRA image display
    14.3.10 Quantitative analysis of MRA images
    14.3.11 Vasculature assessment via tubular object extraction and tree growing
    14.3.12 Arterial visualization via suppression of major overlapping veins
    14.3.13 Knowledge-based approach to vessel detection and artery-vein separation
    14.3.14 Fuzzy connectivity approach to vessel detection and artery-vein separation
  14.4 Computed tomography angiography and assessment of coronary calcification
    14.4.1 Quantitative analysis of coronary calcium via EBCT
  14.5 Acknowledgments
  14.6 References

15 Computer-Aided Diagnosis in Mammography
  15.1 Introduction
  15.2 Breast cancer
  15.3 Radiographic manifestations of breast cancer
    15.3.1 Radiographic screening for breast cancer
  15.4 Image requirements in mammography
    15.4.1 Technical performance of mammography
    15.4.2 Positioning and compression
    15.4.3 X-ray tubes and generators
    15.4.4 Recording systems and scatter rejection
    15.4.5 Regulation of mammography
  15.5 Digitization
  15.6 Computerized analysis of mammograms
    15.6.1 Computer-aided detection
    15.6.2 Computer-aided diagnosis
  15.7 Segmentation of breast region and preprocessing
  15.8 Lesion extraction
  15.9 Feature extraction
    15.9.1 Mass lesions
    15.9.2 Clustered microcalcifications
  15.10 Feature selection
    15.10.1 1D analysis
    15.10.2 Stepwise feature selection
    15.10.3 Genetic algorithm
    15.10.4 Selection with a limited database
  15.11 Classifiers
    15.11.1 Linear discriminant analysis
    15.11.2 Artificial neural networks
    15.11.3 Bayesian methods
    15.11.4 Rule-based
    15.11.5 MOGA (multi-objective genetic algorithm)
  15.12 Presentation of CAD results
  15.13 Evaluation of computer analysis methods
    15.13.1 Effect of database
    15.13.2 Effect of scoring
  15.14 Evaluation of computer analysis method as an aid
  15.15 (Pre-)clinical experiences and commercial systems
  15.16 Discussion and summary
  15.17 Acknowledgements
  15.18 References

16 Pulmonary Imaging and Analysis
  16.1 Introduction
    16.1.1 Overview of pulmonary anatomy
    16.1.2 Clinical applications
    16.1.3 Imaging modalities
  16.2 Image segmentation and analysis
    16.2.1 Lung and lobe segmentation
    16.2.2 Airway and vascular tree segmentation
  16.3 Applications
    16.3.1 Tissue analysis
    16.3.2 Functional analysis
    16.3.3 Virtual bronchoscopy
  16.4 Summary and future directions
  16.5 References

17 Brain Image Analysis and Atlas Construction
  17.1 Challenges in brain image analysis
    17.1.1 Image analysis and brain atlases
    17.1.2 Adaptable brain templates
    17.1.3 Mapping structural differences
    17.1.4 Probabilistic atlases
    17.1.5 Encoding cortical variability
    17.1.6 Disease-specific atlases
    17.1.7 Dynamic (4D) brain data
  17.2 Registration to an atlas
    17.2.1 The Talairach system
    17.2.2 Digital templates
  17.3 Deformable brain atlases
    17.3.1 Atlas-to-brain transformations
  17.4 Warping algorithms
    17.4.1 Intensity-driven approaches
    17.4.2 Bayesian methods
    17.4.3 Polynomial mappings
    17.4.4 Continuum-mechanical transformations
    17.4.5 Navier-Stokes equilibrium equations
    17.4.6 Viscous fluid approaches
    17.4.7 Acceleration with fast filters
    17.4.8 Neural network implementations
  17.5 Model-driven deformable atlases
    17.5.1 Anatomical modeling
    17.5.2 Parametric meshes
    17.5.3 Automated parameterization
    17.5.4 Voxel coding
    17.5.5 Model-based deformable atlases
  17.6 Probabilistic atlases and model-based morphometry
    17.6.1 Anatomical modeling
    17.6.2 Parametric mesh models
    17.6.3 3D maps of variability and asymmetry
    17.6.4 Alzheimer's disease
    17.6.5 Gender in schizophrenia
  17.7 Cortical modeling and analysis
    17.7.1 Cortical matching
    17.7.2 Spherical, planar maps of cortex
    17.7.3 Covariant field equations
  17.8 Cortical averaging
    17.8.1 Cortical variability
    17.8.2 Average brain templates
    17.8.3 Uses of average templates
  17.9 Deformation-based morphometry
    17.9.1 Deformable probabilistic atlases
    17.9.2 Encoding brain variation
    17.9.3 Tensor maps of directional variation
    17.9.4 Anisotropic Gaussian fields
    17.9.5 Detecting shape differences
    17.9.6 Tensor-based morphometry
    17.9.7 Mapping brain asymmetry
    17.9.8 Changes in asymmetry
    17.9.9 Abnormal asymmetry
    17.9.10 Model-based shape analysis
  17.10 Voxel-based morphometry
    17.10.1 Detecting changes in stereotaxic tissue distribution
    17.10.2 Stationary Gaussian random fields
    17.10.3 Statistical flattening
    17.10.4 Permutation
    17.10.5 Joint assessment of shape and tissue distribution
  17.11 Dynamic (4D) brain maps
  17.12 Conclusion
  17.13 Acknowledgments
  17.14 References

18 Tumor Imaging, Analysis, and Treatment Planning
  18.1 Introduction
  18.2 Medical imaging paradigms
  18.3 Dynamic imaging
  18.4 Conventional and physiological imaging
  18.5 Tissue-specific and physiologic nuclear medicine modalities
  18.6 Positron emission tomography
  18.7 Dynamic contrast-enhanced MRI
  18.8 Functional CT and MRI
  18.9 Perfusion
  18.10 Perfusion MRI
  18.11 Future
  18.12 References

19 Soft Tissue Analysis via Finite Element Modeling
  19.1 Introduction
    19.1.1 Motivation
    19.1.2 Applications
    19.1.3 Previous work
  19.2 Theoretical background
    19.2.1 Application of active contours
    19.2.2 Measuring rigidity of the objects
    19.2.3 Finite element method
    19.2.4 Finite element implementation of large strain nonlinearities
    19.2.5 Iterative descent search in one dimension
  19.3 Human skin, neck, and hand modeling and motion analysis
    19.3.1 Methods, assumptions, and system accuracy
    19.3.2 Modeling, strain distributions, and abnormalities
    19.3.3 Experimental results of skin and neck motion with simulated abnormalities
    19.3.4 Hand motion analysis
  19.4 Burn scar assessment technique
    19.4.1 Overview
    19.4.2 Specifics of grid tracking
    19.4.3 Burn scar experiments
  19.5 Advanced assessment and modeling issues
    19.5.1 Additional issues in scar assessment
    19.5.2 Integration of recovered properties into complex models
  19.6 Conclusions
  19.7 References

Index

Preface to the Handbook of Medical Imaging


During the last few decades of the twentieth century, partly in concert with the
increasing availability of relatively inexpensive computational resources, medical imaging technology, which had for nearly 80 years been almost exclusively
concerned with conventional film/screen x-ray imaging, experienced the development and commercialization of a plethora of new imaging technologies. Computed
tomography, MR imaging, digital subtraction angiography, Doppler ultrasound imaging, and various imaging techniques based on nuclear emission (PET, SPECT,
etc.) have all been valuable additions to the radiologist's arsenal of imaging tools
toward ever more reliable detection and diagnosis of disease. More recently, conventional x-ray imaging technology itself is being challenged by the emerging possibilities offered by flat panel x-ray detectors. In addition to the concurrent development of rapid and relatively inexpensive computational resources, this era of
rapid change owes much of its success to an improved understanding of the information theoretic principles on which the development and maturation of these
new technologies is based. A further important corollary of these developments
in medical imaging technology has been the relatively rapid development and deployment of methods for archiving and transmitting digital images. Much of this
engineering development continues to make use of the ongoing revolution in rapid
communications technology offered by increasing bandwidth.
A little more than 100 years after the discovery of x rays, this three-volume
Handbook of Medical Imaging is intended to provide a comprehensive overview
of the theory and current practice of Medical Imaging as we enter the twenty-first
century. Volume 1, which concerns the physics and the psychophysics of medical
imaging, begins with a fundamental description of x-ray imaging physics and progresses to a review of linear systems theory and its application to an understanding
of signal and noise propagation in such systems. The subsequent chapters concern the physics of the important individual imaging modalities currently in use:
ultrasound, CT, MRI, the recently emerging technology of flat-panel x-ray detectors and, in particular, their application to mammography. The second half of this
volume, which covers topics in psychophysics, describes the current understanding of the relationship between image quality metrics and visual perception of the
diagnostic information carried by medical images. In addition, various models of
perception in the presence of noise or unwanted signal are described. Lastly, the
statistical methods used in determining the efficacy of medical imaging tasks, and
ROC analysis and its variants, are discussed.
Volume 2, which concerns Medical Image Processing and Image Analysis, provides descriptions of the methods currently being used or developed for enhancing
the visual perception of digital medical images obtained by a wide variety of imaging modalities and for image analysis as a possible aid to detection and diagnosis. Image analysis may be of particular significance in future developments, since,
aside from the inherent efficiencies of digital imaging, the possibility of performing
analytic computation on digital information offers exciting prospects for improved
detection and diagnostic accuracy.
Lastly, Volume 3 describes the concurrent engineering developments that in
some instances have actually enabled further developments in digital diagnostic
imaging. Among the latter, the ongoing development of bright, high-resolution
monitors for viewing high-resolution digital radiographs, particularly for mammography, stands out. Other efforts in this field offer exciting, previously inconceivable
possibilities, e.g., the use of 3D (virtual reality) visualization for surgical planning
and for image-guided surgery. Another important area of ongoing research in this
field involves image compression, which in concert with increasing bandwidth enables rapid image communication and increases storage efficiency. The latter will
be particularly important with the expected increase in the acceptance of digital
radiography as a replacement for conventional film/screen imaging, which is expected to generate data volumes far in excess of currently available capacity. The
second half of this volume describes current developments in Picture Archiving and
Communications System (PACS) technology, with particular emphasis on integration of the new and emerging imaging technologies into the hospital environment
and the provision of means for rapid retrieval and transmission of imaging data.
Developments in rapid transmission are of particular importance since they will
enable access via telemedicine to remote or underdeveloped areas.
As evidenced by the variety of the research described in these volumes, medical
imaging is still undergoing very rapid change. The editors hope that this publication
will provide at least some of the information required by students, researchers,
and practitioners in this exciting field to make their own contributions to its ever
increasing usefulness.
Jacob Beutel
J. Michael Fitzpatrick
Steven C. Horii
Yongmin Kim
Harold L. Kundel
Milan Sonka
Richard L. Van Metter

Introduction to Volume 2: Medical Image


Processing and Analysis
The subject matter of this volume, which is well described by its name, Medical
Image Processing and Analysis, has not until now been the focus of a rigorous,
detailed, and comprehensive book. While there are many such books available for
the more general subjects of image processing and image analysis, their broader
scope does not allow for a thorough examination of the problems and approaches
to their solution that are specific to medical applications. It is the purpose of this
work to present the ideas and the methods of image processing and analysis that
are at work in the field of medical imaging.
There is much common ground, of course. Image processing, whether it be
applied to robotics, computer vision, or medicine, will treat imaging geometry, linear transforms, shift-invariance, the frequency domain, digital versus continuous
domains, segmentation, histogram analysis, morphology, and other topics that apply to any imaging modality and any application. Image analysis, regardless of its
application area, encompasses the incorporation of prior knowledge, the classification of features, the matching of models to subimages, the description of shape, and
many of the generic problems and approaches of artificial intelligence. However,
while these classic approaches to general images and to general applications are
important, the special nature of medical images and medical applications requires
special treatment. This volume emphasizes those approaches that are appropriate
when medical images are the subjects of processing and analysis. With the emphasis placed firmly on medical applications and with the accomplishments of the
more general field used as a starting point, the chapters that follow are able, individually, to treat their respective topics thoroughly, and they serve, collectively, to
describe the current state of the field of medical image processing and analysis in
great depth.
The special nature of medical images derives as much from their method of
acquisition as it does from the subjects whose images are being acquired. While
surface imaging is used in some applications, for example for the examination of
the properties of skin in Chapter 19, medical imaging has been distinguished primarily by its ability to provide information about the volume beneath the surface,
a capability that sprang first from the discovery of x radiation some one hundred
years ago. Indeed, images are obtained for medical purposes almost exclusively to
probe the otherwise invisible anatomy below the skin. This information may be in
the form of the two-dimensional projections acquired by traditional radiography,
the two-dimensional slices of B-mode ultrasound, or full three-dimensional mappings, such as those provided by computed tomography (CT), magnetic resonance
(MR) imaging, single photon emission computed tomography (SPECT), positron
emission tomography (PET), and 3D ultrasound. Volume 1 in this series provides
a detailed look at the current state of these modalities.
In the case of radiography, perspective projection maps physical points into image space in the same way as photography, but the detection and classification of
objects is confounded by the presence of overlying or underlying tissue, a problem rarely considered in general works on image analysis. In the case of tomography, three-dimensional images bring both complications and simplifications to
processing and analysis relative to two-dimensional ones: The topology of three
dimensions is more complex than that of two dimensions, but the problems associated with perspective projection and occlusion are gone. In addition to these
geometrical differences, medical images typically suffer more from the problems
of discretization, where larger pixels (voxels in three dimensions) and lower resolution combine to reduce fidelity. Additional limitations to image quality arise
from the distortions and blurring associated with relatively long acquisition times
in the face of inevitable anatomical motion (primarily cardiac and pulmonary), and
reconstruction errors associated with noise, beam hardening, etc. These and other
differences between medical and nonmedical techniques of image acquisition account for many of the differences between medical and nonmedical approaches to
processing and analysis.
The fact that medical image processing and analysis deal mostly with living bodies brings other major differences in comparison to computer or robot vision. The objects of interest are soft and deformable with three-dimensional shapes
whose surfaces are rarely rectangular, cylindrical, or spherical and whose features
rarely include the planes or straight lines that are so frequent in technical vision
applications. There are however major advantages in dealing with medical images
that contribute in a substantial way to the analysis design. The available knowledge
of what is and what is not normal human anatomy is one of them. Recent advances
in selective enhancement of specific organs or other objects of interest via the injection of contrast-enhancing material represent others. All these differences affect
the way in which images are effectively processed and analyzed.
Validation of the developed medical image processing and analysis techniques
is a major part of any medical imaging application. While validating the results of
any methodology is always important, the scarcity of accurate and reliable independent standards creates yet another challenge for the medical imaging field.
Medical image processing deals with the development of problem-specific approaches to the enhancement of raw medical image data for the purposes of selective visualization as well as further analysis. Medical image analysis then concentrates on the development of techniques to supplement the mostly qualitative
and frequently subjective assessment of medical images by human experts with
a variety of new information that is quantitative, objective, and reproducible. Of
course, a treatment of medical image processing and analysis without a treatment
of the methods by which images are acquired, displayed, transmitted, and stored
would provide only a limited view of the field. The two accompanying volumes
of this handbook complete the picture with complementary information on all of
these topics. Image acquisition approaches are presented in Volume 1, and image
visualization, virtual reality, image transmission, compression, and archiving are
dealt with in Volume 3.
The volume you hold in your hands is a result of the work of a dedicated team
of researchers in medical image processing and analysis. The editors have worked
very closely with the authors of individual chapters to produce a coherent volume
with a uniformly deep treatment of all its topics, as well as to provide a comprehensive coverage of the field. Its pages include many cross references that further
enhance the usability of the nineteen chapters, which treat separate but frequently
interrelated topics. The book is loosely divided into two parts: Generally applicable theory is provided in Chapters 1-10, with the remaining chapters devoted more
specifically to separate application areas. Nevertheless, many general approaches
are presented in the latter group of chapters in synergy with information that is
pertinent to specific applications. Each of the chapters is accompanied by numerous figures, example images, and abundant references to the literature for further
reading.
The first part of this volume, which emphasizes general theory, begins with a
rigorous treatment of statistical image reconstruction in Chapter 1. Its author, J.
Fessler, deals with the problems of tomographic reconstruction when the number
of detected photons is so low that Poisson statistics must be taken into account. In
this regime standard back-projection methods fail, but maximum likelihood methods, if properly applied, can still produce good images. Fessler gives a rigorous
presentation of optimization methods for this problem with assessments of their
practical implementation, telling us what works and what does not. The focus is on
attenuation images, but the simpler problem of emission tomography is treated as
well.
Image segmentation, which is the partitioning of an image into regions that are
meaningful for a specific task, is one of the first steps leading to image analysis and
interpretation. Chapter 2, which presents this subject, was written by B. Dawant
and A. Zijdenbos and deals with the detection of organs such as the heart, the liver,
the brain, or the lungs in images acquired by various imaging modalities.
Image segmentation using deformable models is the topic of Chapter 3, written by C. Xu, D. L. Pham, and J. L. Prince. Parametric and geometric deformable
models are treated in a unifying way, and an explicit mathematical relationship between them is presented. The chapter also provides a comprehensive overview of many extensions to deformable models including deformable Fourier models, deformable superquadrics, point distribution models, and active appearance models.
Chapter 4, prepared by J. Goutsias and S. Batman, provides a comprehensive
treatment of binary as well as gray-scale mathematical morphology. The theoretical concepts are illustrated on examples demonstrating their direct applicability to
problems in medical image processing and analysis.
Chapter 5 is devoted to the extraction of description features from medical
images and was written by M. Loew. The text summarizes the need for image features, categorizes them in several ways, presents the constraints that may determine
which features to employ in a given application, defines them mathematically, and
gives examples of their use in research and in clinical settings.
A. Gueziec authored Chapter 6, which describes methods for extracting surface models of the anatomy from medical images, choosing an appropriate surface
representation, and optimizing surface models. This chapter provides detailed algorithms for surface representation and evaluates and compares their performance
on real-life examples.
Image interpretation is one of the ultimate goals of medical imaging and utilizes techniques of image segmentation, feature description, and surface representation. It is heavily dependent on a priori knowledge and available approaches for
pattern recognition, general interpretation, and understanding. It is also a frequent
prerequisite for highly automated quantitative analysis. This topic is treated in
Chapter 7 by M. Brown and M. McNitt-Gray.
Chapter 8 was authored by J. M. Fitzpatrick, D. L. G. Hill, and C. R. Maurer,
Jr. and presents the field of image registration. The goal of registration, which
is simply to map points in one view into corresponding points in a second view,
is important when information from two images is to be combined for diagnosis
or when images are used to guide surgery. This chapter presents the theoretical as
well as experimental aspects of the problem and describes many approaches to its
solution, emphasizing the highly successful application to rigid objects.
One of the promising directions of medical image analysis is the potential ability to perform soft tissue characterization from image-based information. This
topic is treated in Chapter 9 by M. Insana, K. Myers, and L. Grossman. Tissue
characterization is approached as a signal processing problem of extracting and
presenting diagnostic information obtained from medical image data to improve
classification performance, or to more accurately describe biophysical mechanisms.
This chapter discusses in detail the difficulties resulting from the lack of accurate
models of image signals and provides an insight into tissue modeling strategies.
Validation of the image analysis techniques is a necessity in medical imaging.
In Chapter 10, K. Bowyer focuses on the problem of measuring the performance
of medical image processing and analysis techniques. In this context, performance
relates to the frequency with which an algorithm results in a correct decision. The
chapter provides an overview of basic performance metrics, training and testing methodologies, and methods for statistical testing, and it draws attention to
commonly occurring flaws of validation.
Chapter 11 opens the second part of this volume, which emphasizes applications, by providing detailed information about cardiac anatomy in the context of
echocardiographic imaging and analysis. The chapter was written by F. Sheehan, D. Wilson, D. Shavelle, and E. A. Geiser and consists of sections devoted
to echocardiographic imaging including 3-D echocardiography, echocardiographic
assessment of ventricular volume, mass, and function, and their clinical consequences. Separate sections are devoted to imaging and analysis of valvular morphology and function as well as to an overview of available automated analysis
approaches and their validation.
Ventricular motion and function is a topic of Chapter 12, which was contributed
by X. Papademetris and J. Duncan. This chapter further explores the diagnostic utility of estimating cardiac motion and deformation from medical images.
The authors focus primarily on the use of 3D MR image sequences, while also discussing the applications to ultrafast CT and 3D ultrasound. Description of magnetic
resonance tagging, tag detection, and phase-contrast methods are all included and
motion assessment is discussed in the context of the corresponding image data.
The following chapter, Chapter 13, is authored by J. H. C. Reiber, G. Koning, J. Dijkstra, A. Wahle, B. Goedhart, F. Sheehan, and M. Sonka and deals with
minimally invasive approaches to imaging the heart and coronary arteries using
contrast angiography and intravascular ultrasound. This chapter summarizes the
preprocessing of angiography images, geometric correction techniques, and analysis approaches leading to quantitative coronary angiography as well as quantitative
left ventriculography. Approaches for three-dimensional reconstruction from biplane angiography projections are discussed. Later sections deal with quantitative
intravascular ultrasound techniques and introduce methodology for image data fusion of biplane angiography and intravascular ultrasound to achieve a geometrically
correct representation of coronary lumen and wall morphology.
Chapter 14 treats ultrasound, MR, and CT approaches to non-invasive vascular imaging and subsequent image analysis. The chapter, written by M. Sonka,
A. Stolpen, W. Liang, and R. Stefancik, covers the determination of intima-media
thickness using carotid ultrasound, assessment of brachial artery endothelial function, as well as the imaging of peripheral and brain vasculature via MR angiography and x-ray CT angiography. The chapter contains methods for determining
the topology and morphology of vascular structures and demonstrates how x-ray
CT can be used to determine coronary calcification. Overall, facilitating early diagnosis of cardiovascular disease is one of the main goals of the methodologies
presented.
Mammography accounts for one of the most challenging, as well as most
promising, recent additions to the set of highly automated applications of medical
imaging. In Chapter 15, authors M. L. Giger, Z. Huo, M. A. Kupinski, and C.
J. Vyborny, provide a comprehensive treatment of computer-aided diagnosis as a
second opinion for the mammographer. The entire process of image acquisition,
segmentation, lesion extraction, and classification is treated in this comprehensive
chapter along with a careful look at the important problem of clinical validation.
Pulmonary imaging and analysis is the topic of Chapter 16, prepared by J.
Reinhardt, R. Uppaluri, W. Higgins, and E. Hoffman. After a brief overview of pulmonary anatomy and a survey of methods and clinical applications of pulmonary
imaging, the authors discuss pulmonary image analysis leading to the segmentation of lungs and lobes, vascular and airway tree segmentation, as well as the role
of virtual bronchoscopy. Approaches for the characterization of pulmonary tissue
are discussed followed by sections devoted to pulmonary mechanics, image-based
perfusion and ventilation, and multi-modality data fusion.
Chapter 17, authored by P. M. Thompson, M. S. Mega, K. L. Narr, E. R. Sowell,
R. E. Blanton, and A. W. Toga, presents the subjects of brain imaging, analysis,
and atlas construction. The authors describe brain atlases that fuse data across
subjects, imaging modalities, and time, storing information on variations in brain
structure and function in large populations. The chapter then reviews the main
types of algorithms used in brain image analysis, including approaches for nonrigid image registration, anatomical modeling, tissue classification, cortical surface
mapping, and shape analysis. Applications include the use of atlases to uncover
disease-specific patterns of brain structure and function, and to analyze the dynamic processes of brain development and degeneration.
Chapter 18, by M. W. Vannier, is devoted to tumor imaging, analysis, and
cancer treatment planning. The chapter summarizes the use of imaging in diagnosis and treatment of solid tumors. It emphasizes current imaging technologies
and image processing methods used to extract information that can guide and monitor interventions after cancer has been detected leading to initial diagnosis and
staging.
Chapter 19, the concluding chapter of this volume, deals with light imaging of
soft tissue movement and its finite element modeling. It was contributed by L.
V. Tsap, D. B. Goldgof, and S. Sarkar. The main topic is the analysis of soft tissue
motion descriptors not easily recoverable from visual observations. The descriptors
include strain and initially unknown (or hard to observe) local material properties.
New methods for human tissue motion analysis from range image sequences using the nonlinear finite element method are provided, and their practical utility is
demonstrated using assessment of burn scar tissue severity and the extent of repetitive motion injury.
Medical image processing and analysis has, over the last thirty years or so,
evolved from an assortment of medical applications into an established discipline.
The transition has been achieved through the cooperation of a large and growing
number of talented scientists, engineers, physicians, and surgeons, many of whose
ideas and accomplishments are detailed by the authors of these chapters. We have produced this volume in order to make these achievements accessible to researchers
both inside and outside the medical imaging field. It is our hope that its publication
will encourage others to join us in the common goal of improving the diagnosis and
treatment of disease and injury by means of medical imaging.
Acknowledgments
This volume was prepared using the LaTeX2e publishing tool. All communication between the editors and authors was carried out over the Internet via E-mail,
FTP, and the Web. All the authors, who contributed the individual chapters and
agreed to adhere to strict deadlines and the demanding requirements of common
formatting and cross referencing across chapter boundaries, share in the credit for
the final product as you see it. Their enthusiasm, patience, and cooperation helped
us to overcome all the obstacles that we encountered during preparation of this
publication.
Several other people contributed enthusiasm and expertise during the fourteen
months devoted to preparation of this volume. We acknowledge Juerg Tschirren,
graduate student of the Department of Electrical and Computer Engineering at the
University of Iowa, for sharing his deep knowledge of the LaTeX2e environment,
solving hundreds of small problems that we all faced as authors, as well as for
designing procedures for the exchange of information that served us all so well and
that led to the successful publication of this camera-ready volume.
Ken Hanson of Los Alamos National Laboratory was present at the beginning
of this project and was in fact responsible for getting it underway. We wish to
thank him for editorial help and for his advice and encouragement. We acknowledge Rick Hermann, SPIE Press Manager, for his visionary support of this project,
and we acknowledge the support that we received from our respective home universities: The University of Iowa at Iowa City and Vanderbilt University, Nashville,
Tennessee.
Finally, we wish to thank those who gave us time to spend on this book that
otherwise would have been spent with them: Jitka Sonkova, Marketa Sonkova,
Pavlina Sonkova, Patricia F. Robinson, Katherine L. Fitzpatrick, John E. Fitzpatrick, and Dorothy M. Fitzpatrick. Without their patience, understanding, and
support, this work would not have been possible.

Milan Sonka
milan-sonka@uiowa.edu

J. Michael Fitzpatrick
j.michael.fitzpatrick@vanderbilt.edu

CHAPTER 1
Statistical Image Reconstruction Methods for
Transmission Tomography
Jeffrey A. Fessler
University of Michigan

Contents

1.1 Introduction
1.2 The problem
  1.2.1 Transmission measurements
  1.2.2 Reconstruction problem
  1.2.3 Likelihood-based estimation
  1.2.4 Penalty function
  1.2.5 Concavity
1.3 Optimization algorithms
  1.3.1 Why so many algorithms?
  1.3.2 Optimization transfer principle
  1.3.3 Convergence rate
  1.3.4 Parabola surrogate
1.4 EM algorithms
  1.4.1 Transmission EM algorithm
  1.4.2 EM algorithms with approximate M-steps
  1.4.3 EM algorithm with Newton M-step
  1.4.4 Diagonally-scaled gradient-ascent algorithms
  1.4.5 Convex algorithm
  1.4.6 Ordered-subsets EM algorithm
  1.4.7 EM algorithms with nonseparable penalty functions
1.5 Coordinate-ascent algorithms
  1.5.1 Coordinate-ascent Newton-Raphson
  1.5.2 Variation 1: Hybrid Poisson/polynomial approach
  1.5.3 Variation 2: 1D parabolic surrogates
1.6 Paraboloidal surrogates algorithms
  1.6.1 Paraboloidal surrogate with Newton-Raphson
  1.6.2 Separable paraboloidal surrogates algorithm
  1.6.3 Ordered subsets revisited
  1.6.4 Paraboloidal surrogates coordinate-ascent (PSCA) algorithm
  1.6.5 Grouped coordinate ascent algorithm
1.7 Direct algorithms
  1.7.1 Conjugate gradient algorithm
  1.7.2 Quasi-Newton algorithm
1.8 Alternatives to Poisson models
  1.8.1 Algebraic reconstruction methods
  1.8.2 Methods to avoid
  1.8.3 Weighted least-squares methods
1.9 Emission reconstruction
  1.9.1 EM Algorithm
  1.9.2 An improved EM algorithm
  1.9.3 Other emission algorithms
1.10 Advanced topics
  1.10.1 Choice of regularization parameters
  1.10.2 Source-free attenuation reconstruction
  1.10.3 Dual energy imaging
  1.10.4 Overlapping beams
  1.10.5 Sinogram truncation and limited angles
  1.10.6 Parametric object priors
1.11 Example results
1.12 Summary
1.13 Acknowledgements
1.14 Appendix: Poisson properties
1.15 References

1.1 Introduction

The problem of forming cross-sectional or tomographic images of the attenuation characteristics of objects arises in a variety of contexts, including medical x-ray
computed tomography (CT) and nondestructive evaluation of objects in industrial
inspection. In the context of emission imaging, such as positron emission tomography (PET) [1, 2], single photon emission computed tomography (SPECT) [3],
and related methods used in the assay of containers of radioactive waste [4], it is
useful to be able to form "attenuation maps," tomographic images of attenuation
coefficients, from which one can compute attenuation correction factors for use in
emission image reconstruction. One can measure the attenuating characteristics of
an object by transmitting a collection of photons through the object along various
paths or rays and observing the fraction that pass unabsorbed. From measurements collected over a large set of rays, one can reconstruct tomographic images of
the object. Such image reconstruction is the subject of this chapter.
In all the above applications, the number of photons one can measure in a
transmission scan is limited. In medical x-ray CT, source strength, patient motion,
and absorbed dose considerations limit the total x-ray exposure. Implanted objects
such as pacemakers also significantly reduce transmissivity and cause severe artifacts [5]. In industrial applications, source strength limitations, combined with the
very large attenuation coefficients of metallic objects, often result in a small fraction of photons passing to the detector unabsorbed. In PET and SPECT imaging,
the transmission scan only determines a nuisance parameter of secondary interest
relative to the objects emission properties, so one would like to minimize the transmission scan duration. All the above considerations lead to low-count transmission scans. This chapter discusses algorithms for reconstructing attenuation images
from low-count transmission scans. In this context, we define low-count to mean
that the mean number of photons per ray is small enough that traditional filteredbackprojection (FBP) images, or even methods based on the Gaussian approximation to the distribution of the Poisson measurements (or logarithm thereof), are
inadequate. We focus the presentation in the context of PET and SPECT transmission scans, but the methods are generally applicable to all low-count transmission
studies. See [6] for an excellent survey of statistical approaches for the emission
reconstruction problem.
Statistical methods for reconstructing attenuation images from transmission
scans have increased in importance recently for several reasons. Factors include
the necessity of reconstructing 2D attenuation maps for reprojection to form 3D
attenuation correction factors in septaless PET [7, 8], the widening availability of
SPECT systems equipped with transmission sources [9], and the potential for reducing transmission noise in whole body PET images and in other protocols requiring short transmission scans [10]. An additional advantage of reconstructing
attenuation maps in PET is that if the patient moves between the transmission and
emission scan, and if one can estimate this motion, then one can calculate appropri-

4 Statistical Image Reconstruction Methods


ate attenuation correction factors by reprojecting the attenuation map at the proper
angles.
The traditional approach to tomographic image reconstruction is based on the
nonstatistical filtered backprojection method [11, 12]. The FBP method and the
data-weighted least-squares method [13, 14] for transmission image reconstruction
both lead to systematic biases for low-count scans [1416]. These biases are due
to the nonlinearity of the logarithm applied to the transmission data. To eliminate these biases, one can use statistical methods based on the Poisson measurement statistics. These method use the raw measurements rather than the logarithms
thereof [14, 1719]. Statistical methods also produce images with lower variance
than FBP [14, 16, 20]. Thus, in this chapter we focus on statistical methods.
The organization of this chapter is as follows. Section 1.2 first reviews the
low-count tomographic reconstruction problem. Section 1.3 gives an overview of
the principles underlying optimization algorithms for image reconstruction. Section 1.4 through Section 1.7 describe in detail four categories of reconstruction
algorithms: expectation maximization, coordinate ascent, paraboloidal surrogates,
and direct algorithms. All these algorithms are presented for the Poisson statistical
model; Section 1.8 summarizes alternatives to that approach. Section 1.9 briefly
summarizes application of the algorithms to emission reconstruction. Section 1.10
gives an overview of some advanced topics. Section 1.11 presents illustrative results for real PET transmission scans.
A few of the algorithms presented are new in the sense that they are derived
here under more realistic assumptions than were made in some of the original papers. And we have provided simple extensions to some algorithms that were not
intrinsically monotonic as previously published, but can be made monotonic by
suitable modifications.
1.2

The problem

In transmission tomography, the quantity of interest is the spatial distribution


of the linear attenuation coefficient, denoted   where         denotes spatial location in 3-space, and the argument parameterizes the dependence
of the attenuation coefficient on incident photon energy [12]. The units of are
typically inverse centimeters (cm ). If the object (patient) is moving, or if there
are variations due to, e.g., flowing contrast agent, then we could also denote the
temporal dependence. For simplicity we assume the object is static in this chapter.
The ideal transmission imaging modality would provide a complete description of
  for a wide range of energies, at infinitesimal spatial and temporal resolutions, at a modest price, and with no harm to the subject. In practice we settle for
much less.

The problem 5
Object



Collimator

Translate

Source



Rotate

Detector

Figure 1.1: Transmission scanning geometry for a 1st-generation CT scanner.

1.2.1

Transmission measurements

The methods described in this chapter are applicable to general transmission


geometries. However, the problem is simplest to describe in the context of a firstgeneration CT scanner as illustrated in Fig. 1.1. A collimated source of photons
with intensity   is transmitted through the attenuating object and the transmitted photons are recorded by a detector with detector efficiency  . The source
and detector are translated and rotated around the object. We use the letter  to index
the source/detector locations, where    
 . Typically
 is the product
of the number of radial positions assumed by the source for each angular position
times the number of angular positions. For 2D acquisitions, in SPECT transmission
scans
   ; in PET transmission scans,
   ; and in modern x-ray CT
systems
   .
For simplicity, we assume that the collimation eliminates scattered photons,
which is called the narrow beam geometry. In PET and SPECT transmission
scans, the source is usually a monoenergetic radioisotope 1 that emits gamma photons with a single energy  , i.e.,



  

 

where  is the Dirac delta function. For simplicity, we assume this monoenergetic case hereafter. (In the polyenergetic case, one must consider effects such as
beam hardening [21].)
The absorption and Compton scattering of photons by the object is governed by
Beers law. Let denote the mean number of photons that would be recorded by
the detector (for the th source-detector position, hereafter referred to as a ray) if
the object were absent. This depends on the scan duration, the source strength,
and the detector efficiency at the source photon energy  . The dependence on 
reflects the fact that in modern systems there are multiple detectors, each of which
1

Some gamma emitting radioisotopes produce photons at two or more distinct energies. If the
detector has adequate energy resolution, then it can separate photons at the energy of interest from
other photons, or bin the various energies separately.

6 Statistical Image Reconstruction Methods


can have its own efficiency. By Beers law, the mean number of photons recorded
for the th ray ideally would be [12]


where
where



 

(1.1)

 is the line or strip between the source and detector for the th ray, and



 





is the linear attenuation coefficient at the source photon energy. The number of
photons actually recorded in practice differs from the ideal expression (1.1) in several ways. First, for a photon-counting detector 2 , the number of recorded photons
is a Poisson random variable [12]. Second, there will usually be additional background counts recorded due to Compton scatter [22], room background, random
coincidences in PET [23,24], or emission crosstalk in SPECT [9,2527]. Third, the
detectors have finite width, so the infinitesimal line integral in (1.1) is an approximation. For accurate image reconstruction, one must incorporate these effects into
the statistical model for the measurements, rather than simply using the idealized
model (1.1).
Let  denote the random variable representing the number of photons counted
for the th ray. A reasonable statistical model 3 for these transmission measurements
is that they are independent Poisson random variables with means given by

 






(1.2)

where  denotes the mean number of background events (such as random coincidences, scatter, and crosstalk). In many papers, the  s are ignored or assumed
to be zero. In this chapter, we assume that the  s are known, which in practice
means that they are determined separately by some other means (such as smoothing
a delayed-window sinogram in PET transmission scans [31]). The noise in these
estimated  s is not considered here, and is a subject requiring further investigation and analysis. In some PET transmission scans, the random coincidences are
subtracted from the measurements in real time. Statistical methods for treating this
problem have been developed [3234], and require fairly simple modifications of
the algorithms presented in this chapter.
We assume the s are known. In PET and SPECT centers, these are determined by periodic blank scans: transmission scans with nothing but air in the
2
For a current integrating detector, such as those used in commercial x-ray CT scanners, the
measurement noise is a mixture of Poisson photon statistics and gaussian electronic noise.
3
Due to the effects of detector deadtime in PET and SPECT, the measurement distributions are
not exactly Poisson [2830], but the Poisson approximation seems adequate in practice.

The problem 7
scanner portal. Since no patient is present, these scans can have fairly long durations (typically a couple of hours, run automatically in the middle of the night).
Thus the estimated s computed from such a long scan have much less variability than the transmission measurements ( s). Therefore, we ignore the variability
in these estimated s. Accounting for the small variability in the estimates is
another open problem (but one likely to be of limited practical significance).
1.2.2

Reconstruction problem

After acquiring a transmission scan, the tomographic reconstruction problem


is to estimate   from a realization     of the measurements. This
collection of measurements is usually called a sinogram 4 [35]. The conventional
approach to this problem is to first estimate the th line integral from the model (1.2)
and then to apply the FBP algorithm to the collection of line-integral estimates.
Specifically, let







denote the true line integral along the th ray. Conventionally, one forms an estimate
of
by computing the logarithm of the measured sinogram as follows:



?




  
 

(1.3)

One then reconstructs an estimate  from   using FBP [35]. There are
several problems with this approach. First, the logarithm is not defined when
   , which can happen frequently in low-count transmission scans. (Typically one must substitute some artificial value (denoted ? above) for such rays,
or interpolate neighboring rays [36], which can lead to biases.) Second, the above
procedure yields biased estimates of the line-integral. By Jensens inequality, since
   is a concave function, for any random variable  ,

   
    

(see [37], p. 50), so when applied to (1.2) and (1.3)



  



 
  


  

 


 



 

(1.4)

Thus, the logarithm in (1.3) systematically over-estimates the line integral on average. This over-estimation has been verified empirically [14, 15]. One can show
4

When the ray measurements are organized as a 2D array according their radial and angular
coordinates, the projection of a point object appears approximately as a sinusoidal trace in the array.

8 Statistical Image Reconstruction Methods

8
6

8
4
2
x2

6
4

2
0

8
4

6
4

Figure 1.2: Illustration of 2D function

2
0

x1

    parameterized using the pixel basis (1.6).

analytically that the bias increases as the counts decrease [14], so the logarithm is
particularly unsuitable for low-count scans. A third problem with (1.3) is that the
variances of the s can be quite nonuniform, so some rays are much more informative than other rays. The FBP method treats all rays equally, even those for which
   is non-positive, which leads to noisy images corrupted by streaks originating
from high variance s. Noise is considered only as an afterthought by apodizing
the ramp filter, which is equivalent to space-invariant smoothing. (There are a few
exceptions where space-variant sinogram filtering has been applied, e.g., [3840].)
Fourth, the FBP method is poorly suited to nonstandard imaging geometries, such
as truncated fan-beam or cone-beam scans, e.g., [4146].
Since noise is a primary concern, the image reconstruction problem is naturally
treated as a statistical estimation problem. Since we only have a finite number

of measurements, it is natural to also represent   with a finite parameterization.
Such parameterizations are reasonable in practice since ultimately the estimate of
 will be viewed on a digital display with a finite number of pixels. After one
has parameterized  , the reconstruction problem becomes a statistical problem:
estimate the parameters from the noisy measurements   .
A general approach to parameterizing the attenuation map is to expand it in
terms of a finite basis expansion [47, 48]:


 




  

(1.5)



where

 is the number of coefficients

and basis functions

 . There are

The problem 9
many possible choices for the basis functions. We would like to choose basis functions that naturally represent nonnegative functions since   . We would also
like basis functions that have compact support, since such a basis yields a very
sparse system matrix in (1.9) below. The conventional basis is just the pixel
or voxel basis, which satisfies both of these requirements. The voxel basis   
is 1 inside the  th voxel, and is 0 everywhere else. In two-space, one can express
the pixel basis by

    

  



  



(1.6)

where     is the center of the  th pixel and  is the pixel width. This basis
gives a piecewise-constant approximation to  , as illustrated in Fig. 1.2. With any
parameterization of the form (1.5), the problem of estimating   is reduced to
the simpler problem of estimating the parameter vector    
 from
the measurement vector      
  where  denotes vector and matrix
transpose. Under the parameterization (1.5), the line integral in (1.2) becomes the
following summation:




 










  





where






  








 

is the line integral 5 along the th ray through the  th basis function. This simplification yields the following discrete-discrete measurement model:

 


    
 

(1.7)

where the ensemble mean of the th measurement is denoted










(1.8)

(1.9)



   . The remainder of this chapter will be based on the Poisson


where
measurement model (1.7). (See Appendix 1.14 for a review of Poisson statistics.)
5

In practice, we use normalized strip integrals [49, 50] rather than line integrals to account for
finite detector width [51]. Regardless, the units of   are length units (mm or cm), whereas the units
of the  s are inverse length.

10 Statistical Image Reconstruction Methods


1.2.3

Likelihood-based estimation

Maximum-likelihood (ML) estimation is a natural approach 6 for finding from


a particular measurement realization    when a statistical model such as (1.7)
is available. The ML estimate is defined as follows:
   
 



     



For the Poisson model (1.7), the measurement joint probability mass function is

 






 





 




 

(1.10)

The ML method seeks the object (as described by the parameter vector ) that maximizes the probability of having observed the particular measurements that were
recorded. The first paper to propose a ML approach for transmission tomography
appears to be due to Rockmore and Macovski in 1977 [47]. However, the pseudoinverse method described in [47] in general does not find the maximizer of the
likelihood  .
For independent transmission measurements, we can use (1.8) and (1.10) to
express the log-likelihood in the following convenient form:




(1.11)

where we use
hereafter for expressions that are equal up to irrelevant constants
independent of , and where the marginal log-likelihood of the th measurement is


   



 

 

(1.12)

A typical  is shown in Fig. 1.4 on page 17. For convenience later, we also list the
derivatives of  here:




    

 
       

6










 

(1.13)

(1.14)

The usual rationale for the ML approach is that ML estimators are asymptotically unbiased
and asymptotically efficient (minimum variance) under very general conditions [37]. Such asymptotic properties alone would be a questionable justification for the ML approach in the case of lowcount transmission scans. However, ML estimators often perform well even in the non-asymptotic
regime. We are unaware of any data-fit measure for low-count transmission scans that outperforms
the log-likelihood, but there is no known proof of optimality of the log-likelihood in this case, so the
question is an open one.

The problem 11
The algorithms described in the following sections are based on various strategies for finding the maximizer of  . Several of the algorithms are quite general
in the sense that one can easily modify them to apply to many objective functions
of the form (1.11), even when  has a functional form different from the form
(1.12) that is specific to transmission measurements. Thus, even though the focus of this chapter is transmission imaging, many of the algorithms and comments
apply equally to emission reconstruction and to other inverse problems.
Maximizing the log-likelihood   alone leads to unacceptably noisy images,
because tomographic image reconstruction is an ill-conditioned problem. Roughly
speaking, this means that there are many choices of attenuation maps   that
fit the measurements   reasonably well. Even when the problem is parameterized, there are many choices of the vector that fit the measurements  
reasonably well, where the fit is quantified by the log-likelihood  . Not all of
those images are useful or physically plausible. Thus, the likelihood alone does
not adequately identify the best image. One effective remedy to this problem is
to modify the objective function by including a penalty function that favors reconstructed images that are piecewise smooth. This process is called regularization
since the penalty function improves the conditioning of the problem 7 . In this chapter we focus on methods that form an estimate of the true attenuation map

by maximizing a penalized-likelihood objective function of the following form:

   


 



  



(1.15)

where the objective function  includes a roughness penalty   discussed in


more detail below. The parameter  controls the tradeoff between spatial resolution
and noise: larger values of  generally lead to reduced noise at the price of reduced
spatial resolution. Solving (1.15) is the primary subject of this chapter.
One benefit of using methods that are based on objective functions such as
(1.15) is that for such methods, image quality is determined by the objective function rather than by the particular iterative algorithm, provided the iterative algorithm converges to the maximizer of the objective function. In particular, from the
objective function one can analyze spatial resolution properties and bias, variance,
and autocorrelation properties [16, 20, 5459].
1.2.3.1

Connection to Bayesian perspective

By letting    in (1.15) and in the algorithms presented in the following


sections, one has ML algorithms as a special case. Bayesian image reconstruc7

For emission tomography, a popular alternative approach to regularization is simply to postsmooth the ML reconstruction image with a Gaussian filter. In the emission case, under the somewhat
idealized assumption of a shift-invariant Gaussian blur model for the system, a certain commutability
condition ((12) of [52] holds, which ensures that Gaussian post-filtering is equivalent to Gaussian
sieves. It is unclear whether this equivalence holds in the transmission case, although some authors
have implied that it does without proof, e.g. [53].

12 Statistical Image Reconstruction Methods


tion formulations also lead to objective functions of the form (1.15). Suppose one
considers to be a random vector drawn from a prior distribution   that is proportional to    . (Such priors arise naturally in the context of Markov random
field models for images [60].) One computes the maximum a posteriori (MAP)
estimate of by maximizing the posterior distribution   . By Bayes rule:

      





so the log posterior is


 

 
  

  

  

Thus MAP estimation is computationally equivalent to (1.15).


1.2.4

Penalty function

It has been considered by many authors to be reasonable to assume that the


attenuation maps of interest are piecewise smooth functions. An extreme example
of such assumptions is the common use of attenuation map segmentation in PET
imaging to reduce the noise due to attenuation correction factors [6163]. Under
the piecewise smooth attenuation map assumption, it is reasonable for the penalty
function   to discourage images that are too rough. The simplest penalty
function that discourages roughness considers the discrepancies between neighboring pixel values:










 



(1.16)



where    . Ordinarily


   for the four horizontal and vertical neighfor diagonal neighboring pixels, and     othboring pixels,   
erwise. One can also adopt the modifications described in [20, 5457] to provide
more uniform spatial resolution. The potential function  assigns a cost to    .
For the results presented in Section 1.11, we used a penalty function of the
form (1.16). However, the methods we present all apply to much more general
penalty functions. Such generality is needed for penalty functions such as the
weak-plate prior of [64] or the local averaging function considered in [65]. One
can express most8 of the penalty functions that have been used in tomographic
reconstruction in the following very general form:





One exception is the median root prior [66].



 

(1.17)

The problem 13
where

 is a


 penalty matrix and




 

!



We assume throughout that the functions   are symmetric and differentiable.


The pairwise penalty (1.16) is the special case of (1.17) where

 and
each row of has one  and one  entry corresponding to some pair of pixels.
In this chapter we focus on quadratic penalty functions where   "  # " 
for #  , so



#



where




  

    

(1.18)

 !# . The second derivative of such a penalty is given by




$

$ 



! #

(1.19)



This focus is for simplifying the presentation; in practice nonquadratic penalty


functions are often preferable for transmission image reconstruction e.g. [67, 68].
1.2.5

Concavity

     ,
From the second derivative expression (1.14), when   , 
which is always nonpositive, so  is concave over all of "# (and strictly concave if
 ). From (1.11) one can easily verify that the Hessian matrix (the


matrix of second partial derivatives) of   is:

 



 ! 

(1.20)

where !  is a

 diagonal matrix with th diagonal element . Thus
the log-likelihood is concave over all of "#  when    . If the  s are
all strictly convex and   is concave, then the objective  is strictly concave
under mild conditions on
[69]. Such concavity is central to the convergence
proofs of the algorithms described below. In the case   , the likelihood   is
not necessarily concave. Nevertheless, in our experience it seems to be unimodal.
(Initializing monotonic iterative algorithms with different starting images seems
to lead to the same final image.) In the non-concave case, we cannot guarantee
global convergence to the global maximum for any of the algorithms described
below. For the monotonic algorithms we usually can prove convergence to a local
maximum [70]; if in addition the objective function is unimodal, then the only local
maximum will be the global maximum, but proving that  is unimodal is an open
problem.

14 Statistical Image Reconstruction Methods


1.3

Optimization algorithms

analytically

    
 

(1.21)

Ignoring the nonnegativity constraint, one could attempt to find


by zeroing the gradient of . The partial derivatives of  are

$

$ 






  



$

$ 

where  was defined in (1.13). Unfortunately, even disregarding both the nonnegativity constraint and the penalty function, there are no closed-form solutions
to the set of equations (1.21), except in the trivial case when
 . Even when
 there are no closed-form solutions for nonseparable penalty functions. Thus
iterative methods are required to find the maximizer of such objective functions.

1.3.1

Why so many algorithms?

Analytical solutions for the maximizer of (1.15) appear intractable, so one


must use iterative algorithms. An iterative algorithm is a procedure that is initialized with an initial guess  of , and then recursively generates a sequence
     , also denoted   . Ideally, the iterates    should rapidly approach the maximizer . When developing algorithms for image reconstruction
based on penalized-likelihood objective functions, there are many design considerations, most of which are common to any problem involving iterative methods. In
particular, an algorithm designer should consider the impact of design choices on
the following characteristics.
 Monotonicity ((  ) increases every iteration)
 Nonnegativity constraint (  )
 Parallelization
 Sensitivity to numerical errors
 Convergence rate (as few iterations as possible)
 Computation time per iteration (as few floating point operations as possible)
 Storage requirements (as little memory as possible)
 Memory bandwidth (data access)
Generic numerical methods such as steepest ascent do not exploit the specific structure of the objective function , nor do they easily accommodate the nonnegativity
constraint. Thus for fastest convergence, one must seek algorithms tailored to this
type of problem. Some of the relevant properties of  include:
  % is a sum of scalar functions  .
 The  s have bounded curvature, and are concave when   .
 The arguments of the functions  are inner products.
 The inner product coefficients are all nonnegative.
The cornucopia of algorithms that have been proposed in the image reconstruction
literature exploit these properties (implicitly or explicitly) in different ways.

Optimization algorithms 15
1.3.2

Optimization transfer principle

Before delving into the details of the many algorithms that have been proposed
for maximizing , we first describe a very useful and intuitive general principle
that underlies almost all the methods. The principle is called optimization transfer.
This idea was described briefly as a majorization principle in the limited context
of 1D line searches in the classic text by Ortega and Rheinbolt [71, p. 253]. It was
rediscovered and generalized to inverse problems in the recent work of De Pierro
[72, 73] and Lange [69, 74]. Since the concept applies more generally than just to
transmission tomography, we use % as the generic unknown parameter here.
The basic idea is illustrated in Fig. 1.3. Since  is difficult to maximize, at the
&th iteration we can replace  with a surrogate function ' % %   that is easier to
maximize, i.e., the next iterate is defined as:

%

   '




% % 

(1.22)

The maximization is restricted to the valid parameter space (e.g. %   for problems
with nonnegative constraints). Maximizing '  %   will usually not lead directly
to the global maximizer . Thus one repeats the process iteratively, finding a new
surrogate function ' at each iteration and then maximizing that surrogate function.
If we choose the surrogate functions appropriately, then the sequence %   should
eventually converge to the maximizer [75].
Fig. 1.3 does not do full justice to the problem, since 1D functions are usually
fairly easy to maximize. The optimization transfer principle is particularly compelling for problems where the dimension of % is large, such as in inverse problems
like tomography.
It is very desirable to use algorithms that monotonically increase  each iteration, i.e., for which  %      % . Such algorithms are guaranteed to be
stable, i.e., the sequence %   will not diverge if  is concave. And generally
such algorithms will converge to the maximizer if it is unique [70]. If we choose
surrogate functions that satisfy


%   %   ' % %   ' %  %  % % 

(1.23)

then one can see immediately that the algorithm (1.22) monotonically increases .
To ensure monotonicity, it is not essential to find the exact maximizer in (1.22). It
suffices to find a value %   such that ' %    %   ' %  % , since that
alone will ensure  %      %  by (1.23). The various algorithms described
in the sections that follow are all based on different choices of the surrogate function
', and on different procedures for the maximization in (1.22).
Rather than working with (1.23), all the surrogate functions we present satisfy

16 Statistical Image Reconstruction Methods

Objective
Surrogate

0.8

0.6

0.4

% and ' % % 

0.2

%

%

Figure 1.3: Illustration of optimization transfer in 1D.

the following conditions:

' %  % 



 ' % %    
 




' % %  

% 
 % 

(1.24)


(1.25)

% %  

(1.26)

Any surrogate function that satisfies these conditions will satisfy (1.23). (The middle condition follows from the outer two conditions when  and ' are differentiable.)
1.3.3

Convergence rate

The convergence rate of an iterative algorithm based on the optimization transfer principle can be analyzed qualitatively by considering Fig. 1.3. If the surrogate
function ' has low curvature, then it appears as a broad graph in Fig. 1.3, which
means that the algorithm can take large steps (%    %  can be large) which
means that it reaches the maximizer faster. Conversely, if the surrogate function
has high curvature, then it appears as a skinny graph, the steps are small, and
many steps are required for convergence. So in general we would like to find low
curvature surrogate functions, with the caveat that we want to maintain '   to
ensure monotonicity [76]. And of course we would also like the surrogate ' to be

Optimization algorithms 17
150

Log-likelihood  
Parabola surrogate (

  and (   

140

  

130
120
110
100
90
80
70
60
50
0

Figure 1.4: Ray log-likelihood 

 ,

and 





 and parabola surrogate     for 

 ,

 ,

 .

easy to maximize for (1.22). Unfortunately, the criteria low curvature and easy
to maximize are often incompatible, so we must compromise.
1.3.4

Parabola surrogate

Throughout this chapter we struggle with the ray log-likelihood function


defined in (1.12), which we rewrite here without the s for simplicity:

     

 

 

where      are known constants. Of the many possible surrogate functions


that could replace   in an optimization transfer approach, a choice that is particularly convenient is a parabola surrogate:

!

(                 

(1.27)

for some choice of the curvature !  , where

 



(1.28)

is the th line integral through the estimated attenuation map at the &th iteration.
The choice (1.27) clearly satisfies conditions (1.24) and (1.25), but we must care

fully choose !  !     to ensure that (         so that (1.26)
is satisfied. On the other hand, from the convergence rate description in the preceding section, we would like the curvature ! to be as small as possible. In other

18 Statistical Image Reconstruction Methods


words, we would like to find


!      


! $          

        

In [77], we showed that the optimal curvature is as follows:



 
     










 
    
  
 
!      

 


    
    



(1.29)

where 
 is  for positive  and zero otherwise. Fig. 1.4 illustrates the surrogate
parabola ( in (1.27) with the optimal curvature (1.29).
One small inconvenience with (1.29) is that it changes every iteration since it

depends on . An alternative choice of the curvature that ensures (   is the
maximum second derivative of   over  . In [77] we show that



 









 

(1.30)


We can precompute this curvature before iterating since it is independent of .


However, typically this curvature is much larger than the optimal choice (1.29), so
the floating point operations (flops) saved by precomputing may be lost in increased
number of iterations due to a slower convergence rate.
The surrogate parabola (1.27) with curvature (1.29) will be used repeatedly
in this chapter, both for the derivation of recently developed algorithms, as well
as for making minor improvements to older algorithms that were not monotonic
as originally proposed. A similar approach applies to the emission reconstruction
problem [78].
1.4

EM algorithms

The emission reconstruction algorithm derived by Shepp and Vardi in [79] and
by Lange and Carson in [17] is often referred to as the EM algorithm in the nuclear imaging community. In fact, the expectation-maximization (EM) framework
is a general method for developing many different algorithms [80]. The appeal
of the EM framework is that it leads to iterative algorithms that in principle yield
sequences of iterates that monotonically increase the objective function. Furthermore, in many statistical problems one can derive EM algorithms that are quite
simple to implement. Unfortunately, the Poisson transmission reconstruction problem does not seem to be such a problem. Only one basic type of EM algorithm

EM algorithms 19
has been proposed for the transmission problem, and that algorithm converges very
slowly and has other difficulties described below. We include the description of the
EM algorithm for completeness, but the reader who is not interested in the historical perspective could safely skip this section since we present much more efficient
algorithms in subsequent sections.
We describe the general EM framework in the context of problems where one
observes a realization  of a measurement vector  , and wishes to estimate a parameter vector % by maximizing the likelihood or penalized log-likelihood. To develop an EM algorithm, one must first postulate a hypothetical collection of random
variables called the complete data space  . These are random variables that, in
general, were not observed during the experiment, but that might have simplified
the estimation procedure had they been observed. The only requirement that the
complete data space must satisfy is that one must be able to extract the observed
data from  , i.e. there must exist a function   such that

 

(1.31)

This is a trivial requirement since one can always include the random vector 
itself in the collection  of random variables.
Having judiciously chosen  , an essential ingredient of any EM algorithm is
the following conditional expectation of the log-likelihood of  :

) % %        % 

  %


  

 %     % 
(1.32)

and, in the context of penalized-likelihood problems, the following function


' % %   ) % %    % 

(1.33)

where  % is a penalty function. We refer to ' as an EM-based surrogate function,


since one replaces the difficult problem of maximizing  with a sequence of (hopefully) simpler maximizations of '. There are often alternative surrogate functions
that have advantages over the EM-based functions, as described in Section 1.6.1.
The surrogate function concept is illustrated in Fig. 1.3 on page 16.
An EM algorithm is initialized at an arbitrary point %  and generates a sequence
of iterates %  %   . Under fairly general conditions [81], the sequence %  
converges to the maximizer of the objective function  %    %   % The
EM recursion is as follows:
E-step: find ) % %   using (1.32) and ' % %   using (1.33)
M-step: %      ' % %  


where the maximization is restricted to the set of valid parameters, as in (1.22).


Many generalizations of this basic framework have been proposed, see for example
[8286] and a recent review paper [87].

20 Statistical Image Reconstruction Methods


By applying Jensens inequality, one can show [80] that the EM-based surrogate function satisfies the monotonicity inequality (1.23) for all % and %  . Since
the M-step ensures that

' %   %   ' %  %  


it follows from (1.23) that the above EM recursion is guaranteed to lead to monotone increases in the objective function . Thus, the EM framework is a special
case of the optimization transfer approach (1.22), where the surrogate function derives from statistical principles (1.32).
1.4.1

Transmission EM algorithm

Although Rockmore and Macovski proposed ML transmission reconstruction


in 1977 [47], it took until 1984 for the first practical algorithm to appear, when
Lange and Carson proposed a complete data space for a transmission EM algorithm
[17]. Lange and Carson considered the case where   . In this section we derive
a transmission EM algorithm that generalizes that of [17]; we allow   , and we
consider arbitrary pixel orderings, as described below. The algorithm of [17] is a
special case of what follows.
The complete data space that we consider for the case    is the following
collection of random variables, all of which have Poisson marginal distributions:

   

    
  *   
   

   
 

We assume that



and





     
 

and that the  s are all mutually independent and statistically independent of all
the   s. Furthermore,   and   are independent for   . However,   and
  are not independent. (The distributions of  and   do not depend on , so
are of less importance in what follows.)
For each , let        be any permutation of the set of pixel indices
   
 . Notationally, the simplest case is just when    *, which
corresponds to the algorithm considered in [88]. Lange and Carson [17] assign
  to the physical ordering corresponding to the ray connecting the source to the
detector. Statistically, any ordering suffices, and it is an open (and probably academic) question whether certain orderings lead to faster convergence. Given such
an ordering, we define the remaining   s recursively by the following conditional
distributions:

            Binomial    

 

EM algorithms 21
where


+    

 



Thus the conditional probability mass function (PMF) of 


given by:

,             

where

&
-



 


 



for *

,    

 



 



 


 is

(1.34)

 

is the binomial coefficient. An alternative way of writing this recur-

sion is

 

 

.   *   
     
 

where .  is a collection of independent 0-1 Bernoulli random variables with

 .   
 

 

Since a Bernoulli-thinned Poisson process remains Poisson (see Appendix 1.14), it


follows that each   has a Poisson distribution:





 








 

(1.35)

The special case *

 


 in (1.35) yields





 




Therefore, the observed measurements are related to the complete data space by

 

so the condition (1.31) is satisfied. As noted in [89], there are multiple orderings of
the   s that can be considered, each of which would lead to a different update, but
which would leave unchanged the limit if there is a unique maximizer (and provided
this EM algorithm is globally convergent, which has never been established for the
case   ).
Figure 1.5 illustrates a loose physical interpretation of the above complete data
space. For the th ordering, imagine a sequence of layers of material with attenuation coefficients        and thicknesses          . Suppose a

22 Statistical Image Reconstruction Methods



 



 

 

  

 

...












 

Figure 1.5: Pseudo-physical interpretation of transmission EM complete-data space.

Poisson number   of photons is transmitted into the first layer. The number that
survive passage through that layer is  , which then proceed through the second
layer and so on. The final number of photons exiting the sequence of layers is
  , and this number is added to the random coincidences  to form the observed counts  . This interpretation is most intuitive when the pixels are ordered
according to the actual passage of photons from source to detector (as in [17]), but
a physical interpretation is not essential for EM algorithm development.
It follows from (1.35) and Appendix 1.14 that



 Binomial 


 



 




 

since a cascade of independent Binomials is Binomial with the product of the success probabilities.
Having specified the complete-data space, the next step in developing an EM
algorithm is to find the surrogate function )    of (1.32). It follows from the
above specifications that the joint probability mass function (PMF) of  is given
by

, 






,   

 





,  

By applying the chain rule for conditional probabilities [90] and using (1.34):

,    




,  






,    

 

EM algorithms 23
so
 ,







 ,






   



   

 

 

 

    



 



Thus, following [17, 88], the EM-based surrogate function for the above complete
data space has the following form:













   

)





  

/  
      



  

(1.36)



where

)










   

  

/  
      

  

(1.37)

 

/ 

   

  









(1.38)

  

  

(1.39)

To complete the E-step of the EM algorithm, we must find the preceding conditional
expectations. Using the law of iterated expectation [90]:

   

        
  



         
  


   
   
     

  
   
  
(1.40)
   




using results from [17] and Appendix 1.14. From (1.35)

   




  


0






  









  



   

(1.41)

24 Statistical Image Reconstruction Methods


from which one can show using (1.40) that

 

/  
   0  

   

(1.42)

Combining (1.36), (1.37), (1.40), and (1.41) yields an explicit expression for the
EM surrogate )   , completing the E-step.
For the M-step of the EM algorithm, we must find the maximizer of )   .
The function )    is a separable function of the  s, as shown in (1.36), so
it is easier to maximize than . Thus the M-step reduces to the
 separable 1D
maximization problems:
 

   )
 





(1.43)

Unfortunately however, due to the transcendental functions in (1.36), there is no


closed-form expression for the maximizer of )    . In fact, finding the maximizer of )    is no easier than maximizing   with respect to  while
holding the other parameters fixed, which is the coordinate ascent algorithm described in Section 1.5. Nevertheless, the EM algorithm is parallelizable, unlike the
coordinate ascent algorithm, so we proceed here with its description. Zeroing the
derivative of ) in (1.37) yields the following:



)
 










  

  

    /



  
 (1.44)
 


 

. Unfortunately, (1.44) can only be solved for  anthe solution to which is 


alytically when the   s are all equal. However, Lange and Carson [17] noted that
typically    will be small (much less than unity), so that the following Taylorseries expansion around zero should be a reasonable approximation:







for   

Applying this approximation to (1.44) with   




    







/  
  
/ 


Solving this equality for


EM algorithm [17]:

yields:




  






yields the final iterative form for the ML transmission

 


     


 /



 
 







  / 

(1.45)

EM algorithms 25
This algorithm is very slow to converge [69] and each iteration is very computationally expensive due to the large number of exponentiations required in (1.41).
One exponentiation per nonzero   is required.
Lange and Carson [17] also describe an update based on a second-order Taylor
series, and they note that one can use their expansion to find upper and lower bounds
for the exact value of  that maximizes )   .
1.4.2

EM algorithms with approximate M-steps

Since the M-step of the transmission EM algorithm of [17] did not yield a
closed form for the maximizer, Browne and Holmes [91] proposed a modified EM
algorithm that used an approximate M-step based on image rotations using bilinear
interpolation. Kent and Wright made a similar approximation [89]. An advantage
of these methods is that (after interpolation) the   s are all equal, which is the
case where one can solve (1.44) analytically. Specifically, if     for all  and
 , then (1.44) simplifies to







%  






/%  
%  





%
%


where

and /
replace /
and





When solved for  , this yields the iteration

 






respectively, in rotated coordinates.

 %  !
 /
 %  





 





(1.46)

which is the logarithm of the ratio of (conditional expectations of) the number of
photons entering the  th pixel to the number of photons leaving the  th pixel, di%  and
vided by the pixel size. However, the interpolations required to form

/%  presumably destroy the monotonicity properties of the EM algorithm. Although bookkeeping is reduced, these methods require the same (very large) number of exponentiations as the original transmission EM algorithm, so they are also
impractical algorithms.
1.4.3

EM algorithm with Newton M-step

Ollinger [92,93] reported that the M-step approximation (1.45) proposed in [17]
led to convergence problems, and proposed a 1D Newtons method for maximizing
) in the context of a GEM algorithm for the M-step. Since Newtons method is
not guaranteed to converge, the step length was adjusted by a halving strategy to

26 Statistical Image Reconstruction Methods


ensure a monotone increase in the surrogate function:
 



1

)


 


)
 




 

 






 



  

(1.47)



  



    )     by choosing 1 via a


where one ensures that ) 
line-search. This line-search can require multiple evaluations of )  as each pixel
is updated, which is relatively expensive. The large number of exponentiations
  and

  also remains a drawback.


required to compute /


From (1.44) and (1.42),


)
 



















  













  



  

 




!



 

 
 


/  
 




 0  0  








 

so (1.25) is indeed satisfied. From (1.44) and (1.42),


  )










so


  )








/  
  








  

 


  
   


 0     



  

(1.48)

Thus, the ML EM Newton-Raphson (EM-NR) algorithm (1.47) becomes





 





1




 










0  





  



(1.49)

From (1.48), the curvature of ) becomes unbounded as   , which appears to preclude the use of parabola surrogates as described in Section 1.4.5.2 to
form an intrinsically monotonic M-step (1.43).
Variations on the transmission EM algorithm continue to resurface at conferences, despite its many drawbacks. The endurance of the transmission EM algorithm can only be explained by its having ridden on the coat tails of the popular
emission EM algorithm. The modern methods described in subsequent sections are
entirely preferable to the transmission EM algorithm.

EM algorithms 27
1.4.4

Diagonally-scaled gradient-ascent algorithms

Several authors, e.g. [94], noted that the emission EM algorithm can be expressed as a diagonally-scaled, gradient-ascent algorithm, with a particular diagonal scaling matrix that (almost miraculously) ensures monotonicity and preserves
nonnegativity. (The EM-NR algorithm (1.49) has a similar form.) Based on an
analogy with that emission EM algorithm, Lange et al. proposed a diagonallyscaled gradient-ascent algorithm for transmission tomography [95]. The algorithm
can be expressed as the following recursion:
 













(1.50)

where 2  is some iteration-dependent diagonal matrix and where   denotes the


column gradient operator (cf (1.21)):

  
 

$

$ 










(1.51)

Since the gradient of the objective function is evaluated once per iteration, the number of exponentiations required is roughly
 , far fewer than required by the transmission EM algorithm (1.45).
The choice of the

 diagonal scaling matrix 2  critically affects
convergence rate, monotonicity, and nonnegativity. Considering the case   ,
Lange et al. [95] suggested the following diagonal scaling matrix, chosen so that
(1.50) could be expressed as a multiplicative update in the case   :

 




 

(1.52)

The natural generalization of this choice to the general case where    is the
diagonal matrix with the following expression for the  th diagonal element:

 

  



 

 

(1.53)

Using (1.51) and (1.53) one can rewrite the diagonally-scaled gradient ascent (DSGA) algorithm (1.50) as follows:
 









  
 
    




 

(1.54)

This is a multiplicative update that preserves nonnegativity, and at least its positive
fixed points are stationary points of the log-likelihood. However, the particular
choice of diagonal scaling matrix (1.53) does not guarantee intrinsically monotone

28 Statistical Image Reconstruction Methods


increases in the likelihood function. In Section 1.6.2 below, we present one form
of a paraboloidal surrogates algorithm that has the same general form as (1.50) but
overcomes the limitations of (1.54) by choosing 2 appropriately.
Lange et al. also proposed modifications of the iteration (1.50) to include a
separable penalty function and a line-search to enforce monotonicity [95] (for  
 case, but the ideas generalize easily to the    case).
Lange proposed another diagonally-scaled gradient-ascent algorithm in [96],
based on the following diagonal scaling matrix:

 




  

(1.55)

Although the rationale for this choice was not given in [96], Lange was able to
show that the algorithm has local convergence properties, but that it may not yield
nonnegative estimates. Lange further modified the scaled-gradient algorithm in
[97] to include nonseparable penalty functions, and a practical approximate linesearch that ensures global convergence for   .
Considering the case   , Maniawski et al. [98] proposed the following overrelaxed unregularized version of the diagonally-scaled gradient-ascent algorithm
(1.50):
 





 
#


   


  





  #

(1.56)

where # was selected empirically to be &    times the total number of measured
counts in a SPECT transmission scan. Like (1.54), this is a multiplicative update
that preserves nonnegativity. One can also express the above algorithm more generally as follows:
 

   #



#2











where 2  is chosen as in (1.52). No convergence analysis was discussed for the


algorithm (1.56), although fast convergence was reported.
1.4.5

Convex algorithm

De Pierro [72] described a non-statistical derivation of the emission EM algorithm using the concavity properties of the log-likelihood for emission tomography. Lange and Fessler [69] applied a similar derivation to the transmission
log-likelihood for the case   , yielding a convex9 algorithm that, like the
transmission EM algorithm, is guaranteed to monotonically increase   each iteration. As discussed in Section 1.2.5, the transmission log-likelihood is concave
9

The algorithm name is unfortunate, since the algorithm itself is not convex, but rather the algorithm is derived by exploiting the concavity of the log-likelihood.

EM algorithms 29
when   , so De Pierros convexity method could be applied directly in [69].
In the case   , the log-likelihood is not concave, so De Pierros convexity
argument does not directly apply. Fessler [14] noted that even when   , the
marginal log-likelihood functions (the  s in (1.11)) are concave over a (typically)
large interval of the real line, and thereby developed an approximate convex algorithm. However, the convex algorithm of [14] is not guaranteed to be globally
monotonic.
Rather than presenting either the convex algorithm of [69], which is incomplete
since it did not consider the case   , or the algorithm of [14], which is nonmonotone, we derive a new convex algorithm here. The algorithm of [69] falls
out as a special case of this new algorithm by setting   . The idea is to first use
the EM algorithm to find a concave surrogate function ) that eliminates the 
terms, but is still difficult to maximize directly; then we apply De Pierros convexity
argument to ) to find another surrogate function )  that is easily maximized. The
same idea was developed independently by Kim [99].
Consider a complete data space that is the collection of the following statistically independent random variables:

  
    


where


 



and where the observed measurements are related to the elements of  by

so the condition (1.31) is satisfied. The complete-data log-likelihood is simply:


 ,








 

 

since the distribution of the  s is a constant independent of , so can be ignored.


Since by Appendix 1.14

  

















the EM surrogate function is the following concave function:









 

 




3 

(1.57)

30 Statistical Image Reconstruction Methods


where


3    
   

(1.58)

The form of (1.57) is identical to the form that (1.11) would have if all the  s were
zero. Therefore, by this technique we can generalize any algorithm that has been
derived for the    case to the realistic case where    simply by replacing 
  . However, in general the convergence of an algorithm
in the algorithm with

derived this way may be slower than methods based on direct maximization of
  since the curvatures of the ) components are smaller than those of  since

    . For rays where the random fraction is large,


    , leading to
slow convergence rates. In statistical terms, the complete-data space  is much
more informative than the observed data  [100].
We could attempt to naively perform the M-step of the EM algorithm:
 

   )
 





except that maximizing ) is (almost10 ) as difficult as maximizing the original


log-likelihood.

By differentiating twice, one can easily show that each 3 is a concave function and that )    is a concave function. Therefore, rather than maximizing
) directly, we find a surrogate function for ) by applying the convexity method
of De Pierro [72]. The essence of De Pierros method is the following clever expression for matrix-vector multiplication:




1



1







(1.59)

where the projection of the current attenuation map estimate is given by


 



The expression (1.59) holds for any collection of


    and




1

1  s, provided 1 

 

only if

 



If we choose nonnegative 1  s, then because each 3





 

3
1
1







is concave, by (1.59):



  


1 3


















(1.60)
10

 is concave, unlike .

EM algorithms 31
where

3 






1

3 



Thus, a suitable surrogate function for )

)














13





 

  

is










)





(1.61)



where

)












13










1 3


1








(1.62)
Since ) is a separable function, its maximization reduces to
maximization problems:
 


   )


 





 simultaneous

(1.63)

Unfortunately there is not a closed-form analytical solution for the maximizer, so


we must apply approximations, line searches, or the optimization transfer principle.
1.4.5.1

Convex-NR algorithms

A simple solution to (1.63) is to apply one or more Newton-Raphson steps,


as in (1.47). Such an algorithm should be locally convergent, and can presumably
be made globally convergent by a line-search modification of the type proposed by
Lange [97]. From (1.62), (1.57), and (1.13):


)
 










  

 







 



so

 )





  3   






 


 



 













$
 
$

 



 

32 Statistical Image Reconstruction Methods


Thus the new convex algorithm (1.63), based on one 1D Newton-Raphson step for
each pixel, has the same form as the diagonally-scaled gradient-ascent algorithm
(1.50), except that it uses the following diagonal scaling matrix:



 



   )

 





  

where from (1.62) and (1.58)


  )















1

  

3










1



(1.64)
In [69] and [14], the following choice for the 1  s was used, following [72]

1













  



(1.65)

Substituting into (1.64) etc. yields the Convex-NR-1 algorithm:



 











 








  





(1.66)

The diagonal scaling matrix of this Convex-NR-1 algorithm is identical to (1.55),


which is interesting since (1.55) was presented for the case   . There are
potential problems with the choice (1.65) when the  s approach zero [69], so the
following alternative choice, considered in [73] and [101], may be preferable:

1
where 







 









(1.67)

  , for which (1.64) leads to


2


 



  



A small advantage of the choice (1.67) over (1.65) is that the  s in the denominator of (1.67) are independent of so they can be precomputed, unlike the denominator of (1.65).

EM algorithms 33
Substituting the above into (1.50) yields the following Convex-NR-2 algorithm:


 













 




  





(1.68)



Each iteration requires one forward projection (to compute the s) and two backprojections (one each for the numerator and denominator). In general a line search
would be necessary with this algorithm to ensure monotonicity and convergence.
1.4.5.2

Convex-PS algorithm

The function ) in (1.62) cannot be maximized analytically, but we can apply
the optimization transfer principle of Section 1.3.2 to derive the first intrinsically
monotonic algorithm presented in this chapter. Using the surrogate parabola (1.27):

!

3    (      3    3       
      
(1.69)
where from (1.29) the optimal curvature is



!   !   
      
 







       


  
(1.70)

This suggests the following quadratic surrogate function

)













1







with corresponding algorithm


 


   )


 





(1.71)

Since ) is quadratic, it is trivial to maximize analytically. Furthermore, since this


is a 1D maximization problem for a concave function, to enforce the nonnegativity
constraint we simply reset any negative pixels to zero. The derivatives of )  are:


)
 









  




  3

  (    

 










   
   

   



 









 



34 Statistical Image Reconstruction Methods


)
 














1

  

!

Using the choice (1.67) for the 1  s yields the following Convex-PS algorithm:

"
#
 
#
$











 







  !



%
&&
'

(1.72)

The 
 operation enforces the nonnegativity constraint. Since this is the first intrinsically monotonic algorithm presented in this chapter, we provide the following
more detailed description of its implementation.
for &    



$

 





$

compute !


$

for 



    






using (1.70),   



 !     


 

 
 


$




    
 (1.73)





 



 




.

  

(1.74)


This ML algorithm monotonically increases the log-likelihood function   


each iteration.
To derive (1.72), we have used all three of the optimization transfer principles
that are present in this chapter: the EM approach in (1.57), the convex separability
approach in (1.61), and the parabola surrogate approach in (1.69). Undoubtably
there are other algorithms awaiting discovery by using different combinations of
these principles!
1.4.6

Ordered-subsets EM algorithm

Hudson and Larkin [102] proposed an ordered subsets modification of the


emission EM algorithm in which one updates the image estimate using a sequence

Coordinate-ascent algorithms 35
of subsets of the measured data, subsampled by projection angle, rather than using
all the measurements simultaneously. Manglos et al. [44] applied this concept to
the transmission EM algorithm (1.45), yielding the iteration:
 



 
 
 / 
  

 
 






   / 


(1.75)

where  is a subset of the ray indices  


  selected for the &th subiteration. This type of modification destroys the monotonicity properties of the EM
algorithm, and typically the sequence of images asymptotically approaches a limit
cycle [103105]. However, at least in the emission case, the OSEM algorithm
seems to produce visually appealing images fairly quickly and hence has become
very popular.
Any of the algorithms described in this chapter could be easily modified to
have a block-iterative form akin to (1.75) simply by replacing any ray summations
(those over ) with partial summations over     . Since (1.75) requires the same
number of exponentiations per iteration as the transmission EM algorithm (1.45), it
is still impractical. However, block-iterative forms of some of the other algorithms
described in this chapter are practical. In particular, Nuyts et al. proposed a blockiterative modification of a gradient-based method [106] (only in the case   ).
Kamphius and Beekman [107] proposed a block-iterative version of (1.66) (only
in the case   ). Erdogan and Fessler propose a block-iterative version of the
separable paraboloidal surrogates algorithm of Section 1.6.2 in [108, 109].
1.4.7

EM algorithms with nonseparable penalty functions

All the algorithms described above were given for the ML case (where   ).
What happens if we want to include a nonseparable penalty function for regularization, for example in the Convex-PS algorithm? Considering (1.33), it appears that
we should replace (1.71) with
 

   )




  

(1.76)

Unfortunately, this is a nontrivial maximization since a nonseparable penalty  


leads to coupled equations. One approach to circumventing this problem is the generalized EM (GEM) method [80,110112], in which one replaces the maximization
in (1.76) with a few cycles of, for example, the coordinate ascent algorithm.
A clever alternative is to replace   in (1.76) with a separable surrogate
function using a similar trick as in (1.60), which was proposed by De Pierro [73].
We discuss this approach in more detail in Section 1.6.2: see (1.92).
1.5

Coordinate-ascent algorithms

A simple and natural approach to finding the maximizer of   is to sequentially maximize   over each element  of using the most recent values for all

36 Statistical Image Reconstruction Methods


other elements of . A general coordinate ascent method has the following form:
for &    
for   
 



 

$   
 

 

 

 


 




 




(1.77)

The operation in (1.77) is performed in place, i.e., the new value of  replaces
the old value, so that the most recent values of all elements of are always used.
An early use of such a method for tomography was in [113].
Sauer and Bouman analyzed such algorithms using clever frequency domain
arguments [13], and showed that sequential algorithms yield iterates whose high
frequency components converge fastest. This is often ideal for tomography, since
we can use a low-resolution FBP image as the initial guess, and then iterate to
improve resolution and reduce noise, which is mostly high frequency errors. (Using
a uniform or zero initial image for coordinate ascent is a very poor choice since low
frequencies can converge very slowly.)
The long string of arguments in (1.77) is quite notationally cumbersome. For
the remainder of this section, we use

% 

 

 









(1.78)

as shorthand for the vector of the most recent parameter values. For simplicity, this
notation leaves implicit the dependence of % on iteration & and pixel index  .
The general method described by (1.77) is not exactly an algorithm, since the
procedure for performing the 1D maximization is yet unspecified. In practice it is
impractical to find the exact maximizer, even in the 1D problem (1.77), so we settle
for methods that increase .
1.5.1

Coordinate-ascent Newton-Raphson

If   were a quadratic functional, then the natural approach to performing


the maximization in (1.77) would be Newtons method. Since   in (1.15) is
nonquadratic, applying Newtons method to (1.77) does not guarantee monotonic
increases in , but one might still try it anyway and hope for the best. In practice
monotonicity usually does not seem to be a problem, as suggested by the success of
Bouman et al. with this approach [18, 114]. For such a coordinate-ascent Newton-

Coordinate-ascent algorithms 37
Raphson (CA-NR) algorithm, we replace (1.77) with the following update:

"
##
 
#
$






 



   




%
&&
&' 



(1.79)

The 
 operation enforces the nonnegativity constraint. The first partial derivative
is given by (1.21), and the second is given by:

$

$ 






 



$

$ 



(1.80)

 is given by (1.14).
where 
Specifically, using (1.13), (1.14), (1.21), and (1.80), the update (1.79) of the
CA-NR algorithm becomes

"
##
 
#
#$































!


     






 


%

%

%
&&
&&
'


(1.81)
Literally interpreted, this form of the CA-NR algorithm appears to be extremely inefficient computationally, because it appears to require that % be recomputed after
every pixel is updated sequentially. This would lead to 4
  flops per iteration,
which is impractical.
In the following efficient implementation of CA-NR, we maintain a copy of


%


as a state vector, and update that vector after each pixel is updated.

38 Statistical Image Reconstruction Methods


Initialization: % $
for &    
for   

"
##
 
$ #
#$


 




























 

     




 


%

%

%
&&
&&
'


(1.82)
%
$ % 

 









   

(1.83)

The computational requirements per iteration are summarized in [77].


Bouman et al. also present a clever method to search for a zero-crossing to
avoid using Newton-Raphson for the penalty part of the objective function [19].
The numerator in (1.82) is essentially a backprojection, and appears to be quite
similar to the backprojection in the numerator of (1.68). One might guess then that
coordinate ascent and an algorithm like Convex-NR in (1.68) would have similar
computational requirements,
but they do not. We can precompute the entire expres
sion



 







for each  in the numerator of (1.68) before starting

the backprojection, which saves many flops and nonsequential memory accesses.
In contrast, the numerator of (1.82) contains % s that change after each pixel is
updated, so that expression cannot be precomputed. During the backprojection
step, one must access four arrays (nonsequentially): the  s, s,  s, and % s,
in addition to the system matrix elements   . And one must compute an exponentiation and a handful of addition and multiplications for each nonzero   . For
these reasons, coordinate ascent is quite expensive computationally per iteration.
On the other hand, experience shows that if one considers the number of iterations
required for convergence, then CA-NR is among the best of all algorithms. The
PSCA algorithm described in Section 1.6.4 below is an attempt to capture the convergence rate properties of CA-NR, but yet guaranteeing monotonicity and greatly
reducing the flop counts per iteration.
An alternative approach to ensuring monotonicity would be to evaluate the objective function  after updating each pixel, and impose an interval search in the
(hopefully relatively rare) cases where the objective function decreases. Unfortunately, evaluating  after every pixel adds considerable computational overhead.

Coordinate-ascent algorithms 39
1.5.2

Variation 1: Hybrid Poisson/polynomial approach

One approach to reducing the flops required by (1.82) is to replace some of


the nonquadratic  s in the log-likelihood (1.11) with quadratic functions. Specifically, any given measured sinogram is likely to contain a mixture of high and low
count rays. For high-count rays, a quadratic approximation to  should be adequate, e.g. a Gaussian approximation to the Poisson statistics. For low count rays,
the Poisson  function (1.12) can be retained to avoid biases. This hybrid Poisson/polynomial approach was proposed in [14], and was shown to significantly
reduce CPU time. However, implementation is somewhat inelegant since the system matrix must be stored by sparse columns, and those sparse columns must be
regrouped according to the indices of low and high count rays, which is a programming nuisance.
1.5.3

Variation 2: 1D parabolic surrogates

Besides CPU time, another potential problem with (1.82) is that it is not guaranteed to monotonically increase , so divergence is possible. One can ensure
monotonicity by applying the optimization transfer principle to the maximization
problem (1.77). One possible approach is to use a parabolic surrogate for the 1D
 
           . For the fastest confunction     
vergence rate, the optimal parabolic surrogate would have the lowest possible curvature, as discussed in Section 1.6.2 below. The surrogate parabola (1.27) with
optimal curvature (1.29) can be applied to (1.77) to yield an algorithm of the form
(1.82) but with a different expression in the denominator. Ignoring the penalty
function, the ML coordinate ascent parabola surrogate (CA-PS) algorithm is

"
 
$ $















 



%




  

%
' 

(1.84)

where !  was defined in (1.29). Unfortunately, this algorithm suffers from the
same high CPU demands as (1.82), so is impractical. To incorporate a penalty
function, one could follow a similar procedure as in Section 1.6.2 below.
Another approach to applying optimization transfer to (1.77) was proposed by
Saquib et al. [115] and Zheng et al. [116], called functional substitution. That
method also yields a monotonic algorithm, for the case    since concavity of
 is exploited in the derivation. The required flops are comparable to those of CANR. We can generalize the functional substitution algorithm of [116], to the case
   by exploiting the EM surrogate described in Section 1.4.5 to derive a new


monotic algorithm. Essentially one simply replaces  with      
in the curvature terms in [116], yielding an algorithm that is identical to (1.82) but
with a different denominator.

40 Statistical Image Reconstruction Methods


1.6

Paraboloidal surrogates algorithms

Coordinate ascent algorithms are sequential update algorithms because the pixels are updated in sequence. This leads to fast convergence, but requires column
access of the system matrix , and makes parallelization quite difficult. In contrast, simultaneous update algorithms can update all pixels independently in parallel, such as the EM algorithms (1.45), (1.46), and (1.49), the scaled gradient ascent
algorithms (1.50), (1.54), and (1.56), and the Convex algorithms (1.66), (1.68), and
(1.72). However, a serious problem with all the simultaneous algorithms described
above, except Convex-PS (1.72), is that they are not intrinsically monotonic. (They
can all be forced to be monotonic by adding line searches, but this is somewhat
inconvenient.) In this section we describe an approach based on the optimization
transfer principle of Section 1.3.2 that leads to a simultaneous update algorithm that
is also intrinsically monotonic, as well as a sequential algorithm that is intrinsically
monotonic like CA-PS (1.84), but much more computationally efficient.
As mentioned in Section 1.3.4, a principal difficulty with maximizing (1.15)
is the fact that the  s in (1.12) are nonquadratic. Maximization is much easier
for quadratic functions, so it is natural to use the surrogate parabola described in
(1.27) to construct a paraboloidal surrogate function for the log-likelihood   in
(1.11).
Using (1.29), define

!  !        


where
parabola



was defined in (1.28). For this choice of curvatures, the




(               !   




is a surrogate for   in the sense that    (   for all  . Summing


these 1D surrogate functions, as in (1.11), leads to the following surrogate function
for the log-likelihood:










(1.85)

This is a surrogate for the log-likelihood in the sense that if we define

'







  



(1.86)

then ' satisfies (1.24), (1.25), and (1.26).


When expanded, the paraboloidal surrogate function ) in (1.85) has the following quadratic form:













  ! !



(1.87)

Paraboloidal surrogates algorithms 41


Maximizing a quadratic form like (1.87) is potentially much easier than the loglikelihood (1.11).
1.6.1

Paraboloidal surrogate with Newton Raphson

 '

 , as in (1.18), the surrogate function


For a quadratic penalty    
' in (1.86) above is a quadratic form. Disregarding the nonnegativity constraint,
in principle we can maximize ' (as in (1.22)) by zeroing the gradient of ' :




 







  ! !



This leads to the following paraboloidal surrogates Newton-Raphson (PS-NR) algorithm:


 

 





 '

 ! !



 '

 









 




(1.88)

There are three problems with this algorithm. The matrix inverse is impractical,
the method appears only to apply to quadratic penalty functions, and nonnegativity
is not enforced. Fortunately all three of these limitations can be overcome, as we
describe next.
1.6.2

Separable paraboloidal surrogates algorithm

In this section we derive an intrinsically monotonic algorithm that we believe


to be the current method of choice for cases where one desired a simultaneous
update without any line searches.
A difficulty in maximizing (1.86) is that in general both ) and  are nonseparable functions of the elements of the parameter vector . However, since each (
in (1.85) is concave, we can apply precisely the same convexity trick of De Pierro
used in (1.59) and (1.60) to form a second separable surrogate function. Since






(
1







1 (



1


1

 












 
 



the natural separable surrogate function for the log-likelihood is

 )










)





(1.89)

42 Statistical Image Reconstruction Methods


where

)









1

1 (





 





(1.90)

Since ) is a separable function, it is easily maximized.


We need to apply a similar trick to separate the penalty function. We assume
that  has the form (1.18). Similar to (1.59) we have


where 0   and

 

0






0

!



0



#


!


#
0
0








!
0

0 #









(1.91)










So since "  is a convex function:

 .














(1.92)

so the natural surrogate function is


















where












0 #



Combining
function:

'

) and  





 

)

!
0








yields the following separable quadratic surrogate



  








'





where

'







)





  







Paraboloidal surrogates algorithms 43


Using the choice (1.67) for the 1  s, the surrogate curvature is


)
 








  1  !



Similarly, if we choose 0





 

 








   !

! ! , where !




! # 0



 



! , then

! ! #

(1.93)



Since the surrogate is separable, the optimization transfer algorithm (1.22) becomes
 


   '


 







   


Since '     is quadratic, it is easily maximized by zeroing its derivative,


leading to the following separable paraboloidal surrogates (SPS) algorithm:
 







      
! # 




   !




(1.94)



was defined in (1.73) and we precompute the  s in (1.93) before iterwhere 


ating. This algorithm is highly parallelizable, and can be implemented efficiently
using the same structure as the Convex-PS algorithm (1.74). It is also easy to form
an ordered subsets version (cf. Section 1.4.6) of the SPS algorithm [109].
See [109, 117] for the extension to nonquadratic penalty functions, which is
based on a parabola surrogate for   proposed by Huber [118].
1.6.3

Ordered subsets revisited

One can easily form an ordered subsets version (cf. Section 1.4.6) of the SPS
algorithm (1.94) by replacing the sums over  with sums over subsets of the rays,
yielding the ordered subsets transmission (OSTR) algorithm described in [109].
Since ordered subsets algorithms are not guaranteed to converge, one may as well
further abandon monotonicity and replace the denominator in the ordered subsets
version of (1.94) with something that can be precomputed. Specifically, in [109]

we recommend replacing the ! s in (1.94) with11

!
11


 






  
  

(1.95)

This trick is somewhat similar in spirit to the method of Fisher scoring [119, 120], in which one
replaces the Hessian with its expectation (the Fisher information matrix) to reduce computation in
nonquadratic optimization problems.

44 Statistical Image Reconstruction Methods

where  
  . For this fast denominator approximation, the OSTR-/
algorithm becomes:

 




   






 

! # 



(1.96)

where  is a cyclically chosen subset of the rays, formed by angular subsampling



by a factor / , where 
was defined in (1.73), and where we precompute




   !     


The results in [109] show that this algorithm does not quite find the maximizer of
the objective function , but the images are nearly as good as those produced by
convergent algorithms in terms of mean squared error and segmentation accuracy.
1.6.4

Paraboloidal surrogates coordinate-ascent (PSCA) algorithm

A disadvantage of simultaneous updates like (1.94) is that they typically converge slowly since separable surrogate functions have high curvature and hence
slow convergence rates (cf. Section 1.3.3). Thus, in [77, 121] we proposed to apply coordinate ascent to the quadratic surrogate function (1.86). (We focus on the
quadratic penalty case here; the extension to the nonquadratic case is straightforward following a similar approach as in Section 1.6.2.) To apply CA to (1.86),
we sequentially maximize '    over each element  , using the most recent
values for all other elements of , as in Section 1.5. We again adopt the shorthand
(1.78) here. In its simplest form, this leads to a paraboloidal surrogates coordinate
ascent (PSCA) algorithm having a similar form as (1.82), but with the inner update
being:

"
 
$ $







  (     

  
  %


%

%
' 

(1.97)

where, before looping over  in each iteration, we precompute




 !     
 

(1.98)

and the following term is maintained as a state vector (analogous to the


(1.83):

(
















%
s

in

Direct algorithms 45
This precomputation saves many flops per iteration, yet still yields an intrinsically
monotonic algorithm. Even greater computational savings are possibly by a fast
denominator trick similar to (1.95), although one should then check for monotonicity after each iteration and redo the iteration using the monotonicity preserving denominators (1.98) in those rare cases where the objective function decreases.
There are several details that are essential for efficient implementation; see [77].
1.6.5

Grouped coordinate ascent algorithm

We have described algorithms that update a single pixel at a time, as in the


PSCA algorithm (1.97) above, or update all pixels simultaneously, as in the SPS
algorithm (1.94) above. A problem with the sequential algorithms is that they are
difficult to parallelize, whereas a problem with simultaneous algorithms is their
slow convergence rates. An alternative is to update a group of pixels simultaneously. If the pixels are well separated spatially, then they may be approximately
uncorrelated12, which leads to separable surrogate functions that have lower curvature [101, 122, 123]. We call such methods grouped coordinate ascent (GCA)
algorithms. The statistics literature has work on GCA algorithms e.g. [124], which
in turn cites related algorithms dating to 1964! In tomography, one can apply the
GCA idea directly to the log-likelihood (1.11) [101,117,123], or to the paraboloidal
surrogate (1.85) [77].
1.7

Direct algorithms

The algorithms described above have all been developed, to some degree, by
considering the specific form of the log-likelihood (1.11). It is reasonable to hypothesize that algorithms that are tailor made for the form of the objective function (1.15) in tomography should outperform (converge faster) general purpose optimization methods that usually treat the objective function as a black box in the
interest of greatest generality. Nevertheless, general purpose optimization is a very
active research area, and it behooves developers of image reconstruction algorithms
to keep abreast of progress in that field. General purpose algorithms that are natural
candidates for image reconstruction include the conjugate gradient algorithm and
the quasi-Newton algorithm, described next.
1.7.1

Conjugate gradient algorithm

For unconstrained quadratic optimization problems, the preconditioned conjugate gradient (CG) algorithm [125] is particularly appealing because it converges
rapidly13 for suitably chosen preconditioners, e.g. [67]. For nonquadratic objective
12

To be more precise: the submatrix of the Hessian matrix of corresponding to a subset of


spatially separated pixels is approximately diagonal.
13
It is often noted that CG converges in  iterations in exact arithmetic, but this fact is essentially
irrelevant in tomography since  is so large. More relevant is the fact that the convergence rate of
CG is quite good with suitable preconditioners.

46 Statistical Image Reconstruction Methods


functions, or when constraints such as nonnegativity are desired, the CG method
is somewhat less convenient due to the need to perform line searches. It may be
possible to adopt the optimization transfer principles to simplify the line searches,
cf. [67].
Mumcuoglu et al. [31, 126] have been particularly successful in applying diagonally preconditioned conjugate gradients to both transmission and emission tomography. Their diagonal preconditioner was based on (1.52). They investigated both
a penalty function approach to encourage nonnegativity [31, 126], as well as active
set methods [127] for determining the set of nonzero pixels [128, 129].
An alternative approach to enforcing nonnegativity in gradient-based methods
uses adaptive barriers [130].
1.7.2

Quasi-Newton algorithm

The ideal preconditioner for the conjugate gradient algorithm would be the inverse of the Hessian matrix, which would lead to superlinear convergence [131].
Unfortunately, in tomography the Hessian matrix is a large non-sparse matrix, so
its inverse is impractical to compute and store. The basic idea of the quasi-Newton
family of algorithms is to form low-rank approximations to the inverse of the Hessian matrix as the iterations proceed [85]. This approach has been applied by
Kaplan et al. [132] to simultaneous estimation of SPECT attenuation and emission distributions, using the public domain software for limited memory, boundconstrained minimization (L-BFGS-B) [133]. Preconditioning has been found to
accelerate such algorithms [132].
1.8

Alternatives to Poisson models

Some of the algorithms described above are fairly complex, and this complexity derives from the nonconvex, nonquadratic form of the transmission Poisson
log-likelihood (1.11) and (1.12). It is natural then to ask whether there are simpler
approaches that would give adequate results in practice. Every simpler approach
that we are aware of begins by using the logarithmic transformation (1.3), which
compensates for the nonlinearity of Beers law (1.2) and leads then to a linear problem

    


(1.99)

Unfortunately, for low-count transmission scans, especially those contaminated by


background events of any type (  ), the logarithm (1.3) simply cannot be used
since    can be nonpositive for many rays. In medium to high-count transmission scans, the bias described in (1.4) should be small, so one could work with the
estimated line integrals (the s) rather than the raw transmission measurements
(the  s).

Alternatives to Poisson models 47


1.8.1

Algebraic reconstruction methods

A simple approach to estimating is to treat (1.99) as a set of


 equations in

 unknowns and try to solve for . This was the motivation for the algebraic reconstruction technique (ART) family of algorithms [11]. For noisy measurements
the equations (1.99) are usually inconsistent, and ART converges to a limit cycle for inconsistent problems. One can force ART to converge by introducing appropriate strong underrelaxation [134]. However, the limit is the minimum-norm
weighted least-squares solution for a particular norm that is unrelated to the measurement statistics. The Gauss-Markov theorem [90] states that estimator variance
is minimized when the least-squares norm is chosen to be the inverse of the covariance matrix, so it seems preferable to approach (1.99) by first finding a statisticallymotivated cost function, and then finding algorithms that minimize that cost function, rather than trying to fix up algorithms that were derived under the unrealistic
assumption that (1.99) is a consistent system of equations.
1.8.2

Methods to avoid

A surprising number of investigators have applied the emission EM algorithm


to solve (1.99), even though the statistics of are entirely different from those
of emission sinogram measurements, e.g. [15]. We strongly recommend avoiding
this practice. Empirical results with simulated and phantom data show that this
approach is inferior to methods such as OSTR which are based on the transmission
statistical model (1.7).
Liang and Ye [135] present the following iteration for MAP reconstruction of
attenuation maps without giving any derivation:
 


 

 





 

 

The iteration looks like an upside down emission EM algorithm. The convergence properties of this algorithm are unknown.
Zeng and Gullberg [136] proposed the following steepest ascent method with a
fixed step-length parameter:
 








     



$
 

$ 



for an interesting choice of penalty   that encourages attenuation values near


those of air/lung, soft tissue, or bone. Without a line search of the type studied
by Lange [95, 97], monotonicity is not guaranteed. Even with a line search it is
unlikely that this algorithm maximizes the objective function since its fixed points
are not stationary points of .

48 Statistical Image Reconstruction Methods


1.8.3

Weighted least-squares methods

Rather than simply treating (1.99) as a system of equations, we can use (1.99)
as the rationale for a weighted least-squares cost function. There are several choices
for the weights.
1.8.3.1

Model-weighted LS

By a standard propagation-of-errors argument, one can show from (1.3) and


(1.7) that

'



   

(1.100)

where  was defined in (1.8). A natural model-weighted least-squares cost function is then







 



   


(1.101)

This type of cost function has been considered in [137]. Unfortunately, the above
cost function is nonquadratic, so finding its minimizer is virtually as difficult as
maximizing (1.15).
1.8.3.2

Data-weighted LS

A computationally simpler approach arises if we replace the estimate-dependent


variance (1.100) with a data-based estimate by substituting the data  for  . This
leads naturally to the following data-weighted least-squares cost function:








(1.102)




where       is a precomputed weight. Minimizing  is straightforward


since it is quadratic, so one can apply, for example, conjugate gradient algorithms or
coordinate descent algorithms [13]. This approach gives more weight to those measurements that have lower variance, and less weight to the noiser measurements.
This type of weighting can significantly reduce the noise in the reconstructed image relative to unweighted least squares. In fact, unweighted least squares estimates are essentially equivalent to FBP images. (The shift-invariant FBP methods treat all data equally, since noise is ignored.) However, as mentioned below
(1.4), data-weighting leads to a systematic negative bias that increases as counts
decrease [14, 15]. So (1.102) is only appropriate for moderate to high SNR problems.
One can also derive (1.102) by making a second-order Taylor expansion of the
log-likelihood (1.12) about [13, 14, 138, 139].

Emission reconstruction 49
1.8.3.3

Reweighted LS

The two cost functions given above represent two extremes. In (1.102), the
weights are fixed once-and-for-all prior to minimization, whereas in (1.101), the
weights vary continuously as the estimate of changes. A practical alternative is
to first run any inexpensive algorithm (such as OSTR) for a few iterations and then

reproject the estimated image  to form estimated line integrals  
.
Then perform a second-order Taylor expansion of the log-likelihood (1.12) around
 to find a quadratic approximation that can be minimized easily. This approach
should avoid the biases of data-weighted least-squares, and if iterated is known as
reweighted least squares [140, 141].
1.9

Emission reconstruction

In emission tomography, the goal is to reconstruct an emission distribution + 


from recorded counts of emitted photons. We again parameterize the emission
distribution analogous to (1.5), letting +  denote the mean number of emissions
from the  th voxel. The goal is to estimate +  +   +
 from projection
measurements      
 . The usual Poisson measurement model is
identical to (1.7), except that the measurement means are given by






  + 

(1.103)



where   represents the probability that an emission from the  th voxel is recorded
by the th detector, and  again denotes additive background counts such as random
coincidences and scatter. (Accurate models for the   s can lead to significant
improvements in image spatial resolution and accuracy, e.g. [142, 143]. The loglikelihood has a similar form to (1.11):

 + 




+
 

where

   



    

(1.104)

This  function is concave for    , and is strictly concave if   .


Since  is linearly related to the +  s (in contrast to the nonlinear relationship in
(1.8) in the transmission case), the emission reconstruction problem is considerably
easier than the transmission problem. Many of the algorithms described above
apply to the emission problem, as well as to other inverse problems having loglikelihood functions of the general form (1.11). We describe in this section a few
algorithms for maximizing the emission log-likelihood  +. Extensions to the
regularized problem are similar to those described for the transmission case.

50 Statistical Image Reconstruction Methods


1.9.1

EM Algorithm

One can derive the classical EM algorithm for the emission problem by a formal
complete-data exposition [17], which is less complicated than the transmission case
but still somewhat mysterious to many readers, or by fixed-point considerations
[79] (which do not fully illustrate the monotonicity of the emission EM algorithm).
Instead, we adopt the simple concavity-based derivation of De Pierro [72], which
reinforces the surrogate function concepts woven throughout this chapter.
The key to the derivation is the following multiplicative trick, which applies

if +  :

 

 +








+ 

  

+


(1.105)

The
  terms in parentheses are nonnegative and sum to unity, so we can apply
the concavity inequality. Since 3
that

 +





  



 is concave on

 ,

it follows

 


!
! 
  +  +  

     
3
 
 

+



!
!
!
 + 
+ 

3


3 




















+



) + + 

The surrogate function ) is separable:




) + +








) +  +





) +  +



 +






+ 


+
(1.106)

Thus the the following parallelizable maximization step is guaranteed to monotonically increase the log-likelihood  + each iteration:

+

   )
 

The maximization is trivial:




+ 
$
) +  +  
  3

$+
+



+  + 




(1.107)




+ 

+ 

Emission reconstruction 51
Equating to zero and solving for +  yields the famous update:
 

+



+

   
    








(1.108)

Unfortunately, the emission EM algorithm (1.108) usually converges painfully


slowly. To understand this, consider the curvatures of the surrogate functions )  :

+

$

  ) +  +   
  
$+
+ 



(1.109)

For any pixels converging towards zero, these curvatures grow without bound. This
leads to very slow convergence; even sublinear convergence rates are possible [76].
1.9.2

An improved EM algorithm

One can choose a slightly better decomposition than (1.105) to get slightly
faster converging EM algorithms [75]. First find any set of nonnegative constants

-   that satisfy

 




  -  

(1.110)



Then an alternative to (1.105) is:


 

 +




- 





+

+ -




  (1.111)



where         -   Again the terms in parentheses in (1.111) are


nonnegative and sum to unity. So a similar derivation as that yielding (1.106) leads
to a new surrogate function:


) +  +  

 +




- 



+ 3    
+

Maximizing as in (1.107) leads to the following algorithm


 

+



+

- 



    
 -

 



    


(1.112)

This algorithm was derived by a more complicated EM approach in [75], and called
ML-EM-3. The surrogate function derivation is simpler to present and understand,
and more readily generalizable to alternative surrogates.

52 Statistical Image Reconstruction Methods


The curvatures of the second )  s just derived are smaller than those in (1.106),
due to the - s (replace + with + - in the denominator of (1.109)). The convergence rate improves as the -  s increase, but of course (1.110) must be satisfied
to ensure monotonicity. Since the EM algorithm updates all parameters simultaneously, the - values must be shared among all pixels, and typically are fairly
small due to (1.110). In contrast the SAGE algorithm [75] updates the pixels sequentially, which greatly relaxes the constraints on the -  s, allowing larger values
and hence faster convergence rates.
1.9.3

Other emission algorithms

Most of the other methods for developing reconstruction algorithms described


in this chapter have counterparts for the emission problem. Monotonic acceleration
is possible using line searches [94]. Replacing the sums over  in (1.108) with sums
over subsets of the projections yields the emission OSEM algorithm [102]; see also
the related variants RAMLA [144] and RBBI [103, 104]. Although the OSEM algorithm fails to converge in general, it often gives reasonable looking images in a
small number of iterations when initialized with a uniform image. Sequential updates rather than parallel updates leads to the fast converging SAGE algorithms [75]
and coordinate ascent algorithms [19], including paraboloidal surrogate variations
thereof [78]. The conjugate gradient algorithm has been applied extensively to
the emission problem and is particularly effective provided one carefully treats the
nonnegativity constraints [31].
1.10

Advanced topics

In this section we provide pointers to the literature for several additional topics,
all of which are active research areas.
1.10.1

Choice of regularization parameters

A common critique of penalized-likelihood and Bayesian methods is that one


must choose (subjectively?) the regularization parameter  in (1.15). (In unregularized methods there are also free parameters that one must select, such as the
number of iterations or the amount of post-filtering, but fiddling these factors to
get visually pleasing images is perhaps easier than adjusting  , since each new 
requires another run of the iterative algorithm.) A large variety of methods for automatically choosing  have been proposed, based on principles such as maximum
likelihood or cross validation e.g., [145154], most of which have been evaluated
in terms of mean-squared error performance, which equally weights bias (squared)
and variance, even though resolution and noise may have unequal importance in
imaging problems.
For quadratic regularization methods, one can choose both  and   to control the spatial resolution properties and to relate the desired spatial resolution to an
appropriate value of  by a predetermined table [20, 5457].

Advanced topics 53
In addition to the variety of methods for choosing  , there is an even larger
variety of possible choices for the potential functions   in (1.17), ranging from
quadratic to nonquadratic to nonconvex and even nondifferentiable. See [155] for
a recent discussion.
The absolute value potential (  "  " ) is particularly appealing in problems
with piecewise constant attenuation maps. However, its nondifferentiability greatly
complicates optimization [156158].
1.10.2

Source-free attenuation reconstruction

In PET and SPECT imaging, the attenuation map is a nuisance parameter; the
emission distribution is of greatest interest. This has spawned several attempts
to estimate the attenuation map from the emission sinograms, without a separate
transmission scan. See e.g., [132, 159, 160].
1.10.3

Dual energy imaging

We have focused on the case of monoenergetic imaging, by the assumption


(1.23). For quantitative applications such as bone densitometry, one must account
for the polyenergetic property of x-ray source spectra. A variety of methods have
been proposed for dual energy image reconstruction, including (recently) statistical
methods [161163].
1.10.4

Overlapping beams

Some transmission scan geometries involve multiple transmission sources, and


it is possible for a given detector element to record photons that originated in
more than one of these sources, i.e., the beams of photons emitted from the various sources overlap on the detector. The transmission statistical model (1.8) must
be generalized to account for such overlap, leading to new reconstruction algorithms [164167].
1.10.5

Sinogram truncation and limited angles

In certain geometries, portions of the sinogram are missing due to geometric


truncation (such as fan-beam geometries with a short focal length). In such cases,
prior information plays an essential role in regularizing the reconstruction problem,
e.g., [43]. Similarly, in limited angle tomography the sinograms are truncated due
to missing angles. Nonquadratic regularization methods have shown considerable
promise for such problems [68].
1.10.6

Parametric object priors

Throughout this chapter we have considered the image to be parameterized by


the linear series expansion (1.5), and the associated regularization methods have
used only fairly generic image properties, such as piecewise smoothness. In some
applications, particularly when the counts are extremely low or the number of

54 Statistical Image Reconstruction Methods


projection views is limited, it can be desirable (or even essential) to apply much
stronger prior information to the reconstruction problem. Simple parametric object
models such as circles and ellipses (with unknown location, shape, and intensity parameters) have been used for certain applications such as angiography [168172]
or for analysis of imaging system designs, e.g., [173, 174]. Polygonal models have
been applied to cardiac image reconstruction, e.g., [175]. More general and flexible object models based on deformable templates have also shown considerable
promise and comprise a very active research area, e.g., [176180]. (See Chapter
3.)
1.11

Example results

This section presents representative results of applying penalized likelihood


image reconstruction to real PET transmission scan data, following [109]. Many
more examples can be found in the references cited throughout this chapter.
We collected a 12minute transmission scan ( s) on a Siemens/CTI ECAT
EXACT 921 PET scanner with rotating rod sources of an anthropomorphic thorax
phantom (Data Spectrum, Chapel Hill, NC). The sinogram size was 160 radial bins
by 192 angles (over 180 ), with 3mm radial spacing. The reconstructed images
were  (  ( pixels that were 4.5mm on each side.
For the penalized-likelihood reconstructions we used a second order penalty
function of the form (1.16) with the following potential function proposed in [97]:

 "   "  

 " 

(1.113)

where    and   &cm were chosen visually. This function approaches


the quadratic  "  "  as  , but provides a degree of edge preservation
for finite . The derivative of  requires no transcendental functions, which is
computationally desirable.
Fig. 1.6 presents a representative example of performance on real PET transmission data. The FBP image is noisy and blurry, since there are only 921K prompt
coincidences in this scan [101]. In the upper right is the emission OSEM algorithm applied to the logarithm (1.3) and (1.99). As discussed in Section 1.8.2,
this approach yields suboptimal images. The lower two images in Fig. 1.6 were
reconstructed by penalized likelihood methods based on (1.15) and (1.16) with the
penalty described above. The lower left image used 2 iterations of the OSTR-16
algorithm (1.96) (modified for a nonquadratic penalty as described in [77]). The
lower right image used 10 iterations of the PSCA algorithm (1.97). Both penalized likelihood images have lower noise and better spatial resolution than FBP and
ML-OSEM-8, as quantified in [77]. There are small differences between the nonconvergent OSTR image and the image reconstructed by the monotonic PSCA algorithm, but whether these differences are important in practice is an open question.
In PET the main purpose of the attenuation map is to form attenuation correction factors (ACFs) for the emission scan. Fig. 1.7 shows sagittal views (47 slices)

Example results 55

FBP

MLOSEM8

PLOSTR16

PLPSCA

Figure 1.6: Reconstructed images of thorax phantom from 12minute PET transmission
scan.

of a patient injected with FDG and scanned with PET. In this case, a 2minute
transmission scan was emulated by binomial thinning of a 12minute transmission
scan [109]. For the subfigures labeled T-PL and T-FBP the ACFs were computed from attenuation maps reconstructed by penalized likelihood methods or by
FBP respectively. For the subfigures labeled E-PL and E-FBP, the emission
data was reconstructed by penalized likelihood methods or by FBP respectively.
The best image (upper left) is formed when both the emission and transmission
images are reconstructed by statistical approaches. The second best image (upper
right) is formed by using statistical reconstruction of the attenuation map, but ordinary FBP for the emission data. Clearly for such low-count transmission scans,
reducing the noise in the ACFs is as important, if not more so, than how the emission images are reconstructed.

56 Statistical Image Reconstruction Methods

EPL,TPL

EFBP,TPL

EPL,TFBP

EFBP,TFBP

Figure 1.7: FDG PET emission images, reconstructed by both FBP (E-FBP) and penalizedlikelihood (E-PL) methods. Attenuation correction was performed using attenuation maps
generated either by transmission FBP (T-FBP) or transmission penalized-likelihood (T-PL)
reconstructions. The use of statistical reconstruction methods significantly reduces image
noise.

1.12

Summary

We have summarized a wide variety of algorithms for statistical image reconstruction from transmission measurements. Most of the ideas underlying these
algorithms are applicable to emission tomography, as well as to image recovery
problems in general.
There is a wide variety of algorithms in part because there is yet to have been
found any algorithm that has all the desirable properties listed in Section 1.3.1.
In cases where the system matrix can easily be precomputed and stored, and a
non-parallel computer is to be used, we recommend the PSCA algorithm of Section 1.6.4. For parallel computing, the conjugate gradient algorithm [31] is a reasonable choice, particularly if exact nonnegativity constraints can be relaxed. If
an inexact maximum is acceptable, the OSTR algorithm of Section 1.6.3 is a very
practical choice, and is likely to be widely applied given the popularity of the emission OSEM algorithm. Meanwhile, the search continues for an algorithm with the
simplicity of OSTR that is parallelizable, monotone and fast converging, and can
accommodate any form of system matrix.

Acknowledgements 57
1.13

Acknowledgements

The ideas in this chapter were greatly influenced by the dissertation research
of Hakan Erdogan [181], who also prepared Fig. 1.6 and 1.7. The author also
gratefully acknowledges ongoing collaboration with Neal Clinthorne, Ed Ficaro,
Ken Lange, and Les Rogers. The author thanks Ken Hanson for his careful reading
of this chapter. This work was supported in part by NIH grants CA-60711 and
CA-54362.
1.14

Appendix: Poisson properties

Suppose a source transmits


photons of a certain energy along a ray passing
through an object towards a specified pixel on the detector. We assume
is a
Poisson random variable with mean
 :

*


*



Each of the
transmitted photons may either pass unaffected (survive passage)
or may interact with the object. These are Bernoulli trials since the photons interact
independently. From Beers law we know that the probability of surviving passage
is given by

,

  

The number of photons / that pass unaffected through the object is a random
variable, and from Beers law:

 /

&


Using total probability:

 /

&
-

-


,   ,

 /

 -    &

&


 

&





&
-

,   ,

&

 

-

 ,  -   

Therefore the distribution of photons that survive passage is also Poisson, with
mean  /

 ,

58 Statistical Image Reconstruction Methods


Furthermore, by applying Bayes rule, for &  -  :

& /

 /
&
-

-
 &


 /  -

,   ,

&  -

&  -



 

&





 ,

 
 ,



  /





 

  

  

Thus, conditioned on / , the random variable


 / has a Poisson distribution
with mean 

  /
. In particular,


 / /
 

  /

which is useful in deriving the transmission EM algorithm proposed in [17].
1.15

References

[1] M. M. Ter-Pogossian, M. E. Raichle, and B. E. Sobel, Positron-emission tomography, Scientific American, vol. 243, pp. 171181, Oct. 1980.
[2] J. M. Ollinger and J. A. Fessler, Positron emission tomography, IEEE Sig. Proc.
Mag., vol. 14, pp. 4355, Jan. 1997.
[3] T. F. Budinger and G. T. Gullberg, Three dimensional reconstruction in nuclear
medicine emission imaging, IEEE Tr. Nuc. Sci., vol. 21, no. 3, pp. 220, 1974.
[4] T. H. Prettyman, R. A. Cole, R. J. Estep, and G. A. Sheppard, A maximumlikelihood reconstruction algorithm for tomographic gamma-ray nondestructive assay, Nucl. Instr. Meth. Phys. Res. A., vol. 356, pp. 40752, Mar. 1995.
[5] G. Wang, D. L. Snyder, J. A. OSullivan, and M. W. Vannier, Iterative deblurring
for CT metal artifact reduction, IEEE Tr. Med. Im., vol. 15, p. 657, Oct. 1996.
[6] R. M. Leahy and J. Qi, Statistical approaches in quantitative positron emission
tomography, Statistics and Computing, 1998.
[7] K. Wienhard, L. Eriksson, S. Grootoonk, M. Casey, U. Pietrzyk, and W. D. Heiss,
Performance evaluation of a new generation positron scanner ECAT EXACT, J.
Comp. Assisted Tomo., vol. 16, pp. 804813, Sept. 1992.
[8] S. R. Cherry, M. Dahlbom, and E. J. Hoffman, High sensitivity, total body PET
scanning using 3D data acquisition and reconstruction, IEEE Tr. Nuc. Sci., vol. 39,
pp. 10881092, Aug. 1992.
[9] E. P. Ficaro, J. A. Fessler, W. L. Rogers, and M. Schwaiger, Comparison of
Americium-241 and Technicium-99m as transmission sources for attenuation correction of Thallium-201 SPECT imaging of the heart, J. Nuc. Med., vol. 35,
pp. 65263, Apr. 1994.

References 59
[10] S. R. Meikle, M. Dahlbom, and S. R. Cherry, Attenuation correction using countlimited transmission data in positron emission tomography, J. Nuc. Med., vol. 34,
pp. 143150, Jan. 1993.
[11] G. T. Herman, Image reconstruction from projections: The fundamentals of computerized tomography. New York: Academic Press, 1980.
[12] A. Macovski, Medical imaging systems. New Jersey: Prentice-Hall, 1983.
[13] K. Sauer and C. Bouman, A local update strategy for iterative reconstruction from
projections, IEEE Tr. Sig. Proc., vol. 41, pp. 534548, Feb. 1993.
[14] J. A. Fessler, Hybrid Poisson/polynomial objective functions for tomographic image reconstruction from transmission scans, IEEE Tr. Im. Proc., vol. 4, pp. 143950,
Oct. 1995.
[15] D. S. Lalush and B. M. W. Tsui, MAP-EM and WLS-MAP-CG reconstruction
methods for transmission imaging in cardiac SPECT, in Proc. IEEE Nuc. Sci. Symp.
Med. Im. Conf., vol. 2, pp. 11741178, 1993.
[16] J. A. Fessler, Mean and variance of implicitly defined biased estimators (such as
penalized maximum likelihood): Applications to tomography, IEEE Tr. Im. Proc.,
vol. 5, pp. 493506, Mar. 1996.
[17] K. Lange and R. Carson, EM reconstruction algorithms for emission and transmission tomography, J. Comp. Assisted Tomo., vol. 8, pp. 306316, Apr. 1984.
[18] C. Bouman and K. Sauer, Fast numerical methods for emission and transmission
tomographic reconstruction, in Proc. 27th Conf. Info. Sci. Sys., Johns Hopkins,
pp. 611616, 1993.
[19] C. A. Bouman and K. Sauer, A unified approach to statistical tomography using
coordinate descent optimization, IEEE Tr. Im. Proc., vol. 5, pp. 48092, Mar. 1996.
[20] J. A. Fessler and W. L. Rogers, Spatial resolution properties of penalized-likelihood
image reconstruction methods: Space-invariant tomographs, IEEE Tr. Im. Proc.,
vol. 5, pp. 134658, Sept. 1996.
[21] P. M. Joseph and R. D. Spital, A method for correcting bone induced artifacts in
computed tomography scanners, J. Comp. Assisted Tomo., vol. 2, pp. 1008, 1978.
[22] B. Chan, M. Bergstrom, M. R. Palmer, C. Sayre, and B. D. Pate, Scatter distribution in transmission measurements with positron emission tomography, J. Comp.
Assisted Tomo., vol. 10, pp. 296301, Mar. 1986.
[23] E. J. Hoffman, S. C. Huang, M. E. Phelps, and D. E. Kuhl, Quantitation in positron
emission computed tomography: 4 Effect of accidental coincidences, J. Comp. Assisted Tomo., vol. 5, no. 3, pp. 391400, 1981.
[24] M. E. Casey and E. J. Hoffman, Quantitation in positron emission computed tomography: 7 a technique to reduce noise in accidental coincidence measurements and
coincidence efficiency calibration, J. Comp. Assisted Tomo., vol. 10, no. 5, pp. 845
850, 1986.

60 Statistical Image Reconstruction Methods


[25] E. P. Ficaro, W. L. Rogers, and M. Schwaiger, Comparison of Am-241 and Tc99m as transmission sources for the attenuation correction of Tl-201 cardiac SPECT
studies, J. Nuc. Med. (Abs. Book), vol. 34, p. 30, May 1993.
[26] E. P. Ficaro, J. A. Fessler, R. J. Ackerman, W. L. Rogers, J. R. Corbett, and
M. Schwaiger, Simultaneous transmission-emission Tl-201 cardiac SPECT: Effect
of attenuation correction on myocardial tracer distribution, J. Nuc. Med., vol. 36,
pp. 92131, June 1995.
[27] E. P. Ficaro, J. A. Fessler, P. D. Shreve, J. N. Kritzman, P. A. Rose, and J. R. Corbett, Simultaneous transmission/emission myocardial perfusion tomography: Diagnostic accuracy of attenuation-corrected 99m-Tc-Sestamibi SPECT, Circulation,
vol. 93, pp. 46373, Feb. 1996.
[28] D. F. Yu and J. A. Fessler, Mean and variance of photon counting with deadtime,
in Proc. IEEE Nuc. Sci. Symp. Med. Im. Conf., 1999.
[29] D. F. Yu and J. A. Fessler, Mean and variance of singles photon counting with
deadtime, Phys. Med. Biol., 1999. Submitted.
[30] D. F. Yu and J. A. Fessler, Mean and variance of coincidence photon counting with
deadtime, Phys. Med. Biol., 1999. Submitted.
[31] E. U. Mumcuoglu, R. Leahy, S. R. Cherry, and Z. Zhou, Fast gradient-based methods for Bayesian reconstruction of transmission and emission PET images, IEEE
Tr. Med. Im., vol. 13, pp. 687701, Dec. 1994.
[32] M. Yavuz and J. A. Fessler, New statistical models for randoms-precorrected PET
scans, in Information Processing in Medical Im. (J. Duncan and G. Gindi, eds.),
vol. 1230 of Lecture Notes in Computer Science, pp. 190203, Berlin: Springer
Verlag, 1997.
[33] M. Yavuz and J. A. Fessler, Statistical image reconstruction methods for randomsprecorrected PET scans, Med. Im. Anal., vol. 2, no. 4, pp. 369378, 1998.
[34] M. Yavuz and J. A. Fessler, Penalized-likelihood estimators and noise analysis for
randoms-precorrected PET transmission scans, IEEE Tr. Med. Im., vol. 18, pp. 665
74, Aug. 1999.
[35] A. C. Kak and M. Slaney, Principles of computerized tomographic imaging. New
York: IEEE Press, 1988.
[36] A. Celler, A. Sitek, E. Stoub, P. Hawman, R. Harrop, and D. Lyster, Multiple line

CHAPTER 2
Image Segmentation
Benoit M. Dawant
Vanderbilt University
Alex P. Zijdenbos
McGill University

Contents
2.1 Introduction  73
2.2 Image preprocessing and acquisition artifacts  73
    2.2.1 Partial volume effect  74
    2.2.2 Intensity nonuniformity (INU)  74
2.3 Thresholding  78
    2.3.1 Shape-based histogram techniques  78
    2.3.2 Optimal thresholding  79
    2.3.3 Advanced thresholding methods for simultaneous segmentation and INU correction  84
2.4 Edge-based techniques  88
    2.4.1 Border tracing  88
    2.4.2 Graph searching  89
    2.4.3 Dynamic programming  91
    2.4.4 Advanced border detection methods  93
    2.4.5 Hough transforms  94
2.5 Region-based segmentation  98
    2.5.1 Region growing  98
    2.5.2 Region splitting and merging  99
    2.5.3 Connected component labeling  100
2.6 Classification  101
    2.6.1 Basic classifiers and clustering algorithms  103
    2.6.2 Adaptive fuzzy c-means with INU estimation  109
    2.6.3 Decision trees  110
    2.6.4 Artificial neural networks  111
    2.6.5 Contextual classifiers  116
2.7 Discussion and Conclusion  119
2.8 Acknowledgements  120
2.9 References  120

2.1 Introduction
Image segmentation, defined as the separation of the image into regions, is one
of the first steps leading to image analysis and interpretation. The goal is to separate the image into regions that are meaningful for a specific task. This may, for
instance, involve the detection of organs such as the heart, the liver, or the lungs
from MR or CT images. Other applications may require the calculation of white
and gray matter volumes in MR brain images, the labeling of deep brain structures
such as the thalamus or the hippocampus, or quantitative measurements made from
ultrasound images. Image segmentation approaches can be classified according to
both the features and the type of technique used. Features include pixel intensities,
gradient magnitudes, or measures of texture. Segmentation techniques applied to
these features can be broadly classified into one of three groups [1]: region-based,
edge-based, or classification. Typically, region-based and edge-based segmentation
techniques exploit, respectively, within-region similarities and between-region differences in features, whereas a classification technique assigns class labels
to individual pixels or voxels based on feature values.
Because of issues such as spatial resolution, poor contrast, ill-defined boundaries, noise, or acquisition artifacts, segmentation is a difficult task and it is illusory
to believe that it can be achieved by using gray-level information alone. A priori
knowledge has to be used in the process, and so-called low-level processing algorithms have to cooperate with higher level techniques such as deformable and
active models or atlas-based methods. These high-level techniques are described
in great detail in Chapters 3 and 17 of this handbook and they will only be mentioned briefly in this chapter to provide the appropriate links. This chapter focuses
on the segmentation of the image into regions based only on gray-level information. Methods relying on a single image, also called mono-modality methods, will
be presented as well as multi-modal methods that take advantage of several coregistered images (see Chapter 8). Because image segmentation is a broad field
that can only be touched upon in a single chapter, the reader will be introduced to
basic segmentation methods. A number of pointers to the pertinent literature will
be provided for more detailed descriptions of these methods or for more advanced
techniques that build upon the concepts being introduced.
2.2 Image preprocessing and acquisition artifacts
Image quality and acquisition artifacts directly affect the segmentation process.
These artifacts, their cause, and possible solutions are discussed in volume 1 of
this series, Physics and Psychophysics. Here, two leading artifacts that affect MR
images, namely partial volume and intensity nonuniformity, are discussed because
several segmentation methods have been proposed by the medical image processing
community to take these effects into consideration in the segmentation process.

Figure 2.1: A 2D view of an image slice cutting through the border between two tissues
(top). The resulting intensity profile along the line is also shown (bottom).

2.2.1 Partial volume effect
Strictly speaking, the so-called partial volume effect (PVE), i.e., the mixing of
different tissue types in a single voxel, is not an artifact, but it is caused by the
finite spatial resolution of the images. PVE causes edge blurring between different
tissue types and reduces the accuracy and reliability of measurements taken on the
images, as is shown in Figure 2.1.
This figure shows how a slice image cuts through the region that separates
two tissues; it also shows the intensity profile that would result in the slice. The figure clearly illustrates that, depending on the angle with which the slice image cuts
the tissue boundary, it will be more or less difficult, if not impossible, to localize
the true location of the edge from the image data. Without increasing the spatial
resolution of the data, PVE is not correctable. It can, however, be modeled, and
the model can be incorporated into the design of a segmentation technique. This
is for instance the case in fuzzy approaches such as the fuzzy c-means classifiers
described in Section 2.6.
2.2.2 Intensity nonuniformity (INU)

Intensity nonuniformity (INU), also commonly referred to as shading artifact,
refers to smooth, local changes in the image intensity introduced by the data acquisition technique. This artifact is typically present in MRI data and depends on
a combination of factors, including the shape and electromagnetic properties of the
subject or object being scanned, the spatial sensitivity of the radio-frequency (RF)
receiver coil, gradient-driven eddy currents, the frequency response of the receiver,
and the spatial inhomogeneity of the excitation field [2–6].
The impact of INU on the visual interpretation of images is limited. It does,
however, affect image segmentation methods because most of them rely, at least
partially, on absolute image intensity values. Figure 2.2 shows an example of an

Figure 2.2: An MR image that exhibits intensity nonuniformity in the vertical direction (left)
and the intensity profile along line A-B (right).

image affected by intensity nonuniformity. This figure also shows an intensity profile, taken along the indicated line, which clearly indicates the spatial variation of
the MRI signal.
Correction of intensity nonuniformity is typically based on a multiplicative
model of the artifact:

v(x) = u(x) \, f(x),   (2.1)

In this equation, v(x) and u(x) are respectively the observed and true (artifact-free)
signal at spatial location x, and f(x) is the distortion factor at the same position.
Noise effects are ignored in this simple model. Using (2.1), the true image intensities are obtained by multiplying the observed image v(x) with the reciprocal of the
estimated nonuniformity field f(x).
A number of methods for estimating the INU field (often called bias field)
have been proposed in the literature, based on a) specialized acquisition protocols [7–9], b) images acquired of a homogeneous phantom (the correction matrix
approach) [3, 10–12], and c) analysis of the image data itself [10, 13–22]. Recently,
it has been shown that most of the intensity nonuniformity is caused by the geometry and electromagnetic properties of the subject in the scanner [5, 6], which
implies that the correction matrix approach only has limited applicability. Since a
discussion of MRI physics and acquisition sequences is beyond the scope of this
chapter, we will limit ourselves to the discussion of data-driven methods.
Two main approaches have been proposed: either the nonuniformity field is
estimated from the image data directly, or in conjunction with a segmentation algorithm. A number of authors have proposed spatial filtering (often homomorphic
filtering [23]) as a means to estimate the INU field [10, 14, 16, 17, 19]. The disadvantage of this technique is that the frequency spectrum of the INU field f(x) and
that of the true image u(x) are assumed to be separable, which is typically not the
case. As such, spatial filtering methods tend to introduce severe, undesirable filtering artifacts, the effects of which some authors have tried to reduce using heuristic
approaches [14, 16, 17, 19].
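To make the spatial-filtering idea concrete, the following minimal Python sketch estimates a multiplicative field homomorphically. It is an illustration only, not the method of any of the cited references: the function and parameter names are our own, and the choice of a Gaussian low-pass and its width are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def homomorphic_inu_estimate(image, sigma=30.0):
    """Crude homomorphic estimate of a multiplicative INU field (sketch only).

    As noted above, this works only to the extent that the field and the true
    image are separable in frequency; `sigma` is an assumed smoothness scale.
    """
    log_v = np.log(image.astype(np.float64) + 1e-6)  # multiplicative -> additive
    log_f = gaussian_filter(log_v, sigma)            # low-pass = field estimate
    log_f -= log_f.mean()                            # preserve mean intensity
    field = np.exp(log_f)
    return image / field, field                      # corrected image, field
```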
Dawant and Zijdenbos [15] have proposed a method for MRI brain scans which
estimates the INU field by interpolating, using a thin-plate spline, the intensity values at a collection of manually labeled white matter locations. The authors subsequently proposed a modified, semi-automated version of this algorithm where the
white matter reference locations were identified using an intermediate classification step [22]. Another approach that relies on an initial segmentation of the image
has been proposed by Meyer et al. [18]. In this work, segmentation of the image
into homogeneous regions is obtained using a method known as LCJ segmentation [24, 25], and the field is modeled by a polynomial. Brechbühler et al. [13]
also use a polynomial model of the INU field, but rather than relying on an initial
segmentation of the image, they develop a cost function related to the sharpness
of the histogram peaks. Using taboo search [26], they compute the polynomial
coefficients that minimize this function.
A robust method, called N3 (Nonparametric Nonuniform intensity Normalization), has been proposed by Sled et al. [20]. This method is fully automatic and
does not rely on an explicit segmentation of the image, nor on prior knowledge
of tissue class statistics. Instead, the N3 algorithm relies on the observation that
the INU field widens the range of intensities occupied by one tissue class. This
can be viewed as a blurring of the probability density function (pdf) of the image
(approximated by the image histogram).
In order to correct for intensity nonuniformity, N3 aims to iteratively sharpen
the image histogram by removing spatially smooth fields from the image. Since the
space of possible fields that fulfill this requirement is extremely large, reasonable
field estimates are proposed by smoothing a field of local correction factors derived
from an active histogram sharpening operation:
Algorithm 2.1: N3 algorithm for INU correction

1. Perform a logarithmic transform of the image intensities: \hat{v}(x) = \log v(x). This permits writing Equation (2.1) as \hat{v}(x) = \hat{u}(x) + \hat{f}(x).

2. Approximate the probability density function (pdf) V of the log-transformed image \hat{v} with the histogram of \hat{v}.

3. Sharpen the histogram by de-convolving a Gaussian blurring kernel, representing the assumed pdf F of the bias field \hat{f}, from it. This produces an estimate of the pdf U of the true image \hat{u}.

4. Estimate the intensity mapping that maps the original histogram to the sharpened one. This maps the observed image intensities \hat{v} to local (per voxel) field estimates \hat{f}_e, in effect by moving intensity values towards the closest histogram peak (see [20] for details).

5. Smooth the estimated field \hat{f}_e using an approximating B-spline function.

6. If the resulting field estimate is sufficiently different from the one obtained in the previous iteration, remove \hat{f}_e from \hat{v} and go back to step 2.

7. Exponentiate \hat{f}, the accumulated field estimate, to obtain f, the final estimate of the bias field.
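The Python sketch below mimics the structure of Algorithm 2.1 in a drastically simplified form. It is not Sled et al.'s implementation: the Wiener deconvolution, the crude intensity-mapping step standing in for the exact mapping of [20], the fixed iteration count, and all parameter values are assumptions made for illustration only.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def n3_like_correction(image, n_iter=20, fwhm=0.15, smooth_sigma=8.0, n_bins=200):
    """Much-simplified, N3-flavored loop (sketch only, not Sled et al.'s code)."""
    v = np.log(image.astype(np.float64) + 1e-6)      # step 1: log transform
    field = np.zeros_like(v)
    for _ in range(n_iter):
        u = v - field
        hist, edges = np.histogram(u, bins=n_bins)   # step 2: pdf estimate
        centers = 0.5 * (edges[:-1] + edges[1:])
        # step 3: deconvolve a Gaussian blur from the histogram (Wiener filter)
        sigma_bins = fwhm / (edges[1] - edges[0]) / 2.355
        k = np.exp(-0.5 * ((np.arange(n_bins) - n_bins // 2) / sigma_bins) ** 2)
        k /= k.sum()
        K = np.fft.fft(np.fft.ifftshift(k))
        H = np.fft.fft(hist.astype(np.float64))
        sharp = np.real(np.fft.ifft(H * np.conj(K) / (np.abs(K) ** 2 + 1e-2)))
        sharp = np.clip(sharp, 0, None)
        # step 4: map intensities toward the sharpened pdf via a local
        # conditional mean (a crude stand-in for the exact mapping in [20])
        num = gaussian_filter(sharp * centers, sigma_bins)
        den = gaussian_filter(sharp, sigma_bins) + 1e-12
        mapped = np.interp(u, centers, num / den)
        residual = u - mapped                        # local field estimates
        field = gaussian_filter(field + residual, smooth_sigma)  # step 5: smooth
    # step 7: exponentiate the log-domain field
    return image / np.exp(field), np.exp(field)
```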

Figure 2.3: Intensity nonuniformity correction of a T1-weighted 27-scan averaged gradient echo MR scan. (a) and (b): transverse and sagittal views of the uncorrected data. (c) and (d): nonuniformity field estimated by the N3 method. (e) and (f): corrected data (taken from [20]).

N3 requires only two parameters: the expected pdf of the INU field and a parameter governing the smoothness of the B-spline field approximation. Sled et
al. [20] have shown that N3 is robust with respect to these parameters, largely because of the iterative nature of the algorithm. Figure 2.3 shows an example of an
MR image suffering from intensity nonuniformity, the N3-estimated INU field, and
the corrected image.

Other methods, detailed in sections 2.3.2 and 2.6, have been proposed which
couple the INU field estimation to a segmentation technique. This includes the
Expectation-Maximization (EM) segmenter developed by Wells et al. [21], which
was subsequently improved by Guillemaud et al. [27]; an adapted version of the EM
algorithm by Van Leemput et al. [28]; and the fuzzy clustering method described
by Pham and Prince [29].
The remainder of this chapter describes methods and techniques for the segmentation and the classification of images. First, we discuss methods based on the
analysis of the gray-level histogram and we present some methods that have been
used to both segment the images and correct for the INU field. Next, edge-based
and region-based methods are introduced. This is followed by a section on classification methods in which techniques for simultaneous classification and intensity
correction are also described.
2.3 Thresholding

Gray-level thresholding is the simplest, yet often effective, segmentation method.
In this approach objects or structures in the image are assigned a label by comparing
their gray-level value to one or more intensity thresholds. A single threshold serves
to segment the image into only two regions, a background and a foreground; more
commonly however, the objective is to segment the image into multiple regions
using multiple thresholds.
Thresholds are either global or local, i.e., they can be constant throughout
the image, or spatially varying. Local thresholding techniques compute different
thresholds for subsections of the image. These local thresholds can then be interpolated to provide a spatially varying threshold value (a threshold surface or
threshold field) everywhere in the image.
Thresholding techniques can also be categorized as point-based or region-based.
Region-based methods compute a threshold based not only on the gray-level of an
individual pixel, but also based on the properties of its neighborhood. Whether local
or global, point-based or region-based, thresholds are typically estimated from the
intensity histogram using methods that can be categorized into two broad classes:
shape-based and optimal. Each of these are described in the following subsections.
2.3.1 Shape-based histogram techniques

Shape-based techniques are based on features of the histogram. In the case of a
bi-modal histogram, a simple shape-based method would select the threshold as the
minimum value between the two histogram peaks. Other features such as maximum
curvature have also been used to separate white matter, gray matter and CSF in
MR images [30]. To reduce sensitivity to noise, the histogram can be smoothed, or techniques can be used when computing the histogram to improve its peak-to-valley ratio. This can be done, for instance, by eliminating or reducing the influence of pixels with a high gradient value. The simplest approach is to perform an edge
detection operation and to eliminate from the histogram computation all pixels that
have been labeled as an edge. Another approach [31, 32] is to create a modified
histogram as follows:

h(g) = \sum_{i=1}^{X} \sum_{j=1}^{Y} \sum_{k=1}^{Z} \delta(g, I(i,j,k)) \, w(e(i,j,k)),

where e(i,j,k) is the output of an edge detector; X, Y, and Z are the x, y, and z dimensions of the image, respectively (if the image is two dimensional, the expression is limited to the i and j indices); g is the symbol used for gray level in the histogram; I(i,j,k) is the gray-level value in the image; \delta is the delta function defined as

\delta(a,b) = \begin{cases} 1 & \text{if } a = b \\ 0 & \text{otherwise} \end{cases}   (2.2)

and w is a decreasing function of edge strength, such as

w(e) = \frac{1}{(1 + |e|)^p},

designed to reduce the contribution of pixels close to an edge to the overall histogram. By varying the value of p, the weight associated with each edge pixel while building the histogram can be adjusted.
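A minimal Python sketch of such an edge-weighted histogram is given below; the use of the Sobel operator and the particular form of the weighting function are illustrative assumptions, not prescriptions from [31, 32].

```python
import numpy as np
from scipy import ndimage

def edge_weighted_histogram(image, p=2.0, n_levels=256):
    """Histogram with each pixel down-weighted by its gradient magnitude,
    improving the peak-to-valley ratio before threshold selection (sketch)."""
    img = image.astype(float)
    e = np.sqrt(ndimage.sobel(img, axis=0) ** 2 + ndimage.sobel(img, axis=1) ** 2)
    e /= e.max() + 1e-12                    # normalize edge strength to [0, 1]
    w = 1.0 / (1.0 + e) ** p                # low weight near edges
    hist = np.zeros(n_levels)
    g = np.clip(image.astype(int), 0, n_levels - 1)
    np.add.at(hist, g.ravel(), w.ravel())   # accumulate weighted counts
    return hist
```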
2.3.2 Optimal thresholding

Optimal thresholding methods rely on the maximization (or minimization) of a
merit function. These methods can be further classified as being non-parametric or
parametric. In the latter case, a number of assumptions are made about the shape
of the histogram. The most common model is to assume that the histogram is the
sum of normal distributions, but other distributions can be used when the physics
of the problem suggests that normal distributions are inadequate models. The main
difficulty with parametric methods is to estimate the distribution parameters from
the available data. This can be done by using minimum-distance or maximum
likelihood techniques as described below.
Non-parametric optimal thresholding

These methods rely on the definition of a goodness criterion for threshold selection. Possibilities include the within-class variance \sigma_W^2, the between-class variance \sigma_B^2, or the entropy. A good example of such a technique is the method proposed by Otsu [33], in which the optimum threshold is chosen as the one that maximizes \sigma_B^2 / \sigma_T^2, with \sigma_T^2 the total variance. This algorithm can be extended to multiple thresholds. An efficient implementation proposed by Reddi et al. [34] works as follows:

Algorithm 2.2: Iterative computation of intensity thresholds

1. Select two initial thresholds t_1 and t_2; these can be chosen as G/3 and 2G/3, respectively, with G the maximum intensity value.

2. Compute the mean gray levels \mu_1, \mu_2, and \mu_3 of the three histogram intervals [0, t_1], (t_1, t_2], and (t_2, G] delimited by the thresholds:

\mu_m = \frac{\sum_{g \in I_m} g \, h(g)}{\sum_{g \in I_m} h(g)},

with h(g) the histogram and I_m the m-th interval.

3. Update the thresholds t_1 and t_2 as follows:

t_1 = \frac{\mu_1 + \mu_2}{2}, \qquad t_2 = \frac{\mu_2 + \mu_3}{2}.

4. If the changes \Delta t_1 and \Delta t_2 are below a preset tolerance, stop; otherwise go to 2.

This algorithm is fast and, despite its simplicity, has good convergence properties, but it is only one example of the many algorithms of this class that have been
proposed over the years. Good reviews and comparisons of several alternatives can
be found in [35] and [36].
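A direct Python transcription of this iterative scheme might look as follows; the function name and the tolerance value are our own choices.

```python
import numpy as np

def iterative_thresholds(hist, tol=0.5, max_iter=100):
    """Reddi-style iterative selection of two thresholds (Algorithm 2.2 sketch).

    `hist[g]` is the count of gray level g; returns the two thresholds.
    """
    g = np.arange(len(hist), dtype=float)
    G = len(hist) - 1
    t1, t2 = G / 3.0, 2.0 * G / 3.0

    def mean(lo, hi):                       # mean gray level on [lo, hi]
        sel = (g >= lo) & (g <= hi)
        w = hist[sel]
        return np.sum(g[sel] * w) / (np.sum(w) + 1e-12)

    for _ in range(max_iter):
        m1, m2, m3 = mean(0, t1), mean(t1, t2), mean(t2, G)
        new_t1, new_t2 = (m1 + m2) / 2.0, (m2 + m3) / 2.0
        if abs(new_t1 - t1) < tol and abs(new_t2 - t2) < tol:
            return new_t1, new_t2
        t1, t2 = new_t1, new_t2
    return t1, t2
```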
Parametric optimal thresholding

Assuming again a two-class problem, and assuming that the distribution of gray levels for each class can be modeled by a normal distribution with mean \mu_i and variance \sigma_i^2, the overall normalized intensity histogram can be written as the following mixture probability density function:

p(g) = \frac{P_1}{\sqrt{2\pi}\,\sigma_1} \exp\left( -\frac{(g - \mu_1)^2}{2\sigma_1^2} \right) + \frac{P_2}{\sqrt{2\pi}\,\sigma_2} \exp\left( -\frac{(g - \mu_2)^2}{2\sigma_2^2} \right),

with P_1 and P_2 the a priori probabilities for class 1 and class 2, respectively. It can be shown [23] that the optimal thresholds, i.e., the thresholds that minimize the
probability of labeling a pixel pertaining to class 1 as a pixel pertaining to class 2 and vice versa, are the roots T of the following quadratic equation:

A T^2 + B T + C = 0,

with

A = \sigma_1^2 - \sigma_2^2,
B = 2 (\mu_1 \sigma_2^2 - \mu_2 \sigma_1^2),
C = \sigma_1^2 \mu_2^2 - \sigma_2^2 \mu_1^2 + 2 \sigma_1^2 \sigma_2^2 \ln\left( \frac{\sigma_2 P_1}{\sigma_1 P_2} \right).

In the case when the two variances can be assumed to be equal (\sigma_1^2 = \sigma_2^2 = \sigma^2), this expression simplifies to

T = \frac{\mu_1 + \mu_2}{2} + \frac{\sigma^2}{\mu_1 - \mu_2} \ln\left( \frac{P_2}{P_1} \right).
Furthermore, if the a priori probabilities are equal, the optimal threshold is simply chosen as the average of the means. In the case of mixtures composed of more than two Gaussians (see Section 2.6 for more details), a decision function d_i(g) = p(g | \omega_i) P(\omega_i) is computed for each class \omega_i, with P(\omega_i) the a priori probability of this class, and p(g | \omega_i) the class probability density function. Segmentation is achieved by assigning the voxel to the class whose decision function has the highest value for the voxel's gray value. Choosing the optimal thresholds when the mixture parameters are known is thus a relatively simple matter. The next two paragraphs discuss methods that have been used to compute these parameters (an exhaustive treatment of finite mixture distributions can be found in [37]). In both cases, the number of classes is assumed to be known.
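The following Python sketch evaluates the expressions above directly; the root-selection heuristic (keeping the root between the two means) is our own assumption for the illustration.

```python
import numpy as np

def optimal_threshold(mu1, var1, P1, mu2, var2, P2):
    """Optimal threshold between two Gaussian classes: roots of the
    quadratic A*T**2 + B*T + C = 0 derived above (sketch)."""
    A = var1 - var2
    B = 2.0 * (mu1 * var2 - mu2 * var1)
    C = (var1 * mu2 ** 2 - var2 * mu1 ** 2
         + 2.0 * var1 * var2 * np.log((np.sqrt(var2) * P1) / (np.sqrt(var1) * P2)))
    if abs(A) < 1e-12:                      # equal variances: single threshold
        return (mu1 + mu2) / 2.0 + var1 / (mu1 - mu2) * np.log(P2 / P1)
    roots = np.roots([A, B, C])
    for t in roots:                         # keep the root between the means
        if min(mu1, mu2) <= t.real <= max(mu1, mu2):
            return t.real
    return roots[0].real
```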
Minimum distance methods aim at minimizing the following expression:

E = \sum_{g=1}^{G} \left[ h(g) - p(g) \right]^2,

in which h(g) and p(g) are the observed and hypothesized histograms, respectively, and in which we have assumed histograms with G gray levels. Unfortunately, carrying out the derivations required to determine the analytical solution of this equation, even in the case of normal distributions, leads to a set of simultaneous transcendental equations that have to be solved numerically. Numerical methods proposed to solve these equations range from conjugate gradient, Newton, and Levenberg-Marquardt [38] to tree annealing [39]. In [39] this approach was used for the quantification of MR brain images into white matter, gray matter, and CSF, as well as three partial volume classes. These are modeled as mixtures of CSF and gray matter, CSF and white matter, and gray and white matter, respectively.

Maximum likelihood methods do not try to estimate the distribution parameters by fitting the intensity histogram. Rather, they aim at estimating the set of parameters that maximizes the probability of observing the pixel intensity distribution. Suppose the following:

1. The pixels come from a known number of classes \omega_i, i = 1, ..., K.

2. The form of the class-conditional probability densities p(x | \omega_i, \theta_i) is known, i = 1, ..., K.

3. The K parameter vectors \theta_1, ..., \theta_K as well as the a priori probabilities P(\omega_i) for each class are unknown and need to be estimated.

If we assume again that the distribution of intensities for each class is normal, \theta_i = (\mu_i, \sigma_i^2) and p(x | \omega_i, \theta_i) = N(\mu_i, \sigma_i^2), with N(\mu_i, \sigma_i^2) a normal distribution with mean \mu_i and variance \sigma_i^2. The overall conditional density function for a pixel is given by:

p(x | \theta) = \sum_{i=1}^{K} p(x | \omega_i, \theta_i) P(\omega_i),

in which \theta is the complete parameter vector (\theta_1, ..., \theta_K). Assuming independence between pixels, their joint density function is given by \prod_{j=1}^{N} p(x_j | \theta), with N the total number of pixels. When, as is the case here, the joint density function is viewed as a function of \theta for fixed values of the observations, it is called the likelihood function. The maximum likelihood estimator for the set of parameters \theta is the set of parameters that maximizes this likelihood function or, equivalently, that maximizes the logarithm of this function, called the log-likelihood function L(\theta) = \sum_{j=1}^{N} \log p(x_j | \theta). The parameters \theta can thus be obtained by computing the partial derivatives of the log-likelihood function and solving these equations with respect to the parameters. Performing this differentiation and using Bayes' rule

P(\omega_i | x_j, \theta) = \frac{p(x_j | \omega_i, \theta_i) P(\omega_i)}{p(x_j | \theta)},   (2.3)
one finds

P(\omega_i) = \frac{1}{N} \sum_{j=1}^{N} P(\omega_i | x_j, \theta),   (2.4)

\mu_i = \frac{\sum_{j=1}^{N} P(\omega_i | x_j, \theta) \, x_j}{\sum_{j=1}^{N} P(\omega_i | x_j, \theta)},   (2.5)

\sigma_i^2 = \frac{\sum_{j=1}^{N} P(\omega_i | x_j, \theta) \, (x_j - \mu_i)^2}{\sum_{j=1}^{N} P(\omega_i | x_j, \theta)}.   (2.6)

These equations can be best understood if we assume that P(\omega_i | x_j, \theta) is equal to one when x_j is from class i, and equal to zero otherwise. In this case, P(\omega_i) is simply the fraction of pixels that have been assigned to class i; \mu_i is the mean of these pixels, and \sigma_i^2 their variance. When P(\omega_i | x_j, \theta) is not binary, these expressions are weighted quantities, with the value of the weights being the probability that the voxel pertains to a particular class. It should be noted, however, that these equations constitute a set of simultaneous nonlinear equations that do not have a unique solution. Usual convergence issues or dependence on the starting point need to be taken into consideration when solving these equations numerically. If a reasonable starting point can be obtained, a good method to solve these equations is to use an Expectation-Maximization (EM) [40] approach. When applied to the estimation of mixture parameters, EM is an iterative technique that first estimates P(\omega_i | x_j, \theta) using (2.3) assuming that the parameters are known, then estimates the parameters using (2.4)-(2.6) assuming that P(\omega_i | x_j, \theta) is known. This leads to the following iterative procedure.

Algorithm 2.3: EM algorithm for estimation of tissue class parameters

1. Compute the a posteriori probabilities P(\omega_i | x_j, \theta) using estimated values for the parameter vector \theta (at the first iteration, an estimate for the parameter vector can be obtained, for instance, from the histogram using a shape-based technique).

2. Using equations (2.4)-(2.6) and the probabilities computed in step 1, compute P(\omega_i), \mu_i, and \sigma_i^2.

3. If the class probabilities, means, and variances did not change from one iteration to the other, stop. Otherwise go to step 1.
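A compact Python sketch of Algorithm 2.3 for one-dimensional intensities is shown below; the quantile-based initialization is an arbitrary choice standing in for the shape-based estimate mentioned in step 1.

```python
import numpy as np

def em_gmm_1d(x, K, n_iter=100):
    """EM estimation of a K-class 1-D Gaussian mixture (Algorithm 2.3 sketch);
    x is the vector of pixel intensities."""
    mu = np.quantile(x, (np.arange(K) + 0.5) / K)   # crude initialization
    var = np.full(K, x.var() / K)
    P = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: posteriors P(class i | x_j) via Bayes' rule, Eq. (2.3)
        lik = np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
        post = lik * P
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: update priors, means, variances, Eqs. (2.4)-(2.6)
        Nk = post.sum(axis=0) + 1e-12
        P = Nk / len(x)
        mu = (post * x[:, None]).sum(axis=0) / Nk
        var = (post * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return P, mu, var
```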


Figure 2.4: EM algorithm for mixture parameter estimation; simulated data.

Figures 2.4 and 2.5 show examples of results obtained with the EM algorithm
on simulated and real data, respectively. The histogram in the second figure has
been obtained from an MR image volume of the brain. The three peaks correspond
to CSF, gray matter, and white matter, respectively. In both cases, the dotted line shows the mixture probability density function obtained with the initial parameters. The curves labeled with the diamonds and the crosses are the true distributions and the distributions obtained with the parameters estimated with the EM algorithm, respectively. Note that for the simulated case, the true distribution and the distribution
obtained with the EM algorithm are indistinguishable.
2.3.3 Advanced thresholding methods for simultaneous segmentation and INU correction

Variations on the EM algorithm have been proposed to segment MR images and correct for the INU field simultaneously. The first of these approaches was developed
by Wells et al. [21]. Rather than modeling the log-transformed (see Section 2.2.2) tissue intensity distributions with Gaussians with known means and variances, these authors model them as follows:

p(x_j | \omega_i, \beta_j) = N(\mu_i + \beta_j, \sigma_i^2),   (2.7)

with \beta_j the value of the field at location j. To capture the a priori knowledge that the field is slowly varying, it is modeled as an M-dimensional (with M the total number of voxels in the image) zero-mean Gaussian probability density function p(\beta) = N(0, \Sigma_\beta), with \Sigma_\beta the M \times M covariance matrix of this distribution. The posterior probability of the field given the observed intensity values can be written


Figure 2.5: EM algorithm for mixture parameter estimation; histogram computed from real
MR volume.

as

p(\beta | x) \propto p(x | \beta) \, p(\beta).

The INU field is then taken as the one with the largest posterior probability. In practice, however, the covariance matrix of the distribution used to model the field is huge and impractical to compute. The following scheme is introduced to solve the problem. First, the weights W_{ij} are introduced:

W_{ij} = \frac{p(x_j - \beta_j | \omega_i) \, P(\omega_i)}{\sum_{k=1}^{K} p(x_j - \beta_j | \omega_k) \, P(\omega_k)}.   (2.8)

These express the probability that a bias-corrected voxel with gray-level value x_j belongs to class i. Next, the mean residual values are computed as

r_j = \sum_{i=1}^{K} W_{ij} \, \frac{x_j - \mu_i}{\sigma_i^2}.   (2.9)

Again, this equation can be best understood if one supposes that the weights are binary, i.e., W_{ij} = 1 if voxel j belongs to class i and zero otherwise. The residual r_j is then simply the difference in intensity value between the mean of class i and the intensity value of the particular voxel. If the classification is correct, this difference is a good estimator of the field value at this point. The difference is further divided by the variance of the class to capture the confidence in this estimator. A residue image R is created by computing the residual value at every voxel. The field is finally obtained by applying a low-pass filter H, derived from the mean variance of the tissue class density functions and the covariance of the field, to the residue image. This algorithm can be summarized as follows:

Algorithm 2.4: EM algorithm for bias field correction

1. Compute the weights W_{ij} using equation (2.8) and the residue image using equation (2.9), assuming that the field is known (initially, the field is assumed to be zero everywhere).

2. Estimate the field as \beta = H \ast R, with H a low-pass filter and \ast the convolution operation. If the field estimate changes from iteration to iteration, go to step 1; otherwise stop.
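The Python sketch below illustrates Algorithm 2.4 under the stated assumption of known class parameters. Normalized Gaussian smoothing is used here as a crude stand-in for the low-pass filter H, which Wells et al. derive from the class statistics and the field covariance; the smoothing width is an arbitrary assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def em_bias_correction(x, mu, var, P, sigma=15.0, n_iter=10):
    """Sketch of Algorithm 2.4 (after Wells et al. [21]); mu, var, P are the
    fixed class means, variances, and priors; x holds log intensities."""
    beta = np.zeros_like(x, dtype=float)
    for _ in range(n_iter):
        u = x - beta                                       # bias-corrected voxels
        lik = (np.exp(-0.5 * (u[..., None] - mu) ** 2 / var)
               / np.sqrt(2 * np.pi * var))
        W = lik * P
        W /= W.sum(axis=-1, keepdims=True) + 1e-12         # weights, Eq. (2.8)
        r = (W * (x[..., None] - mu) / var).sum(axis=-1)   # residuals, Eq. (2.9)
        w = (W / var).sum(axis=-1)                         # per-voxel confidence
        # normalized smoothing as an assumed stand-in for the filter H
        beta = gaussian_filter(r, sigma) / (gaussian_filter(w, sigma) + 1e-12)
    return beta
```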

This approach suffers from two main weaknesses. First, the means and the variances of the various tissue classes are assumed to be known a priori and are not re-estimated from iteration to iteration. Second, in regions where the partial volume effect is important, and when such partial volume regions are not modeled explicitly in the original intensity probability density function, residual values can be erroneously large. This, in turn, affects the accuracy of the field estimator. Guillemaud and Brady [27] have proposed an extension to this algorithm in which the overall intensity probability density function is modeled as a sum of Gaussians with small variances plus a non-Gaussian distribution. They have shown that this approach increases the robustness of the algorithm. However, this approach still requires knowing a priori the distribution parameters. These are typically estimated by manually identifying representative voxels for each tissue class in the image. This may result in segmentations that are not fully reproducible.
Van Leemput et al. [28] have proposed a generalization of the EM algorithm that is fully automatic and in which the distribution parameters are re-estimated from iteration to iteration. Rather than modeling the field using a zero-mean Gaussian distribution, they use a parametric model. The field is expressed as a linear combination \sum_k c_k \phi_k(x) of smooth basis functions \phi_k. In their approach, equation (2.7) is rewritten as follows:

p(x_j | \omega_i, c) = N\left( \mu_i + \sum_k c_k \phi_k(x_j), \; \sigma_i^2 \right).
Following the same procedure that was used to derive equations (2.4)-(2.6), the mean and variance parameters for the distributions are computed from the bias-corrected intensities:

\mu_i = \frac{\sum_j P(\omega_i | x_j, \theta, c) \left( x_j - \sum_k c_k \phi_k(x_j) \right)}{\sum_j P(\omega_i | x_j, \theta, c)},   (2.10)

\sigma_i^2 = \frac{\sum_j P(\omega_i | x_j, \theta, c) \left( x_j - \sum_k c_k \phi_k(x_j) - \mu_i \right)^2}{\sum_j P(\omega_i | x_j, \theta, c)}.   (2.11)

The vector of coefficients is computed as:

c = (A^T W A)^{-1} A^T W R,   (2.12)

with A the matrix of basis functions evaluated at the voxel positions, A_{jk} = \phi_k(x_j); W a diagonal matrix of weights w_j = \sum_i P(\omega_i | x_j, \theta, c) / \sigma_i^2; and R the residue image with elements

r_j = x_j - \frac{\sum_i P(\omega_i | x_j, \theta, c) \, \mu_i / \sigma_i^2}{\sum_i P(\omega_i | x_j, \theta, c) / \sigma_i^2}.

The weights and the residual matrix R are similar to those used by Wells et al. Here, the field is computed by fitting a polynomial to the residual image using a weighted least-squares fit (equation (2.12)). The weights used in this equation are inversely proportional to the weighted variance of the tissue distributions. If a pixel is classified into a class with large variance, its contribution to the fit is reduced. Conversely, pixels that have been classified into classes with small variances, such as white matter or gray matter, will have a large impact on the values of the coefficients.

Algorithm 2.5: EM algorithm for tissue class parameter and bias field estimation

1. Compute the a posteriori probabilities P(\omega_i | x_j, \theta, c) using estimated values for the parameter vector \theta and the bias field coefficient vector c (at the first iteration, an estimate for the parameter vector \theta can be obtained, for instance, from the histogram using a shape-based technique, and the field can be assumed to be uniform).

2. Compute the parameter vector \theta using equations (2.10) and (2.11) and the a posteriori probabilities computed in step 1.

3. Using equation (2.12) and the estimates computed in step 2, compute the bias field coefficient vector c.

4. If the parameter and coefficient vectors did not change from one iteration to the other, stop; otherwise, go to step 1.
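The weighted least-squares step (2.12) is straightforward to realize in code. The sketch below fits a 2D polynomial field; the basis choice, names, and default order are our own assumptions.

```python
import numpy as np

def fit_bias_field(coords, residual, weights, order=3):
    """Weighted least-squares fit of a polynomial bias field (Eq. 2.12 sketch):
    c = (A^T W A)^{-1} A^T W r, with A_{jk} = phi_k(x_j) the polynomial basis
    functions evaluated at the voxel coordinates (2-D case for brevity)."""
    x, y = coords                                   # 1-D arrays of coordinates
    A = np.stack([x ** i * y ** j
                  for i in range(order + 1)
                  for j in range(order + 1 - i)], axis=1)   # basis matrix
    AtW = A.T * weights                             # weights = diagonal of W
    c = np.linalg.solve(AtW @ A, AtW @ residual)    # normal equations
    return A @ c                                    # field value at each voxel
```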

Extension of this formalism to multi-modal images and corrections for inter-slice intensity variations in 2D MR image acquisition sequences have been presented in [28]. These authors also propose modeling some of the classes in the
image with non-Gaussian distributions and using a priori class information derived
from an atlas (see Chapter 17 for more information on this topic). This procedure allows them to use both intensity information and spatial information in the
segmentation process.
2.4 Edge-based techniques

In edge-based segmentation approaches, the processed image is described in
terms of the edges (boundaries) between different regions. Edges can be detected
with a variety of edge operators, generally named after their inventors. The most
popular ones are the Marr-Hildreth or LoG (Laplacian-of-Gaussian), Sobel, Prewitt, and Canny operators. Mathematical morphology has also been employed extensively for edge detection and it is treated as an independent topic in Chapter 4.
Although the final objective of image segmentation is to extract structures of interest in the images, edge detection algorithms produce only a series of boundary
candidates for these structures. These candidates need to be linked appropriately to
produce robust and accurate boundaries. Algorithms based on deformable models
described in Chapter 3 are efficient methods to use high-level information about the
structure of interest in the segmentation process. Border tracing and graph searching methods described here are also very powerful paradigms for using a priori
knowledge about the structure intensity, shape, or texture characteristics to guide
the segmentation process.
2.4.1 Border tracing

The simplest solution to finding the boundary of a structure is to follow the
edge detection operation by a border tracing algorithm. Assuming that the edge
detector produces both an edge magnitude e(x) and an edge orientation \phi(x), the successor x_{k+1} of a boundary pixel x_k is chosen as the pixel in the neighborhood
(4- or 8-connected) for which the following inequalities hold:

| e(x_k) - e(x_{k+1}) | \le T_1,
| \phi(x_k) - \phi(x_{k+1}) | \bmod 2\pi \le T_2,

with T_1 and T_2 predetermined thresholds. If more than one neighbor satisfies
these inequalities, then the one that minimizes the differences is chosen. The algorithm is applied recursively, and neighbors can be searched, for instance, starting
from the top left and proceeding in a row-wise manner. The problem with this type of simple algorithm is that it stops if none of the neighbors satisfies the aforementioned inequalities, a situation that occurs in most practical situations. Furthermore,
once committed to a path, the algorithm cannot backtrack when it has reached a
dead end. More robust border tracing algorithms involve finding a border that is
globally optimal in some sense over its entire length. Graph searching and dynamic programming are two methods that are often used for this purpose.
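To make the successor rule concrete, here is a minimal Python sketch operating on precomputed edge magnitude and orientation maps; the threshold parameters t_mag, t_dir, and t_min play the roles of $T_1$, $T_2$, and $T_3$, and all names are illustrative rather than taken from the references.

    import numpy as np

    def trace_border(magnitude, orientation, start,
                     t_mag=0.2, t_dir=0.5, t_min=0.1):
        """Greedy border tracing; stops at a dead end, as discussed above."""
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                   (0, 1), (1, -1), (1, 0), (1, 1)]   # 8-connected neighborhood
        path, current, visited = [start], start, {start}
        while True:
            best, best_score = None, np.inf
            for dy, dx in offsets:
                y, x = current[0] + dy, current[1] + dx
                if not (0 <= y < magnitude.shape[0] and 0 <= x < magnitude.shape[1]):
                    continue
                if (y, x) in visited or magnitude[y, x] < t_min:
                    continue
                d_mag = abs(magnitude[current] - magnitude[y, x])
                d_dir = abs(orientation[current] - orientation[y, x]) % (2 * np.pi)
                # keep the neighbor that minimizes the differences
                if d_mag <= t_mag and d_dir <= t_dir and d_mag + d_dir < best_score:
                    best, best_score = (y, x), d_mag + d_dir
            if best is None:          # no neighbor satisfies the inequalities
                return path
            visited.add(best)
            path.append(best)
            current = best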
2.4.2 Graph searching

The application of graph searching techniques for boundary detection can be
attributed to Martelli [41], who used Nilsson's A-algorithm [42]. A graph consists of a set of points called nodes and a set of links connecting the nodes. A path through the graph is defined as a set of links that connect a start node to an end node. Each path through the graph represents a possible solution to the problem. Graph searching is a very general problem-solving method and, in the context of boundary detection, each node is a candidate boundary pixel and each path through the graph corresponds to a possible boundary. The minimum-cost path is the overall best solution; in our case, the best border. Specifying the problem requires associating a cost with every node as well as a transition cost with every link, capturing the cost associated with the transition from one node to the other along a particular link. Finding the
optimal path then involves finding the path with the minimum cost. For boundary
detection purposes, the cost associated with a node could, for instance, be computed
using an edge detection operator. The more likely a pixel is to be an edge, the lower
its cost. The transition cost may be related to the distance between successive pixels
or the difference in edge strength or edge orientation between successive nodes.
Figure 2.6 illustrates how an edge image can be transformed into a directed graph.
The left panel shows candidate boundary points with their orientation. The right
panel shows that several paths corresponding to different boundaries are possible.
A number of algorithms have been proposed in the literature to find paths in directed graphs (see [43] for a good introduction to various search methods). Among these methods, the A* algorithm has received wide attention for border detection. The A* algorithm is a so-called branch-and-bound search with an estimate of remaining distance (lower bound estimate). In addition to the node and transition


Figure 2.6: Left panel: edge map (magnitude and direction); right panel: directed graph derived from the edge map.

costs, this algorithm does require an estimate of the remaining distance to the end
node. For boundary detection, such an estimate can be computed as the difference
between the current length of the boundary and its expected total length.
Defining the cost of a node $n_i$ along a particular path $n_1, \ldots, n_i$ as

$$g(n_i) = \sum_{k=1}^{i} \big[\, l(n_k) + t(n_{k-1}, n_k) \,\big],$$

with $l(n_k)$ and $t(n_{k-1}, n_k)$ the local and transition costs, the total cost at a particular node can be written as $f(n_i) = g(n_i) + h(n_i)$, with $h(n_i)$ the lower bound estimate. The A* algorithm can then be described as follows.

Algorithm 2.6: A* algorithm for optimal path search with lower bound estimate (adapted from [43])
1. Select the starting node, expand it, put all its successors on an OPEN list, and set a pointer back from each node to its predecessor.
2. If no node is on the OPEN list, stop; otherwise continue.
3. From the OPEN list, select the node $n_i$ with the smallest cost $f(n_i)$. Remove it from the OPEN list, and mark it CLOSED.
4. If $n_i$ is a goal node, backtrack from this node to the start node following the pointers to find the optimum path, and stop.
5. Expand $n_i$, generating all of its successors.
6. If a successor $n_j$ is neither CLOSED nor on the OPEN list, set $f(n_j) = g(n_i) + t(n_i, n_j) + l(n_j) + h(n_j)$, put it on the OPEN list, and set a pointer back to its predecessor.
7. If a successor $n_j$ is already on the OPEN list or CLOSED, update its cost using $f(n_j) = \min\big(f(n_j),\, g(n_i) + t(n_i, n_j) + l(n_j) + h(n_j)\big)$, mark OPEN the CLOSED nodes whose cost was lowered, and redirect the pointers of the nodes for which the cost was lowered to $n_i$. Go to step 2.

In general, this algorithm does not lead to an optimal path, but if $h(n_i)$ is truly a lower bound estimate on the cost from node $n_i$ to an end node, the path is optimal [44].
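As a minimal illustration, the following Python sketch implements Algorithm 2.6 with a binary heap as the OPEN list; the callables successors, local_cost, transition_cost, and h are application-supplied placeholders, not part of the original formulation.

    import heapq

    def a_star(start, goals, successors, local_cost, transition_cost, h):
        """A* search with a lower bound estimate h, per Algorithm 2.6."""
        g = {start: local_cost(start)}
        parent = {start: None}
        open_list = [(g[start] + h(start), start)]
        closed = set()
        while open_list:
            f, n = heapq.heappop(open_list)       # node with smallest f (step 3)
            if n in closed:
                continue
            closed.add(n)
            if n in goals:                        # goal reached: backtrack (step 4)
                path = []
                while n is not None:
                    path.append(n)
                    n = parent[n]
                return path[::-1]
            for m in successors(n):               # expand n (step 5)
                cost = g[n] + transition_cost(n, m) + local_cost(m)
                if cost < g.get(m, float("inf")): # new node or lowered cost (6, 7)
                    g[m] = cost
                    parent[m] = n
                    closed.discard(m)             # reopen a CLOSED node
                    heapq.heappush(open_list, (cost + h(m), m))
        return None                               # OPEN list exhausted (step 2)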
2.4.3 Dynamic programming

Dynamic programming is another approach used to compute optimal paths in a
graph that is well suited to situations in which the search for boundaries can be done
along one dimension. The key idea in dynamic programming is the principle of
optimality [45], which states that an optimal policy has the property that, whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision. In the context of optimal path selection, this means that the choice of a transition from $n_i$ to $n_{i+1}$ can be made based only on the cost of the optimum path to $n_i$ and the local cost associated with a move from $n_i$ to $n_{i+1}$. This algorithm can be applied
recursively and it is easily explained by means of an example. Suppose a 2D image
is stored in a two-dimensional array. Suppose also that a boundary runs from the top
to the bottom of the image. Suppose, finally, that a static cost matrix $L$ is associated
with the image. In this matrix, small values indicate highly likely edge locations.
The left panel of figure 2.7 illustrates such a cost matrix. Each entry in this matrix
represents a node and any path in this matrix is a possible boundary. The optimal
path, i.e., the path that minimizes the overall cost, is computed by means
of a cumulative cost matrix. This matrix is computed in a row-column manner.
Starting on the second row, the cumulative cost of a node $(i, j)$ is computed for each column as

$$C(i, j) = \min_{(k, l) \in P(i, j)} \big[\, C(k, l) + t\big((k, l), (i, j)\big) \,\big] + L(i, j),$$

with $P(i, j)$ the set of possible predecessors of $(i, j)$. In the case of 8-connectivity, $P(i, j) = \{(i-1, j-1),\ (i-1, j),\ (i-1, j+1)\}$. Here $t\big((k, l), (i, j)\big)$ is the cost associated with a transition from one node to the other along a link, and $L(i, j)$ is the static cost of node $(i, j)$. The middle
panel of figure 2.7 shows the cumulative cost matrix computed for the static cost
matrix shown on the left, assuming that the transition cost is zero. The node on the
last row of the cumulative cost matrix with the lowest value is the node that corresponds to the optimum path. To determine this path, a matrix of pointers is created
at the time the cumulative cost matrix is computed, as shown on the right panel of


Figure 2.7: Left panel: static cost matrix (small numbers indicate highly likely edge locations); middle panel: computed cumulative cost matrix; right panel: pointer array and optimum path (shaded circles).

the figure. Entries in this matrix simply point to the node from which the optimum
path reaching a particular node originates. The optimum path is thus determined
by starting at the end node with the lowest value and following the pointers back to
the first node.
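The computation of the cumulative cost matrix and the pointer-based backtracking can be sketched as follows; zero transition cost is assumed, as in the example of Figure 2.7.

    import numpy as np

    def dp_optimal_path(static_cost):
        """Top-to-bottom boundary search by dynamic programming."""
        rows, cols = static_cost.shape
        cumulative = static_cost.astype(float).copy()
        pointer = np.zeros((rows, cols), dtype=int)
        for i in range(1, rows):
            for j in range(cols):
                # 8-connectivity: the predecessors are the three nodes above
                lo, hi = max(0, j - 1), min(cols, j + 2)
                k = int(np.argmin(cumulative[i - 1, lo:hi])) + lo
                pointer[i, j] = k
                cumulative[i, j] += cumulative[i - 1, k]
        # start at the cheapest node on the last row and follow the pointers
        j = int(np.argmin(cumulative[-1]))
        path = [j]
        for i in range(rows - 1, 0, -1):
            j = pointer[i, j]
            path.append(j)
        return path[::-1], cumulative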
In this example, the search for an optimal path is greatly simplified because
the boundary is elongated and the search is, in fact, only a 1D search performed
on the columns of the matrix for each row. Typical applications for this technique
include the detection of vessels in medical images. Dynamic programming has, however, also been used successfully for the detection of closed contours. All that is required is to apply a spatial transformation (such as a polar transformation
as proposed by Gerbrands [46] for the detection of the ventricle in scintigraphic
images) to the image prior to boundary detection. The purpose of this geometric
transformation is to transform a 2D search problem into a 1D problem. Figure 2.8
illustrates a possible approach applied to the detection of the brain in MR images.
In this case, an approximate contour of the brain can be obtained either by manual delineation or by applying a series of image processing operators [30]. Lines
perpendicular to this first approximation are computed (second panel) and the transformed image (third panel) is created row by row by interpolating intensity values
in the original image along each perpendicular line. The optimum path is computed
in the transformed image, and mapped back to the original image (fourth panel).
To create a closed contour, the first row of the transformed matrix is copied to the
bottom. The last point in the path is then forced to be the same as the first one.
If the starting node is not known, each of the voxels in the first row is chosen as
a starting point and a closed contour is computed for each of these. The closed



Figure 2.8: Example of a geometric transform: a) the original image with an approximation
of the true boundary; b) perpendicular lines computed along the original contour; c) spatially
transformed image; d) optimum contour.

contour with the smallest cost is chosen as the optimum one.


Graph searching and dynamic programming methods are generic. The difficulty with these approaches is the design of domain-specific cost functions that
lead to robust and accurate results. Typical terms used in the cost function are
gradient magnitude and direction, but more complex functions can be crafted. For
instance, in [30], a six-term cost function has been used. It includes a gradient term, terms designed to attract the contour to pixels in a specific intensity range, a term used to control the stiffness of the contour, as well as terms used to integrate
results from various low-level processing operations performed on the transformed
image.
2.4.4 Advanced border detection methods

Basic edge-based techniques described in this section have been extended by
several authors. Sonka et al. [47] propose a method by which both sides of the
coronary arteries are detected simultaneously in coronary angiograms. This is done
by building a 3D graph and computing an optimal path in this graph. The advantage
of this approach is that when edge pixels are clearly identified on one side of the
vessel, the boundary on this side can constrain the location of the boundary on the
other side where edge pixels might not be clearly visible.
True 2D dynamic programming has been used by Geiger et al. [48] for the detection and tracking of the left and right heart ventricles. The size of the graph to
be built to perform 2D dynamic programming is of order $n X Y$, with $n$ the number of layers in the graph (the length of the boundary) and $X$ and $Y$ the $x$ and $y$ dimensions of the image, respectively. For large images, the memory and computation resources required to perform the search become prohibitive. Typically, the
problem is circumvented by limiting the search to regions in which the boundary is known to lie, using user-specified search regions, and by subdividing the entire
contour into several sub-contours. Constraints are then imposed at the sub-contour

interfaces.
2D dynamic programming has also been proposed [49, 50] for rapid semiautomatic segmentation. In this approach, called live-wire, a starting point is specified and the cumulative cost and pointer matrices are computed for every pixel in
the search region. The user clicks on one pixel located on a boundary of interest
and the entire boundary between this point and the starting point is returned as being the minimum cost path between these points. Later, a best-first graph search
method was used to speed up the process [51].
Graph searching has also been used to detect borders in sequences of images [52] despite the fact that this technique does not extend readily from 2D contour to 3D surface detection. The problem was solved by transforming the sequence
of images into a data structure suitable for graph-searching. But the complexity of
the algorithm was such that, in practice, optimal paths could not be computed.
Heuristics were used to find a suboptimal, yet satisfactory, solution.
Another suboptimal, yet efficient, method for surface detection has been proposed by Frank [53], a good description of which can be found in [54]. The algorithm works in two passes. First, a 3D cumulative cost matrix that has the same
dimension as the original image is created. For a plane-like surface (i.e., a surface
that is not closed) this is done using a method referred to as surface-growing. The
image is traversed in $x$-$y$-$z$ coordinate order and partial costs are accumulated from
voxel to voxel. Once the cost matrix is created, it is traversed in reverse order. The
surface is computed by choosing, for each voxel, predecessors that have minimum
cost and meet connectivity constraints. The method has been applied, for instance,
to the detection of the mid-sagittal plane in MR images. This technique has also
been used for the detection of surfaces with cylindrical shape such as the arterial
lumen in ultrasound images.
A good source for a C code implementation of several of the algorithms discussed in this section as well as in subsequent sections is the book by Pitas [55].
2.4.5 Hough transforms

The Hough transform [56] permits the detection of parametric curves (e.g., circles, straight lines, ellipses, spheres, ellipsoids, or more complex shapes) in a binary
image produced by thresholding the output of an edge detector operator. Its major
strength is its ability to detect object boundaries even when low-level edge detector
operators produce sparse edge maps. The Hough transform will be introduced first
for the simplest case involving the detection of straight lines in a binary image. To
do so, define the parametric representation of a line in the image as $y = ax + b$. In the parameter space $(a, b)$, any straight line in image space is represented by a single point. Any line that passes through a point $(x_1, y_1)$ in image space corresponds to the line $b = -ax_1 + y_1$ in parameter space. If two points are co-linear in image space and located on the line $y = a_0 x + b_0$, their corresponding lines in parameter space intersect at $(a_0, b_0)$. This is illustrated in Figure 2.9, and it suggests a very

Figure 2.9: Left panel: straight line in the image space; right panel: loci of straight lines passing through $(x_1, y_1)$ and $(x_2, y_2)$ in parameter space.

simple procedure to detect straight lines in an image. First, discretize the parameter space $(a, b)$ and create a two-dimensional accumulator array. Each dimension in this array corresponds to one of the parameters. For every "on" pixel $(x_i, y_i)$ in the binary image (i.e., every pixel that has been retained as a possible boundary pixel), compute $b = -a x_i + y_i$ for every value of the discrete parameter $a$ and increment the value of the entry $(a, b)$ in the accumulator array by one. At the end of the procedure, the count in each entry $A(a_k, b_l)$ of the accumulator array corresponds to the number of points lying on the straight line $y = a_k x + b_l$. The accumulator array is then thresholded above a predefined value to detect lines above a minimum length and to eliminate spurious line segments.
Although convenient for explanation purposes, the parametric model used before is inadequate to represent vertical lines, a case for which $a$ approaches infinity. To address this problem, the normal representation of a line can be used:

$$r = x \cos\theta + y \sin\theta.$$

This equation describes a line having orientation $\theta$ at a distance $r$ from the origin, as shown on the left panel of Figure 2.10. Here, a line passing through the point $(x_1, y_1)$ in the image corresponds to a sinusoidal curve $r = x_1 \cos\theta + y_1 \sin\theta$ in the $(r, \theta)$ parameter space. Points located on the line passing through $(x_1, y_1)$ and $(x_2, y_2)$ in the original image are located at the intersection of the two sinusoidal curves $r = x_1 \cos\theta + y_1 \sin\theta$ and $r = x_2 \cos\theta + y_2 \sin\theta$ in the parameter space. The right panel of Figure 2.10 shows three sinusoidal curves corresponding to three points $P_1$, $P_2$, and $P_3$ located on the line shown on the left panel of this figure. The intersection of the sinusoidal curves is located at $(\theta_0, r_0)$. The angle $\theta$ ranges from $0$ to $\pi$, measured from the $x$ axis. The parameter $r$ varies from $-\sqrt{X^2 + Y^2}$ to $\sqrt{X^2 + Y^2}$, with $X$ and $Y$ the dimensions of the image. Negative values of $r$ correspond to lines with a negative intercept, while positive values of $r$ correspond to lines with a positive intercept.
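A compact accumulator-based sketch of this procedure in the $(r, \theta)$ parameterization follows; the numbers of bins are arbitrary illustrative choices.

    import numpy as np

    def hough_lines(edge_map, n_theta=180, n_r=200):
        """Hough transform for straight lines in normal (r, theta) form."""
        ys, xs = np.nonzero(edge_map)             # the "on" pixels
        diag = float(np.hypot(*edge_map.shape))
        thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
        r_bins = np.linspace(-diag, diag, n_r)
        acc = np.zeros((n_r, n_theta), dtype=int)
        for x, y in zip(xs, ys):
            # each pixel votes along its sinusoid r = x cos(theta) + y sin(theta)
            r = x * np.cos(thetas) + y * np.sin(thetas)
            idx = np.digitize(r, r_bins) - 1
            acc[idx, np.arange(n_theta)] += 1
        return acc, r_bins, thetas

Peaks of the accumulator above a length threshold then correspond to detected lines.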
Figure 2.10: Left panel: straight line in the image space; right panel: loci of straight lines passing through $P_1$, $P_2$, and $P_3$ in the $(r, \theta)$ parameter space.

The Hough transform can be used for the detection of more complex parametric curves in 2D or 3D if the dimension of the parameter space is increased. For instance, a circle described by the following equation:

$$(x - a)^2 + (y - b)^2 = r^2, \qquad (2.13)$$

needs three parameters: the radius $r$ and the coordinates $(a, b)$ of its center (a sphere would require four parameters). Figure 2.11 illustrates how the Hough transform works in this case. In both panels, the dotted line is the circle with radius $r_0$ and centered at $(a_0, b_0)$ to be detected in the image. For a fixed radius, and for an edge pixel $(x_i, y_i)$, the locus of points in parameter space is a circle centered at $(x_i, y_i)$, i.e.,

$$a = x_i - r \cos\theta, \qquad b = y_i - r \sin\theta. \qquad (2.14)$$

The left panel in the figure shows the locus of points (solid lines) in parameter space for a number of edge pixels when the radius $r$ is chosen smaller than the true radius $r_0$. The right panel shows the locus of points in parameter space for four edge pixels when the radius $r$ is equal to $r_0$. In this case all the circles intersect at $(a_0, b_0)$. The accumulator cell at $(a_0, b_0, r_0)$ will thus be larger than any other accumulator cell in the array.
The amount of computation required to build the accumulator array can be
greatly reduced if the direction of the edge can also be obtained from the edge
detector operator [55, 57]. Consider, again, the problem of detecting a circle in the
image. If the edge direction of an edge pixel is known, this edge pixel can only be
part of one of the two circles of a given radius that are tangent to the edge. For a

Figure 2.11: Left panel: loci of parameters $a$ and $b$ for a fixed radius smaller than $r_0$; right panel: loci of parameters $a$ and $b$ for a fixed radius equal to $r_0$.

given $r$, these correspond to only two points in parameter space. This would require incrementing only two cells for each radius, i.e., limiting the value of $\theta$ in Eq. (2.14) to the two values $\phi \pm \pi/2$, with $\phi$ the direction of the edge. In practice, however, the accurate estimation of an edge direction is difficult, and several accumulator cells are incremented. This is done by allowing $\phi$ to vary within an interval whose width is dictated by the reliability of the edge direction estimator. The Hough transform can be generalized further for any curve with a known parametric expression using the same procedure. It should be noted, however, that the rapid increase in the size of the accumulator arrays limits its use to curves with only a few parameters.
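The following sketch illustrates the direction-assisted voting just described; the interval half-width dir_tol, the number of direction samples, and the other parameter names are illustrative assumptions.

    import numpy as np

    def hough_circles(edge_map, grad_dir, radii, dir_tol=0.2, n_dir=5):
        """Circle Hough transform restricted by the estimated edge direction."""
        h, w = edge_map.shape
        acc = np.zeros((len(radii), h, w), dtype=int)
        for y, x in zip(*np.nonzero(edge_map)):
            # let the direction vary within +/- dir_tol (estimator reliability)
            for theta in np.linspace(grad_dir[y, x] - dir_tol,
                                     grad_dir[y, x] + dir_tol, n_dir):
                for k, r in enumerate(radii):
                    for sign in (1.0, -1.0):      # the two tangent circles
                        a = int(round(x - sign * r * np.cos(theta)))
                        b = int(round(y - sign * r * np.sin(theta)))
                        if 0 <= a < w and 0 <= b < h:
                            acc[k, b, a] += 1
        return acc   # peaks give the center (a, b) and the radius index k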
Often, the parametric representation of a shape of interest is not known. In this
case, the generalized Hough transform [57,58] can be used. This technique builds a
parametric representation of a structure boundary from examples and it permits the
detection of this boundary, possibly rotated and scaled in new images. The Hough
transform has been, and is being, used as part of segmentation procedures in a wide variety of applications such as the detection of the longitudinal fissure in tomographic head images [59], the registration of sequences of retinal images [60], the detection of the left ventricle boundary in echocardiographic images [61], the classification of parenchymal patterns in mammograms [62], or the tracking of guide wires in the coronary arteries in x-ray images [63]. A good description and comparison of different varieties of the Hough transform can be found in [64]. Several implementations of the generalized Hough transform are compared in [65].

2.5 Region-based segmentation

Region-based techniques segment the image $I$ into $n$ regions $R_i$ based on some homogeneity property. This process can be formally described as follows:

$$I = \bigcup_{i=1}^{n} R_i, \qquad R_i \cap R_j = \emptyset \ \text{ for } i \neq j,$$

$$P(R_i) = \text{TRUE} \ \text{ for } i = 1, 2, \ldots, n,$$

$$P(R_i \cup R_j) = \text{FALSE} \ \text{ for } i \neq j,\ R_i \text{ adjacent to } R_j,$$

in which $P(R_i)$ is a logical predicate. These equations state that the regions $R_1, \ldots, R_n$ need to cover the entire image and that two regions are disjoint sets. The predicate captures the set of conditions that must be satisfied by every pixel in a region, usually homogeneity criteria such as average intensity value in the region, texture, or color. The last equation states that regions $R_i$ and $R_j$ are different, according to the set of rules expressed by the predicate $P$. Region-based segmentation algorithms fall into one of the following broad categories: region growing, region splitting, and split-and-merge.
2.5.1 Region growing

The simplest region-based segmentation algorithms start with at least one seed
(a starting point) per region. Neighbors of the seed are visited and the neighbors
that satisfy the predicate (a simple predicate compares the intensity values of the
pixel to the average intensity value of the region) are added to the region. Pixels
that satisfy the predicate of more than one region are allocated to one of these
arbitrarily. A good seeded region growing algorithm is the one proposed by Adams and Bischof [66]. Suppose there are $n$ regions $R_1, R_2, \ldots, R_n$. After $m$ steps of the algorithm, the set of all pixels $p$ that have not yet been allocated to any region and which are neighbors of at least one of the regions that have been created is:

$$T = \left\{ p \notin \bigcup_{i=1}^{n} R_i \;\middle|\; N(p) \cap \bigcup_{i=1}^{n} R_i \neq \emptyset \right\},$$

with $N(p)$ the set of immediate neighbors of the pixel $p$. If a pixel $p$ in $T$ touches


only one region $R_i$, a similarity measure $\delta(p)$ between the pixel and the region is computed. This measure can, for instance, be the difference between the pixel intensity value and the mean intensity value of the region. If $p$ touches more than one region, the similarity measure between the pixel and each region is computed and the smallest one is retained. After $\delta(p)$ is computed, the pixel is put on a sequentially sorted list (SSL). This is a linked list that is ordered according to certain attributes, here $\delta(p)$. The complete algorithm can be described as follows:


Algorithm 2.7: Seeded region growing (SRG) algorithm for region segmentation
1. Label seed points using a manual or automatic method.
2. Put the neighbors of the seed points in the SSL.
3. Remove the first pixel $p$ from the top of the SSL.
4. Test the neighbors of $p$. If all neighbors of $p$ that are already labeled have the same label, assign this label to $p$, update the statistics of the corresponding region, and add the neighbors of $p$ that are not yet labeled to the SSL according to their $\delta$ values. Else, label $p$ with the boundary label.
5. If the SSL is not empty, go to step 3; otherwise stop.

In this implementation, pixels that touch more than one region are labeled as
boundary pixels. This information can be used for display purposes or for contour
detection. It should be noted that this algorithm does not require parameter adjustments and that it can easily be extended to 3D. Despite its simplicity, it was found
to be robust and reliable for applications such as the extraction of the brain in 3D
MR image volumes [67].
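A Python sketch of Algorithm 2.7 follows; a binary heap plays the role of the SSL, and the similarity measure $\delta$ is taken to be the distance of the pixel intensity to the mean of the most similar adjacent region, one of the choices mentioned above.

    import heapq
    import numpy as np

    BOUNDARY = -1

    def seeded_region_growing(image, seeds):
        """SRG per Algorithm 2.7; `seeds` maps labels 1..n to (row, col) lists."""
        labels = np.zeros(image.shape, dtype=int)
        stats = {r: [0.0, 0] for r in seeds}          # running [sum, count]
        ssl = []

        def neighbors(y, x):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < image.shape[0] and 0 <= nx < image.shape[1]:
                    yield ny, nx

        def push(y, x):
            labs = {int(labels[n]) for n in neighbors(y, x)} - {0, BOUNDARY}
            delta = min(abs(float(image[y, x]) - s[0] / s[1])
                        for r, s in stats.items() if r in labs)
            heapq.heappush(ssl, (delta, y, x))

        for r, pts in seeds.items():                  # step 1: label the seeds
            for y, x in pts:
                labels[y, x] = r
                stats[r][0] += float(image[y, x]); stats[r][1] += 1
        for r, pts in seeds.items():                  # step 2: fill the SSL
            for y, x in pts:
                for n in neighbors(y, x):
                    if labels[n] == 0:
                        push(*n)
        while ssl:                                    # steps 3 to 5
            _, y, x = heapq.heappop(ssl)
            if labels[y, x] != 0:
                continue
            labs = {int(labels[n]) for n in neighbors(y, x)} - {0, BOUNDARY}
            if len(labs) == 1:                        # unambiguous label
                r = labs.pop()
                labels[y, x] = r
                stats[r][0] += float(image[y, x]); stats[r][1] += 1
                for n in neighbors(y, x):
                    if labels[n] == 0:
                        push(*n)
            else:                                     # touches several regions
                labels[y, x] = BOUNDARY
        return labels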
2.5.2 Region splitting and merging

Region splitting methods take the opposite approach to region growing. These
methods start from the entire image. If it does not meet homogeneity criteria, it is
split into 4 sub-images (or 8 in 3D). This procedure is applied recursively on each
sub-image until each and every sub-image meets the uniformity criteria. When this
is done, the image can be represented as a quadtree which is a data structure in
which each parent node has four children (in 3D each parent node has eight children and the structure is called an octree). These structures can be used for efficient
storage and comparison between images [68]. The main drawback of the region
splitting approach is that the final image partition may contain adjacent regions
with identical properties. The simplest way to address this issue is to add a merging
step to the region splitting algorithm, leading to a split-and-merge approach [69].
One possibility is to first split an inhomogeneous region until homogeneous regions are created. When a homogeneous region is created, its neighboring regions
are checked and the newly created region is merged with an existing one if they
have identical properties. If the similarity criteria are met by more than one adjacent region, the new region is merged with the most similar one. This procedure



works well but does not produce a quadtree decomposition of the image. If such
a decomposition is important, the approach is to first split the image to produce a
quadtree decomposition. Merging is allowed between children of the same node.
This, however, presents the disadvantage that two adjacent regions with similar
characteristics, but which do not have the same parent, cannot be merged. This
problem can be addressed by adding one last step to the algorithm that permits the
merging of adjacent regions across branches of the trees, resulting in the following
algorithm.

Algorithm 2.8: Split-and-merge algorithm


1. Define a similarity criterion.
2. Split into four (or eight in 3D) subregions any region that does not meet the
uniformity criterion and merge children of the same parent node that meet
this criterion.
3. If no merging or splitting is possible, check adjacent regions across parent
nodes and merge those that meet the uniformity criterion.

An additional step can be included in this algorithm to eliminate regions below a predefined size in the segmented image. This is done by merging these regions with the most similar adjacent ones.
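The split phase of such an approach can be sketched as follows; the standard-deviation test used here is just one illustrative uniformity predicate, and the merge steps of Algorithm 2.8 would then operate on the returned blocks.

    import numpy as np

    def quadtree_split(image, max_std=10.0):
        """Recursively split non-uniform regions into four sub-images."""
        blocks = []

        def split(y, x, h, w):
            region = image[y:y + h, x:x + w]
            if h < 2 or w < 2 or region.std() <= max_std:
                blocks.append((y, x, h, w))           # uniform: keep as a leaf
                return
            h2, w2 = h // 2, w // 2                   # split into 4 sub-images
            split(y, x, h2, w2)
            split(y, x + w2, h2, w - w2)
            split(y + h2, x, h - h2, w2)
            split(y + h2, x + w2, h - h2, w - w2)

        split(0, 0, *image.shape)
        return blocks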
2.5.3 Connected component labeling

It is often of interest to assign a unique label to each region in a segmented
image, a process known as connected component labeling. A simple recursive
approach called grassfire [55] can be used. The image is scanned from one corner
until an object pixel is reached. A fire is then set at that pixel and propagates to
all the pixels in the neighborhood (either in 3D or in 2D). When the fire reaches a
pixel, its value is set to zero and the procedure is repeated recursively until all the
pixels in the neighborhood are burnt, completing the detection of one connected
component in the image. The algorithm is repeated until the value of every pixel is
zero.
The so-called blob coloring algorithm [23, 57], which works in two passes, is
another method used for 2D connected component labeling. First, every foreground
pixel in the image is assigned a label using the following scheme for a 4-connected
neighborhood. If the left and up neighbors of a foreground pixel are zero, assign a new label to the pixel. If either its up or left neighbor already has a label, or if both of them have the same label, assign this label to the pixel. If the up

and left neighbors have different labels, note that these labels are equivalent and
assign one of these to the pixel. The same scheme can be used for 8-connected
neighborhoods if the two upper diagonal neighbors are also examined. After the
image has been scanned, a single label is assigned to equivalent classes. This can be
done efficiently by computing the transitive closure of the binary matrix capturing
class equivalence information [23]. The transitive closure of this matrix is itself a
5  5 matrix, where 5 is the number of classes. The elements  of this matrix
are one if the classes and  are equivalent and zero otherwise. The blob coloring
algorithm can be extended to 3D [70]; this is done by first labeling pixels in 2D,
and subsequently relabeling them by identifying equivalent classes along the third
dimension.
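As an illustration, the grassfire idea can be written with an explicit stack (direct recursion, as described above, overflows on large components); instead of zeroing burnt pixels, this sketch marks them in a label image.

    import numpy as np

    def label_components(binary, connectivity=4):
        """Connected component labeling by iterative grassfire propagation."""
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
        if connectivity == 8:
            offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
        labels = np.zeros(binary.shape, dtype=int)
        current = 0
        for y, x in zip(*np.nonzero(binary)):     # scan for unburnt pixels
            if labels[y, x]:
                continue
            current += 1                          # set a new fire
            stack = [(y, x)]
            labels[y, x] = current
            while stack:                          # let the fire spread
                cy, cx = stack.pop()
                for dy, dx in offsets:
                    ny, nx = cy + dy, cx + dx
                    if (0 <= ny < binary.shape[0] and 0 <= nx < binary.shape[1]
                            and binary[ny, nx] and not labels[ny, nx]):
                        labels[ny, nx] = current
                        stack.append((ny, nx))
        return labels, current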
Labeling objects in an image is an important step for image interpretation and
the topic has been studied extensively. In addition to the basic methods described
here, a large number of algorithms have been proposed over the years. These include, among others, approaches based on split-and-merge algorithms (see for instance [71]), parallel algorithms designed for multiprocessor machines [72], or algorithms based on the watershed transformation (see Chapter 4) [73].
2.6 Classification

As opposed to the thresholding techniques described in Section 2.3, which are
generally single-modality, classification algorithms are typically used when one
has access to multiple images of the same physical space or scene. This is for instance the case with MRI data, which are by nature multi-spectral (also referred to
as multi-modal, multi-channel, or multi-feature) because their contrast characteristics depend on the acquisition sequences and their parameters. As an example,
Figure 2.12 shows T1-weighted, T2-weighted, and proton-density-weighted brain
MRI scans of the same subject.
Under the assumption that these images are spatially co-registered (image registration is treated as a special topic in Chapter 8), each multi-dimensional spatial
coordinate is now associated with a multi-dimensional intensity feature vector
$\mathbf{x}$. Note that multiple image features need not necessarily be obtained by the
image acquisition process; it is also possible to derive a number of features (e.g.,
gradient/texture measures) from a single scan. Feature extraction as a general topic
is discussed in Chapter 5.
Labeling pixels or voxels in a multi-modal data set involves the identification
of regions, or clusters, in the feature space spanned by the different images. As
such, the threshold values obtained using a mono-modality technique (Section 2.3)
generalize to decision boundaries or decision surfaces in the multi-modal case. In
a general sense, these functions are also often referred to as discriminant functions. As an example, Figure 2.13 shows a 2-dimensional (T1- and T2-weighted)
histogram of an MRI scan. The same figure also shows the scatter plot of the
same image, where each pixel is shown as a dot at the location determined by the



Figure 2.12: Example MRI images. From top to bottom: T1-weighted, T2-weighted, and proton-density-weighted MRI acquisitions of the same subject, in, from left to right, transverse, sagittal, and coronal cross-sections.

Figure 2.13: Two-dimensional histogram (left) and scatter plot (right) of a multi-modal T1/T2-weighted MRI scan. A (linear) decision boundary that roughly discriminates between brain parenchyma and CSF is also shown.

intensity of the pixel in the two modalities.


Although it is more difficult to visualize, this approach can be readily extended
to three or more modalities. Figure 2.14 shows the 3-feature classification of the
MRI data set shown in Figure 2.12.
Existing classification algorithms are either supervised or unsupervised. A supervised classifier requires input from the user, typically a set of class samples, for
the determination of the data structure from which discriminant functions are derived. This typically means that the user labels either data samples in the images or
clusters in the feature space, and that a classifier which partitions the feature space
is derived from these user-identified samples or clusters. Unsupervised classifiers
on the other hand rely on cluster analysis to derive the natural structure of the data
from the data itself.
The following sections first describe the most widely used classification and
clustering algorithms. For an exhaustive survey of all the techniques that have been
developed in the past, the reader is referred to a large body of literature dedicated to
their description (see for instance [74–81]). Some of the more advanced methods
that have been proposed in the recent past are discussed in subsequent sections.
2.6.1 Basic classifiers and clustering algorithms

A number of common techniques are described in some detail in this section. These include unsupervised clustering algorithms such as k-means, ISODATA, and fuzzy c-means, and supervised classification techniques such as the parallelepiped, minimum distance, Bayes or maximum likelihood, Parzen window, and k-nearest-neighbor classifiers.


Figure 2.14: 3-feature, 4-class classification of the example image volumes shown in Figure 2.12. The four classes are background (black), white matter (white), gray matter (light
gray), and CSF (dark gray). The image set was classified using a supervised artificial neural
network classifier.

Parallelepiped
The parallelepiped classifier [80, 82] is essentially a multi-dimensional thresholding technique: the user specifies lower and upper bounds on each feature value,
for each class. This is usually done based on the class mean and variance in each
dimension, parameters which can be estimated from the sampling set provided by
the user. In this case, given the estimated means $\mu_{ij}$ and standard deviations $\sigma_{ij}$ for class $i$ and modality $j$, a data vector $\mathbf{x}$ is assigned to class $\omega_i$ if

$$|x_j - \mu_{ij}| < c\,\sigma_{ij} \quad \text{for all } j, \qquad (2.15)$$
where $c$ is a user-specified factor. The main disadvantages of the parallelepiped classifier are that the hyper-boxes may overlap, resulting in classification ambiguity, and that patterns may not fall within the hyper-box of any class, as is illustrated in Figure 2.15. In this figure, the mean and standard deviation of each of the three data clusters have been estimated from the labeled data points. The squares show the areas around each cluster center for a fixed value of $c$.
Minimum distance
The minimum distance, also called minimum-distance-to-the-means, classifier [76] classifies a given feature vector $\mathbf{x}$ into class $\omega_i$ if the Euclidean distance of $\mathbf{x}$ to the class mean vector $\boldsymbol{\mu}_i$ is minimum:

$$\mathbf{x} \in \omega_i \iff \|\mathbf{x} - \boldsymbol{\mu}_i\| = \min_k\, \|\mathbf{x} - \boldsymbol{\mu}_k\|. \qquad (2.16)$$

The class mean vectors $\boldsymbol{\mu}_i$ are calculated from the set of data samples. This classifier is essentially one pass of the k-means algorithm (see below), where the initial

Figure 2.15: Parallelepiped classifier. The left panel shows all data points, including the ones labeled by the user (indicated by a dot). The right panel shows the cluster centroids and the decision regions (based on the sampled points). Four points are left unclassified.

cluster centroids are determined from a set of user-supplied samples. Figure 2.16
shows the minimum distance class assignments of the data points shown in Figure 2.15.
One should be aware that the use of a Euclidean distance measure favors hyper-spherical clusters of approximately the same size, conditions that may not be satisfied in practical situations. Without loss of generality, the Euclidean distance can be replaced with another distance measure, such as the Mahalanobis distance [76, 78], which favors a hyper-ellipsoidal cluster geometry:

$$d_M(\mathbf{x}, \boldsymbol{\mu}_i) = \sqrt{(\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)}, \qquad (2.17)$$

where $\Sigma_i$ is the covariance matrix of the multivariate normal density function, estimated from the data points.
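Both variants fit in a few lines of Python; passing per-class covariance matrices switches from the Euclidean rule of Equation (2.16) to the Mahalanobis distance of Equation (2.17). The function and argument names are illustrative.

    import numpy as np

    def minimum_distance_classify(x, means, covariances=None):
        """Assign x to the class whose mean is nearest."""
        x = np.asarray(x, dtype=float)
        if covariances is None:                       # Euclidean, Eq. (2.16)
            d = np.linalg.norm(means - x, axis=1)
        else:                                         # Mahalanobis, Eq. (2.17)
            d = np.array([np.sqrt((x - m) @ np.linalg.inv(S) @ (x - m))
                          for m, S in zip(means, covariances)])
        return int(np.argmin(d))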
k-means and ISODATA

Common clustering methods are formulated around a criterion function or objective function that is used to express the quality of the clusters. A variety of criterion functions can be designed to fit a particular problem, but a popular function is the sum-of-squared-error criterion $J_e$ [75] (denoted by other symbols in [76] and [78]):

$$J(U, \mathbf{v}) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}\, d_{ik}, \qquad (2.18)$$


Figure 2.16: Minimum distance classifier. Left: data points and the user-labeled samples;
right: membership assignments based on the calculated cluster centroids.

where $c$ and $n$ are the number of clusters and data points, respectively, and the parameters $U$, $\mathbf{v}_i$, $u_{ik}$, and $d_{ik}$ are described below. The vector $\mathbf{v}_i$ is the cluster centroid for cluster $i$, calculated from:

$$\mathbf{v}_i = \frac{\sum_{k=1}^{n} u_{ik}\,\mathbf{x}_k}{\sum_{k=1}^{n} u_{ik}}. \qquad (2.19)$$

The matrix $U = [u_{ik}]$ represents the partition of the data set $\{\mathbf{x}_1, \ldots, \mathbf{x}_n\}$ into $c$ clusters; $u_{ik}$ describes the membership of $\mathbf{x}_k$ to cluster $\omega_i$. When a data point $\mathbf{x}_k$ is either a member ($u_{ik} = 1$) or not a member ($u_{ik} = 0$) of one of the clusters $\omega_i$, the clustering is referred to as hard or crisp. The generalization of this is called fuzzy clustering, which implies that a data point can have partial membership in multiple clusters, resulting in continuous membership functions.

For crisp clustering the membership values $u_{ik}$ must satisfy:

$$u_{ik} \in \{0, 1\}, \qquad \sum_{i=1}^{c} u_{ik} = 1 \ \text{ for all } k, \qquad 0 < \sum_{k=1}^{n} u_{ik} < n \ \text{ for all } i. \qquad (2.20)$$

For example, the matrix $U$ could look like:

$$U = \begin{pmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ \vdots & \vdots & & \vdots \\ u_{c1} & u_{c2} & \cdots & u_{cn} \end{pmatrix}, \qquad (2.21)$$

where the columns correspond to the $n$ data vectors $\mathbf{x}_k$, and each row corresponds to one of the $c$ clusters $\omega_i$. $d_{ik}$ is a distance, or similarity, measure between $\mathbf{x}_k$ and $\mathbf{v}_i$. The generalization of $U$ for fuzzy membership functions is described in the section about the fuzzy c-means algorithm below.
Using the Euclidean distance as a distance measure, we have:

$$d_{ik} = \|\mathbf{x}_k - \mathbf{v}_i\|^2, \qquad (2.22)$$

and, because $u_{ik} \in \{0, 1\}$, equations (2.18) and (2.22) result in:

$$J(U, \mathbf{v}) = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}\, \|\mathbf{x}_k - \mathbf{v}_i\|^2. \qquad (2.23)$$

Here again, as for the minimum distance classifier, it should be noted that the Euclidean distance measure, which favors a hyper-spherical cluster geometry, can be
replaced by a Mahalanobis distance measure favoring hyper-ellipsoidal clusters.
A common way to minimize $J$ is the k-means, also called hard c-means [75], clustering algorithm. This is an iterative approach, in which the cluster centroids are updated at each iteration, starting from an initial estimate of the cluster centers:

Algorithm 2.9: k-means clustering
1. Select the number of clusters $c$ and the initial cluster centroids $\mathbf{v}_i$.
2. Calculate the matrix $U$ by assigning each data point $\mathbf{x}_k$ to the closest cluster centroid, as defined by the selected distance measure $d_{ik}$ (e.g., Equation 2.22).
3. Using Equation 2.19, recalculate the cluster centroids $\mathbf{v}_i$ from the membership assignment $U$.
4. If the cluster centroids (or the assignment matrix $U$) did not change since the previous iteration, stop; otherwise, go back to step 2.

Weaknesses of this method are its sensitivity to the locations of the initial cluster
centroids, and to the choice of a distance measure. The initial cluster centroids
are important because the k-means algorithm is not guaranteed to find a global minimum of $J$. Instead, it tends to find a local minimum that is close to the initial
cluster centroids.
ISODATA, essentially an adaptation of k-means, differs from the k-means algorithm in that the number of clusters is not fixed: clusters are split or merged
if certain conditions are met. These conditions are based on distance, number of
patterns in a cluster, and within-cluster and between-cluster variance measures.
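A direct Python sketch of Algorithm 2.9 follows; initializing the centroids with randomly drawn data points is one simple choice, and, as noted above, the result depends on it.

    import numpy as np

    def k_means(X, c, n_iter=100, seed=0):
        """k-means clustering of the (n, d) data array X into c clusters."""
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=c, replace=False)].astype(float)
        assign = np.full(len(X), -1)
        for _ in range(n_iter):
            # step 2: assign each point to the closest centroid
            d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            new_assign = d.argmin(axis=1)
            if np.array_equal(new_assign, assign):    # step 4: converged
                break
            assign = new_assign
            # step 3: recompute the centroids (Equation 2.19)
            for i in range(c):
                if np.any(assign == i):
                    centroids[i] = X[assign == i].mean(axis=0)
        return assign, centroids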



Fuzzy c-means
Fuzzy c-means clustering (FCM) [74, 75] is a generalization of hard c-means, or k-means, clustering in which patterns are allowed to be associated with more than one cluster. As mentioned earlier, the term fuzzy refers to the fact that the membership of a pattern in the feature space is a continuous value, reflecting a certain degree of membership to each cluster, rather than an all-or-nothing assignment of patterns to only one cluster. In this case, the membership values $u_{ik}$ must satisfy [cf. Equation (2.20)]:

$$u_{ik} \in [0, 1], \qquad \sum_{i=1}^{c} u_{ik} = 1 \ \text{ for all } k, \qquad 0 < \sum_{k=1}^{n} u_{ik} < n \ \text{ for all } i, \qquad (2.24)$$

and the criterion function is given by:

$$J_m(U, \mathbf{v}) = \sum_{i=1}^{c} \sum_{k=1}^{n} (u_{ik})^m\, d_{ik}, \qquad (2.25)$$

where $m \in (1, \infty)$ is a weighting exponent, defining a family of criterion functions. The larger $m$ is, the fuzzier the membership assignments are.

The algorithm is essentially the same as described for the k-means clustering, but differs in the calculation of the cluster centroids and the update rule for the membership assignments; see Bezdek [74, 75] for details of the algorithm.
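For illustration, the following sketch uses the standard update rules found in the FCM literature (a fuzzy generalization of Equation 2.19 for the centroids and a membership update derived from Equation 2.25); it is a sketch under those assumptions, not a reproduction of the algorithm in [74, 75].

    import numpy as np

    def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-4, seed=0):
        """FCM on the (n, d) data array X; memberships U have shape (c, n)."""
        rng = np.random.default_rng(seed)
        U = rng.random((c, len(X)))
        U /= U.sum(axis=0)                            # satisfy Eq. (2.24)
        for _ in range(n_iter):
            W = U ** m
            centroids = (W @ X) / W.sum(axis=1, keepdims=True)
            # squared Euclidean distances d_ik between points and centroids
            d = ((X[None, :, :] - centroids[:, None, :]) ** 2).sum(axis=2) + 1e-12
            inv = d ** (-1.0 / (m - 1.0))
            U_new = inv / inv.sum(axis=0, keepdims=True)
            if np.abs(U_new - U).max() < tol:
                U = U_new
                break
            U = U_new
        return U, centroids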
Bayes classifier
The Bayes classifier [76, 82], also referred to as the maximum likelihood classifier, is a statistical supervised classifier that assigns a pattern $\mathbf{x}$ to a class $\omega_i$ if

$$p(\mathbf{x} \mid \omega_i)\, P(\omega_i) > p(\mathbf{x} \mid \omega_j)\, P(\omega_j) \qquad \text{for all } j \neq i, \qquad (2.26)$$

where $p(\mathbf{x} \mid \omega_i)$ is the conditional probability density function of $\mathbf{x}$ given class $\omega_i$, and $P(\omega_i)$ is the a priori probability for class $\omega_i$. In this context it is equivalent to take the logarithm of $p(\mathbf{x} \mid \omega_i)\, P(\omega_i)$ on both sides of Equation 2.26, resulting in the decision functions

$$d_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i). \qquad (2.27)$$

The Bayes classifier is often used when it is reasonable to assume that the conditional probability density functions $p(\mathbf{x} \mid \omega_i)$ are multivariate Gaussian. In this case Equation 2.27 can be rewritten as:

$$d_i(\mathbf{x}) = \ln P(\omega_i) - \tfrac{1}{2} \ln |\Sigma_i| - \tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i),$$

with $\boldsymbol{\mu}_i$ and $\Sigma_i$ the mean vector and covariance matrix for class $i$, respectively. Since the multivariate Gaussian probability density function is described completely by its mean vector and covariance matrix, it is often referred to as a parametric density function. All that is required to perform the classification is to estimate the mean vector and covariance matrix for each class using the following equations:

$$\boldsymbol{\mu}_i = \frac{1}{n_i} \sum_{\mathbf{x} \in \omega_i} \mathbf{x}, \qquad (2.28)$$

$$\Sigma_i = \frac{1}{n_i} \sum_{\mathbf{x} \in \omega_i} (\mathbf{x} - \boldsymbol{\mu}_i)(\mathbf{x} - \boldsymbol{\mu}_i)^T, \qquad (2.29)$$

where $n_i$ is the number of training samples. A pattern is assigned to the class for
which the value of the decision function is the largest. If a priori statistical information is available on the mean and covariance matrices (it could, for instance,
be known that the mean vectors are themselves distributed normally with known
mean and covariance matrices), iterative procedures can be designed to refine these
estimates using the training data, a procedure known as Bayesian learning [76].
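A minimal sketch of the Gaussian Bayes classifier follows, estimating the class parameters with Equations (2.28) and (2.29) and evaluating the logarithmic decision functions derived from Equation (2.27).

    import numpy as np

    def fit_gaussian_bayes(X, y):
        """Per-class means, covariances (Eqs. 2.28, 2.29), and priors."""
        params = {}
        for c in np.unique(y):
            Xc = X[y == c]
            mu = Xc.mean(axis=0)
            sigma = np.atleast_2d(np.cov(Xc, rowvar=False, bias=True))
            params[c] = (mu, sigma, len(Xc) / len(X))
        return params

    def bayes_classify(x, params):
        """Assign x to the class with the largest decision function value."""
        best, best_d = None, -np.inf
        for c, (mu, sigma, prior) in params.items():
            diff = x - mu
            d = (np.log(prior) - 0.5 * np.log(np.linalg.det(sigma))
                 - 0.5 * diff @ np.linalg.inv(sigma) @ diff)
            if d > best_d:
                best, best_d = c, d
        return best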
Parzen window, k-nearest neighbor
Rather than estimating parametric density functions, the Parzen window and k-nearest neighbor methods are used to estimate non-parametric density functions from the data [76–78]. Formulated as a classification technique, the Parzen window method can be used to directly estimate the a posteriori probabilities $P(\omega_i \mid \mathbf{x})$ from the given samples. In effect, $\mathbf{x}$ is labeled with $\omega_i$ if a majority of the samples in a hyper-cubic volume $V$ centered around $\mathbf{x}$ has been labeled $\omega_i$. In this approach, the volume $V$ is often chosen as a specific function of the total number of samples $n$, such as $V = 1/\sqrt{n}$.

The k-nearest neighbor classifier is very similar to the Parzen window classifier, only in this case the majority vote for the class labeling of $\mathbf{x}$ is taken over the $k$ nearest neighbors (according to a predefined, usually Euclidean, distance measure) of $\mathbf{x}$ in the sampling set, rather than in a fixed volume of the feature space centered around $\mathbf{x}$. See [76] for details.
The advantage of these non-parametric classifiers over parametric approaches
such as the Bayes classifier, is that no assumptions about the shape of the class
probability density functions are made. The disadvantage is that the error rate of
the classifier strongly depends on the number of data samples that are provided.
The more samples that are available, the lower the classification error rate will be.
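A brute-force k-nearest-neighbor sketch is given below; for large sampling sets, a tree-based neighbor search would be substituted for the full sort.

    import numpy as np
    from collections import Counter

    def knn_classify(x, X_train, y_train, k=5):
        """Majority vote among the k nearest training samples."""
        d = np.linalg.norm(X_train - np.asarray(x, dtype=float), axis=1)
        nearest = np.argsort(d)[:k]              # indices of the k neighbors
        votes = Counter(y_train[i] for i in nearest)
        return votes.most_common(1)[0][0]        # majority label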
2.6.2 Adaptive fuzzy c-means with INU estimation

Pham and Prince [29] have developed an adaptive fuzzy c-means (AFCM) algorithm that incorporates estimation of the INU field. In order to achieve this, the
authors adapt the basic FCM criterion function (cf. Equation 2.25)

$$J = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m\, \|\mathbf{y}_k - \mathbf{v}_i\|^2 \qquad (2.30)$$

to

$$J = \sum_{i=1}^{c} \sum_{k=1}^{n} u_{ik}^m\, \|\mathbf{y}_k - g_k \mathbf{v}_i\|^2 + \lambda_1 \sum_{k=1}^{n} \sum_{r=1}^{R} (D_r * g)_k^2 + \lambda_2 \sum_{k=1}^{n} \sum_{r=1}^{R} \sum_{s=1}^{R} (D_r * D_s * g)_k^2, \qquad (2.31)$$

where $\mathbf{y}_k$ is the multi-modal voxel intensity, $\mathbf{v}_i$ is a multi-modal cluster centroid, $g_k$ is the unknown INU field to be estimated, $R$ is the number of spatial dimensions in the images, and $D_r$ is a known finite difference operator along the $r$th image dimension. The notation $D_r * g$ refers to the convolution of $g$ with the difference kernel $D_r$, which effectively acts as a derivative operation. Note that, for simplicity reasons, the INU field is assumed to be scalar, i.e., the same for each image feature.
The last two terms in Equation 2.31 are the first- and second-order regularization terms that force the bias field $g$ to be spatially smooth and slowly varying. With $\lambda_1 = \lambda_2 = 0$, i.e., without these regularization terms, one could always find a bias field that results in $J = 0$. When $\lambda_1$ and $\lambda_2$ are set sufficiently large, $g$ becomes constant and minimizing $J$ is essentially equivalent to minimizing the standard FCM criterion function. For the minimization of $J$, a modified version of the standard FCM algorithm is used (see [29] for details). Since the derivation of the INU field estimate at every iteration is computationally expensive, the authors propose a truncated multigrid algorithm, which greatly reduces the computation time without loss of accuracy.
2.6.3 Decision trees

Decision tree classifiers, originating from the field of artificial intelligence, represent classification rules in the form of a symbolic decision tree, where each node
describes a test on a feature (attribute) value and the branches leaving that node
represent the possible outcomes of the test. The leaves at the bottom of the tree
correspond to the various classes. A decision tree is a supervised classifier; the tree
is constructed based on (induced from) a set of known data samples. The advantage of a decision tree classifier over other types of classifiers is that it results in a
description of the classification process as a set of rules, which are relatively easy
to interpret and may provide more insight into the structure of the data set. The
description of decision trees given here is based on the ID3 algorithm [83, 84], a

member of the TDIDT (Top-Down Induction of Decision Trees) family. Although
a variety of other decision tree algorithms exist, the discussion of ID3 is illustrative
because many algorithms are based on it, and because it has been used in medical
imaging applications [85].
Initially, the tree starts as a single node, containing all training samples. Then,
the tree is iteratively grown by adding branches at every step, until all training
samples contained in each node belong to the same class and no more branches can
be grown. Central to this algorithm is an entropy measure, used to select the most
discriminating feature to partition the data samples at a given node. The entropy
measure used in the ID3 algorithm is defined as [83]:

$$\text{Entropy} = \sum_{b} w_b E_b, \qquad (2.32)$$

with the weight of branch $b$:

$$w_b = \frac{\#\text{ samples in branch } b}{\#\text{ samples at the parent node}}, \qquad (2.33)$$

and the entropy of branch $b$:

$$E_b = -\sum_{c} p_c \log_2 p_c, \qquad (2.34)$$

where $p_c$ is the probability of class $c$ in the branch, estimated from the number of class $c$ training samples in the branch. If a node of the tree contains samples
from more than one class, that node must be expanded into a subtree. To do this,
all possible expansions (tests on a feature) of the node into branches are examined
and their entropy values calculated. Then, the test on the feature yielding the least
entropy in the resulting partitioning of the data samples is associated with that node
and used to create new branches.
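The entropy computations of Equations (2.32)-(2.34) can be sketched as follows; ID3 evaluates split_entropy for every candidate feature test and keeps the smallest.

    import numpy as np
    from collections import Counter

    def branch_entropy(labels):
        """Entropy E_b of one branch (Eq. 2.34) from its sample labels."""
        counts = np.array(list(Counter(labels).values()), dtype=float)
        p = counts / counts.sum()
        return float(-(p * np.log2(p)).sum())

    def split_entropy(branches):
        """Weighted entropy of a candidate split (Eqs. 2.32 and 2.33);
        `branches` is a list of label sequences, one per branch."""
        total = sum(len(b) for b in branches)
        return sum(len(b) / total * branch_entropy(b) for b in branches if b)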
2.6.4 Artificial neural networks

Artificial neural networks (ANNs) are mathematical abstractions of the nervous
system, in which nodes (neurons) are connected to each other by links (axons) with
associated weight values. A variety of neural network architectures have been used
for medical image processing applications (see for example [86–101]). Among
these approaches, the most common ones are the feed-forward multi-layered perceptron, cascade correlation, Kohonen, and Hopfield neural networks.
Feed-forward ANN
Figure 2.17 shows the architecture of a feed-forward ANN. In a typical pattern
classification setup, the number of inputs is equal to the dimension of the feature
space and the number of outputs is equal to the number of object classes. The number of nodes in the intermediate (hidden) layer can be varied, but is related to the

Figure 2.17: Feed-forward ANN topology. Note: the number of nodes differs between layers. The subscript indicating the layer number is omitted for the sake of clarity (modified from [102]).

complexity of the discriminant functions that the network implements. In a forward pass through the network, the inputs feeding into each node are multiplied with the corresponding connection weights and subsequently summed; the result (often denoted $net_j$ for node $j$) is then passed through a nonlinear activation function. For an $L$-layered network, this can be represented as:

$$\mathbf{o}^{(l)} = \Gamma\big[\, W^{(l)}\, \mathbf{o}^{(l-1)} \,\big], \qquad l = 1, \ldots, L, \qquad (2.35)$$

where the superscript in parentheses indicates the layer number, and the operator $\Gamma[\cdot]$ represents the activation function $f(\cdot)$, applied to all elements of the vector individually. Furthermore, we have:

$$\mathbf{o}^{(l)} = \begin{bmatrix} o_1^{(l)} & o_2^{(l)} & \cdots & o_{n_l}^{(l)} \end{bmatrix}^T, \qquad \mathbf{o}^{(0)} = \begin{bmatrix} x_1 & x_2 & \cdots & x_{n_0-1} & -1 \end{bmatrix}^T, \qquad (2.36)$$
and weight matrix

$$W^{(l)} = \begin{bmatrix} w_{11}^{(l)} & w_{12}^{(l)} & \cdots & w_{1 n_{l-1}}^{(l)} \\ w_{21}^{(l)} & w_{22}^{(l)} & \cdots & w_{2 n_{l-1}}^{(l)} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n_l 1}^{(l)} & w_{n_l 2}^{(l)} & \cdots & w_{n_l n_{l-1}}^{(l)} \end{bmatrix}, \qquad (2.37)$$

where the varying number of nodes in each layer is indicated by the subscript for $n$. The dummy nodes and the corresponding constant inputs ($-1$) serve as bias values; their use originates in the number of coefficients (weights) that is needed to describe the discriminant hyper-planes [102]. It should be noted that for each hidden layer, the dummy output $o_{n_l}^{(l)}$ should be set to $-1$ in the forward pass (2.35) through the network.
The objective is to determine the weight values in the matrices $W^{(l)}$ such that an input vector $\mathbf{x}$ that belongs to class $\omega_i$ results in a high value of output node $o_i^{(L)}$. An algorithm known as the generalized delta rule is usually used to train the network (i.e., adjust the weight values) by error back-propagation based on a set of samples for each class. This algorithm implements a gradient descent technique, in which weight values are changed in order to minimize the square error between the output vector $\mathbf{o}^{(L)}$ and a target output vector $\mathbf{t}$, representing the known class label of the training input pattern $\mathbf{x}$. During the training phase, the network cycles
through the training set until a stopping criterion is met. This stopping criterion
usually depends on the mean square error between the output and target vectors,
taken over the entire training set. There are different factors that will affect the
convergence of the network, such as when the weight values are changed, in which
order the training patterns are presented to the net, or whether a momentum factor
is used. These, and other practical considerations, are discussed in detail in [102].
Assuming that the weights are adjusted after each training pattern, the output error vector is propagated back through the network, adjusting the weights in the following manner:

$$\Delta W^{(l)} = \eta\, \boldsymbol{\delta}^{(l)} \big(\mathbf{o}^{(l-1)}\big)^T, \qquad l = 1, \ldots, L. \qquad (2.38)$$

In this equation, $\eta$ is a constant called the learning rate, and

$$\boldsymbol{\delta}^{(L)} = \big(\mathbf{t} - \mathbf{o}^{(L)}\big) \odot f'\big(\mathbf{net}^{(L)}\big), \qquad \boldsymbol{\delta}^{(l)} = \Big[\big(W^{(l+1)}\big)^T \boldsymbol{\delta}^{(l+1)}\Big] \odot f'\big(\mathbf{net}^{(l)}\big), \quad l = L-1, \ldots, 1, \qquad (2.39)$$

with $\mathbf{net}^{(l)} = W^{(l)} \mathbf{o}^{(l-1)}$. The operator $\odot$ indicates an element-by-element multiplication between two vectors. Most commonly, a unipolar sigmoidal activation function is used:

$$f(net) = \frac{1}{1 + e^{-net}}. \qquad (2.40)$$

The derivative of the activation function can then be written as:

$$f'(net) = f(net)\,\big(1 - f(net)\big). \qquad (2.41)$$
In the following, the feed-forward ANN architecture trained with the generalized
delta rule will be denoted back-propagation ANN (BP-ANN). As mentioned already, the number of hidden nodes in the BP-ANN is related to the complexity of
discriminant functions that the network implements. In practical applications, this
number is determined in an empirical fashion.
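As an illustration, a two-layer BP-ANN with the update rules of Equations (2.38)-(2.41) can be sketched as follows; the hidden-layer size, learning rate, and one-hot target encoding are illustrative assumptions.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))            # Eq. (2.40)

    def train_bp_ann(X, T, n_hidden=8, eta=0.1, n_epochs=1000, seed=0):
        """Generalized delta rule training; X is (n, d), T holds one-hot targets."""
        rng = np.random.default_rng(seed)
        W1 = rng.normal(0.0, 0.5, (n_hidden, X.shape[1] + 1))
        W2 = rng.normal(0.0, 0.5, (T.shape[1], n_hidden + 1))
        for _ in range(n_epochs):
            for x, t in zip(X, T):                 # adjust after each pattern
                o0 = np.append(x, -1.0)            # input plus dummy bias node
                o1 = np.append(sigmoid(W1 @ o0), -1.0)   # hidden layer
                o2 = sigmoid(W2 @ o1)              # output layer, Eq. (2.35)
                d2 = (t - o2) * o2 * (1.0 - o2)    # output delta, Eqs. (2.39, 2.41)
                d1 = (W2[:, :-1].T @ d2) * o1[:-1] * (1.0 - o1[:-1])
                W2 += eta * np.outer(d2, o1)       # weight update, Eq. (2.38)
                W1 += eta * np.outer(d1, o0)
        return W1, W2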
A modification of the BP-ANN that exhibits a dynamical structure is called the
cascade-correlation ANN [103]. This network is initially trained without any hidden nodes while monitoring the error at the output nodes. If this error is sufficiently
low, the training is halted; otherwise a new node, fully connected to all other nodes
in the network, is added and the procedure is repeated.
Kohonen ANN
Clearly, both the back-propagation and cascade-correlation networks are supervised, i.e., the desired behavior is learned by example. Another type of network,
known as the Kohonen ANN, operates in an unsupervised fashion. This network
is a single-layer feed-forward network, trained with the so-called winner-take-all
learning rule, that essentially performs an unsupervised clustering in the feature
space. As with the BP-ANN, each output node corresponds to an output class. The
first step in the training phase is to normalize the weight vectors:

$$\hat{\mathbf{w}}_i = \frac{\mathbf{w}_i}{\|\mathbf{w}_i\|}, \qquad i = 1, \ldots, c, \qquad (2.42)$$

where $c$ is the number of classes (i.e., the number of nodes in the output layer), and the weight vectors $\mathbf{w}_i$ are the rows of the weight matrix $W$:

$$W = \begin{bmatrix} \mathbf{w}_1^T \\ \mathbf{w}_2^T \\ \vdots \\ \mathbf{w}_c^T \end{bmatrix}. \qquad (2.43)$$

Now the winner-take-all learning rule dictates that only the weight vector is updated that most closely approximates the input vector $\mathbf{x}$. That means that only the weight vector $\hat{\mathbf{w}}_m$ is updated for which

$$\|\mathbf{x} - \hat{\mathbf{w}}_m\| = \min_i\, \|\mathbf{x} - \hat{\mathbf{w}}_i\|. \qquad (2.44)$$

When the neuron $m$ with the highest output is identified, its weight vector $\hat{\mathbf{w}}_m$ is modified (in the direction of the gradient in weight space) using:

$$\Delta \hat{\mathbf{w}}_m = \eta\, (\mathbf{x} - \hat{\mathbf{w}}_m), \qquad (2.45)$$

where $\eta$ is again a learning constant.
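A few lines suffice to sketch this winner-take-all training; the learning rate and epoch count are illustrative.

    import numpy as np

    def kohonen_train(X, c, eta=0.05, n_epochs=50, seed=0):
        """Single-layer winner-take-all training per Eqs. (2.42)-(2.45)."""
        rng = np.random.default_rng(seed)
        W = rng.normal(size=(c, X.shape[1]))
        W /= np.linalg.norm(W, axis=1, keepdims=True)       # Eq. (2.42)
        for _ in range(n_epochs):
            for x in X:
                m = int(np.argmin(np.linalg.norm(x - W, axis=1)))  # Eq. (2.44)
                W[m] += eta * (x - W[m])                    # Eq. (2.45)
                W[m] /= np.linalg.norm(W[m])                # re-normalize
        return W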

Hopfield ANN
Figure 2.18: Hopfield ANN topology (modified from [102]).

Another type of ANN, the Hopfield ANN, has a single-layer feedback architecture, as illustrated in Figure 2.18. This type of network is a dynamical system which moves through a sequence of states to end up in a so-called attractor of the state


space. The system minimizes an energy function and the attractors correspond to
local minima of that function. A Hopfield ANN architecture can therefore be applied to optimization problems. Another frequent use of this type of ANN is that of
an associative or content-addressable memory, in which case the network operates
on binary data. In the area of medical image segmentation, the Hopfield ANN has
primarily been used as an optimization technique, which is also how it is described
here. See for instance Amartur et al. [87], who formulate a cost function which expresses pixel intensity similarity. This cost function is minimized using a Hopfield
ANN, resulting in the unsupervised classification of dual-echo MR images.
In the following, the superscript $k$ denotes the iteration number of the recursive update algorithm of the Hopfield net. In this case, we have [102]:

$$\mathbf{o}^{(k+1)} = \Gamma\big[\, W \mathbf{o}^{(k)} - \mathbf{w} + \mathbf{i} \,\big], \qquad k = 1, 2, \ldots, \qquad (2.46)$$

where $\mathbf{w}$ is the weight vector associated with the constant (bias) input, and $\mathbf{i}$ is a vector of external inputs. The computational energy function which is minimized by the network has the form

$$E = -\tfrac{1}{2}\, \mathbf{o}^T W \mathbf{o} - \mathbf{i}^T \mathbf{o} + \mathbf{w}^T \mathbf{o}. \qquad (2.47)$$

This function should be Liapunov, i.e., it should be positive definite, continuous, and the update algorithm (2.46) should result in non-positive changes of $E$. The application of this type of optimization network now consists of the design of an appropriate energy function, which is to be minimized by the network. The weight (or connectivity) matrix is typically symmetrical, and determined by the Hessian of the energy function $E(\mathbf{o})$ [102]:

$$W = -\nabla^2 E(\mathbf{o}). \qquad (2.48)$$

2.6.5 Contextual classifiers
The classification techniques described so far typically label each pixel or voxel
individually, without taking the spatial relationships between neighbors into account. As a result, the classified images are often noisy. Classification noise can
be reduced by using spatial information in the approach. This can be done either
retrospectively, for instance by applying morphological filtering operations to the
classified image, or by incorporating it in the design of the classifier. This type
of classifier is referred to as a contextual classifier. These are usually formulated
as an optimization procedure, in which a set of constraints must be satisfied or a
cost function must be minimized. Relaxation labeling and stochastic relaxation are
examples of this type of approach.
Relaxation labeling (RL) assigns labels to objects under a set of constraints,
typically describing the interaction between neighboring objects. The term relaxation labeling was first introduced by Rosenfeld, Hummel and Zucker [104].
Later, the theory was further developed by Hummel and Zucker [105], who gave a
formal description of RL and proposed an algorithm to solve the RL problem. The
following summary of RL is based on their theory.
A labeling problem is based on:
1. A set of objects, $a_i$, $i = 1, \ldots, n$;
2. A set of labels for each object $a_i$, $\Lambda_i = \{\lambda_1, \ldots, \lambda_m\}$;
3. Neighbor relations between objects;
4. Labeling constraints for neighboring objects (or compatibility functions), $r_{ij}(\lambda, \lambda')$.
A solution to the labeling problem is a label assignment for all objects that is consistent with the given labeling constraints. This notion of consistency is important in RL, and it is described by the compatibility functions $r_{ij}(\lambda, \lambda')$, capturing the relative support for label $\lambda$ at object $a_i$ given that object $a_j$ has label $\lambda'$. Generally, the magnitude of $r_{ij}(\lambda, \lambda')$ is proportional to the strength of the constraint, whereas the sign indicates locally consistent ($r_{ij} > 0$) or inconsistent ($r_{ij} < 0$) labelings. The compatibility $r_{ij}$ is zero when objects $a_i$ and $a_j$ are not neighbors, or when there is no interaction between labels. Labels are also assigned to objects independently of their neighbors using weights satisfying the following properties:

$$p_i(\lambda) \ge 0, \qquad \sum_{\lambda \in \Lambda_i} p_i(\lambda) = 1. \qquad (2.49)$$

Note that these properties are essentially identical to the fuzzy membership functions used for the FCM classifier (Equation 2.24). Using these fuzzy label assignments, the complete labeling of object $o_i$ can be represented by the $m$-dimensional vector $\bar{p}_i = (p_i(\lambda_1), \ldots, p_i(\lambda_m))^T$. Collecting these assignment vectors for all objects, the complete labeling of all objects can be described by the $nm$-dimensional vector $\bar{p}$ formed by the concatenation $\bar{p} = (\bar{p}_1, \ldots, \bar{p}_n)$. This vector is equivalent to the assignment matrix $U$ described for the FCM classifier.
$K$ is the space of weighted labeling assignments, containing all $\bar{p}$ under the constraints (2.49). The support $s_i(\lambda)$ for label $\lambda$ at object $o_i$ by the assignment $\bar{p}$ is now defined as

$s_i(\lambda) = s_i(\lambda; \bar{p}) = \sum_{j} \sum_{\lambda'} r_{ij}(\lambda, \lambda')\, p_j(\lambda')$   (2.50)

The support function $s_i(\lambda)$ is essentially a weighted sum of the compatibilities of the labels $\lambda'$ at neighboring objects with the label $\lambda$ at object $o_i$. A consistent labeling is defined as a weighted labeling assignment $\bar{p} \in K$ for which

$\sum_{\lambda} p_i(\lambda)\, s_i(\lambda; \bar{p}) \ge \sum_{\lambda} v_i(\lambda)\, s_i(\lambda; \bar{p}), \qquad i = 1, \ldots, n, \quad \text{for all } \bar{v} \in K$   (2.51)

From definitions (2.50) and (2.51), it follows that a labeling $\bar{p} \in K$ is consistent if and only if

$\sum_{i} \sum_{\lambda} s_i(\lambda; \bar{p}) \left[ v_i(\lambda) - p_i(\lambda) \right] \le 0 \qquad \text{for all } \bar{v} \in K$   (2.52)

Relaxation labeling algorithms are designed to convert a given, initial labeling into
a consistent one by solving the variational inequality (2.52). There are different
ways to solve such an inequality. Together with their theory summarized here,
Hummel and Zucker [105] present an algorithm to solve (2.52), a method which is related to a gradient ascent technique. In the same paper, it is also shown that the earlier algorithms proposed by Rosenfeld, Hummel, and Zucker [104] and the probabilistic approach by Peleg [106] are approximations of the proposed RL method.
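As a concrete illustration, the short Python sketch below performs the classic multiplicative update of Rosenfeld et al. [104], which, as just noted, approximates the Hummel and Zucker scheme. The toy compatibility tensor and initial weights are assumptions made purely for this example.

    import numpy as np

    # One relaxation-labeling sweep with the multiplicative update of [104].
    # p: (n, m) label weights per object; r: (n, n, m, m) compatibilities.
    def rl_sweep(p, r):
        # Support s_i(lam) = sum_j sum_lam' r[i,j,lam,lam'] p[j,lam'] (Eq. 2.50).
        s = np.einsum('ijkl,jl->ik', r, p)
        q = np.clip(p * (1.0 + s), 0.0, None)     # reinforce supported labels
        return q / q.sum(axis=1, keepdims=True)   # renormalize to satisfy (2.49)

    # Toy problem: two neighboring objects, two labels; neighbors prefer
    # identical labels (+0.5 for agreement, -0.5 for disagreement).
    n, m = 2, 2
    r = np.zeros((n, n, m, m))
    r[0, 1] = r[1, 0] = 0.5 * (np.eye(m) - (1 - np.eye(m)))
    p = np.array([[0.9, 0.1], [0.4, 0.6]])
    for _ in range(10):
        p = rl_sweep(p, r)
    print(p.round(3))    # both objects converge toward label 0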
Stochastic relaxation [107] is a related approach in which the analogy between
images and physical systems is explored. Stochastic relaxation produces a maximum a posteriori (MAP) estimate of the original scene, given the observed image. The contextual information is described in terms of the conditional probabilities of a Markov random field (MRF) image model, and the equivalence between the MRF and Gibbs distributions is exploited in the optimization process.
In the stochastic framework, a scene $x = (x_1, x_2, \ldots, x_n)$ and the observed image $y = (y_1, y_2, \ldots, y_n)$ are seen as realizations of the random vectors $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$, respectively. $Y_i$ may have several components, i.e., it may have a multidimensional character. Two assumptions are made [108]:
1. Given a scene $x$, the random variables $Y_i$ are independent and have known conditional density functions $f(y_i \mid x_i)$. This implies that the conditional density of $y$, given $x$, is

$f(y \mid x) = \prod_{i} f(y_i \mid x_i)$   (2.53)

The conditional density $f(y_i \mid x_i)$ can be determined from knowledge of the imaging system and/or from training data.
2. The true scene $x^*$, which is to be determined, is a realization of a locally dependent MRF $X$. A locally dependent MRF must satisfy

$P(X = x) > 0$ for all $x$, and

$P(X_i = x_i \mid X_j = x_j,\ j \ne i) = P(X_i = x_i \mid X_j = x_j,\ j \in N_i)$   (2.54)

where $N_i$ is a neighborhood of pixel $i$. Further details of MRFs can, for instance, be found in [107, 109].

Bayes theory dictates that the MAP estimate $\hat{x}$ of $x^*$ from the observations $y$ maximizes

$P(x \mid y) \propto f(y \mid x)\, P(x)$   (2.55)

This is a global maximization, which is computationally extremely expensive. Simulated annealing [107, 110], which models the behavior of a physical system as it cools down, has been used to find good local solutions; another technique used for this purpose is a modification of simulated annealing called mean field annealing [111].



Besag [108] has proposed an algorithm referred to as iterated conditional modes (ICM), which considerably reduces the computational burden associated with simulated annealing. In this algorithm, the prior information used is not only the observations $y$, but also the current estimate $\hat{x}$. In this case, an alternative optimization can be formulated, in which

$P(x_i \mid y, \hat{x}_j,\ j \ne i) \propto f(y_i \mid x_i)\, P(x_i \mid \hat{x}_j,\ j \in N_i)$   (2.56)

is maximized. This maximization, when applied to each pixel in turn, constitutes
one iteration of the ICM algorithm. If the focus is to maximize the expected proportion of correctly classified pixels rather than to determine the MAP estimate of the
scene, the maximum marginal posterior (MMP) method can be used. Detailed descriptions of, and comparisons between, the simulated annealing, ICM, and MMP
algorithms can be found in Dubes et al. [112].
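The following Python sketch illustrates one possible ICM implementation for the common special case of Gaussian class likelihoods and a Potts-style MRF prior on a 4-neighborhood. The class means, noise level, and smoothing weight beta are illustrative assumptions of this example, not parameters prescribed by Besag [108].

    import numpy as np

    # Minimal ICM sketch: at each pixel, pick the label maximizing the product
    # of the Gaussian likelihood and a Potts prior over the 4-neighborhood
    # (here expressed as a cost to be minimized, cf. Eq. 2.56).
    def icm(y, means, sigma, beta, n_iter=5):
        labels = np.abs(y[..., None] - means).argmin(-1)   # initial estimate
        H, W = y.shape
        for _ in range(n_iter):
            for r in range(H):
                for c in range(W):
                    best, best_cost = labels[r, c], np.inf
                    for k in range(len(means)):
                        cost = (y[r, c] - means[k]) ** 2 / (2 * sigma ** 2)
                        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                            rr, cc = r + dr, c + dc
                            if 0 <= rr < H and 0 <= cc < W and labels[rr, cc] != k:
                                cost += beta        # penalize label disagreement
                        if cost < best_cost:
                            best, best_cost = k, cost
                    labels[r, c] = best
        return labels

    # Noisy two-class image: dark background with a bright square.
    rng = np.random.default_rng(0)
    truth = np.zeros((32, 32)); truth[8:24, 8:24] = 1
    y = truth * 100 + rng.normal(0, 30, truth.shape)
    seg = icm(y, means=np.array([0.0, 100.0]), sigma=30.0, beta=2.0)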
Contextual classifiers in general, and the ICM algorithm in particular, have been
used in a number of medical image segmentation applications; see for instance Choi
et al. [113], Kato et al. [114], Rajapakse et al. [115], Wu et al. [116], and Yan and
Karp [117]. The latter paper describes an approach that combines ICM for the
labeling of tissue types with a cubic B-spline approximation of the INU field.
2.7 Discussion and Conclusion

As was said in the introduction to this chapter, image segmentation is a broad
and active field of research not only in the medical imaging community but also
in a variety of other fields such as computer vision or satellite imagery. The goal
of this chapter is to provide the reader with a sense of the various methods and
approaches that have been proposed over the years to tackle a broad spectrum of
problems. The choice of a particular method will be dictated not only by the characteristics of the problem to be solved but also by the general problem strategy
being developed. Indeed, in most real world applications, image segmentation and
classification algorithms are only a component of a larger system. Chapter 5 describes how features and characteristics can be computed from the image primitives
produced by segmentation algorithms. These image primitives can then be used in
conjunction with a priori information about the type and characteristics of objects
to be detected in the image. Techniques and systems developed for this purpose
are described in Chapter 7. Cooperation between these subsystems is important
if one wishes to develop robust interpretation systems. As a rule, the more relevant a priori information is used in the segmentation process, the better the system. This a priori information may, for instance, involve knowledge of the approximate gray-level value of a tissue class, in which case it can be used to initialize a threshold-based method; it can involve knowledge about the spatial location of tissues and organs, which would permit the automatic selection of training points for a classifier; or it can involve knowledge about the shape of an object, which can be used in the design of spatial transformations and constraints for boundary de-



tection algorithms. This information can be used to improve the specificity and the
sensitivity of the low-level segmentation algorithms and facilitate the task of the
interpretation layer.
2.8 Acknowledgements

Parts of this chapter have been adapted from an article published in Critical
Reviews in Biomedical Engineering 22(5/6):401-465 (1994). The authors thank
Begell House for their permission. The authors also thank Milan Sonka for his
bibliography file, which is an invaluable resource.
2.9 References

[1] K. S. Fu and J. K. Mui, "A survey on image segmentation," Pattern Recognition, vol. 13, pp. 3-16, 1981.
[2] P. A. Bottomley and E. R. Andrew, "RF magnetic field penetration, phase shift and power dissipation in biological tissue: implications for NMR imaging," Physics in Medicine and Biology, vol. 23, pp. 630-643, July 1978.
[3] E. R. McVeigh, M. J. Bronskill, and R. M. Henkelman, "Phase and sensitivity of receiver coils in magnetic resonance imaging," Medical Physics, vol. 13, pp. 806-814, Nov./Dec. 1986.
[4] A. Simmons, P. S. Tofts, G. J. Barker, and S. R. Arridge, "Sources of intensity nonuniformity in spin echo images," Magnetic Resonance in Medicine, vol. 32, pp. 121-128, 1994.
[5] J. G. Sled and G. B. Pike, "Understanding intensity non-uniformity in MRI," in 1st International Conference on Medical Image Computing and Computer-Assisted Intervention (W. M. Wells, A. Colchester, and S. Delp, eds.), no. 1496 in Lecture Notes in Computer Science, pp. 614-622, Springer, 1998.
[6] J. G. Sled and G. B. Pike, "Standing-wave and RF penetration artifacts caused by elliptic geometry: an electrodynamic analysis of MRI," IEEE Transactions on Medical Imaging, vol. 17, pp. 653-662, Aug. 1998.
[7] P. A. Narayana, W. W. Brey, M. V. Kulkarni, and C. L. Sievenpiper, "Compensation for surface coil sensitivity variation in magnetic resonance imaging," Magnetic Resonance Imaging, vol. 6, no. 3, pp. 271-274, 1988.
[8] R. Stollberger and P. Wach, "Imaging of the active B1 field in vivo," Magnetic Resonance in Medicine, vol. 35, pp. 246-251, 1996.
[9] K. R. Thulborn, F. E. Boada, J. D. Christensen, F. R. Haung-Hellinger, T. G. Reese, and J. M. Kosewski, "B1 correction maps and apparent water density maps as tools for quantitative functional MRI," in Proc. Society of Magnetic Resonance in Medicine, vol. 1, p. 347, 1993.
[10] L. Axel, J. Costantini, and J. Listerud, "Intensity correction in surface-coil MR imaging," American Journal of Roentgenology, vol. 148, pp. 418-420, Feb. 1987.
[11] M. Tincher, C. R. Meyer, R. Gupta, and D. M. Williams, "Polynomial modeling and reduction of RF body coil spatial inhomogeneity in MRI," IEEE Transactions on Medical Imaging, vol. 12, pp. 361-365, June 1993.
[12] D. A. G. Wicks, G. J. Barker, and P. S. Tofts, "Correction of intensity nonuniformity in MR images of any orientation," Magnetic Resonance Imaging, vol. 11, no. 2, pp. 183-196, 1993.
[13] C. Brechbühler, G. Gerig, and G. Székely, "Compensation of spatial inhomogeneity in MRI based on a parametric bias estimate," in Proceedings of the Fourth International Conference on Visualization in Biomedical Computing (VBC) (K.-H. Höhne and R. Kikinis, eds.), (Hamburg, Germany), pp. 141-146, Springer, 1996.
[14] W. W. Brey and P. A. Narayana, "Correction for intensity falloff in surface coil magnetic resonance imaging," Medical Physics, vol. 15, pp. 241-245, Mar./Apr. 1988.
[15] B. M. Dawant, A. P. Zijdenbos, and R. A. Margolin, "Correction of intensity variations in MR images for computer-aided tissue classification," IEEE Transactions on Medical Imaging, vol. 12, pp. 770-781, Dec. 1993.
[16] J. Haselgrove and M. Prammer, "An algorithm for compensation of surface-coil images for sensitivity of the surface coil," Magnetic Resonance Imaging, vol. 4, no. 6, pp. 469-472, 1986.
[17] K. O. Lim and A. Pfefferbaum, "Segmentation of MR brain images into cerebrospinal fluid spaces, white and gray matter," Journal of Computer Assisted Tomography, vol. 13, pp. 588-593, July/Aug. 1989.
[18] C. R. Meyer, P. H. Bland, and J. Pipe, "Retrospective correction of intensity inhomogeneities in MRI," IEEE Transactions on Medical Imaging, vol. 14, pp. 36-41, Mar. 1995.
[19] P. A. Narayana and A. Borthakur, "Effect of radio frequency inhomogeneity correction on the reproducibility of intra-cranial volumes using MR image data," Magnetic Resonance in Medicine, vol. 33, pp. 396-400, Mar. 1995.
[20] J. G. Sled, A. P. Zijdenbos, and A. C. Evans, "A nonparametric method for automatic correction of intensity nonuniformity in MRI data," IEEE Transactions on Medical Imaging, vol. 17, Feb. 1998.
[21] W. M. Wells III, W. E. L. Grimson, R. Kikinis, and F. A. Jolesz, "Adaptive segmentation of MRI data," IEEE Transactions on Medical Imaging, vol. 15, pp. 429-442, Aug. 1996.
[22] A. P. Zijdenbos, B. M. Dawant, and R. A. Margolin, "Inter- and intra-slice intensity correction in MRI," in Information Processing in Medical Imaging (IPMI) (Y. Bizais, C. Barillot, and R. D. Paola, eds.), (France), pp. 349-350, Kluwer, June 1995.
[23] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Reading, MA: Addison-Wesley, 1993.
[24] S.-P. Liou, A. H. Chiu, and R. C. Jain, "A parallel technique for signal-level perceptual organization," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 317-325, Apr. 1991.
[25] S.-P. Liou and R. C. Jain, "An approach to three-dimensional image segmentation," Computer Vision, Graphics, and Image Processing, vol. 53, no. 3, pp. 237-252, 1991.


[26] C. R. Reeves, Modern Heuristic Techniques for Combinatorial Problems. Oxford: Blackwell Scientific Press, 1993.
[27] R. Guillemaud and M. Brady, "Estimating the bias field of MR images," IEEE Transactions on Medical Imaging, vol. 16, pp. 238-251, June 1997.
[28] K. Van Leemput, F. Maes, D. Vandermeulen, and P. Suetens, "Automated model-based bias field correction of MR images of the brain," IEEE Transactions on Medical Imaging, 1999.
[29] D. L. Pham and J. L. Prince, "An adaptive fuzzy segmentation algorithm for three-dimensional magnetic resonance images," in Information Processing in Medical Imaging (IPMI) (A. Kuba, M. Šámal, and A. Todd-Pokropek, eds.), pp. 140-153, Springer, June/July 1999.
[30] G. B. Aboutanos, J. Nikanne, N. Watkins, and B. M. Dawant, "Model creation and deformation for the automatic segmentation of the brain in MR images," IEEE Transactions on Biomedical Engineering, vol. 46, no. 11, pp. 1346-1356, 1999.
[31] R. N. Nagel and A. Rosenfeld, "Steps toward handwritten signature verification," in Proceedings of the First International Joint Conference on Pattern Recognition, pp. 55-66, 1979.
[32] F. M. Wahl, Digital Image Signal Processing. Artech, 1987.
[33] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62-66, 1979.
[34] S. S. Reddi, S. F. Rudin, and H. R. Keshavan, "An optimal multiple threshold scheme for image segmentation," IEEE Transactions on Systems, Man, and Cybernetics, vol. 14, pp. 661-665, 1984.
[35] C. A. Glasbey, "An analysis of histogram-based thresholding algorithms," CVGIP: Graphical Models and Image Processing, vol. 55, pp. 532-537, 1993.
[36] P. K. Sahoo, S. Soltani, A. K. C. Wong, and Y. C. Chen, "Survey of thresholding techniques," Computer Vision, Graphics, and Image Processing, vol. 41, no. 2, pp. 233-260, 1988.
[37] D. M. Titterington, A. F. M. Smith, and U. E. Makov, Statistical Analysis of Finite Mixture Distributions. New York: Wiley, 1985.
[38] D. W. Marquardt, "An algorithm for least squares estimation of non-linear parameters," Journal of the Society for Industrial and Applied Mathematics, vol. 11, pp. 431-441, 1963.
[39] P. Santago and H. D. Gage, "Quantification of MR brain images by mixture density and partial volume modeling," IEEE Transactions on Medical Imaging, vol. 12, pp. 566-574, 1993.
[40] A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, Series B, vol. 39, no. 1, pp. 1-38, 1977.
[41] A. Martelli, "Edge detection using heuristic search methods," Computer Graphics and Image Processing, vol. 1, pp. 169-182, 1972.
[42] N. J. Nilsson, Principles of Artificial Intelligence. Berlin: Springer Verlag, 1982.
[43] P. H. Winston, Artificial Intelligence. Reading, MA: Addison-Wesley, 3rd ed., 1992.
[44] P. Hart, N. Nilsson, and B. Raphael, "A formal basis for the heuristic determination of minimum-cost paths," IEEE Transactions on Systems Science and Cybernetics, vol. SSC-4, pp. 100-107, 1968.
[45] R. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
[46] J. J. Gerbrands, "Segmentation of noisy images," Ph.D. thesis, ETN-89-95461, Delft University of Technology, Delft, The Netherlands, 1988.
[47] M. Sonka, M. D. Winniford, and S. M. Collins, "Robust simultaneous detection of coronary borders in complex images," IEEE Transactions on Medical Imaging, vol. 14, no. 1, pp. 151-161, 1995.
[48] D. Geiger, A. Gupta, L. A. Costa, and J. Vlontzos, "Dynamic programming for detecting, tracking, and matching deformable contours," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 3, pp. 294-302, 1995.
[49] A. Falcao, J. Udupa, S. Samarasekera, S. Sharma, B. Hirsch, and R. de A. Lotufo, "User-steered image segmentation paradigms: live wire and live lane," Graphical Models and Image Processing, vol. 60, no. 4, pp. 233-260, 1998.
[50] E. Mortensen, B. Morse, W. Barrett, and J. Udupa, "Adaptive boundary detection using live-wire two-dimensional dynamic programming," in Computers in Cardiology, (Los Alamitos, CA), pp. 635-638, IEEE Computer Society Press, 1992.
[51] W. A. Barrett and E. N. Mortensen, "Interactive live-wire boundary detection," Medical Image Analysis, vol. 1, no. 4, pp. 331-341, 1996.
[52] D. R. Thedens, D. J. Skorton, and S. R. Fleagle, "Methods of graph searching for border detection in image sequences with application to cardiac magnetic resonance imaging," IEEE Transactions on Medical Imaging, vol. 14, pp. 42-55, 1995.
[53] R. J. Frank, "Optimal surface detection using multi-dimensional graph search: Applications to intravascular ultrasound," Master's thesis, University of Iowa, 1996.
[54] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision. New York: PWS Publishing, 1998.
[55] I. Pitas, Digital Image Processing Algorithms. Hemel Hempstead, UK: Prentice Hall, 1993.
[56] P. V. C. Hough, A Method and Means for Recognizing Complex Patterns. US Patent 3,069,654, 1962.
[57] D. H. Ballard and C. M. Brown, Computer Vision. Englewood Cliffs, NJ: Prentice-Hall, 1982.
[58] D. H. Ballard, "Generalizing the Hough transform to detect arbitrary shapes," Pattern Recognition, vol. 13, pp. 111-122, 1981.
[59] M. E. Brummer, "Hough transform detection of the longitudinal fissure in tomographic head images," IEEE Transactions on Medical Imaging, vol. 10, no. 1, pp. 74-81, 1991.


[60] F. Zana and J. Klein, "A multimodal registration algorithm of eye fundus images using vessels detection and Hough transform," IEEE Transactions on Medical Imaging, vol. 18, no. 5, pp. 419-428, 1999.
[61] S. Malassiotis and M. Strintzis, "Tracking the left ventricle in echocardiographic images by learning heart dynamics," IEEE Transactions on Medical Imaging, vol. 18, no. 3, pp. 282-290, 1999.
[62] N. Karssemeijer, "Automated classification of parenchymal patterns in mammograms," Physics in Medicine and Biology, vol. 43, no. 2, pp. 365-378, 1998.
[63] D. Palti-Wasserman, A. Brukstein, and R. Beyar, "Identifying and tracking a guide wire in the coronary arteries during angioplasty from X-ray images," IEEE Transactions on Biomedical Engineering, vol. 44, no. 2, pp. 152-164, 1997.
[64] H. Kälviäinen, P. Hirvonen, L. Xu, and E. Oja, "Probabilistic and non-probabilistic Hough transforms: Overview and comparisons," Image and Vision Computing, vol. 13, pp. 239-252, 1995.
[65] A. Kassim, T. Tan, and K. Tan, "A comparative study of efficient generalized Hough transform techniques," Image and Vision Computing, vol. 17, pp. 737-748, 1999.
[66] R. Adams and L. Bischof, "Seeded region growing," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 641-647, 1994.
[67] R. K. Justice, E. M. Stokeley, J. S. Strobel, R. E. Ideker, and W. M. Smith, "Medical image segmentation using 3-D seeded region growing," in Proceedings of SPIE: Image Processing, vol. 3034, (Newport Beach), pp. 900-910, Feb. 1997.
[68] H. Samet, The Design and Analysis of Spatial Data Structures. New York: Addison-Wesley, 1990.
[69] S. L. Horowitz and T. Pavlidis, "Picture segmentation by a directed split-and-merge procedure," in Proceedings of the 2nd International Joint Conference on Pattern Recognition, (Copenhagen, Denmark), pp. 424-433, 1974.
[70] R. Lumia, "A new three-dimensional connected components algorithm," Computer Vision, Graphics, and Image Processing, vol. 23, pp. 207-217, Aug. 1983.
[71] D. Nassimi and S. Sahni, "Finding connected components and connected ones on a mesh connected parallel computer," SIAM Journal of Computation, vol. 9, no. 4, pp. 744-757, 1980.
[72] M. Manohar and H. Ramapriyan, "Connected component labeling of binary images on a mesh connected massively parallel processor," Computer Vision, Graphics, and Image Processing, vol. 45, pp. 133-149, Feb. 1989.
[73] A. Moga and M. Gabbouj, "Parallel image component labeling with watershed transformation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 441-450, May 1997.
[74] J. C. Bezdek, "Some non-standard clustering algorithms," in Developments in Numerical Ecology (P. Legendre and L. Legendre, eds.), pp. 225-287, Berlin, Germany: Springer-Verlag, 1987.
[75] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum Press, 1981.
[76] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis. New York: John Wiley and Sons, 1973.
[77] K. Fukunaga, Introduction to Statistical Pattern Recognition. New York: Academic Press, 1972.
[78] A. K. Jain and R. C. Dubes, Algorithms for Clustering Data. Englewood Cliffs, New Jersey: Prentice Hall, 1988.
[79] M. James, Classification Algorithms. New York: John Wiley, 1985.
[80] I. L. Thomas, V. M. Benning, and N. P. Ching, Classification of Remotely Sensed Images. Bristol: Adam Hilger, 1987.
[81] T. Y. Young and K.-S. Fu, eds., Handbook of Pattern Recognition and Image Processing. Academic Press, 1986.
[82] J. A. Richards, Remote Sensing Digital Image Analysis. Berlin: Springer-Verlag, 1986.
[83] A. R. Mirzai, ed., Artificial Intelligence: Concepts and Applications in Engineering. Cambridge, MA: MIT Press, 1990.
[84] J. R. Quinlan, "Induction of decision trees," Machine Learning, vol. 1, pp. 81-106, 1986.
[85] M. Kamber, R. Shinghal, D. L. Collins, G. S. Francis, and A. C. Evans, "Model-based 3-D segmentation of multiple sclerosis lesions in magnetic resonance brain images," IEEE Transactions on Medical Imaging, vol. 14, pp. 442-453, Sept. 1995.
[86] S. Aleynikov and E. Micheli-Tzanakou, "Classification of retinal damage by a neural network based system," Journal of Medical Systems, vol. 22, pp. 129-136, June 1998.
[87] S. C. Amartur, D. Piraino, and Y. Takefuji, "Optimization neural networks for the segmentation of magnetic resonance images," IEEE Transactions on Medical Imaging, vol. 11, pp. 215-220, June 1992.
[88] M. Binder, H. Kittler, A. Seeber, A. Steiner, H. Pehamberger, and K. Wolff, "Epiluminescence microscopy-based classification of pigmented skin lesions using computerized image analysis and an artificial neural network," Melanoma Research, vol. 8, pp. 261-266, June 1998.
[89] S. Cagnoni, G. Coppini, M. Rucci, D. Caramella, and G. Valli, "Neural network segmentation of magnetic resonance spin echo images of the brain," Journal of Biomedical Engineering, vol. 15, pp. 355-362, Sept. 1993.
[90] M. S. Gebbinck, J. T. Verhoeven, J. M. Thijssen, and T. E. Schouten, "Application of neural networks for the classification of diffuse liver disease by quantitative echography," Ultrasonic Imaging, vol. 15, pp. 205-217, July 1993.
[91] L. O. Hall, A. M. Bensaid, L. P. Clarke, R. P. Velthuizen, M. S. Silbiger, and J. C. Bezdek, "A comparison of neural network and fuzzy clustering techniques in segmenting magnetic resonance images of the brain," IEEE Transactions on Neural Networks, vol. 3, pp. 672-682, Sept. 1992.


[92] J. S. Lin, K. S. Cheng, and C. W. Mao, "Segmentation of multispectral magnetic resonance image using penalized fuzzy competitive learning network," Computers & Biomedical Research, vol. 29, pp. 314-326, Aug. 1996.
[93] M. Özkan, B. M. Dawant, and R. J. Maciunas, "Neural-network-based segmentation of multi-modal medical images: A comparative and prospective study," IEEE Transactions on Medical Imaging, vol. 12, pp. 534-544, Sept. 1993.
[94] D. Pantazopoulos, P. Karakitsos, A. Iokim-Liossi, A. Pouliakis, E. Botsoli-Stergiou, and C. Dimopoulos, "Back propagation neural network in the discrimination of benign from malignant lower urinary tract lesions," Journal of Urology, vol. 159, pp. 1619-1623, May 1998.
[95] W. E. Polakowski, D. A. Cournoyer, S. K. Rogers, M. P. DeSimio, D. W. Ruck, J. W. Hoffmeister, and R. A. Raines, "Computer-aided breast cancer detection and diagnosis of masses using difference of Gaussians and derivative-based feature saliency," IEEE Transactions on Medical Imaging, vol. 16, pp. 811-819, Dec. 1997.
[96] W. E. Reddick, J. O. Glass, E. N. Cook, T. D. Elkin, and R. J. Deaton, "Automated segmentation and classification of multispectral magnetic resonance images of brain using artificial neural networks," IEEE Transactions on Medical Imaging, vol. 16, pp. 911-918, Dec. 1997.
[97] H. Sujana, S. Swarnamani, and S. Suresh, "Application of artificial neural networks for the classification of liver lesions by image texture parameters," Ultrasound in Medicine & Biology, vol. 22, no. 9, pp. 1177-1181, 1996.
[98] G. D. Tourassi and C. E. Floyd Jr., "Lesion size quantification in SPECT using an artificial neural network classification approach," Computers & Biomedical Research, vol. 28, pp. 257-270, June 1995.
[99] O. Tsujii, M. T. Freedman, and S. K. Mun, "Automated segmentation of anatomic regions in chest radiographs using an adaptive-sized hybrid neural network," Medical Physics, vol. 25, pp. 998-1007, June 1998.
[100] A. J. Worth, S. Lehar, and D. N. Kennedy, "A recurrent cooperative/competitive field for segmentation of magnetic resonance brain images," IEEE Transactions on Knowledge and Data Engineering, vol. 4, pp. 156-161, Apr. 1992.
[101] A. P. Zijdenbos, B. M. Dawant, R. A. Margolin, and A. C. Palmer, "Morphometric analysis of white matter lesions in MR images: Method and validation," IEEE Transactions on Medical Imaging, vol. 13, pp. 716-724, Dec. 1994.
[102] J. M. Zurada, Introduction to Artificial Neural Systems. St. Paul, MN: West Publishing Company, 1992.
[103] S. E. Fahlman and C. Lebiere, "The cascade-correlation learning architecture," tech. rep., School of Computer Science, Carnegie Mellon University, Feb. 1990.
[104] A. Rosenfeld, R. A. Hummel, and S. W. Zucker, "Scene labeling by relaxation operations," IEEE Transactions on Systems, Man, and Cybernetics, vol. 6, pp. 420-433, June 1976.
[105] R. A. Hummel and S. W. Zucker, "On the foundations of relaxation labeling processes," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 5, no. 3, pp. 259-288, 1983.
[106] S. Peleg, "A new probabilistic relaxation scheme," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 2, pp. 362-369, July 1980.
[107] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 6, no. 6, pp. 721-741, 1984.
[108] J. Besag, "On the statistical analysis of dirty pictures," Journal of the Royal Statistical Society, Series B, vol. 48, no. 3, pp. 259-302, 1986.
[109] J. Besag, "Spatial interaction and the statistical analysis of lattice systems," Journal of the Royal Statistical Society, Series B, vol. 36, no. 2, pp. 192-236, 1974.
[110] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671-680, 1983.
[111] G. Bilbro, R. Mann, T. K. Miller, W. E. Snyder, D. E. Van den Bout, and M. White, "Optimization by mean field annealing," in Advances in Neural Information Processing Systems (D. S. Touretzky, ed.), vol. I, (San Mateo), Morgan-Kaufmann, 1989.
[112] R. C. Dubes, A. K. Jain, S. G. Nadabar, and C. C. Chen, "MRF model-based algorithms for image segmentation," in Proceedings of the 10th International Conference on Pattern Recognition, vol. 1, pp. 808-814, 1990.
[113] H. S. Choi, D. R. Haynor, and Y. Kim, "Partial volume tissue classification of multichannel magnetic resonance images - a mixel model," IEEE Transactions on Medical Imaging, vol. 10, pp. 395-407, Sept. 1991.
[114] Z. Kato, J. Zerubia, and M. Berthod, "Unsupervised parallel image classification using Markovian models," Pattern Recognition, vol. 32, pp. 591-604, Apr. 1999.
[115] J. C. Rajapakse, J. N. Giedd, and J. L. Rapoport, "Statistical approach to segmentation of single-channel cerebral MR images," IEEE Transactions on Medical Imaging, vol. 16, pp. 176-186, Apr. 1997.
[116] Z. Wu, H. W. Chung, and F. W. Wehrli, "A Bayesian approach to subvoxel tissue classification in NMR microscopic images of trabecular bone," Magnetic Resonance in Medicine, vol. 31, pp. 302-308, Mar. 1994.
[117] M. X. H. Yan and J. S. Karp, "An adaptive Bayesian approach to three-dimensional MR brain segmentation," in Information Processing in Medical Imaging (IPMI) (Y. Bizais, C. Barillot, and R. D. Paola, eds.), pp. 201-213, Kluwer, June 1995.

CHAPTER 3
Image Segmentation Using Deformable Models
Chenyang Xu
The Johns Hopkins University
Dzung L. Pham
National Institute on Aging
Jerry L. Prince
The Johns Hopkins University

Contents

3.1 Introduction 131
3.2 Parametric deformable models 133
  3.2.1 Energy minimizing formulation 134
  3.2.2 Dynamic force formulation 136
  3.2.3 External forces 138
  3.2.4 Numerical implementation 144
  3.2.5 Discussion 145
3.3 Geometric deformable models 146
  3.3.1 Curve evolution theory 146
  3.3.2 Level set method 147
  3.3.3 Speed functions 150
  3.3.4 Relationship to parametric deformable models 152
  3.3.5 Numerical implementation 153
  3.3.6 Discussion 154
3.4 Extensions of deformable models 154
  3.4.1 Deformable Fourier models 155
  3.4.2 Deformable models using modal analysis 157
  3.4.3 Deformable superquadrics 159
  3.4.4 Active shape models 161
  3.4.5 Other models 167
3.5 Conclusion and future directions 167
3.6 Further reading 168
3.7 Acknowledgments 168
3.8 References 168

3.1 Introduction

In the past four decades, computerized image segmentation has played an increasingly important role in medical imaging. Segmented images are now used
routinely in a multitude of different applications, such as the quantification of tissue
volumes [1], diagnosis [2], localization of pathology [3], study of anatomical structure [4, 5], treatment planning [6], partial volume correction of functional imaging
data [7], and computer-integrated surgery [8, 9]. Image segmentation remains a
difficult task, however, due to both the tremendous variability of object shapes and
the variation in image quality (see Fig. 3.1). In particular, medical images are often
corrupted by noise and sampling artifacts, which can cause considerable difficulties when applying classical segmentation techniques such as edge detection and
thresholding. As a result, these techniques either fail completely or require some
kind of postprocessing step to remove invalid object boundaries in the segmentation
results.
To address these difficulties, deformable models have been extensively studied and widely used in medical image segmentation, with promising results. Deformable models are curves or surfaces defined within an image domain that can
move under the influence of internal forces, which are defined within the curve or
surface itself, and external forces, which are computed from the image data. The
internal forces are designed to keep the model smooth during deformation. The external forces are defined to move the model toward an object boundary or other desired features within an image. By constraining extracted boundaries to be smooth
and incorporating other prior information about the object shape, deformable models offer robustness to both image noise and boundary gaps and allow integrating
boundary elements into a coherent and consistent mathematical description. Such
a boundary description can then be readily used by subsequent applications. Moreover, since deformable models are implemented on the continuum, the resulting
boundary representation can achieve subpixel accuracy, a highly desirable property for medical imaging applications. Figure 3.2 shows two examples of using
deformable models to extract object boundaries from medical images. The result is
a parametric curve in Fig. 3.2(a) and a parametric surface in Fig. 3.2(b).
Although the term "deformable models" first appeared in the work by Terzopoulos and his collaborators in the late eighties [12-15], the idea of deforming a template for extracting image features dates back much farther, to the work of Fischler and Elschlager's spring-loaded templates [16] and Widrow's rubber mask
technique [17]. Similar ideas have also been used in the work by Blake and Zisserman [18], Grenander et al. [19], and Miller et al. [20]. The popularity of deformable models is largely due to the seminal paper "Snakes: Active Contour Models" by
Kass, Witkin, and Terzopoulos [13]. Since its publication, deformable models have
grown to be one of the most active and successful research areas in image segmentation. Various names, such as snakes, active contours or surfaces, balloons,
and deformable contours or surfaces, have been used in the literature to refer to


Figure 3.1: Variability of object shapes and image quality. (a) A 2D MR image of the heart left ventricle and (b) a 3D MR image of the brain.

Figure 3.2: Examples of using deformable models to extract object boundaries from medical images. (a) An example of using a deformable contour to extract the inner wall of the left ventricle of a human heart from a 2D MR image. The circular initial deformable contour is plotted in gray and the final converged result is plotted in white [10]. (b) An example of using a deformable surface to reconstruct the brain cortical surface from a 3D MR image [11].

deformable models.
There are basically two types of deformable models: parametric deformable
models (cf. [13, 21-23]) and geometric deformable models (cf. [24-27]). Paramet-



ric deformable models represent curves and surfaces explicitly in their parametric forms during deformation. This representation allows direct interaction with
the model and can lead to a compact representation for fast real-time implementation. Adaptation of the model topology, however, such as splitting or merging
parts during the deformation, can be difficult using parametric models. Geometric deformable models, on the other hand, can handle topological changes naturally. These models, based on the theory of curve evolution [28-31] and the level
set method [32, 33], represent curves and surfaces implicitly as a level set of a
higher-dimensional scalar function. Their parameterizations are computed only
after complete deformation, thereby allowing topological adaptivity to be easily
accommodated. Despite this fundamental difference, the underlying principles of
both methods are very similar.
This chapter is organized as follows. We first introduce parametric deformable
models in Section 3.2, and then describe geometric deformable models in Section 3.3. An explicit mathematical relationship between parametric deformable
models and geometric deformable models is presented in Section 3.3.4. In Section 3.4, we provide an overview of several extensions to these deformable models.
Finally, in Section 3.5, we conclude the chapter and point out future research directions. We focus on describing the fundamentals of deformable models and their
application to image segmentation. Treatment of related work using deformable
models in other applications such as image registration and motion estimation is
beyond the scope of this chapter. We refer readers interested in these other applications to Section 3.6, where suggestions for further reading are given.
We note that although this chapter primarily deals with 2D deformable models
(i.e., deformable contours), the principles discussed here apply to 3D deformable
models (i.e., deformable surfaces) as well (cf. [23, 34]).
3.2 Parametric deformable models

In this section, we first describe two different types of formulations for parametric deformable models: an energy minimizing formulation and a dynamic force
formulation. Although these two formulations lead to similar results, the first formulation has the advantage that its solution satisfies a minimum principle whereas
the second formulation has the flexibility of allowing the use of more general types
of external forces. We then present several commonly used external forces that can
effectively attract deformable models toward the desired image features. A numerical implementation of 2D deformable models or deformable contours is described
at the end of this section. Since the implementation of 3D deformable models or
deformable surfaces is more sophisticated than those of deformable contours, we
provide several references in Section 3.2.4 for additional reading rather than presenting an actual implementation.


Figure 3.3: A potential energy function derived from Fig. 3.1(a).

3.2.1 Energy minimizing formulation

The basic premise of the energy minimizing formulation of deformable contours is to find a parameterized curve that minimizes the weighted sum of internal energy and potential energy. The internal energy specifies the tension or the
smoothness of the contour. The potential energy is defined over the image domain
and typically possesses local minima at the image intensity edges occurring at object boundaries (see Fig. 3.3). Minimizing the total energy yields internal forces
and potential forces. Internal forces hold the curve together (elasticity forces)
and keep it from bending too much (bending forces). External forces attract the
curve toward the desired object boundaries. To find the object boundary, parametric curves are initialized within the image domain, and are forced to move toward
the potential energy minima under the influence of both these forces.
Mathematically, a deformable contour is a curve $X(s) = (X(s), Y(s))$, $s \in [0, 1]$, which moves through the spatial domain of an image to minimize the following energy functional:

$E(X) = S(X) + P(X)$   (3.1)
The first term is the internal energy functional and is defined to be

$S(X) = \int_0^1 \frac{1}{2} \left( \alpha(s) \left| \frac{\partial X}{\partial s} \right|^2 + \beta(s) \left| \frac{\partial^2 X}{\partial s^2} \right|^2 \right) ds$   (3.2)

The first-order derivative discourages stretching and makes the model behave like
an elastic string. The second-order derivative discourages bending and makes the



model behave like a rigid rod. The weighting parameters $\alpha(s)$ and $\beta(s)$ can be used to control the strength of the model's tension and rigidity, respectively. In practice, $\alpha(s)$ and $\beta(s)$ are often chosen to be constants.
The second term is the potential energy functional and is computed by integrating a potential energy function $P(x, y)$ along the contour $X(s)$:

$P(X) = \int_0^1 P(X(s))\, ds$   (3.3)

The potential energy function $P(x, y)$ is derived from the image data and takes smaller values at object boundaries as well as other features of interest. Given a gray-level image $I(x, y)$ viewed as a function of continuous position variables $(x, y)$, a typical potential energy function designed to lead a deformable contour toward step edges is

$P(x, y) = -w_e \left| \nabla \left[ G_\sigma(x, y) * I(x, y) \right] \right|^2$   (3.4)

where $w_e$ is a positive weighting parameter, $G_\sigma(x, y)$ is a two-dimensional Gaussian function with standard deviation $\sigma$, $\nabla$ is the gradient operator, and $*$ is the
2D image convolution operator. If the desired image features are lines, then the
appropriate potential energy function can be defined as follows:

$P(x, y) = w_l \left[ G_\sigma(x, y) * I(x, y) \right]$   (3.5)

where $w_l$ is a weighting parameter. A positive $w_l$ is used to find black lines on a white background, while a negative $w_l$ is used to find white lines on a black background. For both edge and line potential energies, increasing $\sigma$ can broaden the attraction range. However, a larger $\sigma$ can also cause a shift in the boundary location, resulting in a less accurate result (this problem can be addressed by using potential energies calculated with different values of $\sigma$; see Section 3.2.3).
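As an illustration, the Python sketch below computes discrete versions of the edge and line potentials of Eqs. (3.4) and (3.5) using SciPy's Gaussian filter. The weights, the scale, and the function names are assumptions of this example, not quantities fixed by the chapter.

    import numpy as np
    from scipy import ndimage

    # Sketch of the edge and line potentials of Eqs. (3.4)-(3.5) on a discrete
    # image; w_e, w_l, and sigma are user-chosen parameters.
    def edge_potential(image, w_e=1.0, sigma=2.0):
        """P(x,y) = -w_e |grad(G_sigma * I)|^2: minima at strong step edges."""
        smoothed = ndimage.gaussian_filter(image.astype(float), sigma)
        gy, gx = np.gradient(smoothed)
        return -w_e * (gx ** 2 + gy ** 2)

    def line_potential(image, w_l=1.0, sigma=2.0):
        """P(x,y) = w_l (G_sigma * I): positive w_l attracts to dark lines."""
        return w_l * ndimage.gaussian_filter(image.astype(float), sigma)

    # The corresponding Gaussian potential force is -grad(P), e.g.:
    # gy, gx = np.gradient(edge_potential(image)); force = (-gy, -gx)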
Regardless of the selection of the exact potential energy function, the procedure for minimizing the energy functional is the same. The problem of finding a curve $X(s)$ that minimizes the energy functional $E$ is known as a variational problem [35]. It has been shown that the curve that minimizes $E$ must satisfy the following Euler-Lagrange equation [13, 22]:

$\frac{\partial}{\partial s}\left( \alpha \frac{\partial X}{\partial s} \right) - \frac{\partial^2}{\partial s^2}\left( \beta \frac{\partial^2 X}{\partial s^2} \right) - \nabla P(X) = 0$   (3.6)

To gain some insight into the physical behavior of deformable contours, we can view Eq. (3.6) as a force balance equation

$F_{\text{int}}(X) + F_{\text{pot}}(X) = 0$   (3.7)



where the internal force is given by

$F_{\text{int}}(X) = \frac{\partial}{\partial s}\left( \alpha \frac{\partial X}{\partial s} \right) - \frac{\partial^2}{\partial s^2}\left( \beta \frac{\partial^2 X}{\partial s^2} \right)$   (3.8)

and the potential force is given by

$F_{\text{pot}}(X) = -\nabla P(X)$   (3.9)

The internal force $F_{\text{int}}$ discourages stretching and bending, while the potential force $F_{\text{pot}}$ pulls the contour toward the desired object boundaries. In this chapter, we define the forces derived from the potential energy function $P(x, y)$, given in either Eq. (3.4) or Eq. (3.5), as Gaussian potential forces.
To find a solution to Eq. (3.6), the deformable contour is made dynamic by treating $X$ as a function of time $t$ as well as $s$, i.e., $X(s, t)$. The partial derivative of $X$ with respect to $t$ is then set equal to the left-hand side of Eq. (3.6) as follows:

$\gamma \frac{\partial X}{\partial t} = \frac{\partial}{\partial s}\left( \alpha \frac{\partial X}{\partial s} \right) - \frac{\partial^2}{\partial s^2}\left( \beta \frac{\partial^2 X}{\partial s^2} \right) - \nabla P(X)$   (3.10)

The coefficient $\gamma$ is introduced to make the units on the left side consistent with the right side. When the solution $X(s, t)$ stabilizes, the left side vanishes and we achieve a solution of Eq. (3.6). We note that this approach of making the time derivative term vanish is equivalent to applying a gradient descent algorithm to find the local minimum of Eq. (3.1) [34]. Thus, the minimization is solved by placing an initial contour on the image domain and allowing it to deform according to Eq. (3.10). Figure 3.4 shows an example of recovering the left ventricle wall using Gaussian potential forces.
3.2.2 Dynamic force formulation

In the previous section, the deformable model was modeled as a static problem,
and an artificial variable $t$ was introduced to minimize the energy. It is sometimes
more convenient, however, to formulate the deformable model directly from a dynamic problem using a force formulation. Such a formulation permits the use of
more general types of external forces that are not potential forces, i.e., forces that
cannot be written as the negative gradient of potential energy functions. According
to Newton's second law, the dynamics of a contour $X(s, t)$ must satisfy

$\mu \frac{\partial^2 X}{\partial t^2} = F_{\text{damp}}(X) + F_{\text{int}}(X) + F_{\text{ext}}(X)$   (3.11)

where $\mu$ is a coefficient that has a mass unit and $F_{\text{damp}}$ is the damping (or viscous) force, defined as $-\gamma\, \partial X / \partial t$ with $\gamma$ being the damping coefficient. In image segmentation, the mass coefficient $\mu$ in front of the inertial term is often set to zero,


Figure 3.4: An example of recovering the left ventricle wall using Gaussian potential forces. (a) Gaussian potential forces and (b) the result of applying Gaussian potential forces to a deformable contour, with the circular initial contour shown in gray and the final deformed contour in white.



since the inertial term may cause the contour to pass over the weak edges. The
dynamics of the deformable contour without the inertial term becomes

$\gamma \frac{\partial X}{\partial t} = F_{\text{int}}(X) + F_{\text{ext}}(X)$   (3.12)

The internal forces are the same as specified in Eq. (3.8). The external forces can
be either potential forces or nonpotential forces. We note, however, that nonpotential
forces cannot be derived from the variational energy formulation of the previous
section. An alternate variational principle does exist (see [36]); however, it is not
physically intuitive.
External forces are often expressed as the superposition of several different
forces:

$F_{\text{ext}}(X) = F_1(X) + F_2(X) + \cdots + F_N(X)$

where $N$ is the total number of external forces. This superposition formulation
allows the external forces to be broken down into more manageable terms. For
example, one might define the external forces to be composed of both Gaussian
potential forces and pressure forces, which are described in the next section.
3.2.3 External forces

In this section, we describe several kinds of external forces for deformable models. These external forces are applicable to both deformable contours and deformable surfaces.
Multiscale Gaussian potential force

When using the Gaussian potential force described in Section 3.2.1, $\sigma$ must be selected to have a small value in order for the deformable model to follow the boundary accurately. As a result, the Gaussian potential force can only attract the model toward the boundary when it is initialized nearby. To remedy this problem, Terzopoulos, Witkin, and Kass [13, 15] proposed using Gaussian potential forces at different scales to broaden the attraction range while maintaining the model's boundary localization accuracy. The basic idea is to first use a large value of $\sigma$ to create a potential energy function with a broad valley around the boundary. The coarse-scale Gaussian potential force attracts the deformable contour or surface toward the desired boundaries from a long range. When the contour or surface reaches equilibrium, the value of $\sigma$ is then reduced to allow tracking of the boundary at a finer scale. This scheme effectively extends the attraction range of the Gaussian potential force. A weakness of this approach, however, is that there is no established theorem for how to schedule changes in $\sigma$. The ad hoc scheduling schemes that are available may therefore lead to unreliable results.
Pressure force
Cohen [22] proposed to increase the attraction range by using a pressure force
together with the Gaussian potential force. The pressure force can either inflate or



deflate the model; hence, it removes the requirement to initialize the model near
the desired object boundaries. Deformable models that use pressure forces are also
known as balloons [22].
The pressure force is defined as

$F_p(X) = w_p N(X)$   (3.13)

where $N(X)$ is the inward unit normal¹ of the model at the point $X$ and $w_p$ is a constant weighting parameter. The sign of $w_p$ determines whether to inflate or deflate the model and is typically chosen by the user. Recently, region information has been used to define $w_p$ with a spatially varying sign based upon whether the model is inside or outside the desired object (see [37, 38]). The value of $w_p$ determines the strength of the pressure force. It must be carefully selected so that the pressure force is slightly smaller than the Gaussian potential force at significant edges, but large enough to pass through weak or spurious edges. When the model deforms, the pressure force keeps inflating or deflating the model until it is stopped by the Gaussian potential force. An example of using a deformable contour with an inflating pressure force is shown in Fig. 3.5. A disadvantage in using pressure forces is that they may cause the deformable model to cross itself and form loops (cf. [39]).
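A minimal sketch of the pressure force for a discrete closed contour follows. The 90-degree rotation used to obtain the normal and the weight w_p are assumptions of this example; whether the rotated tangent points inward depends on the contour orientation.

    import numpy as np

    # Sketch of the pressure (balloon) force of Eq. (3.13) for a closed contour
    # stored as an (N, 2) array of (x, y) points.
    def pressure_force(contour, w_p=0.5):
        """F_p = w_p * N(X), with N(X) estimated from the discrete tangent."""
        # Central-difference tangent along the closed curve.
        tangent = np.roll(contour, -1, axis=0) - np.roll(contour, 1, axis=0)
        tangent /= np.linalg.norm(tangent, axis=1, keepdims=True) + 1e-12
        # Rotate the unit tangent by 90 degrees to get a unit normal; flipping
        # the sign of w_p (or the rotation) switches inflation and deflation.
        normal = np.stack([tangent[:, 1], -tangent[:, 0]], axis=1)
        return w_p * normal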

Distance potential force


Another approach for extending attraction range is to define the potential energy
function using a distance map as proposed by Cohen and Cohen [40]. The value
of the distance map at each pixel is obtained by calculating the distance between
the pixel and the closest boundary point, based either on Euclidean distance [41]
or Chamfer distance [42]. By defining the potential energy function based on the
distance map, one can obtain a potential force field that has a large attraction range.
Given a computed distance map $d(x, y)$, one way of defining a corresponding potential energy, introduced in [40], is as follows:

$P_d(x, y) = -w_d\, e^{-d(x, y)^2}$   (3.14)

The corresponding potential force field is given by $-\nabla P_d(x, y)$.
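The sketch below builds such a distance-based potential force on a binary edge image using SciPy's Euclidean distance transform. The exponential potential follows the form of Eq. (3.14); the weight w_d and the function name are assumptions of this example.

    import numpy as np
    from scipy import ndimage

    # Sketch of the distance potential of Eq. (3.14). edge_mask is assumed to
    # be a boolean array with True on edge pixels.
    def distance_potential_force(edge_mask, w_d=1.0):
        """Return (P, (Fy, Fx)) with P = -w_d exp(-d^2) and F = -grad(P)."""
        d = ndimage.distance_transform_edt(~edge_mask)  # distance to nearest edge
        P = -w_d * np.exp(-d ** 2)
        gy, gx = np.gradient(P)
        return P, (-gy, -gx)                            # force pulls toward edges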
Gradient vector flow
The distance potential force is based on the principle that the model point
should be attracted to the nearest edge points. This principle, however, can cause
difficulties when deforming a contour or surface into boundary concavities [43].
A 2D example is shown in Fig. 3.6, where a U-shaped object and a close-up of
its distance potential force field within the boundary concavity is depicted. Notice
¹In the parametric formulation of deformable models, the normal direction is sometimes assumed to be outward. Here we assume an inward direction for consistency with the geometric formulation of deformable models introduced in Section 3.3.


Figure 3.5: An example of pressure force driven deformable contours. (a) Intensity CT image slice of the left ventricle. (b) Edge detected image. (c) Initial deformable contour. (d)-(f) Deformable contour moving toward the left ventricle boundary, driven by inflating pressure force. Images courtesy of McInerney and Terzopoulos [23], The University of Toronto.

that at the concavity, distance potential forces point horizontally in opposite directions, thus preventing the contour from converging into the boundary concavity. To
address this problem, Xu and Prince [10, 43] employed a vector diffusion equation that diffuses the gradient of an edge map in regions distant from the boundary,
yielding a different force field called the gradient vector flow (GVF) field. The
amount of diffusion adapts according to the strength of edges to avoid distorting
object boundaries.
A GVF field is defined as the equilibrium solution to the following vector partial
differential equation:

$\frac{\partial u}{\partial t} = g(|\nabla f|)\, \nabla^2 u - h(|\nabla f|)\, (u - \nabla f)$   (3.15)

where $u(x, y, 0) = \nabla f(x, y)$, $\partial u / \partial t$ denotes the partial derivative of $u(x, y, t)$ with respect to $t$, $\nabla^2$ is the Laplacian operator (applied to each spatial component of $u$ separately), and $f$ is an edge map that has a higher value at the desired object


Figure 3.6: An example of distance potential force field. (a) A U-shaped object, a close-up of its (b) boundary concavity, and (c) the distance potential force field within the concavity.

boundary and can be derived using any edge detector. The definition of the GVF
field is valid for any dimension. Two examples of $g(\cdot)$ and $h(\cdot)$ are

$g(r) = \mu, \qquad h(r) = r^2$

where $\mu$ is a scalar and $r$ is a dummy variable, or

$g(r) = e^{-(r/K)}, \qquad h(r) = 1 - g(r)$
where $K$ is a positive scalar. GVF has been shown to have a large attraction range
and improved convergence for deforming contours into boundary concavities [10,
43]. An example of using a GVF force field is shown in Fig. 3.7.
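A minimal Python sketch of GVF with the first choice of g and h above (g = μ, h = |∇f|²), iterated to equilibrium with an explicit Euler scheme, is given below. The values of μ, the time step, and the iteration count are assumptions to be tuned, and the edge map f is assumed normalized to [0, 1] for numerical stability.

    import numpy as np

    # Sketch of gradient vector flow (Eq. 3.15) with g = mu and h = |grad f|^2.
    def gvf(f, mu=0.2, dt=0.05, n_iter=500):
        """f: edge map (scaled to [0,1]) with large values on boundaries."""
        def lap(a):   # 5-point Laplacian with replicated borders
            p = np.pad(a, 1, mode='edge')
            return (p[:-2, 1:-1] + p[2:, 1:-1]
                    + p[1:-1, :-2] + p[1:-1, 2:] - 4 * a)
        fy, fx = np.gradient(f.astype(float))
        u, v = fx.copy(), fy.copy()           # initialize with the gradient of f
        mag2 = fx ** 2 + fy ** 2              # h(|grad f|) = |grad f|^2
        for _ in range(n_iter):
            u += dt * (mu * lap(u) - mag2 * (u - fx))
            v += dt * (mu * lap(v) - mag2 * (v - fy))
        return u, v                           # the (x, y) components of the field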
Dynamic distance force
An external force that is similar to distance potential force but does not possess the boundary concavity problem has been proposed [44, 45]. This approach
derives an external force by computing a signed distance at each point on the deformable contour or surface. This signed distance is calculated by determining the
closest boundary point or other image feature along the models normal direction.
The distance values are recomputed each time the model is deformed. Several criteria can be used to define the desired boundary point to be searched. The most
common one is to use image pixels that have a high image intensity gradient magnitude or edge points generated by an edge detector. A threshold is specified for
the maximum search distance to avoid confusion with outliers and to reduce the
computation time. The resulting force, which we refer to as the dynamic distance


Figure 3.7: An example of the gradient vector flow driven deformable contours. (a) A gradient vector flow force field and (b) the result of applying gradient vector flow force to a deformable contour, with the circular initial contour shown in gray and the final deformed contour in white.



force, can attract deformable models to the desired image feature from a fairly long
range limited only by the threshold.
Given a point $X$ on the contour or surface, its inward unit normal $N(X)$, the computed signed distance $d(X)$, and a specified distance threshold $D_{\max}$, a typical definition for the dynamic distance force is

$F_{\text{dd}}(X) = w_{\text{dd}}\, \frac{d(X)}{D_{\max}}\, N(X)$   (3.16)

The weakness of this method is that a relatively time-consuming 1D search along the normal direction must be performed each time the model deforms. Setting the search distance threshold lower can reduce the run time, but it has the undesirable side effect of decreasing the attraction range of the dynamic distance force.
Interactive force
In many clinical situations, it is important to allow an operator to interact with
the deformable model as it is deforming. This interaction improves the accuracy of
the segmentation result when automated external forces fail to deform the model
to the desired feature in certain regions. For example, the user may want to pull
the model toward significant image features, or would like to constrain the model
so that it must pass through a set of landmark points identified by an expert. Deformable models allow these kinds of user interactions to be conveniently modeled
as additional force terms.
Two kinds of commonly used interactive forces are spring forces and volcano
forces, proposed by Kass et al. [13]. Spring forces are defined to be proportional to the distance between a point $p$ on the model and a user-specified point $q$:

$F_{\text{spring}}(p) = w_s (q - p)$   (3.17)

Spring forces act to pull the model toward $q$. The further away the model is from $q$, the stronger the pulling force. The point $p$ is selected by finding the closest point on the model to $q$ using a heuristic search around a local neighborhood of $q$.
An example of using spring forces is shown in Fig. 3.8.


Volcano forces are designed to push the model away from a local region around a volcano point $q$. For computational efficiency, the force is only computed in a neighborhood $N(q)$ as follows:

$F_{\text{volcano}}(p) = \begin{cases} w_v\, \dfrac{r}{|r|^2}, & p \in N(q) \\ 0, & \text{otherwise} \end{cases}$   (3.18)

where $r = p - q$. Note that the magnitude of the force is limited near $q$ to avoid numerical instability. Another possible definition for volcano forces is

$F_{\text{volcano}}(p) = w_v\, e^{-|r|^2 / \sigma^2}\, \frac{r}{|r|}$   (3.19)

where $\sigma$ is used to adjust the strength distribution of the volcano force.


Figure 3.8: Example of interactive forces. (a) A CT image slice of a canine left ventricle. (b) A deformable contour moves toward high gradients in the edge detected image, influenced by landmark points near the center of the image and a spring force that pulls the contour toward an edge at the bottom right. Image courtesy of McInerney and Terzopoulos [23], The University of Toronto.

3.2.4 Numerical implementation

Various numerical implementations of deformable models have been reported in the literature. For example, the finite difference method [13], dynamic programming [21], and the greedy algorithm [46] have been used to implement deformable contours, while finite difference methods [15] and finite element methods [23, 34, 47]
have been used to implement deformable surfaces. The finite difference method requires only local operations and is efficient to compute. The finite element method,
on the other hand, is more costly to compute but has the advantage of being well
adapted to the irregular mesh representations of deformable surfaces. In this section, we present the finite difference method implementation for deformable contours as described in [13].
Since the numerical scheme proposed by [13] does not require external forces
to be potential forces, it can be used to implement deformable contours using either potential forces or nonpotential forces. By approximating the derivatives in
Eq. (3.12) with finite differences, and converting to the vector notation $x_i = (X(i \Delta s), Y(i \Delta s))$ with $f_i = F_{\text{ext}}(x_i)$, we can rewrite Eq. (3.12) as

$\gamma \frac{x_i^t - x_i^{t - \Delta t}}{\Delta t} = \frac{\alpha}{\Delta s^2} \left( x_{i+1}^t - 2 x_i^t + x_{i-1}^t \right) - \frac{\beta}{\Delta s^4} \left( x_{i+2}^t - 4 x_{i+1}^t + 6 x_i^t - 4 x_{i-1}^t + x_{i-2}^t \right) + f_i^{t - \Delta t}$   (3.20)

where $\gamma$ is the damping coefficient, $\Delta s$ is the step size in space, and $\Delta t$ is the step size in time. In general, the external force $F_{\text{ext}}$ is stored as a discrete vector field, i.e., a finite set of vectors defined on an image grid. The value of $F_{\text{ext}}$ at any location can be obtained through a bilinear interpolation of the external force values at the grid points near $x$.
Equation (3.20) can be written in a compact matrix form as

$\gamma \frac{x^t - x^{t - \Delta t}}{\Delta t} = A x^t + f(x^{t - \Delta t})$   (3.21)

where $x^t$, $x^{t - \Delta t}$, and $f(x^{t - \Delta t})$ are $N \times 2$ matrices, and $A$ is an $N \times N$ pentadiagonal banded matrix, with $N$ being the number of sample points. Equation (3.21) can then be solved iteratively by matrix inversion using the following equation:

$x^t = \left( I - \frac{\Delta t}{\gamma} A \right)^{-1} \left( x^{t - \Delta t} + \frac{\Delta t}{\gamma} f(x^{t - \Delta t}) \right)$   (3.22)
The inverse of the matrix $(I - \frac{\Delta t}{\gamma} A)$ can be calculated efficiently by LU decomposition (Lower and Upper triangular decomposition, a well-known technique in linear algebra). The decomposition needs to be performed only once for deformation processes that do not alter the elasticity or rigidity parameters.
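The Python sketch below implements this semi-implicit update for a closed contour. For simplicity it uses a dense matrix inverse rather than the LU factorization mentioned above, and the parameter values and external-force callback are assumptions of the example.

    import numpy as np

    # Sketch of the semi-implicit scheme of Eqs. (3.20)-(3.22) for a closed
    # contour of N points stored as an (N, 2) array.
    def make_system_inverse(N, alpha, beta, gamma, dt, ds=1.0):
        """Build (I - dt/gamma * A)^{-1}, where A applies the internal force
        alpha*x'' - beta*x'''' on a circulant (closed-curve) stencil."""
        a, b = alpha / ds ** 2, beta / ds ** 4
        row = np.zeros(N)
        row[0] = -2 * a - 6 * b            # pentadiagonal stencil coefficients
        row[1] = row[-1] = a + 4 * b
        row[2] = row[-2] = -b
        A = np.array([np.roll(row, k) for k in range(N)])
        return np.linalg.inv(np.eye(N) - (dt / gamma) * A)

    def snake_step(x, inv, f_ext, gamma, dt):
        """One update: x^t = inv @ (x^{t-dt} + dt/gamma * f_ext(x^{t-dt}))."""
        return inv @ (x + (dt / gamma) * f_ext(x))

    # Example: with no external force, internal forces shrink and smooth.
    theta = np.linspace(0, 2 * np.pi, 50, endpoint=False)
    x = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    inv = make_system_inverse(len(x), alpha=0.1, beta=0.01, gamma=1.0, dt=0.5)
    for _ in range(20):
        x = snake_step(x, inv, lambda p: np.zeros_like(p), gamma=1.0, dt=0.5)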
3.2.5 Discussion

So far, we have formulated the deformable model as a continuous curve or surface. In practice, however, it is sometimes more straightforward to design the deformable models from a discrete point of view. Examples of work in this area include [48-53].
Parametric deformable models have been applied successfully in a wide range
of applications; however, they have two main limitations. First, in situations where
the initial model and the desired object boundary differ greatly in size and shape,
the model must be reparameterized dynamically to faithfully recover the object
boundary. Methods for reparameterization in 2D are usually straightforward and
require moderate computational overhead. Reparameterization in 3D, however, requires complicated and computationally expensive methods. The second limitation



The second limitation of the parametric approach is that it has difficulty dealing with topological adaptation, such as splitting or merging model parts, a useful property for recovering either multiple objects or an object with unknown topology. This difficulty is caused by the fact that a new parameterization must be constructed whenever a topology change occurs, which requires sophisticated schemes [54, 55].
3.3 Geometric deformable models

Geometric deformable models, proposed independently by Caselles et al. [24] and Malladi et al. [25], provide an elegant solution to address the primary limitations of parametric deformable models. These models are based on curve evolution theory [28–31] and the level set method [32, 33]. In particular, curves and surfaces are evolved using only geometric measures, resulting in an evolution that is independent of the parameterization. As in parametric deformable models, the evolution is coupled with the image data to recover object boundaries. Since the evolution is independent of the parameterization, the evolving curves and surfaces can be represented implicitly as a level set of a higher-dimensional function. As a result, topology changes can be handled automatically.

In this section, we first review the fundamental concepts in curve evolution theory and the level set method. We next present three types of geometric deformable models, which differ in the design of their speed functions. We then show a mathematical relationship between a particular class of parametric and geometric models. Next, we describe a numerical implementation of geometric deformable models proposed by Osher and Sethian [32] in Section 3.3.5. Finally, at the end of this section we compare geometric deformable models with parametric deformable models. We note that although the geometric deformable models are presented in 2D, their formulation can be directly extended to 3D. A thorough treatment of evolving curves and surfaces using the level set representation can be found in [33].
3.3.1 Curve evolution theory

The purpose of curve evolution theory is to study the deformation of curves using only geometric measures, such as the unit normal and curvature, as opposed to quantities that depend on parameters, such as the derivatives of an arbitrarily parameterized curve. Let us consider a moving curve $\mathbf{X}(s, t) = [X(s, t), Y(s, t)]^{\mathsf T}$, where $s$ is any parameterization and $t$ is the time, and denote its inward unit normal as $\mathbf{N}$ and its curvature as $\kappa$, respectively. The evolution of the curve along its normal direction can be characterized by the following partial differential equation:

$$\frac{\partial \mathbf{X}}{\partial t} = V(\kappa)\,\mathbf{N} \tag{3.23}$$

where $V(\kappa)$ is called the speed function, since it determines the speed of the curve evolution. We note that a curve moving in some arbitrary direction can always be reparameterized to have the same form as Eq. (3.23) [56]. The intuition behind this fact is that the tangential deformation affects only the curve's parameterization, not its shape and geometry.
The most extensively studied curve deformations in curve evolution theory are curvature deformation and constant deformation. Curvature deformation is given by the so-called geometric heat equation

$$\frac{\partial \mathbf{X}}{\partial t} = \alpha\,\kappa\,\mathbf{N}$$

where $\alpha$ is a positive constant. This equation will smooth a curve, eventually shrinking it to a circular point [57]. The use of curvature deformation has an effect similar to the use of the elastic internal force in parametric deformable models.
Constant deformation is given by

$$\frac{\partial \mathbf{X}}{\partial t} = V_0\,\mathbf{N}$$

where $V_0$ is a coefficient determining the speed and direction of deformation. Constant deformation plays the same role as the pressure force in parametric deformable models. The properties of curvature deformation and constant deformation are complementary to each other: curvature deformation removes singularities by smoothing the curve, while constant deformation can create singularities from an initially smooth curve.
The basic idea of the geometric deformable model is to couple the speed of deformation (using curvature and/or constant deformation) with the image data, so that the evolution of the curve stops at object boundaries. The evolution is implemented using the level set method. Thus, most of the research in geometric deformable models has been focused on the design of speed functions. We review several representative speed functions in Section 3.3.3.
3.3.2 Level set method

We now review the level set method for implementing curve evolution. The level set method is used to account for automatic topology adaptation, and it also provides the basis for a numerical scheme that is used by geometric deformable models. The level set method for evolving curves is due to Osher and Sethian [32, 58, 59].

In the level set method, the curve is represented implicitly as a level set of a 2D scalar function $\phi(x, y)$, referred to as the level set function, which is usually defined on the same domain as the image. The level set is defined as the set of points that have the same function value. Figure 3.9 shows an example of embedding a curve as a zero level set. It is worth noting that the level set function is different from the level sets of images, which are sometimes used for image enhancement [60]. The sole purpose of the level set function is to provide an implicit representation of the evolving curve.


Figure 3.9: An example of embedding a curve as a level set. (a) A single curve. (b) The level set function where the curve is embedded as the zero level set (in black). (c) The height map of the level set function with its zero level set depicted in black.

Figure 3.10: From left to right, the zero level set splits into two curves while the level set
function still remains a valid function.

Instead of tracking a curve through time, the level set method evolves a curve by
updating the level set function at fixed coordinates through time. This perspective
is similar to that of an Eulerian formulation of motion as opposed to a Lagrangian
formulation, which is analogous to the parametric deformable model. A useful
property of this approach is that the level set function remains a valid function
while the embedded curve can change its topology. This situation is depicted in
Fig. 3.10.
We now derive the level set embedding of the curve evolution equation (3.23). Given a level set function $\phi(x, y, t)$ with the contour $\mathbf{X}(s, t)$ as its zero level set, we have

$$\phi[\mathbf{X}(s, t), t] = 0.$$
Differentiating the above equation with respect to $t$ and using the chain rule, we obtain

$$\frac{\partial \phi}{\partial t} + \nabla\phi \cdot \frac{\partial \mathbf{X}}{\partial t} = 0 \tag{3.24}$$

where $\nabla\phi$ denotes the gradient of $\phi$. We assume that $\phi$ is negative inside the zero level set and positive outside. Accordingly, the inward unit normal to the level set curve is given by

$$\mathbf{N} = -\frac{\nabla\phi}{|\nabla\phi|}. \tag{3.25}$$

Using this fact and Eq. (3.23), we can rewrite Eq. (3.24) as

$$\frac{\partial \phi}{\partial t} = V(\kappa)\,|\nabla\phi| \tag{3.26}$$

where the curvature $\kappa$ at the zero level set is given by

$$\kappa = \nabla\cdot\frac{\nabla\phi}{|\nabla\phi|} = \frac{\phi_{xx}\phi_y^2 - 2\phi_x\phi_y\phi_{xy} + \phi_{yy}\phi_x^2}{\left(\phi_x^2 + \phi_y^2\right)^{3/2}}. \tag{3.27}$$

The relationship between Eq. (3.23) and Eq. (3.26) provides the basis for performing curve evolution using the level set method.
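For illustration, the curvature of Eq. (3.27) can be approximated on the grid with central differences; the following minimal Python sketch does this (the small constant added to the denominator is our own guard against division by zero, not part of the formulation):

```python
import numpy as np

def curvature(phi, h=1.0):
    """Central-difference approximation of the curvature in Eq. (3.27)."""
    py, px = np.gradient(phi, h)       # first derivatives along rows, columns
    pyy, pyx = np.gradient(py, h)
    pxy, pxx = np.gradient(px, h)
    num = pxx * py**2 - 2.0 * px * py * pxy + pyy * px**2
    den = (px**2 + py**2) ** 1.5 + 1e-12   # avoid division by zero
    return num / den
```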
Three issues need to be considered in order to implement geometric deformable
contours:
1. An initial function $\phi(x, y, 0)$ must be constructed such that its zero level set corresponds to the position of the initial contour. A common choice is to set $\phi(x, y, 0) = D(x, y)$, where $D(x, y)$ is the signed distance from each grid point to the zero level set. The computation of the signed distance for an arbitrary initial curve is expensive. Recently, Sethian and Malladi developed a method called the fast marching method, which can construct the signed distance function in $O(N \log N)$ time, where $N$ is the number of pixels. Certain situations may arise, however, where the distance may be computed much more efficiently. For example, when the zero level set can be described by the exterior boundary of the union of a collection of disks, the signed distance function can be computed in $O(N)$ as

$$D(x, y) = \min_{k \in \{1, \ldots, m\}}\left(\left\|(x, y) - \mathbf{c}_k\right\| - r_k\right)$$

where $m$ is the number of initial disks, and $\mathbf{c}_k$ and $r_k$ are the center and radius of each disk (see the sketch following this list).

2. Since the evolution equation (3.26) is derived for the zero level set only, the speed function $V(\kappa)$, in general, is not defined on the other level sets. Hence, we need a method to extend the speed function $V(\kappa)$ to all of the level sets. We note that the expressions for the unit normal and the curvature, however, hold for all level sets. Many approaches for such extensions have been developed (see [33] for a detailed discussion of this topic). However, a level set function that evolves using these extended speed functions can lose its property of being a signed distance function, causing inaccuracy in curvature and normal calculations. As a result, reinitialization of the level set function to a signed distance function is often required for these schemes. Recently, a method that does not suffer from this problem was proposed by Adalsteinsson and Sethian [61]. This method casts the speed extension problem as a boundary value problem, which can then be solved efficiently using the fast marching method.
3. In the application of geometric contours, constant deformation is often used to account for large-scale deformation and to recover narrow boundary indentations and protrusions. Constant deformation, however, can cause the formation of sharp corners from an initially smooth zero level set. Once a corner has developed, it is not clear how to continue the deformation, since the definition of the normal direction becomes ambiguous. A natural way to continue the deformation is to impose the so-called entropy condition, originally proposed in the area of interface propagation by Sethian [62]. In Section 3.3.5, we describe an entropy-satisfying numerical scheme, proposed by Osher and Sethian [32], which implements geometric deformable contours.
3.3.3 Speed functions

In this section, we provide a brief overview of three examples of speed functions used by geometric deformable contours. The geometric deformable contour formulation, proposed by Caselles et al. [24] and Malladi et al. [25], takes the following form:

$$\frac{\partial \phi}{\partial t} = g\,(\kappa + V_0)\,|\nabla\phi| \tag{3.28}$$

where

$$g = \frac{1}{1 + \left|\nabla(G_\sigma * I)\right|}. \tag{3.29}$$

Positive $V_0$ shrinks the curve, and negative $V_0$ expands the curve. The curve evolution is coupled with the image data through the multiplicative stopping term $g$. This scheme can work well for objects that have good contrast. However, when the object boundary is indistinct or has gaps, the geometric deformable contour may leak out because the multiplicative term only slows down the curve near the boundary rather than completely stopping it. Once the curve passes the boundary, it will not be pulled back to recover the correct boundary.
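As an illustration, the stopping term of Eq. (3.29) can be computed as follows. This minimal sketch assumes the common choice $g = 1/(1 + |\nabla(G_\sigma * I)|)$ and uses SciPy's Gaussian filter for the smoothing; the parameter names are our own.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def stopping_function(image, sigma=2.0):
    """Edge-based stopping term g of Eq. (3.29): close to 0 near strong
    edges of the Gaussian-smoothed image, close to 1 in flat regions."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)
    return 1.0 / (1.0 + np.hypot(gx, gy))
```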


Figure 3.11: Contour extraction of a cyst from an ultrasound breast image via merging multiple initial level sets. Images courtesy of Yezzi [63], Georgia Institute of Technology.

To remedy the latter problem, Caselles et al. [26, 64] and Kichenassamy et al. [63, 65] used an energy minimization formulation to design the speed function. This leads to the following geometric deformable contour formulation:

$$\frac{\partial \phi}{\partial t} = g\,(\kappa + V_0)\,|\nabla\phi| + \nabla g \cdot \nabla\phi \tag{3.30}$$

Note that the resulting speed function has an extra stopping term $\nabla g \cdot \nabla\phi$ that can pull back the contour if it passes the boundary. This term behaves in a similar fashion to the Gaussian potential force in the parametric formulation. An example of using this type of geometric deformable contour is shown in Fig. 3.11.
this type of geometrical deformable contours is shown in Fig. 3.11.
The latter formulation can still generate curves that pass through boundary gaps. Siddiqi et al. [66] partially addressed this problem by altering the constant speed term through energy minimization, leading to the following geometric deformable contour:

$$\frac{\partial \phi}{\partial t} = g\,\kappa\,|\nabla\phi| + \frac{1}{2}\,\nabla\cdot(\mathbf{x}\,g)\,|\nabla\phi| + \nabla g \cdot \nabla\phi \tag{3.31}$$

In this case, the constant speed term $V_0$ in Eq. (3.30) is replaced by the second term, and the term $\frac{1}{2}\,\mathbf{x}\cdot\nabla g$ provides additional stopping power that can prevent the geometric contour from leaking through small boundary gaps. The second term can be used alone as the speed function for shape recovery as well. Figure 3.12 shows an example of this deformable contour model. Although this model is robust to small gaps, large boundary gaps can still cause problems.


Figure 3.12: Segmentation of the brain using only the second term in (3.31). Left to right
and top to bottom: iterations 1, 400, 800, 1200, and 1600. Images courtesy of Siddiqi [66],
McGill University.

At this time, there is no geometric deformable contour model possessing the property of convergence to both perceptual boundaries (large boundary gaps) and boundary concavities, as there is in parametric deformable contours [43].
3.3.4 Relationship to parametric deformable models

In the previous section, we described three types of geometric deformable contours that behave similarly to the parametric deformable contours but have the advantage of being able to change their topology automatically. The relationship between parametric deformable contours and geometric deformable contours can be
formulated more precisely. Through an energy minimization formulation, Caselles
et al. [64] showed that the geometric deformable contour in Eq. (3.30) is equivalent to the parametric deformable contour without the rigidity term. This derivation
only permits the use of a speed function induced by a potential force, a property
shared by almost all the geometric deformable models. In this section, we derive an
explicit mathematical relationship between a dynamic force formulation of parametric deformable models and a geometric deformable model formulation, thus
permitting the use of speed functions derived from nonpotential forces, i.e., forces
that cannot be expressed as the negative gradient of potential energy functions.



For the convenience of derivation, we consider a simplified but more commonly used dynamic force formulation for parametric deformable contours:

$$\gamma\,\frac{\partial \mathbf{X}}{\partial t} = \alpha\,\frac{\partial^2 \mathbf{X}}{\partial s^2} + q(\mathbf{X})\,\mathbf{N} + \mathbf{F}_{\mathrm{ext}}(\mathbf{X}) \tag{3.32}$$

Note that since the use of a pressure force $q(\mathbf{X})\,\mathbf{N}$ can cause singularities during deformation and requires special numerical implementation, we have separated it from the rest of the external forces. To represent Eq. (3.32) using a level set representation, we need to recast this formulation into the standard curve evolution form defined in Eq. (3.23). The corresponding geometric deformable contour in level set representation can then be obtained by using Eq. (3.26).
Since the contour's tangential motion affects only its parameterization but not its geometry, we modify Eq. (3.32) by considering only the normal components of the internal and external forces. Given a parameterized curve $\mathbf{X}(s, t)$, where $s$ is the arc-length parameterization of the curve, its inward unit normal $\mathbf{N}$ and curvature $\kappa$, we can use the fact that $\partial^2\mathbf{X}/\partial s^2 = \kappa\,\mathbf{N}$ to rewrite Eq. (3.32) as follows:

$$\frac{\partial \mathbf{X}}{\partial t} = \varepsilon\,\kappa\,\mathbf{N} + v_p(\mathbf{X})\,\mathbf{N} + \left(\tilde{\mathbf{F}}_{\mathrm{ext}}\cdot\mathbf{N}\right)\mathbf{N} \tag{3.33}$$

where $\varepsilon = \alpha/\gamma$, $v_p = q/\gamma$, and $\tilde{\mathbf{F}}_{\mathrm{ext}} = \mathbf{F}_{\mathrm{ext}}/\gamma$. Here, we have divided through by $\gamma$ so that both sides have units of velocity. If we let $V(\kappa) = \varepsilon\,\kappa + v_p + \tilde{\mathbf{F}}_{\mathrm{ext}}\cdot\mathbf{N}$, where $\mathbf{N}$ is given by Eq. (3.25), and substitute $V(\kappa)$ into Eq. (3.26), we obtain the following geometric deformable contour evolution equation:

$$\frac{\partial \phi}{\partial t} = \varepsilon\,\kappa\,|\nabla\phi| + v_p(\mathbf{x})\,|\nabla\phi| - \tilde{\mathbf{F}}_{\mathrm{ext}}\cdot\nabla\phi \tag{3.34}$$

If we allow both $v_p$ and $\tilde{\mathbf{F}}_{\mathrm{ext}}$ to be functions defined on the image domain, then Eq. (3.34) generalizes Eq. (3.31) and can be used to implement almost any parametric deformable model as a geometric deformable model.
3.3.5 Numerical implementation

In this section, we provide a numerical implementation, adapted from [33], for Eq. (3.34), in which $v_p$ and $\tilde{\mathbf{F}}_{\mathrm{ext}}$ are allowed to be spatially varying functions. The spatial derivatives are implemented using a special numerical scheme that can handle the formation of sharp corners during deformation. The numerical implementation is given as follows:

$$\phi_{ij}^{n+1} = \phi_{ij}^{n} + \tau\Big[\max(v_{p,ij}, 0)\,\nabla^{-} + \min(v_{p,ij}, 0)\,\nabla^{+} + \varepsilon\,\kappa_{ij}\big[(D_{ij}^{0x})^2 + (D_{ij}^{0y})^2\big]^{1/2}$$
$$\qquad\qquad -\,\max(u_{ij}, 0)\,D_{ij}^{-x} - \min(u_{ij}, 0)\,D_{ij}^{+x} - \max(w_{ij}, 0)\,D_{ij}^{-y} - \min(w_{ij}, 0)\,D_{ij}^{+y}\Big] \tag{3.35}$$

where $\tilde{\mathbf{F}}_{\mathrm{ext}} = (u, w)$, and $\kappa_{ij}$ is the central difference approximation to the curvature expression given in Eq. (3.27). The first-order numerical derivatives of the level set function $\phi$ are given by

$$D_{ij}^{\pm x} = \pm\frac{\phi_{i\pm 1, j} - \phi_{ij}}{h}, \qquad D_{ij}^{0x} = \frac{\phi_{i+1, j} - \phi_{i-1, j}}{2h},$$

with $D_{ij}^{\pm y}$ and $D_{ij}^{0y}$ defined analogously, and the entropy-satisfying gradient magnitudes are

$$\nabla^{+} = \left[\max(D_{ij}^{-x}, 0)^2 + \min(D_{ij}^{+x}, 0)^2 + \max(D_{ij}^{-y}, 0)^2 + \min(D_{ij}^{+y}, 0)^2\right]^{1/2}$$

$$\nabla^{-} = \left[\max(D_{ij}^{+x}, 0)^2 + \min(D_{ij}^{-x}, 0)^2 + \max(D_{ij}^{+y}, 0)^2 + \min(D_{ij}^{-y}, 0)^2\right]^{1/2}.$$

A detailed description of the principles behind this numerical method can be found in [33]. We note that more efficient implementations of geometric deformable models have been developed, including the particularly noteworthy narrow-band level set method described in [25, 67].
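For illustration only, one update step with the structure of Eq. (3.35) might be sketched as follows in Python. This reuses the `curvature` helper sketched in Section 3.3.2, pads the boundary by replication (our own choice), and is not tuned for efficiency; it is a sketch of the scheme, not the implementation of [32] or [33].

```python
import numpy as np

def level_set_step(phi, v_p, eps, u, w, tau, h=1.0):
    """One upwind update of phi following the structure of Eq. (3.35).

    v_p: spatially varying constant-speed term; eps: curvature weight;
    (u, w): components of the external force field F_ext (all 2D arrays)."""
    p = np.pad(phi, 1, mode='edge')             # replicate border values
    Dmx = (p[1:-1, 1:-1] - p[1:-1, :-2]) / h    # backward difference in x
    Dpx = (p[1:-1, 2:] - p[1:-1, 1:-1]) / h     # forward difference in x
    Dmy = (p[1:-1, 1:-1] - p[:-2, 1:-1]) / h    # backward difference in y
    Dpy = (p[2:, 1:-1] - p[1:-1, 1:-1]) / h     # forward difference in y
    D0x, D0y = (Dpx + Dmx) / 2.0, (Dpy + Dmy) / 2.0

    # Entropy-satisfying gradient magnitudes (nabla^+ and nabla^-).
    grad_p = np.sqrt(np.maximum(Dmx, 0)**2 + np.minimum(Dpx, 0)**2 +
                     np.maximum(Dmy, 0)**2 + np.minimum(Dpy, 0)**2)
    grad_m = np.sqrt(np.maximum(Dpx, 0)**2 + np.minimum(Dmx, 0)**2 +
                     np.maximum(Dpy, 0)**2 + np.minimum(Dmy, 0)**2)

    kappa = curvature(phi, h)                   # central differences, Eq. (3.27)
    grad0 = np.sqrt(D0x**2 + D0y**2)

    # phi_t = v_p |grad phi| + eps kappa |grad phi| - F_ext . grad phi
    return phi + tau * (np.maximum(v_p, 0) * grad_m + np.minimum(v_p, 0) * grad_p
                        + eps * kappa * grad0
                        - (np.maximum(u, 0) * Dmx + np.minimum(u, 0) * Dpx)
                        - (np.maximum(w, 0) * Dmy + np.minimum(w, 0) * Dpy))
```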
3.3.6 Discussion

Although topological adaptation can be useful in many applications, it can sometimes lead to undesirable results. When applied to noisy images with significant boundary gaps, geometric deformable models may generate shapes whose topology is inconsistent with that of the actual object. In these situations, ensuring a correct topology is often a necessary condition for subsequent analysis. For example, in brain functional studies using fMRI or PET data, it is necessary to unfold the extracted cortical surface and create a flat or spherical map so that a user can visualize the functional activation in deeply buried cortical regions (see [68, 69]). Parametric deformable models are better suited to these applications because of their strict control on topology.
3.4 Extensions of deformable models

Numerous extensions have been proposed to the deformable models described in the previous sections, particularly to the parametric deformable models. These extensions address two major areas for improving standard deformable models. The first area is the incorporation of additional prior knowledge into the models. Use of prior knowledge in a deformable model can lead to more robust and accurate results. This is especially true in applications where a particular structure that requires delineation has a similar shape across a large number of subjects. Incorporation of prior knowledge requires a training step that involves manual interaction to accumulate information on the variability of the object shape being delineated. This information is then used to constrain the actual deformation of the contour or surface to extract shapes consistent with the training data.
The second area that has been addressed by various extensions of deformable models is the modeling of global shape properties. Traditional parametric and geometric deformable models are local models: contours or surfaces are assumed to be locally smooth, and global properties such as orientation and size are not explicitly modeled. Modeling of global properties can provide greater robustness to initialization. Furthermore, global properties are important in object recognition and image interpretation applications because they can be characterized using only a few parameters. Note that although prior knowledge and global shape properties are distinct concepts, they are often used in conjunction with one another. Global properties tend to be much more stable than local properties. Therefore, if information about the global properties is known a priori, it can be used to greatly improve the performance of the deformable model.

In this section, we review several extensions of deformable models that use prior knowledge and/or global shape properties. We focus on revealing the fundamental principles of each extension and refer the reader to the cited literature for a full treatment of each topic.
3.4.1 Deformable Fourier models

In standard deformable models, a direct parameterization is typically used to represent curves and surfaces. Staib and Duncan [70] have proposed using a Fourier representation for parameterizing deformable contours and surfaces. A Fourier representation for a closed contour is expressed as

$$\mathbf{X}(s) = \begin{bmatrix} x(s) \\ y(s) \end{bmatrix} = \begin{bmatrix} a_0 \\ c_0 \end{bmatrix} + \sum_{k=1}^{\infty} \begin{bmatrix} a_k & b_k \\ c_k & d_k \end{bmatrix} \begin{bmatrix} \cos 2\pi k s \\ \sin 2\pi k s \end{bmatrix} \tag{3.36}$$

where $a_0$, $c_0$, $a_k$, $b_k$, $c_k$, and $d_k$ are Fourier coefficients. The Fourier coefficients of $x(s)$ are computed by

$$a_0 = \int_0^1 x(s)\,ds, \qquad a_k = 2\int_0^1 x(s)\cos 2\pi k s\,ds, \qquad b_k = 2\int_0^1 x(s)\sin 2\pi k s\,ds,$$

and the coefficients of $y(s)$ are computed in analogous fashion. Open contours can also be parameterized using a straightforward modification of Eq. (3.36), as described in [70].
The advantages of the Fourier representation are that a compact representation of smooth shapes can be obtained by truncating the series, and that a geometric description of the shape can be derived to characterize global shape properties. From Eq. (3.36), the coefficients $a_0$ and $c_0$ define the translation of the contour. Each subsequent term in the series expansion follows the parametric form of an ellipse. It is possible to map the coefficients to a parameter set that describes the object shape in terms of standard properties of ellipses [70]. Furthermore, like the Fourier coefficients, these parameters follow a scale ordering, in which low-index parameters describe global properties and higher-index parameters describe more local deformations.

Figure 3.13: Segmenting the corpus callosum from an MR midbrain sagittal image using a deformable Fourier model. Top left: MR image (146×106). Top right: positive magnitude of the Laplacian of the Gaussian. Bottom left: initial contour (six harmonics). Bottom right: final contour on the corpus callosum of the brain. Images courtesy of Staib and Duncan [70], Yale University.

Staib and Duncan apply a Bayesian approach to incorporating prior information into their model. A prior probability function is defined by first manually or
semi-automatically delineating structures of the same class as the structure to be
extracted. Next, these structures are parameterized using the Fourier coefficients,
or using the converted parameter set based on ellipses. Mean and variance statistics
are finally computed for each of the parameters.
Assuming independence between the parameters, the multivariate Gaussian prior probability function is given by

$$\Pr(\mathbf{p}) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\!\left[-\frac{(p_i - m_i)^2}{2\sigma_i^2}\right] \tag{3.37}$$

where $\mathbf{p} = (p_1, p_2, \ldots, p_n)$ is the parameter vector derived by truncating the Fourier coefficients³, $m_i$ is the mean of the $i$th parameter in the training data, and $\sigma_i^2$ is its variance. A posterior probability function is defined that balances the prior probability model against a data model, which measures the discrepancy between boundary features in the image and the deformable contour. In [70], a gradient ascent method was used to maximize the posterior probability function. More recently, a genetic algorithm was proposed in [71]. Figure 3.13 shows an example of using the deformable Fourier model to recover the corpus callosum of the human brain.

³Two modified definitions for the parameter vector were also proposed in [70].

3.4.2 Deformable models using modal analysis

Another way to restrict the mostly unstructured motion associated with the standard deformable model is to use modal analysis (Pentland and Horowitz [72], Nastar and Ayache [53]). This approach is similar to the deformable Fourier model except that both the basis functions and the nominal values of their coefficients are derived from a template object shape.

Deformable models based on modal analysis use the theory of finite elements [73]. An object is assumed to be represented by a finite set of elements whose positions are defined by the positions of $n$ nodes, which are points in $d$-dimensional space. The node positions can be stacked into a vector $\mathbf{x}$, which has length $nd$, and element interpolation characterizes the complete object shape on the continuum. If the object moves or deforms, its new position is given by $\mathbf{x} + \mathbf{u}$, where $\mathbf{u}$ is a vector of length $nd$ representing the collection of nodal displacements.
The equation governing the object's motion can be written as a collection of ordinary differential equations constraining the nodal displacements. This is compactly written as

$$\mathbf{M}\,\ddot{\mathbf{u}} + \mathbf{C}\,\dot{\mathbf{u}} + \mathbf{K}\,\mathbf{u} = \mathbf{f}$$

where $\mathbf{M}$, $\mathbf{C}$, and $\mathbf{K}$ are the mass, damping, and stiffness matrices of the system, and $\mathbf{f}$ is an $nd$-dimensional vector of external forces acting on the nodes. Both $\mathbf{u}$ and $\mathbf{f}$ are assumed to be functions of time. Derivations of $\mathbf{M}$, $\mathbf{C}$, and $\mathbf{K}$ are described in the literature (cf. Pentland and Horowitz [72], Terzopoulos and Metaxas [47]).
Solution of the generalized eigenvalue problem

$$\mathbf{K}\,\boldsymbol{\phi}_i = \lambda_i\,\mathbf{M}\,\boldsymbol{\phi}_i$$

yields the modes $\boldsymbol{\phi}_i$ and eigenvalues $\lambda_i$, $i = 1, \ldots, nd$. The nodal displacements can be written as

$$\mathbf{u} = \boldsymbol{\Phi}\,\tilde{\mathbf{u}}$$

where $\boldsymbol{\Phi}$ is the (orthogonal) matrix whose columns comprise the modes and $\tilde{\mathbf{u}}$ is a vector of motion coefficients. The governing equation can then be written as

$$\ddot{\tilde{\mathbf{u}}} + \boldsymbol{\Phi}^{\mathsf T}\mathbf{C}\,\boldsymbol{\Phi}\,\dot{\tilde{\mathbf{u}}} + \boldsymbol{\Lambda}\,\tilde{\mathbf{u}} = \boldsymbol{\Phi}^{\mathsf T}\mathbf{f} \tag{3.38}$$

where $\boldsymbol{\Lambda}$ is a diagonal matrix having the eigenvalues corresponding to the modes on its diagonal. It is customary to assume the Rayleigh condition, which implies that $\boldsymbol{\Phi}^{\mathsf T}\mathbf{C}\,\boldsymbol{\Phi}$ is also a diagonal matrix. This decouples the equations in (3.38), leaving $nd$ independent equations to solve for the $nd$ motion coefficients.
Shape variations are constrained and computation times are reduced by approximating the nodal displacements using only the $m$ lower-order modes (those corresponding to the larger eigenvalues of $\boldsymbol{\Lambda}^{-1}$). This is conceptually equivalent to keeping the lowest-order Fourier coefficients, but this approximation does not necessarily smooth the shape, since sharp bends are still allowed if the lowest-order modes possess such bends. In this case, the nodal displacements become

$$\mathbf{u} = \boldsymbol{\Phi}_m\,\tilde{\mathbf{u}}_m \tag{3.39}$$

where $\boldsymbol{\Phi}_m$ is the matrix consisting of the first $m$ columns of $\boldsymbol{\Phi}$, and $\tilde{\mathbf{u}}_m$ is the vector comprising the first $m$ motion coefficients from $\tilde{\mathbf{u}}$. The governing equations become

$$\ddot{\tilde{\mathbf{u}}}_m + \tilde{\mathbf{C}}_m\,\dot{\tilde{\mathbf{u}}}_m + \boldsymbol{\Lambda}_m\,\tilde{\mathbf{u}}_m = \tilde{\mathbf{f}}_m \tag{3.40}$$

where $\tilde{\mathbf{C}}_m = \boldsymbol{\Phi}_m^{\mathsf T}\mathbf{C}\,\boldsymbol{\Phi}_m$, $\boldsymbol{\Lambda}_m$ is the diagonal matrix comprising the first $m$ eigenvalues, and

$$\tilde{\mathbf{f}}_m = \boldsymbol{\Phi}_m^{\mathsf T}\,\mathbf{f}. \tag{3.41}$$

To implement a deformable model using reduced modal analysis, we assume that the external forces have been specified exactly as in standard deformable models (see Section 3.2.3). We also assume that the initial displacements (from the template object) and the initial velocities are zero. Approximate integration can then be accomplished using the explicit Euler scheme:

$$\dot{\tilde{\mathbf{u}}}_m(t + \Delta t) = \dot{\tilde{\mathbf{u}}}_m(t) + \Delta t\left[\tilde{\mathbf{f}}_m(t) - \tilde{\mathbf{C}}_m\,\dot{\tilde{\mathbf{u}}}_m(t) - \boldsymbol{\Lambda}_m\,\tilde{\mathbf{u}}_m(t)\right]$$

$$\tilde{\mathbf{u}}_m(t + \Delta t) = \tilde{\mathbf{u}}_m(t) + \Delta t\,\dot{\tilde{\mathbf{u}}}_m(t)$$



where $\Delta t$ is a time step that must be chosen small enough for convergence and good accuracy. The nodal displacements are given by Eq. (3.39), and the nodal positions are given by $\mathbf{x} + \mathbf{u}$. Using this information, the vector of external forces can be recomputed from the image data after each time step. Solution of the explicit Euler scheme equations is particularly easy because the equations are decoupled, and it is very fast because only the $m$ retained modes need be computed.
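The following Python sketch illustrates the overall procedure under stated assumptions: mass-normalized modes obtained from SciPy's generalized symmetric eigensolver, and a user-supplied `ext_force` callable that samples image forces at the current node positions. It is an illustration of the scheme, not the implementation of [72] or [53].

```python
import numpy as np
from scipy.linalg import eigh

def reduced_modal_deform(x0, Mmat, C, K, ext_force, m, dt, n_iter=100):
    """Deform node positions x0 using the m lowest-frequency modes.

    Solves K phi_i = lambda_i M phi_i, truncates to m modes, and integrates
    the decoupled Eq. (3.40) with the explicit Euler scheme."""
    lam, Phi = eigh(K, Mmat)          # generalized eigenproblem, ascending lam
    Phi_m, lam_m = Phi[:, :m], lam[:m]
    C_m = Phi_m.T @ C @ Phi_m         # diagonal under the Rayleigh condition
    u_t = np.zeros(m)                 # modal displacement coefficients
    v_t = np.zeros(m)                 # modal velocities
    for _ in range(n_iter):
        f_m = Phi_m.T @ ext_force(x0 + Phi_m @ u_t)   # cf. Eq. (3.41)
        v_t = v_t + dt * (f_m - C_m @ v_t - lam_m * u_t)
        u_t = u_t + dt * v_t
    return x0 + Phi_m @ u_t           # nodal positions, cf. Eq. (3.39)
```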

3.4.3 Deformable superquadrics

Another extension of deformable models that has been used for incorporating local and global shape features is the deformable superquadric, proposed by Terzopoulos and Metaxas [47]. This is essentially a hybrid technique in which a superquadric surface, which can be defined with a relatively small number of parameters, is allowed to deform locally to reconstruct the shape of an object. Although the fitting of global and local deformations is performed simultaneously, the global deformation is forced to account for as much of the object shape as possible. The estimated superquadric therefore captures the global shape characteristics and can readily be used in object recognition applications, while the local deformations capture the details of the object shape.

Terzopoulos and Metaxas consider models that are closed surfaces, denoted by $\mathbf{X}(u, v)$, where $(u, v)$ are the parametric coordinates. This surface can be expressed as

$$\mathbf{X}(u, v) = \mathbf{t} + \mathbf{R}\,\mathbf{p}(u, v) \tag{3.42}$$

where $\mathbf{t}$ is a translation vector, and $\mathbf{R}$ is a rotation matrix. The vector function $\mathbf{p}(u, v)$ denotes the model shape irrespective of pose and can further be expressed as

$$\mathbf{p}(u, v) = \mathbf{s}(u, v) + \mathbf{d}(u, v) \tag{3.43}$$

where $\mathbf{s}(u, v)$ is a reference shape consisting of the low-parameter global shape model, and $\mathbf{d}(u, v)$ is a displacement function consisting of the local deformations.

The reference shapes in this case are superquadrics, which are an extension of standard quadric surfaces. These surfaces have been used in a variety of applications in computer graphics and computer vision because of their ability to accommodate a large number of shapes with relatively few parameters. The kind of superquadric of interest here is the superellipsoid, which can be expressed implicitly as [74]

$$\left[\left(\frac{x}{a_1}\right)^{2/\epsilon_2} + \left(\frac{y}{a_2}\right)^{2/\epsilon_2}\right]^{\epsilon_2/\epsilon_1} + \left(\frac{z}{a_3}\right)^{2/\epsilon_1} = 1 \tag{3.44}$$

where $a_1$, $a_2$, and $a_3$ are aspect ratio parameters, and $\epsilon_1$ and $\epsilon_2$ control the squareness of the shape. Using a spherical coordinate reference frame, Terzopoulos and Metaxas employ the following expression for the superellipsoid:

$$\mathbf{s}(u, v) = a\begin{bmatrix} a_1\,C_u^{\epsilon_1}\,C_v^{\epsilon_2} \\ a_2\,C_u^{\epsilon_1}\,S_v^{\epsilon_2} \\ a_3\,S_u^{\epsilon_1} \end{bmatrix} \tag{3.45}$$

where $-\pi/2 \le u \le \pi/2$, $-\pi \le v < \pi$, $S_\theta^{\epsilon} = \operatorname{sgn}(\sin\theta)\,|\sin\theta|^{\epsilon}$, and $C_\theta^{\epsilon} = \operatorname{sgn}(\cos\theta)\,|\cos\theta|^{\epsilon}$. The parameter $a \ge 0$ controls the scale of the shape. Thus, the reference shape $\mathbf{s}$ is characterized by a total of six parameters, which can be collected into a single vector $\mathbf{q}_s$:

$$\mathbf{q}_s = (a, a_1, a_2, a_3, \epsilon_1, \epsilon_2)^{\mathsf T}. \tag{3.46}$$
The displacement function $\mathbf{d}(u, v)$ is decomposed into a linear combination of finite element basis functions and can be written as

$$\mathbf{d} = \mathbf{S}\,\mathbf{q}_d \tag{3.47}$$

where $\mathbf{S}$ is a matrix of the basis functions and $\mathbf{q}_d$ is a vector of the local deformation parameters [47].


We denote the vector of all the parameters required by the deformable superquadric to reconstruct a shape as , which consists of  ,  , as well as the
rotation and translation parameters of the model from Eq. (3.42). Terzopoulos and
Metaxas use a physics-based model based on the traditional parametric deformable
model to introduce a time variable and model the deformation process (see Section 3.2.2). Given some initialization for , a simplified dynamic force equation
can be written as

 

     

(3.48)

where the first term represents damping forces controlled by the damping matrix

 , the second term represents internal forces of the model controlled by the stiffness matrix  , and  are the external forces. As with the parametric deformable


model, the model deforms according to Eq. (3.48) until these forces reach equilibrium.
An important aspect in such a hybrid model is that the global reference shape
should account for as much of the shape to be reconstructed as possible. This is
accomplished in Eq. (3.48) by appropriately defining the stiffness matrix . In
particular, all entries of
that do not correspond to local deformations are set to
zero. This amounts to imposing no penalty on the evolution of the rotation, translation, and superquadric parameters. On the other hand, entries corresponding to
the local deformation parameters are selected such that their evolution is restricted
with respect to their magnitude and first derivative.



As with traditional parametric deformable models, the deformable superquadric
is also well suited to motion estimation tasks, as is described in [75]. For this reason, a popular application of models based on superquadrics has been in cardiac
imaging [74], where the simple shape of the heart can be readily modeled by a
superellipsoid. The deformable superquadric model has been extended by Vemuri
and Radisavljevic [76], who employed a wavelet parameterization of the local deformation process. The multiresolution nature of the wavelet decomposition allows
for a smooth transition between the global superquadric and the local descriptors.
They also present a method utilizing training data for obtaining a prior model of
the global parameters. The deformable superquadric model has also been adapted
to multilevel shape representation [77].
3.4.4 Active shape models

Active shape models (ASMs), proposed by Cootes et al. [78, 79], use a different approach to incorporate prior shape information. Their prior models are not based on the parameterization, but are instead based on a set of points defined at various features in the image. In the following, we summarize how the prior model is constructed and used to enhance the performance of a deformable model, and how the ASM paradigm can be extended to incorporate prior information on the image intensity rather than on the shape alone.
Construction of the ASM prior model
The ASM prior model is constructed by first establishing a set of labeled point features, or landmarks, within the class of images to be processed [see Figs. 3.14(a) and (b)]. These points are manually selected on each of the images in the training set⁴. Once selected, the set of points for each image is aligned to one another with respect to translation, rotation, and scaling. This is accomplished using an iterative algorithm based on the Procrustes method [80]. This linear alignment allows studying the object shape in a common coordinate frame, which we will refer to as the model space of the ASM. After the alignment, there is typically still a substantial amount of variability in the coordinates of each point. To compactly describe this variability as a prior model, Cootes and Taylor developed the Point Distribution Model (PDM), which we now describe.

Given $N$ aligned shapes $\mathbf{x}_1, \ldots, \mathbf{x}_N$ in the model space, where $\mathbf{x}_i = (x_{i1}, y_{i1}, \ldots, x_{in}, y_{in})^{\mathsf T}$ is a $2n$-dimensional vector describing the coordinates of the $n$ points from the $i$th shape, the mean shape, $\bar{\mathbf{x}}$, is defined to be

$$\bar{\mathbf{x}} = \frac{1}{N}\sum_{i=1}^{N}\mathbf{x}_i. \tag{3.49}$$

⁴See the remarks at the end of this section for recent work on automated landmark labeling.


Figure 3.14: An example of constructing Point Distribution Models. (a) An MR brain image, transaxial slice, with 114 landmark points of deep neuroanatomical structures superimposed. (b) A 114-point shape model of 10 brain structures. (c) Effect of simultaneously varying the model's parameters corresponding to the two largest eigenvalues (on a bi-dimensional grid). Images courtesy of Duta and Sonka [81], The University of Iowa.


A covariance matrix, $\mathbf{C}$, is computed by

$$\mathbf{C} = \frac{1}{N}\sum_{i=1}^{N}(\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})^{\mathsf T}. \tag{3.50}$$

The eigenvectors corresponding to the largest eigenvalues of the covariance matrix describe the most significant modes of variation. Because almost all of the variability in the model can be described using these eigenvectors, only $t$ such eigenvectors are selected to characterize the entire variability of the training set. Note that in general, $t$ is significantly smaller than the number of points in the model.

Using a principal component analysis (PCA), any shape $\mathbf{x}$ in the training set can be approximated by

$$\mathbf{x} \approx \bar{\mathbf{x}} + \mathbf{P}\,\mathbf{b} \tag{3.51}$$

where $\mathbf{P}$ is the matrix of the first $t$ eigenvectors, and $\mathbf{b} = (b_1, b_2, \ldots, b_t)^{\mathsf T}$ is a vector of weights, referred to as the shape parameters. A change of shape can be made by varying $\mathbf{b}$ accordingly. Limits on the values of $\mathbf{b}$ are imposed to constrain the actual amount of deviation from the mean shape. Figure 3.14(c) shows a collection of shapes generated for several subcortical structures from similar transaxial MR brain images by using the two most significant eigenvectors.
Model fitting procedure


The key idea of ASMs is to constrain the behavior of deformable models using the PDM obtained as described in the previous section (cf. [79, 81, 82]). At
each iteration, a standard deformation of the parametric deformable model is approximated by adjusting both the pose (translation, rotation, and scale) parameters
and the shape parameters of the model instance. Thus, only deformations that produce shapes similar to those in the training set are allowed. The iteration stops
when changes in both the pose and shape parameters are insignificant. Figure 3.15
shows an example of using active shape models to extract the heart wall from an
ultrasound image.
Let us denote the position of the model instance at the beginning of a deformation step as $\mathbf{x} = (x_1, y_1, \ldots, x_n, y_n)^{\mathsf T}$, and the required deformation computed from both internal and external forces as a displacement vector $\delta\mathbf{x} = (\delta x_1, \delta y_1, \ldots, \delta x_n, \delta y_n)^{\mathsf T}$. Then the position of the model instance, $\mathbf{x}$, can be compactly represented by its pose and shape parameters, i.e.,

$$\mathbf{x}_{\text{model}} = \bar{\mathbf{x}} + \mathbf{P}\,\mathbf{b} \qquad\text{and} \tag{3.52}$$

$$\mathbf{x} = M(s, \theta)[\mathbf{x}_{\text{model}}] + \mathbf{x}_c \tag{3.53}$$


Figure 3.15: An example of Active Shape Models. (a) An echocardiogram image. (b) The initial position of the heart chamber boundary model. The location of the model after (c) 80 and (d) 200 iterations. Images courtesy of Cootes et al. [79], The University of Manchester.

where $s$ is the scaling factor, $\theta$ is the rotation angle, $M(s, \theta)[\cdot]$ is a linear transformation that performs scaling and rotation, and $\mathbf{x}_c = (x_c, y_c, \ldots, x_c, y_c)^{\mathsf T}$ is the center of the model instance.
First, a global fit is performed by adjusting the pose parameters so that the generated model instance aligns best with the expected model instance $\mathbf{x} + \delta\mathbf{x}$. The proper pose parameter adjustments, $\delta s$, $\delta\theta$, and $\delta\mathbf{x}_c$, can be estimated efficiently using a standard least-squares approach (see [79] for details).



After adjusting the pose parameters, the remaining difference between the generated and expected model instances can be explained by varying the shape parameters. To calculate the adjustment to the shape parameters, we first need to find the corresponding residual displacement, $\delta\mathbf{x}_{\text{model}}$, in the model space, which is required to satisfy the following constraint:

$$M\big(s(1 + \delta s),\, \theta + \delta\theta\big)\,[\mathbf{x}_{\text{model}} + \delta\mathbf{x}_{\text{model}}] + (\mathbf{x}_c + \delta\mathbf{x}_c) = \mathbf{x} + \delta\mathbf{x}. \tag{3.54}$$

Solving the above equation for $\delta\mathbf{x}_{\text{model}}$ yields

$$\delta\mathbf{x}_{\text{model}} = M\big((s(1 + \delta s))^{-1},\, -(\theta + \delta\theta)\big)[\mathbf{y}] - \mathbf{x}_{\text{model}} \tag{3.55}$$

where $\mathbf{y} = M(s, \theta)[\mathbf{x}_{\text{model}}] + \delta\mathbf{x} - \delta\mathbf{x}_c$. Note that to derive Eq. (3.55), both Eq. (3.52) and $M^{-1}(s, \theta)[\cdot] = M(s^{-1}, -\theta)[\cdot]$ are used.

Having solved for $\delta\mathbf{x}_{\text{model}}$, we next find the adjustments, $\delta\mathbf{b}$, to the shape parameters such that

$$\mathbf{x}_{\text{model}} + \delta\mathbf{x}_{\text{model}} \approx \bar{\mathbf{x}} + \mathbf{P}\,(\mathbf{b} + \delta\mathbf{b}). \tag{3.56}$$

A solution can be obtained using a least-squares approximation [83], yielding

$$\delta\mathbf{b} = \mathbf{P}^{\mathsf T}\,\delta\mathbf{x}_{\text{model}}. \tag{3.57}$$

Note that $\mathbf{P}^{\mathsf T}\mathbf{P} = \mathbf{I}$, since the columns of $\mathbf{P}$ are orthonormal.
To summarize, an iteration step of the ASM consists of first finding a displacement of the model instance in the image space, then calculating the corresponding
adjustments to both the pose and shape parameters, and updating the parameters
accordingly. Note that in practice, weighted adjustments are usually used to update
both the pose and shape parameters [79]. When the shape parameters are updated,
their values are limited within a specified range so that the shape of the model
instance remains similar to the shapes of the training examples.
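To make the shape-parameter update concrete, the following minimal sketch applies Eq. (3.57) and limits each parameter; the $\pm 3\sqrt{\lambda_i}$ constraint used here is one common choice for the range mentioned above, not the only possibility.

```python
import numpy as np

def update_shape_params(dx_model, b, P, evals, k=3.0):
    """Shape-parameter update db = P^T dx_model (Eq. (3.57)), with each
    b_i clamped to +/- k standard deviations of its training eigenvalue."""
    b_new = b + P.T @ dx_model
    limit = k * np.sqrt(evals)          # e.g., the common +/- 3 sigma bound
    return np.clip(b_new, -limit, limit)
```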
Active appearance models
A limitation of the ASM is that its prior model does not consider the gray-level variation of the object instance across images. To overcome this difficulty, Edwards, Cootes, and Taylor [84–86] proposed an extension of the ASM, called the active appearance model (AAM). In an AAM, a new prior model is constructed using both shape and gray-level information. Because the objects represented by AAMs are more specific than those represented by ASMs, in many applications AAMs can lead to more robust results than ASMs.

We will now describe how AAMs are constructed. First, the shape difference of each object instance is compensated by warping the instance image in such a way that the warped instance shape matches the mean shape obtained through the PDM procedure of the ASM. The warping step is implemented using a triangulation algorithm (see [87]). The resulting shape-normalized images can then be used to analyze the gray-level variation seen across the example images.
Next, a PCA is applied to the shape-normalized images, yielding a linear model that characterizes the gray-level variation, i.e.,

$$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g\,\mathbf{b}_g \tag{3.58}$$

where $\bar{\mathbf{g}}$ is the mean normalized gray-level vector, $\mathbf{P}_g$ is a matrix consisting of the significant modes of gray-level variation, and $\mathbf{b}_g$ contains the gray-level parameters that weight the contributions of the different modes in $\mathbf{P}_g$. As described previously in Eq. (3.52), the instance shape is given by

$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s\,\mathbf{b}_s \tag{3.59}$$

Here, for consistency with Eq. (3.58), $\mathbf{P}_s$ and $\mathbf{b}_s$ are used to denote the significant modes of shape variation and the shape parameters, respectively. Thus, given any instance image of the object of interest, its shape and gray-level pattern can be represented compactly using the vectors $\mathbf{b}_s$ and $\mathbf{b}_g$.
Because the shape and gray-level parameters may be correlated, a further PCA is applied to the combined shape and gray-level vectors $\mathbf{b} = (\mathbf{W}_s\,\mathbf{b}_s^{\mathsf T}, \mathbf{b}_g^{\mathsf T})^{\mathsf T}$, where $\mathbf{W}_s$ is a diagonal matrix of weights that compensates for the difference in units between the shape and gray-level parameters. The PCA yields another linear model

$$\mathbf{b} = \mathbf{Q}\,\mathbf{c} = \begin{bmatrix} \mathbf{Q}_s \\ \mathbf{Q}_g \end{bmatrix}\mathbf{c} \tag{3.60}$$

where $\mathbf{Q}$ is a set of orthogonal modes, $\mathbf{Q}_s$ and $\mathbf{Q}_g$ are the corresponding submatrices for the shape and gray-level parameters, respectively, and $\mathbf{c}$ is referred to as the appearance parameters, which regulate the variations of both the shape and the gray-level pattern of the model.
The final representation of the shape and gray levels in terms of $\mathbf{c}$ is given by

$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{P}_s\,\mathbf{W}_s^{-1}\,\mathbf{Q}_s\,\mathbf{c} \tag{3.61}$$

$$\mathbf{g} = \bar{\mathbf{g}} + \mathbf{P}_g\,\mathbf{Q}_g\,\mathbf{c} \tag{3.62}$$

Despite the fact that the number of appearance parameters is smaller than the total number of parameters in the original gray-level vector, matching the appearance model to an unseen image can be a time-consuming task. In [85], Cootes, Edwards, and Taylor proposed a fast matching algorithm that first learns a linear relationship between matching errors and desired parameter adjustments from training examples, then uses this information to predict the parameter adjustments in the real matching process.
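As an illustration of the combined PCA of Eq. (3.60), the following minimal sketch stacks the weighted shape and gray-level parameters of the training examples and extracts the orthogonal modes; the variable names and the explicit covariance computation are our own choices.

```python
import numpy as np

def combine_appearance(Bs, Bg, Ws):
    """Combined appearance PCA, cf. Eq. (3.60).

    Bs: N x ns shape parameters, Bg: N x ng gray-level parameters,
    Ws: ns x ns diagonal weight matrix; rows of B are the combined vectors b."""
    B = np.hstack([Bs @ Ws, Bg])
    B0 = B - B.mean(axis=0)
    C = B0.T @ B0 / B.shape[0]
    evals, Q = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1]
    Q = Q[:, order]                      # columns: orthogonal modes
    c = B0 @ Q                           # appearance parameters per example
    return Q, c
```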
Remarks
In addition to the AAM extension of the ASM, there are many other extensions. Duta and Sonka [81] applied the ASM to segment subcortical structures from MR brain images. They improved the overall reconstruction accuracy of the ASM algorithm by incorporating an outlier-detection algorithm into each deformation step. Wang and Staib [82] incorporated an additional smoothness prior into the PDM to allow the generation of more flexible shape instances. They reformulated the ASM as a Bayesian problem and solved it by maximizing the a posteriori probability.

A major limitation of the ASM is the requirement to place landmarks on the training images. This procedure is a laborious task when annotating 2D images and becomes even more demanding for 3D images. This limitation, however, has been partially alleviated by recent automatic labeling work [88–90].
3.4.5 Other models

Additional extensions have also been proposed to use global shape information or prior shape information. For example, Ip and Shen [91] incorporated prior shape information by using an affine transformation to align a shape template with the deformable model and to guide the model's deformation to produce a shape consistent with the template.
The deformable Fourier model, active shape model, and other extensions we have discussed so far are all parametric deformable models. Guo and Vemuri [92] have proposed a framework for incorporating global prior shape information into geometric deformable models. Like the deformable superquadric, their hybrid geometric deformable model uses an underlying, low-parameter generator shape that is allowed to evolve. Their model thus retains the advantages of traditional geometric deformable models, such as topological adaptivity.
External forces for deformable models are typically defined from edges in the
image. Fritsch et al. [93] have developed a technique called deformable shape loci,
which uses information on the medial loci or cores of the shapes to be extracted (see
Section 14.3.11). The incorporation of cores provides greater robustness to image
disturbances such as noise and blurring than purely edge-based models. This allows
their model to be fairly robust to initialization as well as imaging artifacts. They
also employed a probabilistic prior model for important shape features as well as
for the spatial relationships between these features.
3.5 Conclusion and future directions

In this chapter, we have described the fundamental formulation of both parametric and geometric deformable models and have shown that they can be used to recover shape boundaries. We have also derived an explicit mathematical relationship between these two formulations that allows one to share the design of external forces and speed functions. This may lead to new, improved deformable models. Finally, we gave a brief overview of several important extensions of deformable models that use application-specific prior knowledge and/or global shape properties to obtain more robust and accurate results.

We expect that further improvements in deformable models will come from continued research on external force and speed function design, model representation, model training and learning, and model performance validation. Another challenging research direction is to develop deformable models that offer greater control over topology. For example, models that can either constrain or change topology depending on the requirements of an application would be extremely useful. Promising approaches have been proposed recently, such as the work by McInerney and Terzopoulos [94], who developed a hybrid method that maintains both implicit and explicit representations of a given model to allow more effective control of the topology. Finally, integrating deformable models with existing medical systems, such as surgical simulation, planning, and treatment systems, can further validate the application of deformable models in a clinical setting and may in turn stimulate the development of better deformable models.
3.6 Further reading

Several current texts deal with deformable models. The book by Blake and Yuille [95] contains an excellent collection of papers on the theory and practice of deformable models. The application of deformable models to motion tracking is covered in great depth in two recent books, [96] by Metaxas and [97] by Blake and Isard. The book edited by Singh, Goldgof, and Terzopoulos [98] is a valuable collection of papers on deformable models and their application in medical image analysis. The book by Sethian [33] on level set methods is a comprehensive resource for geometric deformable models. A recent survey paper by McInerney and Terzopoulos [99] provides an excellent source for learning about the application of deformable models in medical image analysis.
3.7 Acknowledgments

The authors would like to thank Milan Sonka, Michael Fitzpatrick, and David Hawkes for reading and commenting upon the draft of this chapter. The work was supported in part by an NSF Presidential Faculty Grant (MIP93-50336) and an NIH Grant (R01NS37747).
3.8 References

[1] S. M. Larie and S. S. Abukmeil, "Brain abnormality in schizophrenia: a systematic and quantitative review of volumetric magnetic resonance imaging studies," J. Psych., vol. 172, pp. 110–120, 1998.
[2] P. Taylor, "Invited review: computer aids for decision-making in diagnostic radiology – a literature review," Brit. J. Radiol., vol. 68, pp. 945–957, 1995.
[3] A. P. Zijdenbos and B. M. Dawant, "Brain segmentation and white matter lesion detection in MR images," Critical Reviews in Biomedical Engineering, vol. 22, pp. 401–465, 1994.

[4] A. J. Worth, N. Makris, V. S. Caviness, and D. N. Kennedy, "Neuroanatomical segmentation in MRI: technological objectives," Int'l J. Patt. Recog. Artificial Intell., vol. 11, pp. 1161–1187, 1997.
[5] C. A. Davatzikos and J. L. Prince, "An active contour model for mapping the cortex," IEEE Trans. Med. Imag., vol. 14, pp. 65–80, 1995.
[6] V. S. Khoo, D. P. Dearnaley, D. J. Finnigan, A. Padhani, S. F. Tanner, and M. O. Leach, "Magnetic resonance imaging (MRI): considerations and applications in radiotherapy treatment planning," Radiother. Oncol., vol. 42, pp. 1–15, 1997.
[7] H. W. Muller-Gartner, J. M. Links, J. L. Prince, R. N. Bryan, E. McVeigh, J. P. Leal, C. Davatzikos, and J. J. Frost, "Measurement of radiotracer concentration in brain gray matter using positron emission tomography: MRI-based correction for partial volume effects," J. Cereb. Blood Flow Metab., vol. 12, pp. 571–583, 1992.
[8] N. Ayache, P. Cinquin, I. Cohen, L. Cohen, F. Leitner, and O. Monga, "Segmentation of complex three-dimensional medical objects: a challenge and a requirement for computer-assisted surgery planning and performance," in Computer-Integrated Surgery: Technology and Clinical Applications (R. H. Taylor, S. Lavallee, G. C. Burdea, and R. Mosges, eds.), pp. 59–74, MIT Press, 1996.
[9] W. E. L. Grimson, G. J. Ettinger, T. Kapur, M. E. Leventon, W. M. Wells, et al., "Utilizing segmented MRI data in image-guided surgery," Int'l J. Patt. Recog. Artificial Intell., vol. 11, pp. 1367–1397, 1997.
[10] C. Xu and J. L. Prince, "Generalized gradient vector flow external forces for active contours," Signal Processing – An International Journal, vol. 71, no. 2, pp. 131–139, 1998.
[11] C. Xu, D. L. Pham, M. E. Rettmann, D. N. Yu, and J. L. Prince, "Reconstruction of the human cerebral cortex from magnetic resonance images," IEEE Trans. Med. Imag., vol. 18, pp. 467–480, 1999.
[12] D. Terzopoulos, "On matching deformable models to images," Technical Report 60, Schlumberger Palo Alto Research, 1986. Reprinted in Topical Meeting on Machine Vision, Technical Digest Series, vol. 12, 1987, pp. 160–167.
[13] M. Kass, A. Witkin, and D. Terzopoulos, "Snakes: active contour models," Int'l J. Comp. Vis., vol. 1, no. 4, pp. 321–331, 1987.
[14] D. Terzopoulos and K. Fleischer, "Deformable models," The Visual Computer, vol. 4, pp. 306–331, 1988.
[15] D. Terzopoulos, A. Witkin, and M. Kass, "Constraints on deformable models: recovering 3D shape and nonrigid motion," Artificial Intelligence, vol. 36, no. 1, pp. 91–123, 1988.
[16] M. A. Fischler and R. A. Elschlager, "The representation and matching of pictorial structures," IEEE Trans. on Computers, vol. 22, no. 1, pp. 67–92, 1973.
[17] B. Widrow, "The rubber-mask technique," Pattern Recognition, vol. 5, pp. 175–211, 1973.
[18] A. Blake and A. Zisserman, Visual Reconstruction. Boston: MIT Press, 1987.



[19] U. Grenander, Y. Chow, and D. M. Keenan, Hands: A Pattern Theoretic Study of Biological Shapes. New York: Springer-Verlag, 1991.
[20] M. I. Miller, G. E. Christensen, Y. Amit, and U. Grenander, "Mathematical textbook of deformable neuroanatomies," Proc. National Academy of Science, vol. 90, pp. 11944–11948, 1993.
[21] A. A. Amini, T. E. Weymouth, and R. C. Jain, "Using dynamic programming for solving variational problems in vision," IEEE Trans. Patt. Anal. Mach. Intell., vol. 12, no. 9, pp. 855–867, 1990.
[22] L. D. Cohen, "On active contour models and balloons," CVGIP: Imag. Under., vol. 53, no. 2, pp. 211–218, 1991.
[23] T. McInerney and D. Terzopoulos, "A dynamic finite element surface model for segmentation and tracking in multidimensional medical images with application to cardiac 4D image analysis," Comp. Med. Imag. Graph., vol. 19, no. 1, pp. 69–83, 1995.
[24] V. Caselles, F. Catte, T. Coll, and F. Dibos, "A geometric model for active contours," Numerische Mathematik, vol. 66, pp. 1–31, 1993.
[25] R. Malladi, J. A. Sethian, and B. C. Vemuri, "Shape modeling with front propagation: a level set approach," IEEE Trans. Patt. Anal. Mach. Intell., vol. 17, no. 2, pp. 158–175, 1995.
[26] V. Caselles, R. Kimmel, and G. Sapiro, "Geodesic active contours," in Proc. 5th Int'l Conf. Comp. Vis., pp. 694–699, 1995.
[27] R. T. Whitaker, "Volumetric deformable models: active blobs," Tech. Rep. ECRC-94-25, European Computer-Industry Research Centre GmbH, 1994.
[28] G. Sapiro and A. Tannenbaum, "Affine invariant scale-space," Int'l J. Comp. Vis., vol. 11, no. 1, pp. 25–44, 1993.
[29] B. B. Kimia, A. R. Tannenbaum, and S. W. Zucker, "Shapes, shocks, and deformations I: the components of two-dimensional shape and the reaction-diffusion space," Int'l J. Comp. Vis., vol. 15, pp. 189–224, 1995.
[30] R. Kimmel, A. Amir, and A. M. Bruckstein, "Finding shortest paths on surfaces using level sets propagation," IEEE Trans. Patt. Anal. Mach. Intell., vol. 17, no. 6, pp. 635–640, 1995.
[31] L. Alvarez, F. Guichard, P. L. Lions, and J. M. Morel, "Axioms and fundamental equations of image processing," Archive for Rational Mechanics and Analysis, vol. 123, no. 3, pp. 199–257, 1993.
[32] S. Osher and J. A. Sethian, "Fronts propagating with curvature-dependent speed: algorithms based on Hamilton-Jacobi formulations," J. Computational Physics, vol. 79, pp. 12–49, 1988.
[33] J. A. Sethian, Level Set Methods and Fast Marching Methods: Evolving Interfaces in Computational Geometry, Fluid Mechanics, Computer Vision, and Material Science. Cambridge, UK: Cambridge University Press, 2nd ed., 1999.

[34] I. Cohen, L. D. Cohen, and N. Ayache, "Using deformable surfaces to segment 3D images and infer differential structures," CVGIP: Imag. Under., vol. 56, no. 2, pp. 242–263, 1992.
[35] R. Courant and D. Hilbert, Methods of Mathematical Physics, vol. 1. New York: Interscience, 1953.
[36] J. L. Prince and C. Xu, "Nonconservative force models in active geometry," in Proc. IEEE Image and Multidimensional Signal Processing Workshop (IMDSP'98), pp. 139–142, 1998.
[37] R. Ronfard, "Region-based strategies for active contour models," Int'l J. Comp. Vis., vol. 13, no. 2, pp. 229–251, 1994.
[38] C. S. Poon and M. Braun, "Image segmentation by a deformable contour model incorporating region analysis," Phys. Med. Biol., vol. 42, pp. 1833–1841, 1997.
[39] H. Tek and B. B. Kimia, "Volumetric segmentation of medical images by three-dimensional bubbles," Comp. Vis. Imag. Under., vol. 65, pp. 246–258, 1997.
[40] L. D. Cohen and I. Cohen, "Finite-element methods for active contour models and balloons for 2-D and 3-D images," IEEE Trans. Patt. Anal. Mach. Intell., vol. 15, no. 11, pp. 1131–1147, 1993.
[41] P. E. Danielsson, "Euclidean distance mapping," Comp. Graph. Imag. Proc., vol. 14, pp. 227–248, 1980.
[42] G. Borgefors, "Distance transformations in arbitrary dimensions," Comp. Vis. Graph. Imag. Proc., vol. 27, pp. 321–345, 1984.
[43] C. Xu and J. L. Prince, "Snakes, shapes, and gradient vector flow," IEEE Trans. Imag. Proc., vol. 7, no. 3, pp. 359–369, 1998.
[44] H. Delingette, "Simplex meshes: a general representation for 3D shape reconstruction," Tech. Rep. TR2214, INRIA, Sophia-Antipolis, France, 1994.
[45] D. MacDonald, D. Avis, and A. C. Evans, "Multiple surface identification and matching in magnetic resonance images," in SPIE Proc. Visualization in Biomedical Computing, vol. 2359, pp. 160–169, 1994.
[46] D. J. Williams and M. Shah, "A fast algorithm for active contours and curvature estimation," CVGIP: Imag. Under., vol. 55, no. 1, pp. 14–26, 1992.
[47] D. Terzopoulos and D. Metaxas, "Dynamic 3D models with local and global deformations: deformable superquadrics," IEEE Trans. Patt. Anal. Mach. Intell., vol. 13, pp. 703–714, 1991.
[48] A. Gupta, L. von Kurowski, A. Singh, D. Geiger, C.-C. Liang, M.-Y. Chiu, L. P. Adler, M. Haacke, and D. L. Wilson, "Cardiac MR image segmentation using deformable models," in Proc. IEEE Conf. Computers in Cardiology, pp. 747–750, 1993.
[49] H. Delingette, "Adaptive and deformable models based on simplex meshes," in Proc. IEEE Workshop on Motion of Non-Rigid and Articulated Objects, pp. 152–157, 1994.
[50] S. Kumar and D. Goldgof, "Automatic tracking of SPAMM grid and the estimation of deformation parameters from cardiac MR images," IEEE Trans. Med. Imag., vol. 13, pp. 122–132, 1994.



[51] D. Geiger, A. Gupta, L. A. Costa, and J. Vlontzos, "Dynamic programming for detecting, tracking, and matching deformable contours," IEEE Trans. Patt. Anal. Mach. Intell., vol. 17, pp. 294–302, 1995.
[52] S. Lobregt and M. A. Viergever, "A discrete dynamic contour model," IEEE Trans. Med. Imag., vol. 14, pp. 12–24, 1995.
[53] C. Nastar and N. Ayache, "Frequency-based nonrigid motion analysis: application to four dimensional medical images," IEEE Trans. Patt. Anal. Mach. Intell., vol. 18, pp. 1067–1079, 1996.
[54] R. Durikovic, K. Kaneda, and H. Yamashita, "Dynamic contour: a texture approach and contour operations," The Visual Computer, vol. 11, pp. 277–289, 1995.
[55] T. McInerney and D. Terzopoulos, "Topologically adaptable snakes," in Proc. 5th Int'l Conf. Comp. Vis., pp. 840–845, 1995.
[56] B. B. Kimia, Conservation Laws and a Theory of Shape. Ph.D. thesis, McGill Centre for Intelligent Machines, McGill University, Montreal, Canada, 1990.
[57] M. A. Grayson, "Shortening embedded curves," Annals of Mathematics, vol. 129, pp. 71–111, 1989.
[58] J. A. Sethian, "Curvature and evolution of fronts," Commun. Math. Phys., vol. 101, pp. 487–499, 1985.
[59] J. A. Sethian, "A review of recent numerical algorithms for hypersurfaces moving with curvature dependent speed," J. Differential Geometry, vol. 31, pp. 131–161, 1989.
[60] G. Sapiro, "Geometric partial differential equations in image analysis: past, present, and future," in Proc. IEEE Int'l Conf. Imag. Proc., vol. 3, pp. 1–4, 1995.
[61] D. Adalsteinsson and J. A. Sethian, "The fast construction of extension velocities in level set methods," J. Computational Physics, vol. 148, pp. 2–22, 1999.
[62] J. A. Sethian, An Analysis of Flame Propagation. Ph.D. thesis, Dept. of Mathematics, University of California, Berkeley, CA, 1982.
[63] A. Yezzi, S. Kichenassamy, A. Kumar, P. Olver, and A. Tennenbaum, "A geometric snake model for segmentation of medical imagery," IEEE Trans. Med. Imag., vol. 16, pp. 199–209, 1997.
[64] V. Caselles, R. Kimmel, and G. Sapiro, "Geodesic active contours," Int'l J. Comp. Vis., vol. 22, pp. 61–79, 1997.
[65] S. Kichenassamy, A. Kumar, P. Olver, A. Tennenbaum, and A. Yezzi, "Conformal curvature flows: from phase transitions to active vision," Arch. Rational Mech. Anal., vol. 134, pp. 275–301, 1996.
[66] K. Siddiqi, Y. B. Lauzière, A. Tannenbaum, and S. W. Zucker, "Area and length minimizing flows for shape segmentation," IEEE Trans. Imag. Proc., vol. 7, pp. 433–443, 1998.
[67] D. L. Chopp, "Computing minimal surfaces via level set curvature flow," J. of Comp. Phys., vol. 106, pp. 77–91, 1993.

References 173
[68] P. C. Teo, G. Sapiro, and B. A. Wandell, Creating connected representations of cortical gray matter for functional MRI visualization, IEEE Trans. Med. Imag., vol. 16,
pp. 852863, 1997.
[69] C. Xu, D. L. Pham, and J. L. Prince, Finding the brain cortex using fuzzy segmentation, isosurfaces, and deformable surface models, in Proc. Information Processing
in Medical Imaging (IPMI97), pp. 399404, 1997.
[70] L. H. Staib and J. S. Duncan, Boundary finding with parametrically deformable models, IEEE Trans. Patt. Anal. Mach. Intell., vol. 14, no. 11, pp. 10611075, 1992.
[71] K. Delibasis, P. E. Undrill, and G. G. Cameron, Designing Fourier descriptor-based
geometric models for object interpretation in medical images using genetic algorithms, Comp. Vis. Imag. Under., vol. 66, pp. 286300, 1997.
[72] A. Pentland and B. Horowitz, Recovery of nonrigid motion and structure, IEEE
Trans. Patt. Anal. Mach. Intell., vol. 13, pp. 730742, 1991.
[73] K. H. Huebner, E. A. Thornton, and T. G. Byrom, The Finite Element Method for
Engineers. New York: John Wiley & Sons, 3rd ed., 1994.
[74] E. Bardinet, L. D. Cohen, and N. Ayache, A parametric deformable model to fit
unstructured 3D data, Comp. Vis. Imag. Under., vol. 71, pp. 3954, 1998.
[75] D. Metaxas and D. Terzopoulos, Shape and nonrigid motion estimation through
physics-based synthesis, IEEE Trans. Patt. Anal. Mach. Intell., vol. 15, pp. 580591,
1993.
[76] B. C. Vemuri and A. Radisavljevic, Multiresolution stochastic hybrid shape models
with fractal priors, ACM Trans. Graph., vol. 13, pp. 177207, 1994.
[77] D. Metaxas, E. Koh, and N. J. Badler, Multi-level shape representation using global
deformations and locally adaptive finite elements, Intl J. Comp. Vis., vol. 25, pp. 49
61, 1997.
[78] T. F. Cootes, A. Hill, C. J. Taylor, and J. Haslam, Use of active shape models for locating structures in medical images, Imag. Vis. Computing J., vol. 12, no. 6, pp. 355
366, 1994.
[79] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, Active shape models their
training and application, Comp. Vis. Imag. Under., vol. 61, no. 1, pp. 3859, 1995.
[80] J. C. Gower, Generalized Procrustes analysis, Psychometrika, vol. 40, pp. 3351,
1975.
[81] N. Duta and M. Sonka, Segmentation and interpretation of MR brain images: an
improved active shape model, IEEE Trans. Med. Imag., vol. 17, pp. 10491062,
1998.
[82] Y. Wang and L. H. Staib, Boundary finding with correspondence using statistical
shape models, in Proc. IEEE Conf. Comp. Vis. Patt. Recog., pp. 338345, 1998.
[83] G. H. Golub and C. F. V. Loan, Matrix Computations. Baltimore, MD: The Johns
Hopkins University Press, 3rd ed., 1996.

174 Image Segmentation Using Deformable Models


[84] G. J. Edwards, C. J. Taylor, and T. F. Cootes, Interpreting face images using active
appearance models, in Proc. Intl Conf. Automatic Face Gesture Recog., pp. 300
305, 1998.
[85] T. F. Cootes, G. J. Edwards, and C. J. Taylor, Active appearance models, in Proc.
European Conf. Comp. Vis., pp. 484498, 1998.
[86] T. F. Cootes, C. Beeston, G. J. Edwards, and C. J. Taylor, A unified framework for
atlas matching using active appearance models, in Proc. Information Processing in
Medical Imaging (IPMI99), pp. 323333, 1999.
[87] G. J. Edwards, C. J. Taylor, and T. F. Cootes, Learning to identify and track faces in
image sequences, in Proc. British Mach. Vis. Conf., pp. 317322, 1997.
[88] A. Hill and C. J. Taylor, Automatic landmark identification using a new method
of non-rigid correspondence, in Proc. Information Processing in Medical Imaging
(IPMI97), pp. 483488, Springer-Verlag, 1997.
[89] A. C. W. Kotcheff and C. J. Taylor, Automatic construction of eigenshape models by
genetic algorithm, in Proc. Information Processing in Medical Imaging (IPMI97),
pp. 114, Springer-Verlag, 1997.
[90] N. Duta, M. Sonka, and A. K. Jain, Learning shape models from examples using
automatic shape clustering and Procrustes analysis, in Proc. Information Processing
in Medical Imaging (IPMI99), pp. 370375, 1999.
[91] H. H. S. Ip and D. Shen, An affine-invariant active contour model (AI-snake) for
model-based segmentation, Imag. Vis. Computing J., vol. 16, pp. 135146, 1998.
[92] Y. Guo and B. C. Vemuri, Hybrid geometric active models for shape recovery in
medical images, in Proc. Information Processing in Medical Imaging (IPMI99),
pp. 112125, 1999.
[93] D. Fritsch, S. Pizer, L. Yu, V. Johnson, and E. Chaney, Segmentation of medical image objects using deformable shape loci, in Proc. Information Processing in Medical
Imaging (IPMI97), pp. 127140, 1997.
[94] T. McInerney, Topologically Adaptable Deformable Models for Medical Image Analysis. Ph.D. thesis, Department of Computer Science, University of Toronto, 1997.
[95] A. Blake and A. Yuille, eds., Active Vision. Series: Artificial intelligence, Cambridge,
Massachusetts: The MIT Press, 1992.
[96] D. N. Metaxas, Physics-Based Deformable Models. Boston: Kluwer Academic Publishers, 1996.
[97] A. Blake and M. Isard, Active Contours: The Application of Techniques from Graphics, Vision, Control Theory and Statistics to Visual Tracking of Shapes in Motion.
New York: Springer-Verlag, 1998.
[98] A. Singh, D. Goldgof, and D. Terzopoulos, Deformable models in medical image
analysis. Los Alamitos, CA: IEEE Computer Society, 1998.
[99] T. McInerney and D. Terzopoulos, Deformable models in medical image analysis: a
survey, Med. Imag. Anal., vol. 1, no. 2, pp. 91108, 1996.

CHAPTER 4
Morphological Methods for Biomedical Image
Analysis
John Goutsias
The Johns Hopkins University
Sinan Batman
The Johns Hopkins University

Contents
4.1
4.2

Introduction
Binary morphological operators

177
182

4.2.1

Increasing and translation invariant operators

182

4.2.2
4.2.3
4.2.4

Erosion and dilation


Representational power of erosions and dilations
Opening and closing

182
185
187

4.2.5

Representational power of structural openings and closings


190
The hit-or-miss operator
192
Morphological gradients
194

4.2.6
4.2.7

4.3

4.4

4.2.8 Conditional dilation


4.2.9 Annular openings and closings
4.2.10 Morphological filters
Morphological representation of binary images
4.3.1 The discrete size transform

195
195
199
201
201

4.3.2

202

The pattern spectrum

4.3.3 The morphological skeleton transform


Grayscale morphological operators
4.4.1 Threshold decomposition
4.4.2 Increasing and translation invariant operators

204
206
207
209

4.4.3

209

Erosion and dilation

175

176 Morphological Methods for Biomedical Image Analysis


4.4.4
4.4.5

Representational power of erosions and dilations


Opening and closing

214
214

4.4.6

4.5
4.6

4.7

Representational power of structural openings and closings


4.4.7 Flat image operators
4.4.8 Morphological gradients
4.4.9 Opening and closing top-hat
4.4.10 Conditional dilation

218
218
220
221
223

4.4.11 Morphological filters


Grayscale discrete size transform
Morphological image reconstruction
4.6.1 Reconstruction of binary images
4.6.2 Reconstruction of grayscale images

223
225
227
227
230

4.6.3

233

Examples

Morphological image segmentation


4.7.1 The distance transform
4.7.2 Skeleton by influence zones (SKIZ)

237
237
240

4.7.3

Watershed-based segmentation of nonoverlapping particles


241

4.7.4
4.7.5
4.7.6

Geodesic SKIZ
242
Watershed-based segmentation of overlapping particles 245
Grayscale segmentation
246

4.7.7 Examples
4.8 Conclusions and further discussion
4.9 Acknowledgments
4.10 References

253
260
262
263

Introduction 177
This chapter is an introduction to an image processing and analysis tool known
as mathematical morphology. Our purpose is to provide an easy-to-read overview
of basic aspects of this area of research and illustrate its applicability in image
processing and analysis problems related to biomedical imaging.
4.1

Introduction

In most image processing and analysis applications, a given image is transformed


  .  is deinto an image  by means of an image operator  , such that 
signed to perform a specific task; for example, to remove noise from while preserving important image attributes (e.g., object boundaries). Usually, the image operator  is limited to satisfy two fundamental and intuitive properties: distributivity
and translation invariance. If is an image obtained by combining two images
and  with a given operation, then a distributive operator  applied on produces
an image  that is the result of combining images  and  , obtained by applying
 on
and  , respectively, with the same operation. This property guarantees
that the effect of operator  on the combined image can be deduced from applying
 on the individual images. On the other hand, if is an image obtained by translating an image , then  is a translation invariant operator if 
   implies
  . This property guarantees that operator  produces the same result
that 
(within a translation) when applied on translated versions of the same image.
The class of distributive image operators clearly depends on the way images
are combined together. When images are combined by standard addition (i.e., by
adding the graylevel values at each pixel), a distributive operator is called linear
(provided that it also satisfies the scaling property      , for every image
and any constant ). For a linear and translation invariant operator  , we have
that
   



 







 

 



(4.1)

at each pixel     , which is known as the convolution equation. Throughout


this chapter, will either be the two-dimensional Euclidean space  (for images
defined over a continuous space), or the two-dimensional discrete space   (for
images that have been digitized). In Eq. (4.1),    is the point spread function
of the linear operator  . In this case, the design of image operator  is equivalent
to constructing an appropriate point spread function   .
Image operators of the form of Eq. (4.1) have been extensively used in image
processing and analysis (e.g., see [13]). The main reason for their popularity is
that Eq. (4.1) implies
   

        



 

 

(4.2)

where      denotes the 2D Fourier transform of an image   , and


     is the frequency response of operator  . Equation (4.2) is much simpler

178 Morphological Methods for Biomedical Image Analysis


than Eq. (4.1) and dramatically simplifies mathematical calculations and numerical
implementations. However, the product form of Eq. (4.2) reveals a major drawback
for using operators of the form of Eq. (4.1) in image processing and analysis. We
illustrate this argument with a simple example.
Assume that
 
, where  is a noise-free image and
is additive noise.
We are interested in designing an image operator  that removes the noise
while
preserving the image  . From Eq. (4.2), and the fact that          
     , where     and      are the 2D Fourier transforms of
    and
  , respectively, it is clear that
   

        

                  

If
    

then
   




for          
otherwise

     for          

otherwise

If the Fourier transform      of


is zero at low frequencies (which occurs
in many practical situations), the resulting image  will be a lowpass version of the
noise-free image  and the most desirable attributes in  (e.g., object boundaries)
will be smoothed out!
Another drawback of image operators of the form of Eq. (4.1) is that they
cannot be applied on binary (black and white) images. The reason is that linear
operators of the form of Eq. (4.1) assume that images are combined by standard
addition, which is not possible in the binary case. An obvious question that arises
here is: how do we superimpose binary images? A binary image (object)   
can be either considered to be a function from into   or a set  defined by

      . In the latter case, the set-complement   of
 is given by  
      . Note that, when we look at
two overlapping objects  and  , we either perceive both objects as one (i.e., we
perceive their union    ), or we perceive the non-occluded part of each object
individually (i.e., we perceive their set difference   or   ). This is illustrated in Fig. 4.1. Since   
 , this suggests that binary images are
superimposed by either union or intersection. In this framework, a distributive set
operator  satisfies


  
 

(4.3)

Introduction 179
F2 \ F1 = F1c I F2

F1 U F2

F2

F1

F2

F1

(a)

(b)

Figure 4.1: When we look at two overlapping objects

and  : (a) we either perceive both


objects as one (i.e., we perceive their union
 ), or (b) we perceive the non-occluded
  ).
part of each object individually (i.e., we perceive their set difference  

or

       

(4.4)

Any set operator that satisfies Eq. (4.3) is called a binary erosion, whereas any set
operator that satisfies Eq. (4.4) is called a binary dilation. Erosions and dilations
are the most elementary operators of mathematical morphology [47]. We may
therefore argue that mathematical morphology is a natural algebraic framework
for binary image processing and analysis.
Mathematical morphology was first introduced in a seminal book by George
Matheron, entitled Random Sets and Integral Geometry [4]. This book laid down
the foundation of mathematical morphology and introduced it as a novel technique
for image processing and analysis. Mathematical morphology was subsequently
enriched and popularized by the highly inspiring book Image Analysis and Mathematical Morphology by Jean Serra [5]. Today, mathematical morphology is considered to be a powerful tool for image processing and analysis and has been used in
numerous applications, including industrial inspection, automatic target detection,
biomedical imaging, and remote sensing, just to mention a few.
From a theoretical point of view, mathematical morphology studies operators
between complete lattices (i.e., nonempty sets furnished with a partial order relationship for which every subset has an infimum and a supremum); see [6, 810] for
a lattice theoretic approach to mathematical morphology. Reasons why the theory
of complete lattices is the right algebraic framework for mathematical morphology
can be found in [11]. The power of mathematical morphology stems from the fact
that any translation invariant operator between complete lattices can be represented
by means of elementary morphological operators (e.g., see [12, 13]). An image operator can be built by composing elementary morphological operators. However,
this approach is not practical due to the need of using a prohibitively large number of elementary operators. Fortunately, most applications can be satisfactorily

180 Morphological Methods for Biomedical Image Analysis


dealt with by restricting the number of morphological operators. The key task of
the image analyst is to identify the family of morphological operators needed for a
particular problem, whether it is for shape detection, extraction, or filtering.
The concept of erosion (dilation) can be extended to the grayscale case in a
rather straightforward manner. As it will soon become clear, this is accomplished
by means of the so-called threshold decomposition, which uniquely represents a
grayscale image as a collection of binary images, known as the cross sections.
Using threshold decomposition, the grayscale analogues of union and intersection
are supremum and infimum, respectively (or maximum and minimum in the case
of a finite number of images). Therefore, and as far as grayscale morphology is
concerned, two grayscale images and  are combined by means of (pixelwise)
minimum or maximum, in order to produce images  and  , given by

  
  

   
    

where and denote infimum and supremum, respectively. In this framework, a


distributive grayscale operator  either satisfies






   

(4.5)







   

(4.6)

or

Any operator that satisfies Eq. (4.5) is called a grayscale erosion, whereas any
operator that satisfies Eq. (4.6) is called a grayscale dilation.
Later in this chapter, we will see that for a grayscale translation invariant erosion  , we have
 

 




 






 


(4.7)

whereas, for a grayscale translation invariant dilation  , we have


 

 




 
  

 




(4.8)

where    is a grayscale image known as the structuring function. Notice the
striking similarity between Eq. (4.1) and Eq. (4.7), Eq. (4.8). For example, by
comparing Eq. (4.1) with Eq. (4.8), it is clear that Eq. (4.8) is obtained by replacing
the integral in Eq. (4.1) with the supremum, the point spread function with the
structuring function, and the product with an addition. For this reason, Eq. (4.7)
and Eq. (4.8) are sometimes called morphological convolutions.

Introduction 181
Our purpose in this chapter is to provide an easy-to-read overview of basic
aspects of mathematical morphology and illustrate its use in image processing
and analysis problems related to biomedical imaging. The reader is referred to
[2, 3, 14, 15] for an elementary treatment and to [47, 1619] for a more advanced
exposition. The number of publications available on mathematical morphology has
grown substantially. Due to space limitation, we limit our references here to publications that complement our exposition. For works using mathematical morphology in biomedical imaging applications, the reader is referred to [2053], among
many other publications.
In this chapter, we first choose to discuss mathematical morphology on binary
images. This is the subject of Sections 4.2 and 4.3. We then extend our exposition
to mathematical morphology on grayscale images. This is the subject of Sections
4.4 and 4.5. Although such a route produces some redundancy, we believe that is
more pleasing to the reader for a very simple reason: binary morphology can be
easily visualized by means of geometry, making the exposition very intuitive. In
Section 4.6, we discuss the problem of binary and grayscale morphological image
reconstruction. In mathematical morphology, image reconstruction is the process
of extracting desirable parts from a given image, which have been marked by a
set of markers. Image reconstruction turns out to be very effective in problems of
object detection and image segmentation. The problem of binary and grayscale
image segmentation by means of morphological operators and, in particular, by
means of the watershed transform is discussed in Section 4.7. Finally, Section 4.8
summarizes our concluding remarks and provides a brief discussion on some recent
developments in mathematical morphology. Moreover, this section gives a list of
web sites where mathematical morphology software and toolboxes can be found.
Throughout this chapter, we illustrate basic concepts by means of examples
derived from biomedical imaging applications. Most of these examples are used for
illustration purposes only and they should not be thought of as validated solutions
to the biomedical imaging problems associated with them.
Before we proceed, we would like to make some remarks regarding our notation. Grayscale images (functions) are denoted with small letters      . Capital
letters      denote binary images (sets). For simplicity of presentation, we
limit ourselves to images defined over the two-dimensional Euclidean plane  or
the two-dimensional discrete plane   . However, most of the concepts discussed
here can be extended to higher dimensions (e.g., see [31]). The pair    will
denote a point (pixel) in  or   . However, we also use      to denote points
in . Finally, set operators (i.e., operators that apply on binary images and produce binary images) will be denoted with capital Greek letters      , whereas
grayscale operators (i.e., operators that apply on grayscale images and produce
grayscale images) will be denoted with small Greek letters      .

182 Morphological Methods for Biomedical Image Analysis


4.2

Binary morphological operators

Mathematical morphology is a tool for extracting geometric information from binary and grayscale images. A shape probe (known as a structuring element) is used
to build an image operator whose output depends on whether or not this probe fits
inside a given image. Clearly, the nature of the extracted information depends on
the shape and size of the probe used. To illustrate this concept, we initially restrict
our discussion to the case of binary images.
4.2.1

Increasing and translation invariant operators

The most elementary set operators of interest to mathematical morphology are operators that are increasing. A set operator  is increasing if


     

An increasing operator  forbids an object  that is occluded by an object   to


become visible after processing (i.e.,   will still be occluded by  ).
On the other hand, translation invariance plays an important role in image processing and analysis. In most applications, an image object should be processed
the same way no matter where it is located on the imaging plane. If   
        denotes the translation of a set  by   , then a set operator 
is translation invariant if

       

for every translation   . A set operator may be increasing but not translation
invariant and vice versa. Most frequently, however, we are interested in operators
that are both increasing and translation invariant.
4.2.2

Erosion and dilation

As we discussed in Section 4.1, a set operator   , for which

 
  
 

(4.9)

for every pair     of binary images, is a binary erosion, whereas, a set operator
 , for which

          
for every pair     of binary images, is a binary dilation.

(4.10)

As a direct consequence of Eq. (4.9) and Eq. (4.10), both erosion and dilation are increasing
 

operators (e.g., if   , then Eq. (4.9) implies   
  
   ).
Every translation invariant erosion is of the form

 



Binary morphological operators 183


F B

B+h

F 8B
F 8B

B:
(
B+h

B:

(a)

(b)

Figure 4.2: The effect of translation invariant erosion, in (a), and translation invariant dilation, in (b), on a shape

. Notice that

is shrunk in (a) whereas it is expanded in

(b).

for some elementary subset  of the two-dimensional space, known as a structuring


element (e.g., see [6]). In set theory, the translation invariant erosion is known as
the Minkowski subtraction. It can be shown that


          

This formula provides a geometric interpretation for the translation invariant erosion. It suggests that the translation invariant erosion of a set (shape)  by a structuring element  comprises all points  of such that the structuring element 
located at  fits entirely inside  .
The effect of the translation invariant erosion, using a disk structuring element,
is illustrated in Fig. 4.2(a). Notice that  is shrunk (i.e.,     ) in a manner
determined by the shape and size of the structuring element  . This is always true
when the structuring element contains the origin. In general, a set operator  for
which    , for every image  , is called anti-extensive.
Similarly, every translation invariant dilation is of the form

  





for some structuring element  . In set theory, the translation invariant dilation is
known as the Minkowski addition. It can be shown that

       
   
       is the reflection of  with respect to the origin.


where 
This
formula suggests that the dilation of a set (shape)  by a structuring element 

184 Morphological Methods for Biomedical Image Analysis


Table 4.1: Some properties of binary erosion.
F 8{h} = F - h
( F + h )8 B = F 8( B - h ) = ( F 8 B ) + h

Translation invariance

F 8 B1 F 8 B2 if B1 B2

Decreasingness with respect to


structuring element

F 8( B1 U B2 ) = ( F 8 B1 ) I( F 8 B2 )

Parallel composition

( F1 I F2 )8 B = ( F1 8 B ) I( F2 8 B )

Distributivity of intersection

F 8( B1 I B2 ) ( F 8 B1 ) U( F 8 B2 )

Parallel composition inequality

( F1 U F2 )8 B ( F1 8 B ) U( F2 8 B )

Parallel composition inequality

( F 8 B1 )8 B2 = F 8( B1 B2 )

Serial composition

F1 F2 F1 8 B F2 8 B

Increasingness with respect to shape

rF 8rB = r ( F 8 B )

Homogeneity

F 8 B F if B contains the origin


(
F 8 B = ( F c B )c

Anti extensitivity
Duality

comprises all points  such that the reflected structuring element  translated to

 hits (intersects)  . Moreover,         , which is a form of duality
between erosion and dilation. This relationship simply says that the dilation of a
shape  by a structuring element  is the set complement of the erosion of the set

complement of  with structuring element  .
The effect of the translation invariant dilation, using a disk structuring element,
is illustrated in Fig. 4.2(b). Notice that  is expanded (i.e.,     ) in a
manner determined by the shape and size of  . This is always true when  contains
the origin. In general, a set operator  for which     , for every image  , is
called extensive.
Some properties of erosion and dilation are listed in Tables 4.1 and 4.2. In these
tables, 
      is the scaled version of  . Throughout this chapter,
we simply refer to translation invariant erosion and dilation as erosion and dilation,
respectively.
The effect of erosion and dilation on a binarized image of a histological breast
sample is shown in Fig. 4.3. Notice that erosion by a disk structuring element
with a diameter of pixels, effectively removes debris, producing the clean image
depicted in Fig. 4.3(c). However, when a larger disk is used, significant portions of
the image are removed as well. This is depicted in Fig. 4.3(d). Clearly, the erosion
is very sensitive to pepper noise (i.e., small black dots) in the image, which tends

Binary morphological operators 185


Table 4.2: Some properties of binary dilation.
F {h} = F + h
F B = B F

Commutativity

( F + h) B = F ( B + h) = ( F B) + h

Translation invariance

F B1 F B2 if B1 B2

Increasingness with respect to


structuring element

F ( B1 U B2 ) = ( F B1 ) U( F B2 )

Parallel composition

( F1 U F2 ) B = ( F1 B ) U( F2 B )

Distributivity of union

F ( B1 I B2 ) ( F B1 ) I( F B2 )

Parallel composition inequality

( F B1 ) B2 = F ( B1 B2 )

Serial composition

F1 F2 F1 B F2 B

Increasingness with respect to shape

rF rB = r ( F B )

Homogeneity

F B F if B contains the origin


(
F B = ( F c 8 B )c

Extensitivity
Duality

to get larger in size. Reciprocally, dilation effectively expands the main features
in an image, as depicted in Figs. 4.3(e),(f).
4.2.3

Representational power of erosions and dilations

A celebrated theorem by George Matheron states that every translation invariant


and increasing set operator can be expressed as a union of erosions or intersection
of dilations [4]. To be more precise, for any translation invariant and increasing set
operator , we have

 





 




(4.11)

where  is the kernel of operator , defined by

     contains the origin 


(4.12)
and  is the dual of operator , given by      
 . This result, however,


is mainly of theoretical interest since, in practice, the kernel is highly redundant.


Usually, a subset  of the kernel  is found, known as the basis of ,
which is much smaller than the kernel and can still represent the operator [54]. We
illustrate this for the case of a median filter.
The median filter is an increasing and translation invariant operator, and as
such, it is amenable to the representation dictated by Eq. (4.11). Indeed, the median

186 Morphological Methods for Biomedical Image Analysis

(a)

(b)

(c)

(d)

(e)

(f)

Figure 4.3: Binary erosion and dilation: (a) original grayscale image of the histologic appearance of fibrocystic changes in breast; (b) original image after binarization; (c) erosion

 pixels; (d) erosion of (b) by a disk


 pixels; (e) dilation of (c) by a disk structuring el-

of (b) by a disk structuring element with a diameter of


structuring element with a diameter of
ement with a diameter of

 pixels; (f) dilation of (c) by a disk structuring element with a

diameter of  pixels. Data courtesy of K. C. Klatt, Department of Pathology, University of


Utah. Used with permission.

Binary morphological operators 187


filter over the cross mask           , centered at the
origin  , has a basis that consists of the following ten structuring elements [55]:











 

 














 

 













 

 











 

 
















where the underlined digit denotes the center of the structuring element. Notice
that the cross mask contains five elements. The output of the median filter, with
the mask placed at a pixel    of an image  , is  if at least three of the pixels
in the mask have value  and is zero otherwise. This is equivalent to at least one
of the structuring elements in the basis, shifted at pixel   , being a subset of
 . According to Eq. (4.11), with  being replaced by , the median filter
over the cross mask can be implemented as the union of ten erosions with the
structuring elements listed above. Notice that the erosion of  by any of these
structuring elements can be implemented by an AND operation, whereas the union
of erosions can be implemented by an OR operation. Therefore, the representation
of Eq. (4.11) on one hand provides a link between erosions and median filters, and
on the other hand it provides a way of implementing median filters by means of
AND and OR operations.
4.2.4

Opening and closing

In general, the erosion and dilation operators do not enjoy an often desirable property, known as idempotence. An idempotent shape operator excerpts all information
from a single application, while consecutive applications of the same operator will
not have any effect. The ideal bandpass filter is an example of a linear operator
that shares this property. The erosion and dilation operators on the other hand keep
modifying the image at each successive application. However, when a translation
invariant erosion    is composed with a translation invariant dilation    , the
resulting operator is idempotent, regardless of the order of composition. The composition       is called an opening, whereas the composition      
is called a closing.
The notions of opening and closing are, however, more general: a set operator  that is increasing (i.e.,       ), anti-extensive (i.e.,
   ), and idempotent (i.e.,    ) is called an opening, whereas
a set operator  that is increasing, extensive (i.e.,   ), and idempotent is
called a closing. It can be shown that openings and closings are closed with respect
to union and intersection, respectively. This means that union of openings is an
opening, whereas intersection of closings is a closing.
The operator

 

     

(4.13)

188 Morphological Methods for Biomedical Image Analysis


F B

B+h

F B
F B
F

B:

B:
(
B+h

(a)

(b)

Figure 4.4: The effect of structural opening, in (a), and structural closing, in (b), on a shape
.

is referred to as the structural opening, since it involves a structuring element  .


Later, the reader will have the opportunity to encounter openings that are not of this
form. It can be shown that

 

            

This formula provides a geometric interpretation for the structural opening in Eq.
(4.13). It suggests that   is the union of all translated structuring elements
   that fit inside  . The effect of the structural opening   (using a disk
structuring element) is illustrated in Fig. 4.4(a). Notice that   attempts to undo
the effect of erosion    , by applying the associated dilation       .
Opening a shape  with a structuring element  removes all components of  that
are smaller than  , in the sense that they cannot contain any translated replica
of  . It therefore acts as a smoothing filter. The amount and type of smoothing is
determined by the shape and size of the structuring element used.
Similarly, the operator

 

     

is referred to as the structural closing. It can be shown that

 

           
   


This formula suggests that   is the collection of all pixels  such that all

translated structuring elements    which contain  intersect  . The effect
of the structural closing   (using a disk structuring element) is illustrated in
Fig. 4.4(b). Notice that   attempts to undo the effect of dilation    by

Binary morphological operators 189


Table 4.3: Some properties of the binary structural opening.
F {h} = F
( F + h) B = ( F B) + h

Translation invariance

F ( B + h) = F B

F1 F2 F1 B F2 B

Increasingness with respect to shape

rF rB = r ( F B )

Homogeneity

FB F

Anti extensitivity

( F B) B = F B
(
F B = ( F c B )c

Idempotence
Duality

applying the associated erosion      . Closing a shape  with a structuring




element  removes all components of   that are smaller than  . This is a direct
consequence of the duality between structural openings and closings, which says

that       .
Some properties of structural openings and closings are listed in Tables 4.3 and
4.4, respectively. The effect of these operators on a binarized image of a histological breast sample is depicted in Fig. 4.5 (see also Fig. 4.3).
Another useful opening operator is the so-called binary area opening. This
operator removes a grain from a binary image whose area is less than a given value.
By a grain of an image  , we mean a connected component  of  . A component
  is connected if, given two pixels in  , there exists at least one path
that connects these two pixels and lies entirely in  . Mathematically, the binary
area opening is expressed by

 
 
      
where   
      are the grains of  and  denotes the area of 
(when  is a discrete set,  denotes the number of its elements, or the so-called

cardinality). It is not difficult to see that, for a fixed value of , this operator is
increasing, anti-extensive, and idempotent and, therefore, is an opening. The area
opening is a morphological filter that filters-out grains of area less than a specified
value, whereas it passes grains of area larger than this value.
By duality, the binary area closing is defined as follows:

  
   
  
 
The area closing fills in the holes in a binary image, whose area is strictly smaller
than .

190 Morphological Methods for Biomedical Image Analysis


Table 4.4: Some properties of the binary structural closing.
F {h} = F
( F + h ) B = ( F B ) + h

Translation invariance

F ( B + h) = F B

4.2.5

F1 F2 F1 B F2 B

Increasingness with respect to shape

rF rB = r ( F B )

Homogeneity

F F B

Extensitivity

( F B ) B = F B
(
F B = ( F c B )c

Idempotence
Duality

Representational power of structural openings and closings

As we mentioned in the previous subsection, the union of openings is still an opening, whereas the intersection of closings is a closing. This simple observation is
very useful in practice, since it allows the design of openings and closings by taking the union or intersection of elementary openings or closings, respectively. This
observation, however, is more general. It can be shown (e.g., see [6]) that any translation invariant opening 
(i.e., a translation invariant operator that is increasing,
anti-extensive, and idempotent) can be written as a union of structural openings,
whereas any translation invariant closing  
(i.e., a translation invariant operator
that is increasing, extensive, and idempotent) can be written as an intersection of
structural closings, i.e.,


 

 

  

  
 

  


(4.14)

  

where  denotes the invariance domain of operator , given by 


     .
Recall that translation invariant openings and closings are increasing operators.
Therefore, they admit a decomposition in terms of a union of erosions or an intersection of dilations, as dictated by Eq. (4.11). However, the design of a translation
invariant opening or closing based on these types of decompositions is cumbersome
and less intuitive. On the other hand, the design of translation invariant openings
and closings by means of Eq. (4.14) is more intuitive and easier to implement. We
illustrate these remarks with a simple example.
Consider the problem of detecting slit-like cholesterol clefts of lipid material
in the aortic atheromatous plaque that lead to coronary atherosclerosis. A sample
of aortic atheromatous plaque is shown in Fig. 4.6(a). Notice that cholesterol

Binary morphological operators 191

(a)

(b)

(c)

(d)

(e)

(f)

Figure 4.5: An example of binary structural opening and closing: (a) original grayscale image of the histologic appearance of fibrocystic changes in breast; (b) original image after
binarization; (c) structural opening by a disk structuring element with a diameter of  pixels;
(d) structural opening by a disk structuring element with a diameter of

 pixels; (f) structural closing


 pixels. Data courtesy of K. C. Klatt,

tural closing by a disk structuring element with a diameter of


by a disk structuring element with a diameter of

 pixels; (e) struc-

Department of Pathology, University of Utah. Used with permission.

192 Morphological Methods for Biomedical Image Analysis


clefts are the randomly rotated white elongated structures on the left-hand side of
the image. The vertical bright streaks on the right-hand side belong to the artery
wall and are of no interest here. A union of structural openings, using linear
structuring elements (a horizontal, a left diagonal, and a right diagonal) of length
, is applied to the image. It is hoped that the structuring elements will fit inside
the majority of the clefts while eliminating noise due to binarization and unwanted
structures. The results are depicted in Figs. 4.6(c)(e). Observe that all unwanted
debris and external structures have been eliminated. The resulting opening does a
formidable job in isolating all but the vertically aligned cholesterol clefts. This
process could be augmented, by adding a vertical structuring element, to extract
more of the vertically oriented clefts that remain. However, this may erroneously
mark part of the coronary structure as clefts.
4.2.6

The hit-or-miss operator

Instead of only probing the inside, or the outside, of a given binary image with a
structuring element, it may be fruitful in certain applications to probe both the background and the foreground at the same time. The hit-or-miss operator formalizes
this idea. It is defined by

  

          

and       

where  and  are two structuring elements, such that 



. This says
that a point  lies in the hit-or-miss transformed set      if and only if the
translated structuring element    does not hit   (or, equivalently, it is contained
in  ) and the translated structuring element    misses  (or, equivalently, it is
contained in   ). It can shown that


      
     

Therefore, the hit-or-miss transformed set contains all points that simultaneously
belong to the erosion of the foreground  by the structuring element  and the
erosion of the background   by the structuring element  . Some properties of
this operator are listed in Table 4.5. It is required that 
  for the hit-or-miss
operator not to result in an empty set. The hit-or-miss operator is well suited to the
task of locating points inside an object with certain (local) geometric properties;
e.g., isolated points, edge points, corner points, etc. (e.g., see [6, 56]).
The hit-or-miss operator leads to an extraordinary result, due to Banon and Barrera [12]: any translation invariant set operator can be represented as a union of
hit-or-miss operators. This is clearly a more powerful result than the representation
by a union of erosions (or an intersection of dilations), which is limited to the case
of increasing and translation invariant operators.
To be formal, we first need two definitions. Given two sets  and  , the interval
 
is the collection of all sets  such that    . Given a set operator ,

Binary morphological operators 193

(a)

(b)

(c)

(d)

(e)
Figure 4.6: Filtering via a union of openings: (a) original grayscale image of aortic atheromatous plaque; (b) original image after binarization; (c) the result of applying a union of
openings, with  linear structuring elements (a horizontal, a left diagonal, and a right diagonal) of length , on the image in (b); (d) filter residue (the set difference between images (b)

and (c)); (e) the result in (c) overlaid on the original grayscale image in (a). Data courtesy
of E. C. Klatt, Department of Pathology, University of Utah. Used with permission.

194 Morphological Methods for Biomedical Image Analysis


Table 4.5: Some properties of the hit-or-miss operator.
( F + h ) ( A, B ) = [ F ( A, B )] + h

Translation Invariance

F c ( A, B ) = F ( B, A)

F ( A + h, B + h ) = [ F ( A, B )] - h
rF ( rA, rB ) = r[ F ( A, B )]

Homogeneity

A = F ( A, B ) = F 8 B

Decreasing operator

B = F ( A, B ) = F 8 A

Increasing operator

A I B F ( A, B ) =

the bi-kernel  of  is defined by

      
 
where  is the kernel of operator , defined by Eq. (4.12).
translation invariant set operator , we have that (e.g., see [6])
 

     

Then, for any

(4.15)

 
 

The design of binary morphological operators, based on the representation in


Eq. (4.15), has been investigated in [57]. However, this approach often needs a
large training set making its use in medical imaging applications difficult.
4.2.7

Morphological gradients

The set operator

 

  

(4.16)

where  is a structuring element that contains the origin, is known as the morphological gradient.    is a boundary peeler that estimates the boundary of an
object  . This operator is often used to obtain surface area estimates in 3D binary
images. The result depends on the size and shape of the structuring element  and
is less affected by boundary noise when compared to differential edge detectors.
By using variants of Eq. (4.16), it is possible to detect external or internal
boundaries. Indeed, the external morphological gradient operator

 

 

Binary morphological operators 195


calculates the external boundary of an object  , whereas the internal morphological gradient operator

 


 

(4.17)

calculates the internal boundary of  . Notice that

       




The morphological gradient, and its external and internal variants, are always
positive because dilations and erosions with structuring elements that contain the
origin are extensive and anti-extensive, respectively. An illustration of the morphological gradient operators is depicted in Fig. 4.7. The cross structuring element
            , centered at the origin  , is used
in this case. Notice that the internal gradient operator detects cellular boundaries
correctly, whereas the gradient and the external gradient operators may connect the
boundaries of some cells which are in close proximity to each other. For more
information on morphological gradients the reader is referred to [58, 59].
4.2.8

Conditional dilation

If an image is dilated by a structuring element that contains the origin, it is expanded. If this type of dilation is repeated indefinitely, the original image will grow
out of bound. One way to avoid this problem is to restrict the dilation    within
a mask element  . This defines a new type of (nontranslation invariant) dilation,
known as conditional dilation, given by

        
 
It will become clear in the following (see Section 4.6) that the conditional dilation
plays a key role in defining a new morphological operator, known as opening by reconstruction, which turns out to be very useful in object detection and segmentation
problems.
4.2.9

Annular openings and closings




If  is a symmetric structuring element (i.e., if 


the origin, then the operator

 ), which does not contain


     

defines an opening (i.e., it is increasing, anti-extensive, and idempotent). This is
known as the annular opening. The dual

 
       

196 Morphological Methods for Biomedical Image Analysis

(a)

(b)

(c)

(d)

(e)

(f)

Figure 4.7: Morphological gradients: (a) peripheral blood smear of a

-year-old anemic

female with a diagnosis of hereditary spherocytosis; (b) original image after binarization; (c)
gradient using the cross structuring element  ; (d) external gradient; (e) internal gradient;
(f) gradient overlaid on the original grayscale image in (a). Data courtesy of K. C. Klatt,
Department of Pathology, University of Utah. Used with permission.

Binary morphological operators 197


2D

2D

(a)
Figure 4.8:

   

(a) A grain

F \ (F B) I F

(F B) I F

(b)
such that




 ;

(b) the internal marker

obtained by means of an annular opening by the circular structuring el-

ement  .

of an annular opening is a closing (i.e., an operator that is increasing, extensive,


and idempotent), known as the annular closing.
The annular opening is a useful operator for marking grains in an image of a
particular geometry and size. We illustrate this with an example. Given a grain  ,
let  be a disk of radius  such that   , where  denotes
the interior of , obtained by removing its boundary; see Fig. 4.8(a). Figure
4.8(b) depicts the result of the annular opening    
 , where  is the
boundary of  (i.e., it is a circle of radius ). The set difference     

corresponds to a hole punched through  . We use the result of this set difference
as a marker for grain  . However, if   , then 
   and, therefore,
 
 
 , whereas, if   , then 
 
 and, therefore,
 
  . In these two cases,  will not be marked.
We now apply these observations to the problem of extracting normal blood
cells from microscopic images of a blood smear. The image in Fig. 4.9(a) depicts the peripheral blood smear of a -year-old female with a diagnosis of acute
promyelocytic leukemia. Many fragmented red blood cells are evident, due to disseminated intravascular coagulation. The normal red blood cells have a zone of
central pallor of about  of their size and demonstrate minimal variation in size
(anisocytosis) and shape (poikilocytosis). An annular opening by a circular structuring element of appropriate size is expected to consistently punch a hole (marker)
in the center of these cells. Red blood cells that are too large, too small due to
fragmentation, or that are deformed are not marked.
To achieve this, the image in Fig. 4.9(a) is first binarized by means of thresholding. However, the pallors of some of the red blood cells manifest themselves as
holes after thresholding. To remove these holes, we apply an area closing operation,
which eliminates holes with area smaller than  pixels. The result is depicted in

198 Morphological Methods for Biomedical Image Analysis

(a)

(b)

(c)

(d)

(e)

(f)

Figure 4.9: Annular opening: (a) peripheral blood smear of a

-year-old female with a

diagnosis of acute promyelocytic leukemia demonstrates many fragmented red blood cells
due to disseminated intravascular coagulation; (b) original image after binarization and area
closing; (c) annular opening by a circular structuring element; (d) normal red blood cell markers extracted from (c); (e) marked normal red blood cells overlaid on the original grayscale
image in (a) (dark shapes); (f) marked fragmented, overlapping, and suspicious red blood
cells overlaid on the original grayscale image in (a) (dark shapes). Data courtesy of K. C.
Klatt, Department of Pathology, University of Utah. Used with permission.

Binary morphological operators 199


Fig. 4.9(b). In order to avoid possible interference among cells in close proximity
to each other, the grains are shrunk by means of an erosion by a disk structuring element with a diameter of  pixels. The resulting image is then processed by
means of an annular opening with a circular structuring element with diameter of
 pixels. The result is depicted in Fig. 4.9(c). Notice the presence of black holes
within grains associated with normal blood cells. The holes that do not touch the
boundary are extracted from the image in (c); see Fig. 4.9(d). This is done by means
of an area closing, which fills-in the holes in image (c), from which the image in
(c) is subtracted. The resulting markers are then used to identify the normal red
blood cells, which are depicted in Fig. 4.9(e) as dark shapes overlaid on the original image in (a) (the reconstruction of the actual cell shape from the marker will be
treated in Section 4.6). The remaining cells are depicted in Fig. 4.9(f). Notice that
the cells in (f) are either fragmented, overlapping, or have irregular shape.
Additional information about annular openings and closings, and about a more
general class of operators known as annular filters, may be found in [60].
4.2.10

Morphological filters

A set operator  is said to be a binary morphological filter if  is increasing (i.e.,


      ) and idempotent (i.e.,  
 ). Binary
morphological filters are most frequently used to smooth object boundaries or to
remove components from an image based on certain geometric properties. These
operators are constrained to preserve order (i.e., they need to be increasing) and
are required to remove components in a single application (i.e., they need to be
idempotent). According to this definition, openings and closings are morphological
filters, since they are increasing and idempotent. However, erosions and dilations
are not morphological filters, since they are not idempotent.
We can build binary morphological filters from other binary morphological filters by composition (e.g., see [6, 61]). If  and  are two binary morphological
filters, such that   or   , then

          

are binary morphological filters as well. Based on these compositions, the operators
(recall that     , for every structuring element  )

     
where




  

  


  
           dilations    

(4.18)

(4.19)

are binary morphological filters. In the literature, these filters are known as alternating filters (AF), since they alternate between opening and closing. The operators

       

     

200 Morphological Methods for Biomedical Image Analysis

(a)

(b)

(c)

(d)

Figure 4.10: Morphological filtering of the binary image depicted in Fig. 4.3(b): (a) original
binary image; (b) filtering by the AF 
; (c) filtering by the morphological filter 
; (d) filtering

by the ASF 
.

are binary morphological filters as well. Finally, we can compose alternating filters
to form another class of binary morphological filters known as alternating sequential filters (ASF), given by
  

     


  

  

      

Usually, the ASFs are more preferable in practice than the AFs. This is primarily due to the fact that, for a given value of  , an ASF filters out shape components
by gradually increasing the size of the structuring element, from  to  , whereas
an AF filters out shape components by only applying the structuring element  .
Figure 4.10 depicts the results obtained by filtering the binary image of Fig. 4.3(b)
with the AF  , in (b), the morphological filter   , in (c), and the ASF  , in (d).
In all cases,  is taken to be the cross structuring element. For more information
on binary morphological filters, the reader is referred to [6, 54, 6163].

Morphological representation of binary images 201


4.3

Morphological representation of binary images

Image representation is a very important problem whose aim is to transform image


data into a compact form so that information becomes easily accessible. Image
representation can also lead to techniques for image description and compression.
Although a large number of image representation techniques are available in the
literature (e.g., techniques based on the Fourier transform or the wavelet transform [64]), mathematical morphology has produced some very useful techniques,
especially for binary images. In this section, we discuss two of these techniques,
namely the discrete size transform and the morphological skeleton. The reader is
referred to [6570] for more details on these subjects.
4.3.1

The discrete size transform

Given a binary image  , its discrete size transform (DST) is defined by




                             




where
    

  

         
        
    


with  defined in Eq. (4.19). Notice that

            


     . Therefore,     
    , for   !, and the DST


for 
is an orthogonal shape decomposition scheme, in the sense that it decomposes a
binary image into disjoint components. Moreover,






    




     

Therefore, the DST is an invertible image decomposition scheme as well (i.e., an


image  can be uniquely reconstructed from the DST components       
 or        ).
     , the opening
When  consists of (nonoverlapping) grains  , 
  , for   , can be viewed as a sieve of mesh width  that allows only
grains of size less than  to pass through. In this case, we may say that a grain
 is of size  if 
    , but 
    
. This leads
to the observation that      is a multiresolution image decomposition
scheme which reduces resolution as  increases. However, the term resolution is
not associated here to the frequency content of  , as is customary in linear multiresolution techniques (e.g., see [64]), but to its size content (see [66] for more

202 Morphological Methods for Biomedical Image Analysis


Table 4.6: A comparison of the Fourier spectrum and the pattern spectrum.
FOURIER SPECTRUM

PATTERN SPECTRUM

A smooth image is characterized by


large values in the Fourier spectrum
at low frequencies and small or zero
values at high frequencies.

A shape with many large smooth objects


and few or no small objects is
characterized by large values in the
higher part of the pattern spectrum and
small or zero values in the lower part of
the pattern spectrum.

An image with fast grayscale variation


is characterized by small or zero
values in the Fourier spectrum at low
frequencies and large values at high
frequencies.

A shape with many small rough objects


and no large smooth objects is
characterized by small or zero values in
the higher part of the pattern spectrum
and large values in the lower part of the
pattern spectrum.

The Fourier spectrum is a histogram


of the distribution of complex
sinusoids representing an image.

The pattern spectrum is a histogram of


the distribution of the sizes of various
objects representing an image.

The original image cannot be


recovered from only the Fourier
spectrum knowledge of the phase
spectrum is also needed.

The original image cannot be recovered


from its pattern spectrum.

details on this subject). Notice that the notion of size is directly related to the structuring element  used: a grain  of  is of size  if there exists at least one
translated replica of  that fits inside  , whereas there is no translated replica of

   that fits inside  . Similar remarks hold for      ,
as it pertains to the holes of  .
Based on these remarks, it is now clear that the DST is a multiresolution image
decomposition scheme in terms of successive differences of openings and closings
with structuring elements of increasing size. For   , the   component of the
DST contains only grains that are of size  . On the other hand, for   , the
  component of the DST contains only holes that are of size  .
4.3.2

The pattern spectrum

The DST can be thought of as the morphological analogue of the Fourier transform. Both transforms decompose an image into orthogonal components that are
sufficient for reconstructing the image under consideration. It is quite common, in
Fourier-based image processing and analysis techniques, to characterize images by
means of the magnitude of the Fourier transform, known as the Fourier spectrum,
while discarding phase information. A similar approach applies in the morphological case as well. An image is characterized by the magnitude of the DST, known

Morphological representation of binary images 203


0 16
0 14

PF ; B ( k )

0 12
01
0 08
0 06
0 04
0 02
0
20

15

10

10

15

20

Figure 4.11: The (normalized) pattern spectrum    of a binary image. Notice that the
large peak at size  indicates the prominent presence of particles of size .

as the pattern spectrum, given by


 
 

    

          
         


where  is the area (or cardinality) of set . Notice that the pattern spectrum depends on the particular structuring element  used (i.e., for a given image, different
pattern spectra can be obtained for different structuring elements).
As in the case of the Fourier spectrum, the information conveyed by the pattern spectrum of an image $F$ is not sufficient for reconstructing $F$. However, the pattern spectrum conveys some useful information regarding the shape/size content of a binary image. For example:
(a) The boundary roughness of $F$, relative to the structuring element $B$, appears in the lower part of the pattern spectrum, for a rough boundary, and in the higher part of the pattern spectrum, for a smooth boundary.
(b) Long capes or bulky protruding parts in $F$ show up as isolated impulses or jumps at the positive sizes of the pattern spectrum.
(c) Big jumps at negative sizes illustrate the existence of prominent intruding gulfs or holes in $F$.
All these properties are the morphological analogues of similar properties of the Fourier spectrum. Table 4.6 summarizes similarities between the Fourier and the pattern spectrum. Figure 4.11 depicts the (normalized) pattern spectrum of the binary image in Fig. 4.9(b), with the structuring element being a disk. Normalization produces a pattern spectrum $PS'_{F;B}(n)$ from $PS_{F;B}(n)$, such that $\sum_n PS'_{F;B}(n) = 1$. Notice that the large peak indicates the prominent presence of particles of the corresponding size.
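As a concrete illustration, the positive half of the pattern spectrum can be computed directly from its definition as differences of areas of successive openings. The following sketch is our own (it is not from the original text); it uses Python with scipy.ndimage and approximates $nB$ by a $(2n+1) \times (2n+1)$ square:

```python
import numpy as np
from scipy import ndimage

def binary_pattern_spectrum(F, n_max):
    """Positive half of the pattern spectrum of a binary image F:
    PS(n) = A[F o nB] - A[F o (n+1)B], with nB realized as a
    (2n+1) x (2n+1) square structuring element."""
    areas = []
    for n in range(n_max + 2):
        se = np.ones((2 * n + 1, 2 * n + 1), dtype=bool)
        areas.append(int(ndimage.binary_opening(F, structure=se).sum()))
    areas = np.array(areas)
    return areas[:-1] - areas[1:]   # PS(0), PS(1), ..., PS(n_max)
```

Dividing the result by its sum yields the normalized pattern spectrum of Fig. 4.11; the negative half is obtained analogously by replacing openings with closings.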

[Figure 4.12: Construction of the (Euclidean) skeleton for a triangular shape.]

The pattern spectrum is frequently used as a shape/size descriptor. It has been employed in a number of image processing and analysis tasks, such as shape and texture analysis [5], multiscale shape representation [66], morphological shape filtering and restoration [71-74], and analysis, segmentation, and classification of texture [74-82]. Its computation, however, may be time consuming. A number of efficient algorithms have been proposed in the literature for dealing with this problem (e.g., see [83, 84]).
4.3.3 The morphological skeleton transform

The (Euclidean) skeleton of a shape is a thin caricature, obtained by creating an archetypal stick figure that internally locates the central axis of the shape. This structure was first introduced by Blum [85], who called it the medial axis. The skeleton is a binary image representation technique that summarizes a shape and conveys useful information about its size, orientation, and connectivity [86].
To define the skeleton of a shape $F$, for each point $h \in F$, let $D(h)$ denote the largest disk centered at $h$ such that $D(h) \subseteq F$. Then, the point $h$ is a point on the skeleton of $F$ if there does not exist a disk $D$ such that $D(h) \subset D \subseteq F$. In this case, $D(h)$ is called the maximal disk located at point $h$. This is illustrated in Fig. 4.12. If, in addition to the skeleton, the radii of the maximal disks located at all points $h$ on the skeleton of a shape $F$ are known, then $F$ can be uniquely reconstructed from this information as the union of all such maximal disks. Therefore, the skeleton, together with the radius information associated with the maximal disks, contains enough information to uniquely reconstruct the original shape.
In 1977, Lantuejoul showed that if $S(F)$ is the skeleton of a shape $F$ (a set that contains all points of the skeleton of $F$), then:

$$S(F) = \bigcup_{r > 0} S_r(F),$$

where

$$S_r(F) = (F \ominus rD) - [(F \ominus rD) \circ dD],   (4.20)$$

with $rD$ being a disk of radius $r$ (see [5]) and $dD$ a disk of an infinitesimal radius $dr$. The set $S_r(F)$ contains the centers of all maximal disks of radius $r$. Since shape $F$ equals the union of all maximal disks, we have that

$$F = \bigcup_{r \ge 0} \, [S_r(F) \oplus rD].$$

Equation (4.20) can be discretized by setting $r = n$ and $dr = 1$, for some integer $n \ge 0$. In this case, $nD = D \oplus D \oplus \cdots \oplus D$ ($n$ dilations). If we also replace disk $D$ with an arbitrary structuring element, then we get the so-called morphological skeleton transform of $F$.
The morphological skeleton transform of a binary image $F$ is given by:

$$F \mapsto \{\, S_0(F \mid B), S_1(F \mid B), \ldots, S_N(F \mid B) \,\},$$

where

$$S_n(F \mid B) = (F \ominus nB) - [(F \ominus nB) \circ B],$$

with $nB$ given by Eq. (4.19). The set $S_n(F \mid B)$ is called the skeleton subset of order $n$. The union

$$S(F \mid B) = \bigcup_{n=0}^{N} S_n(F \mid B)$$

is called the morphological skeleton of $F$. When $B$ is a disk structuring element, then $S(F \mid B)$ is a discrete approximation of the skeleton of $F$. When $B$ contains the origin, the erosions $F \ominus nB$ decrease with $n$ and the skeleton subsets are disjoint; i.e.,

$$S_n(F \mid B) \cap S_m(F \mid B) = \emptyset, \quad \text{for } n \neq m,$$

and the morphological skeleton transform is an orthogonal shape decomposition scheme. Furthermore,

$$F = \bigcup_{n=0}^{N} \, [S_n(F \mid B) \oplus nB],   (4.21)$$

making the morphological skeleton transform invertible.
The morphological skeleton transform can be uniquely represented by a grayscale image $s(x)$, known as the skeleton function, given by

$$s(x) = \begin{cases} n + 1, & \text{if } x \in S_n(F \mid B) \\ 0, & \text{otherwise.} \end{cases}$$
[Figure 4.13: Binary morphological skeleton using the cross structuring element: (a) skeleton of the binary image $F$ depicted in Fig. 4.7(b), superimposed on $F$; (b) skeleton of the complement $F^c$, superimposed on $F^c$.]
Notice that

$$S_n(F \mid B) = \{\, x : s(x) = n + 1 \,\}.   (4.22)$$

Therefore, image $F$ can be uniquely reconstructed from the skeleton function $s(x)$, by means of Eq. (4.21) and Eq. (4.22). The skeleton function has been used for image compression and coding [65]. Moreover, the morphological skeleton transform has been used for shape analysis and recognition. Efficient algorithms for computing skeleton transforms can be found in [83, 87].
Figure 4.13 depicts an example of the morphological skeleton, by using the cross structuring element. Figure 4.13(a) depicts the morphological skeleton of the binary image $F$ of Fig. 4.7(b), superimposed on $F$. Notice that individual grains can be compressed to small and thin line segments emanating from the center of the grains. On the other hand, Fig. 4.13(b) depicts the morphological skeleton of the complement image $F^c$, superimposed on $F^c$. In this case, the morphological skeleton may be used to segment the image into nonoverlapping partitions, each containing a grain. However, it is clear from the result depicted in Fig. 4.13(b) that the morphological skeleton may not produce partitions with connected boundaries. The morphological skeleton of the complement $F^c$ of a shape $F$ is sometimes referred to as the exoskeleton.
4.4 Grayscale morphological operators

In order to extend mathematical morphology to the grayscale case, we need to combine grayscale images in a way that is compatible with unions and intersections. This is because a binary image is a special case of a grayscale image, when the number of gray levels is two. The tool that allows us to accomplish this task is known as threshold decomposition. Using threshold decomposition, we can show that the grayscale analogues of union and intersection are the operations of supremum and infimum, respectively. Using this observation, the extension of many binary morphological operators to the grayscale case is most often straightforward.

[Figure 4.14: Threshold decomposition of a grayscale function $f$ into its cross sections: (a) cross section $F(t_1)$ for a high value $t_1$ of $t$; (b) cross section $F(t_2)$ for a low value $t_2$ of $t$.]
4.4.1 Threshold decomposition

Given a grayscale image $f$, we define its cross section $F(t)$ at level $t$ by

$$F(t) = \{\, x : f(x) \ge t \,\}.$$

The collection $\{F(t) : t \in \mathbb{R}\}$ of all non-empty cross sections is known as the threshold decomposition of $f$. An image $f$ is uniquely characterized by its threshold decomposition, since

$$f(x) = \sup\{\, t : x \in F(t) \,\}$$

at every pixel $x$. Figure 4.14 depicts two cross sections of a 1D signal for a high and a low value of the threshold $t$. Notice that, as $t$ increases, $F(t)$ decreases. In particular, if $t_1 \le t_2$, then $F(t_2) \subseteq F(t_1)$. This is known as the stacking property.
An image has a unique threshold decomposition. However, given a collection $\{F(t) : t \in \mathbb{R}\}$ of non-empty sets, there might not exist an image with this threshold decomposition. This is because the cross sections of an image should satisfy the stacking property.
Threshold decomposition provides a useful link between grayscale and binary
images. This link is used as a way to combine grayscale images that is compatible
with mathematical morphology, by means of the following three steps:

1. Calculate the threshold decompositions $\{F(t)\}$, $\{G(t)\}$ of images $f$ and $g$, respectively.

2. Generate the threshold decomposition $\{H(t) = F(t) \cup G(t) : t \in \mathbb{R}\}$.

3. Set $h(x) = \sup\{\, t : x \in H(t) \,\}$.

It is not difficult to see that

$$h(x) = \max\{f(x), g(x)\} = (f \vee g)(x).$$

Alternatively, we may combine two images $f$ and $g$ by means of the following three steps:

1. Calculate the threshold decompositions $\{F(t)\}$, $\{G(t)\}$ of images $f$ and $g$, respectively.

2. Generate the threshold decomposition $\{H(t) = F(t) \cap G(t) : t \in \mathbb{R}\}$.

3. Set $h(x) = \sup\{\, t : x \in H(t) \,\}$.

In this case,

$$h(x) = \min\{f(x), g(x)\} = (f \wedge g)(x).$$

Evidently, the grayscale analogues of union and intersection are the pixelwise supremum and infimum, respectively. It is also true that

$$F(t) \subseteq G(t), \ \text{for every } t \quad \Longleftrightarrow \quad f(x) \le g(x), \ \text{at every pixel } x.$$

Therefore, the grayscale analogue of set inclusion is for a grayscale image $f$ to be smaller than another image $g$ (i.e., $f \le g$), in the sense that $f(x) \le g(x)$, at every pixel $x$.
Additional information about threshold decomposition can be found in [88].
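The following minimal numerical check (with toy arrays of our own) verifies that combining threshold decompositions with unions and intersections reproduces the pixelwise maximum and minimum:

```python
import numpy as np

f = np.array([[0, 1, 3],
              [2, 2, 1],
              [0, 3, 3]])
g = np.array([[1, 1, 2],
              [3, 0, 0],
              [1, 2, 3]])

levels = range(1, 4)  # nonzero graylevels of the toy images

# Cross sections F(t) = {x : f(x) >= t} satisfy the stacking property.
F = {t: f >= t for t in levels}
G = {t: g >= t for t in levels}

# Unions of cross sections, re-stacked, give the pixelwise maximum.
h_max = sum((F[t] | G[t]).astype(int) for t in levels)
assert np.array_equal(h_max, np.maximum(f, g))

# Intersections give the pixelwise minimum in the same way.
h_min = sum((F[t] & G[t]).astype(int) for t in levels)
assert np.array_equal(h_min, np.minimum(f, g))
```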



4.4.2 Increasing and translation invariant operators

As we said before, the most elementary operators of interest to mathematical morphology are increasing and translation invariant. A grayscale image operator $\psi$ is said to be increasing if $f \le g$ implies that $\psi(f) \le \psi(g)$. On the other hand, if $f_{x_0,y_0}(x, y) = f(x - x_0, y - y_0)$ denotes the spatially translated image, then a grayscale image operator $\psi$ is spatially translation invariant if

$$\psi(f_{x_0,y_0}) = [\psi(f)]_{x_0,y_0},$$

for every space translation $(x_0, y_0)$. If $(f + v)(x, y) = f(x, y) + v$ denotes the grayscale translated image, then a grayscale image operator $\psi$ is grayscale translation invariant if

$$\psi(f + v) = \psi(f) + v.$$

Notice that, here, we are dealing with two types of translation invariance, spatial and grayscale, as opposed to the binary case where we only deal with spatial translation invariance. In the grayscale case, when we speak about translation invariance, we mean both spatial and grayscale translation invariance.
When $f$ is real-valued, the negative image $f^*$ of $f$ is defined by

$$f^*(x) = -f(x).   (4.23)$$

When $f$ takes finite values in $\{0, 1, \ldots, K - 1\}$, the negative image $f^*$ of $f$ is defined by

$$f^*(x) = (K - 1) - f(x).$$

Notice that when $K = 2$ (i.e., in the case of binary images), $f \mapsto f^*$ is the set complement. Finally, a grayscale image operator $\psi$ is anti-extensive, if $\psi(f) \le f$, extensive, if $\psi(f) \ge f$, and idempotent, if $\psi(\psi(f)) = \psi(f)$.
4.4.3 Erosion and dilation

As in the binary case, where one opts for operators that distribute over unions and intersections, it is desirable to deal with grayscale image operators that distribute over suprema and infima. Any image operator $\varepsilon$ such that $\varepsilon(f \wedge g) = \varepsilon(f) \wedge \varepsilon(g)$, for every pair $(f, g)$ of grayscale images, is called a grayscale erosion. Any image operator $\delta$ such that $\delta(f \vee g) = \delta(f) \vee \delta(g)$, for every pair $(f, g)$ of grayscale images, is called a grayscale dilation. As a direct consequence of these definitions, both grayscale erosion and dilation are increasing operators.

When the grayscale erosion is translation invariant, then

$$(f \ominus b)(x) = \inf_{y} \{\, f(x + y) - b(y) \,\},$$

where $b$ is a grayscale image known as the structuring function (e.g., see [6]). In the special case when

$$b(x) = \begin{cases} 0, & \text{for } x \in B \\ -\infty, & \text{otherwise,} \end{cases}   (4.24)$$

for some structuring element $B$, then

$$(f \ominus b)(x) = \inf_{y \in B} f(x + y).$$

This is called flat (grayscale) erosion and is denoted (with a slight abuse of notation) by $f \ominus B$. A structuring function of the form of Eq. (4.24) is usually referred to as a flat structuring function. The flat erosion replaces the value of an image $f$ at a pixel $x$ by the infimum of the values of $f$ over a structuring element $B$. Some properties of the grayscale erosion are summarized in Table 4.7. In this table, $\bar{b}(x) = b(-x)$ is the reflection of the structuring function $b$ around the origin. An example of a flat grayscale erosion is depicted in Fig. 4.15.
When the grayscale dilation is translation invariant, then

$$(f \oplus b)(x) = \sup_{y} \{\, f(x - y) + b(y) \,\},$$

for some structuring function $b$. In the special case when $b$ is flat, then

$$(f \oplus b)(x) = \sup_{y \in B} f(x - y).$$

This is called flat (grayscale) dilation and is denoted by $f \oplus B$. The flat dilation replaces the value of an image $f$ at a pixel $x$ by the supremum of the values of this function over a structuring element $B$. Some properties of the grayscale dilation operator are summarized in Table 4.8. An example of a flat grayscale dilation is depicted in Fig. 4.16. For a geometric interpretation of flat grayscale erosions and dilations, the reader is referred to Subsection 4.4.7 below.
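Flat erosions and dilations are available directly in common libraries. A minimal sketch (ours), assuming scipy.ndimage and a disk footprint built by hand:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(0)
f = rng.integers(0, 256, size=(128, 128)).astype(np.uint8)

# A flat disk structuring element of radius 5.
r = 5
yy, xx = np.ogrid[-r:r + 1, -r:r + 1]
disk = (yy ** 2 + xx ** 2) <= r ** 2

eroded = ndimage.grey_erosion(f, footprint=disk)    # infimum over the disk
dilated = ndimage.grey_dilation(f, footprint=disk)  # supremum over the disk

# For a structuring element containing the origin, erosion never raises
# a value and dilation never lowers one.
assert (eroded <= f).all() and (dilated >= f).all()
```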

[Figure 4.15: Flat grayscale erosion: (a) original grayscale image of a lateral pulmonary artery angiogram; (b)-(d) flat erosions by disk structuring elements of increasing diameter. Data courtesy of the University of Washington Digital Anatomist Program. Used with permission.]

[Figure 4.16: Flat grayscale dilation: (a) original grayscale image of a lateral pulmonary artery angiogram; (b)-(d) flat dilations by disk structuring elements of increasing diameter. Data courtesy of the University of Washington Digital Anatomist Program. Used with permission.]

Table 4.7: Some properties of grayscale erosion.

Spatial translation invariance: $(f_{x_0,y_0} \ominus b)(x, y) = (f \ominus b_{-x_0,-y_0})(x, y) = (f \ominus b)(x - x_0, y - y_0)$
Grayscale translation invariance: $(f + v) \ominus b = f \ominus (b - v) = (f \ominus b) + v$
Decreasingness with respect to structuring function: $f \ominus b_1 \ge f \ominus b_2$ if $b_1 \le b_2$
Parallel composition: $f \ominus (b_1 \vee b_2) = (f \ominus b_1) \wedge (f \ominus b_2)$
Distributivity of minimum: $(f_1 \wedge f_2) \ominus b = (f_1 \ominus b) \wedge (f_2 \ominus b)$
Parallel composition inequality: $f \ominus (b_1 \wedge b_2) \ge (f \ominus b_1) \vee (f \ominus b_2)$
Parallel composition inequality: $(f_1 \vee f_2) \ominus b \ge (f_1 \ominus b) \vee (f_2 \ominus b)$
Serial composition: $(f \ominus b_1) \ominus b_2 = f \ominus (b_1 \oplus b_2)$
Increasingness with respect to image: $f_1 \le f_2 \Rightarrow f_1 \ominus b \le f_2 \ominus b$
Anti-extensivity: $f \ominus b \le f$ if $b(0) \ge 0$
Duality: $f \ominus b = (f^* \oplus \bar{b})^*$

Table 4.8: Some properties of grayscale dilation.

Commutativity: $f \oplus b = b \oplus f$
Spatial translation invariance: $(f_{x_0,y_0} \oplus b)(x, y) = (f \oplus b_{x_0,y_0})(x, y) = (f \oplus b)(x - x_0, y - y_0)$
Grayscale translation invariance: $(f + v) \oplus b = f \oplus (b + v) = (f \oplus b) + v$
Increasingness with respect to structuring element: $f \oplus b_1 \le f \oplus b_2$ if $b_1 \le b_2$
Parallel composition: $f \oplus (b_1 \vee b_2) = (f \oplus b_1) \vee (f \oplus b_2)$
Distributivity of maximum: $(f_1 \vee f_2) \oplus b = (f_1 \oplus b) \vee (f_2 \oplus b)$
Parallel composition inequality: $f \oplus (b_1 \wedge b_2) \le (f \oplus b_1) \wedge (f \oplus b_2)$
Serial composition: $(f \oplus b_1) \oplus b_2 = f \oplus (b_1 \oplus b_2)$
Increasingness with respect to image: $f_1 \le f_2 \Rightarrow f_1 \oplus b \le f_2 \oplus b$
Extensivity: $f \oplus b \ge f$ if $b(0) \ge 0$
Duality: $f \oplus b = (f^* \ominus \bar{b})^*$

4.4.4 Representational power of erosions and dilations

Matheron's representation theorem can be extended to the grayscale case as well. In this case, every translation invariant and increasing operator $\psi$ can be expressed as a supremum of erosions or as an infimum of dilations; i.e.,

$$\psi(f) = \sup_{b \in \mathrm{Ker}(\psi)} f \ominus b = \inf_{b \in \mathrm{Ker}(\psi^*)} f \oplus \bar{b},$$

where $\mathrm{Ker}(\psi)$ is the kernel of operator $\psi$, defined by $\mathrm{Ker}(\psi) = \{\, b : [\psi(b)](0) \ge 0 \,\}$, and $\psi^*$ is the dual of operator $\psi$, given by $\psi^*(f) = [\psi(f^*)]^*$ (e.g., see [6]).

4.4.5 Opening and closing

As in the binary case, an image operator $\alpha$ that is increasing ($f \le g \Rightarrow \alpha(f) \le \alpha(g)$), anti-extensive ($\alpha(f) \le f$), and idempotent ($\alpha(\alpha(f)) = \alpha(f)$) is called an opening. On the other hand, an image operator $\beta$ that is increasing, extensive ($\beta(f) \ge f$), and idempotent is called a closing.
A popular (translation invariant) grayscale opening of an image $f$ by a structuring function $b$ is given by $f \circ b = (f \ominus b) \oplus b$. We refer to this operator as the (grayscale) structural opening, since it involves a structuring function $b$. This is a smoothing filter that approximates an image from below, since $f \circ b \le f$. The amount and type of smoothing is determined by the shape and size of the selected structuring function. The structural opening attempts to undo the effect of erosion $f \ominus b$ by applying the associated dilation $(f \ominus b) \oplus b$. Some properties of the grayscale structural opening are summarized in Table 4.9. When $b$ is a flat structuring function (i.e., when $b$ is given by Eq. (4.24)), then the structural opening is referred to as a flat structural opening and is denoted by $f \circ B$. An example of a flat structural opening is depicted in Fig. 4.17. Notice that, as the size of the structuring element increases, certain bright features are eliminated from the image.
Similarly, a popular (translation invariant) grayscale closing of an image $f$ with a structuring function $b$ is given by $f \bullet b = (f \oplus b) \ominus b$. We refer to this operator as the (grayscale) structural closing. This is a smoothing filter that approximates a shape from above, since $f \bullet b \ge f$. The amount and type of smoothing is determined by the shape and size of the structuring function used. The structural closing attempts to undo the effect of dilation $f \oplus b$ by applying the associated erosion $(f \oplus b) \ominus b$. Some properties of the grayscale structural closing are summarized in Table 4.9. When $b$ is a flat structuring function, then the structural closing is referred to as a flat structural closing and is denoted by $f \bullet B$. An example of a flat structural closing is depicted in Fig. 4.18. For a geometric interpretation of flat grayscale openings and closings, refer to Subsection 4.4.7 below.
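The corresponding flat structural opening and closing are equally direct to compute; the sketch below (ours) also checks the anti-extensivity and extensivity properties listed in Table 4.9:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
f = rng.integers(0, 256, size=(128, 128)).astype(np.uint8)

se = np.ones((7, 7), dtype=bool)  # flat 7x7 square structuring element

opened = ndimage.grey_opening(f, footprint=se)  # erosion, then dilation
closed = ndimage.grey_closing(f, footprint=se)  # dilation, then erosion

# The opening approximates f from below and the closing from above;
# both are also idempotent, as listed in Table 4.9.
assert (opened <= f).all() and (closed >= f).all()
```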

Table 4.9: Some properties of grayscale structural opening and closing.

Spatial translation invariance: $(f_{x_0,y_0} \circ b)(x, y) = (f \circ b)(x - x_0, y - y_0)$; $(f_{x_0,y_0} \bullet b)(x, y) = (f \bullet b)(x - x_0, y - y_0)$
Grayscale translation invariance: $(f + v) \circ b = (f \circ b) + v$; $(f + v) \bullet b = (f \bullet b) + v$
Increasingness with respect to image: $f_1 \le f_2 \Rightarrow f_1 \circ b \le f_2 \circ b$; $f_1 \le f_2 \Rightarrow f_1 \bullet b \le f_2 \bullet b$
Anti-extensivity: $f \circ b \le f$
Extensivity: $f \le f \bullet b$
Idempotence: $(f \circ b) \circ b = f \circ b$; $(f \bullet b) \bullet b = f \bullet b$
Duality: $f \bullet b = (f^* \circ \bar{b})^*$

Another useful opening is the so-called grayscale area opening. This operator removes grains from the cross sections of a grayscale image with area below a given value. Mathematically, a grayscale area opening is expressed by

$$[\alpha_\lambda(f)](x) = \sup\{\, t : x \in G_k(t), \ A[G_k(t)] \ge \lambda \,\},   (4.25)$$

where $G_k(t)$, $k = 1, 2, \ldots$, are the grains of the cross section $F(t)$ of image $f$ and $A[\cdot]$ denotes area. It is not difficult to see that, for a fixed value of $\lambda$, this operator is increasing, anti-extensive, and idempotent; therefore, it is an opening.
By duality, the grayscale area closing is defined as follows:

$$\beta_\lambda(f) = [\alpha_\lambda(f^*)]^*,$$

where $f^*$ is defined in Eq. (4.23). This operator fills in the holes of the cross sections $F(t)$ of image $f$, whose area is strictly smaller than $\lambda$. Efficient algorithms for the implementation of grayscale area openings and closings can be found in [89].
An example, illustrating the grayscale area opening operator, is depicted in Fig. 4.19. The blood smear, depicted in the first row of Fig. 4.19(a), is to be binarized in order to obtain the regions occupied by individual cells. The second row of Fig. 4.19(a) depicts the result of such a binarization. Due to the fact that blood cells have a zone of central pallor, which produces the bright regions within each cell, the result of binarization is not satisfactory: most cells produce a region filled with a black hole. Area opening can be effectively used to ameliorate this problem. This is evident from the results depicted in Figs. 4.19(b),(c). As the value of $\lambda$, in Eq. (4.25), increases, the bright regions within each cell are effectively suppressed and binarization produces a more acceptable result.
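Area openings and closings are implemented, for example, in scikit-image; the following sketch (with an arbitrary test image and threshold of our choosing) applies both:

```python
import numpy as np
from skimage.morphology import area_opening, area_closing

rng = np.random.default_rng(2)
f = rng.integers(0, 256, size=(128, 128)).astype(np.uint8)

# Remove, from every cross section, grains with fewer than 64 pixels;
# area_closing is the dual operator and fills small holes instead.
opened = area_opening(f, area_threshold=64)
closed = area_closing(f, area_threshold=64)

assert (opened <= f).all() and (closed >= f).all()
```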

[Figure 4.17: Flat structural opening: (a) original grayscale image of a lateral pulmonary artery angiogram; (b)-(d) structural openings by disk structuring elements of increasing diameter. Data courtesy of the University of Washington Digital Anatomist Program. Used with permission.]

[Figure 4.18: Flat structural closing: (a) original grayscale image of a lateral pulmonary artery angiogram; (b)-(d) structural closings by disk structuring elements of increasing diameter. Data courtesy of the University of Washington Digital Anatomist Program. Used with permission.]

4.4.6 Representational power of structural openings and closings

For a collection $\{\alpha_i\}$ of openings, $\bigvee_i \alpha_i$ is an opening as well; i.e., the supremum of openings is also an opening. On the other hand, if $\{\beta_i\}$ is a collection of closings, then $\bigwedge_i \beta_i$ is a closing as well; i.e., the infimum of closings is also a closing. This simple observation is very useful in practice, since it allows the design of grayscale openings and closings by taking the supremum, or infimum, of elementary grayscale openings or closings, respectively.
It can be shown (e.g., see [6]) that any translation invariant grayscale opening $\alpha$ (i.e., a translation invariant operator that is increasing, anti-extensive, and idempotent) can be written as a supremum of grayscale structural openings, whereas any translation invariant grayscale closing $\beta$ (i.e., a translation invariant operator that is increasing, extensive, and idempotent) can be written as an infimum of grayscale structural closings; i.e.,

$$\alpha(f) = \sup_{b \in \mathrm{Inv}(\alpha)} f \circ b, \qquad \beta(f) = \inf_{b \in \mathrm{Inv}(\beta)} f \bullet b,$$

where $\mathrm{Inv}(\psi)$ denotes the invariance domain of the grayscale operator $\psi$, given by $\mathrm{Inv}(\psi) = \{\, f : \psi(f) = f \,\}$.
4.4.7 Flat image operators

Threshold decomposition is used as an effective tool for building grayscale image operators from increasing binary image operators. Indeed, consider a grayscale image $f$. After applying threshold decomposition, we obtain the cross sections $\{F(t) : t \in \mathbb{R}\}$. Now, consider an increasing set operator $\Psi$. For every $t$, apply this set operator on the cross section $F(t)$ in order to obtain a cross section $\Psi(F(t))$. Since $\Psi$ is increasing, we have that (recall the stacking property)

$$F(t_2) \subseteq F(t_1) \ \Rightarrow \ \Psi(F(t_2)) \subseteq \Psi(F(t_1)), \quad \text{for } t_1 \le t_2.$$

Clearly, the collection $\{\Psi(F(t)) : t \in \mathbb{R}\}$ of the resulting cross sections satisfies the stacking property. Therefore, $\Psi$ defines a grayscale image $g$ by means of

$$g(x) = \sup\{\, t : x \in \Psi(F(t)) \,\}.$$

The mapping $\psi: f \mapsto g$ thus obtained is called a flat image operator, generated by $\Psi$.
Flat image operators are increasing. This follows directly from the way these operators are constructed. Moreover, they enjoy properties directly induced from the set operator applied on each cross section. For example, if $\Psi$ is an erosion, then $\psi$ is an erosion as well. The same is true when $\Psi$ is a dilation, opening, or closing; then, $\psi$ is a dilation, opening, or closing, respectively. Moreover, if $\Psi(F) = F \ominus B$, then $\psi(f) = f \ominus B$.

[Figure 4.19: Grayscale area opening: (a) original grayscale image of a blood smear (first row) and the result obtained after binarization by means of thresholding (second row); (b), (c) the results of area opening applied on the grayscale image in (a) by means of Eq. (4.25) with increasing values of $\lambda$ (first row), and the results obtained after binarization by means of thresholding (second row). Data courtesy of SDC Information Systems. Used with permission.]
Thus, a flat (grayscale) erosion with structuring element $B$ can be viewed as the process of applying a binary erosion on the cross sections of $f$ with structuring element $B$ and stacking up the results. This gives a geometric interpretation of what erosion does to a grayscale image: the flat erosion of a grayscale image $f$ by a structuring element $B$ removes all grains in the cross sections of $f$ that do not accommodate $B$, whereas it shrinks all other grains by eroding them with $B$. This is illustrated in Fig. 4.20(a). A similar (dual) remark applies for the case of flat (grayscale) dilation.
If $\Psi(F) = F \circ B$, then $\psi(f) = f \circ B$. Thus, a flat (grayscale) opening with structuring element $B$ can be viewed as the process of applying a binary opening on the cross sections of $f$ with structuring element $B$ and stacking up the results. This, again, gives a geometric interpretation of what opening does to a grayscale image: the flat opening of a grayscale image $f$ by a structuring element $B$ removes all
[Figure 4.20: An example of flat grayscale erosion, in (a), and of flat opening, in (b).]

grains in the cross sections of $f$ that do not accommodate $B$, whereas it smoothes all other grains by opening them with $B$. This is illustrated in Fig. 4.20(b). Therefore, the flat opening of an image $f$ with a structuring element $B$ can be viewed as an operator that removes the peaks and ridges from the topographic surface of $f$. A peak, or regional maximum, $M$ of a grayscale image $f$ is a connected component of pixels in $f$ with a given graylevel value $v$, such that every pixel in a (properly defined) neighborhood of $M$ has a value strictly smaller than $v$. A ridge, or crest line, of a grayscale image $f$ is a curve on the topographic surface of $f$, such that, as we walk along this line, the points to the right and left of us are lower than the ones we are on. A similar (dual) remark applies for the case of flat (grayscale) closing, which removes the hollows and ravines from the topographic surface of $f$. A hollow, or regional minimum, $M$ of a grayscale image $f$ is a connected component of pixels in $f$ with a given graylevel value $v$, such that every pixel in a (properly defined) neighborhood of $M$ has a value strictly larger than $v$. A ravine, or valley, of a grayscale image $f$ is a curve on the topographic surface of $f$, such that, as we walk along this line, the points to the right and left of us are higher than the ones we are on.
4.4.8 Morphological gradients

The grayscale operator

$$\rho_B(f) = (f \oplus B) - (f \ominus B),$$

where $B$ is a structuring element that contains the origin, is known as the grayscale morphological gradient. If $B_r$ is a disk structuring element with radius $r$, then

$$\lim_{r \to 0} \frac{(f \oplus B_r) - (f \ominus B_r)}{2r} = \|\nabla f\| = \sqrt{\left(\frac{\partial f}{\partial x}\right)^{\!2} + \left(\frac{\partial f}{\partial y}\right)^{\!2}},$$

provided that $f$ is continuously differentiable. In this case, the morphological gradient with a disk structuring element of radius $r$ converges to the magnitude of the gradient $\nabla f$ of $f$, as $r$ decreases to zero.
Similar gradient operators are the external and internal gradients, given by

$$\rho_B^+(f) = (f \oplus B) - f$$

and

$$\rho_B^-(f) = f - (f \ominus B),$$

respectively. Notice that

$$\rho_B(f) = \rho_B^+(f) + \rho_B^-(f).$$

Figure 4.21 depicts an example of applying the grayscale morphological gradient, and its variants, on a magnetic resonance (MR) image of the human brain. In this case, the $3 \times 3$ square structuring element $B = \{(-1,-1), (-1,0), (-1,1), (0,-1), (0,0), (0,1), (1,-1), (1,0), (1,1)\}$, centered at the origin $(0,0)$, has been used. Notice that, in this example, the internal gradient provides the best separation between anatomical structures.
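All three gradients follow immediately from one dilation and one erosion; a short sketch of ours, which also checks the decomposition of the gradient into its external and internal parts:

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(4)
f = rng.integers(0, 256, size=(128, 128)).astype(np.int32)  # signed type,
                                                            # avoids underflow
B = np.ones((3, 3), dtype=bool)  # the 3x3 square centered at the origin

dil = ndimage.grey_dilation(f, footprint=B)
ero = ndimage.grey_erosion(f, footprint=B)

gradient = dil - ero   # morphological gradient
external = dil - f     # external gradient
internal = f - ero     # internal gradient

# The gradient decomposes into its external and internal parts.
assert np.array_equal(gradient, external + internal)
```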
4.4.9 Opening and closing top-hat

Since the opening $f \circ B$ of an image $f$ with a flat structuring element $B$ removes peaks and ridges from the topographic surface of $f$, the operator

$$\tau_B(f) = f - (f \circ B)$$

produces such peaks and ridges. This is known as the opening top-hat operator. Dually, the operator

$$\tau_B^*(f) = (f \bullet B) - f$$

produces the hollows and ravines of the topographic surface of $f$. This is known as the closing top-hat operator. Figure 4.22 illustrates the use of this operator to extract and visualize the sulci from a lateral view of the right brain hemisphere. The procedure is shown for a 2D grayscale image. However, it can be used for a 3D representation of the brain as well. Correct extraction and visualization of sulci is important for identifying landmarks to achieve functional segmentation of the brain and to guide the surgeon during an operation.

[Figure 4.21: Grayscale morphological gradient: (a) an MR image of the human brain; (b) histogram equalized grayscale morphological gradient with the $3 \times 3$ square structuring element; (c) histogram equalized grayscale internal morphological gradient; (d) histogram equalized grayscale external morphological gradient. Data courtesy of the University of Washington Digital Anatomist Program. Used with permission.]

[Figure 4.22: Morphological closing top-hat: (a) lateral view of the right brain hemisphere; (b) closing top-hat by a disk structuring element. Data courtesy of the University of Washington Digital Anatomist Program. Used with permission.]

4.4.10 Conditional dilation

If a grayscale image is dilated with a structuring element that contains the origin, its subgraph grows. If this type of dilation is successively repeated, the subgraph of the original image grows without bound. One way to restrict this growth is to restrict the dilation $f \oplus B$ of an image $f$ by a structuring element $B$ within a mask image $m$. This defines the so-called grayscale conditional dilation, given by

$$\delta_B^1(f \mid m) = (f \oplus B) \wedge m.$$

It will become clear in the following (see Section 4.6) that the grayscale conditional dilation plays a key role in defining a new morphological operator, known as grayscale opening by reconstruction, which turns out to be very useful in image segmentation problems.
4.4.11 Morphological filters

A grayscale operator $\psi$ is said to be a morphological filter, if $\psi$ is increasing (i.e., $f \le g \Rightarrow \psi(f) \le \psi(g)$) and idempotent (i.e., $\psi(\psi(f)) = \psi(f)$). Grayscale morphological filters are most frequently used to smooth the topographic surface of an image. According to this definition, grayscale openings and closings are morphological filters, since they are increasing and idempotent. However, grayscale erosions and dilations are not morphological filters, since they are not idempotent.
As in the binary case, we can build grayscale morphological filters from other grayscale morphological filters by composition (e.g., see [6, 61]). If $\psi$ and $\phi$ are two grayscale morphological filters such that $\psi \le \phi$ or $\phi \le \psi$, then the compositions $\phi\psi$, $\psi\phi$, $\psi\phi\psi$, and $\phi\psi\phi$ are grayscale morphological filters as well.
[Figure 4.23: Morphological filtering: (a) a bone marrow image $f$ to be segmented into two regions of interest (from [90]); (b) detected edges, obtained by means of thresholding the image in (a) and applying the internal morphological gradient operator [Eq. (4.17)] on the binary result; (c) the result of applying the AF on $f$; (d) detected edges, obtained by means of thresholding the image in (c) and applying the internal morphological gradient operator [Eq. (4.17)] on the binary result.]


Based on these compositions, the operators (recall that $f \circ B \le f \le f \bullet B$, for every structuring element $B$)

$$\phi_n(f) = (f \circ nB) \bullet nB, \qquad \phi_n'(f) = (f \bullet nB) \circ nB,   (4.26)$$

where $nB$ is given by Eq. (4.19), are grayscale morphological filters. These filters are known as grayscale alternating filters (AF), since they alternate between opening and closing. The operators

$$\varphi_n(f) = ((f \circ nB) \bullet nB) \circ nB, \qquad \varphi_n'(f) = ((f \bullet nB) \circ nB) \bullet nB$$

are grayscale morphological filters as well. Finally, we can compose grayscale alternating filters to form another class of morphological filters known as grayscale alternating sequential filters (ASF), given by

$$\Phi_n(f) = \phi_n(\phi_{n-1}(\cdots \phi_1(f) \cdots)), \qquad \Phi_n'(f) = \phi_n'(\phi_{n-1}'(\cdots \phi_1'(f) \cdots)).$$
Morphological filters are used to reduce grayscale variation in images. This may be desirable in order to reduce noise or simplify grayscale variation in an image for subsequent processing. An example is illustrated in Fig. 4.23. The bone marrow image $f$, depicted in Fig. 4.23(a), is to be segmented into two regions of interest. Towards this goal, $f$ is first thresholded and the internal morphological gradient operator in Eq. (4.17) is then applied on the binary result. Figure 4.23(b) depicts the detected edges, overlaid on the original image $f$. Clearly, the segmentation result is rather noisy due to high grayscale variation in $f$. However, application of the AF of Eq. (4.26) on $f$, with $B$ being a disk structuring element, produces the simplified version of $f$ depicted in Fig. 4.23(c). The internal morphological gradient operator [Eq. (4.17)], applied on the binary image obtained by thresholding this simplified image, produces the result depicted in Fig. 4.23(d). Clearly, the result is smoother than the one in Fig. 4.23(b) and provides a reasonable segmentation of the original image $f$.
For more information on grayscale morphological filters, the reader is referred to [6, 54, 61-63].
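An alternating sequential filter is a short loop over structuring elements of increasing size; the following sketch is ours (it uses squares rather than the disk of the example above):

```python
import numpy as np
from scipy import ndimage

def alternating_sequential_filter(f, n_max):
    """Grayscale ASF: alternate opening and closing with structuring
    elements of increasing size (here squares of half-width 1..n_max)."""
    g = f
    for n in range(1, n_max + 1):
        se = np.ones((2 * n + 1, 2 * n + 1), dtype=bool)
        g = ndimage.grey_opening(g, footprint=se)
        g = ndimage.grey_closing(g, footprint=se)
    return g

rng = np.random.default_rng(7)
f = rng.integers(0, 256, size=(128, 128)).astype(np.int32)
smoothed = alternating_sequential_filter(f, 3)
```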
4.5 Grayscale discrete size transform

In this section, we discuss the grayscale analogues of the discrete size transform and the associated pattern spectrum. The reader is referred to [66] for more details on these subjects. Although the skeleton transform can be defined for grayscale images as well, the resulting image representation is not very useful in practice. Therefore, we will not be discussing this representation here.
Given a grayscale image $f$, its discrete size transform (DST) is defined by

$$f \mapsto \{\, \hat{f}(n) : n = \ldots, -2, -1, 0, 1, 2, \ldots \,\},$$

where

$$\hat{f}(n) = \begin{cases} (f \circ nB) - (f \circ (n+1)B), & \text{for } n \ge 0 \\ (f \bullet |n|B) - (f \bullet (|n|-1)B), & \text{for } n < 0, \end{cases}$$
with $nB$ defined in Eq. (4.19). We limit our presentation here to flat openings and closings, although extension to more general structural openings and closings is possible [66]. Notice that

$$f \circ nB \ge f \circ (n+1)B \qquad \text{and} \qquad f \bullet nB \le f \bullet (n+1)B.$$

Therefore, the components $\hat{f}(n)$ of the grayscale DST are always nonnegative, for every $n$. Moreover, $f$ can be uniquely recovered by summing the components $\hat{f}(n)$ of its DST, with the negative-index components entering through the negative image $f^*$ defined in Eq. (4.23). Therefore, the grayscale DST is an invertible image decomposition scheme.
The grayscale DST is a multiresolution image decomposition scheme, which decomposes an image $f$ into residual images $\hat{f}(n)$, obtained by successive approximations of $f$ by means of structural openings and closings. Notice that $\hat{f}(0) = f - (f \circ B)$ is the opening top-hat transform of $f$, whereas $\hat{f}(-1) = (f \bullet B) - f$ is the closing top-hat transform. In general, $\hat{f}(n)$, for $n \ge 0$, contains a layer scraped from the subgraph of $f$ by means of the difference $(f \circ nB) - (f \circ (n+1)B)$, whereas $\hat{f}(n)$, for $n < 0$, contains a layer scraped from the subgraph of $f$ by means of the difference $(f \bullet |n|B) - (f \bullet (|n|-1)B)$. Since both opening and closing are smoothing (lowpass) filters, the DST can be thought of as the output of a filterbank comprising a collection of bandpass filters. However, the term band is not associated here with the frequency content of $f$, as is customary in linear filterbank techniques (e.g., see [64]), but with the particular layer scraped from the subgraph of $f$.
The pattern spectrum of a grayscale image $f$, in terms of a structuring element $B$, is given by

$$PS_{f;B}(n) = V[\hat{f}(n)] = \begin{cases} V[(f \circ nB) - (f \circ (n+1)B)], & \text{for } n \ge 0 \\ V[(f \bullet |n|B) - (f \bullet (|n|-1)B)], & \text{for } n < 0, \end{cases}$$

where $V[g] = \sum_x g(x)$ denotes the volume of a grayscale image $g$.
Clearly, the pattern spectrum is the magnitude of the discrete size transform.
The pattern spectrum turns out to be a powerful tool for summarizing shape and
texture content in images. It has been effectively used for grayscale morphological
filtering and restoration [74], as well as for the analysis, segmentation, and classification of texture [74, 75, 77, 78, 80, 81, 91, 92]. Efficient algorithms for computing
the pattern spectrum can be found in [84, 93, 94].
4.6 Morphological image reconstruction

In this section, we discuss a powerful tool for object extraction from binary and
grayscale images, known as morphological image reconstruction. This is an iterative tool that extracts regions of interest from an image marked by a set of markers.
We start by limiting our exposition to the binary case. We then extend our discussion to the grayscale case by means of threshold decomposition.
4.6.1 Reconstruction of binary images

Consider a shape $F$, like the one depicted in Fig. 4.24(a), comprising several (nonoverlapping) grains. Suppose that we are interested in an operator that automatically extracts all grains of $F$ that are marked by a marker $F^m$ (i.e., a set $F^m \subseteq F$). In practice, $F^m$ may mark important targets of interest (objects) that need to be extracted from image $F$.
Let $\{F_1, F_2, \ldots, F_K\}$ be the collection of all grains of $F$ and let $\hat{F}$ denote the portion of $F$ that contains all grains marked by $F^m$. Under certain conditions, $\hat{F}$ can be computed by means of elementary morphological operators. Indeed, let

$$\delta_B^1(F^m \mid F) = (F^m \oplus B) \cap F$$

be the conditional dilation (of size 1) of $F^m$ by a structuring element $B$ (that contains the origin) restricted inside $F$. This conditional dilation expands the marker $F^m$ by means of the structuring element $B$, while making sure that this expansion remains inside $F$ (see Fig. 4.24(b)). Iterating this operator $n$ times yields the conditional dilation of size $n$, given by

$$\delta_B^n(F^m \mid F) = \delta_B^1(\delta_B^1(\cdots \delta_B^1(F^m \mid F) \cdots \mid F) \mid F) \quad (n \text{ times}).$$

It can be shown that if $B$ is a structuring element that contains the origin, such that

$$(F_k \oplus B) \cap F_l = \emptyset, \quad \text{for every } k \neq l,   (4.27)$$

then binary morphological image reconstruction can be achieved by taking the union of all $\delta_B^n$'s; i.e.,

$$\hat{F} = R_B(F^m \mid F) = \bigcup_{n \ge 1} \delta_B^n(F^m \mid F).   (4.28)$$

[Figure 4.24: (a) The problem of binary morphological image reconstruction: a shape $F$, a marker $F^m$, and the reconstructed shape $\hat{F} = R_B(F^m \mid F)$. (b) Binary morphological image reconstruction implemented by means of conditional dilations $\delta_B^1(F^m \mid F)$.]

The required condition in Eq. (4.27) simply says that the grains $\{F_1, F_2, \ldots, F_K\}$ of $F$ should be far enough from each other, in the sense that, for every $k$, expanding $F_k$ by $B$ does not hit any other grain of $F$. In the discrete case, however, $B$ is typically chosen as the cross structuring element, or the square structuring element, and the condition in Eq. (4.27) is automatically satisfied. The union in Eq. (4.28) always exists, since $\delta_B^n(F^m \mid F) \subseteq F$ for every $n$, and therefore the sequence $\{\delta_B^n(F^m \mid F)\}$ is increasing and bounded from above by $F$. The resulting operator $R_B$ is called the conditional reconstruction operator. Notice that $R_B(F^m \mid F)$ can be sequentially implemented using

$$\delta_B^n(F^m \mid F) = \delta_B^1(\delta_B^{n-1}(F^m \mid F) \mid F), \quad \text{for } n = 2, 3, \ldots,   (4.29)$$

with $\delta_B^1(F^m \mid F) = (F^m \oplus B) \cap F$. See Fig. 4.24(b) for an illustration.
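The sequential implementation of Eq. (4.29) translates directly into a loop that iterates the conditional dilation until stability. A sketch of ours, with the 3x3 square as $B$ and a toy two-grain image:

```python
import numpy as np
from scipy import ndimage

def binary_reconstruction(marker, F):
    """R_B(marker | F): iterate the conditional dilation of Eq. (4.29)
    until stability, with B the 3x3 square structuring element."""
    B = np.ones((3, 3), dtype=bool)
    prev = np.zeros_like(F)
    cur = marker & F
    while not np.array_equal(prev, cur):
        prev = cur
        cur = ndimage.binary_dilation(prev, structure=B) & F
    return cur

F = np.zeros((32, 32), dtype=bool)
F[2:10, 2:10] = True     # grain 1 (8x8 pixels)
F[20:30, 20:30] = True   # grain 2
marker = np.zeros_like(F)
marker[5, 5] = True      # marks grain 1 only
assert binary_reconstruction(marker, F).sum() == 64  # grain 1 recovered
```

Production code would instead use an efficient queue-based implementation, such as skimage.morphology.reconstruction.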

In many applications, we are interested in a morphological filter that filters out grains that are of a certain size and shape. For example, consider the problem of removing grains with size smaller than a predefined size, in the sense that all grains that cannot accommodate a structuring element $A$ are completely eliminated, whereas all other grains remain intact. One may think that a simple opening $F \circ A$ of the image $F$ by the structuring element $A$ will do the job. This is only partially true. The opening $F \circ A$ will clearly eliminate all grains of $F$ that cannot contain structuring element $A$, but it will also smooth out the other grains. However, recall that $F \circ A \subseteq F$. Therefore, the structural opening $F \circ A$ can be used as a marker in a binary morphological reconstruction operator that will reconstruct all those smoothed out grains of $F$ that remain after the structural opening is applied. It can be shown that the resulting operator

$$\alpha(F) = R_B(F \circ A \mid F)$$

is an opening (i.e., an operator that is increasing, anti-extensive, and idempotent). This operator is called opening by reconstruction. The dual operator

$$\beta(F) = [R_B(F^c \circ A \mid F^c)]^c$$

is a closing (i.e., an operator that is increasing, extensive, and idempotent) and is called closing by reconstruction.
called closing by reconstruction.
The binary conditional reconstruction operator can be used to define another useful morphological operator, known as the binary close-hole operator. This operator is given by

$$\phi(F) = [R_B(W \mid F^c)]^c,   (4.30)$$

where the binary marker $W$ is the boundary of the image window. As illustrated in Fig. 4.25, the close-hole operator fills in all holes in a binary image $F$ that do not touch the image window boundary.

[Figure 4.25: Close-hole operator: (a) a binary image $F$ and the image window boundary marker $W$; (b) the set complement $F^c$ of $F$; (c) the result of conditional reconstruction $R_B(W \mid F^c)$; notice that only grains of $F^c$ that touch the image window boundary are reconstructed; (d) the set complement of the image in (c): all holes in $F$ that do not touch the image window boundary are filled in.]

4.6.2 Reconstruction of grayscale images

Morphological image reconstruction can be extended to the grayscale case via threshold decomposition. Consider a signal $f$, like the one depicted in Fig. 4.26. The topographic surface of this signal comprises several peaks. Suppose that we are interested in designing an operator that automatically removes a pre-selected number of peaks, while leaving the rest of the signal intact. This operation can be very useful in practice: in an image profile, peaks are usually attributed to objects, and peak removal is equivalent to object removal.
Peak removal can be automatically and effectively done by means of the grayscale morphological image reconstruction operator, to be discussed here. Let us assume that, given an image $f$, we can find another image $f^m$, known as the marker image (or simply as the marker), which identifies the portion of the image profile that needs to be preserved (see Fig. 4.26). It is required that $f^m \le f$. Let us denote by $\{F(t)\}$ and by $\{F^m(t)\}$ the threshold decompositions of $f$ and $f^m$, respectively. Notice that since $f^m \le f$, we have that $F^m(t) \subseteq F(t)$, for every $t$.

[Figure 4.26: Grayscale morphological image reconstruction: (a) a signal $f$ and its marker $f^m$; (b) the result of the conditional dilation $\delta_B^1(f^m \mid f)$ of the marker $f^m$ within $f$; (c) the reconstructed signal $\hat{f} = r_B(f^m \mid f)$.]


Therefore, $F^m(t)$ can be considered to be a binary marker for $F(t)$. Each binary marker $F^m(t)$ marks a number of grains of the cross section $F(t)$ of image $f$ at level $t$. If we apply the binary morphological image reconstruction operator $R_B(F^m(t) \mid F(t))$ on $F^m(t)$, given $F(t)$, then all grains of $F(t)$ marked by $F^m(t)$ will be perfectly reconstructed. If we repeat this process for every $t$ and stack up the results, we obtain a grayscale operator $r_B(f^m \mid f)$, which is known as a grayscale conditional reconstruction operator. Since the marker image $f^m$ marks the portion of the image profile that we want preserved, this portion will be perfectly reconstructed from the marker. Notice that

$$\hat{f}(x) = [r_B(f^m \mid f)](x) = \sup\{\, t : x \in R_B(F^m(t) \mid F(t)) \,\}.$$

Because $R_B(\cdot \mid \cdot)$ is an increasing operator of the marker, $r_B(f^m \mid f)$ is a flat image operator generated by $R_B(\cdot \mid \cdot)$. Under certain conditions, $r_B(f^m \mid f)$ can be computed by means of elementary grayscale morphological operators. Indeed, let

$$\delta_B^1(f^m \mid f) = (f^m \oplus B) \wedge f$$

be the conditional grayscale dilation (of size 1) of $f^m$ by a structuring element $B$ (containing the origin) that is restricted by $f$. This conditional dilation expands the subgraph of marker $f^m$ by means of the structuring element $B$, making sure that this expansion is below $f$. Iterating this operator $n$ times yields the conditional grayscale dilation of size $n$, given by

$$\delta_B^n(f^m \mid f) = \delta_B^1(\delta_B^1(\cdots \delta_B^1(f^m \mid f) \cdots \mid f) \mid f) \quad (n \text{ times}).$$

It can be shown that if $B$ is a structuring element that contains the origin, with

$$(G_k(t) \oplus B) \cap G_l(t) = \emptyset, \quad \text{for every } k \neq l, \text{ and every } t,$$

where $G_k(t)$ is the $k$th grain of the cross section $F(t)$ of $f$, then (grayscale) morphological image reconstruction can be achieved by taking the supremum of all $\delta_B^n$'s; i.e.,

$$\hat{f} = r_B(f^m \mid f) = \bigvee_{n \ge 1} \delta_B^n(f^m \mid f).$$

Notice that

$$\delta_B^n(f^m \mid f) = \delta_B^1(\delta_B^{n-1}(f^m \mid f) \mid f), \quad \text{for } n = 2, 3, \ldots,   (4.31)$$

with $\delta_B^1(f^m \mid f) = (f^m \oplus B) \wedge f$, which provides a sequential implementation of the reconstruction process.

Recall now that $f \circ B \le f$. Therefore, $f \circ B$ can be used as a marker in a morphological reconstruction operator. The resulting operator

$$\alpha_B(f) = r_B(f \circ B \mid f)   (4.32)$$

is an opening (i.e., an operator that is increasing, anti-extensive, and idempotent) and is called opening by reconstruction. The dual operator

$$\beta_B(f) = [r_B(f^* \circ B \mid f^*)]^*,   (4.33)$$

where $f^*$ is defined in Eq. (4.23), is a closing (i.e., an operator that is increasing, extensive, and idempotent) and is called closing by reconstruction. Notice that the opening $f \circ B$ will remove peaks and ridges from the topographic surface of $f$ and, therefore, these features will not be marked by $f \circ B$. In this case, the opening by reconstruction operator $r_B(f \circ B \mid f)$ will remove peaks and ridges from the topographic surface of $f$ while preserving the topography of the remaining surface. A dual statement applies for the case of the closing by reconstruction operator.
The grayscale conditional reconstruction operator can be used to define another useful morphological operator, known as the grayscale close-hole operator. This operator is given by

$$\phi(f) = [r_B(w \mid f^*)]^*,   (4.34)$$

where the grayscale marker $w$ takes the value of $f^*$ at pixels on the boundary of the image window, and the minimum graylevel value, otherwise. Clearly, this operator fills in all holes in the cross sections of image $f$ that do not touch the image window boundary.
Morphological image reconstruction is a time consuming process. Sequential implementation, by means of Eq. (4.29) or Eq. (4.31), is inefficient. However, a number of alternative algorithms have been proposed in the literature that result in faster implementation. For more information on this subject, the reader is referred to [95].
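Efficient grayscale reconstruction is available in standard toolkits, so the opening by reconstruction of Eq. (4.32) reduces to two calls. A sketch of ours, using skimage.morphology.reconstruction:

```python
import numpy as np
from scipy import ndimage
from skimage.morphology import reconstruction

rng = np.random.default_rng(8)
f = rng.integers(0, 256, size=(128, 128)).astype(float)

B = np.ones((11, 11), dtype=bool)

marker = ndimage.grey_opening(f, footprint=B)   # marker f o B <= f
opened_rec = reconstruction(marker, f, method='dilation')  # Eq. (4.32)

# The result lies between the marker and the original image.
assert (marker <= opened_rec).all() and (opened_rec <= f).all()
```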
4.6.3 Examples

Morphological image reconstruction is a powerful tool for extracting objects of interest from a given image. In the following, we illustrate this by means of two
examples: detection of the lateral ventricle in an MR image of the brain and extraction of filarial worms in a microscopic image of blood stream. Additional examples
may be found in [95].
Feature detection in MR imaging.
This example illustrates the use of grayscale morphological image reconstruction
for detecting the lateral ventricle in an MR image of the brain, depicted in Fig. 4.27(a).

[Figure 4.27: Detection of the lateral ventricle in an MR image of the brain: (a) original image (from [96]); (b) grayscale structural opening of the image in (a) by a disk structuring element; (c) the result of grayscale morphological image reconstruction of that part of the image in (a) marked by the image in (b); (d) the difference between the images in (a) and (c); (e) the result of thresholding the image in (d); (f) the boundary of the result of applying a binary area opening on the image in (e), overlaid on the original data in (a).]


A grayscale structural opening, by a disk structuring element, is applied on this image in order to remove the feature to be detected (i.e., the lateral ventricle). The result is depicted in Fig. 4.27(b). The grayscale morphological image reconstruction of that part of the original image marked by the image in (b) is depicted in Fig. 4.27(c). By subtracting this result from the original image in (a), we obtain the image depicted in Fig. 4.27(d). Notice the prominent presence of the lateral ventricle in (d), as well as the amount of image simplification obtained by means of the previous steps. Thresholding the image in (d) produces the binary image depicted in Fig. 4.27(e). Clearly, thresholding produces a binary image comprising many grains, the most prominent one (in terms of area) being the grain associated with the lateral ventricle. An area opening, applied on the image in (e), eliminates all grains with area below a chosen threshold, while preserving the desirable feature (i.e., the lateral ventricle). The boundary of the result of this area opening, overlaid on the original image, is depicted in Fig. 4.27(f). The detected lateral ventricle boundary is shown in black.
We should point out here that the image depicted in Fig. 4.27(c) is obtained by applying the opening by reconstruction operator [Eq. (4.32)] on the image depicted in (a). Moreover, the image depicted in Fig. 4.27(d) is obtained by applying the operator

$$\tau_B(f) = f - r_B(f \circ B \mid f)$$

on the image in (a). This is known as the opening-by-reconstruction top-hat operator.
on the image in (a). This is known as the opening-by-reconstruction top-hat operator.


Extraction of filarial worms.
In this example, the aim is to extract filarial worms from a microscopic image of
bloodstream by progressively eliminating all other objects present in the image.
These tiny worms are spread by mosquitos and circulate in the bloodstream. An
algorithm can be constructed, based on morphological image reconstruction, that
is very effective in accomplishing this task. The steps of this algorithm are illustrated in Fig. 4.28. Figure 4.28(a) depicts a microscopic image of filarial worms in
the bloodstream, indicated by the two white arrows. Although these objects could
be extracted by simple thresholding, this turns out to be very difficult, due to artifacts present in the image. We can use the observation that filarial worms show
up as dark, thin, and long objects in order to design a series of steps that eventually filter out all objects from the image, except those ones that fit this description.
We will be using a combination of morphological reconstruction operators and the
morphological skeleton transform in order to accomplish this task. Towards this
goal, we subtract the image depicted in Fig. 4.28(a) from the image obtained by
applying the closing by reconstruction operator [Eq. (4.33)], with  being an pixel-wide square structuring element, on the image depicted in (a). The result is
depicted in Fig. 4.28(b). This step regularizes the image background. We then apply a grayscale structural opening, by the cross structuring element   , in order to

[Figure 4.28: Extraction of filarial worms: (a) a microscopic image of filarial worms in the bloodstream, indicated by the two white arrows; (b) the difference between the original image in (a) and a grayscale closing by reconstruction operator applied on (a); (c) the grayscale structural opening of the image in (b) by the cross structuring element; (d) the grayscale area opening of the image in (c); (e) the result of thresholding the image in (d); (f) the morphological skeleton of the image in (e); (g) the result of a binary area opening applied on the image in (f); (h) binary morphological reconstruction of the filarial worms from the image in (e), using the image in (g) as a marker; (i) the reconstruction result in (h), overlaid on the original image in (a). Images courtesy of SDC Information Systems. Used with permission.]


The grayscale area opening operator [Eq. (4.25)] is then applied on the image in (c) in order to remove small objects. This produces the image depicted in Fig. 4.28(d). Now, thresholding the image in (d) produces the result depicted in Fig. 4.28(e), which contains five grains. Notice that two of the five grains are the filarial worms we are interested in. The length of the other three grains is clearly smaller than the length of the filarial worms. We can use this information to filter out the three unwanted grains. First, we calculate the morphological skeleton of the image in (e). The result is depicted in Fig. 4.28(f). Then, a binary area opening removes the short skeleton branches, producing the image depicted in Fig. 4.28(g). The resulting image is used as a marker, in a binary morphological reconstruction step, that reconstructs that part of the image in (e) marked by this marker. The result of this step is depicted in Fig. 4.28(h). Clearly, the reconstructed grains are the two filarial worms. Finally, Fig. 4.28(i) depicts the detected filarial worms, overlaid on the original image in (a).
We should point out here that the image depicted in Fig. 4.28(b) is obtained by applying the operator

$$\tau_B^*(f) = [r_B(f^* \circ B \mid f^*)]^* - f = \beta_B(f) - f$$

on the image depicted in (a). This is known as the closing-by-reconstruction top-hat operator.
4.7 Morphological image segmentation

An important application of mathematical morphology is in image segmentation, which deals with the problem of decomposing an image into different areas of interest (see also Chapters 2 and 3 in this volume). In this section, we discuss the problem of segmenting binary and grayscale images by means of mathematical morphology. The main tools associated with this problem are the skeleton by influence zones and the watershed transform.
4.7.1 The distance transform

The distance transform is a basic tool for the construction of morphological segmentation operators. It will soon become apparent that the distance transform is intrinsically related to many morphological set operators, like translation invariant erosions, dilations, and skeletons, to mention a few.
Let us consider images defined over the two-dimensional Euclidean space $\mathbb{R}^2$. A function $d(\cdot, \cdot)$ from $\mathbb{R}^2 \times \mathbb{R}^2$ into the set of nonnegative real numbers is called a distance function, if the following three properties are satisfied:

1. $d(x, y) = d(y, x)$, for every $x, y \in \mathbb{R}^2$.

2. $d(x, y) = 0$ if and only if $x = y$.

3. $d(x, z) \le d(x, y) + d(y, z)$, for every $x, y, z \in \mathbb{R}^2$.

One calls $d(x, y)$ the distance between points $x$ and $y$. Examples of common distance functions between two points $x = (x_1, x_2)$ and $y = (y_1, y_2)$ in $\mathbb{R}^2$ are

$$d_1(x, y) = |x_1 - y_1| + |x_2 - y_2|, \quad \text{cityblock distance}$$
$$d_2(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}, \quad \text{Euclidean distance}$$
$$d_\infty(x, y) = \max\{|x_1 - y_1|, |x_2 - y_2|\}, \quad \text{chessboard distance.}$$

Given a distance function $d$, the distance transform $\Delta_F$ of a binary image $F$ at a point $x$ is defined by

$$\Delta_F(x) = \inf\{\, d(x, y) : y \in F \,\}.$$

By convention, we set $\Delta_F(x) = 0$, for every $x \in F$. Notice that the distance transform is a grayscale function that depends on the particular choice of the distance function $d$. Its value is zero at all pixels $x \in F$ and increases as we move away from $F$.
Figure 4.29(a) depicts a microscopic image of red blood cells from a patient with hereditary spherocytosis. By binarizing the image in (a), we obtain the image $F$ depicted in Fig. 4.29(b). Figure 4.29(c) depicts the distance transform $\Delta_F$, plotted as a 2D function, whereas Fig. 4.29(d) depicts the distance transform $\Delta_{F^c}$. In both cases, the Euclidean distance is used. The nonzero values of $\Delta_F$ give the minimum distance of a pixel in $F^c$ from the red blood cells, whereas the nonzero values of $\Delta_{F^c}$ give the minimum distance of a pixel within a red blood cell from its boundary. Notice that $\Delta_F$ is zero within the red blood cells, whereas $\Delta_{F^c}$ is zero outside the cells.
Given an image $F$, the collection of all points with equal distance to the foreground is obtained by extracting the level lines of the corresponding distance transform $\Delta_F$, whereas the collection of all points with equal distance to the background is obtained by extracting the level lines of the distance transform $\Delta_{F^c}$. Examples of such level lines are depicted in Figs. 4.29(e) and 4.29(f), respectively. Notice that the level lines in these figures coincide with the boundaries of successive translation invariant dilations and erosions of image $F$ by a disk structuring element of increasing radius. Indeed, if $F$ is a closed and bounded subset of $\mathbb{R}^2$, then the dilated set $F \oplus D_r$, where

$$D_r = \{\, x : d(x, 0) \le r \,\}   (4.35)$$

is a disk structuring element with radius $r$ centered at the origin, is given by

$$F \oplus D_r = \{\, x : \Delta_F(x) \le r \,\}.   (4.36)$$

[Figure 4.29: The distance transform: (a) a grayscale microscopic image of red blood cells from a patient with hereditary spherocytosis; (b) binarized version $F$ of the image in (a); (c) the distance transform $\Delta_F$; (d) the distance transform $\Delta_{F^c}$; (e) and (f) the level lines of the distance transforms in (c) and (d), respectively. Data courtesy of K. C. Klatt, Department of Pathology, University of Utah. Used with permission.]

[Figure 4.30: Disk structuring elements with increasing radius, for three choices of the distance function: (a) cityblock distance; (b) Euclidean distance; (c) chessboard distance.]

This explains the fact that the boundaries of successive erosions with a disk structuring element coincide with the level lines in Fig. 4.29(e). Notice that Eq. (4.35) and Eq. (4.36) depend on the particular choice for the distance function $d$, which leads to a particular choice for the disk structuring element $D_r$. Figure 4.30 depicts disk structuring elements $D_r$, with increasing value of $r$, for three different choices of the distance function.
By using the duality between erosions and dilations, the level lines of the distance transform $\Delta_{F^c}$ can be associated with the boundaries of erosions with a disk structuring element. If $F$ is a closed and bounded subset of $\mathbb{R}^2$, we have that

$$F \ominus D_r = \{\, x : \Delta_{F^c}(x) > r \,\}.   (4.37)$$

The expressions in Eq. (4.35)-Eq. (4.37) establish a clear relationship between the distance transform and translation invariant erosions and dilations with disk structuring elements. Different geometries for the structuring elements are achieved by changing the choice of the associated distance function. Nonetheless, the importance of the distance transform lies in the fact that it aggregates distance information from a continuum of erosions and dilations in a single grayscale function. As we will see in the following, this information can be used for segmentation and for marking areas of interest in a given image.
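Discrete Euclidean distance transforms, and the dilations and erosions of Eq. (4.36) and Eq. (4.37) obtained by thresholding them, are readily computed; a sketch of ours using scipy:

```python
import numpy as np
from scipy import ndimage

F = np.zeros((64, 64), dtype=bool)
F[20:30, 20:30] = True
F[40:50, 44:54] = True

# Delta_F: Euclidean distance of every pixel to the nearest point of F
# (zero on F itself); scipy measures the distance to the nearest zero.
dist_to_F = ndimage.distance_transform_edt(~F)

# Delta_{F^c}: distance of every pixel inside F to the background.
dist_in_F = ndimage.distance_transform_edt(F)

# Eq. (4.36): thresholding Delta_F reproduces a dilation by a disk D_r.
r = 3
dilated = dist_to_F <= r
# Eq. (4.37): the erosion is obtained analogously from Delta_{F^c}.
eroded = dist_in_F > r
```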
4.7.2 Skeleton by influence zones (SKIZ)

Let us consider a binary image $F$, made up of (nonoverlapping) grains $F_k$, $k = 1, 2, \ldots, K$. In this framework, the problem of image segmentation is to partition image $F$ into disjoint segments such that each partition contains only one grain and the union of all partitions produces the entire image. This can easily be done by means of the so-called skeleton by influence zones, or simply SKIZ.
The influence zone $Z(F_k)$ of a grain $F_k$ of $F$ is defined as the collection of all points that are closer to $F_k$ than to any other grain of $F$. Evidently,

$$Z(F_k) = \{\, x : d(x, F_k) < d(x, F_l), \ \text{for every } l \neq k \,\}.$$

The set complement of the union of the influence zones of all grains of $F$ forms the SKIZ of $F$; i.e.,

$$\mathrm{SKIZ}(F) = \left[\, \bigcup_{k} Z(F_k) \,\right]^c.$$

Clearly, the SKIZ is the collection of all points that do not belong to any influence zone. Moreover, because the influence zone $Z(F_k)$ is an open subset of $\mathbb{R}^2$, the SKIZ is a closed subset of $\mathbb{R}^2$.
Interestingly, it can be shown that the SKIZ of a shape $F$ follows the ridges, or crest lines, of the distance transform $d_F$. Roughly speaking, a ridge, or crest line, of a grayscale image $f$ is a curve on the topographic surface of $f$ such that, as we walk along this line, the points to the right and to the left are lower than the ones we are on. A more precise mathematical characterization, based on elementary concepts such as gradient, directional derivative, and vector inner product, can be found in [97]. A detailed analysis of the crest points (i.e., the points forming a crest line) in a discrete setting can be found in [98], with emphasis on the identification of the crest points of the distance function.
An example of binary image segmentation based on the SKIZ is depicted in Fig. 4.31. The distance transform $d_F$ of the binary image $F$ in Fig. 4.29(b) is depicted in Fig. 4.31(a), plotted as a grayscale function and overlaid on image $F$. The crest lines, extracted from $d_F$, produce the SKIZ, which is depicted in Fig. 4.31(b), overlaid on $F$ (refer to [99] for methods on how to extract crest lines and ravines in grayscale images). Clearly, the SKIZ forms a continuous net that effectively segments the image into disjoint partitions, each partition containing at most one red blood cell.
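On a discrete grid, the influence zones and the SKIZ can be computed directly from the Euclidean distance transform, which can also return, for every pixel, the coordinates of the nearest grain pixel. The sketch below is an illustration of this idea rather than the chapter's own implementation; it assumes Python with NumPy and SciPy:

    import numpy as np
    from scipy import ndimage as ndi

    def influence_zones(F):
        """Assign every pixel to the grain of F nearest to it (Euclidean distance)."""
        grains, _ = ndi.label(F)  # one integer label per (nonoverlapping) grain
        # Coordinates of the nearest foreground pixel, for every pixel.
        _, (rows, cols) = ndi.distance_transform_edt(~F, return_indices=True)
        return grains[rows, cols]

    def skiz(zones):
        """Approximate the SKIZ as the set of boundaries between influence zones."""
        lines = np.zeros(zones.shape, dtype=bool)
        lines[:, 1:] |= zones[:, 1:] != zones[:, :-1]
        lines[1:, :] |= zones[1:, :] != zones[:-1, :]
        return lines

Pixels exactly equidistant from two grains are assigned arbitrarily by this construction; the SKIZ is then recovered as the one-pixel-thick frontier between adjacent zones.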
4.7.3 Watershed-based segmentation of nonoverlapping particles

An alternative way to obtain the SKIZ of a collection of (nonoverlapping) grains is to use the binary watershed transform. This procedure is illustrated in Fig. 4.32. Given a binary image $F$, like the one depicted in Fig. 4.29(b), the distance transform $d_F$ is calculated. The result is depicted in Fig. 4.32(a). If droplets of water fall on the topographic surface of $d_F$, they will follow the steepest slope towards a hollow, or regional minimum. Recall that a hollow, or regional minimum, $M$ of a grayscale image $f$ is a connected component of pixels in $f$ with a given grayscale value $v$, such that every pixel in the neighborhood of $M$ has a value strictly larger than $v$. In this case, water will accumulate in the so-called catchment basins of $d_F$; see Fig. 4.32(a). The catchment basin $\mathrm{CB}(M)$ of an image $f$, associated with a regional minimum $M$, is the collection of all points $p$ of the topographic surface of $f$ such that a drop of water falling at $p$ slides along the surface until it reaches $M$. A small hole is now punched at the bottom of each catchment basin of $d_F$, and the subgraph of $d_F$ is slowly immersed into water. While the subgraph is flooded by water passing through the holes, the water level rises uniformly over the subgraph; see Fig. 4.32(b). At some moment, water filling a catchment basin starts merging with water coming from an adjacent catchment basin. At this particular moment, a dam is erected to prevent this from happening. When no new dams need to be constructed, the procedure is stopped. The collection of all erected dams is depicted in Fig. 4.32(c). When the subgraph of $d_F$ is totally immersed into water, the only visible structures at the water surface will be the tops of the dams. These form the so-called watershed lines, which are depicted in Fig. 4.32(d). It turns out that the crest lines of the distance function, or equivalently the SKIZ, coincide with these watershed lines.

Figure 4.31: Binary segmentation using the SKIZ: (a) the distance transform $d_F$ of the image $F$ depicted in Fig. 4.29(b), overlaid on $F$; (b) the SKIZ obtained by extracting the crest lines of $d_F$, overlaid on $F$.
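In an implementation, the binary watershed transform can therefore be obtained by flooding the topographic surface of $d_F$ from the grains themselves. A minimal sketch, again assuming Python with SciPy and scikit-image (the option watershed_line=True asks the scikit-image watershed to keep the dams):

    import numpy as np
    from scipy import ndimage as ndi
    from skimage.segmentation import watershed

    def binary_watershed_skiz(F):
        """SKIZ of a boolean image F of nonoverlapping grains, via the watershed."""
        d_F = ndi.distance_transform_edt(~F)  # distance to the nearest grain
        markers, _ = ndi.label(F)             # pierce one hole per grain
        labels = watershed(d_F, markers=markers, watershed_line=True)
        return labels == 0                    # the dams, i.e., the watershed lines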
4.7.4 Geodesic SKIZ

Although the SKIZ is an effective tool for segmenting binary images of nonoverlapping particles, what happens when particles overlap? We answer this question in this subsection. First, however, we need to modify the notion of distance between two points $u$ and $v$ in $\mathbb{R}^2$.
Consider two points $u, v$ inside a binary image $F \subseteq \mathbb{R}^2$. A path in $F$ between $u$ and $v$ is any curve joining these two points that lies entirely in $F$. If the points $u, v$ can be connected by a path in $F$, then there exists a path joining $u$ and $v$ whose length is not greater than the length of any other path with the same endpoints. This path is called a geodesic path; see Fig. 4.33 for an example. Given two points $u, v \in F$, the length of the geodesic path connecting these two points is called the geodesic distance between $u$ and $v$ and is denoted by $d_F(u, v)$. If there exists no geodesic path between $u$ and $v$, we set $d_F(u, v) = \infty$.

Figure 4.32: The binary watershed transform: (a) the distance transform $d_F$ of the image $F$ depicted in Fig. 4.29(b); (b) the subgraph of $d_F$ is immersed into water; water fills in from holes punched through the bottom of the catchment basins; (c) the dams constructed to prevent water from merging from two adjacent catchment basins; (d) the tops of the dams in (c) define the watershed lines, which are overlaid on $F$.

Figure 4.33: A geodesic path, and a path that is not a geodesic, between two points $u$ and $v$ in a binary image $F$.


Let us now assume that we need to separate two overlapping grains $F_1$ and $F_2$, like the ones depicted in Fig. 4.34(a). Let us also assume that we can mark the two grains with two markers $M_1$ and $M_2$, respectively. Based on the geodesic distance $d_F(u, v)$ between two points $u, v \in F = F_1 \cup F_2$, we define the geodesic influence zone $Z_F(M_i)$ of a marker $M_i$ in $F$ as the collection of all points in $F$ that are closer, in terms of the geodesic distance, to $M_i$ than to any other marker of $F$. Clearly,
$$Z_F(M_i) = \{u \in F : d_F(u, M_i) < d_F(u, M_j), \text{ for every } j \neq i\},$$
where $d_F(u, M_i)$ denotes the geodesic distance transform in $F$ of marker $M_i$ at a point $u \in F$, given by
$$d_F(u, M_i) = \min_{v \in M_i} d_F(u, v).$$
Subtracting from $F$ the union of the influence zones of all markers in $F$, we obtain the geodesic SKIZ, given by
$$\mathrm{SKIZ}_F = F \setminus \bigcup_i Z_F(M_i).$$
It is not difficult to see that the geodesic SKIZ can be computed by extracting the ridges, or crest lines, of the geodesic distance transform $d_F(\cdot, M_i)$ associated with the markers. Clearly, the geodesic SKIZ can be used to segment partially overlapping particles, as illustrated in Fig. 4.34(b). However, the segmentation result depends strongly on the choice of the markers.
Figure 4.34: (a) Two overlapping grains $F_1$ and $F_2$ that need to be separated; the grains have been marked by two markers $M_1$ and $M_2$, and $F = F_1 \cup F_2$. (b) The segmentation obtained by means of the geodesic SKIZ.

To illustrate a method for choosing appropriate markers, let us consider the simple case of two partially overlapping disks $F_1$ and $F_2$ of the same radius, like the ones depicted in Fig. 4.35(a). We can mark these two disks by their centers. Then, the geodesic SKIZ, based on these markers, provides the desired segmentation. The centers of the two disks can be easily obtained by means of the distance transform: we first calculate the distance transform $d_{F^c}$, where $F = F_1 \cup F_2$; the desired disk centers can then be computed as the two points in $F$ where $d_{F^c}$ assumes its maximum value. This simple example suggests that appropriate markers for segmenting a binary image $F$ of overlapping particles based on the geodesic SKIZ can be obtained by determining the peaks, or regional maxima, of the distance transform $d_{F^c}$. Recall that a peak, or regional maximum, $M$ of a grayscale image $f$ is a connected component of pixels in $f$ with a given grayscale value $v$, such that every pixel in the neighborhood of $M$ has a value strictly smaller than $v$. However, keep in mind that this approach may produce misleading results. This is clear from the example depicted in Fig. 4.35(b).
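On the pixel grid, a geodesic distance transform can be approximated as a minimal-path cost restricted to $F$. The sketch below is one possible illustration, not the chapter's method: it assumes scikit-image's MCP_Geometric minimal-cost-path machinery and treats cells of infinite cost as impassable, which confines paths to $F$ (an assumption worth verifying against the installed version):

    import numpy as np
    from skimage.graph import MCP_Geometric

    def geodesic_zones(F, markers):
        """Geodesic influence zones Z_F(M_i) of markers inside a boolean mask F.

        markers: a list of seed-pixel lists, one list of (row, col) pairs per M_i.
        Returns an integer image: 0 outside F, i on Z_F(M_i); ties between
        markers trace out the geodesic SKIZ.
        """
        costs = np.where(F, 1.0, np.inf)      # paths may not leave F
        dists = []
        for seeds in markers:
            mcp = MCP_Geometric(costs)
            d, _ = mcp.find_costs(seeds)      # geodesic distance transform of M_i
            dists.append(d)
        zones = np.argmin(np.stack(dists), axis=0) + 1
        zones[~F] = 0
        return zones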
4.7.5 Watershed-based segmentation of overlapping particles

Figure 4.35: (a) Two overlapping disks with the same radius, marked by their centers, and the segmentation obtained by means of the geodesic SKIZ. (b) The geodesic SKIZ produces the wrong segmentation result when the radius of one of the two disks is reduced.

An alternative technique for segmenting overlapping particles, which avoids use of the geodesic SKIZ, is based on the watershed transform. Given a binary image $F$ of overlapping particles, like the one depicted in Fig. 4.36(a), we calculate the negative distance transform $-d_{F^c}$; see Fig. 4.36(b). Notice that the regional minima of this grayscale function appear at points in $F$ that are farthest away from $F^c$. The catchment basins of $-d_{F^c}$ clearly mark the different regions into which $F$ is to be segmented.
Now, we pierce the topographic surface of $-d_{F^c}$ at the locations of its regional minima, and we slowly immerse it into water. Water starts flooding the catchment basins. To prevent merging of water coming from two adjacent catchment basins, we erect dams. Once the surface is totally immersed into water, the tops of the erected dams provide the watershed lines, which, in turn, segment the image into the desired regions; see Fig. 4.36(c). Notice that the watershed transform is not applied to the distance transform $d_F$, as was done in the case of nonoverlapping particles, but to the negative distance transform $-d_{F^c}$. The watershed lines thus produced do not necessarily coincide with the geodesic SKIZ.
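This recipe (distance transform, markers at its regional maxima, watershed on its negative) is the standard particle-separation pipeline in most morphology toolboxes. A compact sketch under the same Python/SciPy/scikit-image assumptions as before, with min_distance a hypothetical knob for suppressing spurious nearby maxima:

    import numpy as np
    from scipy import ndimage as ndi
    from skimage.feature import peak_local_max
    from skimage.segmentation import watershed

    def separate_particles(F, min_distance=5):
        """Split a boolean image F of overlapping particles into labeled regions."""
        d = ndi.distance_transform_edt(F)       # d_{F^c}: peaks at particle centers
        coords = peak_local_max(d, min_distance=min_distance, labels=F)
        peaks = np.zeros(d.shape, dtype=bool)
        peaks[tuple(coords.T)] = True
        markers, _ = ndi.label(peaks)           # pierce the surface at the peaks
        return watershed(-d, markers=markers, mask=F)  # flood -d_{F^c} within F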
4.7.6 Grayscale segmentation

The watershed transform can be applied to the problem of grayscale segmentation as well, in order to partition a grayscale image $f$ into regions of interest. Usually, these regions correspond to areas of relatively homogeneous grayscale variation. In this case, image segmentation is achieved by means of a watershed-based technique similar to the one used in the binary case. For example, a function $f_o$ may be constructed from $f$, with crest lines that coincide with the contours of the regions of interest and catchment basins that mark these regions; see Fig. 4.37. Then, the crest lines of $f_o$ are extracted by means of the watershed transform applied to $f_o$. By construction, the watershed lines will coincide with the desired region contours.
Despite the simplicity of the previous approach, construction of an appropriate function $f_o$ is a rather challenging and delicate matter. When we say that a region of interest in an image has relatively homogeneous grayscale variation, we mean that there is only slight variation in grayscale values inside the region, or that it has a low gradient. On the other hand, high gradient values often indicate region contours. Therefore, $f_o$ can be taken to be the gradient of $f$ (or, more precisely, its magnitude); see Fig. 4.37. The crest lines of $f_o$, extracted by means of the watershed transform, will then provide the desired segmentation result.

Figure 4.36: Binary watershed segmentation: (a) a binary image $F$ of overlapping particles; (b) the negative distance transform $-d_{F^c}$; notice the location of the regional minima; (c) the watershed lines, overlaid on $F$.

Figure 4.37: Mapping regions and region contours in $f$ to the catchment basins and crest lines of the gradient $f_o$ of $f$.

Figure 4.38: Oversegmentation as a result of watershed-based segmentation using the morphological gradient: (a) original image $f$; (b) the result of the morphological gradient applied to $f$; (c) the resulting watershed lines, overlaid on $f$.


Usually, watershed-based segmentation, using the gradient of the original image, leads to an oversegmentation problem. This is primarily due to the sensitivity
of the gradient operator to grayscale variation and noise, creating a large number
of irrelevant regional minima (catchment basins). This is illustrated in Fig. 4.38.
Notice that the relevant contours are buried inside a dense net of irrelevant ones.
One way to ameliorate this problem is to simplify the original image before the
gradient is calculated. This may reduce the number of regional minima while preserving the most relevant ones. This is illustrated in Fig. 4.39. The original image
, depicted in Fig. 4.39(a), is pre-filtered by means of the opening by reconstruction operator in Eq. (4.32) followed by the closing by reconstruction operator in
Eq. (4.33). The structuring element  in Eq. (4.32) and Eq. (4.33) is taken to be a
disk with a diameter of  pixels. The result is depicted in Fig. 4.39(b). This ef-

250 Morphological Methods for Biomedical Image Analysis


fectively flattens the image . The morphological gradient is now applied on the
image in Fig. 4.39(b) and produces the result depicted in Fig. 4.39(c). The result is
clearly crisper than the one depicted in Fig. 4.38(b). Subsequent watershed-based
segmentation produces the result depicted in Fig. 4.39(d), which contains far less
oversegmentation artifacts. However, the result is still not satisfactory.
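In code, the simplification step amounts to an opening by reconstruction (an erosion used as the seed, reconstructed by dilation under the original image) followed by its dual closing by reconstruction, after which the morphological gradient is computed. A sketch of this pre-filtering idea under the same scikit-image assumptions, with a hypothetical disk radius standing in for the chapter's unspecified diameter:

    from skimage.morphology import disk, erosion, dilation, reconstruction

    def simplify_then_gradient(f, radius=3):
        """Flatten f by opening/closing by reconstruction, then take its gradient."""
        B = disk(radius)
        # Opening by reconstruction: erode, then reconstruct by dilation under f.
        g = reconstruction(erosion(f, B), f, method='dilation')
        # Closing by reconstruction: dilate, then reconstruct by erosion above g.
        g = reconstruction(dilation(g, B), g, method='erosion')
        # Morphological gradient of the simplified image.
        return dilation(g, disk(1)) - erosion(g, disk(1))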
Better segmentation may be obtained by constraining the regions where the watershed lines may occur. This idea can be implemented by means of an algorithm that employs internal and external binary markers, which constrain the watershed transform when it is applied to the gradient image $f_o$. This approach is illustrated in Fig. 4.40. The first step of this algorithm is to generate an internal binary marker $M_{\mathrm{int}}$, which marks the regions of interest in a grayscale image $f$, such as the one depicted in Fig. 4.38(a). Towards this goal, we may extract the regional minima of $f$. We choose to extract the regional minima, since the regions of interest appear as dark cells in $f$. However, due to noise in $f$, this step produces a large number of regional minima; if all these minima are used in subsequent steps, they will result in oversegmentation. Therefore, before we extract regional minima, we first simplify $f$ by means of the opening by reconstruction operator in Eq. (4.32), followed by the closing by reconstruction operator in Eq. (4.33). The structuring element $B$ in Eq. (4.32) and Eq. (4.33) is taken to be a disk with a diameter of … pixels. The result is depicted in Fig. 4.40(a). The internal marker $M_{\mathrm{int}}$ is now taken to be the regional minima of the grayscale image in (a). This marker is depicted in Fig. 4.40(b).
The next step is to generate an external binary marker $M_{\mathrm{ext}}$, which marks the highest crest lines of the original image $f$ that separate the regional minima in (b). Such a marker can be obtained by piercing the topographic surface of $f$ at the points of the regional minima depicted in Fig. 4.40(b), immersing the subgraph of $f$ into water, and detecting the resulting watershed lines. The result of this process is depicted in Fig. 4.40(c). Figure 4.40(d) depicts the combined binary marker $M = M_{\mathrm{int}} \cup M_{\mathrm{ext}}$, whereas Fig. 4.40(e) depicts $M$ overlaid on the original image. Notice that the cell boundaries are constrained between the external and internal markers. These boundaries can now be detected as the highest crest lines of the morphological gradient of the original image, depicted in Fig. 4.38(b), constrained by the internal and external markers. To detect these lines, we pierce the topographic surface of the morphological gradient at all points of the combined marker $M$ in Fig. 4.40(d), immerse the subgraph into water, and detect the resulting watershed lines. The result of this process is depicted in Fig. 4.40(f). Clearly, the overall algorithm produces an excellent segmentation result, especially when compared to the results depicted in Figs. 4.38(c) and 4.39(d).
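A minimal sketch of this marker-constrained watershed, under the same Python/SciPy/scikit-image assumptions (the internal marker is taken from the regional minima of the simplified image, the external marker from the watershed lines of the simplified image itself, and the combined marker then floods the morphological gradient):

    import numpy as np
    from scipy import ndimage as ndi
    from skimage.morphology import disk, erosion, dilation, local_minima
    from skimage.segmentation import watershed

    def marker_constrained_watershed(f, f_simplified):
        """Watershed of the gradient of f, pierced only at the combined markers."""
        m_int = local_minima(f_simplified)            # internal marker
        seeds, _ = ndi.label(m_int)
        lines = watershed(f_simplified, markers=seeds, watershed_line=True)
        m_ext = lines == 0                            # external marker (crest lines)
        markers, _ = ndi.label(m_int | m_ext)
        grad = dilation(f, disk(1)) - erosion(f, disk(1))  # morphological gradient
        return watershed(grad, markers=markers, watershed_line=True)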
The watershed transform can be classified as a region-based segmentation approach. Its success relies heavily on the appropriate choice of the grayscale image $f_o$ and on the choice of the binary markers used for piercing the topographic surface of $f_o$. Clearly, these choices depend on the particular segmentation problem at hand. Another important issue is the fact that the watershed transform is one of the most time-consuming operations of mathematical morphology. In an attempt to speed up computations, a number of algorithms (serial and parallel) have been proposed in the literature that deal with specific implementation issues. The interested reader is referred to [100] for a survey. For more information on watershed-based segmentation techniques, the reader is referred to [80, 101–108].

Figure 4.39: The effect of pre-filtering on watershed-based segmentation using the morphological gradient: (a) original image $f$; (b) the result of pre-filtering $f$ by means of an opening by reconstruction followed by a closing by reconstruction; (c) the result of the morphological gradient applied to the image in (b); (d) the resulting watershed lines, overlaid on $f$.

Figure 4.40: Watershed-based segmentation constrained by internal and external markers: (a) the result of pre-filtering the image $f$, depicted in Fig. 4.38(a), by means of an opening by reconstruction followed by a closing by reconstruction; (b) the internal marker containing the regional minima of the image in (a); (c) the external marker containing the watershed lines obtained by piercing the topographic surface of $f$ at the points of the regional minima in (b) and by immersing the subgraph of $f$ into water; (d) the combined internal and external markers; (e) the combined internal and external markers overlaid on $f$; (f) the watershed lines obtained by piercing the topographic surface of the morphological gradient of $f$, depicted in Fig. 4.38(b), at the points of the combined marker in (d), and by immersing the subgraph of the morphological gradient into water.
4.7.7 Examples

The watershed transform is a powerful tool for extracting regions of interest from
a given image. In the following, we illustrate this by means of two examples:
segmentation of MR images of the prostate and segmentation of the left ventricle
in tagged MR images of the heart.
Segmentation of MR images of the prostate.
This example illustrates the use of the watershed transform for segmenting MR images of the prostate, like the one depicted in Fig. 4.41(a). Here, we are interested in segmenting major anatomical features, such as the prostate, the rectum, and Denonvilliers' fascia (which separates the prostate from the anterior surface of the rectum). In this application, segmentation is very important for pre- and post-operative assessment of the prostate. For more information on this imaging technique, the reader is referred to [109]. Figure 4.41(a) depicts an MR image of the prostate with labeling indicating the three regions of interest. Our main objective is to automatically determine a set of internal and external markers that can be used for a successful watershed-based segmentation. Figure 4.41(b) depicts the negative of the image in (a). Notice that, in this image, the regions of interest are characterized by high graylevel values, as compared to the gray values associated with the tissues surrounding the prostate and rectum. Thresholding will produce appropriate markers for these regions. However, before thresholding is applied, we need to reduce the graylevel variation within individual regions. To accomplish this goal, the image in (b) is subjected to the opening by reconstruction operator [Eq. (4.32)], followed by the closing by reconstruction operator [Eq. (4.33)]. The structuring element $B$ in Eq. (4.32) and Eq. (4.33) is taken to be a disk with a diameter of … pixels. The result is depicted in Fig. 4.41(c). Thresholding now produces the binary image depicted in Fig. 4.41(d), from which the required markers will be extracted. Notice that this image consists of connected components, labeled $C_1$, $C_2$, and $C_3$. First, the holes in $C_1$ are closed by means of the binary close-hole operator [Eq. (4.30)]. Then, the resulting image is slightly shrunk by means of a binary erosion by a disk structuring element with a diameter of … pixels, and the result is smoothed out by means of a structural opening by a disk structuring element with a diameter of … pixels. Notice that the smoothed component $C_1$ can serve as an internal marker for the prostate, the smoothed component $C_2$ can serve as an external marker for the prostate, whereas the smoothed component $C_3$ can serve as an internal marker for the rectum. What is left to do is to obtain an external marker for the rectum. This can easily be obtained from $C_3$. By applying area opening operators, we can extract the smoothed internal rectum marker $C_3$ from the image in (d). The dilation of this marker by a disk structuring element with a diameter of … pixels is subtracted from the dilation of the marker by a disk structuring element with a larger diameter of … pixels. This produces a ring that surrounds the internal rectum marker and lies outside the rectum, in close proximity to its boundary. This ring can now serve as the desired external rectum marker. Figure 4.41(e) depicts the combined marker (in white), overlaid on the original image in (a). The topographic surface of the internal morphological gradient of the image in (a) is pierced at the points determined by this marker, and the subgraph is immersed into water. The detected watershed lines are depicted in Fig. 4.41(f), overlaid on the original image in (a). Clearly, watershed-based segmentation succeeds in segmenting the original image into the three desired regions of interest.
We should point out that the result depicted in Fig. 4.41(f) is not clinically validated; further processing may be required to obtain a clinically acceptable segmentation result. However, the segmentation result depicted in Fig. 4.41(f) is obtained automatically in less than 1 second, when implemented using MatLab on a Windows NT based system equipped with a … MHz Xeon processor, and can serve as an initialization for more accurate model-based segmentation techniques, if necessary.

Figure 4.41: Segmentation of an MR image of the prostate: (a) original image with three labeled anatomical features (prostate, Denonvilliers' fascia, rectum); (b) the negative of the image in (a); (c) the result of applying an opening by reconstruction followed by a closing by reconstruction to the image in (b); (d) the result of thresholding the image in (c), with connected components $C_1$, $C_2$, and $C_3$; (e) the combined internal and external markers (in white) obtained from the image in (d), overlaid on the original image in (a); (f) the watershed lines, overlaid on the original image in (a). Data courtesy of Clare Tempany, MD, Director of Clinical MRI, Brigham and Women's Hospital, Harvard University. Used with permission.
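The marker-generation steps above translate almost operator-for-operator into code. The following is a condensed, illustrative sketch only: the component indices, radii, and diameters are hypothetical stand-ins for values not recoverable from the chapter, and which connected component plays which anatomical role must be established by the caller:

    import numpy as np
    from scipy import ndimage as ndi
    from skimage.morphology import disk, binary_erosion, binary_dilation, binary_opening

    def prostate_markers(binary, r_shrink=2, r_smooth=4, r_in=3, r_out=6):
        """Internal prostate marker and external rectum ring from a Fig. 4.41(d)-style image."""
        comp, _ = ndi.label(binary)
        # Internal prostate marker: close holes, shrink slightly, smooth.
        c1 = ndi.binary_fill_holes(comp == 1)          # assumes C1 is label 1
        c1 = binary_erosion(c1, disk(r_shrink))
        c1 = binary_opening(c1, disk(r_smooth))
        # External rectum marker: ring between a large and a small dilation
        # of the internal rectum marker (here assumed to be label 3).
        c3 = comp == 3
        ring = binary_dilation(c3, disk(r_out)) & ~binary_dilation(c3, disk(r_in))
        return c1, ring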
Segmentation of tagged MR images of the heart.
In this example, we illustrate the use of marker-driven watershed-based segmentation for the extraction of major anatomical features of the left ventricle from tagged MR images of the heart. The main application concerns cardiac image analysis, where estimation of cardiac motion and deformation is of interest (see also Chapter 12 in this volume for more details on this subject). The example discussed here is interesting in several respects. Primarily, it is concerned with segmentation in a special class of MR imaging aimed at characterizing heart motion. Accurate segmentation helps identify and track the location of key anatomical features associated with the left ventricle. Segmentation is applied to a sequence of images tracking heart motion over its contraction cycle. In this respect, the time evolution of boundaries in several short-axis slices spanning the entire left ventricle is investigated. The resulting procedure is iterative, in that the initial segmentation is refined during a second pass by using the results of the first pass to derive an improved set of markers. Alternative segmentation techniques for tagged MR images of the heart are discussed in Chapter 3 of this volume.
The tagged MR imaging technique was introduced by Zerhouni et al. in 1988 [110]. The procedure is based on temporarily marking tissue by means of electromagnetic modulation. Typically, parallel sheets of tag surfaces a few millimeters apart, orthogonal to the image plane, are utilized. The intersection of such surfaces from orthogonal directions provides marks in 3D space. Tracking the motion of these marks helps visualize and characterize heart function abnormalities in a noninvasive fashion [111] (see also Chapter 12 in this volume).

Figure 4.42: (a) A slice of a tagged MR image of the heart; tag lines appear in black. (b) The result of applying a grayscale structural closing by a vertically oriented …-pixel-wide linear structuring element to the image in (a); all tag lines have been successfully removed. Data courtesy of E. R. McVeigh, Department of Biomedical Engineering and Radiology, The Johns Hopkins University. Used with permission.


A slice of a tagged MR image of the heart is depicted in Fig. 4.42(a). In this image, tag lines appear as black. A short-axis basal view of the left ventricle is used, which makes the left ventricle appear as a round object. Here, we are interested in segmenting three major anatomical features of the heart, namely the left ventricle and the inner and outer walls of the myocardium (i.e., the wall surrounding the left ventricle), known as the endocardium and epicardium, respectively. The image in Fig. 4.42(a) is taken right after tagging, prior to any heart motion; therefore, the tag lines appear as straight parallel lines. However, the tag lines deform in time according to the contraction of the heart muscle. These marks are part of the image and may interfere with a segmentation technique used for the extraction of the left ventricle and its wall. To remedy this situation, we remove the tag lines by means of a grayscale structural closing using a …-pixel-wide linear structuring element orthogonal to the tag lines. As shown in Fig. 4.42(b), this operation is quite successful in removing the tag lines while preserving the anatomical features of interest (e.g., the left ventricle, the endocardium, and the epicardium).
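Because the tag lines are thin dark bands, a grayscale closing with a short line segment oriented across them fills them in while leaving thicker dark structures intact. A one-step sketch (Python/scikit-image assumed; the length 9 is a hypothetical value, since the structuring element's true width was lost in reproduction):

    import numpy as np
    from skimage.morphology import closing

    def remove_tag_lines(f, length=9):
        """Grayscale structural closing by a vertically oriented line segment;
        dark features thinner than `length` in the vertical direction vanish."""
        line = np.ones((length, 1), dtype=np.uint8)   # vertical linear SE
        return closing(f, line)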
The main purpose of this example is to demonstrate the use of watershed-based segmentation for extracting the left ventricle, as well as the endocardium and epicardium, in a number of slices as a function of time. The data used for this example consist of … slices, each slice producing … images (frames) as a function of time. First, we need to come up with an appropriate set of markers: an internal marker located within the left ventricle, an intermediate one buried inside the myocardium, and an external marker circumscribing the epicardium. The highest crest line of the gradient of the original image, located between the external and the intermediate markers, will be the watershed approximation of the epicardium. Similarly, the highest crest line of the gradient image located between the internal and the intermediate markers will be the watershed approximation of the endocardium. Clearly, the success of the watershed-based segmentation procedure strongly depends on the accurate determination of these markers in each slice and all time frames.
We use a robust technique, based on mathematical morphology, to find these markers. The approach is based on the observation that the myocardium appears as a ring-shaped object [see Fig. 4.42(a)]. Despite a noticeable deformation during the contraction cycle, the cavity within the myocardium is a persistent attribute in all slices. The cavities present in the first time frame of each slice are extracted by using a simple close-hole top-hat operator, based on the grayscale close-hole operator $\mathrm{CH}$ in Eq. (4.34); namely,
$$\mathrm{CH}(f) - f. \qquad (4.38)$$
The resulting image, depicted in Fig. 4.43(a) for a slice close to the base of the heart, is processed by the grayscale area opening operator [Eq. (4.25)], with area parameter …, which eliminates unwanted debris. The result is depicted in Fig. 4.43(b). Notice that the only remaining object resides within the left ventricle cavity. Simple thresholding, followed by an application of the binary alternating filter in Eq. (4.18), with a …-pixel-wide square structuring element, produces the binary image depicted in Fig. 4.43(c), overlaid on the image depicted in (b). This binary image serves as an initial estimate of the endocardial marker; it will be used to provide the actual internal, intermediate, and external markers needed by the watershed transform. The inner marker is obtained by eroding this image by a disk structuring element with a diameter of … pixels. The intermediate and outer markers are obtained by extracting the boundaries of the dilations of the same image by a …-pixel-wide cross structuring element and by a …-pixel-wide disk structuring element, respectively. The sizes of the structuring elements are chosen such that the internal marker always exists, the intermediate marker always stays within the myocardium, and the external marker does not intersect the epicardium. The resulting markers are depicted in Fig. 4.43(d) as an overlay. The watershed transform is now applied to the internal morphological gradient of the image depicted in Fig. 4.43(e) [and not to the gradient of the image in (b), to avoid oversegmentation], producing the result depicted in Fig. 4.43(f). The image in (e) is obtained by applying a variant of the grayscale alternating sequential filter discussed in Subsection 4.4.11, in which the opening is replaced by opening by reconstruction and the closing is replaced by closing by reconstruction. The filter effectively reduces grayscale variation within homogeneous regions, as is evident by comparing Figs. 4.42(b) and 4.43(e). We use a filter of order … and a disk structuring element with a diameter of … pixels. Notice that the watershed transform does a satisfactory job of locating the endocardium and epicardium. However, the resulting segmentation starts to deteriorate for slices closer to the heart's apex and for time frames taken at a later stage of the contraction. The problem associated with extracting the epicardium is due to the contact points of the left and right ventricles, as well as to the intermixing of the heart's boundary with the silhouettes of other structures during the contraction phase. On the other hand, the problem associated with extracting the endocardium is associated with the appearance of papillary muscles in the left ventricle.

Figure 4.43: (a) The result of the close-hole top-hat operator [Eq. (4.38)], applied to the image depicted in Fig. 4.42(b). (b) The result of grayscale area opening applied to the image in (a). (c) The result of applying a binary alternating filter to the thresholded version of the image in (b). (d) The internal, intermediate, and external left ventricle markers obtained from the binary image in (c). (e) A simplified version of the image depicted in Fig. 4.42(b), obtained by means of a grayscale alternating sequential filter based on opening and closing by reconstruction operators. (f) The watershed lines obtained by applying the watershed transform to the internal morphological gradient of the image in (e), marked by the markers in (d). (g) A tighter set of markers obtained by a morphological manipulation of the result in (f). (h) The watershed lines obtained by applying the watershed transform to the internal morphological gradient of the image in (e), marked by the markers in (g). (i) Final segmentation obtained by smoothing the watershed lines in (h).
To improve the quality of the segmentation, the initial watershed lines in Fig. 4.43(f) are used to construct a tighter set of markers. This is achieved by first filling in the interior and exterior portions of the watershed curves by using the binary close-hole operator [Eq. (4.30)]. Then, the shape corresponding to the exterior watershed line is dilated by a disk structuring element with a diameter of … pixels; the boundary of the resulting shape is used as the new external marker. The new internal marker is obtained by eroding the filled-in internal watershed line by a disk structuring element with a diameter of … pixels. The intermediate marker is kept the same. The aggregation of the new internal, intermediate, and external markers is depicted in Fig. 4.43(g). Notice how tight the new set of markers is compared to the ones shown in (d). The result of the watershed transform, applied to the same gradient image with the new set of markers, is depicted in Fig. 4.43(h). The improvement in extracting the epicardium is evident. Extraction of the endocardium is very satisfactory even during the first pass and does not improve much during the second pass.
Due to the inherent noise in the imaging system, tagging artifacts, and overlapping anatomical features, the segmentation is not very smooth. We address this problem by applying a morphological opening to the interior of the exterior watershed line, using a disk structuring element with a diameter of … pixels, and by using the boundary of the result as the final segmentation of the epicardium. Similarly, the inner watershed line is smoothed by closing the result of the filling by a disk structuring element with a diameter of … pixels and by extracting the boundary. The resulting segmentation is depicted in Fig. 4.43(i).
A close-up of the final segmentation results, for a selected set of slices at different time frames, is depicted in Fig. 4.44. These results are comparable in quality with results obtained by other methods (e.g., methods based on deformable contour models; see Chapter 3 in this volume). However, watershed-based segmentation can be implemented in a fraction of the time required by comparable methods. In fact, our implementation, using MatLab on a Windows NT based system equipped with a … MHz Xeon processor, took less than … seconds per frame. This is fast, considering the complexity of the problem under consideration and the use of a high-level interpreted language like MatLab.
Finally, we would like to point out that, although the algorithm discussed in this example requires specification of the shape and size of a number of structuring elements (which seems to be case dependent), in reality these parameters can be extracted from the data by deriving bounds on the shape and size of the myocardium as it appears in all slices and frames.

Figure 4.44: A close-up of final segmentation results for selected tagged MR slices of the heart (slices #1, #3, and #5) at different time frames (frames #1, #5, and #9). Data courtesy of E. R. McVeigh, Department of Biomedical Engineering and Radiology, The Johns Hopkins University. Used with permission.


4.8 Conclusions and further discussion

In this chapter, we presented fundamental concepts and techniques of a nonlinear tool for image analysis known as mathematical morphology and illustrated its use with a number of examples. The current literature reveals extensive use of mathematical morphology in biomedical image analysis applications (e.g., see [20–53]). However, it has mostly been limited to simple morphological operators, like structural erosions, dilations, openings, and closings. Accurate and efficient solutions to a number of biomedical imaging problems can be obtained by employing more advanced morphological tools, like the morphological skeleton, the discrete size transform and pattern spectrum, morphological image reconstruction operators, and the watershed transform. To some extent, this has been demonstrated in this chapter with a few examples.
Due to space limitations, we did not discuss extensions of 2D morphology to 3D (see [31]), morphological sampling and discretization [112–114], mathematical morphology on graphs [115, 116], or mathematical morphology for vector-valued images [117], multivalued images [118], and image sequences [119]. Moreover, we did not discuss some recent developments, like the theory of connected operators (which includes, as a special case, the morphological image reconstruction operators discussed in Section 4.6) [120–125], multiscale morphological image decomposition schemes (morphological pyramids and wavelets) [126–131], and morphological scale-spaces [132–136]. These new tools show great potential for solving complex biomedical image analysis problems.
Of particular interest for applying mathematical morphology methods to problems in biomedical image analysis are recent investigations of the relationship between curve evolution [137–139], level set methods [140, 141], differential morphology [142, 143], and nonlinear partial differential equations (PDEs) (see also [144–147]). Curve evolution studies the propagation of curves as a function of time by using techniques from differential geometry. It is used for the development of geometric models for active contours, which are important in medical image segmentation problems (e.g., see [148] and Chapter 3 in this volume). Level set methods lead to algorithms for the efficient implementation of curve evolution and produce numerical schemes for solving the PDEs underlying geometric active contour models (see also Chapter 3 in this volume). Finally, differential morphology studies the problem of describing morphological operators via nonlinear PDEs. In a recent paper [147], the theoretical and algorithmic relationships between curve evolution, level set methods, differential morphology, and nonlinear PDE theory have been investigated. It has been shown that the distance transform can be used to relate differential morphology and curve evolution to a particular type of nonlinear PDE, the so-called eikonal PDE. These results indicate that mathematical morphology is directly related to various differential geometric models, which have been used extensively in medical image analysis problems.
Virtually all currently available image processing and analysis software packages are capable, to a certain extent, of analyzing images by means of morphological operators. The following list provides a typical sample.

APHELION
(http://www.aai.com)
A very sophisticated stand-alone image analysis and understanding software package with a large number of morphological operators (for Windows NT/98/95 systems).

CVIPtools [96]
(http://www.ee.siue.edu/CVIPtools)
A free software package, developed at Southern Illinois University at Edwardsville (for UNIX and Windows NT/98/95 systems). Contains a limited number of morphological operators.

MEGAWAVE
(http://www.cmla.ens-cachan.fr/Cmla/Megawave)
A free collection of C functions that, among other things, includes morphological filtering, affine morphological scale spaces, PDE-based segmentation, and snakes (for UNIX systems). This package has been developed at CEREMADE (Centre de Recherches en Mathématiques de la Décision), Université de Paris-Dauphine, France.

MICROMORPH
(http://cmm.ensmp.fr)
A highly sophisticated stand-alone software package for mathematical morphology (for Windows NT/98/95 systems). This package has been developed at CMM (Centre de Morphologie Mathématique), École des Mines de Paris, Fontainebleau, France.

MMach Toolbox for Khoros [149]
(http://www.khoral.com)
A highly sophisticated mathematical morphology toolbox for a powerful visual prototyping environment (for UNIX systems).

NIH Image
(http://rsb.info.nih.gov/nih-image and http://www.scioncorp.com)
A free and highly flexible stand-alone image processing and analysis software package from the National Institutes of Health (for Mac and Windows NT/98/95 systems). Contains a limited number of morphological operators.

SDC Morphology Toolbox for MatLab
(http://www.mmorph.com and http://www.mathworks.com)
A highly sophisticated mathematical morphology toolbox, similar to MMach, for a popular prototyping environment (for UNIX and Windows NT/98/95 systems).
4.9 Acknowledgments

This work was supported by the Office of Naval Research, Mathematical, Computer, and Information Sciences Division, under ONR Grant N00014-90-1345, and by the National Science Foundation, under NSF Award #9729576.
The authors would like to thank Professor Edward C. Klatt, Department of Pathology, University of Utah, for kindly permitting use of the images depicted in Figs. 4.3, 4.5–4.7, 4.9, and 4.29, and Clare Tempany, MD, Director of Clinical MRI, Brigham and Women's Hospital, Harvard University, for kindly permitting use of the prostate image depicted in Fig. 4.41. The authors also thank Professor Elliot R. McVeigh, Department of Biomedical Engineering and Radiology, The Johns Hopkins University, for permitting use of the data associated with the tagged MR heart segmentation example of Subsection 4.7.7. Special thanks to SDC Information Systems for permitting use of the images depicted in Figs. 4.19 and 4.28. The example depicted in Fig. 4.28 is a slightly modified version of the mmdfila demonstration of the SDC Morphology Toolbox for MatLab and has been replicated here with permission. All simulations were implemented in MatLab 5.3 using the SDC Morphology Toolbox for MatLab, version 0.9.
Finally, the authors are indebted to Henk Heijmans, Roberto Lotufo, and Milan Sonka for providing suggestions on how to improve the manuscript.
4.10 References

[1] A. K. Jain, Fundamentals of Digital Image Processing. Englewood Cliffs, New Jersey: Prentice Hall, 1989.
[2] R. C. Gonzalez and R. E. Woods, Digital Image Processing. Reading, Massachusetts: Addison-Wesley, 1992.
[3] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision. Pacific Grove, California: PWS Publishing, second ed., 1999.
[4] G. Matheron, Random Sets and Integral Geometry. New York City, New York: John Wiley, 1975.
[5] J. Serra, Image Analysis and Mathematical Morphology. London, England: Academic Press, 1982.
[6] H. J. A. M. Heijmans, Morphological Image Operators. Boston, Massachusetts: Academic Press, 1994.
[7] P. Soille, Morphological Image Analysis: Principles and Applications. Berlin, Germany: Springer, 1999.
[8] H. J. A. M. Heijmans and C. Ronse, "The algebraic basis of mathematical morphology I. Dilations and erosions," Computer Vision, Graphics, and Image Processing, vol. 50, pp. 245–295, 1990.
[9] C. Ronse and H. J. A. M. Heijmans, "The algebraic basis of mathematical morphology II. Openings and closings," Computer Vision, Graphics, and Image Processing: Image Understanding, vol. 54, pp. 74–97, 1991.
[10] H. J. A. M. Heijmans, "Theoretical aspects of gray-level morphology," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 568–582, 1991.
[11] C. Ronse, "Why mathematical morphology needs complete lattices," Signal Processing, vol. 21, pp. 129–154, 1990.
[12] G. J. F. Banon and J. Barrera, "Minimal representations for translation-invariant set mappings by mathematical morphology," SIAM Journal of Applied Mathematics, vol. 51, pp. 1782–1798, 1991.
[13] G. J. F. Banon and J. Barrera, "Decomposition of mappings between complete lattices by mathematical morphology, Part I. General lattices," Signal Processing, vol. 30, pp. 299–327, 1993.
[14] C. R. Giardina and E. R. Dougherty, Morphological Methods in Image and Signal Processing. Englewood Cliffs, New Jersey: Prentice Hall, 1988.

[15] E. R. Dougherty, An Introduction to Morphological Image Processing. Bellingham, Washington: SPIE Optical Engineering Press, 1992.
[16] J. Serra, ed., Image Analysis and Mathematical Morphology. Volume 2: Theoretical Advances. London, England: Academic Press, 1988.
[17] J. Serra and P. Soille, eds., Mathematical Morphology and its Applications to Image Processing. Dordrecht, The Netherlands: Kluwer, 1994.
[18] P. Maragos, R. W. Schafer, and M. A. Butt, eds., Mathematical Morphology and its Applications to Image and Signal Processing. Boston, Massachusetts: Kluwer, 1996.
[19] H. J. A. M. Heijmans and J. B. T. M. Roerdink, eds., Mathematical Morphology and its Applications to Image and Signal Processing. Dordrecht, The Netherlands: Kluwer, 1998.
[20] F. Preteux, A. M. Laval-Jeantet, B. Roger, and M. H. Laval-Jeantet, "New prospects in CT image processing via mathematical morphology," European Journal of Radiology, vol. 5, pp. 313–317, 1985.
[21] C. Lesty, M. Raphael, L. Nonnenmacher, V. Leblond-Missenard, A. Delcourt, A. Homond, and J. L. Binet, "An application of mathematical morphology to analysis of the size and shape of nuclei in tissue sections of non-Hodgkin's lymphoma," Cytometry, vol. 7, pp. 117–131, 1986.
[22] F. Meyer, "Automatic screening of cytological specimens," Computer Vision, Graphics, and Image Processing, vol. 35, pp. 356–369, 1986.
[23] M. M. Skolnick, "Application of morphological transformations to the analysis of two-dimensional electrophoretic gels of biological materials," Computer Vision, Graphics, and Image Processing, vol. 35, pp. 306–332, 1986.
[24] J. W. Klingler, Jr., C. L. Vaughan, T. D. Fraker, Jr., and L. T. Andrews, "Segmentation of echocardiographic images using mathematical morphology," IEEE Transactions on Biomedical Engineering, vol. 35, pp. 925–934, 1988.
[25] C. Frances, M. C. Branchet, S. Boisnic, C. L. Lesty, and L. Robert, "Elastic fibers in normal human skin. Variations with age: a morphometric analysis," Archives of Gerontology and Geriatrics, vol. 10, pp. 57–67, 1990.
[26] C. Toumoulin, R. Collorec, and J. L. Coatrieux, "Vascular network segmentation in subtraction angiograms: a comparative study," Medical Informatics, vol. 15, pp. 333–341, 1990.
[27] M. C. Branchet, S. Boisnic, C. Frances, C. Lesty, and L. Robert, "Morphometric analysis of dermal collagen fibers in normal human skin as a function of age," Archives of Gerontology and Geriatrics, vol. 13, pp. 1–14, 1991.
[28] J. G. Thomas, R. A. Peters, and P. Jeanty, "Automatic segmentation of ultrasound images using morphological operators," IEEE Transactions on Medical Imaging, vol. 10, pp. 180–186, 1991.
[29] V. Conan, S. Gesbert, C. V. Howard, D. Jeulin, F. Meyer, and D. Renard, "Geostatistical and morphological methods applied to three-dimensional microscopy," Journal of Microscopy, vol. 166, pp. 169–184, 1992.

[30] J. S. J. Lee, W. I. Bannister, L. C. Kuan, P. H. Bartels, and A. C. Nelson, "A processing strategy for automated Papanicolaou smear screening," Analytical and Quantitative Cytology and Histology, vol. 14, pp. 415–425, 1992.
[31] F. Meyer, "Mathematical morphology: from two dimensions to three dimensions," Journal of Microscopy, vol. 165, pp. 5–28, 1992.
[32] J. Pladellorens, J. Serrat, A. Castell, and M. J. Yzuel, "Using mathematical morphology to determine left ventricular contours," Physics in Medicine and Biology, vol. 37, pp. 1877–1894, 1992.
[33] M. E. Brummer, R. M. Mersereau, R. L. Eisner, and R. R. J. Lewine, "Automatic detection of brain contours in MRI data sets," IEEE Transactions on Medical Imaging, vol. 12, pp. 153–166, 1993.
[34] Y. Chen, E. R. Dougherty, S. M. Totterman, and J. P. Hornak, "Classification of trabecular structure in magnetic resonance images based on morphological granulometries," Magnetic Resonance in Medicine, vol. 29, pp. 358–370, 1993.
[35] A. Moragas, C. Castells, and M. Sans, "Mathematical morphologic analysis of aging-related epidermal changes," Analytical and Quantitative Cytology and Histology, vol. 15, pp. 75–82, 1993.
[36] M. Sans and A. Moragas, "Mathematical morphologic analysis of the aortic medial structure: Biomechanical implications," Analytical and Quantitative Cytology and Histology, vol. 15, pp. 93–100, 1993.
[37] B. D. Thackray and A. C. Nelson, "Semi-automatic segmentation of vascular network images using a rotating structuring element (ROSE) with mathematical morphology and dual feature thresholding," IEEE Transactions on Medical Imaging, vol. 12, pp. 385–392, 1993.
[38] J. Cardillo and M. A. Sid-Ahmed, "An image processing system for locating craniofacial landmarks," IEEE Transactions on Medical Imaging, vol. 13, pp. 275–289, 1994.
[39] F. Moreso, D. Seron, J. Vitria, J. M. Grinyo, F. M. Colome-Serra, N. Pares, and J. Serra, "Quantification of interstitial chronic renal damage by means of texture analysis," Kidney International, vol. 46, pp. 1721–1727, 1994.
[40] W. Böcker, W.-U. Müller, and C. Streffer, "Image processing algorithms for the automated micronucleus assay in binucleated human lymphocytes," Cytometry, vol. 19, pp. 283–294, 1995.
[41] J. A. Gimenez-Mas, M. P. Sanz-Moncasi, L. Remón, P. Gambo, and M. P. Gallego-Calvo, "Automated textural analysis of nuclear chromatin: A mathematical morphology approach," Analytical and Quantitative Cytology and Histology, vol. 17, pp. 39–47, 1995.
[42] C. Tsai, B. S. Manjunath, and R. Jagadeesan, "Automated segmentation of brain MR images," Pattern Recognition, vol. 28, pp. 1825–1837, 1995.
[43] G. Wolf, M. Beil, and H. Guski, "Chromatin structure analysis based on a hierarchic texture model," Analytical and Quantitative Cytology and Histology, vol. 17, pp. 25–34, 1995.

[44] M. Beil, T. Irinopoulou, J. Vassy, and J. P. Rigaut, "Application of confocal scanning laser microscopy for an automated nuclear grading of prostate lesions in three dimensions," Journal of Microscopy, vol. 183, pp. 231–240, 1996.
[45] J.-P. Thiran and B. Macq, "Morphological feature extraction for the classification of digital images of cancerous tissues," IEEE Transactions on Biomedical Engineering, vol. 43, pp. 1011–1020, 1996.
[46] A. Elmoataz, S. Schupp, R. Clouard, P. Herlin, and D. Bloyet, "A segmentation method combining mathematical morphology and a level set approach of active contours: Application to localization of objects in medical images," Acta Stereologica, vol. 16, pp. 223–231, 1997.
[47] S. Kumasaka and I. Kashima, "Initial investigation of mathematical morphology for the digital extraction of the skeletal characteristics of trabecular bone," Dentomaxillofacial Radiology, vol. 26, pp. 161–168, 1997.
[48] A. Mojsilovic, M. Popovic, N. Amodaj, R. Babic, and M. Ostojic, "Automatic segmentation of intravascular ultrasound images: A texture-based approach," Annals of Biomedical Engineering, vol. 25, pp. 1059–1071, 1997.
[49] P. Bamford and B. Lovell, "Unsupervised cell nucleus segmentation with active contours," Signal Processing, vol. 71, pp. 203–213, 1998.
[50] A. Elmoataz, S. Schupp, R. Clouard, P. Herlin, and D. Bloyet, "Using active contours and mathematical morphology tools for quantification of immunohistochemical images," Signal Processing, vol. 71, pp. 215–226, 1998.
[51] P. Korn, N. Sidiropoulos, C. Faloutsos, E. Siegel, and Z. Protopapas, "Fast and effective retrieval of medical tumor shapes," IEEE Transactions on Knowledge and Data Engineering, vol. 10, pp. 889–904, 1998.
[52] A. Moragas, M. Garcia-Bonafe, M. Sans, N. Toran, P. Huguet, and C. Martin-Plata, "Image analysis of dermal collagen changes during skin aging," Analytical and Quantitative Cytology and Histology, vol. 20, pp. 493–499, 1998.
[53] S. Baeg, S. Batman, E. R. Dougherty, V. G. Kamat, N. Kehtarnavaz, S. Kim, A. Popov, K. Sivakumar, and R. Shah, "Unsupervised morphological granulometric texture segmentation of digital mammograms," Journal of Electronic Imaging, vol. 8, pp. 65–75, 1999.
[54] P. Maragos and R. W. Schafer, "Morphological filters – Part I: Their set-theoretic analysis and relations to linear shift-invariant filters," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, pp. 1153–1169, 1987.
[55] J. Astola and E. R. Dougherty, "Nonlinear filters," in Digital Image Processing Methods (E. R. Dougherty, ed.), pp. 1–42, New York City, New York: Marcel Dekker, 1994.
[56] D. Zhao and D. G. Daut, "Morphological hit-or-miss transformation for shape recognition," Journal of Visual Communication and Image Representation, vol. 2, pp. 230–243, 1991.
[57] E. R. Dougherty and R. P. Loce, "Optimal mean-absolute-error hit-or-miss filters: Morphological representation and estimation of the binary conditional expectation," Optical Engineering, vol. 32, pp. 815–827, 1993.

[58] J. S. J. Lee, R. M. Haralick, and L. G. Shapiro, "Morphologic edge detection," IEEE Journal of Robotics and Automation, vol. 3, pp. 142–156, 1987.
[59] J.-F. Rivest, P. Soille, and S. Beucher, "Morphological gradients," Journal of Electronic Imaging, vol. 2, pp. 326–336, 1993.
[60] H. J. A. M. Heijmans and C. Ronse, "Annular filters for binary images," IEEE Transactions on Image Processing, vol. 8, pp. 1330–1340, 1999.
[61] H. J. A. M. Heijmans, "Composing morphological filters," IEEE Transactions on Image Processing, vol. 6, pp. 713–723, 1997.
[62] P. Maragos and R. W. Schafer, "Morphological filters – Part II: Their relations to median, order-statistic, and stack filters," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 35, pp. 1170–1184, 1987.
[63] J. Serra and L. Vincent, "An overview of morphological filtering," Circuits, Systems and Signal Processing, vol. 11, pp. 47–108, 1992.
[64] M. Vetterli and J. Kovacevic, Wavelets and Subband Coding. Englewood Cliffs, New Jersey: Prentice Hall, 1995.
[65] P. Maragos, "Morphological skeleton representation and coding of binary images," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 34, pp. 1228–1244, 1986.
[66] P. Maragos, "Pattern spectrum and multiscale shape representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 701–716, 1989.
[67] I. Pitas and A. N. Venetsanopoulos, "Morphological shape decomposition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 12, pp. 38–45, 1990.
[68] J. Goutsias and D. Schonfeld, "Morphological representation of discrete and binary images," IEEE Transactions on Signal Processing, vol. 39, pp. 1369–1379, 1991.
[69] J. M. Reinhardt and W. E. Higgins, "Efficient morphological shape representation," IEEE Transactions on Image Processing, vol. 5, pp. 89–101, 1996.
[70] R. Kresch and D. Malah, "Skeleton-based morphological coding of binary images," IEEE Transactions on Image Processing, vol. 7, pp. 1387–1399, 1998.
[71] D. Schonfeld and J. Goutsias, "Optimal morphological pattern restoration from noisy binary images," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 14–29, 1991.
[72] E. R. Dougherty, R. M. Haralick, Y. Chen, C. Agerskov, U. Jacobi, and P. H. Sloth, "Estimation of optimal morphological τ-opening parameters based on independent observation of signal and noise pattern spectra," Signal Processing, vol. 29, pp. 265–281, 1992.
[73] R. M. Haralick, P. L. Katz, and E. R. Dougherty, "Model-based morphology: The opening spectrum," Graphical Models and Image Processing, vol. 57, pp. 1–12, 1995.

[74] K. Sivakumar and J. Goutsias, "Discrete morphological size distributions and densities: Estimation techniques and applications," Journal of Electronic Imaging, vol. 6, pp. 31–53, 1997.
[75] E. R. Dougherty and J. B. Pelz, "Morphological granulometric analysis of electrophotographic images – Size distribution statistics for process control," Optical Engineering, vol. 30, pp. 438–445, 1991.
[76] V. Anastassopoulos and A. N. Venetsanopoulos, "The classification properties of the pecstrum and its use for pattern identification," Circuits, Systems and Signal Processing, vol. 10, pp. 293–326, 1991.
[77] E. R. Dougherty, J. T. Newell, and J. B. Pelz, "Morphological texture-based maximum-likelihood pixel classification based on local granulometric moments," Pattern Recognition, vol. 25, pp. 1181–1198, 1992.
[78] E. R. Dougherty, J. B. Pelz, F. Sand, and A. Lent, "Morphological image segmentation by local granulometric size distributions," Journal of Electronic Imaging, vol. 1, pp. 46–60, 1992.
[79] B. Li and E. R. Dougherty, "Size-distribution estimation in process fluids by ultrasound for particle sizes in the wavelength range," Optical Engineering, vol. 32, pp. 1967–1980, 1993.
[80] L. Vincent and E. R. Dougherty, "Morphological segmentation for textures and particles," in Digital Image Processing Methods (E. R. Dougherty, ed.), pp. 43–102, New York City, New York: Marcel Dekker, 1994.
[81] E. R. Dougherty and Y. Cheng, "Morphological pattern-spectrum classification of noisy shapes: Exterior granulometries," Pattern Recognition, vol. 28, pp. 81–98, 1995.
[82] S. Batman and E. R. Dougherty, "Size distributions for multivariate morphological granulometries: Texture classification and statistical properties," Optical Engineering, vol. 36, pp. 1518–1529, 1997.
[83] L. Vincent, "Morphological algorithms," in Mathematical Morphology in Image Processing (E. R. Dougherty, ed.), pp. 255–288, New York City, New York: Marcel Dekker, 1993.
[84] L. Vincent, "Granulometries and opening trees," Fundamenta Informaticae, vol. 41, pp. 57–90, 2000.
[85] H. Blum, "A transformation for extracting new descriptors of shape," in Models for the Perception of Speech and Visual Forms (W. Wathen-Dunn, ed.), pp. 362–380, Cambridge, Massachusetts: MIT Press, 1967.
[86] H. Blum, "Biological shape and visual science (Part I)," Journal of Theoretical Biology, vol. 38, pp. 205–287, 1973.
[87] L. Vincent, "Efficient computation of various types of skeletons," in Proceedings of the SPIE Conference on Medical Imaging V, vol. 1445, pp. 297–311, San Jose, California, 1991.

References 269
[88] F. Y.-C. Shih and O. R. Mitchell, Threshold decomposition of gray-scale morphology into binary morphology, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 11, pp. 3142, 1989.
[89] L. Vincent, Morphological area openings and closings for grey-scale images, in
Shape in Picture: Mathematical Description of Shape in Grey-level Images (Y.-L.
O, A. Toet, D. Foster, H. J. A. M. Heijmans, and P. Meer, eds.), pp. 197208, New
York City, New York: Springer-Verlag, 1994.
[90] J. C. Russ, The Image Processing Handbook. Second Edition. Boca Raton, Florida:
CRC Press, 1995.
[91] Y. Chen and E. R. Dougherty, Gray-scale morphological granulometric texture classification, Optical Engineering, vol. 33, pp. 27132722, 1994.
[92] K. Sivakumar and J. Goutsias, Morphologically constrained GRFs: Applications to
texture synthesis and analysis, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 21, pp. 99113, 1999.
[93] L. Vincent, Fast grayscale granulometry algorithms, in Mathematical Morphology
and its Applications to Image Processing (J. Serra and P. Soille, eds.), pp. 265272,
Dordrecht, The Netherlands: Kluwer, 1994.
[94] K. Sivakumar, M. J. Patel, N. Kehtarnavaz, B. Yoganand, and E. R. Dougherty, A
constant-time algorithm for erosions/dilations with applications to morphological
texture feature computation, Real-Time Imaging, To Appear, 2000.
[95] L. Vincent, Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms, IEEE Transactions on Image Processing, vol. 2,
pp. 176201, 1993.
[96] S. E. Umbaugh, Computer Vision and Image Processing: A Practical Approach
using CVIPtools. Upper Saddle River, New Jersey: Prentice Hall, 1998.
[97] R. M. Haralick and L. G. Shapiro, Computer and Robot Vision. Volume I. Reading,
Massachusetts: Addison-Wesley, 1992.
[98] F. Meyer, Skeletons in digital spaces, in Image Analysis and Mathematical Morphology. Volume 2: Theoretical Advances (J. Serra, ed.), pp. 257296, London, England: Academic Press, 1988.
[99] A. M. Lopez, F. Lumbreras, J. Serrat, and J. J. Villanueva, Evaluation of methods
for ridge and valley detection, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 21, pp. 327335, 1999.
[100] J. B. T. M. Roerdink and A. Meijster, The watershed transform: Definitions, algorithms and parallelization strategies, Fundamenta Informaticae, To Appear, 2000.
[101] F. Meyer and S. Beucher, Morphological segmentation, Journal of Visual Communication and Image Representation, vol. 1, pp. 2146, 1990.
[102] L. Vincent and P. Soille, Watersheds in digital spaces: An efficient algorithm based
on immersion simulations, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 13, pp. 583598, 1991.

270 Morphological Methods for Biomedical Image Analysis


[103] S. Beucher and F. Meyer, The morphological approach to segmentation: The watershed transformation, in Mathematical Morphology in Image Processing (E. R.
Dougherty, ed.), pp. 433481, New York City, New York: Marcel Dekker, 1993.
[104] F. Meyer, Topographic distance and watershed lines, Signal Processing, vol. 38,
pp. 113125, 1994.
[105] L. Najman and M. Schmitt, Watershed of a continuous function, Signal Processing, vol. 38, pp. 99112, 1994.
[106] L. Najman and M. Schmitt, Geodesic saliency of watershed contours and hierarchical segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol. 18, pp. 11631173, 1996.
[107] N. Michael and R. Arrathoon, Optoelectronic parallel watershed implementation
for segmentation of magnetic resonance brain images, Applied Optics, vol. 36,
pp. 92699286, 1997.
[108] J. M. Gauch, Image segmentation and analysis via multiscale gradient watershed
hierarchies, IEEE Transactions on Image Processing, vol. 8, pp. 6979, 1999.
[109] P. Ramchandani and M. D. Schnall, Magnetic resonance imaging of the prostate,
Seminars in Roentgenology, vol. XXVIII, pp. 7482, 1993.
[110] E. A. Zerhouni, D. M. Parish, W. J. Rogers, A. Yang, and E. P. Shapiro, Human
heart: Tagging with MR imaging A method for noninvasive assessment of myocardial motion, Radiology, vol. 169, pp. 5963, 1988.
[111] W. S. Kerwin and J. L. Prince, Cardiac material markers from tagged MR images,
Medical Image Analysis, vol. 2, pp. 339353, 1998.
[112] R. M. Haralick, X. Zhuang, C. Lin, and J. S. J. Lee, The digital morphological
sampling theorem, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, pp. 20672090, 1989.
[113] H. J. A. M. Heijmans and A. Toet, Morphological sampling, Computer Vision,
Graphics, and Image Processing: Image Understanding, vol. 54, pp. 384400,
1991.
[114] H. J. A. M. Heijmans, Discretization of morphological operators, Journal of Visual
Communication and Image Representation, vol. 3, pp. 182193, 1992.
[115] L. Vincent, Graphs and mathematical morphology, Signal Processing, vol. 16,
pp. 365388, 1989.
[116] H. J. A. M. Heijmans, P. Nacken, A. Toet, and L. Vincent, Graph morphology,
Journal of Visual Communication and Image Representation, vol. 3, pp. 2438,
1992.
[117] S. S. Wilson, Theory of matrix morphology, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 14, pp. 636652, 1992.
[118] J. Serra, Anamorphoses and function lattices (multivalued morphology), in Mathematical Morphology in Image Processing (E. R. Dougherty, ed.), pp. 483523, New
York City, New York: Marcel Dekker, 1993.

References 271
[119] J. Goutsias, H. J. A. M. Heijmans, and K. Sivakumar, Morphological operators for
image sequences, Computer Vision and Image Understanding, vol. 62, pp. 326
346, 1995.
[120] P. Salembier and J. Serra, Flat zones filtering, connected operators, and filters by
reconstruction, IEEE Transactions on Image Processing, vol. 4, pp. 11531160,
1995.
[121] C. Ronse, Set-theoretical algebraic approaches to connectivity in continuous or
digital spaces, Journal of Mathematical Imaging and Vision, vol. 8, pp. 4158,
1998.
[122] P. Salembier, A. Oliveras, and L. Garrido, Antiextensive connected operators for
image and sequence processing, IEEE Transactions on Image Processing, vol. 7,
pp. 555570, 1998.
[123] J. Serra, Connectivity on complete lattices, Journal of Mathematical Imaging and
Vision, vol. 9, pp. 231251, 1998.
[124] H. J. A. M. Heijmans, Connected morphological operators for binary images,
Computer Vision and Image Understanding, vol. 73, pp. 99120, 1999.
[125] J. Serra, Connections for sets and functions, Fundamenta Informaticae, vol. 41,
pp. 147186, 2000.
[126] A. Toet, A morphological pyramidal image decomposition, Pattern Recognition
Letters, vol. 9, pp. 255261, 1989.
[127] X. Kong and J. Goutsias, A study of pyramidal techniques for image representation and compression, Journal of Visual Communication and Image Representation,
vol. 5, pp. 190203, 1994.
[128] A. Morales, R. Acharya, and S.-J. Ko, Morphological pyramids with alternating
sequential filters, IEEE Transactions on Image Processing, vol. 4, pp. 965977,
1995.
[129] J. Goutsias and H. J. A. M. Heijmans, Nonlinear multiresolution signal decomposition schemes. Part 1: Morphological pyramids, IEEE Transactions on Image
Processing, To Appear, 2000.
[130] R. L. de Queiroz, D. A. F. Florencio, and R. W. Schafer, Nonexpansive pyramid for
image coding using a nonlinear filterbank, IEEE Transactions on Image Processing,
vol. 7, pp. 246252, 1998.
[131] H. J. A. M. Heijmans and J. Goutsias, Nonlinear multiresolution signal decomposition schemes. Part 2: Morphological wavelets, IEEE Transactions on Image
Processing, To Appear, 2000.
[132] M.-H. Chen and P.-F. Yan, A multiscaling approach based on morphological filtering, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11,
pp. 694700, 1989.
[133] R. van den Boomgaard and A. Smeulders, The morphological structure of images:
The differential equations of morphological scale-space, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 16, pp. 11011113, 1994.

272 Morphological Methods for Biomedical Image Analysis


[134] J. A. Bangham, P. D. Ling, and R. Harvey, Scale-space from nonlinear filters,
IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, pp. 520
528, 1996.
[135] P. T. Jackway, Gradient watersheds in morphological scale-space, IEEE Transactions on Image Processing, vol. 5, pp. 913921, 1996.
[136] P. T. Jackway and M. Deriche, Scale-space properties of the multiscale morphological dilation-erosion, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 18, pp. 3851, 1996.
[137] L. Alvarez, F. Guichard, P. L. Lions, and J. M. Morel, Axioms and fundamental equations of image processing, Archive for Rational Mechanics and Analysis,
vol. 123, pp. 199257, 1993.
[138] G. Sapiro and A. Tannenbaum, Affine invariant scale-space, International Journal
of Computer Vision, vol. 11, pp. 2544, 1993.
[139] B. B. Kimia, A. R. Tannenbaum, and S. W. Zucker, Shapes, shocks, and deformations I: The components of two-dimensional shape and the reaction-diffusion space,
International Journal of Computer Vision, vol. 13, pp. 189224, 1995.
[140] S. Osher and J. A. Sethian, Fronts propagating with cutvature-dependent speed:
Algorithms based on Hamilton-Jacobi formulations, Journal of Computational
Physics, vol. 79, pp. 1249, 1988.
[141] J. A. Sethian, Level Set Methods and Fast Marching Methods Evolving Interfaces
in Computational Geometry, Fluid Mechanics, Computer Vision, and Materials Science, Second Edition. Cambridge, England: Cambridge University Press, 1999.
[142] R. W. Brockett and P. Maragos, Evolution equations for continuous-scale morphological filtering, IEEE Transactions on Signal Processing, vol. 42, pp. 33773386,
1994.
[143] P. Maragos, Differential morphology and image processing, IEEE Transactions on
Image Processing, vol. 5, pp. 922937, 1996.
[144] G. Sapiro, R. Kimmel, D. Shaked, B. B. Kimia, and A. M. Bruckstein, Implementing continuous-scale morphology via curve evolution, Pattern Recognition, vol. 26,
pp. 13631372, 1993.
[145] L. Alvarez and J. M. Morel, Formalization and computational aspects of image
analysis, Acta Numerica, pp. 159, 1994.
[146] J. Weickert, Anisotropic Diffusion in Image Processing.
Teubner-Verlag, 1998.

Stuttgart, Germany:

[147] P. Maragos and M. A. Butt, Curve evolution, differential morphology, and distance
transforms applied to multiscale and eikonal problems, Fundamenta Informaticae,
vol. 41, pp. 91129, 2000.
[148] V. Caselles, F. Catte, T. Coll, and F. Dibos, A geometric model for active contours,
Numerische Mathematik, vol. 66, pp. 131, 1993.
[149] J. Barrera, G. J. F. Banon, R. A. Lotufo, and R. Hirata Jr., MMach: a mathematical morphology toolbox for the KHOROS system, Journal of Electronic Imaging,
vol. 7, pp. 174210, 1998.

CHAPTER 5
Feature Extraction
Murray H. Loew
George Washington University

Contents
5.1 Introduction
    5.1.1 Why features? Classification (formal or informal) almost always depends on them
    5.1.2 Review of applications in medical image analysis
    5.1.3 Roots in classical methods
    5.1.4 Importance of data and validation
5.2 Invariance as a motivation for feature extraction
    5.2.1 Robustness as a goal
    5.2.2 Problem-dependence is unavoidable
5.3 Examples of features
    5.3.1 Features extracted from 2D images
5.4 Feature selection and dimensionality reduction for classification
    5.4.1 The curse of dimensionality: subset problem
    5.4.2 Classification versus representation
    5.4.3 Classifier-independent feature analysis for classification
    5.4.4 Classifier-independent feature extraction
    5.4.5 How useful is a feature: separability between classes
    5.4.6 Classifier-independent feature analysis in practice
    5.4.7 Potential for separation: nonparametric feature extraction
    5.4.8 Finding the optimal subset
    5.4.9 Ranking the features
5.5 Features in practice
    5.5.1 Caveats
    5.5.2 Ultrasound tissue characterization
    5.5.3 Breast MRI
5.6 Future developments
5.7 Acknowledgments
5.8 References

5.1 Introduction

This chapter describes the need for image features, categorizes them in several
ways, presents the constraints that may determine which are used in a given application, defines some of them mathematically, and gives examples of their use in
research and in clinical settings. Features can be based on individual pixels (e.g.,
the number having an intensity greater than x; the distance between two points),
on areas (the detection of regions having specific shapes), on time (the flow in a
vessel, the change in an image since the last examination), and on transformations
(wavelet, Fourier, and many others) of the original data.
Excluded from this chapter are discussions of the many methods of image enhancement and preprocessing (for example, noise removal, contrast improvement,
edge detection) used in improving human visual understanding. A large literature
exists, and reference to it for those methods will be made as needed.
5.1.1 Why features? Classification (formal or informal) almost always depends on them

Classification, comparison, or analysis of images is performed almost always in terms of a set of features extracted from the images. Usually this is necessary for
one or more of the following reasons:
Reduction of dimensionality. An 8-bit-per-pixel image of size 256-by-256 pixels has $2^{8 \times 256 \times 256} = 2^{524{,}288}$ possible realizations. Clearly it is worthwhile to express structure within and similarities between images in ways that depend on fewer, higher-level representations of their pixel values and relationships. It will be important to show that the reduction nevertheless preserves information important to the task.
Incorporation of cues from human perception. Much is known about the
effects of basic stimuli on the visual system. In many situations, moreover,
we have considerable insight into how humans analyze images (essential, for
example, in the training of radiologists and photo-interpreters). Use of the
right kinds of features would allow for the incorporation of that experience
into automated analysis.
Transcendence of the limits of human perception. Notwithstanding the great
facility that we have in understanding many kinds of images, there are properties (e.g., some textures) of images that we cannot perceive visually, but
which could be useful in characterizing them. Features can be constructed
from various manipulations of the image that make those properties evident.
The need for invariance. The meaning and utility of an image are often unchanged when the image itself is perturbed in various ways. Changes in one
or more of scale, location, brightness, and orientation, for example, and the presence of noise, artifact, and intrinsic variation are image alterations to
which well-designed features are wholly or partially invariant.
The assumption made throughout this chapter is that feature extraction is automated, and we define and illustrate only those features that can be computed without user interaction. In some cases, however, it is necessary for the user to identify one or more regions of interest (ROIs) in the image; the features are then extracted automatically within each of the ROIs. In other cases, the image will have been segmented, that is, divided into regions each of which is internally homogeneous and different from its neighbors. For purposes of this chapter, we assume that whatever is the subject of our feature extraction (the entire image, a region of interest, or a segment) has been identified independently of (and usually prior to) the feature-extraction process. When relevant, the term subimages will be used to include both segments and regions of interest. (Though not addressed here, segmentation is a very important and intensively studied subject; see Chapter 2.)
5.1.2 Review of applications in medical image analysis

Features play a number of roles in medical image analysis. Model-driven features, which incorporate knowledge of the anatomy and condition in which we are interested, are introduced in Chapter 7. Such a top-down approach is useful in many imaging applications. The categories below aim to describe briefly the tasks in which features (many of which are model-driven) are used. Other chapters in this volume contain numerous examples of the applications of features; this chapter provides a brief taxonomy and some explanation of the origins of the features.
5.1.2.1 By purpose

a. Screening (detection)
The goal in screening is to detect conditions that may be disease or evidence
of disease, with the intent of conducting a detailed follow-up of suspicious
findings. Consequently, the emphasis usually is on sensitivity (the detection
of abnormalities when they exist) at the cost of an increased false-positive
rate (decreased specificity). And, because screening is performed on large
numbers of people, it is important that the procedures be low in cost and
provide results quickly. The features should therefore be easy to extract,
contribute to the sensitivity of the procedure, and require minimal user intervention.
Examples of images used in screening include x-ray mammograms for breast
cancer, structured visible light images for childhood scoliosis (curvature of
the spine), and fundus photography for diseases of the retina of the eye.
b. Diagnosis (classification)
Diagnosis aims to make a specific identification of a problem: is the suspicious region in the breast a fibroadenoma, a cyst, or a carcinoma? Are there microaneurysms in the retina? In cases such as those,
where screening preceded diagnosis, additional features may be required. In
many applications, however, diagnosis is the first step; examples include the
classification and counting of blood cells, the analysis of tissue cells, and the
characterization of chromosomes in the pathology laboratory.
Another application is comparison: of an image with an earlier version of
the same region (to describe changes in a condition), or of a given image to
an atlas or other standard, for diagnosis or reporting. The Visible Human
Project [1] provides data sets that could be useful in that respect.
c. Therapy and treatment planning
In radiation oncology it is often necessary to identify and align structures in
two images: one is the prescription image used to indicate the areas to be
treated; the other is the portal image, taken just prior to treatment. The portal image is intended to confirm that the treatment beam is aimed correctly.
Typically, however, image quality is low, because of greatly reduced contrast.
Various enhancement methods [2, 3] have been employed, but it is nevertheless often necessary to extract features (e.g., shapes, areas) to provide the
basis for identification of the treatment areas and their boundaries.
A solution of the more general problem of multi-modality image registration (as used, for example, in image-guided surgery [4, 5]) often depends on
the ability to recognize correspondence between equivalent structures in the
separate modalities. In cases where external fiducials are not used, successful registration may rely on the comparison of features extracted from each
modality [6].
5.1.2.2 By specialty

The features chosen will depend in part on the modality and intended use; the clinical specialty will dictate that and impose its own requirements and conventions. For that reason, the features must be capable of being shown to fit the clinician's understanding of the pathology or disease process; in many cases, of course, it will be the clinician's experience that suggests features in the first place.
Radiology (gray-scale images only): x-ray plain films (of chest, extremities),
computerized tomography (CT), magnetic resonance imaging (MRI), ultrasound, positron emission tomography (PET), single-photon emission computerized tomography (SPECT), nuclear medicine; fluoroscopic methods for
angiography and interventional techniques
Pathology (color plays an important role for clinicians): optical and electron
microscopy and effect of using stains of various kinds; fluorescence
Dermatology (color is important): still and video images of skin



Ophthalmology (color is important): still and video images of surface of eye;
retina
5.1.2.3 By representation

2D: the typical representation; as a function of location in two dimensions, intensity is mapped as a gray-scale or color value
3D: a set of 2D images as a function of time, or as a function of location
within the body (a stack of slices)
3D + time: typically, a set of 2D images (a stack of slices) taken repeatedly
over time
Form and function: is the image intended to convey information only about
shape, texture, area, etc., or to describe function (e.g., uptake of some material, or blood flow)?
5.1.3 Roots in classical methods

The techniques employed in feature extraction generally are motivated by well-understood methods from image processing (in turn, based largely on communication theory, e.g., [7], and signal processing [8]) and statistics [9]. The extension to
higher dimensionality of classical methods usually is straightforward in principle
(though the computational cost can be high), but even then, many methods lack the
capability to deal with geometric and topologic structure. The definitions of and
attempts to characterize connectivity, texture, and boundary, for example, and the
effect of scale changes, all required extensions and new developments. Part of that
effort derived from the goal of incorporating lessons from human vision, given that
it is so successful at integrating local features [10, 11].
When considering individual pixels, it is useful to look for neighbors, connectivity, gradients, and other one-dimensional descriptions (perhaps computed in
2D): examples are the histogram and measures of entropy. But much human image understanding depends on the description of regions, and feature extraction can
assume the regions are given or can attempt to find them. The latter task often is
called segmentation and is considered in Chapter 2. Examples of the ways regions
can be described include the following: statistics of amplitude; boundary description (including fractal); edge detection; topological measures; shape descriptors;
texture aggregation; co-occurrence; moments; correlation; skeletons and other medial measures; and the result of morphological operations.
Many kinds of transforms are usefully applied for feature extraction. The
Fourier transform, for example, provides information about spatial frequency for
the image or some subimage. But questions of scale, and the detection of properties that are global or local (and perhaps not knowing a priori which are needed),
have led to the use of a set of transforms that preserves locality in space (and in
time, for sequences of images).



It might seem reasonable to expect that if a feature or a set of features provides
a good description of an image, then there should be some relationship between
similarity of the features and similarity of the images they represent. The definition of a similarity measure is a vital issue. In the case of the features themselves
(generally represented as vectors), a Euclidean distance or other simple metric (absolute value, city-block distance) is often useful. Those measures' effectiveness for
images, however, is not so evident. Usually, the similarity of images is evaluated
with respect to a task. And some of the most common measures of distance (e.g.,
mean-square error) can be shown to be unrelated to visual perception of similarity
for certain tasks. This means that features usually will have to be evaluated on the
basis of the tasks for which they are intended (e.g., diagnosis, screening, treatment
planning), rather than for their (visually perceived) representational accuracy.
5.1.4 Importance of data and validation

Once a candidate set of features is identified, it is essential to evaluate it, usually as indicated in the previous section, that is, with respect to the task. Even if the
set works well for a sample of problems for which the truth is known perfectly (a
gold standard), how can we have confidence that the feature set will perform well
in general? Can we appeal to first principles (e.g., physics, psychophysics), or must
we perform great numbers of experiments? Only rarely will the task be defined so
narrowly and explicitly that we can expect a theoretical argument to suffice. More
often, it will be important to use reference data sets, composed of images for which
the correct diagnosis, screening decision, or treatment plan is known. To show that
the results are reproducible, we must have a data set of substantial size, determined
by the kinds and amount of clinical variation to be expected. Validation of a feature
set's utility is an important step towards clinical adoption. Chapter 10 provides a
methodology for validation.
5.2 Invariance as a motivation for feature extraction

5.2.1 Robustness as a goal

Ideally, features extracted from medical images should be robust; they should
be capable of providing the same information irrespective of noise, artifact, intrinsic variation in the underlying image, and parameter settings in the extraction
algorithms. In practice, this is difficult to achieve and to demonstrate. Noise models exist for the major imaging modalities [12, 13], so it is possible to simulate
(and sometimes to model) feature extraction under a variety of noise conditions
and thus to evaluate performance. Image artifacts (e.g., inhomogeneities of field
in MR, scintillation-screen variations in x ray) are quite varied [12]. Intrinsic variation (the range of anatomic and physiologic variation in health and in disease) has some quantification [14, 15], but rare cases exist and cannot be considered comprehensively. Algorithms for feature extraction preferably should not require user-defined parameter settings; where unavoidable, those parameters' variations



should be evaluated for their effect on feature values. The goal is to make the
extraction process minimally dependent on them.
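As a concrete illustration of the noise-simulation idea above, the following Python sketch (ours; the noise level, clipping range, and function names are illustrative assumptions, not from this chapter) measures how much a given feature drifts when simulated Gaussian noise is added to an image.

```python
import numpy as np

def noise_sensitivity(image, feature_fn, sigma=5.0, trials=20, seed=0):
    """Spread of a feature's value under simulated additive Gaussian noise.

    `feature_fn` maps an image array to a scalar feature value.
    """
    rng = np.random.default_rng(seed)
    baseline = feature_fn(image)
    noisy = [
        feature_fn(np.clip(image + rng.normal(0.0, sigma, image.shape), 0, 255))
        for _ in range(trials)
    ]
    # A robust feature shows a small spread and little bias relative to baseline.
    return baseline, float(np.mean(noisy)), float(np.std(noisy))
```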
5.2.2 Problem-dependence is unavoidable

The nature of the problem should be taken into account when features are being considered. The clinical user's knowledge of and experience with the data,
in the context of the application, will suggest features that should be evaluated.
Although many case studies exist in the literature that offer examples of successful features, most new problems can benefit from individualized considerations of
their characteristics. It is unrealistic to expect that some general set of features will
always contain a useful subset; rather, the analyst should enlist the aid of the clinician and investigate the problem carefully. It will be necessary then to convert the
(often) qualitative expression of the features into quantitative and repeatable measures. Techniques in the following sections are likely to be useful, but should be
considered as starting points only.
5.3 Examples of features

Certain assumptions are embodied in the examples and techniques that follow.
In some practical cases, for example, it will be important to employ preprocessing
operations to remove noise from the images. A great variety of methods exists
[7, 9, 16] and will not be examined here. It should be noted, however, that the
choice of technique may depend on whether the image is to then be analyzed by a
human, a machine, or both. Similarly, it may be necessary to segment the image
into regions of interest prior to applying some of the approaches described here.
5.3.1 Features extracted from 2D images

5.3.1.1 Descriptive statistics for images and subimages

Computed using all pixels individually


A two-dimensional image    is defined as a set of gray levels (amplitudes) assigned to picture elements (pixels) occupying a space    
       . The amplitude range is quantized into  equal intervals  
      . Most medical images have  set to an integral
power of two, typically 256 (8 bits per pixel), 4,096 (12 bits), or 65,536 (16
bits). The number of pixels,   , having intensity  can be used to construct an estimate of the image histogram
      

where   is the total number of pixels.


A graph of a histogram (in which    is plotted against  for
      
  ) provides insight into the nature of the image. If the histogram is

Examples of features 281


narrow (occupies only a small portion of the gray-scale range), it indicates a
low-contrast image. If bimodal, the histogram may represent two regions
an object of one range of intensities on a background of another range, for
example.
Features describing the shape of a histogram are often quite useful [7, 9] and are provided by the following statistical descriptors, among others:

a. Mean

$$\mu = \sum_{g=0}^{G-1} g\, h(g)$$

b. Central moments

$$\mu_k = \sum_{g=0}^{G-1} (g - \mu)^k\, h(g)$$

The second central moment $\mu_2$ is the variance, $\sigma^2$, and is helpful in describing the uniformity of a given image or region. The third central moment is a measure of noncentrality, and the fourth measures relative flatness (or strength of the tails). With suitable parameters for scale and shift, they are known respectively as skewness and kurtosis.

c. Energy

$$E = \sum_{g=0}^{G-1} [h(g)]^2$$

d. Entropy

$$H = -\sum_{g=0}^{G-1} h(g) \log_2 h(g)$$
Additional descriptors include the maximum, minimum, mode (the location of the peak of the histogram, most useful if there is only one), median (the intensity at which half the pixels are of lower value, a statistic affected less by outliers than is the mean), and other measures (e.g., quartiles) of aggregations of intensity values.
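For concreteness, a minimal Python/NumPy sketch of these histogram descriptors follows; the function name and the assumption of a nonnegative integer-valued image are ours.

```python
import numpy as np

def histogram_features(image):
    """Histogram-shape descriptors for a nonnegative integer gray-scale image."""
    counts = np.bincount(image.ravel()).astype(float)
    h = counts / counts.sum()               # normalized histogram h(g)
    g = np.arange(h.size)

    mean = np.sum(g * h)                    # mean
    var = np.sum((g - mean) ** 2 * h)       # variance (second central moment)
    mu3 = np.sum((g - mean) ** 3 * h)       # noncentrality (-> skewness)
    mu4 = np.sum((g - mean) ** 4 * h)       # tail strength (-> kurtosis)
    energy = np.sum(h ** 2)
    nonzero = h[h > 0]                      # avoid log2(0)
    entropy = -np.sum(nonzero * np.log2(nonzero))
    return {"mean": mean, "variance": var, "mu3": mu3, "mu4": mu4,
            "energy": energy, "entropy": entropy}

# Example on a random 8-bit "image":
features = histogram_features(np.random.randint(0, 256, (256, 256)))
```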



Computed using all pixels, in groups

A very useful characterization that uses subsets of pixels can be found in the co-occurrence (gray-level dependence) matrix, including quantizations of gray levels and characterizations of it; it is discussed at length below. Spatial gray-level dependence matrices have been found effective in a number of applications. Mammography in particular has been able to use them well; in one case, 13 texture measures derived from the matrix were found to be useful in the classification of microcalcifications [17].

Entropy, described above, is useful also when evaluated on subsets of images.
5.3.1.2 Descriptions of regions

Regions typically are defined on the basis of their internal homogeneity in some
characteristic(s). Scale often is important in defining that homogeneity. And, because there is often self-similarity in the homogeneity, fractal features can provide
information that other kinds of features cannot. The following sections illustrate
those points.
Shape. The shape of a subimage may be described in terms of its boundary (contour-based) and/or its interior (region-based).

Contour-based (see, for example, [18])

A comprehensive treatment of curvature of the contour at each point is contained in [16]; an example of effective application of radial edge-gradient analysis for spiculations in mammography is in [19].

Region-based

a. Effective diameter (the diameter of the circle having the same area $A$)

$$d_e = 2\sqrt{A/\pi}$$

b. Circularity ($C = 1$ for a circle)

$$C = \frac{P^2}{4\pi A},$$

where $P$ is the perimeter.

c. Compactness (maximum for a circle)

$$\kappa = \frac{4\pi A}{P^2}$$

d. Projections

Though useful mostly in binary image processing, projections can serve as a basis for the definition of related region descriptors. They can be defined in all directions; the most common, however, are the horizontal and vertical:

$$p_h(y) = \sum_{x=0}^{M-1} f(x, y), \qquad p_v(x) = \sum_{y=0}^{N-1} f(x, y).$$

They can be useful also in measuring homogeneity in gray-scale images, and the height and width of regions.
e. Moments

When comparing images or their regions to one another or to a standard, a set of moments derived by Hu [20] can be quite useful, as it has the properties of being invariant to translation, rotation, and scale change. For the image $f(x, y)$, we define the moments

$$m_{pq} = \sum_{x} \sum_{y} x^p y^q f(x, y), \qquad p, q = 0, 1, 2, \ldots,$$

the central moments

$$\mu_{pq} = \sum_{x} \sum_{y} (x - \bar{x})^p (y - \bar{y})^q f(x, y),$$

where

$$\bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}},$$

and the normalized central moments

$$\eta_{pq} = \frac{\mu_{pq}}{\mu_{00}^{\gamma}}, \qquad \gamma = \frac{p+q}{2} + 1.$$

The invariant moments are then defined as

$$\phi_1 = \eta_{20} + \eta_{02}$$
$$\phi_2 = (\eta_{20} - \eta_{02})^2 + 4\eta_{11}^2$$
$$\phi_3 = (\eta_{30} - 3\eta_{12})^2 + (3\eta_{21} - \eta_{03})^2$$
$$\phi_4 = (\eta_{30} + \eta_{12})^2 + (\eta_{21} + \eta_{03})^2$$
$$\phi_5 = (\eta_{30} - 3\eta_{12})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] + (3\eta_{21} - \eta_{03})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$$
$$\phi_6 = (\eta_{20} - \eta_{02})[(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2] + 4\eta_{11}(\eta_{30} + \eta_{12})(\eta_{21} + \eta_{03})$$
$$\phi_7 = (3\eta_{21} - \eta_{03})(\eta_{30} + \eta_{12})[(\eta_{30} + \eta_{12})^2 - 3(\eta_{21} + \eta_{03})^2] - (\eta_{30} - 3\eta_{12})(\eta_{21} + \eta_{03})[3(\eta_{30} + \eta_{12})^2 - (\eta_{21} + \eta_{03})^2]$$

Other sets of moments have been described by Flusser and Suk [21-23] (yielding descriptors that are invariant under general affine transformations) and by Gupta and Srinath [24] (which use only the boundary and can treat spiral and nonconvex contours).
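A short Python/NumPy sketch, assuming a binary region mask and using the circularity expression and first two invariants as reconstructed above, may make these definitions concrete; the function name and the crude 4-neighbor perimeter estimate are our own illustrative choices.

```python
import numpy as np

def shape_features(mask):
    """Descriptors for a binary region mask (nonzero = inside the region)."""
    mask = mask.astype(bool)
    A = mask.sum()                                   # area in pixels
    # Crude perimeter estimate: region pixels with a 4-neighbor outside the region.
    p = np.pad(mask, 1)
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    P = (mask & ~interior).sum()

    d_eff = 2.0 * np.sqrt(A / np.pi)                 # effective diameter
    circularity = P**2 / (4.0 * np.pi * A)           # = 1 for an ideal circle

    # Normalized central moments and the first two Hu invariants.
    y, x = np.nonzero(mask)
    xb, yb = x.mean(), y.mean()
    def eta(pp, qq):
        mu = np.sum((x - xb)**pp * (y - yb)**qq)     # central moment mu_pq
        return mu / A**((pp + qq) / 2.0 + 1.0)       # normalize (mu_00 = A here)
    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2))**2 + 4.0 * eta(1, 1)**2
    return d_eff, circularity, phi1, phi2
```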
Texture, and the co-occurrence matrix. An example of the use of texture was provided by Thiele et al. [25], who used texture analysis of the breast tissue surrounding microcalcifications on digitally acquired images during stereotactic biopsy to predict malignant versus benign outcomes. The analysis calculated statistical features from gray-level co-occurrence matrices and fractal geometry for equal-probability and linear quantizations of the image data. That preliminary study, using 54 cases, obtained a sensitivity of 89% and a specificity of 83%, and it was expected that this would be useful in resolving problems of discordance between pathological and mammographic findings, and might ultimately reduce the number of benign biopsies.
One way to describe relationships among pixels is to choose a relationship and examine the image to determine the ways in which the relationship appears. Let $R$ be a relationship operator and let $C$ be a $G \times G$ matrix. The operator $R$ can be viewed as a displacement vector $(\Delta x, \Delta y)$ that specifies the direction and spacing from a given pixel to another. Each element $c_{ij}$ of $C$ contains the count of the number of times that such a pair of pixels occurs related by $R$, and having, respectively, gray levels $g_i$ and $g_j$.

Let $m$ be the number of point pairs in the image that satisfy $R$. If a matrix $P$ is normalized by dividing every entry of $C$ by $m$, then $p_{ij}$ is an estimate of the joint probability that a pair of points satisfying $R$ will have values $(g_i, g_j)$. The matrix thus defined will in general not be symmetric because of the directionality of the relationship between the pixels. The matrix $C$ is called a gray-level co-occurrence matrix.

An understanding of the properties of $C$ may be developed by considering possible values of $R$. If the texture in the image is coarse, i.e., the displacement is smaller than the texture element's dimension, then pixels separated by $R$ will have similar gray levels and there will be many counts along the main diagonal of the matrix. Conversely, high-frequency variations within the image (displacement comparable to texture-element size) will appear in $C$ as substantial numbers of counts located far from the diagonal, making the overall matrix more uniform.

In practice, $C$ (or $P$) is computed for several values of $R$. One way to define the relationship more formally is to specify the angle $\phi$ and distance $d$ from the first to the second pixel [18]. Using $c_{\phi,d}(i, j)$ or $p_{\phi,d}(i, j)$ to denote the entries in the matrix for gray levels $g_i$ and $g_j$, we can extract several features from the co-occurrence matrix that will give insight into the textural nature of the image [18, 26, 27]. Because $P$ is a histogram, some of the features introduced in Section 5.3.1.1 are applicable here also.

a. Energy, a direct measure of homogeneity
b. Entropy, an inverse measure of homogeneity
c. Maximum probability
d. Contrast, $\sum_{i,j} |i - j|^{\kappa}\, [p(i, j)]^{\lambda}$, a measure of local image variation (typically, $\kappa = 2$ and $\lambda = 1$); smaller when values are concentrated near the main diagonal.
e. Inverse difference moment
f. Correlation (linear structures in direction $\phi$ result in large correlation values in this direction)
Among other desirable properties of the co-occurrence approach, invariance to
monotonic gray-level transformations is important in medical imaging. The reason
is that, although it may not be possible to control the absolute gray-level values



from image to image, their relative values are preserved. Thus, descriptors extracted from the co-occurrence matrices of similarly textured images will make
that similarity evident.
Co-occurrence methods yield good results in texture discrimination, but are
computationally expensive. A variety of modifications and computational strategies have been proposed to reduce the cost. A fast algorithm [28,29] exists for submatrices, as might be used in image segmentation. In one case [30], co-occurrence
array size varies with region size; in another, the gray levels are quantized (e.g.,
eight levels are combined into one to create only 32 distinct values), which reduces
the computation significantly without markedly affecting performance [31].
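The basic computation can be sketched compactly in Python/NumPy, as below; the quantization scheme, displacement convention, and function names are our illustrative assumptions, and the descriptors follow the list above.

```python
import numpy as np

def glcm_features(image, dx=1, dy=0, levels=32):
    """Co-occurrence matrix for displacement (dx, dy), plus a few descriptors.

    Assumes a nonnegative integer gray-scale image.
    """
    # Quantize gray levels (e.g., 256 levels -> 32) to keep the matrix small.
    q = (image.astype(int) * levels) // (int(image.max()) + 1)
    rows, cols = q.shape
    # Pair each pixel with its neighbor at the given displacement.
    a = q[max(0, -dy):rows - max(0, dy), max(0, -dx):cols - max(0, dx)]
    b = q[max(0, dy):rows - max(0, -dy), max(0, dx):cols - max(0, -dx)]
    C = np.zeros((levels, levels))
    np.add.at(C, (a.ravel(), b.ravel()), 1)        # count co-occurrences
    P = C / C.sum()                                # joint-probability estimate

    i, j = np.indices(P.shape)
    return {
        "energy": np.sum(P**2),
        "entropy": -np.sum(P[P > 0] * np.log2(P[P > 0])),
        "max_probability": P.max(),
        "contrast": np.sum((i - j)**2 * P),        # kappa = 2, lambda = 1
        "inverse_diff_moment": np.sum(P / (1.0 + (i - j)**2)),
    }
```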
5.4 Feature selection and dimensionality reduction for classification

5.4.1 The curse of dimensionality: subset problem

In any effort at designing a method for classifying images (e.g., into normal
and disease), it is essential that there be a training set of images of known classification. As illustrated above, many features could be computed and used in the
classifier. But misclassification probability tends to increase with the number of
features, and classifier structure is more difficult to interpret [32]. Further, the prediction variability tends to increase, and the classifier is sensitive to outliers. There
is no guarantee that the classifier will perform as well on a new set of samples.
Cover [33] has shown that if there are too many features and too small a training
set, a perfect classification may result, which is nevertheless meaningless in the
sense that performance on a test set may be very poor.
How, then, should a best subset of a set of candidate features be chosen? Only an exhaustive search over all subsets of features can provide the answer [34]. For dimensionality $d$, the number of subsets is $2^d$, implying a large computing task. Further, for the two-class case, it was found that for large $d$ and small probability of error $\epsilon$, the required sample size grows both with the dimensionality and with the reciprocal of the error, and so it is easy to see the growth with dimensionality and with reduction of error. The remainder of this section examines the question of feature selection as a practical problem.
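The combinatorial burden is easy to see in code. In the Python sketch below (ours; `evaluate` is a generic placeholder for whatever criterion is chosen, e.g., cross-validated accuracy or a separability measure), all $2^d - 1$ nonempty subsets are enumerated.

```python
from itertools import combinations

def best_subset(feature_names, evaluate):
    """Exhaustive search: evaluate() returns a score to maximize for a subset."""
    best_score, best = float("-inf"), None
    d = len(feature_names)
    for k in range(1, d + 1):
        for subset in combinations(feature_names, k):   # C(d, k) subsets of size k
            score = evaluate(subset)
            if score > best_score:
                best_score, best = score, subset
    return best, best_score   # 2**d - 1 calls to evaluate() in total

# With d = 20 features there are already 1,048,575 subsets to score.
```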
5.4.2 Classification versus representation

The many applications and types of image features offer a number of opportunities to define and select those that will be most useful in a specific situation. A
pathologist who knows that nuclei of certain sizes are important may wish to have
the computer examine a series of cell images and display only those containing the
desired nuclei. The radiologist comparing several ultrasound images might want to



know whether the textures in corresponding anatomic regions are similar. In both
cases, the user knows what measurements (or kinds of measurements) on the image
are required.
In a large class of applications, however, the choice of features is not clear. Is
there diffuse disease in this ultrasound image? Does this mammogram have a mass
present? Is it benign or malignant? In those cases we wish to classify an image or
subimage into one of a finite number of mutually exclusive classes. Representing
a given image by a set of feature values is equivalent to associating it with a point
in a space of dimensionality equal to the number of features in the set. A classifier
can then be designed to separate (usually imperfectly) those points representing
the distinct classes. The underlying expectation is that samples of a given class
will tend to be near each other in feature space, while being separated well from
samples of the other classes. The invariance that we seek (Section 5.1.1) thus is
manifest in terms of the set of features.
The problem then becomes that of choosing the set of features. The criterion
used in selecting a set is often taken to be the probability of misclassification P(e).
The (sub)set of features yielding minimum P(e) is chosen as the best set. Implicit in
that evaluation, however, is the need for a classifier structure. A classifier uses the
feature set for a given sample to produce a class decision. If sufficient data exist, it
is possible to create a parametric classifier which uses knowledge of the underlying
probability structure of the data, class by class. When large amounts of data are
not available, nonparametric approaches are used. Though not examined here, numerous examples of the design of parametric (e.g., Bayes) and nonparametric (e.g.,
nearest-neighbor) classifiers can be found readily [35]. (Chapter 7 has additional
details.) The next section describes an approach which selects a feature set based
on its intrinsic ability to separate the classes rather than on a classifier structure.
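As a point of reference for the nonparametric case just mentioned, a one-nearest-neighbor decision rule can be written in a few lines of Python; this generic sketch is ours and is not tied to any particular feature set or classifier discussed in this chapter.

```python
import numpy as np

def nn_classify(train_X, train_y, x):
    """1-nearest-neighbor rule: label x by its closest training sample."""
    dists = np.linalg.norm(train_X - x, axis=1)   # Euclidean distance in feature space
    return train_y[np.argmin(dists)]
```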
5.4.3 Classifier-independent feature analysis for classification

Feature analysis for classification is based on the discriminatory power of features. This is a measure of the usefulness of the feature(s) in determining the class
of an object. Traditional feature analysis for classification addresses only classifier-specific discriminatory power: first a classifier is selected, then the discriminatory
power of a feature is proportional to the accuracy of the classifier when it uses that
feature. Traditional feature analysis for classification is thus classifier-driven. This
section presents recent work [36] that provides an alternative, data-driven approach
to feature analysis, called classifier-independent feature analysis (CIFA). It is
based on the nonparametric discriminatory power of features, defined as the relative usefulness of a feature within a subset in the absence of classifier-specific assumptions. CIFA ranks features by the amount of separation each feature induces
between classes.
This approach is taken because the classifier-specific approach to feature analysis does not address adequately problems such as medical image analysis and



classification; it defines as most useful those features that yield the most accurate classification for a particular classifier. And even in more traditional problems
wherein designing a classifier is the goal, the classifier-specific approach to feature analysis has a number of limitations. Perhaps the most important limitation is
that optimizing overall classification performance generally is not feasible, because
there is often no way to determine whether a particular classifier is optimal for a
particular problem. Thus, the best a practitioner can do is choose a classifier, select the best features for that particular classifier from among those available, and
perhaps extract a smaller set of optimal features from the original ones.
In contrast, CIFA can address the medical-image diagnosis problem, as well as
a number of the other limitations of the classifier-specific approach. The method
can provide to the diagnostician valuable information concerning a given set of
features, using data provided by the diagnostician. In a more general sense, classification performance can be optimized using classifier-independent feature analysis
to gather information concerning the structure of the data rather than the requirements of a particular classifier. CIFA can be used to guide a search for features with
discriminatory power sufficient to meet some threshold requirement for classification accuracy, whether classification is performed by a classifier or a human being.
The method can also be used to select the classifier best suited to a particular classification problem. A classifier can be chosen which can best utilize the features
that have high discriminatory power.
When no automated classifier is involved, classifier-specific feature analysis is
not relevant. When an automated classifier is involved, both CIFA and classifier-specific feature analysis may be used; their roles are illustrated in Fig. 5.1.
5.4.3.1 Assisted versus automated classification

Classifier-independent feature analysis is especially critical when the goal is assisted classification, rather than automated classification. Most of the classification
work to date has focused on automated classification, i.e., replacing a human expert.
Automated classification is intended usually for applications such as assembly-line
quality control, in which an automated classifier can be used to identify defective
(or potentially defective) items. The role of a diagnostic system, however, should be
to assist rather than replace the diagnostician. An example of a system for assisted
diagnosis would be the use of image enhancement techniques to aid in medical
imaging.
Feature analysis for problems such as diagnosis must necessarily be based on
classification, since the goal is still classifying objects. Assumptions specific to a
particular automated classifier should not be made because the diagnostician serves
as the classifier in this system, and the classification information can be captured
only via a learning sample, i.e., a set of correctly classified objects.

Figure 5.1: Feature analysis for classification based on nonparametric discriminatory power. Hatch marks indicate tasks performed by people. Classifier-independent feature analysis allows the process to be data-driven.



5.4.3.2 Relative feature importance: a metric for nonparametric discriminatory power

A new metric for nonparametric discriminatory power has been developed that
is called relative feature importance (RFI). RFI uses nonparametric distribution estimation to avoid classifier-specific assumptions. RFI ranks features by their relative contribution to the potential for separation between class-conditional joint feature distributions. Thus the fundamental assumption underlying RFI is that proximity in feature space can be used to determine class membership. Note that this
assumption does not mean that proximity in feature space can be used as given
to determine class membership, but rather that it is possible to extract information
from the given features which can then be used to determine class membership
based on proximity.
Only those features within an optimal subset are ranked, to eliminate noise
and redundant features. The rankings within the optimal subset take into account
interactions between features, since the features are not assumed to be independent.
RFI assigns features outside the optimal subset a discriminatory power of zero.
Directly calculating RFI requires several forms of estimation: estimating the
shape of the class-conditional joint and marginal distributions, and estimating the
contributions of the initial set of features to separation between the classes. Accuracy of estimation is balanced against the minimization of the assumptions required
to calculate the metric. RFI is always based on the within-class and between-class
scatter matrices of the learning sample; there are alternatives, however, for the indicated distance measures, weighting algorithms, separation criteria, and scatter
matrix formats [36].
5.4.3.3 Local out-of-class mixture mean

The idea of nonparametric scatter matrices used by RFI to measure the separation between classes is based on an original extension of the idea of a nonparametric local mean [37], called the local out-of-class mixture mean. Though
that work was limited to two classes, the local out-of-class mixture mean is used
here to permit extrapolation of their technique to multiclass problems.
5.4.3.4 Calculating between-class scatter using the original data

While the nonparametric scatter matrices used by RFI are as proposed in [37],
with the exception of the local out-of-class mixture mean, the algorithm used to
calculate the matrices here is quite different. Fukunaga and Mantock calculated
within-class scatter first, whitened the data, and went on to calculate between-class
scatter using the whitened data. While this technique is theoretically sound for
parametric scatter matrices, the calculations do not hold for nonparametric scatter.
RFI calculates both within-class scatter and between-class scatter using the original
data, leading to a successful measure of separation.
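The flavor of the computation can be suggested with a Python sketch: within-class scatter is formed from class means, and between-class scatter from each sample's local out-of-class mean over its k nearest out-of-class neighbors. This is our hedged reading of the construction, not the exact algorithm of [36] or [37].

```python
import numpy as np

def scatter_matrices(X, y, k=5):
    """Within-class and (nonparametric) between-class scatter from the original data."""
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)          # within-class scatter
        Xo = X[y != c]                          # samples from the other classes
        for x in Xc:
            # Local out-of-class mean: average of the k nearest out-of-class samples.
            idx = np.argsort(np.linalg.norm(Xo - x, axis=1))[:k]
            m_local = Xo[idx].mean(axis=0)
            diff = (x - m_local)[:, None]
            Sb += diff @ diff.T                 # between-class scatter contribution
    return Sw, Sb
```

A scalar separation criterion, e.g., $\mathrm{tr}(S_W^{-1} S_B)$, can then be formed from the two matrices; the distance measures, weighting algorithms, and matrix formats remain among the alternatives noted above.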



5.4.4 Classifier-independent feature extraction

The potential for separation between the classes present in a subset of features is
measured via nonparametric discriminant analysis using the local out-of-class mixture mean and calculation technique mentioned above. This first step in calculating
RFI is useful as a stand-alone classifier-independent feature extraction technique.
Features are processed and extracted in assisted classification, as well as automated
classification; almost all measurements are quantized or processed to some degree.
As a final step in assisted classification, reducing the number of features presented
to the human expert can be helpful.
The algorithm used to estimate the contribution of each original feature to the
potential separation between the class-conditional joint feature distributions, the
Weighted Absolute Weight Size (WAWS), derives from the work of Mucciardi and
Gose [38]. Given the eigenvectors and eigenvalues used in discriminant analysis
(nonparametric or parametric), WAWS can be used to rank features within any
given set of features, not just the optimal subset of those features.
At the heart of classifier-independent feature analysis is the calculation of nonparametric discriminatory power. The metric relative feature importance (RFI)
combines the classifier-independent feature extraction algorithm with WAWS to
rank features by their nonparametric discriminatory power.
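One plausible sketch of such eigenvector-based ranking follows (Python); weighting each feature's absolute eigenvector coefficients by the corresponding eigenvalues is our reading of the WAWS idea, and the details in [38] may differ.

```python
import numpy as np

def waws_rank(eigenvalues, eigenvectors):
    """Rank features by eigenvalue-weighted absolute eigenvector coefficients.

    eigenvectors[:, j] is the eigenvector for eigenvalues[j]; rows index features.
    """
    weights = np.abs(eigenvectors) @ np.abs(eigenvalues)   # one weight per feature
    return np.argsort(weights)[::-1]                       # most important first

# Example with the scatter matrices from the previous sketch:
# evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
# order = waws_rank(evals.real, evecs.real)
```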
5.4.5 How useful is a feature: separability between classes

The goal of classifier-independent feature analysis for classification is to measure the usefulness of the features in the candidate feature set. Nonetheless, classification performance on the learning sample cannot be used in and of itself as a basis
for analyzing the features for several reasons. First, as noted above, it has been
shown that, in the general case, features that optimize classification performance
for one classifier may not perform at all well in another classifier [33]. Indeed,
because one of the uses of classifier-independent feature analysis is to guide the
choice of automated classifier, classification performance is not a good measure
since it would lead to a search through some candidate set of automated classifiers.
More fundamentally, though, classifier-independent feature analysis tries to measure the potential for discrimination between classes of the features in the candidate
feature set, which potential may not be realizable in practice.
Once classification performance has been eliminated as a measure of usefulness, what remains is the separability between the classes. Separability is not subject to the theoretical constraints of classification performance. When expressed
as Bayes error, the separation between class-conditional joint feature distributions
places a lower bound on classification error that is classifier-independent. Unfortunately, Bayes error is not calculable for many problems. Nonetheless, separation
between class-conditional joint feature distributions gives rise to the potential for
classification. Issues of calculation aside, classifier-independent feature analysis
uses separability between classes as the basis for the usefulness of a feature.



A theoretical constraint placed on feature analysis is that feature rankings are
subset-dependent. Even under the assumption of feature independence, feature
rankings can change as a function of adding and removing features [33]. Nonetheless, ranking the features is a critical component of feature analysis: in medical diagnosis, when test results are ambiguous, the physician needs guidance as to their
relative value for discrimination. Therefore, ranking is given within a subset, with
the critical ranking being that within the optimal subset. The optimal subset of the
candidate feature set is defined as the smallest subset with the maximum potential
for separability between classes.
5.4.5.1 Relative usefulness: discriminatory power

Traditional feature analysis techniques have either searched for the optimal subset of features (generally called feature selection), or measured the usefulness of the
features independently from one another (generally called feature ranking). A principal difficulty with the traditional approach is that the information given by feature
selection and feature ranking algorithms can be contradictory [33].
In measuring discriminatory power, the goal is to measure the relative usefulness of each feature within a subset (whether the whole candidate feature set or
some proper subset of it), given the use of the other features. Figure 5.2 illustrates
a series of related problems in which the relative usefulness (separability) of the
individual features change. Note that while these problems can be solved using the
marginal distributions of the features, problems which cannot be solved using the
marginals can be constructed. Figure 5.3 illustrates a series of problems which cannot be solved using the marginals. Due to the difficulty in determining the correct
ranks for the features, however, problems such as those in Fig. 5.3 are not useful in
designing and comparing metrics for nonparametric discriminatory power.
Discriminatory power aims at revealing the underlying structure of the data,
rather than simply optimizing classifier performance. A researcher in medical diagnosis, knowing that features based on blood tests generally rank high in diagnosing a certain condition, may want to focus limited resources on searching for
new, better features based on blood tests. These sorts of decisions are currently being made based on human interpretation of low-dimensional projections and mapping techniques [39]. Measuring discriminatory power provides a powerful tool for
managing high-dimensional feature spaces.
5.4.5.2 Nonparametric discriminatory power

Classification error is fundamentally a function of the separation between class-conditional joint feature distributions. Increasing separation provides the opportunity to improve performance. A classifier can realize this opportunity only if its specific structure fully exploits the increased separation, which makes classifier-specific discriminatory power dependent on that structure. For example, a Bayes linear classifier will not be able to exploit the difference in discriminatory power between features 1 and 2 in Fig. 5.3.

Feature selection and dimensionality reduction for classification 293

feature 2

1
300 samples

300 samples
1

feature 1

300 samples

300 samples
2

}}

2 1

}
}

(a)
feature 2

1
300 samples

300 samples
1
300 samples

feature 1

300 samples

2 1

}
2

}
2

}}

(b)

Figure 5.2: A problem with two features with multicluster uniform distribution. (a) Classes
1 and 2 are completely separable under feature 1. Feature 2 is redundant, because it
does not contribute any further discriminatory information to the feature set. (b) The center
clusters are ambiguous under feature 1; therefore, perfect discrimination between classes
is no longer possible. Feature 2 still does not contribute any information to the feature set
not already captured by feature 1, and thus remains redundant. (Continued on next page.)



[Figure 5.2(c),(d): scatter plots of feature 2 versus feature 1; each cluster contains 300 samples, with center-cluster overlaps between 25% and 75% as described in the caption.]

Figure 5.2: (Continued.) (c) The center clusters are ambiguous under both features, and
each feature adds some separability. Thus both features are useful, but feature 1 is more
useful (has higher nonparametric discriminatory power) than feature 2 since it induces more
separability. (d) The center clusters are ambiguous under both features, and each feature
adds some separability. Feature 2 now provides more separability than feature 1. The
discriminatory power of the features is a function of the relative percentage overlap of the
center cluster for each feature.


[Figure 5.3: scatter plot of feature 3 versus feature 2, showing two clusters for each of class 1 and class 2.]

Figure 5.3: A relatively simple problem which cannot be solved using methods based on the
marginal distributions of the features. Two classes of two clusters each with three features
(feature 1, not shown, is noise, N(0,1) for all four clusters). Features 2 and 3 are Gaussian
within each cluster with means as shown. Feature 1 should be discarded. When features
2 and 3 have equal variance, they are equally useful. When features 2 and 3 have different
variances, their usefulness is different.

between features 2 and 3 in Fig. 5.3. In contrast, the nonparametric discriminatory power of a feature is defined as the amount it contributes to the potential for
separation between the class-conditional joint feature distributions.
5.4.6 Classifier-independent feature analysis in practice

In practice, classifier-independent feature analysis has a number of applications. The first step is to identify a set of candidate features. A learning sample
is collected using those features. The nonparametric discriminatory power of the
features is measured. In some applications, the discriminatory power is the desired
end result (e.g., the use of focus group information in product development). New
features may be generated on the basis of the discriminatory power of the old features in an iterative fashion. In applications requiring an automatic classifier, when the theoretical lower limit to classification error (determined by the separation between the class-conditional joint feature distributions) has been reduced to an acceptable level, a classifier
which best exploits the useful features is chosen. The features may be further processed using classifier-specific feature selection and extraction techniques. A trial
application of the production classifier system is implemented, and classification
performance is estimated. If the performance of the system is not satisfactory, a
different classifier may be tried, or the whole process may iterate.
Because classifier-independent feature analysis does not make classifier-specific
assumptions, problems with a wide range of characteristics should be considered.
Features may have mixed distributions and multiple clusters. Noise features (features which contribute no classification information) and redundant features (features which contribute no additional classification information, such as feature 2 in
Fig. 5.2) should be eliminated from the optimal subset. Features with different potentials for separation should have different nonparametric discriminatory power,
while features with the same potential should have the same discriminatory power.
More than two classes can be present, and the classes may overlap. Since the features are in general not assumed to be independent, multiple optimal subsets may
exist.
5.4.7 Potential for separation: nonparametric feature extraction

5.4.7.1 Potential for separation

The nonparametric discriminatory power of a feature is defined as the potential for separation between classes induced by that feature. Thus the first step in designing RFI is to capture the features' potential for separation. In a subset of the candidate feature set, it can be measured by extracting new, optimal features from the
candidate features and analyzing separability between classes using the extracted
features. The use of feature extraction reduces the sensitivity of our technique to
the representation chosen for the candidate features; i.e., makes the technique invariant to rotation, shift, and scaling of the candidate features. Feature extraction
extracts from the given features new features which are optimal in the sense that
they maximize the separation between the class-conditional joint feature distributions. By finding the optimal subset in the extracted space, rather than the original
space, RFI measures the potential of those original features for separation, while
minimizing the effect of their original representation.
5.4.7.2 Measuring separation: discriminant analysis

Discriminant analysis can be used to extract features that maximize the ratio
of the separation between classes to the spread within classes, as measured by the
between-class and within-class scatter matrices. Within-class scatter is a measure
of the scatter of a class relative to its own mean. Between-class scatter is a measure of the distance from each class to the mean(s) of the other classes. Within-class
and between-class scatter can be defined parametrically or nonparametrically. Parametric scatter matrices use the learning sample to estimate the distributions of the
features through estimation of parameters for an assumed distributional structure.
Nonparametric scatter matrices use the learning sample to perform local density
estimation around individual samples, and then measure scatter using the local
density estimates.
Parametric scatter matrices The parametric versions of the within-class and between-class scatter matrices estimate the means of the classes based on the entire learning sample. The parametric versions assume that a distribution can be characterized by its mean and covariance. Let $P_i$ be the a priori probability of class $\omega_i$, $\Sigma_i$ be the covariance matrix and $\mu_i$ be the mean of class $\omega_i$, $n$ be the total number of samples, and $L$ be the number of classes present in the learning sample. Parametric within-class scatter is defined as the averaged covariance:

$$S_W = \sum_{i=1}^{L} P_i \Sigma_i.$$

The a priori probability is estimated from the learning sample as $\hat{P}_i = n_i/n$, where $n_i$ is the number of samples from $\omega_i$. $\Sigma_i$ is estimated by $\hat{\Sigma}_i$, the sample covariance matrix:

$$\hat{\Sigma}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} \left(x_j^{(i)} - \hat{\mu}_i\right)\left(x_j^{(i)} - \hat{\mu}_i\right)^T,$$

where

$$\hat{\mu}_i = \frac{1}{n_i} \sum_{j=1}^{n_i} x_j^{(i)}.$$

Parametric between-class scatter is the scatter of the expected means around the mixture mean:

$$S_B = \sum_{i=1}^{L} P_i \left(\mu_i - \mu_0\right)\left(\mu_i - \mu_0\right)^T,$$

where $\mu_0$ is the mixture mean:

$$\mu_0 = \sum_{i=1}^{L} P_i \mu_i.$$

The components of the between-class scatter matrix are estimated using the learning sample in the same manner as the within-class scatter matrix.
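As a concrete illustration, the parametric scatter matrices can be estimated from a labeled learning sample in a few lines. The sketch below is our own (the function and variable names are not from this chapter) and simply follows the definitions above, with NumPy supplying the linear algebra.

```python
import numpy as np

def parametric_scatter(X, y):
    """Estimate parametric within-class (Sw) and between-class (Sb) scatter.

    X is an (n, d) array of feature vectors; y is an (n,) array of labels.
    Implements S_W = sum_i P_i * Sigma_i and
    S_B = sum_i P_i * (mu_i - mu_0)(mu_i - mu_0)^T, with P_i = n_i / n.
    """
    n, d = X.shape
    mu0 = X.mean(axis=0)                               # mixture mean
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        P = len(Xc) / n                                # a priori probability
        Sw += P * np.cov(Xc, rowvar=False, bias=True)  # averaged covariance
        diff = (Xc.mean(axis=0) - mu0)[:, None]
        Sb += P * (diff @ diff.T)                      # scatter of class means
    return Sw, Sb
```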



Local mean RFI uses nonparametric versions of the scatter matrices based on versions proposed by Fukunaga and Mantock [40]. They based their nonparametric scatter estimates on local density estimates using the $k$-nearest-neighbor ($k$NN) technique. They defined the $\omega_i$ local mean for a given class $\omega_i$ and a given sample $x$ as

$$\mathcal{M}_i(x) = \frac{1}{k} \sum_{q=1}^{k} x_{q\mathrm{NN}}^{(i)},$$

where $x_{q\mathrm{NN}}^{(i)}$ is the $q$th nearest neighbor of $x$ in $\omega_i$. When $k$ is the number of samples in $\omega_i$, the $\omega_i$-local mean reduces to the parametric mean:

$$\mathcal{M}_i(x) = \frac{1}{n_i} \sum_{q=1}^{n_i} x_{q\mathrm{NN}}^{(i)} = \hat{\mu}_i.$$

Because Fukunaga and Mantock experimented only with two-class problems, they could use the $\omega_i$-local mean for calculating both within- and between-class scatter.
Setting the parameter $k$ While use of the local mean introduces the parameter $k$, its behavior is well studied. With infinite sample size, the accuracy of the local density estimation improves as $k$ increases. With finite sample size, $k$ is subject to the problem of oversampling, otherwise known as the Hughes phenomenon [41]. A value of $k$ which is too large for the sample size performs local density estimation on nonlocal samples; a value of $k$ which is too small for the sample size reduces the accuracy of the local density estimation. In practice, $k$ is generally set to a small fraction of the number of samples [42].
Local out-of-class mixture mean To generalize Fukunaga and Mantock's approach to more than two classes, the local out-of-class mixture mean for each sample $x \in \omega_i$ is defined as

$$\mathcal{M}_0(x) = \frac{1}{k} \sum_{q=1}^{k} x_{q\mathrm{NN}}^{(\neq i)},$$

where $x_{q\mathrm{NN}}^{(\neq i)}$ is the $q$th nearest neighbor of $x$ outside of $\omega_i$. The local mixture mean differs from the parametric mixture mean in that it excludes data from a sample's own class, and thus does not reduce to the parametric mixture mean.



Nonparametric scatter matrices Nonparametric within-class scatter is defined as the averaged scatter, where scatter is around the local means:

$$S_W = \frac{1}{n} \sum_{i=1}^{L} \sum_{j=1}^{n_i} \left(x_j^{(i)} - \mathcal{M}_i(x_j^{(i)})\right)\left(x_j^{(i)} - \mathcal{M}_i(x_j^{(i)})\right)^T.$$

When $k = n_i$, the local mean reduces to the parametric mean, and therefore the nonparametric within-class scatter matrix reduces to the parametric version. Nonparametric between-class scatter is measured as the scatter around the out-of-class mixture means:

$$S_B = \frac{1}{n} \sum_{i=1}^{L} \sum_{j=1}^{n_i} \left(x_j^{(i)} - \mathcal{M}_0(x_j^{(i)})\right)\left(x_j^{(i)} - \mathcal{M}_0(x_j^{(i)})\right)^T.$$

The between-class nonparametric scatter matrix does not reduce to its parametric form as the within-class matrix does, because the out-of-class mixture means necessarily exclude same-class samples, but the relationship is close when $k = n - n_i$.
Choosing a distance metric The use of the $k$-nearest-neighbor local density estimates introduces the need to choose a distance metric for determining the distance between a sample and its neighbors. Many distance measures have been proposed for use with $k$NN error estimation [43]. Two commonly used metrics are the Euclidean distance:

$$d_E(x, y) = \left[(x - y)^T (x - y)\right]^{1/2},$$

and the Mahalanobis distance [42]:

$$d_M(x, y) = \left[(x - y)^T \Sigma^{-1} (x - y)\right]^{1/2}.$$

Fukunaga and Mantock used Euclidean distance in their original work. Mahalanobis distance should also be considered (especially using Fukunaga and Mantock's original algorithm), since it incorporates information concerning the relative variances of the features. Both metrics are considered as candidates in the design of RFI.



5.4.7.3 Weighting factor

A further refinement introduced by Fukunaga and Mantock was the use of a weighting factor to de-emphasize samples which lie far away from the classification boundary:

$$w_j = \frac{\min\left\{ d^{\alpha}(x_j, x_{k\mathrm{NN}}^{(1)}),\ d^{\alpha}(x_j, x_{k\mathrm{NN}}^{(2)}) \right\}}{d^{\alpha}(x_j, x_{k\mathrm{NN}}^{(1)}) + d^{\alpha}(x_j, x_{k\mathrm{NN}}^{(2)})},$$

where $d(x_j, x_{k\mathrm{NN}}^{(i)})$ is the (Euclidean or Mahalanobis) distance from $x_j$ to its $k$th nearest neighbor in $\omega_i$, raised to the power $\alpha$. RFI uses the natural multiclass extension of Fukunaga and Mantock's weighting factor:

$$w_j = \frac{\min_i \left\{ d^{\alpha}(x_j, x_{k\mathrm{NN}}^{(i)}) \right\}}{\sum_{i=1}^{L} d^{\alpha}(x_j, x_{k\mathrm{NN}}^{(i)})}.$$

Using the weighting factor, the contribution of each $x_j$ to scatter is inversely proportional to its distance from the nearest classification boundary.
Setting the parameter $\alpha$ Use of the weighting factor introduces the second parameter, $\alpha$. Small values of $\alpha$ allow samples far from the classification boundary to overwhelm samples along the classification boundary. Large values of $\alpha$ have the effect of discarding samples with valuable information. It has been shown [36] that the performance of RFI initially improves with increasing values of $\alpha$, and then decreases as $\alpha$ passes an optimal point. Fukunaga and Mantock used a fixed value of $\alpha$ for their experiments. The results in [36] provide experimental data to support the use of low values such as two or three for $\alpha$.
Nonparametric scatter matrices (with weights) Thus, the final forms for nonparametric within-class and between-class scatter are (estimating components as necessary using the learning sample):

$$S_W = \frac{1}{n} \sum_{i=1}^{L} \sum_{j=1}^{n_i} \left(x_j^{(i)} - \mathcal{M}_i(x_j^{(i)})\right)\left(x_j^{(i)} - \mathcal{M}_i(x_j^{(i)})\right)^T$$

and

$$S_B = \frac{1}{n} \sum_{i=1}^{L} \sum_{j=1}^{n_i} w_j \left(x_j^{(i)} - \mathcal{M}_0(x_j^{(i)})\right)\left(x_j^{(i)} - \mathcal{M}_0(x_j^{(i)})\right)^T.$$


The approach here emphasizes the between-class case, since the role of weights
is to emphasize the classification boundary, a concern for between-class scatter.
Indeed, it is shown elsewhere [36] that the use of weights for within-class scatter
will degrade the performance of RFI.
5.4.7.4 Calculating scatter in the whitened space

Unlike Fukunaga and Mantock's algorithm, RFI always calculates between-class scatter using the original data. Fukunaga and Mantock whiten the data first, using $\Lambda$ and $\Phi$ (the eigenvalues and eigenvectors, respectively, of $S_W$), then calculate between-class scatter. For parametric discriminant analysis, calculating between-class scatter using the original data, $X$, and calculating it using the whitened data,

$$Y = \Lambda^{-1/2} \Phi^T X,$$

yields the same results, since the class and mixture means transform linearly:

$$S_B^Y = \Lambda^{-1/2} \Phi^T S_B^X \Phi \Lambda^{-1/2}.$$

For nonparametric between-class scatter (even without weights), however, calculating in the whitened space rather than in the original space changes the results. Consider the local out-of-class mixture mean computed in the whitened space,

$$\mathcal{M}_0^Y(y_j) = \frac{1}{k} \sum_{q=1}^{k} y_{q\mathrm{NN}}^{(\neq i)}.$$

The equality $\mathcal{M}_0^Y(y_j) = \Lambda^{-1/2} \Phi^T \mathcal{M}_0^X(x_j)$ holds when $k = n - n_i$:

$$\mathcal{M}_0^Y(y_j) = \frac{1}{n - n_i} \sum_{q=1}^{n - n_i} y_{q\mathrm{NN}}^{(\neq i)} = \frac{1}{n - n_i} \sum_{q=1}^{n - n_i} \Lambda^{-1/2} \Phi^T x_{q\mathrm{NN}}^{(\neq i)} = \Lambda^{-1/2} \Phi^T \mathcal{M}_0^X(x_j).$$

For $k < n - n_i$, the $k$ nearest neighbors in the $X$-space are not necessarily the $k$ nearest neighbors in the $Y$-space, since distances are not preserved under the whitening transform. Thus

$$\mathcal{M}_0^Y(y_j) \neq \Lambda^{-1/2} \Phi^T \mathcal{M}_0^X(x_j),$$

and therefore

$$S_B^Y \neq \Lambda^{-1/2} \Phi^T S_B^X \Phi \Lambda^{-1/2}.$$

Scatter information is lost by calculating $S_B$ in $Y$.
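The claim that whitening can change nearest-neighbor relations is easy to check numerically. The following sketch is our own: it generates synthetic anisotropic data, uses the sample covariance as a stand-in for $S_W$, and counts how many samples acquire a different 5-NN set after whitening.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2)) @ np.diag([3.0, 0.3])   # anisotropic cloud

# Whitening transform from the eigensystem of the sample covariance:
# Y = Lambda^{-1/2} Phi^T X (row-vector convention below).
lam, phi = np.linalg.eigh(np.cov(X, rowvar=False))
Y = X @ phi @ np.diag(lam ** -0.5)

def knn_set(i, Z, k=5):
    d = np.linalg.norm(Z - Z[i], axis=1)
    d[i] = np.inf                                     # exclude the sample itself
    return set(np.argsort(d)[:k])

changed = sum(knn_set(i, X) != knn_set(i, Y) for i in range(len(X)))
print(f"{changed} of {len(X)} samples change their 5-NN set under whitening")
```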


5.4.7.5 Optimality of the extracted features

The optimal extracted features are found by eigensystem decomposition of the ratio of the between-class to within-class scatter matrices. Specifically, the optimality criterion used is the trace:

$$J = \mathrm{tr}\left(S_W^{-1} S_B\right).$$

Thus, for both the parametric and the nonparametric forms, the eigenvectors form the linear transform which maximizes $J$, the ratio of the between-class to within-class scatter. The extracted features are optimal in the sense that they maximize separation between the class-conditional joint feature distributions in the rotated space.

Using the nonparametric scatter matrices, the extraction is based on local density estimation. Thus the results are a compromise between information provided in the various clusters or regions belonging to a class.
5.4.7.6 Classifier-independent feature extraction

Although developed in the context of a classifier-independent approach to feature analysis, the feature extraction algorithm given below is useful and valuable in
its own right. When used as an extraction technique, the eigenvectors corresponding to the lowest eigenvalues are dropped, resulting in a reduction of computational costs and potentially an improvement in classification performance, depending on the
classifier chosen. As illustrated in Fig. 5.1, a second round of classifier-specific
feature optimization may still be desirable once a classifier is chosen.



Algorithm 5.1: Algorithm for nonparametric feature extraction, using local density estimation

a. Set the parameters: select $k$ as a small fraction of the sample size; select $\alpha$ as a small integer, usually 2 or 3.

b. Calculate within-class scatter nonparametrically, estimating using the data sample $X$:

$$S_W = \frac{1}{n} \sum_{i=1}^{L} \sum_{j=1}^{n_i} \left(x_j^{(i)} - \mathcal{M}_i(x_j^{(i)})\right)\left(x_j^{(i)} - \mathcal{M}_i(x_j^{(i)})\right)^T,$$

where

$$\mathcal{M}_i(x) = \frac{1}{k} \sum_{q=1}^{k} x_{q\mathrm{NN}}^{(i)}.$$

c. Calculate between-class scatter nonparametrically, estimating using the data sample $X$:

$$S_B = \frac{1}{n} \sum_{i=1}^{L} \sum_{j=1}^{n_i} w_j \left(x_j^{(i)} - \mathcal{M}_0(x_j^{(i)})\right)\left(x_j^{(i)} - \mathcal{M}_0(x_j^{(i)})\right)^T,$$

where

$$\mathcal{M}_0(x) = \frac{1}{k} \sum_{q=1}^{k} x_{q\mathrm{NN}}^{(\neq i)}$$

and

$$w_j = \frac{\min_i \left\{ d^{\alpha}(x_j, x_{k\mathrm{NN}}^{(i)}) \right\}}{\sum_{i=1}^{L} d^{\alpha}(x_j, x_{k\mathrm{NN}}^{(i)})}.$$

d. Calculate the eigenvectors $\Phi$ and eigenvalues $\Lambda$ of $S_W^{-1} S_B$.

e. Discard the eigenvectors corresponding to the lowest eigenvalues, as the eigenvalues are a measure of the separation captured by the corresponding eigenvectors.

f. Calculate the optimal extracted features as $Z = \Phi^T X$.
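A compact sketch of Algorithm 5.1 is given below. The code is our own (the names, the use of Euclidean distances, and the absence of tie handling are all assumptions), but it follows steps a through f above, combining the weighted between-class scatter with the eigendecomposition of $S_W^{-1} S_B$.

```python
import numpy as np

def nda_extract(X, y, k=3, alpha=2, keep=None):
    """Sketch of Algorithm 5.1: nonparametric feature extraction.

    X: (n, d) learning sample; y: (n,) labels. Returns extracted features
    and eigenvalues, sorted by decreasing eigenvalue.
    """
    n, d = X.shape
    classes = np.unique(y)
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    np.fill_diagonal(D, np.inf)                                # exclude self
    for j in range(n):
        # Distance to the k-th nearest neighbor in each class (for the weight w_j).
        dk = np.array([np.sort(D[j, y == c])[k - 1] for c in classes])
        w = dk.min() ** alpha / (dk ** alpha).sum()
        own = np.where(y == y[j])[0]
        out = np.where(y != y[j])[0]
        Mi = X[own[np.argsort(D[j, own])[:k]]].mean(axis=0)    # local class mean
        M0 = X[out[np.argsort(D[j, out])[:k]]].mean(axis=0)    # out-of-class mean
        dw, db = X[j] - Mi, X[j] - M0
        Sw += np.outer(dw, dw) / n                             # unweighted S_W
        Sb += w * np.outer(db, db) / n                         # weighted S_B
    evals, evecs = np.linalg.eig(np.linalg.solve(Sw, Sb))      # eig of Sw^-1 Sb
    order = np.argsort(evals.real)[::-1]
    evals, evecs = evals.real[order], evecs.real[:, order]
    if keep is not None:                                       # drop low eigenvalues
        evals, evecs = evals[:keep], evecs[:, :keep]
    return X @ evecs, evals
```

Setting `keep` below the input dimensionality turns the same routine into the classifier-independent extraction technique described in Section 5.4.7.6.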

5.4.7.7 Invariance

Because nonparametric discriminatory power measures the potential of the features for inducing separability between classes, it is desirable that measures of nonparametric discriminatory power be invariant with regard to rotation, scaling, and
shift of features. Rotational and shift invariance eliminate the impact of irrelevant
details of the measurement method for the features. Scale invariance eliminates the
need for normalization of the features while preserving the critical information of
the ratio of between-class to within-class scatter. The invariance considered here
is in feature space, not in image space. And, although we seek image features
that are themselves invariant to changes in the image, it is equally valuable that
they maintain separability of image classes even when modifications are made in
feature space.
RFI is a function of the eigenvalues and eigenvectors of the parametric and
nonparametric scatter matrices. While the nonparametric scatter matrices are not
as well understood as the parametric scatter matrices, the nonparametric forms are
still symmetric. Therefore, functions of eigenvectors and eigenvalues retain the
same properties for both parametric and nonparametric scatter matrices.
Rotational invariance results from the extraction technique; since the optimal
features are extracted from the original features, rotation in the original feature
space has no impact. Scale invariance results from the use of the ratio of between-class to within-class scatter; since both within-class and between-class scatter are equally affected by scaling a feature, the ratio removes the effects of scaling. Shift invariance results from the use of scatter around the means; therefore the technique
is self-centering.
All three forms of invariance reduce to the issue of preserving class separability, which is invariant under any nonsingular transformation (including rotation,
scaling, and shift) [44]. Those transformations affect separability in the individual features (i.e., in the marginal feature distributions), but not between the classes
themselves. Thus, so long as none of the extracted features is discarded, RFI is
invariant.
5.4.8 Finding the optimal subset

The next step in calculating RFI is to find the optimal subset of the original
features. Finding the optimal subset is necessary because rankings are meaningful



only within a subset of features. Changing the elements of the subset can change the
rankings to an arbitrary degree, even if the features are assumed to be independent
[32]. Finding the optimal subset of the original features can also be a stopping point
for some applications of classifier-independent feature analysis, since the optimal
subset eliminates noise features and optimally selects among redundant features.
As defined above, the optimal subset of features is the smallest subset with the
maximum potential for separability between classes. The algorithm extracts a set
of optimal features from a set of original features, without the use of classifier-specific assumptions. The optimal subset of the original features can be found
by maximizing the separation induced between the class-conditional joint feature
distributions across all possible subsets of the original features, as measured using
the optimal extracted features. The optimal subset of features is thus the smallest
subset of original features which produces the maximum separation, measured in
the rotated space.
5.4.8.1 Multiple optimal subsets

Given the presence of redundant features, more than one subset of the same size
may produce the same amount of separation. When two or more smallest subsets
produce the same amount of separation, and that separation is the maximum separation found, then more than one optimal subset exists. The presence of more than
one optimal subset is not a problem; in both assisted and automatic classification,
it offers more options in the design of the classification system.
5.4.8.2 Concerning the monotonicity assumption

The criteria commonly used in parametric discriminant analysis to find the optimal subset of features are not generally applicable for the nonparametric case.
Criteria such as the trace of the ratio of the between-class to within-class scatter
matrices are based on the same simplifying assumptions as the parametric scatter
matrices. The trace, when calculated on parametric scatter matrices, is monotonic
as a function of subset size, reflecting the theoretical assumption that Bayes error
also decreases monotonically as a function of subset size.
Under conditions of limited sample size, however, the monotonicity assumption does not hold even for well-behaved data sets with unimodal Gaussian distributions, if the true distributions are not known and must be estimated. As the
number of features increases for a fixed sample size, so does the error in the estimation. A second concern is the cost of including each feature, in computer time,
in complexity, and sometimes in degree of invasiveness or risk, as can be the case
in medical diagnosis. In practice, whether for automatic classification or assisted
classification, having more features is not always better.

306 Feature Extraction


5.4.8.3 Nonparametric separability criteria

A nonparametric approach is to select the optimal subset based on the $k$-nearest-neighbor error in the extracted space. The $k$NN error does not introduce any new classifier-specific assumptions. Moreover, because the $k$NN error is asymptotically at most twice the Bayes error, calculating the $k$NN error in the extracted space estimates the theoretical lower limit on the potential classification error [45]. Using $k$NN introduces a new parameter ($k$, the number of nearest neighbors used to calculate the error). Fortunately, as discussed above, the behavior of $k$ is well understood. Note that while the value of $k$ used to calculate the $k$NN error does not need to be the same as the value of $k$ used to calculate the local out-of-class mixture mean, the experiments presented in this research all have both set to the same value. Thus the $k$NN error is defined as

$$\hat{\varepsilon}_{k\mathrm{NN}} = \frac{1}{n} \sum_{i=1}^{L} e_i,$$

where the summation is over all $L$ classes, $n$ is the number of samples, and $e_i$ is defined as the number of misclassified samples from class $\omega_i$ when class membership is assigned by the voting $k$-nearest-neighbor procedure, using the transformed data. This formulation allows for the introduction of a cost factor if errors in all classes are not of equal consequence.
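A leave-one-out voting $k$NN error estimate in the extracted space could be sketched as below (our own helper; equal misclassification costs are assumed, so no cost factor appears).

```python
import numpy as np

def knn_error(Z, y, k=3):
    """Leave-one-out voting k-NN error rate on transformed data Z."""
    n = len(Z)
    errors = 0
    for j in range(n):
        d = np.linalg.norm(Z - Z[j], axis=1)
        d[j] = np.inf                           # leave the sample itself out
        nbrs = y[np.argsort(d)[:k]]
        labels, counts = np.unique(nbrs, return_counts=True)
        if labels[np.argmax(counts)] != y[j]:
            errors += 1
    return errors / n
```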
5.4.8.4 Exhaustive search

Finding the optimal subset requires exhaustive search, since any nonexhaustive technique can do arbitrarily poorly in the general case [46]. The assumption
of monotonicity, necessary for branch-and-bound algorithms to guarantee performance, is extremely restrictive and rarely justified in real problems [47]. Whenever
possible, exhaustive search should be done. For the purposes of evaluating different configurations of RFI, or for comparing estimators for nonparametric discriminatory power, exhaustive search is required. When applying RFI directly to real
problems which are too large to execute exhaustive search, suboptimal techniques
must be used. Both genetic algorithms and floating search offer promising paths
for suboptimal search [48]. The estimator as first proposed [49] uses a genetic
algorithm.
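When the candidate set is small enough, exhaustive search is only a few lines. The sketch below is our own (`score` might be the $k$NN error of the previous sketch, computed in the extracted space); it evaluates every nonempty subset and returns all smallest subsets attaining the minimum score, which also surfaces multiple optimal subsets.

```python
from itertools import combinations

def exhaustive_optimal_subsets(X, y, score):
    """Score every nonempty column subset of X (an (n, d) NumPy array) and
    return the smallest subsets achieving the best (lowest) score."""
    d = X.shape[1]
    results = {cols: score(X[:, list(cols)], y)
               for r in range(1, d + 1)
               for cols in combinations(range(d), r)}
    best = min(results.values())
    winners = [c for c, s in results.items() if s == best]
    smallest = min(len(c) for c in winners)
    return [c for c in winners if len(c) == smallest], best
```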
5.4.9 Ranking the features

The final step of RFI is to rank the features within the optimal subset. Only the
portion of the learning sample contained within the optimal subset is used. If more
than one optimal subset exists, ranking is done separately for each optimal subset.
RFI ranks features based on the contribution of the original features to the separation in the rotated space. The contribution of the original features to the separation in the extracted space can be estimated using the eigenvectors and eigenvalues of the optimal transformation. The magnitudes of the eigenvector components measure the amount that each original feature contributes to each extracted feature. The normalized eigenvalues estimate the amount of separability contributed by each extracted
feature to separation in the extracted space. Thus the normalized eigenvalues can be
used to estimate separability in the rotated space, and the eigenvectors can be used
to estimate the amount that each original feature contributes to that separability.
5.4.9.1 Average absolute weight size

A technique for ranking features using only the eigenvectors was proposed by Mucciardi and Gose [38]. The technique, the Average Absolute Weight Size (AAWS), averages the magnitudes of the eigenvector components to estimate the contribution of the original features to separation. For the $j$th of $D$ features, where $\phi_{ij}$ is the weight for feature $j$ in the $i$th extracted feature, $\mathrm{AAWS}_j$ is given by:

$$\mathrm{AAWS}_j = \frac{1}{D} \sum_{i=1}^{D} \left| \phi_{ij} \right|.$$
Mucciardi and Gose also tried sorting the eigenvectors by eigenvalue, retaining
only those extracted features which accounted for some threshold amount of separation in the extracted space and recomputing AAWS using only the retained extracted features. Both techniques, however, use only the eigenvector information.
Thus, AAWS measures the contribution of each original feature to the extracted
space, rather than to separation in the extracted space. Eliminating the extracted
features which contribute the least separation in the extracted space includes separation information, but introduces a tuning parameter which must be set ad hoc.
5.4.9.2 Weighted absolute weight size

The contribution of the original features to the separation in the extracted space can be estimated without tuning parameters by using the Weighted Absolute Weight Size (WAWS). WAWS uses the normalized eigenvalues to weight the contributions of the original features to the extracted features by the proportion of separation the extracted features contribute in the extracted space. The WAWS of feature $j$ is:

$$\mathrm{WAWS}_j = \sum_{i=1}^{D} \hat{\lambda}_i \left| \phi_{ij} \right|,$$

where $\hat{\lambda}_i$ is the normalized eigenvalue for extracted feature $i$.
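Given the eigenvectors and eigenvalues from the extraction step, both rankings reduce to weighted column averages of the eigenvector magnitudes. A sketch follows (our code; $\phi_{ij}$ is assumed stored as `evecs[j, i]`, rows indexing original features).

```python
import numpy as np

def aaws(evecs):
    """Average absolute weight size of each original feature (rows of evecs)."""
    return np.abs(evecs).mean(axis=1)

def waws(evecs, evals):
    """Weighted absolute weight size: eigenvector magnitudes weighted by the
    normalized eigenvalues of the corresponding extracted features."""
    lam = evals / evals.sum()
    return np.abs(evecs) @ lam
```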



5.4.9.3 Deriving the ranks from the raw WAWS values: statistical model

Features with statistically distinct WAWS values are given different ranks. To
determine whether WAWS values are distinct, a randomized block analysis of variance (ANOVA) is performed, and intervals constructed around the differences between treatment means using the multiple comparisons formula. Each feature is
thus a treatment, and each data set (optimal subset only), a block.
The null hypothesis, that there is no difference between treatments, is tested
[36].
Features whose intervals around the differences from all other features exclude zero are given distinct ranks. Groups of features in which some features have distinct WAWS
values, but others do not, are given a single rank. For example, if features 2 and 3
have distinct WAWS values, but feature 4 overlaps both features 2 and 3, all three
features are assigned a single rank. Features not in the optimal subset have rank
zero. Features (or groups of features) with distinct ranks are ranked based on their
treatment means, with the largest distinct treatment mean being assigned the highest
rank. Higher ranks indicate greater discriminatory power. Additional details of the
process appear in [36], along with the results of extensive experimentation.
5.5 Features in practice

5.5.1 Caveats

Diagnosis is the province of physicians. The value of features is measured ultimately by their ability to contribute to reliable diagnoses made by physicians.
Questions of safety and effectiveness must be answered to the satisfaction of the
physicians involved in the diagnostic process. Thus, the features proposed and
selected on the basis of their representational properties or of their ability to discriminate are subject to considerable additional scrutiny from clinicians. There is
no guarantee that a feature found to be useful mathematically will be acceptable
clinically.
5.5.2 Ultrasound tissue characterization

Much of this section is taken from recent work [50] that has made clear the very
real possibility of identifying tissue types reliably using the observed ultrasound
radio-frequency (RF) signal (see also Chapter 9). A candidate set of features was
selected, based in part on knowledge of the scattering properties of various kinds
of tissue; the features were extracted from a set of labeled samples and evaluated
in several classifier structures. The specific objective was to provide, in data from
the liver, reliable discrimination between hepatitis and normal tissue; a secondary
goal was to reduce the dependence of the features on the imaging system used to
acquire the data.
This section introduces and describes several new parameters that represent
specific characteristics of ultrasound scattering from soft tissues. In the spirit of



this chapter, they were proposed based on their relevance to what is known about
the basic echogenicity of liver tissue. Procedures used to validate those parameters
using simulation, phantom, and real data are discussed.
5.5.2.1 Texture parameters

Although texture analysis methods such as the co-occurrence matrix, run-length histograms, and ring and sector sums of the two-dimensional frequency plane have
shown some success, two issues should be addressed to make those methods more
attractive for wider applications. It is desirable to have the parameters extracted
in the texture analysis related either directly or indirectly to some physical property of ultrasound scattering. And, since Burckhardt [51] and Wagner et al. [52]
have established that the transducer characteristics are primarily responsible for the
speckle patterns in ultrasound images, it is desirable to have texture analysis techniques that significantly reduce this dependence on the transducer characteristics
so that intrinsic characteristics of the tissue can be measured.
There is no work in the literature that links texture parameters measured from
ultrasound signals directly to any physical properties of tissues. Thijssen et al. [53]
did, however, examine the correlation between texture parameters and parameters
extracted from the RF signal that are based on scattering properties of the underlying tissue. They found that, among the texture parameters, the Entropy and Correlation parameters extracted from the co-occurrence matrix had the highest correlation
(r = 0.78 and 0.66, respectively) with a feature defined by Insana et al. [54] that measures the variability in the specular component, $s$, normalized by the diffuse component, $d$, of the RF signal. Based on this indirect relationship, co-occurrence matrix-based methods have been the focus of the
texture analysis.
Computation of the co-occurrence matrix parameters The computation of the
co-occurrence matrix proceeds as described above; following is a description of
the parameters extracted from the matrix and of the steps taken to reduce the dependence of those parameters on the imaging system characteristics. The first step
in computing the co-occurrence matrix is to demodulate the RF signal to produce
an envelope-detected image. This is accomplished through the use of a Hilbert
transform as described in [50].
Once the co-occurrence matrix was computed, the following parameters were
extracted:

maximum probability: $\max_{i,j}\; p_{ij}$,

element difference moment: $\sum_{i}\sum_{j} (i - j)^2\, p_{ij}$,

inverse element difference moment: $\sum_{i}\sum_{j} \frac{p_{ij}}{(i - j)^2}$ for $i \neq j$,

entropy: $-\sum_{i}\sum_{j} p_{ij} \log p_{ij}$,

where $p_{ij}$ is the $(i, j)$ entry of the normalized co-occurrence matrix.
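For reference, the four parameters can be computed directly from a normalized co-occurrence matrix. The sketch below is ours (the function names and the gray-level quantization are assumptions) and uses a generic displacement on a quantized envelope image.

```python
import numpy as np

def cooccurrence(img, d1, d2, levels=32):
    """Normalized gray-level co-occurrence matrix for displacement (d1, d2),
    with d1 the row (axial) offset and d2 the column (lateral) offset."""
    q = np.floor(img / img.max() * (levels - 1)).astype(int)
    a = q[:q.shape[0] - d1, :q.shape[1] - d2]
    b = q[d1:, d2:]
    P = np.zeros((levels, levels))
    np.add.at(P, (a.ravel(), b.ravel()), 1)     # count co-occurring pairs
    return P / P.sum()

def glcm_parameters(P):
    i, j = np.indices(P.shape)
    off = i != j                                # exclude the diagonal where needed
    return {
        "max_prob": P.max(),
        "elem_diff_moment": ((i - j) ** 2 * P).sum(),
        "inv_elem_diff_moment": (P[off] / (i[off] - j[off]) ** 2).sum(),
        "entropy": -(P[P > 0] * np.log(P[P > 0])).sum(),
    }
```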



Mia et al. [55] reported that the four texture parameters exhibited fairly high
correlation with each other. Based on that, the Entropy (ENT) feature
was retained and the three other features were eliminated from the analysis. The
ENT feature in combination with a feature derived from cepstral analysis, which
will be discussed below, provided good classification performance ($A_z = 0.86 \pm 0.04$) at the task of distinguishing normal from hepatitis livers. For the current
texture analysis approach, the ENT feature has been retained and the correlation
of the co-occurrence matrix (COR) feature has been added based on the indirect
relationship to physical parameters as reported by Thijssen et al. [53].
Using the transducer to define the relationship operator The distances, $d_1$ and $d_2$ (which define the relationship operator), to the neighbor pixel should be chosen to reduce the effects that the transducer characteristics have on the extracted parameters. At the same time the chosen values of $d_1$ and $d_2$ must produce parameters that distinguish various textures. Previous researchers have used fixed values of $d_1$ and $d_2$. Kadah et al. [56] defined the neighbor as $d_1 = 4$ and $d_2 = 0$. Raeth et al. [57] defined a set of neighbors using all combinations of $d_1 = 2, 3,$ and $4$ and $d_2 = 2, 3,$ and $4$. Mia et al. [55] defined the fixed neighbor distance as $d_1 = 10$ and $d_2 = 3$.
The work by Valckx and Thijssen [58] provides some guidance on how to
choose an effective neighbor distance. They found that two factors affected the
optimal choice of the displacement used in computing the co-occurrence matrix
parameters. As the distance separating the neighbor pixels decreased, the features
provided better discrimination between various textures. This suggests that a small
displacement would be optimal. The texture parameters, however, are very sensitive to the characteristics of the transducer at small displacements. The parameters
tend to become independent of those factors when the displacement is greater than
a resolution cell size (-6 dB width) of the transducer. This suggests that a large
displacement would be optimal. Given these competing requirements, Valckx and
Thijssen [58] suggest that the optimal displacement would be the smallest displacement at which the parameters are independent of transducer effects.
Following this guideline, the definition of neighbor used in this work was $d_1 =$ the resolution cell size in the axial direction and $d_2 =$ the resolution cell size in the lateral
direction. The resolution cell size of the transducer is measured by computing the
auto-correlation function of the envelope-detected signal. The maximum of this
function occurs at the zero lag point. The full-width at half-max (FWHM) distance
of the auto-correlation function is used as the estimate of the transducer resolution
cell size. This is illustrated in Fig. 5.4. The auto-correlation function is computed
at various depths to account for changes in the resolution cell size due to diffraction
effects. The resulting values of $d_1$ and $d_2$ at each depth are used to compute the
entries in the co-occurrence matrix associated with pixel pairs at that depth.
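A one-dimensional version of the resolution-cell estimate might look like the sketch below (our code; axial direction only, with the sample spacing supplied by the caller).

```python
import numpy as np

def fwhm_resolution(envelope, dz):
    """FWHM (in mm) of the autocorrelation of a 1-D envelope signal.

    envelope: detected envelope along an A-line; dz: sample spacing (mm).
    """
    e = envelope - envelope.mean()
    ac = np.correlate(e, e, mode="full")
    ac = ac[ac.size // 2:]              # keep non-negative lags; peak at lag 0
    ac /= ac[0]
    below = np.flatnonzero(ac < 0.5)    # first lag falling under half maximum
    half_lag = below[0] if below.size else ac.size - 1
    return 2 * half_lag * dz            # full width = twice the half width
```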
Figure 5.4: Auto-correlation function of the envelope-detected signal from an ultrasound scan of the liver depicting how the full-width at half-max distance is computed.

Simulation data described in [50] were used to validate the hypothesis that using the transducer resolution cell size as the definition of neighbor when computing the co-occurrence matrix reduces the dependence on the transducer characteristics.


For that simulation, 60 combinations of scattering parameters (scatterers per resolution cell, strength of regular scatterers, spacing of regular scatterers, and variability of regular scatterers) were used to generate various textures. For each combination of scattering parameters, two transducer settings (different center frequency and bandwidth)
were used to generate the simulated data.
The two texture parameters (ENT and COR) were extracted from each of the
simulated data files using a fixed separation ($d_1 = 7$ and $d_2 = 3$) of neighboring pixels and using the resolution cell size of the simulated transducer as the separation
distance. For each method of computation, the correlation between the two transducer settings was computed for each feature. The entire experiment was run 100
times to enable computation of the mean and standard deviation of the correlation
for each feature. The results of this simulation experiment are summarized in the
table below.
Using the resolution cell size of the transducer as the definition of neighboring pixels does not eliminate the influence of the transducer characteristics on the
values of the extracted parameters. If that were the case, there would be perfect
correlation (r = 1.0). This approach does, however, significantly increase the correlation of the texture features measured from the same scattering medium using
different transducers.



Table 5.1: A comparison of the correlation of the ENT and COR texture features between two transducer settings, as computed using fixed and variable (resolution cell size) separation between neighboring pixels.

Feature   Correlation with fixed separation   Correlation with transducer resolution
          between neighboring pixels          cell size separating neighboring pixels
ENT       0.38 ± 0.12                         0.63 ± 0.14
COR       0.32 ± 0.10                         0.58 ± 0.11

5.5.2.2 Cepstral parameters

Previous work by Fellingham and Sommer [59] and Suzuki et al. [60] suggests
that the mean scatterer spacing is a feature of the scattering media that may be
useful in distinguishing diffuse diseases. The cepstrum (specifically, the power
cepstrum) has been used by Suzuki et al. [60], Wear et al. [61], and Kadah et al. [56]
to detect regularly spaced echoes in the RF signal. Those echoes were attributed to
regularly-spaced scatterers in the tissue. Thus, the spacing (in time) of those echoes
in the RF signal is related to the spacing (in distance) of scatterers in the tissue.
The power cepstrum, first developed by Bogart et al. [62], is defined as the inverse Fourier transform of the logarithm of the magnitude spectrum:

$$c_p(\tau) = \mathrm{FT}^{-1}\left\{ \log \left| X(f) \right| \right\}, \quad \text{where } X(f) = \mathrm{FT}\{x(t)\}, \tag{5.1}$$

where $\mathrm{FT}\{\cdot\}$ is the Fourier transform operator and $\mathrm{FT}^{-1}\{\cdot\}$ is the inverse Fourier transform operator. If $x(t)$ is a real sequence then $\log|X(f)|$ is a real even sequence, and $c_p(\tau)$ is a real sequence. So the power cepstrum is a real sequence when the input is a real sequence (the term power refers to the fact that the logarithm is taken of the power spectrum). When the magnitude of $X(f)$ is found prior to computing the logarithm, the phase information in the original signal is discarded. A consequence of discarding the phase is that the power cepstrum is not invertible. Details of the calculation are contained in [50].
One drawback of the power cepstrum-based approach to estimating scatterer
spacing is that it disregards any information that is contained in the phase of the
signal. The success of Varghese and Donohue [63, 64] in estimating scatterer spacing using the autocorrelation of the frequency spectrum, which does utilize the
phase information, suggests that there is some advantage to retaining the phase
information when processing the RF signal.
Mia et al. [65] suggested using the complex cepstrum to identify periodic scattering. The complex cepstrum is an approach in which both the magnitude and
the phase of the time domain signal are considered. The complex cepstrum, as



described in Oppenheim and Schafer [66], is the inverse Fourier transform of the logarithm of the complex frequency spectrum:

$$c_c(\tau) = \mathrm{FT}^{-1}\left\{ \log X(f) \right\} = \mathrm{FT}^{-1}\left\{ \log\left|X(f)\right| + j\,\theta(f) \right\}.$$

If the input signal $x(t)$ is real, then the magnitude of the Fourier transform, $|X(f)|$ [and $\log|X(f)|$], is even and the phase, $\theta(f)$, is odd. So the complex cepstrum, $c_c(\tau)$, is real. The term complex refers to the fact that the logarithm is performed on a complex value. An advantage of performing the complex logarithm is that the phase information of the original signal is retained and the complex cepstrum, $c_c(\tau)$, is invertible.
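With an FFT library the two cepstra differ only in whether the phase is retained before the logarithm. The sketch below is our own; it approximates the complex logarithm with the unwrapped phase and adds a small constant to avoid taking the log of zero.

```python
import numpy as np

def power_cepstrum(x):
    """Inverse FT of the log magnitude spectrum; phase is discarded."""
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + 1e-12)).real

def complex_cepstrum(x):
    """Inverse FT of log|X(f)| + j*theta(f); phase retained (and unwrapped)."""
    X = np.fft.fft(x)
    log_X = np.log(np.abs(X) + 1e-12) + 1j * np.unwrap(np.angle(X))
    return np.fft.ifft(log_X).real
```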
While the presence of regularly spaced scatterers is a sufficient condition to
cause peaks in the cepstrum, it is not a necessary condition. Kuc et al. [67] showed
that various distributions of scatterer spacing can result in peaks in the cepstrum.
Even when regularly spaced scatterers are present, they are expected to be relatively
weak, and thus, not easily detected. This detection can be improved by signal
averaging.
Consider the RF signal from a single A-line scan as shown in Fig. 5.5. The cepstrum of this signal is shown in Fig. 5.6. The periodic component of the scattering is not evident in this figure. But when $M$ of these cepstra, from adjacent A-lines within an ROI, are summed, the signal due to the periodic scatterers increases by a factor of $M$ while that due to the random scatterers increases by a factor of $\sqrt{M}$. This provides a gain of $\sqrt{M}$ for the periodic component. The averaged cepstrum ($\bar{c}_c$) is shown in Fig. 5.7. The effect of the periodic scatterers is clearly visible as a peak in the cepstrum.
Simulation and phantom data [50] were used to quantify the performance of the
complex-cepstrum approach to scatterer spacing estimation. The simulation data
can be controlled well and the actual values are known precisely. The phantom
data studied by Wear et al. [61] were used to provide a more realistic test.
To compare the relative effectiveness of the power and complex cepstra at the
task of estimating scatterer spacing, an objective performance measure is needed
to judge the effectiveness of a given approach. The task is to detect the main peak
(harmonics are also present) in the cepstrum shown in Fig. 5.7. The detectability
of those peaks under various conditions will serve as a measure of effectiveness for
each approach. One objective measure of the detectability of a peak in the presence
of noise is the number of standard deviations separating the peak from the mean
value. This is the basis of the constant false alarm rate (CFAR) detection technique,
as described by Skolnik [68], that is commonly used in radar applications. The
objective measure of performance is the signal excess (SE = [peak-mean]/standard
deviation) in units of standard deviations. This is illustrated in Fig. 5.8.
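Once a quefrency search window is fixed, the signal-excess measure is essentially a one-liner; a sketch follows (our names).

```python
import numpy as np

def signal_excess(cep, lo, hi):
    """SE = (peak - mean) / std, with the peak searched in cep[lo:hi]."""
    window = cep[lo:hi]
    return (window.max() - cep.mean()) / cep.std()
```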
Several parameters that are not directly influenced by characteristics of the
transducer are extracted from the complex cepstrum.

Figure 5.5: The RF signal of a single A-line from within an ROI of a liver scan.

The first parameter is the weighted average of the peaks. It is more likely that components of biological tissues, such as the portal triads in the liver, have a range or several ranges of scatterer
spacing rather than a single dominant spacing. For this reason, the cepstrum is
searched in the range 0.75–2.5 mm for all peaks that have signal excess greater
than 1.5 standard deviations. The magnitude-weighted average of all such peak
locations is a cepstral parameter called PCEP. The average magnitude of all such
peaks, normalized by the mean of the cepstrum, is a second cepstral parameter
called MCEP. The final parameter extracted from the complex cepstrum is the ratio
of the energy in the low-quefrency 1 portion of the cepstrum to the entire cepstrum.
This parameter, called RCEP, is a measure of the proportion of the backscattered
power that is due to unresolvable scatterers. The low quefrency portion of the
cepstrum represents reflections that are close together spatially, and thus are not
resolvable by the imaging system.
5.5.2.3 Phase coherence

The idea of using phase coherence to identify regularly spaced structures in the scattering medium was first proposed by Weng et al. [69, 70] and later used by Molthen et al. [71] to perform tissue classification. They used the technique to estimate the mean scatterer spacing of those structures. A similar approach is used here
as a measure of the level of structured regularity present in the scattering medium.
1 In the original paper on the cepstrum [62], a set of terms was defined, each of which is an
anagram of a corresponding term used in conventional Fourier analysis. Thus spectrum becomes
cepstrum, frequency becomes quefrency, phase becomes saphe, etc.


Figure 5.6: The complex cepstrum of the RF signal of the single A-line scan shown in
Fig. 5.5.

In the case of purely diffuse scattering, there are a large number of scatterers
within the resolution cell of the transducer. The resulting RF signal is the accumulation of all the random reflections from within a resolution cell. When the number
of scatterers per resolution cell is sufficiently large and the phases of the individual
reflections are randomly distributed between $-\pi$ and $\pi$, the phase of the resulting (accumulated) signal is uniformly distributed between $-\pi$ and $\pi$. If, however, there
is structure in the scattering medium, then the reflections from those sites will have
some nonrandom phase relationship. The presence of long-range order results in
some phases occurring more frequently than others do and will be evident in a
histogram of phase.
Structures with long-range order will result in coherent scattering of certain frequency components of the ultrasound pulse. The wavelength, $\lambda = c/f$, of each frequency component, $f$, is related to the speed of sound, $c$, in the tissue, and structures that are located at integral multiples of a wavelength will produce coherent scattering of
frequencies associated with that wavelength. The relative amount of coherent scattering present in the RF signal is characterized by analyzing the phase distribution
of the RF signal at various frequencies. This analysis can be performed only for
frequencies that are within the usable bandwidth of the transducer.
Figure 5.7: The complex cepstrum averaged from the RF signal of 27 adjacent A-line scans.

The first step in this process is to use a Hilbert transform to produce the analytic signal (real and imaginary components) of the RF signal. The analytic signal then is demodulated at the frequency of interest. This is accomplished by multiplying the analytic signal by a complex phasor at the demodulation frequency. Next, the instantaneous phase of each sample of the demodulated signal is computed. A histogram of the instantaneous power of the demodulated RF signal as a function
of the instantaneous phase is computed [50]. The normalized phase profile resulting
from a simulated image containing only diffuse scattering components is shown in
Fig. 5.9. The normalized phase profile resulting from a combination of diffuse
and periodic scatterers, spaced nominally at 1.54 mm, is shown in Fig. 5.10. The
phase profile of the image containing only diffuse scatterers is relatively uniform
compared to that of the image containing both components. One parameter that
characterizes this nonuniformity is the maximum deviation from a flat surface. The
maximum value of the normalized phase profile (MPRF) is extracted as a parameter
from this analysis.
The maximum deviation provides information about nonuniformity at a single
phase location. To get a measure of the systematic deviation from a uniform distribution, the coefficient of variation, which is the standard deviation normalized by
the mean, is computed for the phase distribution at each demodulation frequency.
For a perfectly uniform distribution, the coefficient of variation will be zero. The
coefficient of variation increases as the distribution deviates from uniform. The
coefficient of variation, as a function of the demodulation frequency, for the phase
profile of Fig. 5.9 (diffuse scatterers only) is shown in Fig. 5.11. The coefficient
of variation, as a function of the demodulation frequency, for the phase profile of
Fig. 5.10 (diffuse and periodic scattering components) is shown in Fig. 5.12.
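A sketch of the phase-profile computation at a single demodulation frequency is given below (our code; SciPy's Hilbert transform is assumed available, and the bin count is arbitrary).

```python
import numpy as np
from scipy.signal import hilbert

def phase_profile(rf, f_demod, fs, bins=64):
    """Power-weighted histogram of instantaneous phase after demodulation."""
    t = np.arange(len(rf)) / fs
    z = hilbert(rf) * np.exp(-2j * np.pi * f_demod * t)  # demodulated analytic signal
    power, phase = np.abs(z) ** 2, np.angle(z)
    prof, _ = np.histogram(phase, bins=bins, range=(-np.pi, np.pi), weights=power)
    return prof / prof.sum()              # normalized phase profile

def coeff_variation(prof):
    return prof.std() / prof.mean()       # zero for a perfectly uniform profile
```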


Figure 5.8: Illustration of the computation of SE = [Peak − Mean]/Std Dev. The shaded region in the image represents ±1 Std Dev about the mean.

5.5.2.4 Intrapatient variability of parameters

Concerns about heterogeneity of regions within an organ can be addressed by extracting parameters from several regions within it. The values from the various
regions are averaged to produce a single set of parameters for each patient. Those
(averaged) parameters are then used to discriminate between healthy and diseased
patients. To investigate the usefulness of tissue heterogeneity in discriminating diffuse disease in the liver, the intrapatient variability is computed for various parameters extracted from multiple regions within the liver. The coefficient of variation
of a parameter serves as a measure of tissue heterogeneity with respect to that parameter.
For each subject, the digitized RF signal from five or six regions within the liver
are available for analysis. Each region is approximately 25 mm wide (lateral) and
40 mm long (axial). Using only five or six samples of a parameter to compute the
standard deviation will result in a high variance of that statistic. To alleviate that
problem, each region is divided into a number of subregions of size 8 mm in the
axial direction by 12 mm in the lateral direction. This produces 32 to 60 samples
of the parameter from which to compute the mean and standard deviation.
This technique was applied to some of the parameters previously described in
this chapter. This technique could not be applied to the texture or phase parameters
because the small subregions do not contain enough pixels to adequately fill the
two-dimensional co-occurrence matrix or phase profile.

Figure 5.9: The normalized phase profile of a simulated incoherent signal resulting from the presence of diffuse scattering components only.

Parameters extracted from a sparse co-occurrence matrix or phase profile may not provide valid measures.
Similarly, the small subregions do not provide enough data for the prediction filters in the Wold decomposition. The point SNR (mean/standard deviation) of the
envelope signal of each subregion is used as a simple substitute for a texture measure. Wagner et al. [52] showed that the point SNR takes on a value of 1.91 for
purely diffuse Rayleigh scattering and deviates from that value as other scattering
components are introduced.
The coefficient of variation of the three cepstral parameters and the point SNR
are computed to yield, respectively, the following four intrapatient variability features: VPCEP, VMCEP, VRCEP, and VSNR. The simulation experiments used to
analyze the parameters described previously in this chapter are not useful in evaluating these intrapatient variability features.
The new parameter extraction methods produced a set of 12 features that were
used in the analysis of the clinical data. Simulation experiments indicate that using
the resolution cell size of the transducer to define the distance to the neighbor pixel
in the co-occurrence matrix computation indeed does increase the reproducibility
of the texture features ENT and COR, as compared to using a fixed distance.
For the cepstral features, it was shown that peaks in the cepstrum, resulting from
repetitive structures in the RF signal, are more detectable, under a variety of signal
conditions, when the complex cepstrum is used instead of the power cepstrum.
Figure 5.10: The normalized phase profile of a simulated signal containing coherent and incoherent components resulting from the presence of diffuse and periodic scattering components.

Simulation [50] showed that the parameters extracted from the cepstral analysis (PCEP, MCEP, and RCEP) exhibited different levels of correlation between two transducer settings (a useful first step in the quest for transducer independence);
PCEP was fairly highly correlated, RCEP was moderately correlated, and MCEP
was weakly correlated.
The phase profile was used to compute the two phase coherency features MPRF
and MVPRF. The RWLD feature was computed from the Wold decomposition of
the scattering field estimated via the deconvolution of the RF signal. The deconvolution of the RF signal was performed using an approach modified from the cepstral
mean subtraction technique commonly used in speech processing applications [50].
5.5.2.5 Results: feature selection

The parameters described above were extracted from the RF ultrasound signals
collected from each subject in the clinical data sets. The first step in evaluating
the parameters' performance in the task of distinguishing normal livers from those
with hepatitis is to determine which parameters to use in the several designs of
the classification system. The subset selection techniques described earlier (and
in [36]) were not available at the time that these experiments were performed, and,
in light of the Hughes phenomenon, it was nevertheless essential to choose subsets.
The clinical data sets used in this work were relatively small. The data set from
machine D includes 36 normals and 50 cases of hepatitis. The machine A data
set includes 37 normals and 19 cases of hepatitis. The dimensionalities of the

classifiers discussed above are restricted to a small number of features for data set D and for data set A because of the small number of data points available. This leads to the next step of the classifier design process, which is to identify the subset of the available parameters that will result in the best classification performance.

Figure 5.11: The coefficient of variation of the phase profile, as a function of the demodulation frequency, for a simulated image containing diffuse scatterers only.
Starting with 12 candidate features (Table 5.2), there are 12 possible subsets of dimensionality one; there are 66 possible subsets of dimensionality two; and 220 possible subsets of dimensionality three. This is a total of 298 possible subsets. There are several possible methods of attempting to select the optimal $m$-dimensional subset of the 12 features. One approach is to compute the classification performance of the individual features, and select the $m$ features that have the best performance by themselves. That approach does not take into account the correlation between features. The data set D was used to identify any redundant features, based on the correlation matrix, of the 12 candidate features identified above.
This reduced the number of feature combinations to be evaluated. That data set
was then used to identify a small number of feature combinations, of the remaining
features, that provided good classification performance. This was accomplished by
exhaustive search of all combinations of 1-, 2-, and 3-feature subsets.
All the features that were included as a part of any of the combinations identified by the above process were extracted from the matched data set. Any features
that had very poor correlation between the two imaging systems were eliminated.
This further reduced the number of feature combinations.

Figure 5.12: The coefficient of variation of the phase profile, as a function of the demodulation frequency, for a simulated image containing diffuse and periodic scattering components.

The remaining few feature combinations (those that provided good classification performance with data
set D and had reasonable correlation between the two imaging systems) then were
evaluated on data set A to identify feature combinations that provided good classification performance with both data sets.
5.5.2.6 Results: classifier performance

The parameters described above were extracted from all of the regions of interest (ROIs) or subregions associated with each subject in data set D. The values of
the first eight of the twelve parameters were averaged over the six ROIs available
for each subject to produce a single feature value for each of the eight parameters.
For the remaining four parameters, the coefficient of variation of the feature values obtained from all of the subregions associated with each patient was computed. A
correlation matrix was computed for the set of twelve features.
Using a correlation value of 0.75 as the threshold for redundant features, there
are two pairs of features that meet that criterion. The ENT and COR features have
a correlation coefficient of -0.86; the COR and MPRF features have a correlation
coefficient of -0.90. One or more of those three features can be eliminated without
significant loss of information. The decision of which feature(s) to eliminate can
be aided by computing the Mahalanobis distance for each feature.
The Mahalanobis distance is a measure of the separation between the means
of a feature (normalized by the standard deviations) computed for the two classes.



Table 5.2: Set of 12 candidate features.

Feature   Method                    Description
ENT       Texture                   Entropy of co-occurrence matrix
COR       Texture                   Correlation of co-occurrence matrix
PCEP      Cepstral                  Weighted average of locations of cepstral peaks
MCEP      Cepstral                  Weighted average of magnitudes of cepstral peaks
RCEP      Cepstral/Coherence        Ratio of low-to-high portion of cepstrum
MPRF      Phase coherence           Normalized magnitude of peak of phase profile
MVPRF     Phase coherence           Max of coefficient of variation of phase profile
RWLD      Coherence                 Ratio of predictable and random components of
                                    Wold decomposition of deconvolved RF signal
VPCEP     Intrapatient variability  Intrapatient variability of PCEP
VMCEP     Intrapatient variability  Intrapatient variability of MCEP
VRCEP     Intrapatient variability  Intrapatient variability of RCEP
VSNR      Intrapatient variability  Intrapatient variability of point SNR of envelope

While a low value does not necessarily mean a feature provides no separation between the two classes (separation may still be achieved with a quadratic or other more complex classifier), a high value is a good indication that the feature will provide good separation. Of the three features identified as having high mutual correlation, the MPRF feature has the largest Mahalanobis distance and is retained for further analysis. The COR feature, highly correlated with MPRF, is eliminated from the analysis. The ENT feature, correlated only with COR, is also retained. This leaves eleven features from which to select feature combinations that perform well.
The leave-one-out (design with N-1 samples and test with the Nth sample; performed N times) and resubstitution (design and test with all samples) test methods [44] were used to measure the classification performance of linear, quadratic, and k-nearest-neighbor (k-NN) classifiers at the task of discriminating between the normal and hepatitis cases in data set D. An exhaustive search of the 11 single-feature, 55 two-feature, and 165 three-feature combinations resulting from the eleven remaining candidate features was performed to identify specific combinations that provided good performance.

Table 5.3: The best performing three-feature combination for each type of classifier design, along with the expected Az value and standard deviation, as computed using the leave-one-out and resubstitution test methods on data set D.

Classifier   Best Three-Feature     Leave-one-out   Resubstitution
Design       Combination            Performance     Performance
Linear       MPRF, MVPRF, VRCEP     0.89 ± 0.03     0.91 ± 0.03
Quadratic    PCEP, MPRF, VRCEP      0.90 ± 0.03     0.92 ± 0.03
k-NN         RCEP, MPRF, RWLD       0.93 ± 0.03     Not Applicable
The classification performance as measured by the area under the receiver operating characteristic (ROC) curve, Az, was computed for all the feature combinations discussed above. (See Chapter 10 of this volume for a discussion of ROC.) A threshold level was set at Az = 0.82 to identify feature combinations that yielded good classification performance. No single feature provided classification performance above this level. Almost half of the three-feature combinations (80 out of 165) provided classification performance of Az ≥ 0.82 using one or more of the classifier designs. The best three-feature performance depended on the classifier design that was used. The best performing three-feature combination for each classifier design is listed in Table 5.3. The ROC curves resulting from the three leave-one-out test conditions are shown in Fig. 5.13.
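As an illustration of how such a leave-one-out estimate of Az can be obtained, the following sketch uses scikit-learn (an assumption for illustration; the original software is not specified in the chapter), with X an N-by-d matrix for one candidate feature subset and y the normal/hepatitis labels:

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.metrics import roc_auc_score

    def loo_az(X, y):
        # Hold out each sample in turn, train a linear classifier on the rest,
        # and score the held-out sample; Az is the area under the resulting ROC.
        scores = np.empty(len(y))
        for i in range(len(y)):
            train = np.ones(len(y), dtype=bool)
            train[i] = False
            clf = LinearDiscriminantAnalysis().fit(X[train], y[train])
            scores[i] = clf.decision_function(X[i:i + 1])[0]
        return roc_auc_score(y, scores)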
While the best-performing feature combinations selected above may be biased by the selection process (see the footnote below), the fact that almost half the three-feature combinations resulted in classification performance of Az ≥ 0.82 indicates that data set D is indeed separable using a three-feature classifier.
Even though many of the three-feature combinations provided good classification performance with data set D, designing classifiers that would work across imaging systems using the limited data available required that the dimensionality be reduced to two-feature combinations. This made it possible to identify feature combinations that also provided good classification performance for the much smaller data set A. Only 13 of the possible 55 two-feature subsets produced classification performance exceeding the threshold level of Az = 0.82 for one or more classifier designs. Those were evaluated in the same way as the three-feature sets, yielding the results shown in Table 5.4.

Footnote: Performing an exhaustive search of all feature combinations results in selecting the optimal subset, but it can also result in a selection bias, as described by Raudys and Jain [72]. The bias results from the fact that the measured classification performance of each feature combination is an estimate of the actual performance of that combination. Each estimate deviates from the actual performance by some estimation error that can be assumed to be Gaussian distributed without loss of generality. Some estimates will underestimate the performance; others will overestimate the actual performance.

Figure 5.13: ROC curves for a linear (solid line), quadratic (dashed line), and k-NN (dotted line) classifier resulting from leave-one-out testing of the D data using the best three-feature combination for each classifier design (axes: FPF horizontal, TPF vertical). There are no significant differences between the three ROC curves.
5.5.2.7 Conclusions
Table 5.4: The best performing two-feature combination for each type of classifier design, along with the expected Az value and standard deviation, as computed using the leave-one-out and resubstitution test methods on data set D.

Classifier   Best Two-Feature   Leave-one-out   Resubstitution
Design       Combination        Performance     Performance
Linear       MPRF, MVPRF        0.88 ± 0.04     0.89 ± 0.04
Quadratic    PCEP, VRCEP        0.85 ± 0.04     0.85 ± 0.04
k-NN         PCEP, RWLD         0.86 ± 0.04     Not Applicable

Two of the five two-feature combinations under consideration also provided good classification results with data set A. The combination of the RWLD and VRCEP features provides good classification performance for both data sets using any of the three classifier designs, and there are no significant differences among the performances of the three classifier designs with this combination of features. The ROC curves resulting from the leave-one-out testing of a linear classifier using the D and A data sets are shown in Fig. 5.14.

Part of the goal of this work was to evaluate systematically the reproducibility of tissue-characterization parameters extracted from in vivo ultrasound data. The correlation of features across imaging systems, unfortunately, as measured from the matched data set, was quite poor. Some of the features did exhibit a moderate
level of correlation that was significantly higher than that of other features. None of the
features, however, was highly correlated across the two imaging systems, despite
the fact that the characteristics (center frequency and bandwidth) were similar for
the two transducers. Other characteristics of the transducers, though, were quite
different. It is not clear how much of the lack of correlation can be attributed to
differences in the imaging systems and how much to the difficulty in acquiring the
same regions of interest, in the same image plane, for successive in vivo ultrasound
scans in a clinical environment.
Despite the lack of good correlation between imaging systems, several of the two-feature combinations provided reasonable classification performance for both data sets. One feature combination in particular, RWLD and VRCEP, produced very good classification performance using a simple linear classifier design for both the D (Az = 0.86 ± 0.04) and A (Az = 0.86 ± 0.06) data sets. While that feature combination provides good classification performance for both data sets, it does require the classifier to be trained separately with those two features for each data set.

All the classification analysis indicates that a simple linear classifier design performs as well as or better than the more complex quadratic and k-NN classifier designs at the task of distinguishing normal livers from cases of hepatitis using features extracted from ultrasound RF signals. No significant improvement is achieved through the use of the more complex classifier designs. The features were the key element, and the search continues for ultrasound-system-independent criteria.
5.5.3 Breast MRI
This section is based on recent work showing that fractal-based features, properly defined, can add substantially to the characterization of shape and to its value in the classification of breast masses [73]. The motivation, again, is the nature of the tissue and what it suggests as basic descriptive information.

Figure 5.14: ROC curves resulting from leave-one-out testing of the D (solid line) and A (dashed line) data sets using a two-feature (RWLD and VRCEP) linear classifier (axes: FPF horizontal, TPF vertical).

5.5.3.1 Background
To improve the detection and staging of breast cancer, magnetic resonance (MR) imaging is being developed as an alternative approach to mammography. Results of early studies suggested that it was not possible to detect and characterize breast lesions on the basis of signal intensities of T1- and T2-weighted images [74-76]. Follow-up research, however, has shown that injection with gadopentetate dimeglumine enhances breast lesions [77], although not all enhanced lesions are malignant. In addition, gadolinium-enhanced MR breast images reveal cancers that are not visible at mammography [78, 79], and MR imaging shows the location and extent of tumors better than mammography or physical examination [80].

Nunes, Schnall, and others [81] developed a decision-tree approach to the interpretation of architectural features as a means of improving the diagnosis of MRI breast masses. They concluded that an interpretation model incorporating breast MR architectural features can achieve high sensitivity and improve specificity for diagnosing breast cancer. In their decision tree, the primary determinant of malignancy is the characterization of the border (B). Secondary determinants of malignancy are septation (S), rim enhancement (R), signal intensity (I), and uniformity of signal density (D). Here, we refer to those five expert-observer architectural features as the full Nunes-Schnall feature set (abbreviated as BRSID).


The fractal dimension of the mass boundary is a measure of the change in boundary roughness as resolution is increased from a coarse to a fine scale. Clinical interest in the fractal dimension of the lesion border is based on the observation that border roughness distinguishes benign masses from malignant ones [82]. Although prior researchers have studied the effectiveness of the fractal dimension as a means of characterizing breast lesions [83-87], none of those studies found the fractal dimension to be clinically effective for distinguishing malignant conditions from benign ones.
This work evaluated the feasibility of using statistical fractal-dimension features to improve discrimination between benign and malignant breast masses at
MR imaging. Our hypotheses were that the lack of clinical success in previous
fractal-dimension studies was caused by the lack of robustness in estimating the
value of the fractal dimension and that overall discrimination would be improved
when a robust fractal-dimension feature was added to the Nunes-Schnall feature
set.
5.5.3.2 Test data
Test data were regions of interest (ROIs) from 16-bit MR images of focal masses of the breast. All images were acquired with a three-dimensional, fat-suppressed, radio-frequency-spoiled gradient-echo sequence on a 1.5-T system. The images used in this study were obtained during the first 90 seconds after the delivery of a 20-ml bolus of a gadolinium-based contrast agent, gadopentetate dimeglumine. The resulting images consist of 512x512x28 pixels and were obtained in the sagittal plane from an acquisition matrix of size 512x256x32. For each test case, we used a single, representative two-dimensional section. The field of view ranged from 16 to 22 cm, and slice thicknesses ranged from 1.5 to 4.0 mm, depending on the size of the breast. The ROIs were identified and defined to be rectangular, with the focal mass approximately in the center of the rectangle. The smallest ROI was 24x27 pixels; the largest was 89x102 pixels.
The test cases, which included 20 benign and 32 malignant masses, were obtained by essentially the same procedures as in the Nunes-Schnall study [81]. Forty-eight cases were selected on the basis of the availability of recorded expert architectural features and the suitability of the ROI size for fractal analysis. Four other cases were identified as having border characteristics that made them difficult to diagnose (one spiculated benign lesion, two smooth cancers, one lobulated cancer) and were added to the test data. The border characteristics of the masses in the study are given in Table 5.5.
5.5.3.3 Fractal dimension estimate
Table 5.5: Number of cases having each border type.

Border type   Malignant masses   Benign masses
Smooth        2                  4
Lobulated     2                  11
Irregular     13                 4
Spiculated    15                 1

Algorithms that are used to estimate the fractal dimension generally operate on three-dimensional surfaces derived from gray-scale images. The (x,y) coordinates correspond to the spatial location of each pixel; the z coordinate represents the gray-scale intensity at that pixel. The fractal-dimension methods can be classified as three-dimensional or two-dimensional. Methods from both classes have been used in the analysis of breast images:
a. For three-dimensional analysis, the fractal dimension is computed directly
from the three-dimensional surface [88].
b. For two-dimensional analysis, the fractal dimension is computed from a two-dimensional curve derived from the intersection of the three-dimensional surface with a plane. If the intersecting plane is parallel to the (x,y) plane, the resulting curve runs through pixels with a constant intensity level.
The two-dimensional method was used here. Two major concerns in using
this method are selecting a suitable level and determining the mass border (i.e.,
the curve) at the selected level. Those concerns were addressed by using multiple
threshold levels and the contrast-enhanced MR imaging described above.
Each intensity level yielded a binary image after thresholding the ROI at that level and segmenting the result into connected components. The connected components were filled in to include interior points. The largest connected component was labeled foreground and the remaining pixels were labeled background. As the threshold level is decreased, the foreground size increases. The intensity level at which the foreground first intersects less than 5% of the image border is called the bottom threshold level. The intensity level at which 25% of the foreground is lost (relative to the bottom level) is called the top threshold level. When possible, the threshold range was evaluated at 16 levels, spaced approximately evenly across the intensity range. In 7 of the 52 cases, the intensity values of the pixels in the mass were nearly uniformly distributed across the intensity range. When fewer than 16 levels satisfied the above criteria, all available levels were used. For each threshold level, the foreground was referred to as the mass, and the perimeter of the foreground was referred to as the mass border.
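A hedged sketch of this per-level segmentation (using scipy.ndimage as an assumed implementation; roi is a 2-D array of gray-scale intensities, and the function name is illustrative) is:

    import numpy as np
    from scipy import ndimage

    def mass_at_level(roi, level):
        # Threshold, label connected components, keep the largest one,
        # and fill it in to include interior points.
        binary = roi >= level
        labels, n = ndimage.label(binary)
        if n == 0:
            return np.zeros_like(binary)
        sizes = ndimage.sum(binary, labels, range(1, n + 1))
        largest = labels == (np.argmax(sizes) + 1)
        return ndimage.binary_fill_holes(largest)

The mass border at that level is then the perimeter of the returned foreground, e.g., the foreground pixels removed by one binary erosion.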
The fractal dimension was estimated using an extension of the fractal-interpolation-function-model (FIFM) method described by Penn and Loew [89]. This method
computes the fractal dimension for a single threshold level as follows: a graph of
curvature is constructed in which the abscissa is a sequential count of pixels as the
mass boundary is followed, and the ordinate is the curvature at the corresponding
pixel. The curvature graph is then searched for segments that are approximately
self-affine. A segment is self-affine if it can be decomposed into a collection of
miniature copies of itself (see [89] for a formal definition). Each self-affine segment is then modeled with multiple fractal interpolation functions. For each model,
a fractal dimension is calculated [90]. Thus a family of fractal-dimension estimates
is calculated. An overall fractal-dimension feature is derived as the mean of this
family. For this study, the method was extended by enlarging the family of fractal-dimension estimates to include a range of intensity levels.
The curvature graph was generated with a diffusion model [91] in which each
border pixel is initially assigned a charge which is iteratively diffused to neighboring pixels inside the mass. If the pixels are on a thin spiculation, the charge is
transferred to the interior of the spiculation, which rapidly becomes saturated with
charge. The saturation retards further diffusion from the border, and the charge
on the border remains high. If pixels are on a flat or concave portion of the mass
boundary, the charge is diffused into the interior of the mass. This event enables
further diffusion from the border, and the charge on the border continues to decrease. After 100 iterations of the diffusion process, the charge values that remain
on the border constitute the shape descriptor used to represent curvature.
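A rough sketch of this diffusion descriptor follows, under stated assumptions: a fixed 4-neighbor averaging kernel stands in for the chapter's exact update rule, and the sequential ordering of border pixels along the boundary is omitted for brevity.

    import numpy as np
    from scipy import ndimage

    def border_charge(mask, n_iter=100):
        # Unit charge on border pixels, diffused to neighbors inside the mass;
        # after n_iter iterations, the charge remaining on the border is the
        # shape descriptor (high on spiculations, low on flat/concave parts).
        border = mask & ~ndimage.binary_erosion(mask)
        charge = border.astype(float)
        kernel = np.array([[0, .25, 0], [.25, 0, .25], [0, .25, 0]])
        for _ in range(n_iter):
            spread = ndimage.convolve(charge, kernel, mode="constant")
            charge = np.where(mask, spread, 0.0)
        return charge[border]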
5.5.3.4 Discrimination analysis
Improvement in discrimination of benign from malignant masses was evaluated by comparing the discrimination achieved when the fractal dimension was used in conjunction with the Nunes-Schnall features with the discrimination achieved when only the Nunes-Schnall features were used. Descriptive values of the Nunes-Schnall features were assigned to the test cases by Drs. Mitchell Schnall and Susan Orel. We translated the descriptive feature values to the numeric values given in Table 5.6 for use in the discriminators.
Discrimination was analyzed using two models. The first was a three-layer back-propagation artificial neural network (ANN) modeled after the ANN study of mammographic images reported by Baker et al. [92]. The ANN used one input node for each feature, two nodes on a single hidden layer, and one output node. Performance was evaluated with the leave-one-out method. Then, to test the effect of the ANN model, the analysis was rerun using a second model, the SAS implementation of logistic regression [93]. Thus for each image there were two single-number measures of malignancy, each in the range (0,1), with 0 indicating benign and 1 indicating malignant.
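A minimal sketch of a classifier with the stated topology (using scikit-learn's MLPClassifier as a stand-in; neither the original ANN code nor the SAS software is reproduced here) is:

    from sklearn.neural_network import MLPClassifier

    # One input node per feature, a single hidden layer of two logistic
    # (sigmoid) nodes, and one output node trained by back-propagation.
    ann = MLPClassifier(hidden_layer_sizes=(2,), activation="logistic",
                        solver="sgd", max_iter=5000)
    # ann.fit(X_train, y_train)
    # malignancy = ann.predict_proba(X_test)[:, 1]   # measure in (0, 1)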
The performance of the five Nunes-Schnall features was evaluated as the baseline measure by an examination of all of their subsets. Subsets of features are indicated by the identifying letters of the included features. For example, BRSID refers to the subset consisting of all five Nunes-Schnall features, and BRID refers to the subset consisting of border, rim enhancement, signal intensity, and density.

Table 5.6: Translation of descriptive features to numeric values.

Feature            Descriptor       Numerical assignment
Border             Smooth           0
                   Lobulated        0.33
                   Irregular        0.66
                   Spiculated       1.00
Rim enhancement    Negative         0
                   Probable         0.5
                   Definite         1.0
Septation          Definite         0
                   Probable         0.5
                   Negative         1.0
Signal intensity   None             0
                   Minimum          0.33
                   Moderate         0.5
                   Marked           1.0
Density            Homogeneous      0
                   Heterogeneous    1.0
An ROC curve was constructed for each subset of features. Operating points that determine the ROC are established by selecting thresholds that are assumed to separate benign lesions from malignant ones and by plotting the corresponding computed false-positive fraction (FPF) and true-positive fraction (TPF) values. As an example, for a threshold of 0.5, any mass with malignancy measure of 0.5 or more is called malignant and any mass with malignancy measure of less than 0.5 is called benign. Then the true benign or malignant states, in conjunction with the called states, are used to compute the FPF and TPF values that determine a single ROC operating point. A set of threshold values over the range [0,1] is used to generate the set of operating points that define the ROC curve. Performance was evaluated in two ways: (1) from FPF values for ROC operating points with TPF levels in the range (0.90,1.00), values generally regarded as clinically important, and (2) from the area under the ROC curves.
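The threshold sweep that generates the operating points can be sketched as follows (illustrative code only; scores are the single-number malignancy measures in [0,1] and y holds the true states, with 1 = malignant):

    import numpy as np

    def roc_points(scores, y, thresholds=np.linspace(0.0, 1.0, 101)):
        # For each threshold, call "malignant" at or above it and compute
        # the false-positive and true-positive fractions.
        points = []
        for t in thresholds:
            called = scores >= t
            fpf = np.mean(called[y == 0])
            tpf = np.mean(called[y == 1])
            points.append((fpf, tpf))
        return points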
To test improvement in discrimination, the fractal dimension was combined with the Nunes-Schnall features and that set's discrimination was compared to that obtained by using only the Nunes-Schnall features. At most four Nunes-Schnall features were used in conjunction with the fractal dimension, to limit the fractal-dimension feature set to a maximum of five features, the maximum possible with the baseline Nunes-Schnall features. The set BRIDF (a subset of the Nunes-Schnall features augmented with the fractal dimension F) produced good discrimination both for clinically important TPF levels and as measured by the area under the ROC curve.
5.5.3.5 Sensitivity analysis
The robustness of the discrimination generated by the BRIDF features was evaluated when small changes were made to algorithm parameters that affect numerical processing but not the underlying theory. The tested parameters are as follows
(see [89] for details):
a. Max(MN): The algorithm for the fractal interpolation function model (FIFM)
evaluates segments having length MN. The parameter Max(MN) is the largest
segment size that is evaluated and is a determinant of the size of the family of
fractal-dimension estimates. Nominal value was 80 pixels; sensitivity analysis was performed over the range of 70 to 90 pixels.
b. Min(Mod): For each evaluated boundary segment, the FIFM method generates a set of fractal models. Min(Mod) is the minimum number of acceptable
models that a segment must generate for the segment to be used. Nominal
value was 10 models; sensitivity analysis included values from 8 to 12 models.
5.5.3.6 Results
The 32 cancers produced three clinically important (0.9 or greater) TPF levels: 0.969, 0.938, and 0.906, corresponding to one, two, and three false-negative results, respectively. (Allowing TPF to equal 1.000 often generates an unacceptably high number of false-positive results.) The three TPF levels were tested with both the ANN and the logistic-regression discrimination models, yielding a total of six combinations, each of which defined an evaluation criterion. In five of the six analyses, the optimum discrimination available from the Nunes-Schnall features was obtained by using either the full set (BRSID) or the subset (BRID) that omits septation. To assess the contribution of the fractal feature, discrimination was computed for the BRIDF feature set. Table 5.7 shows the results of the discrimination analysis. Smaller values of FPF are preferred; note that the performance of the set containing the fractal feature was better than that of either of the other two sets under all conditions.
The FPF levels shown in Table 5.7 are inferior to those reported by Nunes et al. [81], because of the presence of an exaggerated proportion of difficult-to-analyze cases in this study. When, however, those four specially selected cases are removed from the data set and the analysis is rerun, the FPF levels are comparable to those reported by Nunes et al. [81].

Table 5.7: Results of discrimination analysis. Entries are values of FPF.

          TPF with ANN               TPF with Logistic
          0.969   0.938   0.906      0.969   0.938   0.906
BRIDF     0.50    0.25    0.20       0.45    0.30    0.25
BRSID     0.65    0.40    0.40       0.65    0.65    0.45
BRID      1.00    0.60    0.30       0.80    0.50    0.375

Table 5.8: Effectiveness of feature sets in ANN study.

         Smoothed ROC area   Empirical-data ROC area
BRIDF    0.857               0.845
BRSID    0.681               0.745
BRID     0.697               0.736

Table 5.7 shows the importance of the evaluation criteria when determining which set of expert-observer features should be used to establish the baseline discrimination for comparative analysis. The set BRSID outperformed BRID in three of the runs; BRID outperformed BRSID in the other three. In all cases, however, BRIDF performed better than both BRID and BRSID.

The BRIDF improvement in discrimination was also evaluated using the areas under both the empirical-data ROC curve and the smoothed ROC curve as computed by LABROC1 software [94]. In this analysis only the ANN model was used. While the area measures use a large number of clinically unimportant operating points, they present an overall picture of discrimination. Table 5.8 shows that the feature set BRIDF provides an improvement in discrimination over each of the Nunes-Schnall feature sets with either of the two ROC curves.

CLABROC software was used to evaluate the statistical significance of the differences in smoothed ROC areas between the curves generated by BRIDF and BRSID, and also between the curves generated by BRIDF and BRID [95]. Table 5.9 shows the results of the pairwise analysis using both the area test and the bivariate chi-square test. It is clear that the addition of the fractal feature led to a significantly greater ROC area than that provided by either of the two sets of original features.
Robustness of the fractal-dimension feature was evaluated by comparing the discrimination of BRIDF when the fractal-dimension algorithm used nominal parameter settings with the discrimination of BRIDF when selected parameters of the fractal-dimension algorithm were perturbed, as described above. Max(MN) had nominal value 80 and was tested at values 70 and 90; Min(Mod) had nominal value 10 and was tested at values 8 and 12. Table 5.10 shows the results of this sensitivity analysis.

Table 5.9: Statistical evaluation of areas under smoothed ROC curves.

                      Area test               Bivariate chi-square test
Feature combination   Value   Two-tailed p    Value    p
BRIDF-BRSID           2.58    0.010           7.20     0.027
BRIDF-BRID            3.54    0.004           14.39    0.001

Table 5.10: Results of sensitivity analysis. Entries are values of FPF.

                   True-positive (TPF) values
                   0.969   0.938   0.906
BRIDF (nominal)    0.50    0.25    0.20
BRIDF-70           0.40    0.20    0.20
BRIDF-90           0.50    0.25    0.20
BRIDF-8            0.50    0.25    0.25
BRIDF-12           0.45    0.25    0.20

The sensitivity analysis used the ANN model and the TPF values from Table 5.7. The small changes in FPF values resulting from the perturbed parameters should be compared with the much larger differences between BRIDF and the Nunes-Schnall sets shown in Table 5.7.
5.5.3.7 Conclusions
The addition of the fractal-dimension feature resulted in statistically significant improvement over the discrimination achieved with the expert-observer architectural features alone. That set was used as the baseline for two reasons: (1) it appeared more likely that a fractal-dimension feature would correlate with an architectural feature than with other features (such as time sequence) that have been used for mass discrimination, and (2) the tested architectural features were proposed by experts in the field and had been used to generate promising results in earlier studies.
Although the full set of 52 images is necessary for statistical inferences, the potential importance of the fractal-dimension feature in diagnosis is illustrated by considering the cases with smooth borders. The test set contained six smooth-bordered masses: four benign and two cancerous. The category of smooth-bordered focal masses was one of five categories of MR breast-mass images about which Nunes et al. [81] observed: "If similar results are seen in prospective validation studies performed in other clinical settings, biopsy may not be necessary in patients with findings that fall into these five feature categories." For the six smooth-bordered

masses in the present study, the following facts emerged:
1. The fractal-dimension feature correctly classified all six masses as benign
or malignant, whereas none of the Nunes-Schnall architectural features correctly
classified all six masses.
2. One of the smooth cancer masses had a lower (more cancer-like) fractal-dimension value than any of the 20 benign masses in the study and would have been
flagged as suspect on the basis of the fractal-dimension value. The development of
a feature that alerts the diagnostician to a suspect smooth-bordered mass has high
clinical importance.
Perturbation of selected parameters in the fractal-dimension algorithm had little effect on overall discrimination when the fractal dimension was evaluated in conjunction with the expert-observer architectural features. That robustness of the fractal-dimension feature derives, in part, from the fact that it is the mean of a large number of estimates. Each estimate is the analytically computed fractal-dimension value of a fractal model of a component part of the image. The large number of components is the result of an algorithm that uses multiple threshold levels, multiple self-affine pieces of the boundary for each threshold level, and multiple fractal models for each self-affine piece. This three-tier system of components generated a minimum of 298 estimates (maximum 576) for each of the tested images. The standard deviations of the distributions of estimates had a mean of 0.198, which is large relative to the range of possible fractal-dimension values (1.0 to 2.0). That variation indicates a lack of reliability when the fractal dimension is estimated with a single model. The means of the sample spaces of estimates, however, were computed with greater confidence because of the large number of models. The average standard error of the mean was 0.010 (consistent with 0.198/sqrt(N) ≈ 0.01 for N of roughly 400 estimates).
The fractal-dimension feature set contained both fractal-dimension and Nunes-Schnall features. For a fair comparison, the combined set should have no more total features than the original set, requiring that at least one feature of the original set be discarded. Septation was the most dispensable, in the sense that its elimination had the smallest effect on the discrimination results. This was consistent with results showing that when only the Nunes-Schnall features were considered, the best proper subset of features was BRID, which omits septation. That finding was surprising because border characteristics are the primary determinant in the Nunes-Schnall decision tree and septation is a secondary determinant [81]. This again illustrates the perils of trying to choose a subset of features on the basis of their one-at-a-time performance rather than their performance in combination [46].
Advances in population risk assessment for breast cancer have led to great interest in developing alternative screening tests for patients who are at high risk. The development of such tests is particularly important for patients who are relatively young and have radiographically dense breasts, which make mammographic evaluation more difficult. Although the sensitivity of contrast-enhanced MR imaging for the detection of invasive breast cancer has been found to approach 100%, its specificity levels are low [96]. Improved specificity is necessary for breast MR imaging to be widely accepted as an alternative screening test for high-risk patients. The results of the preliminary study reported herein indicate that the fractal-dimension feature has the potential to improve specificity for cases that are difficult to evaluate.
5.6 Future developments
Better features mean better screening and better diagnosis. Ideally, features are
easy to compute, perform well in the presence of noise, artifact, and anatomic and
physiologic variation, and are in accord with clinician understanding. As medical
knowledge grows, we can expect that the resulting improved models of anatomic
shape and shape change, and of physiologic function, will lead to better understanding of the overall process of characterizing form and function, from first principles.
Better features will reduce the cost and risk of imaging, as speeds and diagnostic accuracies increase. Larger databases, properly constructed from gold-standard
data, would increase confidence in investigator and user alike that the techniques
are indeed generally applicable. A concomitant growth of interest in, and implementation of, standards for feature measurement and use would ensure that real
reproducibility is achieved in a variety of clinical settings.
5.7 Acknowledgments

The assistance of Ms. Claudia Rodríguez in the preparation of this manuscript is gratefully acknowledged.
5.8 References

[1] M. J. Ackerman, Visible human project, Proceedings of the IEEE, vol. 86, pp. 504-511, Mar 1998.
[2] C. W. Chen, W. Lai, F. Y. Fang, and L. Chen, Portal image feature extraction by hierarchical region processing technique, in Proc. 1995 IEEE International Conference on Systems, Man and Cybernetics: Intelligent Systems for the 21st Century, vol. 4, pp. 3561-3566, 1995.
[3] D. S. Fritsch, E. L. Chaney, A. Boxwala, M. J. McAuliffe, S. Raghavan, A. Thall, and J. R. D. Earnhart, Core-based portal image registration for automatic radiotherapy treatment verification, International Journal of Radiation Oncology Biology Physics, vol. 33, no. 5, 1995.
[4] A. Gueziec, P. Kazanzides, B. Williamson, and R. H. Taylor, Anatomy-based registration of CT-scan and intraoperative X-ray images for guiding a surgical robot, IEEE Transactions on Medical Imaging, vol. 17, pp. 715-728, Oct 1998.
[5] R. H. Taylor, J. Funda, L. Joskowicz, A. D. Kalvin, S. H. Gomory, A. P. Gueziec, and L. M. G. Brown, Overview of computer-integrated surgery at the IBM Thomas J. Watson Research Center, IBM Journal of Research and Development, vol. 40, pp. 163-183, Mar 1996.
[6] J. M. Fitzpatrick, D. L. G. Hill, Y. Shyr, J. West, C. Studholme, and C. R. Maurer Jr., Visual assessment of the accuracy of retrospective registration of MR and CT images of the brain, IEEE Transactions on Medical Imaging, vol. 17, pp. 571-585, Aug 1998.
[7] R. Gonzalez and R. E. Woods, Digital Image Processing. Addison-Wesley, 1992.
[8] A. V. Oppenheim and R. Schafer, Digital Signal Processing. New York: Prentice-Hall, 1975.
[9] W. K. Pratt, Digital Image Processing. New York: John Wiley, 1991.
[10] D. M. Levi, V. Sharma, and S. A. Klein, Feature integration in pattern perception, Proceedings of the National Academy of Sciences of the United States of America, vol. 94, no. 21, pp. 11742-11746, 1997.
[11] U. Raff, Visual data formatting, in The Perception of Visual Information (W. Hendee and P. Wells, eds.), New York: Springer-Verlag, 2nd ed., 1997.
[12] B. M. Dawant and A. P. Zijdenbos, Brain segmentation and white matter lesion detection in MR images, Critical Reviews in Biomedical Engineering, no. 5-6, pp. 401-465, 1994.
[13] C. R. Hill, J. C. Bamber, D. C. Crawford, H. J. Lowe, and S. Webb, What might echography learn from image science?, Ultrasound in Medicine and Biology, vol. 17, no. 6, pp. 559-575, 1991.
[14] T. McInerney and D. Terzopoulos, Deformable models in medical image analysis, in Proceedings of the 1996 Workshop on Mathematical Methods in Biomedical Image Analysis, (San Francisco, CA), Jun 1996.
[15] Z. Zhou, R. M. Leahy, and E. U. Mumcuoglu, Comparative study of the effects of using anatomical priors in PET reconstruction, in Proceedings of the 1993 IEEE Nuclear Science Symposium and Medical Imaging Conference, 1993.
[16] R. M. Haralick and L. G. Shapiro, Computer and Robot Vision. Reading, Mass.: Addison-Wesley, 1992.
[17] H. P. Chan, B. Sahiner, N. Petrick, M. A. Helvie, K. L. Lam, D. D. Adler, and M. M. Goodsitt, Computerized classification of malignant and benign microcalcifications on mammograms: texture analysis using an artificial neural network, Physics in Medicine and Biology, vol. 42, pp. 549-567, 1997.
[18] M. Sonka, V. Hlavac, and R. Boyle, Image Processing, Analysis, and Machine Vision. Pacific Grove, CA: PWS Publishing, 2nd ed., 1999.
[19] H. Zhimin, M. L. Giger, C. J. Vyborny, U. Bick, L. Ping, D. E. Wolverton, and R. A. Schmidt, Analysis of spiculation in the computerized classification of mammographic masses, Medical Physics, vol. 22, pp. 1569-1579, Oct 1995.
[20] M. K. Hu, Visual pattern recognition by moment invariants, IRE Transactions on Information Theory, vol. IT-8, pp. 179-187, 1962.
[21] J. Flusser and T. Suk, Pattern recognition by affine moment invariants, Pattern Recognition, vol. 26, no. 1, pp. 167-174, 1993.

[22] J. Flusser and T. Suk, Character recognition by affine moment invariants, in Computer Analysis of Images and Patterns: 5th International Conference, CAIP '93 Proceedings, (Berlin, Germany), pp. 572-577, Springer-Verlag, 1993.
[23] J. Flusser and T. Suk, Affine moment invariants: a new tool for character recognition, Pattern Recognition Letters, vol. 15, pp. 433-436, April 1994.
[24] L. Gupta and M. D. Srinath, Contour sequence moments for the classification of closed planar shapes, Pattern Recognition, vol. 20, no. 3, pp. 267-272, 1987.
[25] D. L. Thiele, C. Kimme-Smith, T. D. Johnson, M. McCombs, and L. W. Bassett, Using tissue texture surrounding calcification clusters to predict benign vs. malignant outcomes, Medical Physics, vol. 23, pp. 549-555, April 1996.
[26] R. M. Haralick, Statistical and structural approaches to texture, Proceedings of the IEEE, vol. 67, pp. 786-804, May 1979.
[27] G. E. Carlson and W. J. Ebel, Co-occurrence matrix modification for small region texture measurement and comparison, in IGARSS '88 - Remote Sensing: Moving Towards the 21st Century, (Piscataway, NJ), pp. 519-520, IEEE, 1988.
[28] F. Argenti, L. Alparone, and G. Benelli, Fast algorithms for texture analysis using co-occurrence matrices, IEE Proceedings, Part F: Radar and Signal Processing, vol. 137, no. 6, pp. 443-448, 1990.
[29] L. Alparone, F. Argenti, and G. Benelli, Fast calculation of co-occurrence matrix parameters for image segmentation, Electronics Letters, vol. 26, pp. 23-24, January 1990.
[30] G. E. Carlson and W. J. Ebel, Co-occurrence matrices for small region texture measurement and comparison, International Journal of Remote Sensing, vol. 16, no. 8, pp. 1417-1423, 1995.
[31] M. Oberholzer, M. Ostreicher, H. Christen, and M. Bruhlmann, Methods in quantitative image analysis, Histochemistry and Cell Biology, pp. 333-355, 1996.
[32] M. Nadler and E. Smith, Pattern Recognition Engineering. New York: John Wiley, 1993.
[33] T. M. Cover, The best two independent measurements are not the two best, IEEE Transactions on Systems, Man, and Cybernetics, vol. 4, pp. 116-117, January 1974.
[34] T. M. Cover and J. M. van Campenhout, On the possible orderings in the measurement selection problem, IEEE Transactions on Systems, Man, and Cybernetics, vol. 7, pp. 657-661, Sept 1977.
[35] R. Duda, P. Hart, and D. Stork, Pattern Classification and Scene Analysis: Classification. New York: Wiley, 2000.
[36] H. J. Holz, Classifier-Independent Feature Analysis. D.Sc. thesis, The George Washington University, May 1999.
[37] H. J. Holz and M. H. Loew, Multi-class classifier-independent feature analysis, Pattern Recognition Letters, vol. 18, pp. 1219-1224, November 1997.
[38] A. N. Mucciardi and E. E. Gose, A comparison of seven techniques for choosing subsets of pattern recognition properties, IEEE Transactions on Computers, vol. C-20, pp. 1023-1031, September 1971.
[39] W. Siedlecki and J. Sklansky, On automatic feature selection, International Journal of Pattern Recognition and Artificial Intelligence, vol. 2, no. 2, pp. 197-200, 1988.
[40] K. Fukunaga and J. M. Mantock, Nonparametric discriminant analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 5, pp. 671-678, November 1983.
[41] G. F. Hughes, On the mean accuracy of statistical pattern recognizers, IEEE Transactions on Information Theory, vol. IT-14, pp. 55-63, January 1968.
[42] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, pp. 103-105. New York: Wiley and Sons, 1973.
[43] R. D. Short and K. Fukunaga, The optimal distance measure for nearest neighbor classification, IEEE Transactions on Information Theory, vol. IT-27, pp. 622-627, September 1981.
[44] K. Fukunaga, Introduction to Statistical Pattern Recognition. Academic Press, 2nd ed., 1990.
[45] T. M. Cover and P. E. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, vol. IT-13, pp. 21-27, Jan 1967.
[46] J. M. van Campenhout, The arbitrary relation between probability of error and measurement subset, Journal of the American Statistical Association, vol. 75, pp. 104-109, March 1980.
[47] P. M. Narendra and K. Fukunaga, A branch and bound algorithm for feature subset selection, IEEE Transactions on Computers, vol. 26, pp. 917-922, September 1977.
[48] P. Pudil, J. Novovicova, and J. Kittler, Floating search methods in feature selection, Pattern Recognition Letters, vol. 15, pp. 1119-1125, Nov 1994.
[49] H. J. Holz and M. H. Loew, Relative feature importance: A classifier-independent approach to feature selection, in Pattern Recognition in Practice IV, pp. 473-487, Elsevier Science B.V., 1994.
[50] R. S. Mia, Classification Performance and Reproducibility of New Parameters for Quantitative Ultrasound Tissue Characterization. D.Sc. thesis, The George Washington University, May 1999.
[51] C. Burckhardt, Speckle in ultrasound B-mode scans, IEEE Transactions on Sonics and Ultrasonics, vol. 25, pp. 1-6, January 1978.
[52] R. Wagner, S. Smith, J. Sandrik, and H. Lopez, Statistics of speckle in ultrasound B-scans, IEEE Transactions on Sonics and Ultrasonics, vol. 30, pp. 156-163, May 1987.
[53] J. Thijssen, B. Oosterveld, P. Hartman, and G. Rosenbusch, Correlation between acoustic and texture parameters from RF and B-mode liver echograms, Ultrasound in Medicine and Biology, vol. 19, no. 1, pp. 13-20, 1993.

[54] M. Insana, R. Wagner, B. Garra, D. Brown, and T. Shawker, Analysis of ultrasound image texture via generalized Rician statistics, Optical Engineering, vol. 25, pp. 743-748, June 1986.
[55] R. S. Mia, M. H. Loew, K. Wear, and R. Wagner, Quantitative ultrasound tissue characterization using texture and cepstral features, in Proceedings of SPIE - Medical Imaging 1998, vol. 3338, pp. 211-219, 1998.
[56] M. Kadah, A. Farag, J. Zurada, A. Badawi, and A. Youssef, Classification algorithms for quantitative tissue characterization of diffuse liver disease from ultrasound images, IEEE Transactions on Medical Imaging, vol. 15, pp. 466-478, August 1996.
[57] U. Raeth, D. Schlaps, B. Limberg, I. Zuna, A. Lorenz, G. Kaick, W. Lorenz, and B. Kommerell, Diagnostic accuracy of computerized B-scan texture analysis and conventional ultrasonography in diffuse parenchymal and malignant liver disease, Journal of Clinical Ultrasound, vol. 13, pp. 87-99, February 1985.
[58] F. Valckx and J. Thijssen, Characterization of echographic image texture by co-occurrence matrix parameters, Ultrasound in Medicine and Biology, vol. 23, no. 4, pp. 559-571, 1997.
[59] L. Fellingham and F. Sommer, Ultrasonic characterization of tissue structure in the in vivo human liver and spleen, IEEE Transactions on Sonics and Ultrasonics, vol. 31, pp. 418-428, July 1984.
[60] K. Suzuki, N. Hayashi, Y. Sasaki, M. Kono, Y. Imai, H. Fusamoto, and T. Kamada, Ultrasonic tissue characterization of chronic liver disease using cepstral analysis, Gastroenterology, vol. 101, pp. 1325-1331, November 1991.
[61] K. Wear, R. Wagner, M. Insana, and T. Hall, Application of autoregressive spectral analysis to cepstral estimation of mean scatterer spacing, IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 40, pp. 50-58, January 1993.
[62] B. Bogert, M. Healy, and J. Tukey, The frequency analysis of time series for echoes: cepstrum, pseudo-autocovariance, cross-cepstrum, and saphe cracking, in Proceedings of the Symposium on Time Series Analysis (M. Rosenblatt, ed.), (New York), pp. 209-243, Wiley, 1963.
[63] T. Varghese and K. Donohue, Mean-scatterer spacing estimates with spectral autocorrelation, Journal of the Acoustical Society of America, vol. 96, pp. 3504-3515, December 1994.
[64] T. Varghese and K. Donohue, Estimating mean scatterer spacing with the frequency-smoothed spectral autocorrelation function, IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 42, pp. 451-463, May 1995.
[65] R. S. Mia, M. H. Loew, K. Wear, and R. Wagner, Quantitative estimation of scatterer spacing from backscattered ultrasound signals using the complex cepstrum, in Proceedings of the 15th International Conference, Information Processing in Medical Imaging, pp. 513-518, 1997.
[66] A. Oppenheim and R. Schafer, Discrete-Time Signal Processing, ch. 12. Englewood Cliffs, NJ: Prentice Hall, 1989.
[67] R. Kuc, K. Haghkerdar, and M. O'Donnell, Presence of cepstral peak in random reflected ultrasound signals, Ultrasonic Imaging, vol. 8, pp. 196-212, 1986.
[68] M. I. Skolnik, ed., Radar Handbook. New York: McGraw-Hill, 2nd ed., 1990.
[69] L. Weng, J. Reid, M. Shanker, K. Soetanto, and X. Lu, Nonuniform phase distribution in ultrasound speckle analysis - Part I: Background and experimental demonstration, IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 39, pp. 352-359, May 1992.
[70] L. Weng, J. Reid, M. Shanker, K. Soetanto, and X. Lu, Nonuniform phase distribution in ultrasound speckle analysis - Part II: Parametric expression and a frequency sweeping technique to measure mean scatterer spacing, IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, vol. 39, pp. 360-365, May 1992.
[71] R. Molthen, V. Narayanan, P. Shankar, J. Reid, V. Genis, F. Forsberg, E. Halpern, and B. Goldberg, Using phase information in ultrasonic backscatter for in vivo liver analysis, Ultrasound in Medicine and Biology, vol. 24, no. 1, pp. 79-91, 1998.
[72] S. Raudys and A. Jain, Small sample size effects in statistical pattern recognition: Recommendations for practitioners, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, pp. 252-264, March 1991.
[73] A. I. Penn, L. Bolinger, M. D. Schnall, and M. H. Loew, Discrimination of MR images of breast masses using fractal-interpolation function models, Academic Radiology, 1999.
[74] S. J. E. Yousef, R. H. Duchesneau, and R. Alfidi, Magnetic resonance imaging of the breast, Radiology, pp. 761-776, 1984.
[75] C. B. Stelling, P. C. Wang, A. Lieber, S. S. Mattingly, W. O. Griffen, and D. E. Powell, Prototype coil for magnetic resonance imaging of the female breast, Radiology, vol. 154, pp. 457-462, 1985.
[76] N. Dash, A. R. Lupetin, R. H. Daffner, Z. L. Deeb, R. J. Sefczek, and R. L. Schapiro, Magnetic resonance imaging in the diagnosis of breast disease, AJR, vol. 146, pp. 119-125, 1986.
[77] W. A. Kaiser and E. Zeitler, MR imaging of the breast: Fast imaging sequences with and without Gd-DTPA, Radiology, vol. 170, pp. 681-686, 1989.
[78] J. P. Stack, A. M. Redmond, and M. B. Codd, Breast disease: tissue characterization with Gd-DTPA enhancement profiles, Radiology, vol. 174, pp. 491-494, 1990.
[79] C. Boetes, J. O. Barentsz, R. D. Mus, R. F. van der Sluis, L. J. van Erning, J. H. Hendriks, R. Holland, and S. H. Ruys, MR characterization of suspicious breast lesions with gadolinium-enhanced TurboFLASH subtraction technique, Radiology, vol. 193, pp. 777-781, 1994.
[80] S. G. Orel, M. D. Schnall, V. A. LiVolsi, and R. H. Troupin, Suspicious breast lesions: MR imaging with radiologic-pathologic correlation, Radiology, vol. 190, pp. 485-493, Feb 1994.
[81] L. W. Nunes, M. D. Schnall, S. G. Orel, M. G. Hochman, C. P. Langlotz, C. A. Reynolds, and M. H. Torosian, Breast MR imaging: Interpretation model, Radiology, vol. 202, pp. 833-841, March 1997.
[82] J. Shea, From missiles to mammograms, PENN Health, vol. 9-10, 1966.
[83] C. Burdett, H. Longbotham, M. Desai, W. Richardson, and J. Stoll, Nonlinear indicators of malignancy, in SPIE: Biomedical Image Processing and Biomedical Visualization, Part 2, vol. 1905, pp. 853-860, 1993.
[84] C. Burdett and M. Desai, Localized fractal dimension measurement in digital mammographic images, in SPIE: Visual Communications and Image Processing, Part 1, vol. 2094, pp. 141-151, 1993.
[85] S. Pohlman, K. Powell, N. Obuchowski, W. Chilcote, and S. Grundfest-Broniatowski, Quantitative classification of breast tumors in digitized mammograms, Medical Physics, vol. 23, pp. 1337-1345, August 1996.
[86] C. Priebe, J. Solka, R. Lorey, G. Rogers, W. Poston, M. Kallergi, W. Qian, L. Clarke, and R. Clark, The application of fractal analysis to mammographic tissue classification, Cancer Letters, vol. 77, pp. 183-189, 1994.
[87] V. Velanovich, Fractal analysis of mammographic lesions: A feasibility study quantifying the difference between benign and malignant masses, American Journal of the Medical Sciences, vol. 311, pp. 211-214, May 1996.
[88] C. B. Caldwell, S. J. Stapleton, D. W. Holdsworth, R. A. Jong, W. J. Weiser, G. Cooke, and M. J. Yaffe, Characterisation of mammographic parenchymal pattern by fractal dimension, Physics in Medicine and Biology, vol. 35, no. 2, pp. 235-247, 1990.
[89] A. I. Penn and M. H. Loew, Estimating fractal dimension of medical images with fractal interpolation function models, IEEE Transactions on Medical Imaging, vol. 16, pp. 930-937, Dec 1997.
[90] M. F. Barnsley, Fractals Everywhere. Academic Press, 1993.
[91] M. H. Loew, A diffusion-based description of shape, in Pattern Recognition Theory and Application (P. Devijver and J. Kittler, eds.), NATO ASI Series, pp. 501-508, Berlin: Springer-Verlag, 1987.
[92] J. A. Baker, P. J. Kornguth, J. Y. Lo, M. E. Williford, and C. E. Floyd Jr., Breast cancer: Prediction with artificial neural network based on BI-RADS standardized lexicon, Radiology, vol. 196, pp. 817-822, 1995.
[93] Logistic Regression, Examples, Version 6. Cary, NC: SAS Institute, Inc., 1st ed., 1995.
[94] D. Dorfman, C. Metz, B. Herman, P. Wang, J. Shen, and H. B. Kronman, LABROC1 program for the IBM PC. http://www-radiology.uchicago.edu/sections/roc/software.cgi, 1993.
[95] J. Shen, B. Herman, H. B. Kronman, P. Wang, and C. Metz, CLABROC program, IBM-PC version 1.2.1. http://www-radiology.uchicago.edu/sections/roc/software.cgi, 1993.
[96] S. E. Harms, D. P. Flamig, K. L. Hesley, M. D. Meiches, R. A. Jensen, W. P. Evans, D. A. Savino, and R. V. Wells, MR imaging of the breast with rotating delivery of excitation off resonance: Clinical experience with pathologic correlation, Radiology, vol. 187, pp. 493-501, 1993.

CHAPTER 6
Extracting Surface Models of the Anatomy
from Medical Images
André Guéziec
Consultant
Contents

6.1  Introduction                                                345
6.2  Surface representations                                     345
     6.2.1  Point set                                            345
     6.2.2  Triangular mesh                                      346
     6.2.3  Curved surfaces                                      348
6.3  Iso-surface extraction                                      348
     6.3.1  Hexahedral decomposition                             349
     6.3.2  Tetrahedral decomposition                            350
     6.3.3  A look-up procedure to replace the determinant test  355
     6.3.4  Computing surface curvatures                         357
     6.3.5  Extracting rib (or ridge, or crest) lines            358
     6.3.6  Iso-surface examples                                 359
6.4  Building surfaces from two-dimensional contours             360
     6.4.1  Extracting two-dimensional contours from an image    361
     6.4.2  Tiling contours into a surface portion               363
6.5  Some topological issues in deformable surfaces              368
     6.5.1  Tensor-product B-splines                             369
     6.5.2  Dynamic changes of topology                          371
6.6  Optimization                                                371
     6.6.1  Smoothing                                            373
     6.6.2  Simplification and levels of detail                  376
6.7  Exemplary algorithms operating on polygonal surfaces        383
     6.7.1  Apparent contours and perspective registration       383
     6.7.2  Surface projection for x-ray simulations             387
6.8  Conclusion and perspective                                  390
6.9  References                                                  390

6.1 Introduction

A large number of applications in medical imaging require geometric models of the surface of various anatomical structures. Although recent direct volume rendering techniques provide high visual quality [1] and speed, using cost-effective special-purpose hardware [2], surfaces are very effective for showing the spatial relationship between different structures. (Radiotherapy-planning visualization provides an excellent biomedical example of this [3].) Surface extraction is necessary when surfaces (meshes) are used for further analysis and processing, independently of visualization. Such processes include finite-element analysis (Chapter 19) and registration (see Chapter 8). Several methods have recently been developed to help guide surgery using computer-generated displays of anatomical structures registered with live imagery of the patient. Such methods often use surface models for registration and rendering [4-6].
This chapter is cross-disciplinary. One of its goals is to illustrate the flexibility
in choosing a surface representation. Surface representations are reviewed in Section 6.2 and illustrated with various examples. Another goal is to describe recent
techniques that have been developed for surface extraction from medical images in
Sections 6.3, 6.4, and 6.5 and to provide references for further reading.
We also study some methods for optimizing surface models in Section 6.6,
such as smoothing (fairing) and simplification. We introduce a few algorithms that
operate on a surface and accompany them with real-world examples in Section 6.7.
6.2 Surface representations
Various geometric objects may be used to represent a surface. The first object,
in Section 6.2.1, specifies a surface with a set of samples (and hence does not define
a surface per se). A powerful piece-wise linear representation is introduced in
Section 6.2.2. Smooth curved surface representations (Section 6.2.3) are (in theory)
more compact and allow the full exploitation of curvature and normal information;
however, their topology is less flexible.
6.2.1 Point set
The simplest representation for a surface is probably a point set, each point,
or vertex, being defined by three Cartesian coordinates (x, y, z). Contrary to a
curve, for which a set of sample points may be naturally ordered according to a
progression along the curve, there is no such natural ordering for surface sample
points.
Point sets have been used successfully for registration purposes, as in the "head in the hat" scheme [7]. More recent surface registration methods using sample
points are described in [8, 9]. Registration between point sets or between a point
set and a surface is generally performed by associating a corresponding point (respectively, surface location) to each point of a set and by determining a geometric

transformation that best aligns each point with its correspondent using various optimization schemes. See Chapter 8 for an in-depth description of surface registration
methods. Point sets have also been used for some visualization tasks where a representation of a surface using discrete points is suitable.
This representation has several limitations due to the lack of information between sample points: area and volume measurements are not possible, and in registration processes misregistrations can occur because the closest sample point and the closest surface point are not necessarily the same.
A number of strategies have been developed for constructing a surface from its samples and thus to bridge the gaps between samples [10-13]. None of these techniques, to the best of the author's current knowledge, is guaranteed to produce a satisfactory result in all cases.
6.2.2 Triangular mesh
A triangular mesh comprises vertices and triangles, each triangle referring to an ordered triple of vertices. A triangular mesh modeling a portion of a human
femur is illustrated in Fig. 6.1. For the type of applications that we are interested in,
we consider a particular triangular mesh that is oriented and that is also a manifold.
A triangular mesh is oriented if two triangles sharing an edge (an edge is a pair of
vertices referred to by a triangle) are such that the two vertices (endpoints) of the
edge are listed in a different order in both triangles. A triangular mesh is a manifold
if each vertex is such that there is only one triangle fan incident to it. (A triangle
fan of the mesh is a collection of triangles incident to a given vertex and connected
by edges, such that each edge is shared by no more than two triangles of the mesh,
i.e., is a regular edge. A singular edge is shared by more than two trianglessee
Fig. 6.1(b)). A vertex with a single incident triangle fan is called a regular vertex
(Fig. 6.1(c)). A vertex with more than one incident triangle fan is called a singular
vertex (Fig. 6.1(d)).
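The orientation condition can be tested mechanically: in an oriented mesh each shared edge appears in opposite directions in its two incident triangles, so no directed edge occurs twice. A small illustrative check (not from the chapter) is:

    from collections import Counter

    def is_oriented(triangles):
        # Count directed edges; a repeated directed edge means two triangles
        # traverse a shared edge in the same order (inconsistent orientation).
        directed = Counter()
        for a, b, c in triangles:
            for e in ((a, b), (b, c), (c, a)):
                directed[e] += 1
                if directed[e] > 1:
                    return False
        return True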
A boundary edge is an edge that has only one incident triangle. A boundary
of a manifold triangular mesh is obtained by linking boundary edges connected by
vertices. If V is the number of vertices of the triangular mesh, E the number of edges, and T the number of triangles, the Euler number is defined to be V - E + T. Supposing that the triangular mesh has no boundaries, the Euler number equals 2 - 2H, where H is the number of handles (exactly as the handle of a tea-cup)
of the surface: for instance, it is equal to zero for a torus and to two for a sphere
(or a tetrahedron, a cube, etc.). Figure 6.2 illustrates triangular meshes with zero,
one, and two handles. When discussing the topology of a triangular mesh (or more
generally, of a surface) in subsequent sections, we will be referring to the property
of the surface to have handles or boundaries and to the number of handles and
boundaries.
If the surface has no boundaries, we can also write that 3T = 2E (this is a standard result that the reader should be able to verify by enumerating the edges while visiting each triangle in turn). It can be seen that T = 2(V - 2 + 2G). Hence, the number of triangles is approximately twice the number of vertices, provided that G is small compared to V.

Figure 6.1: Triangular meshes: (a) triangular mesh modeling the boundary of a proximal portion of a human femur; (b) singular edge and triangle fan; (c) regular vertices; (d) a non-manifold mesh having a singular vertex.

Figure 6.2: Triangular meshes with different Euler numbers: (a) 2; (b) 0 (one handle); (c) -2 (two handles).
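To make these counts concrete, here is a minimal sketch (ours, in Python; the list-of-index-triples mesh format is assumed purely for illustration) that computes V, E, T, the Euler number, and the number of handles of a closed manifold mesh.

def mesh_statistics(triangles):
    """Compute V, E, T, the Euler number V - E + T, and the number of
    handles (genus) of a closed, manifold triangular mesh.
    triangles: list of (i, j, k) vertex-index triples."""
    vertices = set()
    edges = set()
    for i, j, k in triangles:
        vertices.update((i, j, k))
        # Store each edge as an unordered pair, so the two triangles
        # sharing a regular edge contribute a single entry (3T = 2E).
        for a, b in ((i, j), (j, k), (k, i)):
            edges.add((min(a, b), max(a, b)))
    V, E, T = len(vertices), len(edges), len(triangles)
    euler = V - E + T
    handles = (2 - euler) // 2     # valid for a surface without boundaries
    return V, E, T, euler, handles

# A tetrahedron: V = 4, E = 6, T = 4, Euler number 2, zero handles.
print(mesh_statistics([(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]))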
Most of the applications listed in the present chapter utilize a triangular mesh.
If the triangles are sufficiently small, it is possible to approximate surfaces closely
enough in most cases. Also, as discussed further in Section 6.6.1, a signal processing framework may be applied to triangular meshes, allowing such operations as
smoothing.
One drawback of a triangular mesh is the amount of data that is required to
represent it. To address this issue, methods have been developed to approximate
triangular meshes using fewer triangles (Section 6.6.2). Methods for compressing
triangular meshes were also developed, initially in the computer graphics community [14, 15]; these methods facilitate storage, transmission, and rendering.



Delingette and Cotin use simplex meshes [16] for modeling the elastic properties of some tissues and simulating surgery. A simplex mesh is the dual of a triangular mesh: each triangle in a triangular mesh corresponds to a vertex in a simplex mesh, and each pair of adjacent triangles in a triangular mesh corresponds to an edge in a simplex mesh.
6.2.3 Curved surfaces

Several types of smooth curved surface representations have been developed in the past and used extensively in computer-aided design, including Bézier rectangular patches, Bézier triangles, B-spline surfaces, and NURBS (Non-Uniform Rational B-Splines). These representations are built by blending a set of control
Rational B-splines). These representations are built by blending a set of control
vertices using piecewise polynomial or rational functions. Generally, we will refer
to them as splines. While it is outside the scope of this chapter to review the details of each type of spline, such details are available for instance in Farin [17]. In
Section 6.5, however, we explain the details of tensor-product B-spline surfaces. A
discussion of various representations is also available in Guéziec [18]. The process
of fitting these representations to data is discussed in Dierckx [19].
The use of such representations in engineering is quite different from their use
in medical imaging. While spline surfaces allow very effective design by user-directed manipulation of control vertices, fitting spline surfaces to data is more
complex and is still the focus of much research. In the computer aided design
community this is in fact a process called reverse engineering [20].
Splines are explicit surface representations. Implicit representations, which characterize a surface as the zero set of some real-valued function of three-dimensional space, are also possible.¹ Implicit superquadrics are smooth curved surfaces
that have been used in medical imaging [21, 22]. Superquadrics may be defined
using fewer parameters than splines. Assuming that the fitting process is accurate,
these parameters may be useful for tracking and characterizing the deformation of
anatomical surfaces in live subjects. Such applications are covered more deeply in
Chapter 3.
Another noteworthy implicit representation is Bajaj's implicit algebraic patches, or A-patches [23]. A-patches are embedded in a supporting tetrahedral mesh (modeling not only the surface, but also the enclosed volume) and can be used for dynamic simulations.

¹A three-dimensional medical image, possibly segmented, may be viewed as a regular grid of samples of an implicit function. The process of extracting an iso-surface of Section 6.3 uses such an implicit representation.
6.3 Iso-surface extraction

Iso-surface extraction is an essential technique for analyzing and visualizing medical images. The process of extracting an iso-surface that we develop in this section starts with regular volume data. Volume data may be formed with a series of two-dimensional image slices (this is the case for CT-scan data) or may be
readily available as in MRI-scan data. Regular volume data is defined with a set of
vertices regularly positioned in space and a function value (e.g., Hounsfield number in CT data) provided at each vertex. Vertices are connected, forming a graph.
More specifically, vertices are connected by a three-dimensional lattice, such that
each vertex has 6 neighbors except at the boundary of the lattice (see Fig. 6.3). In
what follows, we will call the vertices voxels. For simplicity, we will refer to the
function values as intensities. Regular volume data is sometimes referred to as a
three-dimensional image, because one can resample the data on any plane chosen
arbitrarily through the volume and obtain a (two-dimensional) image as a result.
An iso-surface is a surface that connects all the points of space having the same associated function value; this function value is called the iso-value. A good model of the anatomy is rarely obtained by computing an iso-surface of the raw image data, except in particular cases, e.g., when dry bone is scanned, as illustrated in Section 6.3.6.² It is thus preferable to perform a segmentation of the image before extracting an iso-surface. Segmentation is studied in detail in Chapter 2.
There are two fundamental strategies for extracting an iso-surface corresponding to a given iso-value from regular volume data. The strategies differ in how
they treat each cell of the three-dimensional lattice. The first strategy visits all
hexahedral cells of the three-dimensional lattice and draws the (most often, empty)
portion of the iso-surface intersecting each cell. Since a polygonal surface is built,
this is often called polygonizing a cell. The second strategy first decomposes hexahedral cells into tetrahedra and polygonizes tetrahedral cells. There are a number
of variants for either strategy. See [24, 25] for a discussion of various algorithms.
The reader may also consult a text by the same author [26] for a detailed description of a variant of each strategy and for recent research on iso-surface extraction
algorithms.
6.3.1 Hexahedral decomposition

Marching Cubes (Lorensen & Cline, 1987) [27] is probably the best known
variant of the hexahedral decomposition method. However, the resulting tiling may
be inconsistent across hexahedra, resulting in non-manifold surfaces [24, 28, 29].
(This problem may be resolved in some implementations.) The Wyvill et al. [30]
or Kalvin [28] methods for instance do not have this problem. For brevity, we
concentrate on the tetrahedral decomposition method in the next Section 6.3.2. A
complete description of an implementation of the Wyvill et al. method is also available in [26]. The results presented in [26] indicate that, when subjected to the same simplification process, the final outputs of the hexahedral and tetrahedral decompositions are virtually identical. It is also argued that implementing the tetrahedral decomposition is perhaps simpler. Additional perspectives on iso-surface
²Even for CT-scans of dry bone, iso-surfaces may yield models that are not anatomically faithful in regions of thin bone.


Figure 6.3: A hexahedral cell of a three-dimensional lattice of volume elements (voxels). A central cell is shown with 8 voxels of different intensities (function values), represented using different gray levels. The arrows indicate that the lattice extends a priori in all six directions. A portion of iso-surface is also shown; the iso-surface cuts through the lattice.

extraction can be found in [31-42].


6.3.2 Tetrahedral decomposition

This section describes the method of Guéziec and Hummel [43], which is an
extension of the method of Doi, Koide, et al. [44, 45]. The differences between
the methods described in [43] and in [44, 45] are related to 1) the specifics of
the decomposition into tetrahedra and the resulting orientation of the tetrahedra,
2) the look-up procedure for oriented triangles, and 3) the addition in [43] of a
simplification process following the extraction process. The simplification process
is described in detail in Section 6.6.2. The hexahedral lattice is decomposed into a
tetrahedral lattice as shown in Fig. 6.4.
In addition to the examples provided in Section 6.3.6, Guéziec and Hummel's
method has been applied to compare schizophrenic and normal ventricles by Dean
et al. [46] and is used for surface extraction before registration in [9, 47].
The method has also been used during simulation studies by Gencer et al. on the optimal placement of electrodes for electric source imaging [48]. The effect of the simplification procedure on the accuracy of the detection of rib lines (which are discussed in detail in Section 6.3.5) has been studied by Guéziec and Dean [49].


Figure 6.4: Triangulation of a hexahedral cell producing five tetrahedra. Four are isosceles and isomorphic to one another (A1, A2, A3, and A4 denote their apices). Two of the four are shown at the left and center. The fifth tetrahedron (right) is equilateral and occupies the center of the hexahedral cell.

6.3.2.1 Identifying tetrahedra vertices

The following procedure is used to build a tetrahedron as an (oriented) quadruple of vertices. The actual implementation may either apply this procedure for
every tetrahedron or apply it once and insert all possible tetrahedra (10) in a table.
This table may also be built manually. The first two methods are probably less
error-prone.
For any given hexahedral cell, two tetrahedral decompositions are possible.
One of these is shown in Fig. 6.4; the other is mirror symmetric to the one shown.
In order to be consistent between neighboring hexahedral cells, i.e., in order that
faces and edges of tetrahedra in one cell match faces and edges of tetrahedra in the
neighboring cells, we must alternate between the two decompositions from cell to
cell, in a three-dimensional checkerboard fashion.
We next describe the use of binary operations to perform the decomposition
easily. (This is not used in the Doi/Koide version.) Each of the vertices of the
hexahedral cell is numbered from 0 to 7 (see Fig. 6.5). Moving along an edge of
the hexahedral cell is performed by inverting one of the three bits of the vertex
number. The three possible 1-bit inversions will be denoted by ⊕001, ⊕010, and ⊕100 (see Fig. 6.5).
Each hexahedral cell has an integer coordinate location (i, j, k) for its local origin. We use i, j, and k to measure the row, column, and height in the array of cells. In order to identify each tetrahedron within a cell, we perform the following steps:

1. If i + j + k is even, we call the cell an even cell, and we say that the parity of the cell is even. If the sum is odd, the cell's parity is odd. To determine tetrahedron number 1 within the cell, we select apex A1 as vertex 000 in an even cell, and as 001 in an odd cell. We then obtain three other apices from A1 by applying the motion operators ⊕001, ⊕010, and ⊕100, resulting in a11, a12, and a13 respectively. Tetrahedron number 1 is spanned by (A1, a11, a12, a13), which we view as an ordered tuple of vertices (see Fig. 6.6).

Figure 6.5: Some motion operators. The cell vertices are numbered 0(000) through 7(111).

Figure 6.6: To determine tetrahedron number 1 within the cell, we select apex A1 as vertex 000 in an even cell and as vertex 001 in an odd cell. We then obtain three other vertices (a11, a12, a13) from A1 by applying the motion operators ⊕001, ⊕010, and ⊕100.
2. Tetrahedron number 2 uses the second apex A2, obtained from A1 by the 2-bit inversion ⊕011. Three vertices, a21, a22, and a23, are obtained from A2 by the one-bit transitions ⊕001, ⊕010, and ⊕100.

3. Next, A3 is obtained by applying ⊕101 to A1. The span of (A3, a31, a32, a33) defines tetrahedron number 3, where a31, a32, and a33 once again were obtained from A3 after applying ⊕001, ⊕010, ⊕100.

4. By applying ⊕110 to A1, we define A4, and the corresponding tetrahedron 4 is defined analogously to tetrahedra 1, 2, and 3.

5. The fifth tetrahedron is defined by shifting the four apices A1, A2, A3, A4 by ⊕001, resulting in (A1⊕001, A2⊕001, A3⊕001, A4⊕001). A code sketch enumerating these five tetrahedra is given below.
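As an illustration of the five steps, here is a minimal sketch (ours, in Python; it is not the implementation of [43]) that enumerates the five tetrahedra of a cell from its parity using the bit operations just described.

def cell_tetrahedra(i, j, k):
    """Enumerate the five tetrahedra of the hexahedral cell with local
    origin (i, j, k), each as an ordered 4-tuple of local vertex numbers
    (0-7, interpreted as 3-bit codes)."""
    parity = (i + j + k) % 2            # 0: even cell, 1: odd cell
    a1 = 0b000 if parity == 0 else 0b001
    corner_tets = []
    for mask in (0b000, 0b011, 0b101, 0b110):
        apex = a1 ^ mask                # apices A1..A4
        # One-bit inversions give the three remaining vertices.
        corner_tets.append((apex, apex ^ 0b001, apex ^ 0b010, apex ^ 0b100))
    # The fifth (central) tetrahedron: the four apices shifted by ^ 0b001.
    apices = [t[0] for t in corner_tets]
    central = tuple(v ^ 0b001 for v in apices)
    return corner_tets + [central]

for tet in cell_tetrahedra(0, 0, 0):
    print([format(v, '03b') for v in tet])

For the even cell at the origin, this prints the four corner tetrahedra followed by the central tetrahedron (001, 010, 100, 111), the equilateral tetrahedron of Fig. 6.4.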
6.3.2.2 Intersecting the iso-surface with a tetrahedron

The next step consists in determining whether a portion of the iso-surface will intersect a given tetrahedron (V1, V2, V3, V4). The voxel values (v1, v2, v3, v4) corresponding to the four tetrahedron apices are retrieved from the three-dimensional lattice. For each tetrahedron edge that exhibits an intensity sign change, a vertex of the polygonal approximation of the iso-surface is created. The exact position of the vertex is determined by the zero-crossing of a function interpolating voxel values along the edge. Because of the issue illustrated in Fig. 6.7, we use linear interpolation on an edge of the hexahedral cell and bilinear interpolation on a diagonal edge (which amounts to using bilinear interpolation overall). Specifically, if v00, v01, v10, v11 denote the four intensity values on the face of an 8-cell, then the intensity value along the (0,0)-(1,1) diagonal edge is given by:

g(s) = v00 (1 - s)² + (v01 + v10) s (1 - s) + v11 s²

Here, s varies from 0 to 1 linearly along the diagonal. Provided v00 v11 < 0, g will have exactly one zero in the range 0 < s < 1, which can easily be determined from the quadratic formula (the other zero will fall outside this range).
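A sketch of this zero-crossing computation (ours; it assumes the voxel values have already been offset by the iso-value, so that a sign change brackets the surface):

import math

def diagonal_zero(v00, v01, v10, v11):
    """Position s in (0, 1) where the bilinear interpolant vanishes along
    the (0,0)-(1,1) face diagonal; assumes v00 and v11 have opposite signs."""
    # g(s) = a s^2 + b s + c with the coefficients below.
    a = v00 + v11 - v01 - v10
    b = v01 + v10 - 2.0 * v00
    c = v00
    if abs(a) < 1e-12:                  # interpolant degenerates to linear
        return -c / b
    disc = math.sqrt(b * b - 4.0 * a * c)
    for s in ((-b - disc) / (2 * a), (-b + disc) / (2 * a)):
        if 0.0 < s < 1.0:
            return s
    raise ValueError("no zero crossing in (0, 1)")

print(diagonal_zero(1.0, -1.0, -1.0, -1.0))   # about 0.29, as in Fig. 6.7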
6.3.2.3 Building a set of oriented triangles

Consider a tetrahedron (V1, V2, V3, V4) that is defined by an ordered tuple of apices. The order is as determined by one of the five steps from Section 6.3.2.1. The corresponding intensity values are given by a four-tuple (v1, v2, v3, v4). If all four values have the same sign, then the surface does not intersect the tetrahedron. If the signs are mixed, however, we then have three major cases. Among (v1, v2, v3, v4), there are either one, two, or three positive values. These cases, which we call Cases
Figure 6.7: Using a linear interpolation along the diagonal edge results in a severe difference in position for the polygonal surface when the diagonal edge is swapped (left and right), assuming a square face. Instead, we use a bilinear interpolant (middle). The numbers .5, .29, and .25 indicate the relative position of the iso-surface, where it intersects the top-left-to-bottom-right diagonal.

I, II, and III, are illustrated in Fig. 6.8. For Cases I and III, three of the values have the same sign. For Case II, two vertices are positive and two are negative. In this case, the surface will intersect all four faces, and we have a quadrilateral. By choosing arbitrarily a diagonal of the quadrilateral, we obtain two triangles within the tetrahedron. Combining all cases, we have a patch of the surface represented as either one or two triangles. The triangles can be oriented.
Cases I and III. For Cases I and III, exactly one value among v1, v2, v3, v4 has sign opposite to the others. Using this value, we compute intersections along the edges connecting its vertex to the other three vertices, preserving order among (V1, V2, V3, V4). So, for example, if v2 has the different sign, we compute the intersection along (V2, V1), then (V2, V3), and then (V2, V4). These three intersections determine an ordering which is either the correct direction, or opposite to the correct direction. One way to determine if the orientation is correct is to check it by examining the sign of a determinant, as done by Doi/Koide, as follows.

First, we reorder the vertices (V1, V2, V3, V4) to obtain (W1, W2, W3, W4) such that the vertices with negative values precede the vertices with positive values, without otherwise disturbing relative order. Viewing the vertices as coordinates in three-space, we can then compute the determinant

det(W2 - W1, W3 - W1, W4 - W1).

If this determinant is negative, then the ordering of the triangle vertices is correct; otherwise, the ordering must be reversed. The procedure can be verified by carefully considering the cross product (W2 - W1) × (W3 - W1) (see Fig. 6.8).
Figure 6.8: Defining one or two oriented triangles. Depending upon the number of vertices outside the solid (marked with the minus sign) we define three or four vertices of the surface approximation in the ordering specified in the text. The condition for having to reverse the ordering is det(W2 - W1, W3 - W1, W4 - W1) > 0. Note that this value is six times the volume, positive or negative, of the tetrahedron.

Case II. In this case, exactly two vertices are negative, and two are positive. We reorder the vertices (V1, V2, V3, V4) as above, to give vertices (W1, W2, W3, W4), where the

values associated with W1 and W2 are negative, and the values associated with W3 and W4 are positive. We compute the vertices of the quadrilateral as follows. We first find the zero along (W1, W3). Next, we find the zero along (W1, W4), then (W2, W4), and finally (W2, W3). This sequence of four points establishes a cycle, which is either the correct ordering, or the incorrect ordering, which can be easily checked.
In this case, the verification procedure is exactly the same as in the previous case. That is, viewing the vertices as vectors in three-space, we compute det(W2 - W1, W3 - W1, W4 - W1). The ordering of the interpolation points defined above is correct if the determinant is negative; if the determinant is positive, then the order should be reversed.
If the resulting ordering of the interpolation points is given by (P1, P2, P3, P4), then there are two possible triangulations: one is given by the pair of ordered triangles (P1, P2, P3), (P1, P3, P4), and the other is given by the pair (P1, P2, P4), (P2, P3, P4). The triangulations may be chosen arbitrarily. However, by always choosing the first one in an even cell, and the second one in an odd cell, the degree of vertices in the triangulation can be shown to be always less than or equal to nine [43].
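The case analysis is compact enough to sketch in code. The helper below (our own illustration, not the code of [43]; it omits the orientation test and the degenerate-value handling discussed under Special Cases in Section 6.3.3) emits one or two triangles for a single tetrahedron.

def lerp_zero(p, q, vp, vq):
    """Point where the value interpolated linearly from vp (at p) to vq
    (at q) crosses zero."""
    t = vp / (vp - vq)
    return tuple(a + t * (b - a) for a, b in zip(p, q))

def polygonize_tet(verts, vals):
    """verts: four 3D points; vals: their iso-value-offset intensities
    (v >= 0 counts as a positive vertex). Returns a list of triangles."""
    neg = [i for i in range(4) if vals[i] < 0]
    pos = [i for i in range(4) if vals[i] >= 0]
    if not neg or not pos:
        return []                       # no sign change: no intersection
    if len(neg) == 1 or len(pos) == 1:  # Cases I and III: one triangle
        lone = neg[0] if len(neg) == 1 else pos[0]
        others = [i for i in range(4) if i != lone]
        pts = [lerp_zero(verts[lone], verts[i], vals[lone], vals[i])
               for i in others]
        return [tuple(pts)]
    # Case II: two negative, two positive vertices yield a quadrilateral,
    # traversed as a cycle and split into two triangles.
    n1, n2 = neg
    p1, p2 = pos
    cycle = [lerp_zero(verts[a], verts[b], vals[a], vals[b])
             for a, b in ((n1, p1), (n1, p2), (n2, p2), (n2, p1))]
    return [(cycle[0], cycle[1], cycle[2]), (cycle[0], cycle[2], cycle[3])]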
6.3.3 A look-up procedure to replace the determinant test

In fact, it is not necessary to compute the determinant value det(W2 - W1, W3 - W1, W4 - W1) to know its sign. By applying the procedure that follows instead of computing determinants, a 2.5-fold speed-up was recorded in [43].

Let us first suppose that for a given tetrahedron (V1, V2, V3, V4) in an even cell, no reordering is required to obtain (W1, W2, W3, W4). It then turns out, based on the construction of the ordered tuple representing the tetrahedron, that det(W2 - W1, W3 - W1, W4 - W1) will be positive, regardless of whether (V1, V2, V3, V4) is tetrahedron



number 1, 2, 3, 4, or 5, as long as we are inside an even cell. (For an odd cell, the result is always negative.) Next, we suppose that some reordering is required in determining (W1, W2, W3, W4). Recall that vertices with negative values are listed first, without changing other relative orderings. Suppose, for example, that V1 and V2 must be exchanged: the result of this permutation is that the sign of the determinant is reversed. If instead V3 must be brought to the front, then two transpositions are required, and the sign of the determinant stays the same.

In general, the sign of the determinant det(W2 - W1, W3 - W1, W4 - W1) is determined by the number of transpositions required to permute the negative vertices among (V1, V2, V3, V4) to the beginning of the list. The number of transpositions required can be pretabulated, as follows. Representing a negative vertex in (V1, V2, V3, V4) by a 0, and a positive vertex by a 1, we obtain a 4-bit code for the tetrahedron, yielding a number between 0 and 15. The resulting determinant will be positive or negative according to whether an even or odd number of transpositions is required, as tabulated in Table 6.1. Note that it is no longer required to physically reorder the vertices; only the bit code is needed. Accordingly, if the table yields a positive entry for the determinant, the oriented cycle representing the surface patch should be reversed.
Table 6.1: Mapping from the bit code representing the signs of the values at the tetrahedra vertices in an even cell to the sign of the determinant that determines whether a triangle is correctly oriented (negative determinant) or must be reversed (positive determinant). For an odd cell, the entries are opposites of those of an even cell.

Bit code   Index   Sign of det
0000       0       +
0001       1       +
0010       2       -
0011       3       +
0100       4       +
0101       5       -
0110       6       +
0111       7       +
1000       8       -
1001       9       +
1010       10      -
1011       11      -
1100       12      +
1101       13      +
1110       14      -
1111       15      +
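The table can be regenerated programmatically from the transposition-count rule (a sketch, ours; it counts, for each 4-bit code, how many adjacent swaps a stable sort needs to move the 0s, i.e., the negative vertices, in front of the 1s):

def det_sign(code, even_cell=True):
    """Sign of the orientation determinant for a 4-bit code (bit of V1
    first): each transposition that moves a negative vertex (0) ahead of
    a positive one (1) flips the sign, starting from + in an even cell
    and - in an odd cell."""
    bits = [(code >> shift) & 1 for shift in (3, 2, 1, 0)]
    # Number of adjacent transpositions = number of (1 before 0) pairs.
    swaps = sum(1 for i in range(4) for j in range(i + 1, 4)
                if bits[i] == 1 and bits[j] == 0)
    base = 1 if even_cell else -1
    return base * (-1) ** swaps

for code in range(16):
    print(format(code, '04b'), code, '+' if det_sign(code) > 0 else '-')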


Special Cases. Zero-area polygons can easily be avoided in the tetrahedral decomposition method, as shown in this section. Such degenerate polygons occur when v = 0 at some or all of the positive vertices. (Recall that v >= 0 corresponds to a positive vertex.) A single vertex with v = 0 within a tetrahedron is handled exactly as any positive vertex. Indeed, we treat a vertex with v = 0 exactly as any other positive vertex except for two special cases: (1) if there is one vertex with v = 0, and the other three are strictly negative, then we do not create a triangle, even though we are in a Case I situation; (2) also, if there are two vertices with v = 0, and the other two vertices are strictly negative, then we do not create polygons, even though we have a Case II situation. In these two cases, the polygons are degenerate. On the other hand, if three values among (v1, v2, v3, v4) are zero and the other is negative, then we create one triangle, exactly as a Case III situation would predict. Also, contrary to [44], when v1 = v2 = v3 = v4 = 0, we do not create any polygons, instead viewing this situation as four positive vertices lying inside the iso-surface.
This completes the description of the tetrahedral decomposition method. Once
the surface has been extracted, it is possible to define memory-efficient data structures based on tetrahedra to store vertices and triangles. This is described in [43].
6.3.4 Computing surface curvatures

Monga et al. [50] provide a method for computing principal curvature directions and curvature values of an iso-surface directly from voxel data and spatial derivatives of the voxel data.³ Approaches based on triangular patches, rather than voxel data, are given in [52, 53] and [54]. If a differentiable explicit representation is available (e.g., using spline tensor products), the method of [18] may also be used.
In order to evaluate the surface curvatures using volume data, we consider a curve C, with tangent vector t, along a normal section of the iso-surface S.⁴ Since C is contained in a plane that contains the surface normal, n, the normal of C is also n = ∇f/||∇f||, where f designates a continuous function modeling the volume (voxel) data. The differentiation of ∇f · t in the direction of t yields

t^T H(f) t = -k ||∇f||

where k denotes the normal curvature of C, which is also by definition the surface curvature in direction t, and H(f) is the Hessian:

H(f) = [ f_xx  f_xy  f_xz ]
       [ f_xy  f_yy  f_yz ]
       [ f_xz  f_yz  f_zz ]

³See also [51].
⁴A normal section of a surface (through a vertex of the surface) is a section of the surface cut by a plane that contains the vertex normal. There are infinitely many normal sections at a given surface vertex.
Thus k = -(t^T H(f) t)/||∇f||. The principal curvatures are obtained by maximizing and minimizing k over possible tangent directions t. Accordingly, the principal curvatures are the negatives of the eigenvalues of P^T H(f) P divided by the magnitude of the gradient ||∇f||, where P is the projection of three-dimensional space onto the tangent plane of the surface and is a three-by-two matrix: P consists of the last two columns of Q, where Q is the Householder transformation that maps ∇f onto the first coordinate axis. We first define v = ∇f + (||∇f||, 0, 0)^T, and then Q = I - 2vv^T/(v^T v), where I is the identity matrix. The principal directions t1 and t2 are the tangent directions for which the minimum and maximum curvatures are obtained.
All vertices of surface patches occur along edges of hexahedral cells, and derivatives of the data may be computed and interpolated at these locations by suitable
methods. Solving the eigenproblem at the interpolated position of each surface
vertex, we may attach curvatures (and also principal curvature directions, after applying the inverse Householder transformation) to each vertex of the iso-surface.
The above method was used to compute surface curvatures that were colorized
with a suitable color map to produce some of the illustrations in Section 6.3.6.
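A compact sketch of this computation (ours, using NumPy; grad and hess stand for the interpolated first and second derivatives of the volume function at a surface vertex, assumed precomputed):

import numpy as np

def principal_curvatures(grad, hess):
    """Principal curvatures and directions of an iso-surface at a point,
    from the gradient (3-vector) and Hessian (3x3) of the volume function."""
    g = np.linalg.norm(grad)
    # Householder transformation mapping grad onto the first axis.
    v = grad + np.array([g, 0.0, 0.0])
    Q = np.eye(3) - 2.0 * np.outer(v, v) / v.dot(v)
    P = Q[:, 1:]                  # last two columns span the tangent plane
    # Principal curvatures: negated eigenvalues of P^T H P / ||grad||.
    M = -P.T @ hess @ P / g
    k, t = np.linalg.eigh(M)      # symmetric 2x2 eigenproblem
    directions = P @ t            # tangent-plane eigenvectors back in 3D
    return k, directions

# Unit sphere as iso-surface of f(x) = x.x - 1: both curvatures have
# magnitude 1 at (1, 0, 0) (the sign depends on the normal convention).
x = np.array([1.0, 0.0, 0.0])
print(principal_curvatures(2 * x, 2 * np.eye(3))[0])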
6.3.5 Extracting rib (or ridge, or crest) lines

Rib (or ridge, or crest) lines of a surface are the loci of the surface where the principal curvatures reach a local extremum. Rib lines have been studied in medical imaging by Bookstein and Cutting [55, 56], the Epidaure team at INRIA [51, 57-61], as well as Dean et al. [46, 49]. While Epidaure's main application has been to extract (few) salient and robust features for registering 3D image data sets, Cutting, Bookstein, Dean, et al. have been pursuing the goal of using rib lines, combined with other networks of (geodesic, or minimum-distance) lines, to build templates of the anatomy, particularly of the face, skull, or ventricles.

Several characterizations of rib lines have been proposed. We refer to the above-referenced publications as well as textbooks on the geometry of surfaces [62-64]. Referring to Section 6.3.4 above, rib lines may be related to ribs (discontinuity curves) of the focal surfaces of a surface, which are the loci of the centers of curvature of the surface.⁵ There is one center for the minimum principal curvature, another center for the maximum principal curvature, and hence two focal surfaces, except at an umbilic point. Ribs may thus be colored differently, depending on which focal surface they relate to [64].
In this section, we use and illustrate Markatis's characterization of a rib line based upon focal surfaces [65].⁶ Markatis's equation applies for a surface defined using an implicit equation, which is the case for an iso-surface (the implicit function being known at the voxels only). Markatis's characterization is also discussed on p. 197 of the book by Porteous [64]. According to Markatis's equation, a surface point belongs to a rib if either of the following relations holds:

t1 · ∇k1 = 0    (6.1)
t2 · ∇k2 = 0    (6.2)

that is, if a principal curvature k1 (respectively, k2) is extremal along its own principal direction. When expanded for the implicit function f, the conditions in Eqs. (6.1, 6.2) involve ∇f, the Hessian H(f), and D³f, a matrix of third-order derivatives of the intensity (with 10 different entries); t1 and t2 are the principal directions, as previously discussed in Section 6.3.4.

⁵A center of curvature is located along the normal, at an offset equal to the radius of curvature, which is the inverse of the curvature.
Equations (6.1) and (6.2) are implicit equations that define another set of iso-surfaces in the volume data. The intersection between these surfaces and the first surface corresponds in general to a set of lines. To extract these lines, one possibility is to use the Marching Lines method developed by Thirion and Gourdon [60].
Another possibility is to extend the method of Section 6.3.2 as follows. For each triangle of the iso-surface, the three vertices of the triangle are tested against Eqs. (6.1)
and (6.2). If a change in sign is observed, then a segment of a rib line is created.
Such segments may then be linked together, forming curves. We show a result of
the latter approach in Fig. 6.9, where we illustrate ribs on an iso-surface of the
cortex. The ribs appear to follow the centerlines of the sulci.
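A sketch of the per-triangle sign-change test (ours; rib_value stands for whichever left-hand side of Eqs. (6.1, 6.2) is being traced, evaluated at each mesh vertex):

def rib_segments(triangles, positions, rib_value):
    """For each mesh triangle whose vertices change sign under rib_value,
    emit the segment joining the zero-crossings of its two cut edges."""
    segments = []
    for tri in triangles:
        cuts = []
        for a, b in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            va, vb = rib_value[a], rib_value[b]
            if va * vb < 0:             # sign change along this edge
                t = va / (va - vb)
                p, q = positions[a], positions[b]
                cuts.append(tuple(pi + t * (qi - pi) for pi, qi in zip(p, q)))
        if len(cuts) == 2:
            segments.append(tuple(cuts))
    return segments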
6.3.6 Iso-surface examples

The first example in Fig. 6.10 is a cortex model composed of 362,000 triangles
(courtesy of Henry Rusinek and Grégoire Malandain, who performed the segmentation [66]) that was constructed in 59 seconds on an IBM RS6000 computer using
the tetrahedral decomposition method. Surface simplification (see [43] and Section 6.6.2) reduced the number of triangles from 362,000 to 52,000. The result is
shown in Fig. 6.10(b). Brain sulci appear in red, and gyri appear in green and blue,
as color coded relative to the magnitude of the largest principal curvature.
The next example is a surface model obtained from a CT scan of a cranium from
the Cleveland Museum of Natural History Collection (Courtesy of Bruce Latimer,
Court Cutting, and David Dean). The following surface model was obtained using the tetrahedral decomposition method followed by surface simplification. The
model, shown in Fig. 6.11, comprises 66,000 triangles and 129,000 triangles, simplified from an original iso-surface containing 3,450,000 triangles.
⁶Markatis's ridge equation was brought to the author's attention through personal communication with I. Porteous, from the University of Liverpool.


Figure 6.9: Ribs (in white) drawn on an iso-surface of a cortex segmented in an MRI image. The ribs were characterized using Markatis's equation. The ribs appear to follow the centerlines of the sulci.

6.4 Building surfaces from two-dimensional contours

As explained before, direct extraction of anatomical surfaces as iso-surfaces using the method explained above may fail to extract an accurate model of the
anatomy. Another, perhaps more classical, approach for extracting a surface model
from image slice data is to first extract two-dimensional contours from each relevant image slice and then tile the two-dimensional contours together to form a
surface [67-71].
In the present section, we illustrate this situation with a concrete example: the
extraction of a three-dimensional model of the femur from CT-scan data for surgical planning and intraoperative registration in Total Hip Replacement (THR) surgery [72-74].
In the CT scan data that is clinically acquired for THR [75], the slice spacing
typically varies significantly, from 1 mm in the vicinity of the fiducial markers to 6
mm or more in the femur shaft in order to maximize the detail in the critical areas


Figure 6.10: Example of iso-surface extraction and curvature computation. (a) Cortex model represented as an iso-surface (362,000 triangles). (b) Simplified cortex (52,000 triangles); the vertices are color-coded (with Gouraud shading) according to the magnitude of the largest surface principal curvature. (For a color version of this figure see Plate 1 in the color section of this book.)

while limiting the x-ray dosage. Direct extraction of bone surfaces from such CT
data using software that builds an iso-surface produces staircase artifacts: bone
contours are located several voxels apart from slice to slice, and the gap is modeled
by the iso-surfacing method as a flat surface portion parallel to the slices. Aside
from the problems created by the irregular spacing between slices, bone has very
different Hounsfield numbers in the CT data because of varying bone densities
(within the same femur bone). Selecting a different iso-value for each slice may
not be sufficient to obtain a correct segmentation of each slice. This problem is
illustrated in Fig. 6.12.
6.4.1 Extracting two-dimensional contours from an image

The extraction of relevant two-dimensional contours from each slice may be automated, performed by an operator, or semi-automated (and operator-assisted).
This operation is part of a segmentation process and is described more completely
in Chapters 2 and 3.
A solution that performs well in practice on the data of our THR example operates as follows [72]. For each slice, we use a deformable model technique to
detect the two-dimensional contour of the bone. We implement the technique of
Kass et al. [76] with some modifications as explained next. In [76] an energy is
defined that incorporates the stretching and bending energies of the contour as well


Figure 6.11: Iso-surface extracted from a CT-scan image. (a) Model of cranium extracted as an iso-surface and simplified. (b) The surface is color-coded (with Gouraud shading) according to the curvature as in Fig. 6.10. (For a color version of this figure see Plate 2 in the color section of this book.)

Figure 6.12: Because of bone density variations, iso-contouring methods may produce inadequate results, even if the iso-value is chosen differently for each slice. We show here a CT slice taken in the distal femur region (region of the condyles) along with a best attempt (yet widely unsuccessful) at segmenting the bone using iso-contours (gray almost-fractal-like curves). A segmentation using our deformable model implementation is shown using the white curve (another gray curve next to the white curve represents a preliminary result obtained using a smoothed low-resolution version of the image).



as a potential field that measures closeness to the data; a two-dimensional contour
modeling of the bone is obtained at a minimum of the energy. The contour is then
modeled with a series of points, and the energy is minimized by solving a partial
differential equation using a finite difference method.
In our implementation we have chosen a smoothed image gradient norm for
the potential field. The user is asked to select a few points in the vicinity of the
structures of interest. The system then connects such points by a polygon, samples
new points on that polygon, and uses such sampled points as an initial estimate
for the deformable model (Fig. 6.13). As discussed in Chapter 3, a considerable
research effort was recently devoted to extending [76] to attract the contours from a long range and thus automate the process better.⁹ To define a longer-range
potential, we have implemented a two-step process, whereby a first deformable
model is attracted by a smoothed low-resolution version of the potential; the result
is then used as an initialization for a second deformable model that is attracted by
the full resolution potential. This process was particularly useful to reuse the same
initial points from slice to slice and limit the user input. Such reuse of the input
points is shown in Fig. 6.13. In this figure, the white curve model corresponds to
the first iteration of the deformable model, operating at a coarse scale, while the
black curve corresponds to the second iteration of the deformable model, operating
at a fine scale.
6.4.2 Tiling contours into a surface portion

The problem of tiling two-dimensional contours extracted from two parallel¹⁰ slices can be quite complex. There are essentially two cases: (1) the correspondence between two contours is established and corresponding vertices for the beginning and end of contours are determined; (2) the correspondence must yet be
determined.
In order to solve either problem, both volume-based and surface-based approaches have been proposed. In volume-based approaches [3], contours are
scan-converted to binary images. Intermediate (in-between) binary images may
be interpolated to build a volume using various interpolation methods, including
shape-based interpolation [77]. Shape-based interpolation methods use distance
transforms [78] applied to binary images to assign grey levels to intermediate slices.
Resulting grey-level images are then thresholded. A surface portion can then be
extracted from the volume data that was built by the interpolation process using
iso-surface extraction methods (see Section 6.3). While the process of converting a contour back to pixels/voxels after it was previously extracted from pixel/voxel data is, to some extent, wasteful, volume-based methods are guaranteed to produce a valid surface.
⁹Another avenue of research seeks to automate changes of topology. We briefly study some surface modeling aspects of this issue in Section 6.5.
¹⁰The methods could be extended to nonparallel slices, but these do not seem to occur in practice.


Figure 6.13: Using deformable contours for surface extraction. A hierarchical deformable
contour can cope with incorrect user input (e.g., rightmost point marked with a square), or
reuse such input for multiple slices.

Surface-based (or polygon-based) methods concentrate on determining a set of polygons or triangles that span the vertices of contours extracted from the slices. Several approaches address the general problem (2) associated with complex correspondence and branching situations between contours [69, 70, 79-82]. Boissonnat's and Geiger's approaches [70, 79] use the notion of proximity, formalized using Voronoi diagrams, to decide which vertices to connect with triangles. Meyers et al. [69] represent (approximate) contours with ellipsoids and objects as cylinders connected with a graph; this representation is used to infer branchings. Bajaj et al. [82] impose further constraints on the problem of shape reconstruction and use them to derive precise correspondence and tiling rules.
In several respects, the correspondence and (particularly) branching problems
are still not completely solved at the time this chapter is written: with some particular configurations of contours, even sophisticated algorithms may still fail to
produce a suitable tiling.
One major argument in favor of surface-based models has traditionally been
that these methods allow significant data reduction [70]: contours may be individually approximated using fewer polygonal edges, and triangles/polygons are
built only to span such vertices. In contrast, when using volume-based methods,
contours are discretized using small voxels, and an iso-surface will likely produce a large collection of small triangles/polygons with a size comparable to a



voxel. However, since [70] was published, there has been considerable interest and progress in polygonal surface simplification methods. This is studied in Section 6.6.2. Iso-surfaces can be simplified as a post-process with bounds on both the computational burden and the resulting surface deviation (degradation); iso-surface extraction followed by simplification has been demonstrated as an effective tool for real-world (clinical) applications [3]. The examples of Section 6.3.6 illustrate simplified iso-surfaces. For surfaces with complex topologies that represent a challenge for surface-based methods, volume-based methods, combined with data reduction, offer a guarantee of success, perhaps at a higher computational cost.
6.4.2.1 Tesselating when the correspondence is known

For many data-sets encountered in practice, the topological changes between adjacent contours can be resolved without too much difficulty, and the remaining challenge is to tesselate the contours. We next illustrate this process with a concrete example.
Returning to our example involving Total Hip Replacement surgery data, the
full generality of the above methods is not necessary for most of the femur length,
as contours are in one-to-one correspondence. Branching contours exist at the head
of the femur and condyles.
When the correspondence between two contours is obvious, we may use a variation of a method described in [67, 83] to build a piece of surface connecting the
two contours. We define a measure of surface quality as the sum of the measures
of each individual triangle quality. As in Keppel [83], we can then build a set of
triangles that connects the two contours and corresponds to a true optimum of the
surface quality measure. Typical criteria that have been proposed so far include
minimizing surface area or maximizing the volume enclosed by the surface portion
between the two slices. The exact volume enclosed by a triangular mesh without
boundary can be easily computed as a sum of volumes of tetrahedra spanned by an
arbitrary origin and the triangles of the triangular mesh. After a coordinate transformation, we can assume that the origin has coordinates   . The volume  is
given by the following sum, wherein       designates a triangle of the mesh
(   , and  are indices of the three vertices of the triangle having label ). Only
the triangles of the surface connecting the contours contribute to the sum since the
triangles capping the surface above and below cannot be modified.


 



    

 



    










 
 
 

 
 
 



 
 
 





(6.3)

(6.4)


Figure 6.14: Femur models obtained by tiling contours. The models have been simplified as explained in Section 6.6.2. (a) Proximal femur (1,664 triangles). (b) Distal femur (4,199 triangles).

To produce the pictures shown in Fig. 6.14, we have maximized the sum of triangle compactnesses [84]. In [84], the compactness c of a triangle is defined with the following formula:

c = 4√3 a / (l1² + l2² + l3²)    (6.5)

where a is the positive area of the triangle and l1, l2, l3 are the lengths of the three sides. This formula defines a dimensionless measure similar to the area-to-perimeter ratio; this measure can be computed inexpensively, without evaluating square roots (except √3, which is precomputed or read from a table).
Following [67, 83], we consider a rectangular graph (grid) whose (m + 1) × (n + 1) nodes represent edges between the m and n vertices in the respective adjacent contours. Edge 00 (which is the same as edge (m, n)) is chosen as a closest point pair between the contours. A path in the graph from node (0, 0) to node (m, n) represents a surface connecting the two contours. This graph is shown in Fig. 6.15(a). Starting with Edge 00, we may decide to construct a triangle by connecting the edge with the next vertex on the top contour (with m vertices), which is represented in the graph by drawing a horizontal arrow starting from vertex (0,0) and pointing left. Alternatively, we may decide to connect Edge 00 with the next vertex on the bottom contour (with n vertices), which is represented in the graph by drawing a vertical arrow starting from vertex (0,0) and pointing down. Starting from the graph node pointed to by the arrow (this node represents an edge in the tesselation,


Figure 6.15: Tesselating two closed contours. When the correspondence between two contours comprising m and n vertices is known, and a first edge connecting the two contours is chosen (Edge 00), an optimum of a user-defined surface quality criterion is built using dynamic programming. (a) Rectangular graph whose (m + 1) × (n + 1) nodes represent edges between the m and n vertices in the respective adjacent contours. Each node has an associated cost, representative of an optimum path from (0,0) to that node; (b) edges drawn in gray map the gray path in (a) (see text).

either Edge 10 or Edge 01), we may again decide to connect with a vertex from either the top or the bottom contour. Both choices may be recorded in the graph. By examining carefully Fig. 6.15, the reader will be able to convince him/herself that all possible tesselations starting with Edge 00 may be represented as a path in the toroidal graph of Fig. 6.15(a).

For each triangle, we compute a cost function (area, volume, or else) as determined by the user. The cost of a particular tesselation is obtained by adding the costs of all triangles. The costs for all possible paths may be computed by filling an m by n table, wherein each table entry corresponds to a vertex of the graph of Fig. 6.15(a) and represents the cost of the optimum path leading to that vertex. The cost of entry (m, n) thus represents the cost of the best tesselation. This process follows a general method called dynamic programming [85]; a sketch is given below.
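Here is that sketch (ours; tri_cost stands for the user-selected per-triangle quality measure, e.g., the compactness of Eq. (6.5), to be maximized):

def best_tiling(top, bottom, tri_cost):
    """Optimal tiling of two corresponding contours by dynamic programming.
    top, bottom: lists of 3D vertices, already aligned so that edge (0, 0)
    joins a closest point pair. Returns (best score, list of triangles)."""
    m, n = len(top), len(bottom)
    # score[i][j]: best cumulative cost of a path from node (0,0) to (i,j).
    score = [[float('-inf')] * (n + 1) for _ in range(m + 1)]
    move = [[None] * (n + 1) for _ in range(m + 1)]
    score[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            # Horizontal move: triangle uses the next top-contour vertex.
            if i > 0:
                s = score[i - 1][j] + tri_cost(top[(i - 1) % m],
                                               top[i % m], bottom[j % n])
                if s > score[i][j]:
                    score[i][j], move[i][j] = s, 'top'
            # Vertical move: triangle uses the next bottom-contour vertex.
            if j > 0:
                s = score[i][j - 1] + tri_cost(top[i % m],
                                               bottom[(j - 1) % n],
                                               bottom[j % n])
                if s > score[i][j]:
                    score[i][j], move[i][j] = s, 'bottom'
    # Backtrack from (m, n) to recover the triangles.
    tris, i, j = [], m, n
    while (i, j) != (0, 0):
        if move[i][j] == 'top':
            tris.append((top[(i - 1) % m], top[i % m], bottom[j % n]))
            i -= 1
        else:
            tris.append((top[i % m], bottom[(j - 1) % n], bottom[j % n]))
            j -= 1
    return score[m][n], tris[::-1]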
6.4.2.2 Simple branching and capping

We next illustrate with a concrete example how simple branching situations may be resolved. In our exemplary data-set, branching situations have been limited


Figure 6.16: Capping of a distal femur model by extrapolating areas and centroids of a few contours: (a) before; (b) after.

to one primary contour branching to two secondary contours. We may treat this common situation using a particular case of the method of [70] as follows.
this common situation using a particular case of the method of [70] as follows.
We compute a Voronoi diagram [86] of the points of the two secondary contours,
retaining only the edges of the diagram that bisect a pair of vertices from different
contours. We then project this reduced diagram on the plane supporting the primary
contour. It would generally intersect the primary contour in two locations that,
together with segments falling inside the primary contour, may be used to split the
contour into two portions; each portion may be tiled separately to one secondary
contour.
CT slices generally fail to capture the exact location where an object starts or ends. This is the case for a CT scan of the femur in the vicinity of the condyles (the femur is visible in one slice and invisible in the next slice). In [87] we developed an original method to cap the surface: using a few contours, we fit an ellipsoid of revolution to the data formed by the contour areas and centroids. A better fit is generally obtained by allowing the ellipsoid's main axis to be tilted with respect to the scan's main axis. The fitted model then predicts that the area of the intersection between the ellipsoid and the slice is a parabola as a function of the slice's z coordinate: A(z) = az² + bz + c, with a < 0 (the area must decrease from a positive value). After fitting A(z) to the data, at the slice z coordinate that predicts an area of zero, we create a new surface vertex (its x and y coordinates may be predicted with the centroid data) and link it with triangles to all edges of the last contour: the surface is thus capped. This operation is illustrated in Fig. 6.16.
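A minimal sketch of the area extrapolation (ours; it fits the parabola by least squares with NumPy and returns the z where the predicted area vanishes beyond the last slice):

import numpy as np

def cap_z(zs, areas):
    """Fit A(z) = a z^2 + b z + c to measured contour areas and return the
    z coordinate beyond the last slice where the predicted area reaches 0."""
    a, b, c = np.polyfit(zs, areas, 2)       # least-squares parabola, a < 0
    roots = np.roots([a, b, c])              # the two zeros of A(z)
    # Keep the real root lying beyond the last measured slice.
    direction = np.sign(zs[-1] - zs[0])
    candidates = [r.real for r in roots if abs(r.imag) < 1e-9
                  and (r.real - zs[-1]) * direction > 0]
    return min(candidates, key=lambda r: abs(r - zs[-1]))

print(cap_z([0.0, 1.0, 2.0], [900.0, 850.0, 700.0]))   # about z = 4.24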
6.5 Some topological issues in deformable surfaces

In previous sections we provided the complete details of methods for extracting surfaces from segmented volume data (Section 6.3) and from a collection of



contours that were segmented from two-dimensional image data (Section 6.4). We
thus assumed that the processes of segmentation and surface extraction were decoupled. This is often the case, reflecting the organization of the present handbook
comprising separate chapters for segmentation (Chapter 2) and surface extraction
(present chapter).
In several situations, however, it is useful to combine the two processes. Three-dimensional deformable models, which are covered in a more comprehensive fashion in Chapter 3, allow us to do just that. With deformable surface models the
characteristics of the surface (continuity, curvature) are used to guide the segmentation process by bridging areas where the image data is ambiguous, or by ignoring
spurious data. In this section we study how the type of representation selected
for the deformable surface (see Section 6.2) affects the result and particularly its
topology.
The first method that we present uses tensor-product B-splines for representing
the surface with a fixed topology. It is described in more detail in [18].
6.5.1 Tensor-product B-splines

B-spline basis functions are piecewise polynomials with a finite support and a recursive definition. The spline basis functions of degree zero are the characteristic functions of the variable u for the intervals between real values u_i called knots:

B_{i,0}(u) = 1 if u_i <= u < u_{i+1}, and 0 otherwise.

Higher-order splines are formed by convolving lower-order splines; equivalently, they obey the recursion

B_{i,k}(u) = ((u - u_i)/(u_{i+k} - u_i)) B_{i,k-1}(u) + ((u_{i+k+1} - u)/(u_{i+k+1} - u_{i+1})) B_{i+1,k-1}(u).    (6.6)
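A sketch of this recursion in code (ours; the uniform knot vector below matches the uniform quadratic functions plotted in Fig. 6.17):

def bspline_basis(i, k, u, knots):
    """Degree-k B-spline basis function B_{i,k} over the given knot
    vector, evaluated at u, via the recursion of Eq. (6.6)."""
    if k == 0:
        return 1.0 if knots[i] <= u < knots[i + 1] else 0.0
    left = right = 0.0
    if knots[i + k] > knots[i]:
        left = (u - knots[i]) / (knots[i + k] - knots[i]) \
               * bspline_basis(i, k - 1, u, knots)
    if knots[i + k + 1] > knots[i + 1]:
        right = (knots[i + k + 1] - u) / (knots[i + k + 1] - knots[i + 1]) \
                * bspline_basis(i + 1, k - 1, u, knots)
    return left + right

# Uniform quadratic basis functions sum to one wherever enough of them
# overlap (partition of unity).
knots = list(range(10))
print(sum(bspline_basis(i, 2, 4.5, knots) for i in range(7)))   # 1.0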

The B_{i,k} functions have degree k. They are globally C^{k-1}. The evaluation of (6.6) is especially efficient, i.e., it can be computed with divided differences, and can be used to implement splines of all orders. We plot in Fig. 6.17 a quadratic B-spline function. In order to model a curve, we associate a control vertex with each function. With the shape of the functions in Fig. 6.17(a), endpoints are interpolated. It is also possible to obtain a closed curve, with the help of functions as in Fig. 6.17(b).

Cubic spline functions (degree k = 3) are particularly useful because of their property of minimizing the bending energy (see for instance [88, 89]).

In order to model a surface patch S, we construct a tensor product of spline functions, i.e., a surface point S(u, v) will be written as S(u, v) = Σ_i Σ_j V_ij B_i(u) B_j(v), where the V_ij are control vertices and where we have fixed and suppressed the degree k.
When using the B-spline functions of the type of Fig. 6.17(a), if none of the control vertices is repeated, the surface has a planar topology. When using the B-spline functions of the type of Fig. 6.17(b) for one of the variables (say u), setting
Figure 6.17: B-spline functions. (a) Uniform quadratic singular B-spline functions. If a control point is attached to each function, a C¹ curve that interpolates endpoints is obtained. (b) Uniform quadratic periodic splines. The two last control points are set equal to the two first in order to obtain a closed C¹ curve.

the last k control points along u equal to the k first for each row of control points, we obtain a cylindrical topology. When using the B-spline functions of the type of Fig. 6.17(b) for the two variables (u, v) and setting the first k control points equal to the last k for each row and each column of control points, a toroidal topology is obtained.

The spherical topology is slightly more difficult to obtain than the planar, cylindrical, and toroidal topologies. The control vertices may be organized along an even number of meridians, aligned along one parametric direction. As in the toroidal case, B-spline functions of the type of Fig. 6.17(b) are used for both parametric directions. Control points should be repeated, and one must distinguish between odd-order and even-order splines. For instance, in the case of biquadratic B-splines illustrated in Fig. 6.19, each two terminal control vertices on a meridian are set equal to the two first control vertices on the opposite meridian (which is why the number of meridians must be even).

When using one B-spline surface patch, only the above-mentioned four different topologies may be obtained (planar, cylindrical, toroidal, and spherical). This is a significant limitation for a large number of anatomical structures.
In order to deform the surfaces, the same mechanisms that were developed for snakes may be used [76]. An energy, compounding surface tension and bending with external forces defined using the data to be segmented (often depending upon the gradient of the image data), is minimized. As the energy term is generally
too complex to be minimized in closed form (it depends upon local voxel values
and their derivatives), the minimization is generally performed using an iterative
process, for instance an iterative solution of a partial differential equation, or a
sequence of least-square approximations [18]. Each step of the minimization corresponds to a new position for the surface. Such an iterative deformation process

is illustrated in Fig. 6.18. An initial B-spline surface approximating a cylinder is
positioned inside a three-dimensional volume of MRI data. Selected slices of the
MRI data set are shown in Fig. 6.18(a). The surface then evolves and converges to
approximate the epidermis [Figs. 6.18(b)-(d)]. A more detailed study of deformable
models and their convergence is available in Chapter 3.
Splines of order 3 and higher are particularly useful for computing surface curvatures. The ability to compute surface curvatures is one of the justifications for
using smooth surface models (see also Cohen et al. [90]). Curvatures are visualized
in Figs. 6.18 and 6.20. In order to perform these curvature computations, since the spline surfaces are of the form S(u, v) with derivatives with respect to u and v easily available, it is possible to use standard formulae. These formulae can be found in textbooks of differential geometry [91] or in [92]. Recent work has demonstrated that high-curvature features are very useful for registration purposes [93, 94]. However, several methods also exist for computing curvatures on triangular meshes [53, 54].
6.5.2 Dynamic changes of topology

Few methods using curved surface models and allowing dynamic changes of
topology have been proposed. The method of Leitner and Cinquin [95] does so
using cubic B-spline surfaces. This method starts with a spherical topology. A
hole is created (evolving to a torus) if the method detects a self-intersection of
the surface at some point. To create more complex topologies, several smoothly
connected B-spline patches are used.
The study of recent work indicates that when using a triangular mesh to represent a surface, there is more flexibility in changing the topology. Lachaud and
Montanvert [96] propose a method in which the topology can be changed at each iteration. After the vertices have been moved using rules similar to other deformable
models, simple proximity constraints determine whether vertices should be created
or removed or whether a connection between surface bodies should be created or
removed. Figure 6.21 illustrates the operation of the method of [96] for extracting
brain vessels from an MRI scan.
McInerney and Terzopoulos [97] tessellate volume space using a tetrahedral
mesh. After each evolution of the surface, the vertices that are crossed in the mesh
are marked, and the surface is retiled (from the tetrahedral mesh data) as explained
in Section 6.3. Owing to the mechanism used for marking the tetrahedral mesh
vertices, the surface can only expand everywhere or retract everywhere.
6.6 Optimization

The surface models, either spline or polygonal, that were constructed in the
previous sections may benefit from various optimizations. During iso-surface construction or tiling from contours for instance, the difference between intraslice and
interslice sampling ratios may create artifacts, which can be purely visual (such as
a staircase effect) or which can affect further processing such as registration.


Figure 6.18: Segmenting an MRI scan using a B-spline deformable surface model. (a) MRI slices used as input to the B-spline deformable model. (b)-(d) Successive iterations of the model with regions of higher curvature highlighted. Dark curves are crest lines, extracted with an algorithm described in [18].


Figure 6.19: Network of control vertices for a biquadratic spline surface with spherical topology. Each pair of opposite meridians share three vertices at each pole (north, south), including the pole vertex itself.

6.6.1 Smoothing

We describe here a method for smoothing polygonal surfaces that was developed by Taubin [98]. A straightforward technique for smoothing a polygonal mesh consists of replacing each vertex with an average of its neighbor vertices. Taubin noted that this technique resulted in shrinkage for the objects bounded by the polygonal mesh. His solution involves two steps. The set of vertices connected to a vertex v_i by an edge is denoted N(i). For each of the two steps, a vertex v_i is updated according to the following equation:

v_i ← v_i + (λ or μ) Σ_{j∈N(i)} w_ij (v_j - v_i)    (6.7)

where the weights w_ij sum to one, the scale factor λ is positive and applied during the first step, and the scale factor μ is negative and applied during the second step. Note that for a given step the displacements are first computed for each vertex and then applied all together at the same time. (Otherwise the displacement applied to a particular vertex would influence the displacements applied to its neighbors.)

The first step of this procedure thus corresponds to applying the traditional neighbor-averaging step (albeit using a scale factor). While it has been observed that this first step yields shrinkage, the second step moves each vertex in the opposite direction, thereby compensating for the shrinkage effect.


Figure 6.20: Curvature of a tensor-product spline surface. Maximum surface principal curvature represented using a colormap that varies from blue to red. (For a color version of this
Figure see Plate 3 in the color section of this book.)

Taubin has provided a detailed mathematical justification of this process in [98], based on an application of signal processing. The process of mapping each vertex $v_i$ to $\sum_{j \in N(i)} w_{ij}\,(v_j - v_i)$ (the equivalent of a Laplacian operator) can be expressed by multiplying a column vector containing the surface vertices by a (sparse) matrix $K$ built from the weights $w_{ij}$. Eigenvalues of this matrix may be considered as frequencies and eigenvectors as modes of the triangular mesh. Depending upon the distribution of these frequencies, a given triangular mesh can be considered to be smooth, or smoother than another mesh. A smoothing step is performed using the matrix $(I - \mu K)(I - \lambda K)$. If $k$ is an eigenvalue of $K$, then $(1 - \mu k)(1 - \lambda k)$ is an eigenvalue of the matrix $(I - \mu K)(I - \lambda K)$. Hence $f(k) = (1 - \mu k)(1 - \lambda k)$ represents a transfer function. The pair $(\lambda, \mu)$ can be set to create various types of filters, such as a low-pass filter.

Taubin has proposed various suitable combinations for the $(\lambda, \mu)$ scaling factors; a combination is valid, for instance, when the positive $\lambda$ is paired with a negative $\mu$ of slightly larger magnitude ($0 < \lambda < -\mu$). Another parameter of the method is the number of iterations of the process, or smoothing steps.
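As a brief numerical illustration of the transfer-function view (a sketch, not the text's own example: the pass-band relation $k_{PB} = 1/\lambda + 1/\mu$ follows from solving $f(k_{PB}) = 1$, while the numeric values below are merely plausible choices):

    def transfer(k, lam, mu):
        # Gain applied to mesh "frequency" k by one two-step pass.
        return (1.0 - mu * k) * (1.0 - lam * k)

    def mu_for_passband(lam, k_pb):
        # Solve f(k_pb) = 1, i.e., 1/lam + 1/mu = k_pb, for mu.
        return 1.0 / (k_pb - 1.0 / lam)

    lam = 0.33
    mu = mu_for_passband(lam, k_pb=0.1)   # approximately -0.34
    print(transfer(0.05, lam, mu))        # close to 1: preserved
    print(transfer(1.5, lam, mu))         # below 1: attenuated

Repeating the pass n times raises the transfer function to the nth power, so frequencies with |f(k)| < 1 are progressively suppressed while the pass band is preserved.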

Figure 6.21: Extracting brain vessels from an MR scan using an adaptive topology deformable model represented as a triangular mesh. The model evolves inside an image pyramid; several iterations are performed for a given level [96]: (a) level 3 of the image pyramid; (b) level 1; (c) level 0. (Scan courtesy of UMDS Group, London. Images courtesy of J. O. Lachaud and A. Montanvert.)

Figure 6.22: Illustration of surface smoothing using the method of [98]: (a) before smoothing; (b) after smoothing (200 smoothing steps).

Figure 6.22 illustrates the operation of this method on a femur model comprising 180,854 triangles, extracted from a CT scan. The spacing between slices of the CT data is about six times the size of a pixel within a slice. The staircase effect resulting from this difference in sampling is visible in Fig. 6.22(a); the result of smoothing is shown in Fig. 6.22(b).
The smoothing affects the vertex coordinates and, thus, the accuracy of the model. For applications in which geometric accuracy cannot be compromised, the difference between the surfaces before and after smoothing may be tracked using the methods of Section 6.6.2.
Smoothing techniques are not limited to the ones presented here. For instance, a newer and probably more complex technique than Taubin's was recently introduced in [99].
6.6.2 Simplification and levels of detail

Simplification deals with approximating a polygonal (spline) surface using fewer polygons (spline patches). This is important for processing, visualizing, and transmitting datasets that are too large for available computing and networking equipment.
Polygonal surfaces that result from iso-surface extraction (see Section 6.3) are
generally over-tessellated and can greatly benefit from polygon reduction techniques without compromising accuracy.
Also, simplification techniques can be used for generating multiresolution hierarchies comprising several levels of detail. These hierarchies are very useful for interactive visualization applications, allowing interactive selection of the most appropriate level of detail for each structure, adapted to the parameters of the visualization. Figure 6.23 illustrates six surface structures of the human torso at three different levels of detail. For this application, the simplification process should preserve the visual appearance of the surfaces as much as possible.
A technique for reducing the number of patches in a spline model was proposed by Gopi and Manocha [100]. This method merges adjacent triangular Bézier patches according to several patterns, to form larger triangular Bézier patches.
To guarantee the accuracy of the approximations, and the faithfulness of visualizations, it is very useful to be able to bound the deviation between the original model and the approximation. This operation is difficult because the maximum distance between two polygonal meshes is generally not attained at a pair of vertices. Methods for bounding the maximum surface deviation during simplification are described in [101–105]. Surface simplification methods are reviewed in [84]; additional references include [106–122].
We next focus on the variable tolerance method, which is described fully in [84] (an earlier description appeared in [123]). This method, along with several of the methods referenced above, relies on the atomic operation represented in Fig. 6.24 to reduce the resolution of the surface: the edge contraction brings together the two endpoints of an edge, thereby eliminating one vertex and two triangles.
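The sketch below illustrates this atomic operation on an indexed triangle mesh (a hypothetical list-based representation; the midpoint placement of the merged vertex is a simplifying assumption, not necessarily the choice made in [84]):

    def contract_edge(vertices, triangles, a, b):
        # Collapse edge (a, b): merge vertex b into vertex a, placed at
        # the edge midpoint, and discard the triangles sharing the edge.
        vertices[a] = 0.5 * (vertices[a] + vertices[b])
        result = []
        for tri in triangles:
            if a in tri and b in tri:
                continue  # this triangle degenerates: eliminate it
            # redirect references to b toward the surviving vertex a
            # (vertex b remains stored but is no longer referenced)
            result.append(tuple(a if v == b else v for v in tri))
        return vertices, result

A practical simplifier would also verify that the contraction neither flips triangles nor changes the surface topology before accepting it.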
The variable tolerance method further relies on two main processes. The first process, called the subdivision process, measures the incremental deviation between two corresponding portions of a polygonal surface before and after a simplification operation (i.e., an edge contraction, although this would also work with other operations, such as vertex or triangle removals [124]). Given the two graphs representing the two surface portions, the subdivision process builds subdivisions of both surfaces into piecewise planar polygons such that there is a one-to-one correspondence between the elements and a direct comparison may be performed. Such subdivisions are shown in Fig. 6.25. For each corresponding pair of planar polygons, the maximum distance between the two must occur for a pair of corresponding vertices of the polygons. The error bounding process described below uses this information to keep track of a bound on the deviation between the simplified surface and the original surface.
In practice, the subdivision process only generates pairs of corresponding vertices. As illustrated in Fig. 6.25, for each vertex of one graph that does not appear in the other graph, a representative must be found in the other graph. This is done by projecting the vertex onto the closest triangle. Pairs of edges belonging to different graphs must also be tested for potential bridges, which occur when the shortest segment that can be drawn between the two edges has its endpoints inside the edges; this is the case for one pair of edges in Fig. 6.25. The full detail of this process is described in [84].
The second process, or error bounding process, is used for keeping track of the deviation after an arbitrary number of simplification operations. The error bounding process uses an error volume for reporting an error at each surface point (not only at each vertex).

Figure 6.23: Simplification of 6 surfaces (lung, external, tumor, spinal cord, vertebrae, bolus) extracted and visualized using a clinical radiotherapy visualization system: (a) high resolution: 46,469 vertices; (b) low resolution: 4,688 vertices; (c) very low resolution: 1,811 vertices. Here, the difference between the surfaces can hardly be seen, which is the ideal outcome of surface simplification. The simplification method of [84] was used. (Courtesy of Mike Zeleznik, RAHD Oncology Products.) (For a color version of this Figure see Plate 4 in the color section of this book.)

Figure 6.24: An edge contraction consists of bringing together the two endpoints of an edge until they become a single vertex.

Figure 6.25: Graphs built by the subdivision process. For each vertex of one graph that does not appear in the other graph, a representative (one of the vertices labeled w in the figure) must be inserted in the other graph; this is done by projecting the vertex onto the closest triangle of the other graph. In addition, for each pair of edges, a test determines whether the shortest segment that can be drawn between the two edges has its endpoints inside the edges. When this is the case, both segment endpoints are inserted in their respective graphs. (For a color version of this Figure see Plate 5 in the color section of this book.)

The error volume is defined by sweeping a ball across the simplified surface. The radius of the ball varies linearly over a surface triangle, interpolating the ball radii at the triangle vertices. To specify the error volume, only one floating-point value is necessary for each vertex of the simplified surface, representing the radius of the ball, or error value, at that vertex. The error volume is built gradually as the simplification progresses, containing the previous error volumes in a manner similar to Russian dolls. The properties of the error volume are summarized in Figs. 6.26(a) and (b).
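The linear variation of the radius can be made explicit with a short sketch (hypothetical names; the barycentric evaluation below is simply the standard way to interpolate linearly over a triangle):

    import numpy as np

    def ball_radius_at(p, corners, radii):
        # Radius of the swept ball at point p of a triangle with corner
        # positions `corners` and per-corner error values `radii`.
        a, b, c = corners
        m = np.column_stack((b - a, c - a))
        # barycentric coordinates of p (least squares handles 3D points)
        u, v = np.linalg.lstsq(m, p - a, rcond=None)[0]
        return (1.0 - u - v) * radii[0] + u * radii[1] + v * radii[2]

Because the radius is determined everywhere by the three corner values, storing one error value per vertex indeed specifies the entire error volume.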
The objective of the error bounding process is to compute error values at the vertices of the surface, representing a bound on the deviation between the original and simplified surfaces. Referring to Figs. 6.24 and 6.25, the simplest strategy is to determine an error value at the vertex resulting from the contraction, defining a valid error volume, and to leave all other error values unchanged. (Only one floating-point value must be computed.) Other possible objectives are discussed in [84].
Each pair of corresponding vertices in the two graphs defined above allows us to define a constraint on the error values: the error volume should be at least as wide at this location as the distance between the two corresponding vertices, plus the width of the previous error volume (before the edge contraction occurred). Each constraint is thus represented by an error value and a vector whose origin lies on the surface before the edge contraction and whose destination lies on the surface after the edge contraction. The origin of the vector is a linear combination, with affine weights, of the contracted vertex and a point of an incident triangle (on the edge opposite that vertex). To each constraint we then associate its vector and a ball, centered at the tip of the vector, whose radius encodes the error of the constraint, as shown in Fig. 6.27(c).
We then consider the set of vectors and balls corresponding to all the constraints, as illustrated in Fig. 6.27(d). The error value may be computed by determining the smallest ball centered at the origin that encloses this set of balls. (A ball of negative radius is not required to be enclosed; the enclosing ball must merely contain at least one of its points, potentially a single point where the two balls are tangent.)
By allowing the center to move, we can further reduce the radius of the enclosing ball. Given an initial ball enclosing a set of balls, we wish to obtain a smaller enclosing ball. We are not particularly interested in the smallest possible ball, because the constraints become stale after the vertex is moved. A simple solution is to construct the bounding box of the set of balls, take its center, and determine the ball of smallest radius with that center.
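A minimal sketch of this simple solution, assuming each constraint ball is given by a center and a nonnegative radius (the negative-radius refinement described above is omitted):

    import numpy as np

    def enclosing_ball(centers, radii):
        # Center the ball at the center of the axis-aligned bounding box
        # of the constraint balls, then take the smallest radius that
        # encloses all of them from that fixed center.
        centers = np.asarray(centers, dtype=float)
        radii = np.asarray(radii, dtype=float)
        lo = (centers - radii[:, None]).min(axis=0)
        hi = (centers + radii[:, None]).max(axis=0)
        c = 0.5 * (lo + hi)
        r = (np.linalg.norm(centers - c, axis=1) + radii).max()
        return c, r

The bounding-box center is cheap to compute and, unlike the exact smallest enclosing ball, is good enough here since, as noted above, the constraints become stale once the center moves.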
The method can be extended for preservation of data attached to surface vertices, assuming a linear variation across triangles. This is illustrated by the example
of Fig. 6.30.
6.6.2.1 Examples

The example of Fig. 6.23 is a visualization of 6 surfaces of a human torso at three levels of detail: high resolution, low resolution (10-fold reduction factor), and very low resolution (25-fold reduction factor).

Figure 6.26: Properties of the error volume. The error volume is defined by sweeping a ball across the simplified surface; the radius of the ball varies linearly over a surface triangle, interpolating the ball radii at the triangle vertices. (a) Error volume, in green (A1), centered on the simplified surface (A2). The radii of the balls are such that the original surface (A3 or A4) is not only contained in the error volume, in dashed red (A3: incorrect), but also intersects all the spheres, in blue (A4: correct). (b) After several edge contractions, the resulting error volume contains all the intermediate error volumes. (For a color version of this Figure see Plate 6 in the color section of this book.)

The difference between the surfaces can hardly be seen in Fig. 6.23, which is a very desirable outcome of surface simplification. Statistics about the simplification of these surfaces are provided in Table 6.2. The availability of several levels of detail allows for real-time interaction with the surfaces on a computer display.
Table 6.2: Vertex counts and timings (CPU seconds, measured on a DEC Alpha) for the level-of-detail computations on the 6 structures of the human torso dataset.