Beruflich Dokumente
Kultur Dokumente
1
of shape and texture variation and the optimization performed on the true shapes according to the deni-
of the model. Finally, an improved method for auto- tion. As shape metric in the alignment the Procrustes
mated initialization of AAMs is devised. distance [13] is used. Other shape metrics such as the
For a commented pictorial elaboration on the sections Hausdor distance [14] could also be considered.
below including the alignment process refer to Consequently it is assumed that the set of N shapes
appendix A. constitutes some ellipsoid structure of which the cen-
troid the mean shape can be estimated as:
2.1 Shape & Landmarks N
X
x = N1 xi (2)
The rst matter to clarify is: What do we actually i=1
understand by the term shape? This paper will adapt
the denition by D.G. Kendall [8]: The maximum likelihood estimate of the covariance
matrix can thus be given as:
Denition 1 : Shape is all the geometri-
cal information that remains when location, N
X
scale and rotational eects are ltered out = N1 ( xi ; x)(xi ; x)T (3)
from an object. i=1
2
close resemblance of some of the AAM techniques to
techniques in computer graphics. b = Qc (9)
In computer graphics, the term texture relates di-
rectly to the pixels mapped upon virtual 2D and 3D The PCA scores are easily obtained due to the linear
surfaces. Thus, we derive the following denition: nature of the model:
Denition 3: Texture is the pixel inten- b= Wsbs = WsT Ts (x ; x) (10)
sities across the object in question (if neces- bg g (g ; g)
sary after a suitable normalization).
where a suitable weighting between pixel distances
A vector is chosen, as the mathematical representa- and pixel intensities is obtained through the diagonal
tion of texture, where m denotes the number of pixel matrix Ws . An alternative approach is to perform
samples over the object surface: the two initial PCAs based on the correlation matrix
as opposed to the covariance matrix.
Now using simple linear algebra a complete model
g = (g1 ; : : : ; gm)T (7) instance including shape, x and texture, g, is gener-
ated using the c-model parameters.
In the shape case, the data acquisition is straightfor-
ward because the landmarks in the shape vector con-
stitute the data itself. In the texture case one needs a x = x + sWs;1Qs c (11)
consistent method for collecting the texture informa-
tion between the landmarks, i.e. an image warping
function needs to be established. This can be done g = g + g Qg c (12)
in several ways. Here, a piece-wise ane warp based
on the Delaunay triangulation of the mean shape is Regarding the compression of the model parameters,
used. Another, theoretically better, approach might one should notice that the rank of Q will never exceed
be to use thin-plate splines as proposed by Bookstein the number of examples in the training set.
[1]. For details on the Delaunay triangulation and
image warping refer to [11, 21]. Observe that another feasible method to obtain the
Following the warp sampling of pixels, a photometric combined model is to concatenate both shape points
normalization of the g-vectors of the training set is and texture information into one observation vector
done to avoid the inuence from global linear changes from the start and then perform PCA on the correla-
in pixel intensities. Hereafter, the analysis is identical tion matrix of these observations.
to that of the shapes. Hence a compact PCA repre-
sentation is derived to deform the texture in a manner 2.4 Optimization
similar to what is observed in the training set:
In AAMs the search is treated as an optimization
g = g + g bg (8) problem in which the dierence between the synthe-
sized object delivered by the AAM and an actual im-
Where g is the mean texture; g represents the eigen- age is to be minimized.
vectors of the covariance matrix and nally bg are the In this way by adjusting the AAM-parameters (c and
modal texture deformation parameters. pose) the model can deform to t the image in the
Notice that there will always be far more dimensions best possible way.
in the samples than observations thus leading to rank Though we have seen that the parameterization of the
deciency in the covariance matrix. Hence, to ef- object class in question can be compacted markedly
ciently compute the eigenvectors of the covariance by the principal component analysis it is far from an
matrix one must reduce the problem through use of easy task to optimize the system. This is not only
the Eckart-Young theorem. Consult [5, 22] or a text- computationally cumbersome but also theoretically
book in statistics for the details. challenging optimization theory-wise since it is
not guaranteed that the search-hyperspace is smooth
2.3.1 Combined Model Formulation and convex.
However, AAMs circumvent these potential problems
To remove correlation between shape and texture in a rather untraditional fashion. The key observation
model parameters and to make the model repre- is that each model search constitutes what we call a
sentation more compact a 3rd PCA is performed prototype search the search path and the optimal
on the shape and texture PCA scores of the training model parameters are unique in each search where the
set, b to obtain the combined model parameters, c: nal model conguration matches this conguration.
3
These prototype searches can be performed at model observed texture dierences, gure 1 shows the actual
building time; thus saving the computationally ex- and the mean predicted displacement from a number
pensive high-dimensional optimization. Below is de- of displacements. The error bars correspond to one
scribed how to collect these prototype searches and standard deviation.
how to utilizes them into a run-time ecient model Hence, the optimization is performed as a set of it-
search of an image. erations, where the linear model, in each iteration,
It should be noticed that the Active Blobs approach predicts a set of changes in the pose and model pa-
is optimized using a method quite similar to that of rameters leading to a better model to image t. Con-
AAMs named dierence decomposition as introduced vergence is declared when an error measure is below
by Gleicher [12]. a suitable threshold.
As error measure, we use the squared L2 norm of the
2.4.1 Solving Parameter Optimization O- texture dierence, jgj2 . To gain a higher degree of
line robustness, one might consider using the Mahalanobis
distance or a robust norm such as the Lorentzian error
It is proposed that the spatial pattern in I can pre- norm [20]. Fitness functions allowing for global non-
dict the needed adjustments in the model and pose linear transformations such as the mutual information
parameters to minimize the dierence between the [23, 24] measure might also be considered.
synthesized object delivered by the AAM and an ac-
tual image, I:
2.5 Initialization
I = Iimage ; Imodel (13) The optimization scheme described above is inher-
The simplest model we can arrive at constitutes a ently sensitive to a good initialization. To accommo-
linear relationship: date this, we devise the following search-based scheme
thus making the use of AAMs fully automated. The
technique is somewhat inspired by the work of Cootes
c = RI (14) et al. [9].
To determine a suitable R in equation (14), a set of The fact that the AAMs are self-contained or gener-
experiments are conducted, the results of which are ative is exploited in the initialization i.e. they can
fed into a multivariate linear regression using prin- fully synthesize (near) photo-realistic objects of the
cipal component regression due to the dimensional- class that they represent with regard to shape and
ity of the texture vectors. Each experiment displaces textural appearance. Hence, the model, without any
the parameters in question by a known amount and additional data, is used to perform the initialization.
measuring the dierence between the model and the The idea is to use the inherent properties of the AAM-
image-part covered by the model. optimization i.e. convergence within some range
from the optimum. This is utilized to narrow an ex-
haustive search from a dense to sparse population of
Quality of y−displacement prediction.
20
5
This has proven to be both feasible and robust. A set
predicted dy (pixels)
4
by using more elaborate error measures. In pseudo- 4 Experimental Results
code, the initialization scheme for detecting one ob-
ject per image is: Segmentation in medical images has always posed a
dicult problem due to the special image modalities
1. Set emin = 1 and m to a suitable low number (CT, MRI, PET etc.) and the large biological vari-
(we use m = 3) ability. To assess the performance of AAMs in such
2. Obtain application specic search ranges within environments, our implementation has been tested
each parameter (e.g. ; c1 etc.) upon radiographs of human hands and cardiac MRIs.
3. Populate the space spanned by the ranges as
sparsely as the linear regression allows by a set 4.1 Radiographs of Metacarpals
of sampling vectors V = fv1 ; : : : ; vn g.
4. For each vector in V Segmentation in radiographs (x-rays) pose a dicult
problem due to large shape variability and inherent
5. Do AAM optimization (max m iterations) ambiguity of radiographs. This forms a suitable chal-
6. Calulate the t, efit , as given by (15) lenge. Other attempts to perform segmentation in
radiographs include the work of Eord [10], where
7. If efit < emin Then emin = efit , vfit = vn ASMs and other methods were used.
8. End Twenty radiographs of the human hand were obtained
and three metacarpal bones were annotated using 50
The vector vfit will now hold the initial conguration. points on each. The annotation of metacarpals 2,3
Notice that the application specic search ranges in and 4 were concatenated into a 150-point model. To
step 2 is merely a help to increase initialization time incorporate a more substantial texture contrast into
and robustness than it is a requirement. If nothing the model, additional 150 points were placed along
is known beforehand, step 2 is eliminated and an ex- the normal of each model point, thus arriving at a
haustive search is performed. 300-point shape model. The texture model consisted
of approx. 10.000 pixels. Using 16 model param-
This approach can be accelerated substantially by eters, 95% of the variation in the training set were
searching in a multi-resolution (pyramidal) represen- explained.
tation of the image. Automated initialization of the model followed by op-
timization reached a mean location accuracy of 0.98
3 Implementation pixels (point to associated border [2, 4]) when testing
on four unseen images with ground truth annotations.
The mean texture error was approx. 7 gray levels (in-
All experiments, illustrations etc. have been made us- put images were in the byte range).
ing the Active Appearance Models Application Pro- Examples of initialization and optimization are given
grammers Interface (AAM-API) developed in the in the gures 2-5. Notice a fairly good t even in
C++ language and based on the Windows NT plat- the distal (upper) and proximal (lower) end of the
form. This API will be released under the open source metacarpals where radiographs are rather ambiguous.
initiative in Autumn 20001, which means that other
freely can download, use and elaborate on the AAM- To asses the performance within points, the mean
API. point to point distance is plotted in gure 6. Not
The foundation for the AAM-API is the Intel Math surprisingly, problems arise in the distal and proximal
Kernel Library for fast MMX implementation of end of the metacarpals due the large shape variability
BLAS, MS VisionSDK for image handling, Im- and the ambiguous nature of radiographs in regions
ageMagick for image I/O and nally DIVA for image of overlap.
processing and matrix handling.
AAM-API performance compares to that of Cootes 4.2 Cardiac MRIs
et al. As an example, the MRI optimizations took
each 200 ms on average on a PII 350 MHz. Much Another application in medical imaging is locating
eort has been put into providing documentation and structures of 4D (space, time) cardiovascular mag-
educational features such as movies of the modes of netic resonance images. Temporal registration rel-
variation, model search etc. ative to the heart cycle has been done using ECG-
Further info on AAMs, the AAM-API and full source triggered image acquisition. The pixel depth was 8
code documentation can be obtained at the AAM- bits.
site.2 An AAM has been built upon only four spatially
1Probably under the GNU Public License. and temporally corresponding 2D slices of four dif-
2 http://www.imm.dtu.dk/aam/ ferent hearts. The endocardial and epicardial con-
5
Figure 2: Model border after automated initialization
(cropped). Figure 5: Optimized AAM (cropped).
6
ously been done by Mitchell et al. [19]. A total of
102 images were used for the training set reaching a
mean point accuracy of approx. 1 pixel on the endo-
cardial and epicardial contour. Annotated structures
were the right ventricle and endocardial and epicar-
dial contours. The model was initialized manually.
7
programming. Furthermore, we have experienced the [9] G. J. Edwards, T. F. Cootes, and C. J. Taylor. Ad-
AAM approach to be data-driven (non-parametric), vances in active appearance models. In Proc. Int.
self-contained and fast. We also notice that AAMs Conf. on Computer Vision, pages 137142, 1999.
extend to multivariate imaging and higher spatial di- [10] N. D. Eord. Knowledge-Based Segmentation and
mensions - i.e. into 3D models etc. Feature analysis of Hand Wrist Radiographs. Tech.
Report, University of Leeds, 1994.
More information on AAMs, papers, presentations,
movies of model searches etc. can be obtained at [11] C. A. Glasbey and K. V. Mardia. A review of
the AAM-site at http://www.imm.dtu.dk/aam/. image-warping methods. Journal of Applied Statis-
A more comprehensive treatment of the work of tics, 25(2):155172, 1998.
M. B. Stegmann can be found in [22]. [12] M. Gleicher. Projective registration with dierence
decomposition. In Proc. 1997 Conf. on Computer Vi-
sion and Pattern Recognition, pages 331337. IEEE
Comput. Soc, 1997.
6 Acknowledgements [13] C. Goodall. Procrustes methods in the statistical
analysis of shape. Journal of the Royal Statistical
Cardiac MRIs were provided and annotated by M.D. Society, Series B, 53:285339, 1991.
Jens Christian Nilsson and M.D. Bjørn A. Grønning, [14] D. P. Huttenlocher, G. A. Klanderman, and W. J.
H:S Hvidovre Hospital. M.Sc.Eng. Torben Lund pro- Rucklidge. Comparing images using the Hausdor
vided the annotation tool. distance. IEEE Trans. on Pattern Analysis and Ma-
chine Intelligence, 15(9):850863, 1993.
Dr. Rasmus Larsen, M.D. Jens Christian Nilsson and
M.D. Bjørn A. Grønning is acknowledged for their [15] J. Isidoro and S. Sclaro. Active voodoo dolls:
very useful comments on the manuscript. a vision based input device for nonrigid control.
In Proc. Computer Animation '98, pages 137143.
The work of M. B. Stegmann is in part supported by IEEE Comput. Soc, 1998.
a grant from Pronosco A/S. [16] A. K. Jain, Y. Zhong, and M.-P. Dubuisson-Jolly.
Deformable template models: A review. Signal Pro-
cessing, 71(2):109129, 1998.
References [17] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Ac-
tive contour models. International Journal of Com-
[1] F. L. Bookstein. Principal warps: thin-plate splines puter Vision, 8(2):321331, 1988.
and the decomposition of deformations. IEEE Trans- [18] T. McInerney and D. Terzopoulos. Deformable mod-
actions on Pattern Analysis and Machine Intelli- els in medical image analysis: a survey. Medical Im-
gence, 11(6):56785, 1989. age Analysis, 2(1):91108, 1996.
[2] T. F. Cootes, G. Edwards, and C. J. Taylor. A com- [19] S. Mitchell, B. Lelieveldt, R. Geest, J. Schaap,
parative evaluation of active appearance model algo- J. Reiber, and M. Sonka. Segmentation of cardiac
rithms. In BMVC 98. Proc.of the Ninth British Ma- mr images: An active appearance model approach. In
chine Vision Conf., volume 2, pages 680689. Univ. Medical Imaging 2000: Image Processing, San Diego
Southampton, 1998. CA, SPIE, volume 1. SPIE, 2000.
[3] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Ac- [20] S. Sclaro and J. Isidoro. Active blobs. Proc. of the
tive appearance models. In Proc. European Conf. IEEE ICCV, pages 11461153, 1998.
on Computer Vision, volume 2, pages 484498. [21] J.R. Shewchuk. Triangle: engineering a 2D qual-
Springer, 1998. ity mesh generator and Delaunay triangulator. In
[4] T. F. Cootes, G. J. Edwards, and C. J. Taylor. Applied Computational Geometry. FCRC'96 Work-
Comparing active shape models with active appear- shop., pages 203222. Springer-Verlag, 1996.
ance models. In Proc. British Machine Vision Conf., [22] M. B. Stegmann. Active appearance models: Theory
pages 173182, 1999. and cases. Master's thesis, Department of Mathe-
[5] T. F. Cootes and C. J Taylor. Statistical Mod- matical Modelling, Technical University of Denmark,
els of Appearance for Computer Vision. Tech. Re- Lyngby, 2000.
port, ed. 02-03-2000 , University of Manchester, [23] C. Studholme, D. L. G. Hill, and D. J. Hawkes.
http://www.isbe.man.ac.uk/bim/, 1999. An overlap invariant entropy measure of 3D medical
[6] T. F. Cootes and C. J. Taylor. Combining elastic and image alignment. Pattern Recognition, 32(1):7186,
statistical models of appearance variation. In ECCV 1999.
2000, Proc. European Conference on Computer Vi- [24] P. Viola and W. M. Wells III. Alignment by maxi-
sion, volume 1, pages 149163, 2000. mization of mutual information. International Jour-
[7] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Gra- nal of Computer Vision, 24(2):137154, 1997.
ham. Active shape models - their training and appli- [25] A. L. Yuille, P. W. Hallinan, and D. S. Cohen. Fea-
cation. Computer Vision and Image Understanding, ture extraction from faces using deformable tem-
61(1):3859, 1995. plates. International Journal of Computer Vision,
[8] I. L. Dryden and K. V. Mardia. Statistical Shape 8(2):99111, 1992.
Analysis. John Wiley & Sons, 1998.
8
A Illustrated Cardiac AAM
Figure 12: Point cloud of four unaligned heart cham- Figure 16: Point variation of the four annotations;
ber annotations. radius = x + y . Notice the large point variation to
the lower left.
120
20 40 60 80 100 120