

Using Octuplet Siamese Network For Osteoporosis Analysis On Dental Panoramic Radiographs

Conference Paper in Conference proceedings: ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society, July 2018
DOI: 10.1109/EMBC.2018.8512755


Peng Chu1, Chunjuan Bo2, Xin Liang3, Jie Yang4, Vasileios Megalooikonomou5, Fan Yang1, Bingyao Huang1, Xinyi Li1, and Haibin Ling1

Abstract: Dental panoramic radiography (DPR) images provide a potentially inexpensive source for evaluating bone density change through visual clue analysis of trabecular bone structure. However, dense overlapping of bone structures in DPR images and the scarcity of labeled samples make it challenging to learn an accurate mapping from DPR patches to osteoporosis condition. In this paper, we propose a deep Octuplet Siamese Network (OSN) to learn and fuse discriminative features for osteoporosis condition prediction using multiple DPR patches. By exploring common features, OSN uses patches from eight locations together to train the shared feature extractor. Feature fusion across locations adopts both accumulation and concatenation, fully considering the patches' spatial symmetry. In our dedicated two-stage fine-tuning scheme, an augmented texture analysis dataset is employed to prevent overfitting when transferring weights learned on ImageNet to the DPR dataset using merely 108 samples. Leave-one-out tests show that our proposed OSN outperforms all other state-of-the-art methods in the osteoporosis category classification task.

[Figure 1: DPR image with the eight ROIs numbered 1 to 8.]
Fig. 1. DPR image with eight ROIs. Same color represents the same group.

1 Peng Chu, Fan Yang, Bingyao Huang, Xinyi Li and Haibin Ling are with the Department of Computer and Information Sciences, Temple University, Philadelphia, USA.
2 Chunjuan Bo is with the College of Electromechanical Engineering, Dalian Nationalities University, Dalian, China.
3 Xin Liang is with the School of Stomatology, Dalian Medical University, Dalian, China.
4 Jie Yang is with the Division of Oral and Maxillofacial Radiology, School of Dentistry, Temple University, Philadelphia, USA.
5 Vasileios Megalooikonomou is with the Computer Engineering and Informatics Department, University of Patras, Patras, Greece.

I. INTRODUCTION
Osteoporosis is the most common bone disease, affecting millions of people every year. The reference-standard diagnosis of osteoporosis involves bone mineral density (BMD) measurement by dual-energy X-ray absorptiometry (DXA), which is expensive and has limited availability in the population. Recently, a growing number of works have demonstrated the feasibility of using relatively inexpensive panoramic radiography for BMD estimation. In [1], the relation between DXA measurements and several densitometric and linear measurements, including mandibular cortical thickness and panoramic mandibular index, is investigated. [2] and [3] verify the correlations of several panoramic radiomorphometric indices with lumbar spine and hip BMDs, and confirm the possibility of using mental index, mandibular cortical index and visual estimation of cortical as osteoporosis predictors. Image feature information of trabecular bone provides another promising source for osteoporosis evaluation. In [4], Parfitt et al. show that texture changes in iliac trabecular bone, such as surface texture, volume and thickness, can be used for osteoporosis prediction. Faber et al. [5] exploit Fourier and wavelet analysis to detect trabecular changes in osteoporosis. Tosoni et al. [6] compare the pixel intensity values and fractal dimensions in selected mandibular regions. In [7] and [8], correlations between various CBCT features and different gender-age groups are investigated, which implicitly relate to bone quality.
However, creating a direct and reliable mapping from trabecular panoramic radiography images to osteoporosis condition is challenging. Bo et al. [9] propose a pioneering two-stage SVM learning framework that combines information from multiple panoramic radiography patches for osteoporosis status classification. However, only general pre-defined feature descriptors are investigated in their work. Each pixel in a panoramic radiography image is the accumulation of X-ray response along the depth dimension, so the trabecular bone response usually overlaps with the response from other structures. Existing pre-defined feature descriptors work well for modeling low-level image clues. To remedy the overlap in panoramic radiography images, higher-level information is essential.
Recent deep learning methods, especially Convolutional Neural Network (CNN) architectures, have reported success in learning such hierarchical features. Successful training of a deep convolutional network (DCN) usually depends heavily on a huge amount of labeled data to find optimized values for the millions of weights in the DCN. Even for fine-tuning, enough samples are still required to prevent the powerful pre-trained model from overfitting to noise in the target dataset. However,

978-1-5386-3646-6/18/$31.00 ©2018 IEEE


[Figure 2: patches from the DPR image pass through patch augmentation and feature-extraction subnets that share all weights; feature fusion applies addition (+) followed by concatenation (c); a linear classifier outputs Normal or OP.]
Fig. 2. Overview of Octuplet Siamese Network.
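
The data flow in Fig. 2 can be illustrated with a minimal pure-Python sketch. This is not the authors' Caffe implementation: `shared_extractor` is a hypothetical single linear layer with ReLU standing in for the shared AlexNet subnet, all function and variable names are assumptions, and the weights are random toy values. The only parts taken from the paper are the weight sharing across the eight patches, the pairwise addition of features (ROIs 1+2, 3+4, 5+6, 7+8), the concatenation of the four sums, and the linear softmax classifier with two outputs.

```python
import math
import random

# Index pairs of ROIs summed together (ROIs 1+2, 3+4, 5+6, 7+8, zero-based).
GROUPS = ((0, 1), (2, 3), (4, 5), (6, 7))

def shared_extractor(patch, w_s, b_s):
    """Hypothetical stand-in for the shared AlexNet subnet: linear layer + ReLU.

    patch: L x L nested list; w_s: M rows of L*L weights; b_s: M biases.
    """
    flat = [pixel for row in patch for pixel in row]
    return [max(0.0, sum(w * x for w, x in zip(w_row, flat)) + b)
            for w_row, b in zip(w_s, b_s)]

def fuse(features, groups=GROUPS):
    """Element-wise addition within each group, then concatenation (length 4*M)."""
    fused = []
    for a, b in groups:
        fused.extend(fa + fb for fa, fb in zip(features[a], features[b]))
    return fused

def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(patches, w_s, b_s, w_c, b_c):
    """Return (argmax class index, class probabilities) for one subject."""
    # The same (w_s, b_s) is applied to all eight patches: weight sharing.
    features = [shared_extractor(p, w_s, b_s) for p in patches]
    fused = fuse(features)
    logits = [sum(w * x for w, x in zip(w_row, fused)) + b
              for w_row, b in zip(w_c, b_c)]
    probs = softmax(logits)
    return probs.index(max(probs)), probs

# Toy usage with random numbers (L = 4, M = 3); no real DPR data is involved.
rng = random.Random(0)
L, M = 4, 3
patches = [[[rng.random() for _ in range(L)] for _ in range(L)] for _ in range(8)]
w_s = [[rng.uniform(-1, 1) for _ in range(L * L)] for _ in range(M)]
b_s = [0.0] * M
w_c = [[rng.uniform(-1, 1) for _ in range(4 * M)] for _ in range(2)]
b_c = [0.0, 0.0]
label, probs = predict(patches, w_s, b_s, w_c, b_c)
```

In this sketch, swapping the two patches inside a group leaves the fused vector unchanged, and the classifier sees 4M inputs rather than the 8M that direct concatenation would give.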

"big p, small n" is the most common type of problem in medical image analysis. Specifically, in [9], each osteoporosis condition prediction depends on eight panoramic radiography patches, while the total number of labeled subjects is merely 40. This is far from sufficient for directly training or fine-tuning any ordinary DCN.
To address these problems, in this paper we propose an Octuplet Siamese Network (OSN), trained through a two-stage fine-tuning scheme, to predict osteoporosis condition using a limited number of panoramic radiography images. We carefully design our network architecture to fully explore common features across the eight input patches and to use all patches for feature learning. To further reduce complexity and avoid overfitting, the input features of the single-layer classifier are accumulated based on their spatial symmetry in the DPR image before concatenation. For training, we design a dedicated two-stage fine-tuning scheme. By introducing an auxiliary texture dataset, we ensure a smooth transfer of the model learned on ImageNet to the panoramic radiography dataset without losing its discriminative power. By combining these strategies and training on only 108 samples, our proposed OSN beats all other existing methods in the osteoporosis condition classification task.

II. METHODOLOGY
A. Problem Formulation
Dental panoramic radiography (DPR) is two-dimensional panoramic X-ray imaging of the entire mouth, which is commonly used in dental diagnosis. Because trabecular patterns are distributed in various places in the oral cavity, DPR images contain rich information about trabecular bone. Eight regions of interest (ROIs) in the DPR images are further selected by a dentist for their relatively better imaging quality of trabecular bone structure. The eight ROIs can be divided into four groups based on the symmetry of the human face, as illustrated in Figure 1.
Our problem is then formulated as using the eight patches from each subject to predict the osteoporosis condition of the subject, e.g., osteoporosis or normal. For the n-th subject, it can be formally expressed as

$$\hat{y}_n = G(I_n^1, I_n^2, \dots, I_n^8), \qquad (1)$$

where $\hat{y}_n \in Y$ for $Y = \{y_i\}_{i=1}^{2}$ is one of the two osteoporosis categories, $I_n^j \in \mathbb{R}^{L \times L}$ for $j = 1, 2, \dots, 8$ is the j-th DPR patch of pixel size $L \times L$, and $G(\cdot)$ is the mapping function to be learned.

B. Octuplet Siamese Network
To efficiently fuse the visual clues from the eight ROIs, we design a Siamese-style convolutional network with eight subnets sharing all weights to perform feature extraction and classification jointly. The configuration of our proposed network is shown in Figure 2. The whole network can be roughly divided into two parts: deep feature extraction, and classification with fused features.
In the first part, eight AlexNet-based [10] deep CNNs in parallel encode the patches in the eight ROIs into eight feature vectors. Because they are extracted from the same DPR image, the eight patches share similar low-level features. Thus, instead of learning eight independent CNN feature extractors, we use a Siamese-style network, which can be written as

$$\Phi_n^j = \phi_j(I_n^j, W_S, b_S), \qquad (2)$$

where $\Phi_n^j \in \mathbb{R}^{M}$ is the M-dimensional feature output by the j-th subnet, $\phi_j(\cdot)$ is the j-th subnet, and $W_S = \{w_l\}_{l=1}^{T}$ and $b_S = \{b_l\}_{l=1}^{T}$ are all the weights of the T-layer CNN.
Also, by sharing weights and mirroring updates across the subnets, patches from all locations can be used together to train one set of weights, which increases the number of training samples by 8× and reduces the potential for overfitting. Moreover, unlike other works that exploit features at FC6 or even Pool5, we use the complete AlexNet and take features at FC8. This is based on two considerations. First, each patch contains very limited context or layout information, so the spatial information encoded in features from Pool5 or FC6 is useless in our application. Second, to use features from Pool5 or FC6, either one more layer must be learned to map the high-dimensional raw features to a desired low-dimensional space, or an increased input dimension of the classifier

is required. In both approaches, more weights without pre-trained values are introduced, which increases the potential for overfitting. Note that although we use an AlexNet-based network for feature extraction here, all other recent DCN architectures are also compatible with our setting.
We extend the effective training samples for feature extraction by sharing weights, whereas only the original limited number of labels can be used for classifier training. To reduce the complexity of the classifier, besides choosing the simplest linear classifier, we also minimize its input dimension. Instead of directly concatenating, we first perform element-wise addition between feature vectors whose patches belong to the same group, as shown in Figure 1. To further enhance this invariance in the features, we add random horizontal flips during data augmentation. With this change, we halve the number of weights in the classifier and boost the final classification accuracy by 2% compared with direct concatenation. Finally, a fully connected layer followed by a softmax function is employed as the classifier, which outputs two probabilities, for osteoporosis (OP) and normal. The whole network can be written as

$$\hat{y}_n^{*} = \arg\max_i \, \varphi_{\mathrm{softmax}}\big(w_C^{(i)} \Phi_n + b_C^{(i)}\big), \qquad (3)$$

$$\Phi_n = [\Phi_1(I_n^1) + \Phi_2(I_n^2), \dots, \Phi_7(I_n^7) + \Phi_8(I_n^8)], \qquad (4)$$

where $\Phi_n \in \mathbb{R}^{4M}$ are the concatenated features, $[\dots]$ is vector concatenation, $\Phi_j(I_n^j)$ is the feature vector associated with patch $I_n^j$, and $w_C$ and $b_C$ are the weights of the last fully connected layer. Therefore, $\{W_S, w_C, b_S, b_C\}$ is the set of weights to be learned.

C. Texture Fine Tuning
Due to the scarcity of training samples, we design a two-stage fine-tuning scheme to learn the weights, as illustrated in Figure 3. We start from the network pre-trained on ImageNet. Images in ImageNet are mostly natural objects captured as RGB color images, whereas DPR patches contain overlapping X-ray responses of human bone structures recorded as grayscale images. Directly transferring a powerful network learned on ImageNet using insufficient samples may cause the network to overfit to noise and lose generalizability. We use an auxiliary dataset to bridge the gap between the two datasets.

[Figure 3: three stages, ImageNet pre-train, Texture fine-tuning and DPR data fine-tuning, each built on AlexNet; labels include pre-trained and random initialization, online data augmentation, fusion, and output sizes 1×286 and 1×2.]
Fig. 3. Two-stage fine-tuning.

Ignoring the distracting response from other overlapping bone structures, trabecular bone structures are usually characterized by the texture properties of images. To capture any possible type of texture, we create an augmented texture dataset by merging several public texture analysis datasets. In particular, the ALOT [11], KTH-TIPS2 [12] and UIUC [13] texture datasets are used. These datasets contain textures from different sources, including both natural and man-made patterns. All images in each dataset are first converted to grayscale to match the style of DPR patches, then duplicated to three channels and resized to fit the input dimension of AlexNet. Most images in the texture datasets are not square. Instead of directly resizing as in other classification tasks, we crop images to squares to avoid interference from distortion, which is important for texture analysis. Finally, we obtain a dataset with 55,000 images in 286 texture categories. When fine-tuning on this dataset, we add a fully connected layer with 286 output neurons on top of AlexNet. After the first-stage fine-tuning, the last fully connected layer is removed, and only the weights in AlexNet are used to initialize the feature extractor in OSN for the second-stage fine-tuning on the DPR dataset.
Another key component in our two-stage fine-tuning is online data augmentation. Commonly used offline data augmentation generates augmented images from each real image before training, which can only produce a fixed number of augmented sets. Due to the limited number of real images in the DPR dataset, the stochasticity in a fixed augmented set will still be exhausted after some epochs. Online data augmentation, on the other hand, produces augmented images in real time during training, which needs additional computation but can generate unlimited combinations of augmentation operations as training continues. We use online data augmentation in both fine-tuning stages.

III. EXPERIMENTS AND RESULTS
A. DPR Image
We collect a dataset containing 108 images from 108 different subjects, which is an extension of the original 40 subjects reported in [9]. Our new dataset is relatively balanced, containing 52 subjects with osteoporosis and 56 normal subjects. In each DPR image, a dentist manually annotates the eight ROIs with a normalized pixel size of 50 × 50.

B. Environmental Settings
We use Caffe for network implementation. The Caffe master branch does not include an online augmentation layer, while third-party implementations cannot augment the eight patches independently within each input patch tuple. We implemented our own online data augmentation Python layer, which includes standard random crop and horizontal flip.
For the fine-tuning process, we downloaded AlexNet pre-trained weights from the Caffe Model Zoo. Newly added fully connected layers were initialized with Gaussian noise with a standard deviation of 0.01. We used a fixed base learning rate

of 0.001, and new fully connected layers used 10 times the base learning rate. Dropout with rate 0.5 was applied to all internal fully connected layers to further reduce overfitting. 10K and 200 iterations were used in the two fine-tuning stages, respectively.

C. Results And Discussion
In this part, we compare our proposed method with methods from other well-established work and with several shallow methods in which pre-defined features and general classifiers are adopted. TSL is the two-stage learning method reported in [9]. Various feature descriptors are investigated in that work. We compare against the result obtained with HOG, which achieved the best performance in the original work.
We also compare our learning-based features with pre-defined features. The Haralick (HARA) feature [14] is widely used in X-ray image analysis; it fuses multiple statistical metrics computed on the gray-level co-occurrence matrix of the input image. Segmented Fractal Analysis of Textures (SFTA) [15] is another type of texture analysis feature popular in medical image analysis, which explores the fractal dimension of sub-grayscale regions of the input image. We use eight binary thresholds for SFTA. For HARA and SFTA, a Random Forest with 100 trees is employed as the classifier, and features from the eight ROIs are directly concatenated. For the proposed method, we compare the results with and without using the texture dataset for fine-tuning (TF).
We adopt leave-one-out cross-validation as the evaluation criterion. To be specific, for n samples, there are n different training and test sets. The final accuracy is calculated by averaging the n classification outputs. In addition to the overall accuracy (OA), we also report the accuracy on osteoporosis subjects (OPA) and non-osteoporosis subjects (NOPA).

TABLE I
CLASSIFICATION RESULTS BY LEAVE-ONE-OUT.

Methods     TSL [9]   HARA    SFTA    OSN     OSN+TF
NOPA (%)    67.86     67.86   71.43   75.00   89.29
OPA (%)     67.31     71.15   82.69   96.15   90.38
OA (%)      67.59     69.44   76.85   85.19   89.81

As shown in Table I, our proposed OSN outperforms the shallow methods by large margins. Moreover, compared with OSN+TF, OSN without texture dataset fine-tuning suffers from seriously imbalanced accuracy between the two osteoporosis categories, which results in predictions with very poor precision. Since training error is always balanced in both settings, fine-tuning on a large and more comprehensive texture dataset is crucial for preventing OSN from overfitting to one category.

IV. CONCLUSION
We presented a study of using DPR images for osteoporosis condition evaluation and proposed an image-based osteoporosis classification method. The proposed method uses a Siamese-style convolutional network to learn and fuse image features from multiple ROIs, and a dedicated two-stage fine-tuning scheme for training. In our experiments, the proposed method with two-stage fine-tuning achieves the best overall accuracy of 89.8%. Our results show that a deep learning method with a carefully designed network structure and training scheme can be adopted for osteoporosis condition analysis using DPR images, even with scarce training samples. This result encourages future work using DPR images for inexpensive osteoporosis prescreening.

ACKNOWLEDGMENT
This work is supported in part by US NSF Grants 1407156 and 1350521.

REFERENCES
[1] K. Horner and H. Devlin, “The relationship between mandibular bone mineral density and panoramic radiographic measurements,” Journal of Dentistry, vol. 26, no. 4, pp. 337–343, 1998.
[2] K. Z. Vlasiadis, C. A. Skouteris, G. A. Velegrakis, I. Fragouli, J. M. Neratzoulakis, J. Damilakis, and E. E. Koumantakis, “Mandibular radiomorphometric measurements as indicators of possible osteoporosis in postmenopausal women,” Maturitas, vol. 58, no. 3, pp. 226–235, 2007.
[3] A. F. Leite, P. T. de Souza Figueiredo, C. M. Guia, N. S. Melo, and A. P. de Paula, “Correlations between seven panoramic radiomorphometric indices and bone mineral density in postmenopausal women,” Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology, vol. 109, no. 3, pp. 449–456, 2010.
[4] A. M. Parfitt, C. H. Mathews, A. R. Villanueva, M. Kleerekoper, B. Frame, and D. S. Rao, “Relationships between surface, volume, and thickness of iliac trabecular bone in aging and in osteoporosis. Implications for the microanatomic and cellular mechanisms of bone loss,” Journal of Clinical Investigation, vol. 72, no. 4, pp. 1396, 1983.
[5] T. D. Faber, D. C. Yoon, S. K. Service, and S. C. White, “Fourier and wavelet analyses of dental radiographs detect trabecular changes in osteoporosis,” Bone, vol. 35, no. 2, pp. 403–411, 2004.
[6] G. M. Tosoni, A. G. Lurie, A. E. Cowan, and J. A. Burleson, “Pixel intensity and fractal analyses: detecting osteoporosis in perimenopausal and postmenopausal women by using digital panoramic images,” Oral Surgery, Oral Medicine, Oral Pathology, Oral Radiology, and Endodontology, vol. 102, no. 2, pp. 235–241, 2006.
[7] P. Li, X. Yang, F. Xie, J. Yang, E. Cheng, V. Megalooikonomou, Y. Xu, and H. Ling, “Trabecular texture analysis in dental CBCT by multi-ROI multi-feature fusion,” in Biomedical Imaging (ISBI), 2014 IEEE 11th International Symposium on. IEEE, 2014, pp. 846–859.
[8] H. Ling, X. Yang, P. Li, V. Megalooikonomou, Y. Xu, and J. Yang, “Cross gender–age trabecular texture analysis in cone beam CT,” Dentomaxillofacial Radiology, vol. 43, no. 4, pp. 20130324, 2014.
[9] C. Bo, X. Liang, P. Chu, J. Xu, D. Wang, J. Yang, V. Megalooikonomou, and H. Ling, “Osteoporosis prescreening using dental panoramic radiographs feature analysis,” in Biomedical Imaging (ISBI 2017), 2017 IEEE 14th International Symposium on. IEEE, 2017, pp. 188–191.
[10] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[11] G. J. Burghouts and J. Geusebroek, “Material-specific adaptation of color invariant features,” Pattern Recognition Letters, vol. 30, no. 3, pp. 306–313, 2009.
[12] P. Mallikarjuna, A. T. Targhi, M. Fritz, E. Hayman, B. Caputo, and J. Eklundh, “The KTH-TIPS2 database,” 2006.
[13] S. Lazebnik, C. Schmid, and J. Ponce, “A sparse texture representation using local affine regions,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 8, pp. 1265–1278, 2005.
[14] R. M. Haralick, K. Shanmugam, et al., “Textural features for image classification,” IEEE Transactions on Systems, Man, and Cybernetics, no. 6, pp. 610–621, 1973.
[15] A. F. Costa, G. Humpire-Mamani, and A. J. M. Traina, “An efficient algorithm for fractal analysis of textures,” in Graphics, Patterns and Images (SIBGRAPI), 2012 25th SIBGRAPI Conference on. IEEE, 2012, pp. 39–46.
