DOI 10.3233/XST-180399
IOS Press
Department of Biomedical Engineering, College of Electronics and Information, Kyung Hee University,
Yongin, Republic of Korea
Revised 21 June 2018
Accepted 27 June 2018
Abstract.
BACKGROUND: Accurate measurement of bone mineral density (BMD) in dual-energy X-ray absorptiometry (DXA) is
essential for proper diagnosis of osteoporosis. Calculation of BMD requires precise bone segmentation and subtraction of soft
tissue absorption. Femur segmentation remains a challenge as many existing methods fail to correctly distinguish femur from
soft tissue. Reasons for this failure include low contrast and noise in DXA images, bone shape variability, and inconsistent
X-ray beam penetration and attenuation, which cause shadowing effects and person-to-person variation.
OBJECTIVE: To present a new method, the Pixel Label Decision Tree (PLDT), and test whether it can achieve higher accuracy in femur segmentation in DXA imaging.
METHODS: PLDT involves mainly feature extraction and selection. Unlike photographic images, X-ray images include
features on the surface and inside an object. In order to reveal hidden patterns in DXA images, PLDT generates seven new
feature maps from existing high energy (HE) and low energy (LE) X-ray features and determines the best feature set for
the model. The performance of PLDT in femur segmentation is compared with that of three widely used medical image
segmentation algorithms, the Global Threshold (GT), Region Growing Threshold (RGT), and artificial neural networks
(ANN).
RESULTS: PLDT achieved a higher accuracy of femur segmentation in DXA imaging (91.4%) than either GT (68.4%),
RGT (76%) or ANN (84.4%).
CONCLUSIONS: The study demonstrated that PLDT outperformed other conventional segmentation techniques in segmenting DXA images. Improved segmentation should enable more accurate computation of BMD, which in turn improves clinical diagnosis of osteoporosis.
Keywords: Dual-energy X-ray absorptiometry (DXA), osteoporosis, decision tree, segmentation, feature extraction, feature
selection, non-local means filter, mathematical morphology
1. Introduction
Osteoporosis is a common skeletal disorder caused by loss of bone mineral density (BMD). Osteo-
porosis can lead to bone fracture without any symptoms due to bone loss and fragility. It affects men
∗Corresponding authors: Seung-Moo Han and Tae-Seong Kim, Department of Biomedical Engineering, College of Electronics and Information, Kyung Hee University, 1732, Deogyeong-daero, Giheung-gu, Yongin-si, Gyeonggi-do 17104, Republic of Korea. Tel.: +82 10 6379 9828; E-mail: smhan@khu.ac.kr (Seung-Moo Han) and Tel./Fax: +82 31 201 3731; E-mail: tskim@khu.ac.kr (Tae-Seong Kim)
0895-3996/18/$35.00 © 2018 – IOS Press and the authors. All rights reserved
728 D. Hussain et al. / Femur segmentation in DXA imaging
and women equally and can occur at any age. The probability of bone fracture with osteoporosis
above age 50 is high, especially in menopausal women [1]. According to Osteoporosis Canada, frac-
tures from osteoporosis are more common than heart attacks, strokes and breast cancer combined
[2]. This disease can be diagnosed effectively using a low-dose X-ray imaging technique called dual-
energy X-ray absorptiometry (DXA), which allows measurement of BMD and bone mineral content
(BMC) [1]. The primary sites screened in clinical DXA for osteoporosis fracture risk analysis are
the lumbar spine, proximal femur, and forearm. An alternate method for osteoporosis fracture risk
analysis is Quantitative Computed Tomography (QCT). However, QCT requires a high dose of X-rays
and is costly. In clinical practice, DXA is the gold standard for diagnosis of bone-related diseases
[3, 4].
Accurate calculation of BMD is essential for analysis of osteoporosis. BMD is calculated based on
bone pixels identified by selecting a region of interest (ROI) from the DXA images. Precise segmenta-
tion of bone and separation from soft tissue are critical to measurement accuracy; however, incorrect
identification of bone regions is common for several reasons [3–5]. First, DXA images suffer from
noise due to the use of low-dose X-rays. Second, the pelvic bone and hip muscles often overlap with
the femur head; in femur DXA images, X-ray attenuation in the femur head is greater than in the
trochanteric regions, casting a negative shadow that appears as a dark area in the images, making it
difficult to separate bone pixels from soft tissue. Other factors impact segmentation quality, including
scanning orientation, resolution, luminous intensities, and X-ray attenuation [6]. X-ray attenuation by
femurs varies between patients, producing objects of different contrast in X-ray images, thus making
automation of segmentation challenging.
A number of techniques have been developed for DXA image segmentation [7–13]. Manual segmentation has been used but is time-consuming and requires an expert to separate bone and soft tissue,
making it impractical for screening large populations [7, 9, 10]. Edge detection is a segmentation
method, but is unsuitable due to limitations in combining small edges to find large object boundaries
[7]. Threshold-based methods for DXA image segmentation have their own limitations [7, 8]. Defining
a proper threshold value for segmentation is challenging due to the diverse nature of the data acquired
and to the difficulty of the calibration process for DXA imaging devices [48, 51–52]. Vertebra and
femurs are often segmented using an Active Shape Model (ASM) or Active Appearance Model (AAM)
[11, 13]. In ASM, landmark points are used to represent an object’s shape. Each landmark corresponds
to the same anatomical position. Convergence of landmark points is needed to determine the final
position of an object. However, bone structures vary between patients. ASM sometimes converges
to the wrong edges of an object. ASM uses edge matching or a statistical measure of Mahalanobis
distance (MD) by matching landmarks of a model template to a set of pixels in the test image. How-
ever, it is often difficult to define MD when the covariance matrix in the ASM model is sparse. In
AAM, the model assumes appearance spaces to be Gaussian, but this assumption fails due to vari-
ations in bone structure, particularly in patients with osteophytes (bone spurs) [19]. These methods
require prior knowledge of the object shape and close initialization of the landmark set to the object
shape to be segmented, and are unsuitable for separation of bone and soft tissue. Image segmentation
using deformable models requires shape initialization, an obstacle to automation. Other medical image
segmentation techniques such as Region Growing Threshold (RGT) are unsuitable for DXA image
femur segmentation due to structural specification ambiguities created by partial-volume effects [14].
Using a watershed algorithm to partition X-ray images into homogeneous regions usually suffers from
over-segmentation [18].
With the inherent limitations described above, these existing techniques are unsuitable for DXA
image segmentation. There is an urgent need for DXA image segmentation methods that are automated
and yield greater accuracy. Pixel classification, rather than classical segmentation, is a method often
used in medical image segmentation to identify anatomical structures [14]. In pixel classification, a
pixel is classified as belonging to any of n classes. The number of classes is assumed to be known
based on prior knowledge of the anatomy of the body part being considered. Here, we describe a
method based on pixel classification that uses a decision tree. Decision trees are well-known decision
support tools in the medical field widely used in osteoporosis risk analysis with DXA imaging [15–17].
Despite successful application to medical image analysis, decision trees have not been applied to DXA
image segmentation. In this work, we apply a decision tree to DXA image femur segmentation. We
present a novel method called a Pixel Label Decision Tree (PLDT), which improves the performance
and accuracy of DXA image femur segmentation. PLDT has new features such as a difference map
for low-energy (LE) and high-energy (HE) X-rays, a BMD map, a local standard deviation of BMD
map, a composite map of LE and BMD, and a sigmoid BMD map. We compare PLDT to other
conventional approaches which are Global threshold (GT), artificial neural networks (ANN), and RGT
[7–12] and demonstrate superior performance and more accurate automated femur segmentation from
DXA images by PLDT.
A novel concept of femur DXA image segmentation is introduced in this paper with the following
contributions. To the best of our knowledge, this is the first machine learning based segmentation work
applied to DXA images. We have introduced nine features: three generic DXA features (i.e., HE, LE, BMD) and six new features derived from them (i.e., the LE and HE difference map, HE-to-LE ratio map, LE-to-HE ratio map, local standard deviation map, composite map, and sigmoid BMD map). We used
a wrapper technique to derive an optimal set of features for femur segmentation from DXA images.
Our PLDT Segmenter outperforms other conventional segmentation techniques in terms of accuracy,
specificity, and sensitivity of pixel classification from femur DXA images.
The objective of this research is to find a solution that automatically segments the femur from DXA images. The rest of the paper is organized as follows. Section 2 explains the segmentation method of our model. Section 3 presents the results of the proposed model. Our method and results are discussed in Section 4. Finally, Section 5 presents concluding remarks and future work.
2. Methods
Femur data are acquired from DXA scanning as HE and LE images. HE and LE images are denoised
using a non-local means filter (NLMF) [21–23, 49]. New feature maps are extracted from the femur
data. Feature maps are used to generate training and test datasets. The model is trained using a training
dataset and applied to a test subject. An overview of our DXA femur segmentation using PLDT is
shown in Fig. 1.
We used images acquired on a DXA scanner (OsteoPro MAX, B.M.Tech Worldwide Co., Ltd.,
Republic of Korea). We used 600 femur images segmented by radiology experts as the “ground truth”
for training, testing our model, and determining the accuracy of PLDT. Images were segmented and
each pixel was labeled as either bone or soft tissue. For a five-fold cross validation evaluation test, data
were divided into two sets, 80% for training and 20% for independent testing. A training database was
prepared from feature vectors with a class label to train a PLDT model. A test database was prepared
from feature vectors without a class label and the model generated output labels for each pixel of a
test subject.
Fig. 1. Overview of the PLDT algorithm. Input training data are converted into feature vectors which are fed into the model.
A trained model is used on the test data to predict a subject pixel label.
To prepare training data, training features are extracted from denoised images. At each pixel i of the
jth training dataset (an image), features are extracted to form a feature vector Xi ,j for that particular
pixel, and assigned a class label (bone or soft tissue), Yi ,j . Thus, in a training dataset, the components
of the vectors Xi ,j become attributes along with a dependent variable Yi ,j , and form training pairs
(Xi ,j , Yi ,j ). We normalize the data to a range of [0, 1], using minimum (min) and maximum (max)
data values. A test dataset is prepared with each pixel i in a test subject represented by a number of
features and by the pixel position in the image, with no label.
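As a sketch, the pair construction and [0, 1] normalization described above might look like the following (function and variable names are illustrative, not from the paper):

```python
import numpy as np

def min_max_normalize(feature_map):
    """Scale a feature map to [0, 1] using its min and max values."""
    fmin, fmax = feature_map.min(), feature_map.max()
    return (feature_map - fmin) / (fmax - fmin)

def build_training_pairs(feature_maps, label_image):
    """Stack per-pixel feature values into vectors X_i and pair them
    with class labels Y_i (1 = bone, 0 = soft tissue)."""
    normalized = [min_max_normalize(f) for f in feature_maps]
    X = np.stack([f.ravel() for f in normalized], axis=1)  # (n_pixels, n_features)
    Y = label_image.ravel()                                # (n_pixels,)
    return X, Y

# Toy example: two 2x2 "feature maps" and a matching label image
maps = [np.array([[0., 2.], [4., 8.]]), np.array([[1., 1.], [3., 5.]])]
labels = np.array([[0, 0], [1, 1]])
X, Y = build_training_pairs(maps, labels)
print(X.shape, Y.shape)  # (4, 2) (4,)
```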
2) LE is the second feature map, also available from the DXA scan. Lowering the X-ray energy
increases X-ray attenuation by tissue, producing a more informative image than the HE image [50].
3) HLR is the third feature map, generated as a pixel-by-pixel log ratio from the HE and LE images:
HLR = log(HE / HE0) / log(LE / LE0)    (1)
where HE0 and LE0 are constants measured as X-ray counts by a detector without any object
between the X-ray source and detector. HE and LE are detector counts of the X-ray values at each
image pixel during scanning of an object. In HLR map, the value of a pixel belonging to the tissue
region is usually higher than the value of a pixel belonging to the bone region. This feature map is
shown in Fig. 2(e).
4) LHR is the fourth feature map, generated as a per pixel log ratio of the LE and HE images:
LHR = log(LE / LE0) / log(HE / HE0)    (2)
In the LHR map, the value of a pixel belonging to the bone region is usually higher than the value of a pixel belonging to the tissue region. This feature map is shown in Fig. 2(d).
5) LHD is the fifth feature map, generated by subtracting each pixel i of the HE image from the corresponding pixel i of the LE image:
LHD_i = LE_i − HE_i    (3)
6) BMDI is the sixth feature map, the bone mineral density computed at each pixel from the HE and LE images:
BMDI = [Rst × log(HE0 / HEi) − log(LE0 / LEi)] / (ul − uh × Rst)    (4)
where Rst , ul , uh , HE0 , and LE0 are constants, and the HEi and LEi values change from pixel
to pixel. The BMD map is useful to discriminate between the bone and soft tissue regions.
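Under the definitions above (Equations 1, 2, and 4), the ratio, difference, and BMD maps can be computed pixel-wise with NumPy; the calibration constants below (HE0, LE0, Rst, ul, uh) are placeholder values, not the scanner's calibrated ones:

```python
import numpy as np

# Placeholder calibration constants (illustrative assumptions only)
HE0, LE0 = 1000.0, 800.0      # unattenuated detector counts
RST, UL, UH = 2.0, 0.5, 0.2   # Rst, ul, uh from the DXA calibration

def feature_maps(HE, LE):
    """Compute HLR, LHR, LHD, and BMDI maps from HE/LE count images."""
    hlr = np.log(HE / HE0) / np.log(LE / LE0)   # Equation (1)
    lhr = np.log(LE / LE0) / np.log(HE / HE0)   # Equation (2)
    lhd = LE - HE                               # LE/HE difference map
    bmdi = (RST * np.log(HE0 / HE) - np.log(LE0 / LE)) / (UL - UH * RST)  # Eq. (4)
    return hlr, lhr, lhd, bmdi
```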
7) LSTD BMD, the seventh feature map, is generated by using a 3 × 3 window with nine neighbor
pixels and calculating the standard deviation (STD) of each pixel in a BMD map. Then the window
is moved to the next pixel. The local STD of BMDI is calculated using the equation,
LSTD BMD = sqrt( (1/n) Σ_{i=1}^{n} (BMDI_i − μ_j)² ),    (5)
where i, n, and j are an individual pixel, the total number of pixels, and the mean value in local
window j, respectively.
STD values in the LSTD BMD map highlight pixel neighborhood gradient changes. Large STD
values indicate that the neighborhood pixel values are far from the average and are associated with
sudden changes in the image. Standard deviation has been utilized in the past by other researchers for medical image segmentation, such as segmentation of magnetic resonance imaging (MRI) images [60].
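A sliding-window local standard deviation of this kind can be computed efficiently with mean filters; this is a generic sketch using SciPy, not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def local_std(bmdi, size=3):
    """Local standard deviation of a BMD map over a size x size window,
    computed as sqrt(E[x^2] - E[x]^2) using sliding mean filters."""
    mean = uniform_filter(bmdi, size=size)
    mean_sq = uniform_filter(bmdi * bmdi, size=size)
    var = np.clip(mean_sq - mean * mean, 0.0, None)  # clip tiny negative round-off
    return np.sqrt(var)
```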
The composite map, CI, is the eighth feature map generated. In LE images, soft tissue regions have higher energy counts than bone regions. Therefore, a pixel belonging to a tissue region usually
Fig. 2. Feature maps. (a) HE, (b) LE, (c) LHD, (d) HLR, (e) LHR, (f) BMD, (g) local STD of BMD, (h) sigmoid BMD, and
(i) composite image.
contains a higher value than a pixel belonging to a bone region. Using BMDI and the inverse of LE, tissue regions are suppressed to enhance bone regions. The CI map has better contrast than either BMDI or LE, so bone and soft tissue regions can be discriminated better with CI than with BMDI. LE and BMDI are normalized with a lower scale (a) and an upper scale (b), and are used to form CI as follows:
BMDI = a + (b − a) × (BMDI_i − min(BMDI)) / (max(BMDI) − min(BMDI))    (6)
Table 1
Feature maps and their characteristics
6 BMDI (Equation 4) Bone regions have higher values and are brighter than tissue
regions.
7 LSTD BMD (Equation 5) Provides a brighter image of boundaries between bone and soft
tissue regions.
8 CI (Equation 8) Combines information from BMDI and LE maps to suppress some tissue regions.
9 SIG BMD (Equation 9) Bone regions are enhanced using a sigmoid function applied to the
BMDI map.
where n is a power factor used to suppress the tissue regions. Higher values of n cause less accurate identification of bone regions, while lower values fail to suppress tissue regions. In our experiments, a value of n = 3 was optimal and was used for CI feature generation.
9) SIG BMD, the ninth feature map, is generated from BMDI using a sigmoid function. The sigmoid function is a bounded, differentiable real function with a non-negative derivative at each pixel. SIG BMD is generated from BMDI as
SIG BMD = 1 / (1 + e^(−c × BMDI)),    (9)
where c is a constant (0.05 in our study). A different bone and soft tissue contrast is obtained by changing the value of c. The sigmoid gradient has been utilized in the past for ultrasound image segmentation [61].
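Equation (9) applied to a BMD map is a one-line element-wise operation; the default c below mirrors the value reported in the text:

```python
import numpy as np

def sigmoid_bmd(bmdi, c=0.05):
    """SIG BMD feature map: element-wise sigmoid of the BMD map (Equation 9)."""
    return 1.0 / (1.0 + np.exp(-c * bmdi))
```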
These new feature maps provide new capabilities to the classification model to better distinguish
bone and soft tissue. The existing and new feature maps and their characteristics are listed in Table 1;
the new feature maps are illustrated in Fig. 2.
the remaining features. Some features did not improve or even reduced the accuracy of the model. The
best performing features for our PLDT model were BMDI, LE, LHD, LSTD BMD, and CI.
attribute value. Y is a special discrete attribute designated as a class label (bone or soft tissue). The
learning classification function f assigns each attribute in feature dataset S one of the predefined class
labels Yj . The test image data are represented in the form of feature vectors as
f(X) = (X_{1,j}, X_{2,j}, …, X_{k,j}),    (11)
where X_1, …, X_k represents the feature vector of pixel j in the image to be assigned a label by the PLDT
model. The label assigned to each pixel is remapped to its pixel position in the DXA image, and femur
object boundaries are extracted followed by a post-processing method.
The split optimization criterion used to measure a node impurity is the Gini Index (GI). Given a
training dataset S, the GI of target attributes Y is measured as
GI(Y, S) = 1 − Σ_{i=0}^{c−1} [p(i|S)]²,    (12)
where p refers to the fraction of a record belonging to one of the classes in target variable Y . GI
measures the impurity of an individual attribute X corresponding to a single value that belongs to a
class Y in features dataset S as follows:
GI(X) = 1 − Σ_{i=0, v_{j,k} ∈ x_j}^{c−1} [p(X_{x_k = v_{j,k}} | Y)]²,    (13)
where xk represents an individual node having value Vj,k . To compute the quality of the entire split
corresponding to an entire attribute, we computed the weighted average over all sets resulting from
the split of dataset X into two subsets, the left tree branch (L) and right tree branch (R), as follows:
GI(X, Y) = 1 − Σ_{i=0, v_{j,k} ∈ x_j}^{c−1} (|X_j| / |X|) × [p(X_{j, x_k = v_{j,k}} | Y)]²,    (14)
GIGN(X, Y) = GI(Y, S) − Σ_j (|X_j| / |X|) × GI(X, Y),    (16)
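For the two-class case, the Gini computations in Equations (12)–(16) reduce to the following generic sketch (not the authors' implementation):

```python
import numpy as np

def gini_impurity(labels):
    """GI = 1 - sum_i p(i)^2 over the class proportions in `labels`."""
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(labels, left_mask):
    """Gini gain of splitting `labels` into left/right subsets by a boolean mask:
    parent impurity minus the size-weighted impurity of the two children."""
    left, right = labels[left_mask], labels[~left_mask]
    weighted = (len(left) * gini_impurity(left) +
                len(right) * gini_impurity(right)) / len(labels)
    return gini_impurity(labels) - weighted

y = np.array([0, 0, 1, 1])
print(gini_impurity(y))                                     # 0.5
print(gini_gain(y, np.array([True, True, False, False])))   # 0.5
```

A perfect split (each child pure) recovers the full parent impurity as gain, which is why the splitting attribute that maximizes the Gini gain is selected.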
PLDT requires stopping criteria as it splits down the tree with the training data. We identified a
minimum count of training instances to be set aside for every leaf node. A smaller count compared to
the minimum in a node is taken as a final leaf node and stops further splitting. Pruning was applied
Table 2
Average Gini gain for each attribute in the training dataset
Feature    LE       BMDI     LHD      STD BMD    CI
Gini       0.2853   0.2850   0.4640   0.4508     0.1978
Gain       0.2147   0.2149   0.0359   0.0492     0.3022
to reduce the size of PLDT and improve classification accuracy. Nodes with anomalies and the lowest classification performance are removed. In this work, cost-complexity pruning was applied to ensure that only the most useful splits are retained by the model [34].
2.3.2. PLDT optimization
The performance of PLDT is directly affected by the hyperparameter (HP) values chosen [35].
HPs represent the degrees of freedom PLDT has in fitting the data and help control overfitting. In
machine learning, the same model requires different constraints, weights, or learning rates to generalize
different data patterns. HPs of a machine learning algorithm are tuned so that the model can best solve
a particular problem. A parameter sweep method is utilized for HP optimization by specifying a subset
of the HP space of our learning algorithm. The performance of the parameter sweep is measured using
cross-validation based on a withheld validation set. The experiment was repeated many times and
every time a distinct model was tested to find optimal parameters for PLDT.
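A parameter sweep of this kind can be sketched with scikit-learn's grid search; the dataset and grid values below are illustrative assumptions, not the paper's HP space:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Illustrative two-class pixel-feature dataset (stand-in for the DXA features)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)

# Sweep maximum depth and minimum leaf sample count with 5-fold CV
grid = GridSearchCV(
    DecisionTreeClassifier(criterion="gini", random_state=0),
    param_grid={"max_depth": [5, 25, 125], "min_samples_leaf": [1, 10, 50]},
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_)
```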
To tune the maximum depth (MXD) of PLDT, a parameter value is selected over a defined range
of 7 × 10^3 to the maximum number of training samples (N − 1) in ten steps. A decision tree with depth N is large, as almost every sample in the decision tree will participate as a decision node. Setting the maximum tree depth between 100 and 6 × 10^5 for 1 × 10^7 samples improved the accuracy of our algorithm. The final depth of our PLDT is calculated as 125 with a size of 3 × 10^6. If a
node sample is less than a threshold of the minimum sample count (MNSC), then the node will not be
split. The MNSC parameter helps reduce overfitting and prevents the tree from prematurely classifying
outliers. Accuracy and performance both improved with a properly tuned MNSC parameter. A surrogate split (SUS) takes over when the primary split cannot be evaluated, enabling the PLDT model to deal with missing data. Pruned branches are removed from the tree using
the truncate prune tree (TPT) method. To better estimate the generalization error and avoid overfitting,
five-fold cross-validation (CV) was applied to the model based on a withheld validation set. Prior class
probabilities (PR) were applied to the model to improve classification. PR tunes the PLDT and gives
preference to a certain class (i.e., bone pixels), so the weight of misclassified samples or anomalies is
increased artificially, and the tree is adjusted properly. Our classification model is improved by enabling the one-standard-error rule, which makes the tree more compact and resistant to training data noise.
The best HP-space for the current model explored in experiments is shown in Table 3.
To train PLDT with N samples, a feature vector and class label are generated for each pixel i in the
training dataset. This yields N feature vectors, each with j components called attributes (Xi ,j ) with a
class label (Yi ,j ). Y is the dependent variable. Training pairs (Xi,j, Yi ,j ) are constructed in each pixel
of the training dataset. The splitting attribute that maximizes the Gini gain is selected. Child nodes are
created recursively based on defined split criteria until a stopping criterion is reached.
Prediction is based on passing a test dataset into the trained model. Each pixel i in the test dataset is
represented with a feature vector and its position is preserved in the femur image. To obtain a response
Table 3
PLDT hyper-parameter optimization
to the input feature vector, the algorithm traverses the nodes of the tree starting from the root node by
observing the splitting criterion and threshold at each node until it reaches a leaf node. The training data
vectors present in the leaf are used to predict the label for the input data. The predicted label is remapped to its position in the femur image to obtain bone and tissue boundaries. The training
data consist of ∼2 × 10^6 samples for each feature derived from the 600 training images, and a total of ∼1 × 10^7 samples. We create a trained ensemble of decision trees in 3 minutes and 20 seconds on a
dual-core, 2.70 GHz machine with 6 GB RAM, while prediction on a new dataset that is not included
in the training dataset takes less than 15 seconds on the same machine.
2.5. Post-processing
A challenging task in image segmentation using ML is to remove spots introduced during pixel
labeling. To deal with these imperfections, morphological image processing (MIP) performs well by
considering the shape and structure of the image [36–39]. The PLDT method labels each pixel in
the femur DXA image as bone or soft tissue. The output of PLDT is a binary image. Bone pixels
are separated from soft tissue/background, then MIP is used to smooth femur object boundaries. To
separate small objects with few pixels connected to a large femur object, small-scale binary erosion is applied. Objects with smaller areas than the single largest object in the image are then removed. Once small objects are removed, dilation is performed to restore object boundaries. The pixel labeling process leaves some small holes in femur objects, which are closed. Femur boundaries labeled by PLDT are not smooth, so binary smoothing was applied to smooth them. Binary smoothing removes small-scale noise in a shape while maintaining large-scale features [40]. Our binary smoothing follows
the steps in the flow chart diagram in Fig. 3. PLDT output and binary smoothed objects are shown in
Fig. 4.
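A possible post-processing pipeline along these lines, sketched with SciPy morphology (the window size and structuring elements are assumptions, not the paper's exact settings):

```python
import numpy as np
from scipy import ndimage

def clean_mask(mask, smooth_size=3):
    """Post-process a binary femur mask: erode to detach small objects,
    keep the largest connected object, dilate, fill holes, then
    majority-filter the boundary (a simple binary smoothing)."""
    eroded = ndimage.binary_erosion(mask)
    labeled, n = ndimage.label(eroded)
    if n > 1:  # keep only the largest connected object
        sizes = ndimage.sum(eroded, labeled, range(1, n + 1))
        eroded = labeled == (np.argmax(sizes) + 1)
    restored = ndimage.binary_dilation(eroded)
    filled = ndimage.binary_fill_holes(restored)
    # Pixel is object if the majority of its smoothing window is object
    votes = ndimage.uniform_filter(filled.astype(float), size=smooth_size)
    return votes > 0.5
```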
To evaluate the effectiveness of PLDT in DXA image femur segmentation, we compared PLDT
with other techniques (global threshold, region-growing threshold, and ANN) using the same dataset.
The results of this comparison are shown in Table 6. We also cross-validated and divided the data into
training and test datasets for independent results verification.
Fig. 3. Post-processing flow chart. A pixel is set as an object pixel if the value in the smoothing window is greater than the
defined threshold, otherwise it is set as the background.
T_k = (1/n) Σ_{i=1}^{n} [ log(HE_i / AH) / μ_h − log(LE_i / AL) / μ_l ],    (18)

Fig. 5. Segmentation by parts.
where n is the total number of pixels in the image, and uh and ul are constant measures of bone mass
attenuation coefficients in HE and LE, respectively. AH is the high X-ray energy and AL is the low
X-ray energy detected by an X-ray detector in the DXA imaging system without any obstacle between
source and detector.
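As a sketch, Equation (18) can be evaluated directly; AH, AL, μ_h, and μ_l below are placeholder constants, not calibrated values:

```python
import numpy as np

# Placeholder calibration constants (illustrative assumptions only)
AH, AL = 1000.0, 800.0   # unattenuated HE/LE detector counts
UH, UL = 0.2, 0.5        # attenuation coefficients mu_h, mu_l

def global_threshold(HE, LE):
    """T_k from Equation (18): mean over all pixels of the difference of
    normalized log attenuations at high and low energy."""
    term = np.log(HE / AH) / UH - np.log(LE / AL) / UL
    return term.mean()
```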
image segmentation, it was observed that in the greater trochanter area of the femur, X-ray scanning
attenuation is weaker compared to the shaft and femur head area. The reason for this is overlapping
of pelvic bone and sinews over the femur bone area. As a result, every threshold-based segmentation
technique fails to correctly segment the greater trochanter area. An air mask is used to segment the air
region which is generated from the LHD map, as indicated by a green line in Fig. 5.
are adjusted. The desired output was adjusted to 1 for the bone region and 0 for the tissue region.
The sorting process is split into training and testing phases. The training network model is stored for
test data classification. The output layer of neural networks is mapped back in the femur image with
preserved pixel position. Post-processing is applied to smooth the output boundaries between bone and soft tissue as discussed in the post-processing section.
described below were used to assess the accuracy of models.
1) Intersection over union (IOU) is an evaluation metric used to measure the accuracy of the segmented object against the ground truth. TP is the object area (correctly classified) common between the segmented image and the ground truth. FP and FN are the numbers of bone and soft tissue pixels wrongly classified between the two classes (bone and soft tissue):
IOU = Area of the intersection / Area of the union = TP / (TP + FP + FN).    (19)
2) Sensitivity, also called the true-positive prediction rate (TPR), measures the proportion of positive pixels
identified accurately. In a sensitivity test, the number of correctly classified bone tissue pixels in
the femur DXA image is compared to the ground truth:
Sensitivity (TPR) = (TP / GTb) × 100,    (20)
where TP is the total number of correctly classified pixels representing bone, and GTb is the number of bone pixels in the ground truth.
3) Specificity, also called the true-negative prediction rate (TNR), measures the proportion of negative
pixels accurately identified. In a specificity test, the number of correctly classified soft tissue pixels in the femur DXA image is compared to the ground truth:
Specificity (TNR) = (TN / GTt) × 100,    (21)
where TN is the total number of pixels correctly classified as soft tissue, and GTt is the number of soft tissue pixels in the ground truth.
4) False-positive prediction rate (FPR) is the measure of soft tissue pixels wrongly classified as
bone:
FPR = (1 − TNR) × 100. (22)
5) False-negative prediction rate (FNR) is the measure of bone pixels wrongly classified as soft tissue:
FNR = (1 − TPR) × 100.    (23)
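The metrics in Equations (19)–(23) can be computed from binary masks as follows (a generic sketch; 1 denotes bone):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """TPR, TNR, FPR, FNR (in %) and IOU for binary bone masks
    (1 = bone, 0 = soft tissue), following Equations (19)-(23)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)      # bone pixels correctly labeled bone
    tn = np.sum(~pred & ~gt)    # tissue pixels correctly labeled tissue
    fp = np.sum(pred & ~gt)     # tissue pixels labeled bone
    fn = np.sum(~pred & gt)     # bone pixels labeled tissue
    tpr = 100.0 * tp / (tp + fn)   # sensitivity, Eq. (20)
    tnr = 100.0 * tn / (tn + fp)   # specificity, Eq. (21)
    iou = tp / (tp + fp + fn)      # Eq. (19)
    return {"TPR": tpr, "TNR": tnr, "FPR": 100.0 - tnr,
            "FNR": 100.0 - tpr, "IOU": iou}

pred = np.array([[1, 1], [0, 0]])
gt = np.array([[1, 0], [0, 0]])
m = segmentation_metrics(pred, gt)
print(m["TPR"], m["IOU"])  # 100.0 0.5
```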
Table 4
Feature combinations for PLDT
Feature combination                          Accuracy (%)
HE, LE, BMDI, LHD, LSTD BMD                  84
HE, LE, BMDI, LHD, LSTD BMD, CI              90
LE, BMDI, LHD, LSTD BMD, CI                  91.4
2.7.2. Fold Test
The test performance of each method per image was calculated by comparing the segmentation
output of a femur object to the ground truth. We used sensitivity, specificity, and IOU tests to measure
the accuracy for an individual image. A segmentation method was considered to have failed to correctly
segment a femur object in a test image if IOU <0.95, sensitivity <95%, or specificity <93%. The final
accuracy of the model was calculated by comparing the number of accurately segmented images out
of the total number of test images:
Accuracy = (Number of correctly segmented images / Total number of test images) × 100.    (24)
The 600 DXA images were divided for experiments as follows: 80% (400 images) were used
for training, and the remaining 20% (200 images) were used for independent testing with five-fold
cross-validation. Test data (20%) were swapped with training data for subsequent testing during cross-
validation. The segmentation methods RGT and GT were applied to the test data in each cross-
validation.
3. Results
Different combinations of features were tested to select the best feature set for the PLDT
model. Feature combinations and their effects on PLDT performance are shown in Table 4. The
importance of each extracted feature is shown in Table 5 where one means the highest and zero means
the lowest importance. To identify the best performing feature subset, models were created repeatedly
and a feature was removed based on its performance at every repetition. The next model was created
with the remaining features. The feature set that achieved the highest accuracy was selected. The same
features were used for PLDT and ANN classification models.
We compared PLDT with other segmentation techniques (GT, RGT, and ANN). Model accuracy
was calculated using Equation 24. The final subset of features – LE, BMDI, LHD, LSTD BMD, and
Table 5
Variable importance for all extracted features from DXA images
Table 6
Cross-validation and accuracy of PLDT, ANN, RGT, and GT models
Fold    GT      RGT     ANN     PLDT
1       80      85      93      98
2       68      73      91      92
3       64      70      83      87
4       60      68      72      86
5       70      84      85      94
Mean    68.4    76.0    84.8    91.4
Table 7
Mean TPR, FPR, TNR, and FNR of cross-validation of ANN, RGT, GT, and PLDT models
Model TPR (%) FPR (%) TNR (%) FNR (%)
ANN 86.12 12.01 87.99 13.88
RGT 82.73 15.34 84.66 17.27
GT 80.21 17.39 82.61 19.79
PLDT 94.19 7.09 92.91 5.81
CI – was used as the feature set for PLDT and ANN. For threshold- and RGT-based segmentation,
we used the HLR map. Results of five-fold cross-validation are shown in Table 6. Results for PLDT,
ANN, GT, and RGT compared to ground truth are shown in Fig. 6.
The test data in Table 7 represents the mean value of TPR, FPR, TNR, and FNR from all test samples
in five-fold cross-validation. TPR and TNR represent correctly classified bone and soft tissue pixels, respectively. FPR and FNR represent soft tissue pixels misclassified as bone and bone pixels misclassified as soft tissue, respectively.
Figure 7 shows the test dataset analysis receiver operating characteristic (ROC) curve for PLDT, ANN,
RGT and GT based segmentation. PLDT segmentation results are shown in Fig. 8.
4. Discussion
Using PLDT classification, we achieved a femur segmentation accuracy of 91.4% in DXA images,
significantly greater than that of other segmentation techniques. The feature set directly influences
the segmentation results and accuracy [53–56]. From experiments on feature selection, we observed
that some features boosted segmentation accuracy while others reduced it. PLDT produces optimal
results when tuned to the data. Using the PLDT model, we segmented 600 femur DXA images with
five-fold cross validation and achieved excellent performance in high-contrast regions (femur head and
shaft), as well as accurate segmentation in some of the most challenging regions (greater and lesser
Fig. 6. DXA image femur segmentation by GT, RGT, ANN, and PLDT. Boundaries based on ground truth (red), GT (blue),
RGT (yellow), ANN (magenta), and PLDT segmentation (green).
trochanters). Small variations in femur data related to device-specific calibration were handled by
data normalization, and the model performed well on multiple devices.
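The paper does not spell out the normalization at this point; per-image min-max scaling is one plausible form, sketched below purely as an illustration (the function and the toy intensities are placeholders, not the paper's actual procedure):

```python
import numpy as np

def minmax_normalize(image: np.ndarray) -> np.ndarray:
    """Rescale pixel intensities to [0, 1] so scans from differently
    calibrated devices share a common intensity range."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:  # flat image: nothing to rescale
        return np.zeros_like(image, dtype=float)
    return (image - lo) / (hi - lo)

scan = np.array([[100.0, 150.0], [200.0, 300.0]])  # toy intensities
normalized = minmax_normalize(scan)                # values now span [0, 1]
```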
The results demonstrate that the PLDT method yields higher sensitivity, specificity, and accuracy
than other models. Pixel label predictions made by the PLDT model provide an accurate and robust
tool for femur segmentation in DXA imaging.
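In the same spirit (but not the paper's actual implementation), pixel-label classification with a decision tree can be sketched with scikit-learn; the two synthetic features below stand in for the paper's HE/LE-derived feature maps, and all values are placeholders:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-ins for per-pixel feature vectors: label 1 = bone, 0 = soft tissue.
rng = np.random.default_rng(seed=0)
bone = rng.normal(loc=[0.8, 0.6], scale=0.05, size=(200, 2))
soft = rng.normal(loc=[0.3, 0.2], scale=0.05, size=(200, 2))
X = np.vstack([bone, soft])
y = np.array([1] * 200 + [0] * 200)

# A shallow tree assigns a label to every pixel; reshaping the predictions
# back onto the image grid would yield a binary femur mask.
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)
mask = tree.predict(X)
```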
Although we considered the deep convolutional neural network (CNN) as one of the emerging
segmentation methods, CNNs require a large amount of training data [47]. A decision tree can be a
good alternative for a limited dataset [57]; it also offers real-time performance, which may not be
achievable with a CNN. A previous study by Rich Caruana compared ten binary classifiers,
namely SVM, neural networks, KNN, logistic regression, naive Bayes, random forests, decision
trees, bagged decision trees, boosted decision trees, and bootstrapped decision trees, to classify
eleven different data sets and compared the results via eight performance metrics. They found
that all decision tree models outperformed SVM [58].
Fig. 8. PLDT results. Boundaries using PLDT (green) are compared with ground truth (red).
5. Conclusion
We present PLDT classification, a learning-based approach for femur segmentation in DXA imaging
with a focus on accuracy and automation. The cross-validation experiments on test data demonstrate the
high performance and accuracy of PLDT. The PLDT method should improve the accuracy of BMD
calculations and, in turn, the clinical diagnosis of osteoporosis. Our results have shown that
the PLDT model can be employed for femur segmentation from DXA images, as it performs well
with a suitable set of features. One limitation of this approach is the requirement for optimal
supervised feature selection. As the latest deep learning methods can learn features from raw
data, we plan to adopt deep learning approaches for femur segmentation from DXA images in the
near future, as a larger dataset is being built up [59].
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
References
[1] J.A. Kanis, Diagnosis of osteoporosis and assessment of fracture risk, The Lancet 359(9321) (2002), 1929–1936.
[2] Osteoporosis Canada, Facts and figures about osteoporosis, https://osteoporosis.ca/about-the-disease/fast-facts/,
Accessed 23-02-2018.
[3] R. Dendere, J.H. Potgieter, S. Steiner, S.P. Whiley and T.S. Douglas, Dual-Energy X-ray Absorptiometry for Mea-
surement of Phalangeal Bone Mineral Density on a Slot-Scanning Digital Radiography System, IEEE Transactions
on Biomedical Engineering 62(12) (2015), 2850–2859.
[4] N.F.A. Peel, A. Johnson, N.A. Barrington, T.W.D. Smith and R. Eastell, Impact of anomalous vertebral segmentation
on measurements of bone mineral density, Journal of Bone and Mineral Research 8(6) (1993), 719–723.
[5] F. Ding, W.K. Leow and T.S. Howe, Automatic segmentation of Femur bones in anterior-posterior pelvis X-ray images,
in: International Conference on Computer Analysis of Images and Patterns, Springer, Berlin, Heidelberg, 2007, pp.
205–212.
[6] C.S. Crişan and S. Holban, A comparison of X-ray image segmentation techniques, Advances in Electrical and
Computer Engineering 13(3) (2013), 85–92.
[7] K.E. Naylor, E.V. McCloskey, R. Eastell and L. Yang, Use of DXA-based finite element analysis of the proximal Femur
in a longitudinal study of hip fracture, Journal of Bone and Mineral Research 28(5) (2013), 1014–1021.
[8] T.A. Burkhart, K.L. Arthurs and D.M. Andrews, Manual segmentation of DXA scan images results in reliable upper
and lower extremity soft and rigid tissue mass estimates, Journal of Biomechanics 42(8) (2009), 1138–1142.
[9] H. Yasufumi, Y. Kichizo, F.M.I. Toshinobu, T. Kichiya and N. Yasuho, Assessment of bone mass by image analysis
of metacarpal bone roentgenograms: A quantitative digital image processing (DIP) method, Radiation Medicine 8(5)
(1990), 173–178.
[10] C. Matsumoto, K. Kushida, K. Yamazaki, K. Imose and T. Inoue, Metacarpal bone mass in normal and osteoporotic
Japanese women using computed X-ray densitometry, Calcified Tissue International 55(5) (1994), 324–329.
[11] J.P. Wilson, K. Mulligan, B. Fan, J.L. Sherman, E.J. Murphy, V.W. Tai, C.L. Powers, L. Marquez, V. Ruiz Barros
and J.A. Shepherd, Dual-energy X-ray absorptiometry–based body volume measurement for 4-compartment body
composition, The American Journal of Clinical Nutrition 95(1) (2012), 25–31.
[12] M. Roberts, T. Cootes, E. Pacheco and J. Adams, Quantitative vertebral fracture detection on DXA images using shape
and appearance models, Academic Radiology 14(10) (2007), 1166–1178.
[13] N. Sarkalkan, H. Weinans and A.A. Zadpoor, Statistical shape and appearance models of bones, Bone 60 (2014),
129–140.
[14] D.L. Pham, C. Xu and J.L. Prince, Current methods in medical image segmentation, Annual Review of Biomedical
Engineering 2 (2000), 315–337.
based on clinical data and periapical radiography, Dentomaxillofacial Radiology 39(4) (2010), 224–230.
[17] A.M. Schott, C. Ganne, D. Hans, G. Monnier, R. Gauchoux, M.A. Krieg, P.D. Delmas, P.J. Meunier and C. Colin,
Which screening strategy using BMD measurements would be most cost effective for hip fracture prevention in elderly
women? A decision analysis based on a Markov model, Osteoporosis International 18(2) (2007), 143–151.
[18] L. Siyuan, S. Wang and Y. Zhang, A note on the marker-based watershed method for X-ray image segmentation,
Computer Methods and Programs in Biomedicine 141 (2017), 1–2.
[19] J. Wu and M.R. Mahfouz, Robust X-ray image segmentation by spectral clustering and active shape model, Journal
of Medical Imaging 3(3) (2016), 034005–034005.
[20] M.A. Al-antari, M.A. Al-masni, M. Metwally, D. Hussain, S.-J. Park, J.-S. Shin, S.-M. Han and T.-S. Kim,
Denoising images of dual energy X-ray absorptiometry using non-local means filters, Journal of X-ray Science and
Technology 26(3) (2018), 395–412.
[21] J.C.R. Giraldo, Z.S. Kelm, L.S. Guimaraes, Lifeng Yu, J.G. Fletcher, B.J. Erickson and C.H. McCollough, Comparative
study of two image space noise reduction methods for computed tomography: Bilateral filter and nonlocal means, in:
Engineering in Medicine and Biology Society, EMBC, Annual International Conference of the IEEE (2009), pp.
3529–3532.
[22] Z. Li, Lifeng Yu, J.D. Trzasko, D. S. Lake, D.J. Blezek, J.G. Fletcher, C.H. McCollough and A. Manduca, Adaptive
nonlocal means filtering based on local noise level for CT denoising, Medical Physics 41(1) (2014), 011908.
[23] A. Buades, B. Coll and J.M. Morel, A non-local algorithm for image denoising, In Computer Vision and Pattern
Recognition, CVPR, IEEE Computer Society Conference, IEEE (2), (2005), pp. 60–65.
[24] M.A. Al-Antari, M.A. Al-Masni, M. Metwally, D. Hussain, E. Valarezo, P. Rivera, G. Gi et al., Non-local means filter
denoising for DXA images, In Engineering in Medicine and Biology Society (EMBC), 2017 39th Annual International
Conference of the IEEE (2017), pp. 572–575.
[25] L. Wang, J. Lu, Yeqiu Li, T. Yahagi and T. Okamoto, Noise removal for medical X-ray images in wavelet domain,
Electrical Engineering in Japan 163(3) (2008), 37–46.
[26] J.W. Kwon, S.I. Cho, Y.B. Ahn and Y.M. Ro, Noise reduction in DXA image based on system noise modeling, in:
Biomedical and Pharmaceutical Engineering, 2009, ICBPE’09, IEEE (2009), pp. 1–6.
[27] P. Lambin, E.R. Velazquez, R. Leijenaar, S. Carvalho, R.G.P.M. van Stiphout, P. Granton, C.M.L Zegers, et al.,
Radiomics: Extracting more information from medical images using advanced feature analysis, European Journal of
Cancer 48(4) (2012), 441–446.
[28] G. D. Tourassi, E.D. Frederick, M.K. Markey and C.E. Floyd, Application of the mutual information criterion for
feature selection in computer-aided diagnosis, Medical Physics 28(12) (2001), 2394–2402.
[29] H. Guo, L.B. Jack and A.K. Nandi, Feature generation using genetic programming with application to fault classifica-
tion, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 35(1) (2005), 89–99.
[30] M.A. Al-antari, M.A. Al-masni, S. Park, J. Park, M.K. Metwally, Y.M. Kadah, S.M. Han and T.-S. Kim, An Automatic
Computer-Aided Diagnosis System for Breast Cancer in Digital Mammograms via Deep Belief Network, Journal of
Medical and Biological Engineering 38(3) (2018), 443–456.
[31] I. Guyon and A. Elisseeff, An introduction to variable and feature selection, Journal of Machine Learning Research 3
(2003), 1157–1182.
[32] G. Chandrashekar and F. Sahin, A survey on feature selection methods, Computers & Electrical Engineering 40(1)
(2014), 16–28.
[33] B. Gupta, A. Rawat, A. Jain, A. Arora and N. Dhami, Analysis of Various Decision Tree Algorithms for Classification
in Data Mining, International Journal of Computer Applications 163(8) (2017), 15–19.
[34] L. Breiman, J. Friedman, C.J. Stone and R.A. Olshen, Classification and regression trees, CRC Press, 1984.
[35] R.G. Mantovani, T. Horvath, R. Cerri, J. Vanschoren and A.C.P.L.F de Carvalho, Hyper-parameter Tuning of a Decision
Tree Induction Algorithm, in: Intelligent Systems (BRACIS), 5th Brazilian Conference IEEE (2016), pp. 37–42.
[36] A. Bleau and L.J. Leon, Watershed-based segmentation and region merging, Computer Vision and Image Understanding
77(3) (2000), 317–370.
[37] P. Salembier and L. Garrido, Binary partition tree as an efficient representation for image processing, segmentation,
and information retrieval, IEEE transactions on Image Processing 9(4) (2000), 561–576.
[38] L. Vincent, Morphological grayscale reconstruction in image analysis: Applications and efficient algorithms, IEEE
Transactions on Image Processing 2(2) (1993), 176–201.
[39] P. Vogt, K.H. Riitters, C. Estreguil, J. Kozak, T.G. Wade and J.D. Wickham, Mapping spatial patterns with morphological
image processing, Landscape Ecology 22(2) (2007), 171–177.
[40] R. Strzodka and A. Telea, Generalized distance transforms and skeletons in graphics hardware, in: Proceedings of the
Sixth Joint Eurographics-IEEE TCVG conference on Visualization (2004), pp. 221–230.
[41] C. Amza, A review on neural network-based image segmentation techniques, De Montfort University, Mechanical and
Manufacturing Eng., The Gateway Leicester, LE1 9BH, United Kingdom, 2012, pp. 1–23.
[42] M.J. Moghaddam and H.S. Zadeh, Medical image segmentation using artificial neural networks, in: Artificial Neural
Networks-Methodological Advances and Biomedical Applications. InTech, (2011), pp. 122–138.
[43] A.Q. Syed and K. Narayanan, Detection of Tumor in MRI Images using Artificial Neural Networks, International
Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering 3(9) (2014), 11749–11754.
[44] P. Arbelaez, B. Hariharan, C. Gu, S. Gupta, L. Bourdev and J. Malik, Semantic segmentation using regions and parts,
in: Computer Vision and Pattern Recognition (CVPR), IEEE, International Conference (2012), pp. 3378–3385.
[45] K. Roy, R. Dey, D. Bhattacharjee, M. Nasipuri and P. Ghosh, An automated system for platelet segmentation using
histogram-based thresholding, In Advances in Computing, Communication, & Automation (ICACCA) (Fall), IEEE,
International Conference (2016), pp. 1–7.
[46] W.Y. Loh, Classification and regression trees, Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
1(1) (2011), 14–23.
[47] N. Jaccard, T.W. Rogers, E.J. Morton and L.D. Griffin, Detection of concealed cars in complex cargo X-ray imagery
using deep learning, Journal of X-ray Science and Technology 25(3) (2017), 323–339.
[48] P. Hossein, A novel material detection algorithm based on 2D GMM-based power density function and image
detail addition scheme in dual energy X-ray images, Journal of X-ray Science and Technology 20(2) (2012),
213–228.
[49] M. Ertas, I. Yildirim, M. Kamasak and A. Akan, Iterative image reconstruction using non-local means with total
variation from insufficient projection data, Journal of X-ray Science and Technology 24(1) (2016), 1–8.
[50] W. Donovan, W. Xizeng and L. Hong, Image quality and dose efficiency of high energy phase sensitive X-ray imaging:
Phantom studies, Journal of X-ray Science and Technology 22(3) (2014), 321–334.
[51] A. Kai, G. Yanhua and Y. Gang, A scatter correction method for dual-energy digital mammography: Monte Carlo
simulation, Journal of X-ray Science and Technology 22(5) (2014), 653–671.
[52] E.B. Bellers, F.J. de Bruijn, C.A. Mistretta and Y. Wang, An automatic calibration method for dual energy material
decomposition, Journal of X-ray Science and Technology 12(1) (2004), 19–25.
[53] L. Pan, Z. ChongXun, Y. Yong, Z. Feng and Y. XiangGuo, A probability model-based level set method for biomedical
image segmentation, Journal of X-ray Science and Technology 13(3) (2005), 117–127.
[54] E. Ahmed, M.A. Yousry, W. Shiqian and H. Qingmao, Robust kernelized local information fuzzy C-means clustering
for brain magnetic resonance image segmentation, Journal of X-ray Science and Technology 24(3) (2016), 489–507.
[55] R.I. Maher, Feature extraction of dermatoscopic images by iterative segmentation algorithm, Journal of X-ray Science
and Technology 16(1) (2008), 33–42.
[56] P.N.R. Shabnam and S.M. Mohamed, Detection of pneumonia in chest X-ray images, Journal of X-ray Science and
Technology 19(4) (2011), 423–428.
[57] S. Lawrence, C.L. Giles, A.C. Tsoi and A.D. Back, Face recognition: A convolutional neural-network approach, IEEE
Transactions on Neural Networks 8(1) (1997), 98–113.
[58] R. Caruana and A. Niculescu-Mizil, An empirical comparison of supervised learning algorithms, In Proceedings of
the 23rd international conference on Machine learning, ACM (2006), 161–168.
[59] L. Perez and J. Wang, The Effectiveness of Data Augmentation in Image Classification using Deep Learning, arXiv
preprint arXiv:1712.04621 (2017).
[60] R. Karim, P. Bhagirath, P. Claus, R.J. Housden, Z. Chen, Z. Karimaghaloo, H. Sohn et al., Evaluation of state-of-the-
art segmentation algorithms for left ventricle infarct from late Gadolinium enhancement MR images, Medical Image
Analysis 30 (2016), 95–107.
[61] Y. Yuhua, L. Lixiong, L. Lejian, M. Wei, G. Jianping and L. Yinghui, Sigmoid gradient vector flow for medical image
segmentation, in: Signal Processing (ICSP), 2012 IEEE 11th International Conference, IEEE 2 (2012), 881–884.