
Computer aided lung cancer diagnosis with deep learning algorithms
Wenqing Sun (a), Bin Zheng (b, c), Wei Qian (a, c)
a) Medical Imaging and Informatics Laboratory, Department of Electrical & Computer Engineering,
University of Texas, El Paso, Texas, United States
b) College of Engineering, University of Oklahoma, Norman, Oklahoma, United States
c) Sino-Dutch Biomedical and Information Engineering School, Northeastern University, Shenyang, China

Deep learning is considered a popular and powerful method in pattern recognition and
classification. However, few deep structured applications have been reported in the medical
imaging diagnosis area, because large datasets are not always available for medical images. In
this study we tested the feasibility of using deep learning algorithms for lung cancer diagnosis
with cases from the Lung Image Database Consortium (LIDC) database. The nodules on each
computed tomography (CT) slice were segmented according to marks provided by the
radiologists. After down-sampling and rotating we acquired 174412 samples of 52 by 52 pixels
each, together with the corresponding truth files. Three deep learning algorithms were designed
and implemented: Convolutional Neural Network (CNN), Deep Belief Networks (DBNs), and
Stacked Denoising Autoencoder (SDAE). To compare the performance of deep learning
algorithms with a traditional computer aided diagnosis (CADx) system, we designed a scheme
with 28 image features and a support vector machine. The accuracies of CNN, DBNs, and SDAE
are 0.7976, 0.8119, and 0.7929, respectively; the accuracy of our traditional CADx is 0.7940,
which is slightly lower than CNN and DBNs. We also noticed that the mislabeled nodules using
DBNs are 4% larger than those using traditional CADx; this might result from the down-sampling
process losing some size information of the nodules.
Key Words: lung cancer, deep learning, computed tomography, computer aided diagnosis
(CADx), Convolutional Neural Network (CNN), Deep Belief Networks (DBNs), Stacked
Denoising Autoencoder (SDAE)
1. Introduction:
Deep learning is a subfield of machine learning pioneered by Hinton [1] and inspired by the
architecture of the human brain. By learning from deep, layered, and hierarchical models of
data, deep learning algorithms can outperform traditional machine learning models. Even ten
years ago, most people still thought such deep structured algorithms could only be used for
simple image classification tasks such as handwritten digit recognition. With the development of
deep learning algorithms, however, many research groups have successfully applied them to more
complicated classification tasks. In the ImageNet LSVRC-2012 contest, the winning group used a
deep learning algorithm to classify 1.2 million high-resolution images into 1000 different
classes with an error rate of 15.3%, compared to 26.2% reported by the second-best group [2].
In other contests, deep learning algorithms won the MICCAI 2013 Grand Challenge and the ICPR
2012 Contest on Mitosis Detection [3]. In recent years, some researchers have used convolutional
neural networks (CNN) to detect clustered microcalcifications in digital breast tomosynthesis,
with promising results [4] [5].

Medical Imaging 2016: Computer-Aided Diagnosis, edited by Georgia D. Tourassi, Samuel G. Armato III,
Proc. of SPIE Vol. 9785, 97850Z · © 2016 SPIE · CCC code: 1605-7422/16/$18 · doi: 10.1117/12.2216307


To the best of our knowledge, no literature has reported a purely data-driven approach to
classifying lung cancer lesion images. In this study, we implemented three different deep
learning algorithms and compared them with a traditional image-feature-based CAD system. All
algorithms were applied to the public Lung Image Database Consortium and Image Database
Resource Initiative (LIDC/IDRI) database; details of the data and algorithm designs are given
below.
2. Materials and methods:
2.1 Data:
The LIDC database contains 1018 lung cases collected from seven academic centers and eight
medical imaging companies. Four radiologists independently reviewed each CT scan and marked
suspicious lesions with malignancy ratings. The malignancy levels range from 1 to 5: levels 1
and 2 represent benign cases and levels 4 and 5 denote malignant cases.
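The labeling rule described above can be sketched as a small helper; the function name and the
tie-breaking behavior for an exact average of 3 are our own illustrative choices, not the
authors' code:

```python
def label_from_ratings(ratings):
    """Map the four radiologists' malignancy ratings (1-5) to a class label.

    Average rating below 3 -> benign (0), above 3 -> malignant (1),
    exactly 3 (intermediate) -> None, i.e. the case is excluded.
    """
    avg = sum(ratings) / len(ratings)
    if avg < 3:
        return 0   # benign side (levels 1 and 2)
    if avg > 3:
        return 1   # malignant side (levels 4 and 5)
    return None    # intermediate (level 3), removed from the dataset
```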
For every nodule, we removed the top and bottom layers of each cube, because their sizes and
shapes may differ significantly from the rest of the layers and they are not representative of
the nodule. In the remaining layers, the nodule areas were segmented based on the union of the
four radiologists' truth files. If the segmented region of interest (ROI) fitted into a 52 by 52
pixel box, the ROI was placed at the center of the box; ROIs exceeding this size were
down-sampled to 52 by 52 pixels. Each ROI was then rotated into four different orientations and
converted into four vectors, each representing the ROI at one orientation. All pixel values in
the vectors were down-sampled to 8 bits. From these 1018 cases we acquired 174412 vectors, each
with 2704 elements. We calculated the average of the four radiologists' malignancy ratings, and
the final truth file was made based on these averages. The distribution of malignancy levels in
the data is shown in Table 1. All intermediate cases (level 3) were eliminated, leaving 114728
vectors for classification; among them, 54880 cases were benign and 59848 were malignant.
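A minimal NumPy sketch of this preprocessing pipeline is shown below. The paper does not specify
the down-sampling method or the rotation angles, so the integer striding and the 0/90/180/270
degree rotations here are our assumptions:

```python
import numpy as np

ROI_SIZE = 52  # target ROI: 52 x 52 pixels, flattened to 2704 elements

def preprocess_roi(roi):
    """Normalize one segmented nodule ROI: center it in a 52x52 box
    (down-sampling if larger), quantize to 8 bits, rotate it into four
    orientations, and flatten each rotation into a 2704-element vector."""
    h, w = roi.shape
    if h > ROI_SIZE or w > ROI_SIZE:
        # Down-sample oversized ROIs by integer striding (a crude stand-in
        # for the paper's unspecified down-sampling method).
        step = int(np.ceil(max(h, w) / ROI_SIZE))
        roi = roi[::step, ::step]
        h, w = roi.shape
    # Place the (possibly down-sampled) ROI at the center of the box.
    box = np.zeros((ROI_SIZE, ROI_SIZE), dtype=float)
    top, left = (ROI_SIZE - h) // 2, (ROI_SIZE - w) // 2
    box[top:top + h, left:left + w] = roi
    # Quantize pixel values to 8 bits (0-255).
    lo, hi = box.min(), box.max()
    box = ((box - lo) / max(hi - lo, 1e-9) * 255).astype(np.uint8)
    # Four rotations, each flattened to a single vector.
    return [np.rot90(box, k).ravel() for k in range(4)]
```

Each call thus yields four 2704-element vectors per ROI, matching the counts quoted above.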
Table 1: The distribution of malignancy likelihood level of each nodule.

Level    1       2       3       4       5
Amount   20500   34380   59684   26316   33532

2.2 Methods:
In this study, three deep learning models, Convolutional Neural Network (CNN), Deep Belief
Networks (DBNs), and Stacked Denoising Autoencoder (SDAE), were implemented and compared on the
same dataset. All code and experiments were implemented and run on a machine with a 2.8 GHz
Intel Core i7 processor and 16 GB of 1600 MHz DDR3 memory.
The architecture of our CNN contains 8 layers: apart from the input and output layers, every
odd-numbered layer is a convolution layer and every even-numbered layer is a pooling and
subsampling layer [6]. The three convolution layers use 12, 8, and 6 feature maps, respectively,
and each is connected to its pooling layer through 5 by 5 kernels. The batch size was set to 100
and the learning rate was 1 for 100 epochs. The details of each layer are shown in Figure 1.

Input image -> [12 feature maps, kernel size 5] -> [pooling, scale 2]
  -> [8 feature maps, kernel size 5] -> [pooling, scale 2]
  -> [6 feature maps, kernel size 5] -> [pooling, scale 2] -> Output neuron

Figure 1: The architecture of CNN.
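As a sanity check on these dimensions, the feature-map shrinkage through the three conv + pool
stages can be traced with a minimal NumPy sketch (one feature map per stage and a random kernel
for illustration; `conv_valid` and `mean_pool` are our own helpers, not the authors' code):

```python
import numpy as np

def conv_valid(img, kernel):
    """'Valid' 2-D convolution: each side of the output shrinks by kernel_size - 1."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def mean_pool(img, scale=2):
    """Non-overlapping mean pooling with the given scale."""
    h, w = img.shape
    return img[:h - h % scale, :w - w % scale] \
        .reshape(h // scale, scale, w // scale, scale).mean(axis=(1, 3))

# Trace the sizes: 52 -> 48 -> 24 -> 20 -> 10 -> 6 -> 3.
x = np.random.rand(52, 52)
k = np.random.rand(5, 5)
sizes = [x.shape]
for _ in range(3):            # three conv (5x5) + pool (scale 2) stages
    x = mean_pool(conv_valid(x, k))
    sizes.append(x.shape)
print(sizes)  # [(52, 52), (24, 24), (10, 10), (3, 3)]
```

The 52 by 52 input size is thus exactly consumed by three 5 by 5 convolutions with scale-2
pooling, ending in 3 by 3 maps before the output neuron.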

The second deep learning algorithm we tried was DBNs, obtained by training and stacking four
layers of Restricted Boltzmann Machines (RBMs) in a greedy fashion. Each RBM layer contains 100
hidden units, and an RBM allows no interactions among its hidden units or among its visible
units. The trained stack of RBMs was used to initialize a feed-forward neural network for
classification. The output vector h^k of layer k is computed from the output h^(k-1) of the
previous layer k-1 as h^k = tanh(b^k + W^k h^(k-1)), where b^k is the vector of offsets and W^k
is the matrix of weights [7].
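The feed-forward pass of the initialized network can be sketched directly from this formula.
The layer widths below (an input of 2704 and four 100-unit layers) are our reading of the text,
and the random initialization is illustrative only:

```python
import numpy as np

def dbn_forward(x, weights, biases):
    """Feed-forward pass through stacked layers: h_k = tanh(b_k + W_k @ h_{k-1})."""
    h = x
    for W, b in zip(weights, biases):
        h = np.tanh(b + W @ h)
    return h

rng = np.random.default_rng(0)
layer_sizes = [2704, 100, 100, 100, 100]   # input vector, then four 100-unit layers
weights = [rng.normal(0, 0.01, (n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

h_top = dbn_forward(rng.random(2704), weights, biases)
print(h_top.shape)  # (100,)
```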
The third model we tested was a three-layer SDAE [8], in which each denoising autoencoder is
stacked on top of the previous one. The structure is similar to the DBNs mentioned above. The
three autoencoders have 2000, 1000, and 400 hidden neurons, respectively, with a corruption
level of 0.5. For both DBN and SDAE the batch size was set to 100, and the learning rate was
0.01 for all 100 epochs.
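The paper states only the corruption level; the sketch below assumes the common masking
corruption, in which each input element is independently zeroed with the given probability and
the autoencoder is trained to reconstruct the clean input:

```python
import numpy as np

def corrupt(x, corruption_level=0.5, rng=None):
    """Masking corruption for a denoising autoencoder: each element of x
    is independently set to zero with probability corruption_level."""
    rng = rng or np.random.default_rng()
    keep = rng.random(x.shape) >= corruption_level
    return x * keep

rng = np.random.default_rng(42)
x = rng.random(2704)            # one flattened 52x52 ROI vector
x_noisy = corrupt(x, 0.5, rng)  # roughly half the entries are zeroed
```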
To compare the performance of the deep learning algorithms with a traditional CAD scheme, we
also tested the same dataset on our traditional CAD system. We extracted 35 features from each
ROI, including 30 texture features and 5 morphological features, which proved useful in our
previous studies. The 30 texture features include 22 features from the Grey-Level Co-occurrence
Matrix (uniformity, entropy, dissimilarity, inertia, inverse difference, correlation,
homogeneity, autocorrelation, cluster shade, cluster prominence, maximum probability, sum of
squares, sum average, sum variance, sum entropy, difference variance, difference entropy, two
information measures of correlation, maximal correlation coefficient, inverse difference
normalized, inverse difference moment normalized) and 8 wavelet features (mean and variance from
the four combinations of high-pass and low-pass filters). The 5 morphological features are area,
skewness, mean intensity, variance, and entropy. A kernel-based support vector machine (SVM) was
then used to train the classifier on the same training data.
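To make the GLCM-based features concrete, here is an illustrative NumPy computation of a few of
the listed features for a single pixel offset. The exact definitions and offsets used in the
authors' previous studies may differ; this is a sketch of the general technique:

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """Grey-level co-occurrence matrix for one pixel offset (dx, dy),
    normalized so its entries sum to 1. img is assumed to lie in [0, 1)."""
    q = np.minimum((img * levels).astype(int), levels - 1)  # quantize grey levels
    P = np.zeros((levels, levels))
    h, w = q.shape
    for i in range(h - dy):
        for j in range(w - dx):
            P[q[i, j], q[i + dy, j + dx]] += 1
    return P / P.sum()

def texture_features(P):
    """A few of the GLCM features named in the text."""
    i, j = np.indices(P.shape)
    return {
        "uniformity": np.sum(P ** 2),                      # a.k.a. energy
        "entropy": -np.sum(P[P > 0] * np.log2(P[P > 0])),
        "dissimilarity": np.sum(P * np.abs(i - j)),
        "homogeneity": np.sum(P / (1.0 + np.abs(i - j))),
    }

feats = texture_features(glcm(np.random.rand(52, 52)))
```

The full feature vector would concatenate such texture values with the wavelet and morphological
features before being passed to the SVM.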
3. Results:
The CNN algorithm lets the computer learn its own features instead of using human-designed
features. There are 600, 400, and 300 feature maps in each layer; some examples of the learned
features in layer 1 are shown in Figure 2. From the figure we can see different curves
representing the characteristics of the lower left corners of the nodules. There are 12, 96, and
48 kernels in layers 1, 2, and 3, respectively, and Figures 3 and 4 visualize the kernels in the
first and last layers. The final mean squared error on the training data is 0.1347, and its
change over iterations is shown in Figure 5.

Figure 2: Examples of some learned features from CNN

Figure 3: The visualization of the 12 kernels in the first layer.


Figure 4: The visualization of the 48 kernels in the third layer.

Figure 5: The mean squared errors history of training samples in CNN.

Figures 6 and 7 show the visualization of the weights in the first and second RBMs, which are
key modules of DBNs. Figure 8 shows the visualization of the weights of the neurons in SDAE.


Figure 6: Visualization of 100 random weights in the first layer RBM.

Figure 7: Visualization of 100 random weights in the second layer RBM.

Figure 8: Visualization of 100 random weights of the neurons in the first layer of SDAE.
The comparison of algorithm accuracies is shown in Table 2. Among the three deep learning
algorithms, DBNs achieved the best performance in terms of accuracy on the testing data and mean
squared error on the training data. For comparison, we also tested the traditional CADx system
on the same dataset. Its feature set contains one group of texture features and one group of
density features. Using only texture features, the accuracy is 0.7409 at the threshold of
0.6257, which maximizes the area of the largest rectangle under the ROC curve, and the AUC is
0.7914. Using only density features, the accuracy is 0.7814 and the AUC is 0.8342. Combining
both groups of features, the accuracy is 0.7940 and the AUC is 0.8427.
To analyze the influence of nodule size on algorithm accuracy, we measured the nodule pixels of
the mislabeled cases and report the mean and standard deviation in Table 3. Since the nodule
areas do not follow a normal distribution, we conducted the Mann-Whitney test; the p-value for
each pairwise comparison between the groups is less than 0.0001.
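The Mann-Whitney statistic itself needs no normality assumption, which is why it suits these
skewed area distributions. A minimal sketch of the U statistic follows (the sample values are
made up for illustration; in practice `scipy.stats.mannwhitneyu` also supplies the p-value):

```python
import numpy as np

def mann_whitney_u(x, y):
    """Mann-Whitney U statistic: the number of pairs (xi, yj) with xi > yj,
    counting ties as 1/2. Significance is then read from the U distribution
    (omitted here)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    greater = (x[:, None] > y[None, :]).sum()
    ties = (x[:, None] == y[None, :]).sum()
    return greater + 0.5 * ties

# Toy example with made-up nodule areas from two groups.
u = mann_whitney_u([200, 210, 190], [150, 160, 250])
print(u)  # 6.0
```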
Table 2: Accuracy comparison of three deep learning algorithms

           Mean squared error     Accuracy on
           on training data       testing data
CNN        0.1347                 0.7976
DBN        0.1204                 0.8119
SDAE       0.1333                 0.7929

Table 3: Comparison of mislabeled nodule size of different algorithms.

Group                                   Mean of nodule size   Standard deviation of nodule size
Mislabeled cases in DBN                 200                   204
Mislabeled cases in traditional CADx    192                   196
All tested cases                        247                   245

4. Conclusions:
In this study, we tested the feasibility of using deep structured algorithms for lung cancer
image diagnosis. We implemented and compared three different deep learning algorithms, CNN,
DBNs, and SDAE; the highest accuracy we obtained was 0.8119, using DBNs. This accuracy is
slightly higher than the 0.7940 obtained with the traditional CADx system. The comparison
results demonstrate the great potential of deep structured algorithms and computer-learned
features in the medical imaging area.
Defining the size of the ROI is a very important step in applying deep learning algorithms to
lung image diagnosis. In many other image recognition tasks, such as the ImageNet classification
challenge, the size of the objects does not have a significant impact on the classification
results, so all images can simply be down-sampled to the same size. For lung cancer image
diagnosis, however, the size of the nodules and how the nodule areas are cropped are important,
because absolute nodule size is one of the most important measurements of malignancy likelihood.
We compared the nodule size in pixels of the mislabeled cases using DBNs and traditional CADx,
and the mislabeled nodules using DBNs are 4% larger. One possible explanation is that larger
nodules have to be down-sampled to fit into the selected ROI size, which may lose some of the
shape information.
This is a preliminary study of using deep learning algorithms to diagnose lung cancer, and the
results show very promising performance. In the future, we will test more deep structured
schemes for lung cancer diagnosis and look for more efficient ways to minimize the down-sampling
effect.
[1] Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief
nets. Neural Computation, 18(7), 1527-1554.
[2] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep
convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097-
[3] Cireşan, D. C., Giusti, A., Gambardella, L. M., & Schmidhuber, J. (2013). Mitosis detection
in breast cancer histology images with deep neural networks. In Medical Image Computing and
Computer-Assisted Intervention - MICCAI 2013 (pp. 411-418). Springer Berlin Heidelberg.
[4] Samala, R. K., Chan, H. P., Lu, Y., Hadjiiski, L. M., Wei, J., & Helvie, M. A. (2014).
Digital breast tomosynthesis: computer-aided detection of clustered microcalcifications on
planar projection images. Physics in Medicine and Biology, 59(23), 7457.
[5] Samala, R. K., Chan, H. P., Lu, Y., Hadjiiski, L. M., Wei, J., & Helvie, M. A. (2015).
Computer-aided detection system for clustered microcalcifications in digital breast
tomosynthesis using joint information from volumetric and planar projection images. Physics in
Medicine and Biology, 60(21), 8457.
[6] Palm, R. B. (2012). Prediction as a candidate for learning deep hierarchical models of data.
Technical University of Denmark.
[7] Bengio, Y. (2009). Learning deep architectures for AI. Foundations and Trends in Machine
Learning, 2(1), 1-127.
[8] Bengio, Y., Lamblin, P., Popovici, D., & Larochelle, H. (2007). Greedy layer-wise training
of deep networks. Advances in Neural Information Processing Systems, 19, 153.
