2 Computer Science Department, BINUS Graduate Program - Doctor of Computer Science, Bina Nusantara University
2 Computer Engineering Department, Faculty of Engineering, Bina Nusantara University
3 Computer Science Department, School of Computer Science, Bina Nusantara University
3 Bioinformatics and Data Science Research Center, Bina Nusantara University
decision making through the interconnection of neurons (the smallest unit of the human nerve) [26, 27]. Because of its ability to adapt and to extract complex features by conducting hierarchical and abstract learning, deep learning, especially CNN, is the first choice for various computer vision tasks, including medical image analysis [3, 26].

A typical CNN framework generally consists of several convolutional layers and subsampling or pooling layers, which are fully connected to a multilayer perceptron. To analyze 2D images, the dimensions of the convolutional layers are generally set to two to capture the local spatial pattern of the object under study. A convolutional layer is smaller than the input layer and can be stretched into multiple parallel feature maps. Feature maps can be interpreted as input images or maps convolved with linear filters (sigmoid functions), parameterized with synaptic weights and biases [28–30]. Neighboring hidden units on a feature map are replications of the same unit through a shared weight vector and bias, which reduces the number of free parameters to be learned. Each subsampling layer performs non-linear subsampling of the feature map by choosing the maximum value of each overlapping subregion of the input map. The main function of the subsampling layer is to reduce the learning complexity of the upper layers and to make the network invariant to translations of the image input. The CNN model can be built from training data with the gradient back-propagation method, connecting the top convolutional layer down through the next convolutional layers until the lowest convolutional layer is reached, by adjusting the weights and biases between layers [9, 30, 31].

Generally, the layers close to the input image have fewer filters, and the number of filters increases in layers farther from the input image. Although CNN performance is tremendously good, it has the following shortcomings. First, CNN requires a large amount of labeled training data, which is difficult to fulfill, especially when labeling requires reading expertise from specialists such as radiologists; labeling is expensive, and cases of specific diseases are small in number. Second, training a deep CNN demands considerable computational ability and memory allocation and is time-consuming. Third, there is the possibility of overfitting and convergence problems, which requires repeated adjustments to the architecture and parameters of the neural network to ensure that all layers learn at a good speed [3, 22, 29, 30, 32].

The rapid growth of image data on the Internet has provided opportunities for models and algorithms to index, organize, and interact with the existing image and multimedia data. However, there was still no comprehensive annotated image database that could be used as a standard in object detection and classification. To solve this problem, ImageNet was created to provide a hierarchical solution to image recognition by annotating images on the internet [14, 28, 31, 33]. AlexNet [33], Overfeat [34], and the Region-Based Convolutional Neural Network (R-CNN) family [35–38] are some of the popular CNNs that use the ImageNet database to train their models. After CNN methods were proven to be the state of the art in computer vision tasks, further developments in object detection and classification included transfer learning methods [14, 28, 33, 39, 40], action recognition using stream CNNs on kinetics datasets (such as the Two-Stream Inflated 3D ConvNet) [41], and the Region Proposal Network (RPN) and Region of Interest (RoI) used by Faster R-CNN and Mask R-CNN (two-stage object detectors) [37, 38]. RetinaNet, a single-stage object detector, is faster than and outperforms the two-stage object detector (Faster R-CNN) and its predecessors (YOLO and DSSD) by balancing foreground and background classes with a focal loss cross-entropy function and by using a Feature Pyramid Network (FPN) for multiscale extraction on image feature maps [42–47].
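As a concrete illustration of this class-balancing idea, the sketch below implements the binary focal loss in plain NumPy. It is a minimal illustration only: the weighting factor alpha and focusing parameter gamma shown here are commonly used defaults, not settings reported in this paper.

import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p = np.clip(p, eps, 1.0 - eps)
    p_t = np.where(y == 1, p, 1.0 - p)              # probability assigned to the true class
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma down-weights easy (mostly background) anchors,
    # so the many easy negatives no longer drown out the rare nodules
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

# toy example: one hard positive anchor among easy negatives
print(focal_loss(np.array([0.3, 0.05, 0.02, 0.01]), np.array([1, 0, 0, 0])))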
Previous researchers have proposed several techniques for lung nodule detection. The first is multi-scale CNN layers combined with machine learning methods (Support Vector Machine (SVM) or Random Forest (RF)) as classifiers after the CNN [3, 7, 26]. The second is transfer learning from 2D CNN [3, 7, 9, 15, 21, 48]. The third is multiple parallel layers of 2D CNN (2.5D CNN). The fourth is 3D CNN trained from scratch [3, 8, 15, 20–23, 26, 48]. The fifth is transfer learning from a 3D CNN pre-trained on natural images or on lung nodule database weights [15, 20, 23, 24, 39, 48]. The sixth is using a Dense Neural Network (DenseNet) or a dual-path Fully Convolutional Network (Dual FCN) to classify, detect, or segment lung nodules [15, 20, 24, 49, 50].

Most medical imaging studies implement 2D and 2.5D CNN for medical image analysis because the image size is large compared to the small objects of interest, such as lung nodules. 3D CNN is the most accurate approach, but it uses more computing resources (memory and GPU) and is also difficult to apply because of the varying depth of medical images [12, 15]. All methods mentioned before are implemented to classify lung nodule malignancy on public lung nodule datasets.
However, from a healthcare professional's point of view, it is presumptuous for a medical detection tool to diagnose patients with cancer. The diagnosis is not made by CT-scan interpretation alone; it requires higher abstraction, negotiation, decision making, and emotion (empathy) to deal with the patient's clinical condition. CAD is unable to fulfill the psychological needs and relieve the anxiety resulting from a diagnosis. Thus, instead of directly giving patients a final verdict, this research sees CAD as a tool to help radiologists and clinicians (pulmonologists) reach a clinical diagnosis by providing lung nodule location and texture information. To the best of the researchers' knowledge, there is still no research on implementing 3D CNN to detect and classify lung nodule texture.

III. RESEARCH METHOD

The method proposed in this research consists of three main phases: data profiling together with data cleaning, filtering, and data compatibility; 3D CNN architecture design and transfer learning; and training and evaluation.

A. Data Profiling

This research aims to implement 3D CNN on the Moscow private dataset, which is obtained from the Medical Radiology Center of the Moscow Healthcare Department in collaboration with NVIDIA Asia-Pacific via a private link provided by the NVIDIA-Binus AI R&D Center. The Moscow private dataset has 546 patients, each annotated by at least three radiologists. It also provides annotation information on nodule coordinates (x, y, and z), nodule size in mm, and nodule texture (solid/subsolid/ground-glass). All annotation information for the 546 patients is presented in one Microsoft Excel (.xlsx) file in Russian and one Microsoft Excel (.xlsx) file in English. Of the 546 patients, 472 have lung lesions (pulmonary nodules) and 74 have no pulmonary nodules; the 472 patients may contain one or a combination of two or three types of pulmonary nodules. From the patients who have pulmonary nodules, the solid : subsolid : ground-glass annotation counts obtained are 2136 : 778 : 384 = 5.56 : 2.02 : 1. To reduce the class imbalance between the solid, subsolid, and ground-glass classes, a non-solid class is formed as the sum of the subsolid and ground-glass classes. The final ratio obtained after averaging the doctors' readings for each nodule observation is solid : non-solid = 2136 : 1420 = 1.83 : 1. The detailed class distribution can be seen in Table I.
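A minimal sketch of the class handling described above, using the annotation counts quoted in this section: the raw counts are normalized against the smallest class, and the subsolid and ground-glass classes are merged into a single non-solid class. Note that the final solid : non-solid counts reported in the text are obtained after averaging the readings of several radiologists per nodule, so they are not simply the raw sums shown here; the variable names are illustrative.

counts = {"solid": 2136, "subsolid": 778, "ground_glass": 384}

# normalize against the smallest class; the text quotes this as 5.56 : 2.02 : 1
smallest = min(counts.values())
print({k: round(v / smallest, 2) for k, v in counts.items()})

# merge subsolid and ground-glass annotations into one non-solid class
merged = {"solid": counts["solid"],
          "non_solid": counts["subsolid"] + counts["ground_glass"]}
print(merged)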
However, the annotation structure, the annotation agreement, and the raw image quality of the Moscow private dataset are poor, so a data cleaning process is needed. It consists of several processes. First, the annotation coordinates of each patient are matched with the corresponding chest CT scan. Second, the z coordinate (depth) is matched with the DICOM metadata information that each CT image slice of a thorax scan has (a unique alphanumeric identifier). Third, pseudo mask coordinates (pixel-wise annotations) are generated, and the CSV annotations are converted to XML format.

B. Data Cleaning, Filtering, and Data Compatibility

The Moscow private dataset covers 546 patients with a total of 674 thorax (body/tissue) or lung CT scans with a slice thickness (the distance between slices) of 0.5 mm or 1 mm. The annotation contains information on the patient code (the name of the main folder), the doctor code (000, 001, 002, 003, 004, 006, 007, 008, 009, 010, 011, 012, 013, 014), the 3D coordinates (x, y, and z) of the lesions, and the texture (solid, subsolid, or ground-glass) of the lung nodule lesions. The Moscow private dataset requires data cleaning for several reasons.

First, there is no agreement on coordinate naming among the radiologists. The origin of the coordinate system is supposedly located at the upper-left/upper-right corner of the patient. As for the z = 0 coordinate, the majority of doctors still use the convention in which the location closest to the head is designated as z = 0 (contrary to the World Coordinate System standard), while other doctors use the World Coordinate System standard, in which the location closest to the feet is designated as z = 0.

Second, about 10% of the annotation coordinates (x, y, and z) point to areas outside the lungs (bone, liver, or neck). This may be caused by the annotation process being carried out before the CT scan slice thickness was standardized to 0.5 mm and 1.0 mm. There are also image data providing two slice thicknesses (0.5 mm and 1.0 mm) with a single annotation that does not specify the slice thickness. This increases the difficulty of determining whether the given coordinates are suitable for the images with 0.5 mm or with 1.0 mm slice thickness, since the number of 0.5 mm slices is not necessarily twice the number of 1.0 mm slices.

Third, data cleaning is needed because patients who do not have nodules are mixed together with the positive nodule data. As a result, 96 non-nodule datasets are excluded from the training process.

Fourth, the image resolution of the CT scans in the Moscow private dataset is of poorer quality than other published CT scan datasets such as the LIDC public dataset [3, 7–9, 15, 20–23, 26, 39, 48].
TABLE I. The Detailed Class Distribution of Moscow Private Dataset.
This may be caused by suboptimal regulation of the power source (kilovolt peak, tube current, and tube rotation speed), noise filtering, or non-optimal iterative reconstruction. The Poisson noise produced by suboptimal iterative reconstruction causes bright and dark streaks, which lowers the image quality [16, 25, 51].

Fifth, the Moscow private dataset also includes patients who have pulmonary radiological lesions besides pulmonary nodules (interstitial lung and pleural inflammation or infection such as pneumonia, bronchiolitis, interstitial lung disease, pleuritis, hydrothorax, and post-pneumonectomy). However, this is understandable since most terminally ill lung cancer patients have a high risk of lung infection and inflammation.

Sixth, for manual data cleaning, the existing datasets cannot be viewed with DICOM viewer programs such as Aliza and MITK. They can only be viewed and reconstructed correctly with DICOM reconstruction tools such as 3D Slicer [52] because of an orientation error, which is suspected to be caused by metadata corrupted by machine malfunctions.

The steps taken to overcome the first to third problems are as follows. The researchers review the raw CT scan dataset manually and match each scan carefully with its annotations one by one to ensure that none of the annotations is out of bounds of the lung area. The review of the pulmonary CT scan dataset shows that there is only one consistent doctor (code number 011) out of the 15 radiologists participating in reading the pulmonary CT scan data.

To overcome the z-coordinate standardization issue mentioned in the first problem, the researchers match the depth coordinates using the spatial information in the DICOM metadata (tag 0020, 0032), which describes the z coordinates accurately, rather than using the inconsistent references of the Moscow radiologists. After the coordinates are matched with the DICOM metadata information, the data cleaning process continues by dynamically generating pseudo mask pixel-wise annotation coordinates, which are saved in XML format.
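A minimal sketch of this depth-matching step, assuming the pydicom library and an annotation z value already expressed in the World Coordinate System (mm): the slice whose ImagePositionPatient tag (0020, 0032) lies closest to the annotated z value is selected. The directory layout and the example annotation value are hypothetical.

import glob
import pydicom

def find_slice_for_z(dicom_dir, annotated_z_mm):
    """Return the DICOM slice whose z position (tag 0020,0032) is closest to annotated_z_mm."""
    best_path, best_dist = None, float("inf")
    for path in glob.glob(f"{dicom_dir}/*.dcm"):
        ds = pydicom.dcmread(path, stop_before_pixels=True)
        z = float(ds.ImagePositionPatient[2])   # patient-space z of this slice
        if abs(z - annotated_z_mm) < best_dist:
            best_path, best_dist = path, abs(z - annotated_z_mm)
    return best_path, best_dist

# hypothetical usage:
# slice_path, error_mm = find_slice_for_z("patient_001/series_0", annotated_z_mm=-142.5)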
After the data are cleaned, each DICOM file (the standardized medical image file) of the Moscow private dataset is converted into the 3D volumetric Nearly Raw Raster Data (NRRD) format to ensure data volatility and intensity (Hounsfield unit (HU)) normalization [2, 6, 53]. Because a large number of the DICOM files in the Moscow private dataset have corrupted metadata, the DICOM datasets are reconstructed and converted one by one manually using a third-party application (3D Slicer) [52]. However, the resulting NRRD files are still of lower quality than the LIDC public dataset because of Poisson noise.

After the conversion from DICOM to NRRD format, the next step is to convert the annotation information from XML to NIfTI and planar (.pf) formats, which serve as a pseudo mask. The NRRD datasets are converted to a 4D NumPy array with dimensions c (image channels) = 2 for the image and the Region of Interest (RoI) mask, x (image length, default = 512), y (image width, default = 512), and z (image depth, which varies with patient age and body size). Each array is then standardized in size by resampling to a fixed pixel spacing (the distance between pixels on the x and y axes is 0.7 mm) and a fixed slice thickness (the distance between images on the z-axis is 1.25 mm).
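The following is a minimal sketch of this spacing standardization, assuming a volume ordered as (z, y, x) whose original voxel spacing in mm is known; scipy.ndimage.zoom resamples it to the fixed 1.25 mm slice thickness and 0.7 mm pixel spacing. The Hounsfield-unit clipping range is an illustrative choice, not a value reported in the paper.

import numpy as np
from scipy.ndimage import zoom

def resample_to_fixed_spacing(volume, spacing, target=(1.25, 0.7, 0.7)):
    """Resample a CT volume (z, y, x) from `spacing` (mm/voxel) to `target` spacing (mm/voxel)."""
    factors = [s / t for s, t in zip(spacing, target)]
    return zoom(volume, factors, order=1)  # linear interpolation

def normalize_hu(volume, hu_min=-1000.0, hu_max=400.0):
    """Clip Hounsfield units and rescale them to [0, 1]."""
    volume = np.clip(volume, hu_min, hu_max)
    return (volume - hu_min) / (hu_max - hu_min)

# hypothetical usage on a 0.5 mm-thick series with 0.68 mm in-plane spacing
ct = np.random.randint(-1000, 400, size=(60, 256, 256)).astype(np.float32)
ct = normalize_hu(resample_to_fixed_spacing(ct, spacing=(0.5, 0.68, 0.68)))
print(ct.shape)   # about (24, 249, 249) after resampling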
The reason for using 0.7 mm pixel spacing is to accommodate the smallest pulmonary nodule size of 3 mm (4.3 pixels), so that it can still be read by the smallest detection anchor, which is 4 × 4 pixels. However, since the Moscow private dataset is highly variable in depth, slice thickness, and pixel spacing, it is impossible to fit it into the permanently fixed input of 96 × 96 × 96 proposed by Ref. [50]. First, the Moscow dataset uses 0.5 mm and 1 mm slice thickness with highly variable pixel spacing, ranging widely from 0.54 to 1.04, because it encompasses adult, geriatric, and pediatric populations; meanwhile, LIDC in DeepLesion uses 1.25 mm and 2.5 mm slice thickness with 0.6–0.9 pixel spacing and a mostly adult population, so the Moscow private dataset needs to be standardized to 0.7 mm pixel spacing. Second, a proportion of the Moscow dataset contains a large nodule (around 113 mm), which is larger than 96 pixels in width and length [50]. Thus, the approach of using a dynamically larger pixel length and width with a smaller depth size (0.5 × length, 0.5 × width, 64) reduces the size of the input data to minimize RAM and GPU memory consumption while preserving the volumetricity, or 3D spatial information, of a lung CT scan.

C. 3D CNN Architecture Design and Transfer Learning

The 3D CNN architecture adopts the I3D backbone and the single-stage state-of-the-art object detector RetinaNet 3D, whose source code can be downloaded from public GitHub repositories. The researchers also include the Feature Pyramid Network, as in the original RetinaNet architecture, and adjust the feature size by resizing the anchors to one-eighth of those in the original RetinaNet paper. This results in the smallest anchor size equalling 4 × 4 pixels, which enables nodule (small object) detection. Instead of using the generic ResNet-50 or ResNet-101 backbone for feature extraction, the researchers use the I3D backbone. It is obtained by inflating Inception V2 (a 2D CNN), which has already been proven to be a state-of-the-art architecture for feature extraction on kinetics video datasets, and it can also transfer learning using pre-trained weights from a natural image database (ImageNet) to the 3D CNN. Because 3D medical image datasets consist of many slices, they can be viewed as the multiple frames of a kinetics dataset, such as the kinetics database used in the two-stream I3D research on action recognition classification by Ref. [41]. The researchers can therefore empirically justify using the pre-trained I3D backbone to replace the ResNet backbone of the original RetinaNet architecture and thereby enable transfer learning [41]. The illustration of the 3D CNN architecture can be seen in Fig. 1.
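The sketch below illustrates the inflation idea behind the I3D backbone: a pre-trained 2D convolution kernel is repeated along a new depth axis and divided by the depth, so that on an input replicated along the depth the inflated 3D filter reproduces the 2D activation. This is a generic illustration of the technique, not the authors' code, which according to the text builds on publicly available I3D and RetinaNet 3D repositories.

import numpy as np

def inflate_2d_kernel(w2d, depth):
    """Inflate a 2D conv kernel (out_c, in_c, kh, kw) into 3D (out_c, in_c, depth, kh, kw).

    Repeating the kernel along the depth axis and dividing by `depth` keeps the
    response on a depth-constant (replicated) input identical to the 2D response.
    """
    w3d = np.repeat(w2d[:, :, None, :, :], depth, axis=2)
    return w3d / float(depth)

# toy check: a 3x3 kernel inflated to depth 3 gives the same output on a replicated stack
w2d = np.random.randn(8, 1, 3, 3)
w3d = inflate_2d_kernel(w2d, depth=3)
patch2d = np.random.randn(1, 3, 3)                       # one 2D input patch (1 channel)
patch3d = np.repeat(patch2d[:, None, :, :], 3, axis=1)   # replicated along depth
out2d = (w2d * patch2d).sum(axis=(1, 2, 3))
out3d = (w3d * patch3d).sum(axis=(1, 2, 3, 4))
print(np.allclose(out2d, out3d))   # True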
Fig. 2. The example of 20 sequential patches (10 × 2) taken from five different patients, correctly detected and predicted by the 3D CNN architecture with pre-trained LIDC weight for detection and classification of lung nodule texture in the Moscow private dataset.
D. Training and Evaluation Setup

The Moscow private dataset is separated with a ratio of 3:1:1 (346:116:116) for training, validation, and testing. The experimental design for evaluating the 3D CNN in detecting and classifying texture on the Moscow private dataset uses 100 epochs, a learning rate of 1 × 10−4, a batch size of 15, 115 batches per epoch, 10 validation samplings, Non-Maximum Suppression (NMS) of 1 × 10−5, and a temporal ensemble of the 5 top epochs, with settings maximized to the GPU memory of a 16 GB Tesla P100 server.
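For readability, the experimental settings listed above can be collected into a single configuration sketch; the dictionary keys below are illustrative names, not identifiers from the authors' code base.

# Training and evaluation setup reported for the Moscow private dataset experiments
train_config = {
    "split_ratio": (3, 1, 1),             # train : validation : test
    "split_counts": (346, 116, 116),
    "epochs": 100,
    "learning_rate": 1e-4,
    "batch_size": 15,
    "batches_per_epoch": 115,
    "validation_samplings": 10,
    "nms_setting": 1e-5,                   # Non-Maximum Suppression
    "temporal_ensemble_top_epochs": 5,     # average the 5 best epochs
    "gpu": "NVIDIA Tesla P100, 16 GB",
}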
IV. RESULTS AND DISCUSSION

This research evaluates 3D CNN performance with and without transfer learning in three sub-experiments: 3D CNN with pre-trained ImageNet I3D weight (3D CNN T-ImageNet) versus 3D CNN trained from zero (3D CNN T-Zero) versus 3D CNN with pre-trained LIDC weight (3D CNN T-LIDC), on the 578 data of the Moscow private dataset.

Figure 2 shows the output of successful true-positive predictions of lung nodule texture detection and classification generated by the 3D CNN. Then, Fig. 3 presents examples of false-positive non-nodule artifacts that are detected or wrongly classified. False-positive predictions in detection may be caused by a very small or very lucent nodule (Fig. 3: Patient 6 and Patient 9), an ambiguous nodule that retains both solid and non-solid characteristics (semi-opaque) and is predicted as the other class (Fig. 3: Patient 7 and Patient 9), a nodule-like structure (Fig. 3: Patient 8 and Patient 9), or a nodule possibly missed or misdiagnosed by the annotating radiologists.

Figure 2 is a compilation of the results of the CT scans of five patients who have nodules of different sizes and textures (solid, subsolid, and ground-glass), presented sequentially from left to right. The top picture of each subsample (below the patient number) shows the raw data patches. The second row shows the yellow pixel-mask ground truth with a red bounding box and a class description (1 = non-solid, 2 = solid) below the box.
Fig. 3. The example of 20 sequential patches (10 × 2) taken from four different patients, showing the false positives predicted by the 3D CNN architecture with pre-trained LIDC weight for detection and classification of lung nodule texture in the Moscow private dataset.
The second row also shows a cyan prediction anchor box and a dark blue prediction box with a prediction class and confidence score; for example, a dark blue box with the description 2 | 100 signifies a prediction of the solid class with a 100% confidence score. The third row is the filtered-out mask result, so it is similar to the second row except that there is no pixel-mask ground truth. The last row is the result of combining the third row with the first row (the original raw data patches).

Moreover, Fig. 3 is a compilation of the results of the CT scans of four patients who have nodules of different sizes and textures (solid, subsolid, and ground-glass), presented sequentially from left to right. Patient 6 shows a very small detected nodule that is poorly classified (a solid nodule is predicted as non-solid with 53% confidence and as solid with 46% confidence). Then, Patient 7 shows a detected semi-opaque solid nodule that is falsely classified as a non-solid nodule with a confidence as high as 91%. This type of misprediction is caused by the semi-opacity of the nodule in the CT scan, which is generally interpreted as a non-solid (subsolid) nodule; however, there is a probability that this type of ambiguous nodule was misdiagnosed by the annotating radiologists. Next, Patient 8 shows a nodule-like structure with a high non-solid confidence score (91%) caused by an underlying disease besides lung nodules, such as tuberculosis and lung infection, or possibly missed by the annotating radiologists.

Meanwhile, Patient 9 shows an ambiguous nodule that is falsely classified as a solid rather than a non-solid nodule (85% vs 57%) near the falsely detected nodule. There is also some hazy structure in the lung detected as a non-solid nodule with a low confidence score of 21%, which can be caused by other lung diseases or missed by the radiologists.
Fig. 4. Receiver Operating Characteristics (ROC) curve for 3D CNN T-ImageNet vs. 3D CNN T-Zero vs. 3D CNN T-LIDC on the Moscow private dataset for the solid nodule.

Figures 4 and 5 show the Receiver Operating Characteristics (ROC) curves for the solid and non-solid nodules, and the complete performance summary is shown in Fig. 6. The 3D CNN T-LIDC and 3D CNN T-ImageNet performance are not very different, but these two models implementing transfer learning consistently show superior results compared to 3D CNN T-Zero (mean mAP of 22.86% vs 22.16% vs 9.70%, AUC of 70.36% vs 69.00% vs 56.10%, Sen of 65.46% vs 65.90% vs 47.54%, and FPR). The results are also affected by the imbalance between the solid and non-solid classes on the Moscow private dataset. In addition, the Moscow private dataset includes patients who have pulmonary radiological lesions other than pulmonary nodules, such as patients with lung and pleural inflammation (pneumonia, bronchiolitis, interstitial lung disease, pleurisy, hydrothorax, and post-pneumonectomy). This increases the difficulty of the datasets to be abstractly interpreted by the CNN.
V. CONCLUSION

The techniques propose a unique way of extracting features from lung CT scans. The model can standardize CT scan data from various age groups and body sizes (reflected in highly variable slice thickness and pixel spacing) into a dynamic dimension of (0.5 × length, 0.5 × width, 64). It can also give solutions to unforeseen data issues, in this case an error in orientation caused by machine malfunctions or sudden movement inside the CT scan machine. The data corruption can be partially recovered by using a high-end reconstruction library like 3D Slicer. Meanwhile, the noise can be suppressed by adequate iterative reconstruction on the raw uncompressed CT scan data (*.raw). Although the architecture is not satisfactory enough due to the limited amount and low quality of the private dataset, the researchers successfully implement and prove the benefit of transfer learning of pre-trained weights on a 3D CNN, which has higher performance metrics (higher mAP, AUC, and training time) than the non-transferred one.

The 3D CNN pre-trained with LIDC weight (a similar dataset) is the architecture with the best performance metrics, followed by the slightly lower 3D CNN pre-trained with ImageNet and the poor performance metrics of the 3D CNN trained from zero, as shown in Fig. 6. The pre-trained LIDC architecture shows a slightly higher FPR than the pre-trained ImageNet architecture, which is probably caused by the highly variable characteristics of the Moscow private dataset compared to the LIDC public dataset; patients with mixed (non-nodule) lesions are also included in the Moscow private dataset. Thus, it can be concluded that these differences stem from the characteristics of the private dataset data.

Additionally, this study also uses a unique approach compared to the previous studies: all previous studies only classify malignancy status using the LIDC public dataset, whereas this research detects and classifies the texture information contained in the Moscow private dataset. The information about textures can help the initial screening, reduce the time needed for consensus (comparing opinions between radiologists), and indirectly score the process to determine lung malignancies.

For research involving specific medical images and requiring greater accuracy, the researchers suggest that future researchers collaborate with medical institutes so that they can obtain highly accurate annotations with high-quality images. Annotation processes should be assisted with special programs that can directly record detailed metadata information from each image (coordinates, pixel spacing, slice thickness, the origin of the image, and the pixel-wise perimeter) through a user-friendly interface, so that they can be used by specialist health workers (radiologists) without the need for special training.
REFERENCES
[1] G. D. Rubin, "Lung nodule and cancer detection in computed tomography screening," J Thorac Imaging, vol. 30, pp. 130–138, Mar. 2015.
[2] S. J. Swensen, R. W. Viggiano, D. E. Midthun, N. L. Müller, A. Sherrick, K. Yamashita, et al., "Lung Nodule Enhancement at CT: Multicenter Study," Radiology, vol. 214, pp. 73–80, 2000.
[3] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, et al., "A survey on deep learning in medical image analysis," Med Image Anal, vol. 42, pp. 60–88, Dec. 2017.