
Brain Tumor Grading Based on Neural Networks and Convolutional Neural Networks

Yuehao Pan, Weimin Huang, Zhiping Lin, Wanzheng Zhu, Jiayin Zhou, Jocelyn Wong, Zhongxiang Ding
Abstract—This paper studies brain tumor grading using multiphase MRI images and compares the results obtained with various configurations of deep learning structures and baseline Neural Networks. The MRI images are fed directly into the learning machine, with some combination operations between the multiphase MRIs. Compared with other studies, which require additional effort to design and choose feature sets, the approach used in this paper leverages the learning capability of the deep learning machine. We present the grading performance on the testing data, measured by sensitivity and specificity. The results show a maximum improvement of 18% in the grading performance of Convolutional Neural Networks, based on sensitivity and specificity, compared to Neural Networks. We also visualize the kernels trained in different layers and display some self-learned features obtained from the Convolutional Neural Networks.

I. INTRODUCTION

Studies on brain tumor detection and segmentation have become increasingly popular over the past few years. With the increasing power of computing, Magnetic Resonance Imaging (MRI), a major imaging technology for clinical brain tumor detection, can now be used more efficiently. Being non-ionizing, MRI can clearly show details of important body organs, such as the brain. However, brain tumor detection on MRIs has its limitations. Because the shape and size of brain tumors vary across patients, predictions about a tumor can be very difficult. The results of the Multimodal Brain Tumor Image Segmentation Benchmark (BRATS), organized jointly with the MICCAI 2012 and 2013 conferences, show that quantitative assessments uncovered significant disagreement between human raters in segmenting different tumor sub-regions, with the Dice scores used to measure the results ranging from 74% to 85% [1].

One of the major difficulties in brain tumor detection and segmentation is feature selection. There are many innovative methods for processing features, which have been shown to increase the accuracy of brain tumor detection and segmentation. Huang et al. [2] achieved an accuracy of 74.75% in tumor segmentation, measured by mean Overlapped Volume, using subspace feature mapping.

Different models, such as the Support Vector Machine (SVM) and the Neural Network (NN), are widely used in previous research, and have shown good performance on tumor classification. However, manual selection of features is normally carried out before classification. In research on brain tumor grading, Soltaninejad et al. [3] classified tumors of different grades with an SVM using 38 first-order and second-order statistical measurements as features. Their results showed over 80% accuracy on 21 patients for different grading combinations. The procedure takes segmented tumor slices as training examples, and the features are carefully selected before training. Another study, by Zacharaki et al. [4], reported a better result of 87% accuracy on two-class neoplasm classification using SVM-RFE. A further study, by Sudha et al. [5], reported an accuracy of 96.7% using Back Propagation Neural Networks to classify abnormal tumors against normal tissue. That method also involved computing features such as Low Gray-Level Run Emphasis, with the optimal features carefully selected using a fuzzy entropy measure.

Current state-of-the-art deep learning models such as the Convolutional Neural Network show good performance on object classification. Moreover, deep learning models learn features largely without supervision: the features of an object are learned spontaneously from the input. Le et al. [6], in research based on the ImageNet dataset, introduced a high-performance object detector using the concepts of Convolutional Neural Networks. The detector obtained an accuracy of 15.8% on the ImageNet dataset, a 70% relative improvement over earlier work. The visualization of the kernel from the cat-face detector showed a rough figure of a cat's head that concisely summarizes the facial features of a cat.

In our experiment, we compared the grading performance of a Back Propagation Neural Network and a Convolutional Neural Network. The training and testing data were downloaded from BRATS 2014, which consists of MRI images of 213 patients. A quantitative comparison between the best results obtained from the different NN and CNN structures is made based on sensitivity and specificity. The kernels learned by the CNN are visualized in this paper to present the features learned by the unsupervised learning procedure.
Yuehao Pan, Zhiping Lin and Wanzheng Zhu are with the School of EEE, Nanyang Technological University, Singapore; email: ({i110004, ezplin, zhuw0006}@ntu.edu.sg).
Weimin Huang and Jiayin Zhou are with the Institute for Infocomm Research, 1 Fusionopolis Way, #21-01 Connexis, Singapore 138632 (+65-6408-2516; fax: +65-6408-2000; email: {wmhuang, jzhou}@i2r.a-star.edu.sg).
Jocelyn Wong is with the Department of Diagnostic Imaging, National University Hospital, Singapore; email: (jocelyn_yl_wong@nuhs.edu.sg).
Zhongxiang Ding is with the Department of Radiology, Zhejiang Provincial People's Hospital, Hangzhou, Zhejiang, China; email: (hangzhoudzx73@126.com).

II. RELATED WORKS

A. Convolutional Neural Networks



The Convolutional Neural Network (CNN) is a deep-structured learning model that improved substantially in the late twentieth century. It imitates the way the human brain perceives objects. Research has also shown good performance for CNN-based algorithms, especially in the field of 2D data classification. For example, LeNet-5, a CNN model trained on the MNIST database of handwritten digits, has a test error rate of less than 1% [7]. For that reason, CNNs are widely used in the area of image processing.

Figure 1. A typical 2-layer CNN structure

Figure 1 shows a typical structure of a Convolutional Neural Network. It comprises one classifier and two layers in which convolution and subsampling are performed. The beauty of Convolutional Neural Networks is that the kernels in the different layers are learned spontaneously, so that no features need to be designed beforehand. Hence, the CNN is considered an unsupervised deep-learning model, and because of that, the number of training examples becomes critical. The MNIST dataset used for LeNet-5 comprises 60,000 training examples and 10,000 testing examples.
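To make the computation in Figure 1 concrete, the following is a minimal numpy sketch of a single convolution-plus-subsampling layer applied to one 60*60 input slice. The kernel count and size (16 kernels of size 21*21) match one of the configurations tested in Section IV, and the initialization range is the one reported with Figure 5; the tanh nonlinearity and all function names are illustrative assumptions rather than details taken from the paper.

    # Minimal sketch of one CNN layer: 'valid' convolution followed by
    # 2*2 average subsampling, as in Figure 1. The tanh nonlinearity is
    # an assumption; kernel size/count follow one tested configuration.
    import numpy as np

    def conv2d_valid(image, kernel):
        """'Valid' 2D convolution of a single-channel image."""
        kh, kw = kernel.shape
        h, w = image.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    def subsample2x2(fmap):
        """Average every non-overlapping 2*2 patch (see Sec. IV-B)."""
        h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
        f = fmap[:h, :w]
        return (f[0::2, 0::2] + f[0::2, 1::2] +
                f[1::2, 0::2] + f[1::2, 1::2]) / 4.0

    # One 60*60 slice; 16 kernels of size 21*21, randomly initialized
    # in the small range reported with Figure 5.
    image = np.random.rand(60, 60)
    kernels = np.random.uniform(-0.5e-5, 0.5e-5, size=(16, 21, 21))
    features = [subsample2x2(np.tanh(conv2d_valid(image, k)))
                for k in kernels]
    print(features[0].shape)   # (20, 20): 60-21+1 = 40, halved to 20

The resulting 16 feature maps would then feed the classifier stage (or a further convolutional layer) shown in Figure 1.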
III. BRAIN TUMOR GRADING USING NN AND CNN

A. Input Data Cleaning

The MRI data were downloaded from BRATS 2014, which provides image samples of 213 patients for training. Of all the training data, only 25 patients are labeled low-grade; the remaining 188 are high-grade. For each sample in the training set, one ground truth image and four MR images are given. The ground truth images contain zero and non-zero values, where zeros mark normal pixels and non-zero values mark tumor cells.

Among all the training data obtained from BRATS 2014, we examined 100 of the high-grade tumors to learn the clinical criteria for grading. We removed 18 of the examined cases because of incomplete or inappropriate feature disclosure. After deletion, 195 patients' data, consisting of 170 high-grade and 25 low-grade cases, were selected. From all the selected data, the blocks that contain the whole tumor were extracted: based on the ground truth image, the starting and ending planes tangent to each tumor along the x, y and z axes were found, and the six planes thus obtained define a unique cuboid in which the entire tumor is completely enclosed.
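The cuboid extraction described above reduces to a bounding-box computation on the non-zero voxels of the ground truth. Below is a short sketch of that step; the array names (ground_truth, volume) are assumptions for illustration.

    # Sketch: the six tangent planes along x, y and z are the minimal
    # and maximal non-zero coordinates of the ground-truth volume.
    import numpy as np

    def tumor_cuboid(ground_truth):
        """Start/end planes tangent to the tumor along each axis."""
        xs, ys, zs = np.nonzero(ground_truth)
        return ((xs.min(), xs.max()),
                (ys.min(), ys.max()),
                (zs.min(), zs.max()))

    def extract_block(volume, cuboid):
        """Cut the same cuboid out of an MR volume of the same shape."""
        (x0, x1), (y0, y1), (z0, z1) = cuboid
        return volume[x0:x1 + 1, y0:y1 + 1, z0:z1 + 1]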
B. Data Selection

In this way, 195 tumor blocks were obtained. This is, however, relatively insufficient for a deep learning machine. Worse, a major problem with the potential to make the CNN fail is the uneven class distribution of the training data: only about 12.8% of it belongs to the low-grade class (25 out of 195).

To address the imbalance and shortage of data, we select data differently for high-grade (Class one) and low-grade (Class two) tumors. For high-grade tumors, only the image slice at the center of the block, which normally contains the MRI with the largest tumor cross-section, is selected along each of the three axes, which expands our Class one data to 3*170 = 510 slices. Class two data, on the other hand, are taken from the two slices deviating by 8% from the center of the block along each axis. In this manner, we obtain 6 images per patient, for 6*25 = 150 Class two slices. Nevertheless, the training data are still very biased, because the Class two data are only about half of the Class one data. Considering also that tumor features are not obvious in the initial phase, when tumors are normally graded as low, we enhance the features by rotation: since tumor cells proliferate randomly in all directions, we rotate each Class two image by three different angles (90, 180 and 270 degrees). After rotation, we have 3*150 = 450 low-grade examples. The ratio of the two classes is now 510:450, not perfectly even but close enough.

All slices are resampled to the same size of 60*60 pixels, and the intensity values of each image are normalized to between 0 and 1. Of all the data, 300 of the 450 Class two images are chosen as low-grade training data, and another 300 Class one images are selected as high-grade training data in order to balance the training set. In this manner, a total of 600 training images and 360 testing images are used in the experiment.
C. Accuracy Criteria

To measure the performance of our model rigorously, sensitivity and specificity are used as the classification criteria in our experiment. In a binary classification test, sensitivity is the fraction of truly positive data predicted as positive, while specificity is the fraction of truly negative data predicted as negative. In this experiment, sensitivity is the number of correctly predicted high-grade examples over all high-grade data in the test set, and specificity is the number of correctly predicted low-grade examples over all low-grade testing data:

    Sensitivity = True Positive / (True Positive + False Negative)
                = Number of Correctly Predicted High-Grade Data / Number of High-Grade Data    (1)

    Specificity = True Negative / (True Negative + False Positive)
                = Number of Correctly Predicted Low-Grade Data / Number of Low-Grade Data      (2)

In the above equations, true positives are the high-grade data that are correctly graded, and true negatives are the correctly predicted low-grade tumor slices; false positives are low-grade images that are classified wrongly, while false negatives are high-grade data that are classified wrongly.
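Equations (1) and (2) translate directly into code. In the sketch below, scores are classifier outputs and labels are 1 for high-grade and 0 for low-grade slices; both names and the thresholding convention are assumptions.

    # Sketch of Eqs. (1) and (2) for the binary grading decision.
    import numpy as np

    def sensitivity_specificity(scores, labels, threshold):
        pred = scores >= threshold        # True = graded high
        tp = np.sum(pred & (labels == 1))
        fn = np.sum(~pred & (labels == 1))
        tn = np.sum(~pred & (labels == 0))
        fp = np.sum(pred & (labels == 0))
        sensitivity = tp / (tp + fn)      # Eq. (1)
        specificity = tn / (tn + fp)      # Eq. (2)
        return sensitivity, specificity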
IV. EXPERIMENT RESULTS

A. Results Obtained from Neural Network

Before adopting the CNN in this project, we tested plain Neural Networks on the input data to gauge their performance. We selected a 2-layer and a 3-layer NN structure for this preliminary examination. However, the results do not show good accuracy for the NN at any of the various thresholds.

Figure 2. Sensitivity and specificity versus threshold using the 2-layer (left) and 3-layer (right) NN

As shown in Figure 2, after a sufficient number of iterations, the 2-layer and 3-layer NNs give sensitivity-specificity intersection points of 0.55 and 0.5677 respectively, which is only slightly better than random guessing. However, comparing the two graphs, threshold values near the intersection point give a better overall performance for the 3-layer NN; for example, at a threshold of 59 we get a higher specificity (0.6) with relatively good sensitivity (0.55) on the 3-layer NN. This suggests that as the learning structure goes deeper, the grading results may improve. We therefore also tested a 4-layer NN. Its intersection value drops to 0.5167, which is worse than both the 2-layer and the 3-layer NN; moreover, its sensitivity and specificity do not stand out at the other threshold values in the range either.
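The intersection values quoted above can be read off a simple threshold sweep. The sketch below reuses the hypothetical sensitivity_specificity() helper from the Section III-C sketch and approximates the crossing by the threshold at which the two curves are closest.

    # Sketch: sweep the decision threshold and report the value where
    # sensitivity and specificity cross (the 'intersection point').
    import numpy as np

    def intersection_value(scores, labels, thresholds):
        best_gap, best_value = float("inf"), None
        for t in thresholds:
            sens, spec = sensitivity_specificity(scores, labels, t)
            if abs(sens - spec) < best_gap:
                best_gap = abs(sens - spec)
                best_value = (sens + spec) / 2.0  # common value at crossing
        return best_value

    # e.g. intersection_value(scores, labels, np.linspace(0, 100, 201))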
B. Results Obtained from Convolutional Neural Network

One of the important results obtained from the CNN is the set of kernels learned in the unsupervised learning process. A kernel functions as both a noise suppressor and a feature enhancer for the model. By visualizing the kernels of the different layers, their properties and the learned features can be seen. In this project, we adopted different CNN structures to find the best performance. Some kernels demonstrate interesting features learned by the algorithm. Figure 3 illustrates some kernels learned by the CNN model in the first layer.

Figure 3. Visualizations of kernels in the first layer

The visualization of the first-layer kernels indicates several features of the tumors. The first row of three kernels represents a distinct tumor edge in different orientations; these kernels enhance the boundary between tumor cells and normal cells. Coincidentally, edge detection is a commonly used pre-processing step in other machine-vision-based classification research. The second row of three kernels corresponds to small, distributed tumor cells; this kind of kernel matches a grading criterion called CE-heterogeneity, in which the shape of the tumor is enhanced so that different tumor shapes, such as nodules or flower shapes, stand out. The last row corresponds to medium-sized tumors. Combining the features of the second and third rows, a basic criterion for differentiating severe tumors from less severe ones, namely examining the size of the lesion, matches what we obtain from the kernels.

In the second layer of the CNN, the feature maps are subsampled by a factor of 2: a 2*2 pixel patch is examined on each output image after convolution, and the average value of those 4 pixels forms the output of the layer. Subsampling reduces the size of the output image so that the machine learns features at a coarser scale. For this reason, the kernels of the second layer more closely resemble signal filters such as a Gaussian filter. Some kernels from the second layer of the CNN are shown in Figure 4.
Figure 4. Visualizations of kernels in the second layer

Figure 5. Evolution of a kernel

The kernels are initialized randomly with small values between -0.5*10^-5 and 0.5*10^-5 (see the upper-left image in Figure 5). As training goes on, a distinctive gap forms that separates two segments in the kernel, leading to the final kernel shown in the bottom-right picture of Figure 5. This process is unsupervised: the kernel is learned by the CNN spontaneously.
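Visualizations such as Figures 3-5 only require rescaling each kernel's weights for display. A minimal sketch, assuming matplotlib and min-max normalization per kernel (both our choices, not stated in the paper):

    # Sketch: render learned kernels as grayscale tiles.
    import numpy as np
    import matplotlib.pyplot as plt

    def show_kernels(kernels, cols=4):
        rows = int(np.ceil(len(kernels) / cols))
        fig, axes = plt.subplots(rows, cols, squeeze=False,
                                 figsize=(2 * cols, 2 * rows))
        for i, ax in enumerate(axes.ravel()):
            ax.axis("off")
            if i < len(kernels):
                k = kernels[i]
                k = (k - k.min()) / (k.max() - k.min() + 1e-12)  # to [0, 1]
                ax.imshow(k, cmap="gray")
        plt.show()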
The CNN results below show a maximum improvement of 10% in accuracy for the one-layer structure, but the more complex structures do not necessarily give better results. The sensitivity and specificity versus threshold graphs for the different structures are shown below.

Figure 6.1. Sensitivity and specificity versus threshold using 25 kernels of size 21*21
Figure 6.2. Sensitivity and specificity versus threshold using 16 kernels of size 21*21
Figure 6.3. Sensitivity and specificity versus threshold using a 2-layer CNN structure, with 16 kernels of size 21*21 in the first layer and 8 kernels of size 11*11 in the second layer
Figure 6.4. Sensitivity and specificity versus threshold using a 3-layer structure, with 1 kernel of size 21*21 in the first layer, 16 kernels of size 11*11 in the second layer and 12 kernels of size 5*5 in the third layer
Figure 8. Sensitivity and specificity versus threshold using a 3-layer structure, with 16 kernels of size 21*21 in the first layer, 12 kernels of size 11*11 in the second layer and 8 kernels of size 5*5 in the third layer

The CNN with a one-layer structure shows the best performance on both sensitivity and specificity, with an intersection value of 0.6667. The rest of the figures show sensitivity intersecting specificity at 0.64, 0.5867 and 0.6 respectively, for the 1-layer structure with 25 kernels of size 21*21, the 2-layer CNN structure with 16 kernels of size 21*21 in the first layer and 8 kernels of size 11*11 in the second layer, and the 3-layer structure with 1 kernel of size 21*21 in the first layer, 16 kernels of size 11*11 in the second layer and 12 kernels of size 5*5 in the third layer.

V. DISCUSSION ON RESULTS

Our experimental results also show that more layers in the deep learning structure do not necessarily improve the performance of brain tumor grading. We find that the training results at the final convolutional layer, the layer before the classifier, do not show a clear distinction between the images of different types of tumors.

Figure 7. Three training samples obtained at the end of each layer

Figure 7 shows the output of three types of tumor passing through the different layers. Although different kernels are applied to the images, the outputs at the final layer are similar. From the point of view of the classifier, the three tumors are identical to each other, which leads to the same grading result. In our experiments, the best result comes from a 3-layer CNN with certain initialization values, which improves nearly 18% over the results of the baseline Neural Network.

One of the limitations of this study is that the training samples, especially the low-grade data, are relatively few. We have 195 samples in total, of which only 25 belong to low-grade tumors. For a deep learning machine, training sample size is always a key factor. LeNet-5 uses the MNIST database, which consists of 60,000 training examples, while in the YouTube cat experiment, frames from 10 million YouTube videos are used as training examples [6]. As Simard et al. [8] also suggest, the most important practice for obtaining a good result from a learning system is getting a training set as large as possible. Nevertheless, in our study we show that, with a certain structure of the convolutional learning network, we can obtain much better results than a conventional Neural Network from the available image data. In our experiments, we increased our data by using multiple slices of each patient's 3D volume, and we further added rotated slices to the training data, which helped to improve the grading performance.

VI. CONCLUSION

In this paper, we developed grading processes for brain tumor classification based on a CNN deep learning structure. The results show a maximum improvement of 18% in the grading performance of the CNN, based on sensitivity and specificity, compared to the NN. The visualizations of the kernels and of the outputs at different layers show that tumor features can be closely resembled by the learned kernels. However, we also observed that a more complex CNN structure might not outperform simpler structured CNNs.

REFERENCES

[1] B. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, et al., "The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS)," IEEE Transactions on Medical Imaging, 2014, 33 pp.
[2] W. Huang, Y. Yang, et al., "Random Feature Subspace Ensemble Based Extreme Learning Machine for Liver Tumor Detection and Segmentation," International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Aug. 2014.
[3] M. Soltaninejad, et al., "Brain Tumour Grading in Different MRI Protocols using SVM on Statistical Features," Proceedings of the MIUA, Jul. 2014.
[4] E. I. Zacharaki, et al., "MRI-Based Classification of Brain Tumor Type and Grade Using SVM-RFE," IEEE International Symposium on Biomedical Imaging: From Nano to Macro (ISBI '09), Jun. 28-Jul. 1, 2009, pp. 1035-1038.
[5] B. Sudha, P. Gopikannan, et al., "Classification of Brain Tumor Grades using Neural Network," Proceedings of the World Congress on Engineering 2014, Vol. I, WCE 2014, London, U.K., Jul. 2-4, 2014.
[6] Q. V. Le, et al., "Building High-level Features Using Large Scale Unsupervised Learning," Cornell University Library, arXiv:1112.6209, Dec. 2011.
[7] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, Nov. 1998, pp. 2278-2324.
[8] P. Y. Simard, D. Steinkraus, and J. C. Platt, "Best Practices for Convolutional Neural Networks Applied to Visual Document Analysis," Proceedings of the 7th International Conference on Document Analysis and Recognition (ICDAR), Aug. 3-6, 2003, pp. 958-963.
