
International Graduate Program, College of Electrical Engineering and Computer Science

Master's Thesis

Automatic Brain Tumor Segmentation with a


3-Dimensional Generative Adversarial
Neural Network.

Student: Mpendulo Mamba

Advisor: Prof. Yo-Ping Huang

January 2018 (Republic of China year 107)
ABSTRACT
Title: Automatic Brain Tumor Segmentation with a 3-Dimensional Generative

Adversarial Neural Network.

Pages: 54

School: National Taipei University of Technology (NTUT)

Department: International Master of Science in Electrical Engineering and Computer

Science (IMEECS)

Time: July 2018

Degree: Master

Author: Mpendulo Mamba

Advisor: Prof. Yo-Ping Huang

Keywords: Deep learning, tumor, magnetic resonance imaging (MRI), generative

adversarial network (GAN), loss function, high grade glioma (HGG), low grade

glioma (LGG)

Brain tumor segmentation is a crucial task in medical image processing. Early

diagnosis of brain tumors plays an important role in improving treatment possibilities and

increases the survival rate of patients. Manual segmentation of brain tumors for cancer diagnosis, from the large amounts of magnetic resonance imaging (MRI) data generated in clinical routine, is a difficult and time-consuming task. There is a need for automatic brain

image segmentation. In this work, we demonstrate a deep neural network for volumetric

segmentation that learns from a series of annotated volumetric images given in the

Neuroimaging Informatics Technology Initiative (NIfTI) format. Recently, automatic

segmentation using deep learning methods proved effective since these methods achieve

state-of-the-art results and can address the problem better than other methods. Deep

learning methods can also enable efficient processing and objective evaluation of the large

amounts of MRI-based images. We investigate 3D conditional adversarial networks as a

novel solution to 3D image segmentation for medical segmentation problems. These

networks not only learn the mapping from input images to output images, but also learn

a loss function to train the mapping between them. This makes it possible to apply the

same generic approach to problems that traditionally would require very different loss

formulations. We show that this method is effective at generating slices of segmentation

data from 3D labelled maps. We utilize a dataset from the Medical Image Computing and Computer Assisted Intervention (MICCAI) society, which consists of MRI scans of high-grade gliomas (HGG), tumors of the central nervous system, and low-grade gliomas (LGG), which are slow-growing tumors. The proposed model is able to

discriminate between well segmented and poorly segmented images and the

generative model can create segmentation image masks around the tumors and

achieves an 84.93% Dice score when compared with the dataset annotations.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to all those who supported and helped

me while I was writing this thesis. Special thanks to my advisor, Professor Yo-Ping Huang, for his guidance and support. Professor Huang has been an incredible mentor through his invaluable academic and professional supervision. I deeply

appreciate him for the useful suggestions, comments, remarks, genuine concern for

my well-being and support through the learning process of this thesis and my overall

stay in the school.

For the financial support of my studies at Taipei Tech, I would like to thank the government of the Kingdom of Eswatini, the Ministry of Education, and the Royal Science and Technology Park (RSTP) in Eswatini for providing the master's degree scholarship in Taiwan.

My sincere appreciation to all my colleagues and lab mates for the support,

guidance, inspiration, freedom and a wonderful working environment, Nontobeko,

Basanta, Howard, James, Yu an, Peter, Danny, Eric, Justin, Lungile, Mthunzi,

Trizaurah, Chun Ming, Sakhile, Mluleki, Mduduzi, Lucky, Likhanyiso, Muzi-wandile,

all Eswatini friends, all Taiwanese friends, and foreign friends. Not only is it an honor to work in such a wonderful place, but it is also a privilege for me to live here amongst you all.

Finally, and most of all, I would love to thank my incredible parents, my family, my best friend Lwazi, and my girlfriend Gugu, who have always patiently supported me morally, emotionally, and financially.

Siyabonga Kakhulu!

Contents
ABSTRACT .................................................................................................................. i

ACKNOWLEDGMENTS ........................................................................................... iii

List of Figures ........................................................................................................... vi

List of Tables ............................................................................................................ vii


Chapter 1 Introduction...................................................................................................1

1.1 Background .....................................................................................................1

1.2 Related Work ...................................................................................................5

1.3 Research Objective ..........................................................................................7

1.4 Limitations.......................................................................................................8

1.5 Thesis Development.........................................................................................8

Chapter 2 Literature Review ................................................................................... 10

2.1 Glioma ........................................................................................................... 10

2.2 Magnetic Resonance Imaging (MRI).............................................................. 14

2.3 Deep Learning ............................................................................................... 19

2.3.1 Deep Learning Methods ..............................................................19

2.3.2 Deep Learning Algorithms ........................................................20

2.3.3 Convolution Layer ....................................................................................21

2.3.4 Subsampling Layer ................................................................................... 23

2.3.5 Fully Connected Layer.............................................................................. 24

Chapter 3 Proposed Methods ....................................................................................... 25

3.1 Model Architecture ........................................................................................ 25

3.1.1 Pre-Processing Layer ................................................................................ 26

3.1.2 Generator .................................................................................................. 28

3.1.3 Discriminator ............................................................................................31

3.1.4 DCGAN ..................................................................................... 33

3.2 Evaluation Methods .......................................................................................34

3.2.1 Objective Function .................................................................................... 34

3.2.2 Loss Function ............................................................................................35

3.2.3 Dice Coefficient .........................................................................................36

3.2.4 Perceptual Evaluation ............................................................................... 36

Chapter 4 Experimental Setup and Training ................................................................. 38

4.1 Experimental Setup ........................................................................................ 38

Chapter 5 Results and Discussion ................................................................................ 42

Chapter 6 Conclusions and Future Work ......................................................................51

6.1 Conclusions ................................................................................................... 51

6.2 Future Work ................................................................................................... 51

References................................................................................................................... 52

List of Figures
Figure 1. Example of an HGG Tumor ............................................................................2

Figure 2. Example of an LGG Tumor ............................................................................3

Figure 3. Low-grade brain glioma in a 28-year-old male.............................................. 11

Figure 4. Patient being positioned for MR study of the head and abdomen .................. 15

Figure 5. Schematic of construction of a cylindrical superconducting MR scanner. ...... 16

Figure 6. Examples of T1 weighted, T2 weighted and PD weighted MRI scans ........... 18

Figure 7. Restricted Boltzmann Machines....................................................................20

Figure 8. Convolutional Neural Network Architecture .................................................21

Figure 9. Image Representation and Convolutional Matrix .......................................... 22

Figure 10. Example Convolution Calculation ..............................................................23

Figure 11. Example subsampling layer ................................................................. 24

Figure 12. Proposed GAN Architecture ................................................................ 25

Figure 13. Dataset file naming structure ....................................................................26

Figure 14. MRI 2D slices after preprocessing from 3D file ..........................................27

Figure 15. Auto-encoder and U-net Architecture .......................................................... 28

Figure 16. 68 Layer Deep Neural Network Generator .................................................. 29

Figure 17. 6 Layer Deep Neural Network Discriminator ............................................ 31

Figure 18. Deep Conditional GAN (DCGAN) .............................................................33

Figure 19. Different sections of tumor annotated in colors ...........................................37

Figure 20. MRI 2D viewpoints, transverse, sagittal and coronal ................................... 39

Figure 21. FLAIR experiment (a) Training losses (b) Dice Coefficient Training ........ 41

Figure 22. T2 experiment (a) Training losses (b) Dice Coefficient Training. ................ 42

Figure 23. A Sample Result From the FLAIR Experiment ........................................... 43

Figure 24. A Sample Erroneous Result from the T1 Experiment ..................................44

Figure 25. A Sample Brain Structure Mistaken for Tumor............................................ 45

Figure 26. A Sample Complete Miss ............................................................................ 46

Figure 27. Bar Chart Comparison of all Experiments ................................................... 47

List of Tables
Table 1. Standard Display of MRI Images ................................................................... 18

Table 2. Summary of MRI sequences in the dataset ..................................................... 27

Table 3. Generator - Layer Implementation Details ...................................................... 30

Table 4. Discriminator Layer Implementation Details .................................................. 32

Table 5. DCGAN Layer Implementation Details..........................................................34

Table 6. Dataset Training, Testing and Validation Ratio ............................................... 38

Table 7. Average Similarity Scores from Experiments ................................................. 47

Table 8. Human Plausibility Score ................................................................48

Table 9. Comparison with Related Studies ................................................................... 48

Chapter 1 Introduction

1.1 Background
Over the last few decades, the rapid development of noninvasive brain imaging

technologies has opened new horizons in analyzing and studying the brain anatomy and

function. Enormous progress in assessing brain injury and exploring brain anatomy has been made using magnetic resonance imaging (MRI). Advances in brain MR imaging have also provided large amounts of data with an increasingly high level of quality. The analysis of these large and complex MRI datasets has become a tedious and complex task for clinicians, who have to manually extract important information. This manual analysis is often time-consuming and prone to errors due to inter- and intra-operator variability. These difficulties in brain MRI data analysis have driven the development of computerized methods to improve disease diagnosis and assessment. Nowadays,

computerized methods for MR image segmentation, registration, and visualization have

been extensively used to assist doctors in qualitative diagnosis. Brain MRI segmentation

is an essential task in many clinical applications because it influences the outcome of the

entire analysis. This is because different processing steps rely on accurate segmentation

of anatomical regions. For example, MRI segmentation is commonly used for measuring

and visualizing different brain structures, for delineating lesions, for analyzing brain

development, and for image-guided interventions and surgical planning. This diversity of

image processing applications has led to development of various segmentation techniques

of different accuracy and degree of complexity. These techniques can be used to perform

automatic segmentation on scans of patients with tumor-related conditions such as dementia. For the purposes of this study, brain tumors are classified into two types: high-grade gliomas (HGG) and low-grade gliomas (LGG). Highly malignant or high-grade gliomas (HGG) are tumors of the central nervous system (CNS). They are solid tumors

arising from transformed cells of the brain and/or the spinal cord. Since they directly

originate from the CNS, they are also called primary CNS tumors, thereby differentiating

them from malignant tumors of other organs that have spread (metastasized) to the CNS.

HGG in children and adolescents are rare. However, they show considerably

malignant behavior since they usually grow fast and frequently destroy healthy brain

tissue. Because they can migrate several centimeters within the CNS, HGG can induce

the development of new tumors. Without the appropriate therapy, HGG can be lethal

within only a few months. Due to the usually rapid and infiltrating growth of these tumors,

treatment is difficult. High-grade gliomas account for approximately 15 to 20 % of CNS

tumors in children and adolescents. They appear in all age groups; yet, children aged

younger than three years are rarely affected. Each year, about 60 to 80 children and

adolescents younger than 15 years of age are newly diagnosed with a high-grade glioma.

This corresponds to an incidence rate of 5 to 10 new diagnoses per 1,000,000 children per

year. Boys and girls are almost equally affected.

Figure 1. Example of an HGG tumor.

Low-grade gliomas are tumors of the CNS as well. They are solid tumors arising from

malignantly transformed cells of the brain or spinal cord. Since they develop directly from

CNS cells, they are also called primary CNS tumors in order to distinguish them from

cancers of other body parts that have spread to the CNS (metastasis).

Low-grade gliomas can be found in all parts of the nervous system, most of them, however,

are situated in the cerebellum and the central regions of the cerebrum, such as the optic

pathway (optic pathway gliomas) and the hypothalamic-pituitary axis. They usually grow

very slowly. Nevertheless, for a growing lesion, the space in the bony skull is limited. As

a consequence, vital areas of the brain may be damaged by the space occupying, growing

tumor. Therefore, low-grade gliomas can become life threatening in the course of the

disease. Accounting for about 30 to 40% of cases, low-grade gliomas are the most common CNS

tumors among children and adolescents. They occur at all ages with a mean age at

diagnosis between five and seven years. In Germany, about 250 children and adolescents

under 18 years of age are newly diagnosed with low-grade glioma each year. This

corresponds to an incidence rate of 2 to 3 per 100,000 children. There is a slight male

preponderance (gender ratio: 1.2 to 1).

Figure 2. Example of an LGG tumor.

Compared to traditional segmentation methods, deep learning does not rely on the

generation of handcrafted features to distinguish tumor from normal brain anatomy.

Instead, raw image intensities are taken as input and passed through many layers of convolutions

to calculate an output signal. The many degrees of freedom and inclusion of non-

linearities allow the algorithm to learn complex patterns with a high level of abstraction.

Up until this point many of the deep learning algorithms that have been applied to brain

tumor segmentation have been 2D Convolutional Neural Networks (CNNs), which do not

take advantage of the full breadth of volumetric information. Recently, there has been an

increase in popularity of 3D CNNs [22, 23], which have been shown to be effective for

this task, though at the expense of additional computational complexity. For example, the

3D U-Net architecture was successfully applied for the segmentation of the Xenopus

kidney, a complex and highly variable structure [23].

The MICCAI 2018 BraTS Challenge asks participants to develop a fully automatic or semi-automatic multi-modality tumor segmentation tool for enhancing tumor, non-enhancing tumor, and edema in glioblastoma patients [24]. The 2018 BraTS Challenge patient

cohort includes glioblastoma and low-grade glioma pre-operative patients. Participants

are provided with co-registered and skull-stripped T2, pre-contrast T1, post-contrast T1,

and FLAIR images, and are then asked to generate segmentations that can then be

compared against ground-truth segmentations of edema, enhancing tumor, and non-

enhancing tumor. Ground-truth segmentations are manually drawn by one to four raters

and then approved by expert neuro-radiologists. Our proposed segmentation method for

BraTS 2018 involves the training of several 3D generative adversarial networks (GANs).

The result is a fully automatic segmentation model requiring no additional data aside from

four coregistered input modalities and a practical computation time for batch processing.

1.2 Related Work
In [25], a fully convolutional neural network (FCNN) was used for the segmentation of gliomas on magnetic resonance images (MRI). The authors achieved fully automatic voxel-based classification by training a 23-layer deep FCNN on 2-D slices obtained from patient volumes.

Their model was trained on slices obtained from 130 patients and was validated on 50

patients. The false positives in the segmentation maps generated by their FCNN were removed by connected-component analysis. On the MICCAI BraTS 2017 validation dataset, this model achieved average whole tumor, tumor core and active tumor Dice scores of 0.83, 0.69 and 0.69, respectively.

In another work [ref], the authors present a solution for brain tumor segmentation on the MICCAI BraTS 2017 dataset. They used three convolutional neural networks with the same 3D U-Net architecture, one trained for each of the tumor segmentation targets (whole tumor, tumor core and enhancing tumor), with 3D patches as inputs. The preprocessing step was performed separately for each case, which allowed histogram equalization for the whole-tumor target and voxel normalization on all modalities. This solution led to Dice coefficients of 0.91, 0.9118 and 0.827 after testing with 30% of the training set, and 0.8844, 0.7674 and 0.7261 on the leaderboard validation set (respectively for each segmentation target) provided by MICCAI for validation.

Another study focused on the MICCAI 2017 BraTS competition, which challenges participants to develop a fully automatic or semi-automatic multi-modality tumor segmentation tool for enhancing tumor, non-enhancing tumor, and edema in glioblastoma patients. The team in [ref] entered the competition with a fully automatic pipeline that involves chaining together several unique 3D U-Nets, a type of 3D patch-based convolutional neural network. Their pipeline takes advantage of the prior
knowledge from previous studies that enhancing and nonenhancing tumors are likely to

be found within regions of edema and within proximity to each other by feeding the

prediction outputs of earlier networks into later networks. They achieved greater context

for this patch-based sampling method by predicting downsampled labels and then

upsampling them using a separate 3D U-Net. They used a fine-tuning network and a

candidate evaluation network to account for tissue border discrepancies and catastrophic

segmentation failure. Early results for an unoptimized version of this pipeline on

validation data with unknown ground truth segmentations had average dice coefficients

of 0.78, 0.67, and 0.68 for whole tumor, enhancing, and non-enhancing tissue respectively.

Another study noted that identification and localization of brain tumor tissues play an important role in the diagnosis and treatment planning of gliomas [26]; hence, a fully automated superpixel-wise tumor tissue segmentation algorithm using random forests was proposed and implemented. They extracted features for the random

forest classifier by constructing a tensor from multi-parametric MRI data and applying

multi-linear singular value decomposition. The method was trained and tested on high

grade glioma (HGG) patients from the BRATS 2017 training dataset. It achieved a

performance of 83%, 76% and 78% Dice scores for whole tumor, enhancing tumor and

tumor core, respectively.

In [27], a cascade of fully convolutional neural networks was proposed to segment multi-modality MR images of brain tumors into background and three subregions: enhancing tumor core, whole tumor and tumor core. The cascade was designed to decompose the multi-class segmentation into a sequence of three binary segmentations according to the subregion hierarchy. The segmentation of the first (second) step is used as a binary mask for the second (third) step. Each network consists of multiple layers of

anisotropic and dilated convolution filters that were obtained by training each network

end-to-end. Residual connections and multi-scale predictions were employed in these

networks to boost the segmentation performance. Experiments on the BraTS 2017 online validation set yielded average Dice scores of 0.764, 0.897 and 0.825 for enhancing tumor

core, whole tumor and tumor core respectively.

A SegNet is a deep encoder-decoder architecture for multi-class pixelwise

segmentation researched and developed by members of the Computer Vision and

Robotics Group at the University of Cambridge, UK. In [28], the authors developed automatic segmentation of brain tumors using a SegNet, a method based on a two-dimensional convolutional neural network, and used the HGG datasets (n = 210) of BraTS 2017 for network training. They compared training schemes including or

excluding slices without labeled tumor regions. The input images were FLAIR images.

From the results, the Dice similarity coefficients (Dice = 0.74, 2-fold cross-validation) obtained with training datasets excluding slices without labeled tumor regions were significantly higher than those obtained with all slices (P < 0.05, paired t-test). In the

preliminary results, they were able to perform fully automated segmentations of whole

tumor region using SegNet.

1.3 Research Objective

Many studies have been discussed in the previous section regarding brain tumor

segmentation methods with their performance scores. With the aim of improving those

performance scores, in this research we propose the use of generative adversarial networks (GANs), a variant architecture of deep neural networks that improves performance through adversarial competition between two networks. Here, we use our own GAN architecture derived from the original GAN paper [19]. Due to the high precision required in medical procedures, we chose a U-Net architecture for the generative half of the model, which is known to eliminate the bottlenecks that cause loss of information in basic auto-encoders [7]. Moreover, we apply a Markovian patch discriminator for the second half of our GAN, which allows a whole image slice to be fed as a single patch during classification [9]. The purpose of this study is to develop an AI model

for MRI brain tumor segmentation, which can be used by experts in the medical field to

speed up and verify their processes of identifying and segmenting tumors and possibly

diagnose them in their early stages. Also, due to the nature of GANs we expect to obtain

a trained model that can not only generate segmentations but can identify incorrectly

segmented images.

1.4 Limitations
There are several limitations in this research, as follows:

1. This research focuses on brain tumor segmentation based on MRI scans only.

2. The MRI scans are transposed to one viewpoint, the transverse view only.

3. The research dataset is annotated such that enhancing tumor core, whole tumor and tumor core can all be trained upon; we focus on the whole tumor only.

4. This research focuses on HGG and LGG tumor types only.

5. This research applies only deep learning techniques to address the problem.

1.5 Thesis Development


The thesis will lead the reader all the way through the stages of the research

development. The outline of this thesis is organized as follows. Chapter 2 presents the literature review and related technology on the subject of brain tumor segmentation. Chapter 3 describes the proposed methodology of this research. The experimental setup and training of the proposed model are discussed in Chapter 4. Chapter 5 covers our results and discussion. Finally, Chapter 6 presents the conclusions and suggestions for future improvements.

Chapter 2 Literature Review

2.1 Glioma
A glioma is a type of tumor that starts in the glial cells of the brain or the spine.

Gliomas comprise about 30 per cent of all brain tumors and central nervous system tumors,

and 80 per cent of all malignant brain tumors.

2.1.1 Classification

Gliomas are classified by cell type, by grade, and by location.

By type of cell: Gliomas are named according to the specific type of cell with which they

share histological features, but not necessarily from which they originate. The main types

of gliomas are:

• Ependymomas: ependymal cells

• Astrocytoma: astrocytes (glioblastoma multiforme is a malignant astrocytoma and the most common primary brain tumor among adults)

• Oligodendrogliomas: oligodendrocytes

• Brainstem glioma: develops in the brain stem

• Optic nerve glioma: develops in or around the optic nerve

• Mixed gliomas, such as oligoastrocytomas, contain cells from different types of glia

By grade: Gliomas are further categorized according to their grade, which is determined

by pathologic evaluation of the tumor. The neuropathological evaluation and diagnostics

of brain tumor specimens is performed according to WHO Classification of Tumors of

the Central Nervous System.

Figure 3. Low-grade brain glioma in a 28-year-old male.

• Low-grade gliomas [WHO grade II] are well-differentiated (not anaplastic); these tend to exhibit benign tendencies and portend a better prognosis for the patient. However, they have a uniform rate of recurrence and increase in grade over time, so should be classified as malignant.

• High-grade [WHO grade III–IV] gliomas are undifferentiated or anaplastic; these are malignant and carry a worse prognosis.

Of numerous grading systems in use, the most common is the World Health

Organization (WHO) grading system for astrocytoma, under which tumors are graded

from I (least advanced disease—best prognosis) to IV (most advanced disease—worst

prognosis).

By location: Gliomas can be classified according to whether they are above or below a

membrane in the brain called the tentorium. The tentorium separates

the cerebrum (above) from the cerebellum (below).

• Supratentorial tumors are above the tentorium, in the cerebrum, and are mostly found in adults (70%).

• Infratentorial tumors are below the tentorium, in the cerebellum, and are mostly found in children (70%).

• Pontine tumors are located in the pons of the brainstem. The brainstem has three parts (pons, midbrain, and medulla); the pons controls critical functions such as breathing, making surgery on these tumors extremely dangerous.

2.1.2 Pathophysiology

High-grade gliomas are highly vascular tumors and have a tendency to infiltrate.

They have extensive areas of necrosis and hypoxia. Often, tumor growth causes a

breakdown of the blood–brain barrier in the vicinity of the tumor. As a rule, high-grade

gliomas almost always grow back even after complete surgical excision, so are commonly

called recurrent cancer of the brain. Conversely, low-grade gliomas grow slowly, often

over many years, and can be followed without treatment unless they grow and cause

symptoms. Several acquired (not inherited) genetic mutations have been found in

gliomas. Tumor suppressor protein 53 (p53) is mutated early in the disease; p53 is the

"guardian of the genome", which, during DNA and cell duplication, makes sure the DNA

is copied correctly and destroys the cell (apoptosis) if the DNA is mutated and cannot be

fixed. When p53 itself is mutated, other mutations can survive. Phosphatase and tensin

homolog (PTEN), another tumor suppressor gene, is itself lost or mutated. Epidermal

growth factor receptor, a growth factor that normally stimulates cells to divide, is

amplified and stimulates cells to divide too much. Together, these mutations lead to cells

dividing uncontrollably, a hallmark of cancer. Recently, mutations

in IDH1 and IDH2 were found to be part of the mechanism and associated with a more

favorable prognosis.

2.1.3 Prognosis

Gliomas are rarely curable. The prognosis for patients with high-grade gliomas is

generally poor, and is especially so for older patients. Of 10,000 Americans diagnosed

each year with malignant gliomas, about half are alive one year after diagnosis, and 25%

after two years. Those with anaplastic astrocytoma survive about three years.

Glioblastoma multiforme has a worse prognosis with less than a 12-month average

survival after diagnosis, though this has extended to 14 months with more recent

treatments.

Low grade: For low-grade tumors, the prognosis is somewhat more optimistic. Patients

diagnosed with a low-grade glioma are 17 times as likely to die as matched patients in the

general population. The age-standardized 10-year relative survival rate was 47%. One

study reported that low-grade oligodendroglioma patients have a median survival of 11.6

years; another reported a median survival of 16.7 years.

High grade: This group comprises anaplastic astrocytomas and glioblastoma multiforme.

Whereas the median overall survival of anaplastic (WHO grade III) gliomas is

approximately 3 years, glioblastoma multiforme has a poor median overall survival of c.

15 months.

Diffuse intrinsic pontine glioma: Diffuse intrinsic pontine glioma primarily affects

children, usually between the ages of 5 and 7. The median survival time with DIPG is

under 12 months. Surgery to attempt tumor removal is usually not possible or advisable for DIPG. By their very nature, these tumors invade diffusely throughout the brain stem,

growing between normal nerve cells. Aggressive surgery would cause severe damage to

neural structures vital for arm and leg movement, eye movement, swallowing, breathing,

and even consciousness. Trials of drug candidates have been unsuccessful. The disease is

primarily treated with radiation therapy alone.

IDH1 and IDH2-mutated glioma: Patients with glioma carrying mutations in

either IDH1 or IDH2 have a relatively favorable survival, compared with patients with

glioma with wild-type IDH1/2 genes. In WHO grade III glioma, IDH1/2-mutated glioma

have a median prognosis of approximately 3.5 years, whereas IDH1/2 wild-type glioma

perform poorly, with a median overall survival of c. 1.5 years. In glioblastoma, the

difference is larger. There, IDH1/2 wild-type glioblastoma have a median overall survival

of 1 year, whereas IDH1/2-mutated glioblastoma have a median overall survival of more

than 3 years.

2.2 Magnetic Resonance Imaging (MRI)


Magnetic resonance imaging (MRI) is a safe and painless test that uses a magnetic

field and radio waves to produce detailed pictures of the body's organs and structures. An

MRI differs from a CAT scan (also called a CT scan or a computed axial tomography

scan) because it doesn't use radiation. An MRI scanner consists of a large doughnut-

shaped magnet that often has a tunnel in the center. Patients are placed on a table that

slides into the tunnel. Some centers have open MRI scanners that have larger openings

and are helpful for patients with claustrophobia. MRI scanners are located in hospitals

and radiology centers. During the examination, radio waves manipulate the magnetic

position of the atoms of the body, which are picked up by a powerful antenna and sent to

a computer. The computer performs millions of calculations, resulting in clear, cross-

sectional black-and-white images of the body. These images can be converted into three-

dimensional (3-D) pictures of the scanned area. These images help to pinpoint problems

in the body.

2.2.1 Why it is done

MRI is used to detect a variety of conditions, including problems of the brain, spinal cord,

skeleton, chest, lungs, abdomen, pelvis, wrists, hands, ankles, and feet. In some cases, it
can provide clear images of body parts that can't be seen as well with an X-ray, CAT scan,

or ultrasound. MRI is particularly valuable for diagnosing problems with the eyes, ears,

heart, and circulatory system.

An MRI's ability to highlight contrasts in soft tissue makes it useful in deciphering

problems with joints, cartilage, ligaments, and tendons. MRI can also be used to identify

infections and inflammatory conditions or to rule out problems such as tumors.

Figure 4. Patient being positioned for MR study of the head and abdomen

2.2.2 Construction and Physics

To perform a study, the person is positioned within an MRI scanner that forms a strong

magnetic field around the area to be imaged. In most medical applications, protons

(hydrogen atoms) in tissues containing water molecules create a signal that is processed

to form an image of the body. First, energy from an oscillating magnetic field is temporarily applied to the patient at the appropriate resonance frequency. The excited

hydrogen atoms emit a radio frequency signal, which is measured by a receiving coil.

The radio signal may be made to encode position information by varying the main

magnetic field using gradient coils. As these coils are rapidly switched on and off they

create the characteristic repetitive noise of an MRI scan. The contrast between different

tissues is determined by the rate at which excited atoms return to the equilibrium state.

Exogenous contrast agents may be given to the person to make the image clearer. The

major components of an MRI scanner are the main magnet, which polarizes the sample,

the shim coils for correcting shifts in the homogeneity of the main magnetic field, the

gradient system which is used to localize the MR signal and the RF system, which

excites the sample and detects the resulting NMR signal. The whole system is controlled

by one or more computers.

Figure 5. Schematic of construction of a cylindrical superconducting MR scanner.

MRI requires a magnetic field that is both strong and uniform. The field strength of the

magnet is measured in teslas – and while the majority of systems operate at 1.5 T,

commercial systems are available between 0.2 and 7 T. Most clinical magnets are

superconducting magnets, which require liquid helium. Lower field strengths can be

achieved with permanent magnets, which are often used in "open" MRI scanners for

claustrophobic patients. Recently, MRI has been demonstrated also at ultra-low fields,

i.e., in the microtesla-to-millitesla range, where sufficient signal quality is made

possible by prepolarization (on the order of 10-100 mT) and by measuring the Larmor

precession fields at about 100 microtesla with highly sensitive superconducting

quantum interference devices (SQUIDs).

2.2.3 T1 and T2

Each tissue returns to its equilibrium state after excitation by the independent relaxation

processes of T1 (spin-lattice; that is, magnetization in the same direction as the static

magnetic field) and T2 (spin-spin; transverse to the static magnetic field). To create a T1-

weighted image, magnetization is allowed to recover before measuring the MR signal by

changing the repetition time (TR). This image weighting is useful for assessing the

cerebral cortex, identifying fatty tissue, characterizing focal liver lesions and in general

for obtaining morphological information, as well as for post-contrast imaging. To create

a T2-weighted image, magnetization is allowed to decay before measuring the MR signal

by changing the echo time (TE). This image weighting is useful for detecting edema and

inflammation, revealing white matter lesions and assessing zonal anatomy in the prostate

and uterus.

Figure 6. Examples of T1 weighted, T2 weighted and PD weighted MRI scans.

The standard display of MRI images is to represent fluid characteristics in black and white

images, where different tissues appear as follows:

Table 1. Standard Display of MRI images.

Signal | T1-weighted | T2-weighted

High | Fat; subacute hemorrhage; melanin; protein-rich fluid; slowly flowing blood; paramagnetic substances, such as gadolinium; cortical pseudolaminar necrosis | More water content, as in edema and tumor; extracellularly located methemoglobin in subacute hemorrhage

Intermediate | Gray matter darker than white matter | White matter darker than gray matter

Low | Bone; urine; CSF; air; more water content, as in edema and tumor; low proton density, as in calcification | Bone; air; fat; low proton density, as in calcification and fibrosis; paramagnetic material, such as deoxyhemoglobin; protein-rich fluid

2.3 Deep Learning
Deep learning is a subfield of machine learning, which aims to learn a hierarchy of

features from input data. Nowadays, researchers have intensively investigated deep

learning algorithms for solving challenging problems in many areas such

as image classification, speech recognition, signal processing, and natural language

processing.

2.3.1 Deep Learning Methods

Deep learning methods are a group of machine learning methods that can

learn features hierarchically from lower level to higher level by building

a deep architecture. Deep learning methods have the ability to automatically learn features at multiple levels, which enables the system to learn complex mapping functions directly from data, without the help of human-crafted features. The most characterizing feature of deep learning methods is that their models all have deep architectures. A deep architecture has multiple hidden layers in the network. In contrast, a shallow architecture has only a few hidden layers (1 to 2 layers). Deep convolutional neural networks have been successfully applied in various areas, including regression, classification, dimensionality reduction, motion modeling, texture modeling, information retrieval, natural language processing, robotics, fault diagnosis, and road crack detection.

2.3.2 Deep Learning Algorithms

Deep learning algorithms have been extensively studied in recent years. As a

consequence, there are a large number of related approaches. Generally speaking, these

algorithms can be grouped into two categories based on their architectures:

1. Restricted Boltzmann machines (RBMs).

2. Convolutional neural networks (CNNs).

Restricted Boltzmann machines (RBMs): RBM is an energy-based probabilistic

generative model. It is composed of one layer of visible units and one layer of hidden

units. The visible units represent the input vector of a data sample and the hidden units

represent features that are abstracted from the visible units. Every visible unit is connected

to every hidden unit, whereas no connection exists within the visible layer or hidden layer.

Figure 7 illustrates the graphical model of restricted Boltzmann machine.

Figure 7. Restricted Boltzmann Machines.

Convolutional neural networks (CNNs): Over the last several years, the quality of image classification and object detection has been dramatically improved by deep learning methods. Convolutional neural networks (CNNs) brought a revolution to the computer vision area. They have not only continuously advanced image classification accuracy, but also play an important role in generic feature extraction tasks such as scene classification, object detection, semantic segmentation, image retrieval, and image captioning. CNNs are one of the most powerful classes of deep neural networks for image processing tasks, and they are highly effective and commonly used in computer vision applications. A convolutional neural network contains three types of layers: convolution layers, subsampling layers, and fully connected layers. The whole architecture of the convolutional neural network is

shown in Figure 8. A brief introduction to each type of layer is provided in the following

paragraphs.

Figure 8. Convolutional Neural Network Architecture.

2.3.3 Convolution Layer

As Figure 9 shows, in the convolution layer, the left matrix is the input, which is a digital image, and the right matrix is a convolution matrix. The convolution layer takes
the convolution of the input image with the convolution matrix and generates the output

image.

Figure 9. Digital Image Representation and Convolutional Matrix.

Usually, the convolution matrix is called a filter and the output image is called the filter response or filter map. An example of the convolution calculation is demonstrated in Figure 10. Each time, a block of pixels is convolved with the filter to generate one pixel in the new image.

Figure 10. Example convolution calculation.
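As a rough numerical companion to the calculation in Figure 10, the short NumPy sketch below slides a 3×3 filter over a small image with stride 1 and no padding; the image and filter values are made up for illustration and are not taken from the thesis.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2-D convolution with stride 1 (no padding, no kernel flipping),
    matching the sliding-window calculation described above."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # element-wise product of the current block and the filter, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 2, 0, 1],
                  [0, 1, 3, 2],
                  [2, 1, 0, 1],
                  [1, 0, 2, 3]], dtype=float)
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)
print(convolve2d(image, edge_filter))   # 2x2 filter response (output image)
```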

2.3.4 Subsampling Layer

The subsampling layer is an important layer to the convolutional neural network.

This layer mainly reduces the input image size in order to give the neural network more invariance and robustness. The most commonly used method for the subsampling layer in image processing tasks is max pooling, so the subsampling layer is frequently called the max pooling layer. The max pooling method is shown in Figure 11. The image is divided into blocks and the maximum value of each block becomes the corresponding pixel value of the output image. There are two reasons to use a subsampling layer. First, the subsampling layer has fewer parameters and is faster to train. Second, a subsampling layer helps the convolution layers tolerate translation and rotation in the input pattern.

Figure 11. Example subsampling layer.
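A minimal NumPy sketch of 2×2 max pooling as illustrated in Figure 11; the feature-map values are illustrative only.

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Divide the feature map into size x size blocks and keep the maximum
    of each block, halving the spatial resolution when size=2."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]       # drop ragged edges
    blocks = trimmed.reshape(h // size, size, w // size, size)  # group into blocks
    return blocks.max(axis=(1, 3))                            # maximum per block

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 0],
               [7, 2, 9, 8],
               [0, 1, 3, 4]])
print(max_pool(fm))   # [[6 4]
                      #  [7 9]]
```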

2.3.5 Fully Connected Layer

Fully connected layers are similar to the traditional feed-forward neural layer (see Figure 8). They feed the network activations forward into vectors with a predefined length. We can fit the vector into certain categories or take it as a representation vector for further processing.

Chapter 3 Proposed Methods
3.1 Model Architecture
The model we have designed for the task of autonomous brain segmentation has at its

core the basic design of a conditional generative adversarial network. Conditional GANs

have the structure illustrated in Figure 12 below:

Figure 12. Proposed GAN Architecture.

Our model builds on the GAN architecture with several modifications. In the GAN structure we have a generator G and a discriminator D, which are trained in an adversarial manner. The generator is trained to create realistic image masks from a noise

input z, and the discriminator is trained to differentiate between the real image mask x and

those produced by the generator G(x). Using the feedback from the discriminator, our

generator can improve its ability to produce images so as to trick the discriminator into

classifying them as real in the future. Doing so produces more accurate image masks for

segmentation.

The most obvious change when going from a traditional GAN to our model is that instead

of a noise vector z, the generator is fed an actual image x, which we want to convert into

another structurally similar image mask y. Our generator should now produce G(x), which

we want to eventually be indistinguishable from y.

In addition to the traditional GAN losses, we also apply an L1 loss, which is just a pixel-

wise absolute value loss on the generated image masks. In this situation, we force the

generator to approximate G(x) = y with the additional loss:

$L_1 = |G(x) - y|$.

In a traditional GAN we would never apply such a loss because it would prevent the

generator from producing new images. In the case of image translation however, we care

about precise image translations rather than new ones. This need for precise images is

also why we don’t entirely throw away the GAN aspect of our network. An L1 loss by

itself would produce blurry or washed-out images by virtue of attempting to generate

images that are “on average” correct. By keeping the GAN losses, we encourage the

network to produce crisp images that are visually indistinguishable from the real ones.

3.1.1 Pre-Processing Layer

The files are organized as shown in Figure 13 below from the file system.

Figure 13. Dataset file naming structure.

Table 2 below outlines differences in the file types in the dataset.

Table 2. Summary of MRI sequences in the dataset.

TR(msec) TE(msec)

T1-Weighted 500 14

(short TR and TE)

T2-Weighted 4000 90

(long TR and TE)

Flair 9000 114

(very long TR and TE)

Figure 14. MRI 2D slices after preprocessing from 3D file.
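As a rough sketch of this preprocessing step, the snippet below loads a 3D NIfTI volume and extracts intensity-normalized 2D transverse slices as in Figure 14. It assumes the nibabel library and a hypothetical BraTS-style file name; neither the library choice nor the exact normalization is specified in the thesis.

```python
import numpy as np
import nibabel as nib  # common library for reading NIfTI files (assumed choice)

def volume_to_slices(nifti_path):
    """Load a 3-D NIfTI volume and return a stack of 2-D transverse slices,
    intensity-normalized to the [0, 1] range."""
    volume = nib.load(nifti_path).get_fdata()               # e.g. shape (240, 240, 155)
    volume = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
    slices = [volume[:, :, k] for k in range(volume.shape[2])]  # transverse view
    return np.stack(slices)                                   # (num_slices, H, W)

# Hypothetical usage with a BraTS-style FLAIR file name:
# slices = volume_to_slices("Brats18_XXXX_1_flair.nii.gz")
```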

3.1.2 Generator

To help ensure accurate images, the third addition to our model is the

implementation of a U-Net architecture in the generator. Put simply, the U-Net is an auto-

encoder in which the outputs from the encoder-half of the network are concatenated with

their mirrored counterparts in the decoder-half of the network. By including some of these

skip connections, we prevent the middle part of the network from becoming a bottleneck

on the nature of information flow.

Figure 15. Auto-encoder and U-net Architecture.

In the case of our model, our input is the image x that we want to convert and the output

G(x) is the image we want it to become. By concatenating mirrored layers, we are able to

ensure that the structure of the original image is passed over to the decoder-half of the

network directly. When thinking about the task of segmentation, the representations

learned at each scale of the encoder are extremely useful for the decoder in terms of

providing the structure of the segmented image mask.

Taking all the points mentioned above into account in the design of our generator model, we arrived at a 68-layer deep neural network implementation, summarized in Figure 16 below:


Figure 16. 68 Layer Deep Neural Network Generator.

Due to the complexity of the model, we have color coded the layers and provided legends

on the bottom right of the image for reference. Table 3 is also provided below with

implementation details of the layers. Please note that the table below is not complete; as this is a summary of the model, only the first 10 layers and the last layer are shown.

Table 3. Generator - Layer Implementation Details.

Layer Layer Name Layer Type Input Shape Output Shape

1 Unet_input Input Layer (n, 1, 256, 256) (n, 1, 256, 256)

2 Convolution_1 Convolution Layer (n, 1, 256, 256) (n, 64, 128, 128)

3 Leaky_Relu_1 Leaky Relu (n, 64, 128, 128) (n, 64, 128, 128)

4 Convolution_2 Convolution Layer (n, 64, 128, 128) (n, 128, 64, 64)

5 Batch_Norm_2 Batch Normalization (n, 128, 64, 64) (n, 128, 64, 64)

6 Leaky_Relu_2 Leaky Relu (n, 128, 64, 64) (n, 128, 64, 64)

7 Convolution_3 Convolution Layer (n, 128, 64, 64) (n, 256, 32, 32)

8 Batch_Norm_3 Batch Normalization (n, 256, 32, 32) (n, 256, 32, 32)

9 Leaky_Relu_3 Leaky Relu (n, 256, 32, 32) (n, 256, 32, 32)

10 Convolution_4 Convolution Layer (n, 256, 32, 32) (n, 512, 16, 16)

A defining feature of image-to-image translation problems is that they map a high

resolution input grid to a high resolution output grid. In addition, for the problems we

consider, the input and output differ in surface appearance, but both are renderings of the

same underlying structure. Therefore, structure in the input is roughly aligned with

structure in the output. We design the generator architecture around these considerations.

Many previous solutions [1, 2, 3, 4, 5] to problems in this area have used an encoder-

decoder network [6]. In such a network, the input is passed through a series of layers that

progressively downsample, until a bottleneck layer, at which point the process is reversed.

Such a network requires that all information flow pass through all the layers, including

the bottleneck. For many image translation problems, there is a great deal of low-level

information shared between the input and output, and it would be desirable to shuttle this

information directly across the net. For example, in the case of image colorization, the

input and output share the location of prominent edges.

To give the generator a means to circumvent the bottleneck for information like this, we

add skip connections, following the general shape of a “U-Net” [7]. Specifically, we add

skip connections between each layer i and layer n − i, where n is the total number of layers. Each skip connection simply concatenates all channels at layer i with those at layer n − i.
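A minimal sketch of one such skip connection in Keras, assuming channels-first tensors: the decoder block upsamples its input and concatenates it with the mirrored encoder output along the channel axis. The layer parameters are illustrative assumptions, not the exact configuration of the thesis.

```python
from tensorflow.keras import layers

def decoder_block(x, skip, filters):
    """Upsampling block of a U-Net: a transposed convolution doubles H and W,
    then the mirrored encoder output is concatenated along the channel axis."""
    x = layers.Conv2DTranspose(filters, kernel_size=4, strides=2, padding="same",
                               data_format="channels_first")(x)
    x = layers.BatchNormalization(axis=1)(x)
    x = layers.ReLU()(x)
    return layers.Concatenate(axis=1)([x, skip])  # skip connection: layer i joined to layer n-i
```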

3.1.3 Discriminator

In our model, we utilize a different kind of discriminator architecture known as a

PatchGAN. The concept behind the PatchGAN is that instead of outputting a single

discriminator score for the whole image, a set of separate scores are produced for every

patch of the image, and then an average of the scores is taken to produce a final score.

This approach is known to improve performance by relying on fewer parameters, and a

precisely tuned patch size can yield improved image quality. A PatchGAN with a patch

size configured to be the same as the image size is therefore equivalent to a traditional

discriminator architecture. For the sake of simplicity in our work, our model's implementation utilizes a traditional discriminator, i.e., a PatchGAN whose patch spans the entire image.

Figure 17. 6 Layer Deep Neural Network Discriminator.

Table 4 below, lists the input shape details of the discriminator part of our proposed model:

Table 4. Discriminator Layer Implementation Details.

Layer Layer Name Layer Type Input Shape Output Shape


1 Patch_gan_input Input Layer (n, 1, 256, 256) (n, 1, 256, 256)
2 Patch_gan Model (n, 1, 256, 256) [(n, 2), (n, 512)]
3 Dense_1 Dense (n, 512) (n, 500)
4 Reshape_1 Reshape (n, 500) (n, 100, 5)
5 Lambda_1 Lambda (n, 100, 5) (n, 100)
6 Merge_8 Merge [(n, 2), (n, 100)] (n, 102)
7 Disc_output Dense (n, 102) (none, 2)

It is well known that the L2 and L1 losses produce blurry results on image generation

problems [8]. Although these losses fail to encourage high frequency crispness, in many

cases they nonetheless accurately capture the low frequencies. For problems where this

is the case, we do not need an entirely new framework to enforce correctness at the low

frequencies. L1 will already do. This motivates restricting the GAN discriminator to only

model high-frequency structure, relying on an L1 term to force low-frequency correctness

(Eqn. 4). In order to model high-frequencies, it is sufficient to restrict our attention to the

structure in local image patches. Therefore, we design a discriminator architecture –

which we term a PatchGAN – that only penalizes structure at the scale of patches. This

discriminator tries to classify whether each N × N patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the

ultimate output of D.

Prior work on image-to-image translation has demonstrated that N can be much smaller than the full size of the image

and still produce high quality results. This is advantageous because a smaller PatchGAN

has fewer parameters, runs faster, and can be applied on arbitrarily large images.

Such a discriminator effectively models the image as a Markov random field, assuming

independence between pixels separated by more than a patch diameter. This connection

was previously explored in [9], and is also the common assumption in models of texture

[10, 11] and style [12, 13, 14, 15]. Our PatchGAN can therefore be understood as a form

of texture/style loss.
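The sketch below illustrates the PatchGAN idea under these assumptions: a small convolutional discriminator emits a grid of per-patch real/fake scores and averages them into a single decision. The filter counts and resulting patch size are illustrative and are not the exact configuration of the discriminator in Table 4.

```python
from tensorflow.keras import layers, Input, Model

def build_patch_discriminator(input_shape=(1, 256, 256)):
    """Markovian (PatchGAN) discriminator: every spatial position of the final
    feature map scores one local patch; averaging the map gives the image score."""
    inp = Input(shape=input_shape)
    x = inp
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 4, strides=2, padding="same",
                          data_format="channels_first")(x)
        x = layers.LeakyReLU(0.2)(x)
    patch_scores = layers.Conv2D(1, 4, padding="same", activation="sigmoid",
                                 data_format="channels_first")(x)   # (n, 1, 32, 32) patch grid
    image_score = layers.GlobalAveragePooling2D(data_format="channels_first")(patch_scores)
    return Model(inp, [image_score, patch_scores])
```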

3.1.4 DCGAN

GANs are a type of generative model that learns a mapping from a random noise vector z to an output image y, G: z → y. On the other hand, conditional GANs learn a mapping from an observed image x and a random noise vector z to y, G: {x, z} → y. The generator G is trained to produce outputs that cannot be distinguished from "real" images by an adversarially trained discriminator, D, which is trained to detect the generator's "fakes" as well as possible. This training procedure is shown in Figure 18.

Figure 18. Deep Conditional GAN (DCGAN).

Table 5 outlines the implementation details of the input and output fields of the whole
deep conditional generative adversarial network.

Table 5. DCGAN Layer Implementation Details.

Layer Layer Name Layer Type Input Shape Output Shape


1 DCGAN_input Input Layer (n, 1, 256, 256) (n, 1, 256, 256)
2 Unet_gen Model (n, 1, 256, 256) (n, 1, 256, 256)
3 Lambda_2 Lambda (n, 1, 256, 256) (n, 1, 256, 256)
4 Disc_nn Model (n, 1, 256, 256) (n, 2)

We adapt our generator and discriminator architectures from those in [16]. Both generator

and discriminator use modules of the form convolution-BatchNorm-ReLu [17].
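A minimal sketch of how the generator and discriminator could be chained into the combined model of Figure 18 and Table 5. The helpers build_generator and build_discriminator are hypothetical stand-ins for the networks described earlier (with the discriminator assumed to return a single score); the discriminator is frozen inside the combined model so that only the generator is updated through it.

```python
from tensorflow.keras import Input, Model

def build_dcgan(build_generator, build_discriminator, input_shape=(1, 256, 256)):
    """Chain G and D into one trainable graph: the combined model maps an input
    MRI slice to (discriminator decision, generated mask), so the generator can
    be trained against both the adversarial loss and the L1 reconstruction loss."""
    generator = build_generator(input_shape)
    discriminator = build_discriminator(input_shape)
    discriminator.trainable = False            # freeze D while the combined model updates G

    mri_slice = Input(shape=input_shape, name="DCGAN_input")
    fake_mask = generator(mri_slice)           # G(x)
    decision = discriminator(fake_mask)        # D(G(x))
    return Model(mri_slice, [decision, fake_mask]), generator, discriminator

# Hypothetical usage, given generator/discriminator builders like the sketches above:
# combined_model, generator, discriminator = build_dcgan(build_generator, build_discriminator)
```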

3.2 Evaluation Methods


To test and evaluate the performance of our model, we utilize two kinds of methods: one for checking whether we have reached an equilibrium point in training, and the other for comparing the performance of our model against the previous studies mentioned in the related work section.

3.2.1 Objective Function

The objective of a conditional GAN can be expressed as

$O_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]$. (1)

where $G$ tries to minimize this objective against an adversarial $D$ that tries to maximize it, i.e. $G^* = \arg\min_G \max_D O_{cGAN}(G, D)$.

To test the importance of conditioning the discriminator, we also compare to an

unconditional variant in which the discriminator does not observe $x$:

$O_{GAN}(G, D) = \mathbb{E}_{y}[\log D(y)] + \mathbb{E}_{x,z}[\log(1 - D(G(x, z)))]$. (2)

Previous approaches have found it beneficial to mix the GAN objective with a more

traditional loss, such as L2 distance [1]. The discriminator’s job remains unchanged, but

the generator is tasked to not only fool the discriminator but also to be near the ground

truth output in an L2 sense. We also explore this option, using L1 distance rather than L2

as L1 encourages less blurring:

$O_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_1]$. (3)

Our final objective is

$G^* = \arg\min_G \max_D O_{cGAN}(G, D) + \lambda O_{L1}(G)$. (4)

Without z, the net could still learn a mapping from x to y, but would produce deterministic

outputs, and therefore fail to match any distribution other than a delta function.

Past conditional GANs have acknowledged this and provided Gaussian noise z as an input

to the generator, in addition to x (e.g., [2]). In initial experiments, we did not find this

strategy effective – the generator simply learned to ignore the noise – which is consistent

with Mathieu et al. [18]. Instead, for our final models, we provide noise only in the form

of dropout, applied on several layers of our generator at both training and test time.

Despite the dropout noise, we observe only minor stochasticity in the output of our nets.

Designing conditional GANs that produce highly stochastic output, and thereby capture

the full entropy of the conditional distributions they model, is an important question left

open by the present work.

3.2.2 Loss Function

To explain our choice of the loss function, we need to look into two types of loss functions

that are suitable for this kind of problem, namely the L1 and L2 loss.

The L1 loss function is used to minimize the error, defined as the sum of all the absolute differences between the true values and the predicted values:

$L1 = \sum_{i} \lvert y_{true,i} - y_{pred,i} \rvert$

The L2 loss function is used to minimize the error, defined as the sum of all the squared differences between the true values and the predicted values:

$L2 = \sum_{i} (y_{true,i} - y_{pred,i})^{2}$
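The two candidate losses can be written directly, for instance with NumPy; this is a minimal sketch assuming y_true and y_pred are arrays of the same shape.

import numpy as np

def l1_loss(y_true, y_pred):
    # Sum of absolute differences between the true and predicted values.
    return np.sum(np.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):
    # Sum of squared differences between the true and predicted values.
    return np.sum((y_true - y_pred) ** 2)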

3.2.3 Dice Coefficient

The Dice coefficient (also known as the Dice similarity index) is equivalent to the F1 score, but it is not the same as accuracy. The main difference is that accuracy takes true negatives into account, whereas the Dice coefficient and many other measures treat true negatives as uninteresting defaults. The Dice coefficient is given by:


$DSC = \dfrac{2\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert}$
When applied to Boolean data, using the definition of true positive (TP), false positive

(FP), and false negative (FN), it can be written as:


$DSC = \dfrac{2TP}{2TP + FP + FN}$
To evaluate the performance of our model and validate its capability to generate segmentations, we create an algorithm that computes the Dice score between two images and returns a percentage signifying their similarity (a minimal implementation sketch follows the list below). The underlying Dice coefficient is a floating-point value between 0 and 1:

o Maximum similarity = 1

o No similarity = 0
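The sketch below illustrates such a Dice-score routine, assuming the predicted and ground-truth segmentations are binary NumPy masks of the same shape; the function name and the choice to return a percentage are illustrative.

import numpy as np

def dice_score(pred_mask, true_mask):
    # Dice similarity coefficient, returned as a percentage in [0, 100].
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    denom = pred.sum() + true.sum()
    if denom == 0:
        return 100.0  # both masks empty: treat as perfect agreement
    return 100.0 * 2.0 * intersection / denom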

3.2.4 Perceptual Evaluation

The method of perceptual evaluation asks a number of human subjects to give feedback on their perception of the results. Each subject issues a human plausibility score, which is an approximate similarity expressed as a percentage.

Chapter 4 Experimental Setup and Training

4.1 Experimental Setup


To perform the experiments in this work, we utilized the Python programming language set up on a Linux Ubuntu distribution. The main libraries used were TensorFlow and Keras. The machine was equipped with two NVIDIA GTX 1080 Ti GPUs.

Figure 19. Different sections of tumor annotated in colors.

Our dataset was obtained from the Medical Image Computing and Computer Assisted

Intervention Society (“The MICCAI Society”).

The datasets used in this work have been updated since 2016 with more routine clinically-acquired 3T multimodal MRI scans, and all the ground-truth labels have been manually revised by expert board-certified neuroradiologists.

Ample multi-institutional, routine clinically-acquired, pre-operative multimodal MRI scans of glioblastoma (GBM/HGG) and lower-grade glioma (LGG), from 840 and 300 patients respectively, with pathologically confirmed diagnoses and available overall survival (OS) data, are provided as the training, validation and testing data. These multimodal scans describe a) native (T1), b) post-contrast T1-weighted (T1Gd), c) T2-weighted (T2), and d) T2 Fluid Attenuated Inversion Recovery (FLAIR) volumes, and were acquired with different clinical protocols and various scanners from about 19 institutions, mentioned as data contributors on the MICCAI website. All the imaging datasets have been segmented manually, by one to four raters, following the same annotation protocol, and their annotations were approved by experienced neuroradiologists. Annotations comprise the GD-enhancing tumor, the peritumoral edema, and the necrotic and non-enhancing tumor core, as described in the BraTS reference paper (see Figure 19 above), published in IEEE Transactions on Medical Imaging. The provided data are distributed after pre-processing, i.e., co-registered to the same anatomical template, interpolated to the same resolution (1 mm³) and skull-stripped.
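For reference, a pre-processed volume in NIfTI format can be loaded into a NumPy array with the nibabel library, as in the hedged sketch below; the file name is hypothetical and depends on the local copy of the dataset.

import nibabel as nib
import numpy as np

volume = nib.load("BraTS_sample_flair.nii.gz")   # hypothetical file name
data = volume.get_fdata().astype(np.float32)     # 3D intensity array
print(data.shape)                                # e.g. (240, 240, 155) for BraTS volumes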

To test the performance of our model and design, we perform four experiments based on

the nature of our dataset. To understand the setup, see Table 6, Figure 20 below and Table

1:

Table 6. Dataset Training, Testing and Validation Ratio.

Type     Total    Training (50%)    Testing (30%)    Validation (20%)
HGG       840          420               252                168
LGG       300          150                90                 60
Total    1140          570               342                228

Table 6 above summarizes the setup of our data: we split the available dataset into 50% for training, 30% for testing and optimization, and finally 20% for validation on unseen data, so that we can test how well our model performs on new data, as illustrated in the sketch below.
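Such a 50/30/20 split can be produced with a short helper like the one below; the function name and the fixed random seed are assumptions for illustration, not the exact preprocessing code of this work.

import random

def split_patients(patient_ids, train=0.5, test=0.3, seed=42):
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_test = int(len(ids) * test)
    return (ids[:n_train],                   # training (50%)
            ids[n_train:n_train + n_test],   # testing / optimization (30%)
            ids[n_train + n_test:])          # validation (remaining ~20%)

# Example: 840 HGG cases give 420 / 252 / 168, matching Table 6.
hgg_train, hgg_test, hgg_val = split_patients(range(840))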

The test set and cross validation set have different purposes. If we drop either one, we

lose its benefits:

 The cross-validation set is used to help detect over-fitting and to assist in hyperparameter search, such as for the learning rate.

 The test set is used to measure the performance of the model.

We cannot use the cross-validation set to measure the performance of our model accurately, because we would deliberately tune our results to get the best possible metric over perhaps hundreds of variations of our parameters. The cross-validation result is therefore likely to be too optimistic. For the same reason, we cannot drop the cross-validation set and use the test set for selecting hyperparameters, because then we would be practically guaranteed to overestimate how good our model is. Ideally, we use the test set just once, or use it in a neutral fashion to compare different experiments.

If we cross-validate, find the best model, and then add the test data to the training set, it is possible (and in some situations perhaps quite likely) that our model would be improved. However, we have no way to be sure whether that has actually happened, and even if it has, we no longer have any unbiased estimate of the new performance.

As mentioned above, we perform four experiments: 1) all image types, 2) T1 and T1c image types, 3) T2 image types, and 4) FLAIR image types. We transpose the 3D slice information coming from the MRI scans into three basic viewpoints: the transverse, sagittal and coronal views (see Figure 20 below). We perform our experiments on the transverse view only.

Figure 20. MRI 2D viewpoints, transverse, sagittal and coronal.

To optimize our neural network, we follow the standard approach from [19]: we alternate between one gradient-descent step on the discriminator D and one step on the generator G. As suggested in the original GAN paper, rather than training G to minimize log(1 − D(x, G(x, z))), we instead train it to maximize log D(x, G(x, z)) [19]. In addition, we divide the objective by 2 while optimizing D, which slows down the rate at which D learns relative to G. We utilize minibatch stochastic gradient descent (SGD) and apply the Adam solver [20], setting the learning rate to 0.0002 and the momentum parameters to β₁ = 0.5 and β₂ = 0.999. During inference, we run the generator network in precisely the same manner as during the training phase. This differs from the usual protocol in that dropout is applied during testing, and batch normalization [17] is applied using the statistics of the test batch rather than aggregated statistics of the training batch. This approach to batch normalization, when the batch size is set to 1, has been termed “instance normalization” and has been demonstrated to be very effective in image generation tasks [21]. In our experiments, we use batch sizes between 1 and 5 depending on the experiment; a minimal sketch of the optimizer configuration follows below.
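The optimizer configuration described above can be written in Keras as in the hedged sketch below; it assumes a recent TensorFlow/Keras version where the argument is named learning_rate (older versions use lr).

from tensorflow.keras.optimizers import Adam

# Adam with learning rate 2e-4 and momentum parameters beta_1 = 0.5, beta_2 = 0.999,
# used for the alternating updates of the discriminator and the generator.
optimizer = Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)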

We also add a data generator to perform image augmentations, such that our inputs and targets are rotated at angles of 90°, 180° and 270° about the origin. The augmentations also include flipping the images top to bottom and right to left, as sketched below.
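A hedged sketch of this augmentation step is given below, assuming a 2D slice and its mask are NumPy arrays; the flip probabilities and the helper name are illustrative.

import numpy as np

def augment_pair(image, mask, rng=np.random):
    # Rotate both image and mask by 0, 90, 180 or 270 degrees about the origin.
    k = rng.choice([0, 1, 2, 3])
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # Top-to-bottom flip.
    if rng.rand() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    # Right-to-left flip.
    if rng.rand() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    return image, mask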

Chapter 5 Results and Discussion

Figure 21 shows the training curves for the four different experiment configurations. The performance differs based on the image type used in the experiment. We explore the possible causes of this difference in Dice performance in the results that follow.

Figure 21. FLAIR experiment: (a) training losses; (b) Dice coefficient during training.

As can be seen from the graphs in Figure 21 above, the training, testing and validation losses decrease over time.

Figure 22. T2 experiment: (a) training losses; (b) Dice coefficient during training.

Consider Figure 23 below; it shows a sample result after a prediction (sampling) operation was run on the FLAIR model:

Figure 23. A Sample Result From the FLAIR Experiment.

The FLAIR image-type experiment gives the highest similarity score (Dice coefficient) of 84.93%. As can be seen from Figure 23 above, the model correctly identifies the location of the tumor and outlines its boundaries very closely to what an expert would have done. We believe the FLAIR image-type experiment outperforms the others because the tumor pixels are much brighter relative to the rest of the brain.

Consider Figure 24 below; it outlines erroneous results from the other experiments, such as the T1 image types. The tumor may be correctly identified as one progresses through the slices; however, because the tumor is almost as bright as the rest of the brain, some healthy parts of the brain may be returned as part of the tumor, which severely degrades the similarity score.

Figure 24. A Sample Erroneous Result from the T1 Experiment.

In other cases, some parts of the brain that show up just as bright as the tumor tend to make it into the segmentation result. This is because, in this work, we have not built in a mechanism for the model to recognize normal brain structures and ignore them when they are not affected. Figure 25 below demonstrates this issue from sampling.

Figure 25. A Sample Brain Structure Mistaken for Tumor.

In some other cases, a complete miss occurs: a tumor is predicted where there is none, or a tumor is not detected at all even though it is present. Figure 26 shows the former and the latter, respectively.

Figure 26. A Sample Complete Miss: (a) a tumor predicted where none exists; (b) an existing tumor not detected.

It can be seen that, for case (a), the predicted presence of a tumor is due to the whole brain structure being bright; we have established that our model seems to have learned that the brighter structures in the images are possible tumors. For case (b), the whole brain image appears completely dark, with no outstanding brighter pixels, hence no tumor is predicted even though, according to the expert annotations, there should be one there.

Table 7 below displays the individual average Dice scores for all four experiments.
Table 7. Average Similarity Scores from Experiments.

It can be seen from the results that the FLAIR images give the best similarity score, and that combining all the image types greatly reduces the performance of the model. The bar chart in Figure 27 below provides a visual comparison of the experiments and the overall average score.

Figure 27. Bar Chart Comparison of all Experiments.

Following that, we investigate what human observers think of the results: ten participants from our lab were each given one of the image results obtained after sampling the model. They were asked to give an approximate score of how similar they think the segmentations produced by the model are to the ones created by the experts. Table 8 below displays their responses.

Table 8. Human Plausibility Scores.

The average of this test is approximately 7% below the average of the Dice score, and the

highest score on this test is approximately 1% below the highest score on the Dice score.

Finally, we perform a comparison with the related works mentioned earlier in this document, to see how our own work compares with similar studies. The results are shown in Table 9 below.

Table 9. Comparison with Related Studies.

We selected our best model result, from the FLAIR image-type experiment, for this comparison. Our model ranks second when compared with these similar studies. It is worth pointing out that these comparisons are made on a whole-tumor basis. On sub-regions such as the tumor core, the other studies performed very well, but our work was focused only on the whole tumor, hence we make the comparison on that basis.

Chapter 6 Conclusions and Future Work

6.1 Conclusions
The proposed model is found to be capable of generating image segmentation masks given a 3D MRI image. The accuracy, or similarity score, at the current level is only good enough to identify the position of a tumor; it cannot yet be used in an actual medical environment, since very high precision is required when it comes to a patient’s well-being. Using all MRI sequence types to train the model at once greatly diminishes the similarity score of the generated image masks, so it is better to tackle such a problem by focusing on one image type at a time. The FLAIR MRI sequence is the image type best suited to identifying and segmenting brain tumors using our methodology.

6.2 Future Work


The current model can be improved by separating the HGG and LGG datasets and training for more epochs. Furthermore, using the Dice coefficient as the loss function would probably be an advantage over the L1 loss; a possible formulation is sketched below.
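As a starting point, a soft Dice loss could be formulated as in the hedged sketch below, written for TensorFlow tensors; the smoothing constant is an assumption and would need tuning.

import tensorflow as tf

def soft_dice_loss(y_true, y_pred, smooth=1.0):
    # 1 - Dice coefficient, computed on flattened soft segmentation maps.
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    dice = (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)
    return 1.0 - dice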

References

[1] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A.Efros. Context

encoders: Feature learning by inpainting. CVPR, 2016.

[2] X. Wang and A. Gupta. Generative image modeling using style and structure

adversarial networks. ECCV, 2016.

[3] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.

[4] Y. Zhou and T. L. Berg. Learning temporal transformations from time-lapse

videos. In ECCV, 2016.

[5] D. Yoo, N. Kim, S. Park, A. S. Paek, and I. S. Kweon. Pixellevel domain transfer.

ECCV, 2016.

[6] S. Hwang, J. Park, N. Kim, Y. Choi, and I. So Kweon. Multispectral pedestrian

detection: Benchmark dataset and baseline. In CVPR, pages 1037–1045, 2015.

[7] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for

biomedical image segmentation. In MICCAI, pages 234–241. Springer, 2015.

[8] A. B. L. Larsen, S. K. Sønderby, and O. Winther. Autoencoding beyond pixels

using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.

[9] C. Li and M. Wand. Precomputed real-time texture synthesis with markovian

generative adversarial networks. ECCV, 2016.

[10] A. A. Efros and T. K. Leung. Texture synthesis by nonparametric sampling. In

ICCV, volume 2, pages 1033–1038. IEEE, 1999.

[11] L. A. Gatys, A. S. Ecker, and M. Bethge. Texture synthesis and the controlled

generation of natural stimuli using convolutional neural networks. arXiv preprint

arXiv:1505.07376, 12, 2015.

[12] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with

neural networks. Science, 313(5786):504–507, 2006.

[13] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image

analogies. In SIGGRAPH, pages 327–340. ACM, 2001.

[14] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional

neural networks. CVPR, 2016.

[15] C. Li and M. Wand. Combining markov random fields and convolutional neural

networks for image synthesis. CVPR, 2016.

[16] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with

deep convolutional generative adversarial networks. arXiv preprint

arXiv:1511.06434, 2015.

[17] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

[18] M. Mathieu, C. Couprie, and Y. LeCun. Deep multi-scale video prediction

beyond mean square error. ICLR, 2016.

[19] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.

[20] D. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.

[21] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing

ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.

[22] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B.

Freymann, K. Farahani, C. Davatzikos, Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features,

Nature Scientific Data, 2017.

[23] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B.

Freymann, K. Farahani, C. Davatzikos, Segmentation labels and radiomic

features for the pre-operative scans of the TCGA-GBM collection, The Cancer

Imaging Archive, 2017.

[24] A. Beers, K. Chang, J. Brown, E. Sartor, C. Mammen, E. Gerstner, B. Rosen:

“Sequential 3D U-Nets for Biologically-Informed Brain Tumor Segmentation”,

arXiv preprint arXiv:1709.02967, 2017.

[25] V. Alex, M. Safwan: “Automatic Segmentation and Overall Survival Prediction

in Gliomas using Fully Convolutional Neural Network and Texture Analysis”,

arXiv preprint arXiv:1712.02066, 2017.

[26] H. Bharath, S. Colleman, D. Sima, S. Van Huffel, Tumor Segmentation from

Multimodal MRI Using Random Forest with Superpixel and Tensor Based

Feature Extraction. In: Crimi A., Bakas S., Kuijf H., Menze B., Reyes M. (eds)

Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries.

BrainLes 2017. Lecture Notes in Computer Science, vol 10670. Springer, Cham

[27] Guotai Wang, Wenqi Li, Sebastien Ourselin: “Automatic Brain Tumor

Segmentation using Cascaded Anisotropic Convolutional Neural Networks”,

arXiv preprint arXiv:1709.00382, 2017.

[28] Automatic segmentation of brain tumor from MR images using SegNet: selection

of training data sets (NTUST)

