Master's Thesis
Graduate Student: Mpendulo Mamba
Advisor: Professor Yo-Ping Huang
January 2018
ABSTRACT
Title: Automatic Brain Tumor Segmentation with a 3-Dimensional Generative Adversarial Network
Pages: 54
Program: Science (IMEECS)
Degree: Master
Keywords: adversarial network (GAN), loss function, high grade glioma (HGG), low grade glioma (LGG)
Brain tumor segmentation is a crucial task in medical image processing. Early diagnosis of brain tumors plays an important role in improving treatment possibilities and increases the survival rate of patients. Manual segmentation of brain tumors for cancer diagnosis, from the large amounts of magnetic resonance images (MRI) generated in clinical routine, is a difficult and time-consuming task, so there is a need for automatic brain image segmentation. In this work, we demonstrate a deep neural network for volumetric segmentation that learns from a series of annotated volumetric images. Segmentation using deep learning methods has proved effective, since these methods achieve state-of-the-art results and can address the problem better than other methods. Deep learning methods can also enable efficient processing and objective evaluation of the large amounts of MRI-based images. We investigate 3D conditional adversarial networks as a solution to this segmentation problem. These networks not only learn the mapping from input images to output images, but also learn a loss function to train the mapping between them. This makes it possible to apply the same generic approach to problems that traditionally would require very different loss formulations, generating segmentation data from 3D labelled maps. We utilize a dataset from the medical image computing and computer assisted intervention (MICCAI) BraTS challenge, which contains high-grade gliomas (HGG), tumors of the central nervous system, and low-grade gliomas (LGG), which are referred to as slow-growing tumors. The proposed model is able to discriminate between well segmented and poorly segmented images, and the generative model can create segmentation image masks around the tumors.
ACKNOWLEDGMENTS
I would like to express my sincere gratitude to all those who supported and helped me while I was writing this thesis. Special thanks to my advisor, Professor Yo-Ping Huang, for his guidance and support. Professor Huang is an incredible mentor, and I appreciate his useful suggestions, comments, remarks, genuine concern for my well-being, and support throughout the learning process of this thesis and my studies overall.
For the financial support for my study at Taipei Tech, I would like to thank the government of the Kingdom of Eswatini, the Ministry of Education, and the Royal Science and Technology Park (RSTP) in Eswatini for providing the master's degree scholarship in Taiwan.
My truthful appreciation goes to all my colleagues and lab mates for their support: Basanta, Howard, James, Yu-an, Peter, Danny, Eric, Justin, Lungile, Mthunzi, all my Eswatini friends, all my Taiwanese friends, and my foreign friends. Not only is it an honor to work in such a wonderful place, but it is also a privilege for me to live here.
Finally, and most of all, I would like to thank my incredible parents, my family, my best friend Lwazi, and my girlfriend Gugu, who have always patiently and morally supported and inspired me.
Siyabonga Kakhulu! (Thank you very much!)
Contents
ABSTRACT .................................................................................................................. i
1.4 Limitations.......................................................................................................8
2.3.5 Fully Connected Layer.............................................................................. 24
References................................................................................................................... 52
List of Figures
Figure 1. Example of an HGG Tumor ............................................................................2
Figure 4. Patient being positioned for MR study of the head and abdomen .................. 15
Figure 21. FLAIR experiment (a) Training losses (b) Dice Coefficient Training ........ 41
Figure 22. T2 experiment (a) Training losses (b) Dice Coefficient Training. ................ 42
Figure 25. A Sample Brain Structure Mistaken for Tumor............................................ 45
List of Tables
Table 1. Standard Display of MRI Images ................................................................... 18
Chapter 1 Introduction
1.1 Background
Over the last few decades, the rapid development of noninvasive brain imaging
technologies has opened new horizons in analyzing and studying the brain anatomy and
function. Enormous progress in accessing brain injury and exploring brain anatomy has
been made using magnetic resonance imaging (MRI). The advances in brain MR imaging
have also provided large amount of data with an increasingly high level of quality. The
analysis of these large and complex MRI datasets has become a tedious and complex task
for clinicians, who have to manually extract important information. This manual analysis is often time consuming and subject to intra- and inter-rater variability. These difficulties in brain MRI data analysis have driven inventions in computer-aided techniques, which have been extensively used to assist doctors in qualitative diagnosis. Brain MRI segmentation
is an essential task in many clinical applications because it influences the outcome of the
entire analysis. This is because different processing steps rely on accurate segmentation
of anatomical regions. For example, MRI segmentation is commonly used for measuring
and visualizing different brain structures, for delineating lesions, for analyzing brain
development, and for image-guided interventions and surgical planning. This diversity of applications has given rise to a variety of segmentation techniques of different accuracy and degree of complexity. These techniques can be used to perform
automatic segmentation on patient scans with conditions that are caused by tumors, such
as dementia. Brain tumors can be classified into two types for the purposes of this study: high-grade gliomas (HGG) and low-grade gliomas (LGG). High-grade gliomas (HGG) are tumors of the central nervous system (CNS). They are solid tumors
arising from transformed cells of the brain and/or the spinal cord. Since they directly
originate from the CNS, they are also called primary CNS tumors, thereby differentiating
them from malignant tumors of other organs that have spread (metastasized) to the CNS.
HGG in children and adolescents are rare. However, they show considerably
malignant behavior since they usually grow fast and frequently destroy healthy brain
tissue. By being able to migrate within the CNS for various centimeters, HGG can induce
the development of new tumors. Without the appropriate therapy, HGG can be lethal
within only a few months. Due to their usually rapid and infiltrating growth, these are among the most malignant tumors in children and adolescents. They appear in all age groups; yet, children aged
younger than three years are rarely affected. Each year, about 60 to 80 children and
adolescents younger than 15 years of age are newly diagnosed with a high-grade glioma.
This corresponds to an incidence rate of 5 to 10 new diagnoses per 1,000,000 children per year.
Low-grade gliomas are tumors of the CNS as well. They are solid tumors arising from
malignantly transformed cells of the brain or spinal cord. Since they develop directly from
CNS cells, they are also called primary CNS tumors in order to distinguish them from
cancers of other body parts that have spread to the CNS (metastasis).
Low-grade gliomas can be found in all parts of the nervous system, most of them, however,
are situated in the cerebellum and the central regions of the cerebrum, such as the optic
pathway (optic pathway gliomas) and the hypothalamic-pituitary axis. They usually grow
very slowly. Nevertheless, for a growing lesion, the space in the bony skull is limited. As
a consequence, vital areas of the brain may be damaged by the space occupying, growing
tumor. Therefore, low-grade gliomas can become life threatening in the course of the
disease. Making up about 30 to 40% of cases, low-grade gliomas are the most common CNS
tumors among children and adolescents. They occur at all ages with a mean age at
diagnosis between five and seven years. In Germany, about 250 children and adolescents
under 18 years of age are newly diagnosed with low-grade glioma each year.
Compared to traditional segmentation methods, deep learning does not rely on hand-crafted features. Instead, raw image intensities are taken as input and subjected to many layers of convolutions,
to calculate an output signal. The many degrees of freedom and inclusion of non-
linearities allow the algorithm to learn complex patterns with a high level of abstraction.
Up until this point many of the deep learning algorithms that have been applied to brain
tumor segmentation have been 2D Convolutional Neural Networks (CNNs), which do not
take advantage of the full breadth of volumetric information. Recently, there has been an
increase in popularity of 3D CNNs [22, 23], which have been shown to be effective for
this task, though at the expense of additional computational complexity. For example, the
3D U-Net architecture was successfully applied for the segmentation of the Xenopus kidney, and similar 3D networks have been used to segment whole tumor, enhancing tumor, and edema in glioblastoma patients [24]. Participants in the 2018 BraTS Challenge are provided with coregistered and skull-stripped T2, pre-contrast T1, post-contrast T1, and FLAIR images, and are then asked to generate segmentations that can then be compared against expert annotations of the whole tumor, tumor core, and enhancing tumor. Ground-truth segmentations are manually drawn by one to four raters
and then approved by expert neuro-radiologists. Our proposed segmentation method for
BraTS 2018 involves the training of several 3D generative adversarial networks (GANs).
The result is a fully automatic segmentation model requiring no additional data aside from
four coregistered input modalities and a practical computation time for batch processing.
1.2 Related Work
The use of a fully convolutional neural network (FCNN) for brain tumor segmentation is explained in this paper [25]. The authors were able to achieve a fully automatic voxel-based classification by
training a 23 layer deep FCNN on 2-D slices that were obtained from patient volumes.
Their model was trained on slices obtained from 130 patients and was validated on 50
patients. The false positives in segmentation map generated by their FCNN were removed
by connected component analysis. On the MICCAI BraTS 2017 validation dataset, this
model achieved average Dice scores of 0.83 and 0.69 for whole tumor and tumor core, respectively.
In another work [ref], the authors present a solution for brain tumor
segmentation on the MICCAI BraTS 2017 dataset. They used three different
convolutional neural networks with the same 3D U-Net architecture, they were trained
for each of the tumor segmentation targets (whole tumor, tumor core and enhancing tumor)
with 3D patches as inputs. The preprocessing was done separately in each case, applying histogram equalization for the whole tumor target and voxel normalization on all
modalities. This solution led to Dice coefficients of 0.91, 0.9118 and 0.827 after testing
with 30% of the training set and 0.8844, 0.7674 and 0.7261 on the leaderboard validation
set (respectively for each segmentation target) provided by MICCAI for validation.
To segment tumors in scans of glioblastoma patients, the team in this paper [ref] entered the competition with a fully
automatic pipeline that involves chaining together several unique 3D U-Nets, a type of
3D patch-based convolutional neural network. Their pipeline takes advantage of the prior
knowledge from previous studies that enhancing and nonenhancing tumors are likely to
be found within regions of edema and within proximity to each other by feeding the
prediction outputs of earlier networks into later networks. They achieved greater context
for this patch-based sampling method by predicting downsampled labels and then
upsampling them using a separate 3D U-Net. They used a fine-tuning network and a
candidate evaluation network to account for tissue border discrepancies and catastrophic failure cases. Their results on the validation data with unknown ground-truth segmentations had average Dice coefficients
of 0.78, 0.67, and 0.68 for whole tumor, enhancing, and non-enhancing tissue respectively.
In another study, they realized that identification and localization of brain tumor
tissues plays an important role in diagnosis and treatment planning of gliomas [26], hence
a fully automated superpixel-wise tumor tissue segmentation algorithm using random forests
was proposed and implemented in their study. They extracted features for the random
forest classifier by constructing a tensor from multi-parametric MRI data and applying
multi-linear singular value decomposition. The method was trained and tested on high
grade glioma (HGG) patients from the BRATS 2017 training dataset. It achieved a
performance of 83%, 76% and 78% Dice scores for whole tumor, enhancing tumor and tumor core, respectively.
A cascade of convolutional neural networks was proposed to segment multi-modality MR images with brain tumor into background and three subregions: enhanced tumor core, whole tumor and tumor core in this study [27]. The cascade was designed to decompose the multi-class segmentation problem into a sequence of binary segmentation problems according to the subregion hierarchy. The segmentation of the first (second) step is used as a
binary mask for the second (third) step. Each network consists of multiple layers of
anisotropic and dilated convolution filters that were obtained by training each network
end-to-end. Residual connections and multi-scale predictions were employed in these networks. On the validation set, this method achieved average Dice scores of 0.764, 0.897 and 0.825 for enhanced tumor core, whole tumor and tumor core, respectively.
In a study from the Robotics Group at the University of Cambridge, UK [28], they trained a two-dimensional convolutional neural network and used the HGG data sets (n = 210)
of BraTS 2017 for network training. They compared training schemes including or
excluding slices without labeled tumor regions. The input images were FLAIR images.
From the results, the dice similarity coefficients (dice=0.74, 2-fold cross-validation)
obtained with training data sets excluding slices without labeled tumor region was
significantly higher than those obtained with all slices (P<0.05, paired t-test). In the
preliminary results, they were able to perform fully automated segmentations of whole tumor regions.
Many studies have been discussed in the previous section regarding brain tumor
segmentation methods with their performance scores. With the aim of improving those
performance scores, in this research we proposed the use of the emerging innovation of
generative adversarial networks (GANs), which are a variant architecture of deep neural networks. We use our own architecture of the GAN, derived from the original GAN paper [19]. Due
to the incredible precision required in medical procedures, we chose the use of a U-Net
architecture for our generative half of the model, which is known to eliminate bottlenecks
that bring about loss of data in basic auto-encoders [7]. Moreover, we apply a Markovian
patch discriminator for the second half of our GAN, which allows us to feed a whole image slice
as a patch during classification [9]. The purpose of this study is to develop an AI model
for MRI brain tumor segmentation, which can be used by experts in the medical field to
speed up and verify their processes of identifying and segmenting tumors and possibly
diagnose them in their early stages. Also, due to the nature of GANs we expect to obtain
a trained model that can not only generate segmentations but can identify incorrectly
segmented images.
1.4 Limitations
There are several limitations in this research, and they are as follows:
1. This research focuses on brain tumor segmentation based on MRI scans only.
2. The MRI scans are transposed to a single viewpoint, the transverse view.
3. Although the research dataset comes annotated such that enhanced tumor core, whole tumor and tumor core can all be trained upon, we focus only on whole tumor.
5. This research applies only deep machine learning techniques to tackle the problem.
The remainder of this thesis is organized as follows. Chapter 2 provides the literature review and related technology on the subject of brain tumor segmentation.
Chapter 3 presents the methodology proposed in this research. The
experimental setup and training of the proposed model is discussed in Chapter 4. Chapter
5 covers our results and discussions. Finally, Chapter 6 covers the conclusions.
Chapter 2 Literature Review
2.1 Glioma
A glioma is a type of tumor that starts in the glial cells of the brain or the spine.
Gliomas comprise about 30 percent of all brain tumors and central nervous system tumors, and 80 percent of all malignant brain tumors.
2.1.1 Classification
By type of cell: Gliomas are named according to the specific type of cell with which they
share histological features, but not necessarily from which they originate. The main types
of gliomas are:
Astrocytomas: astrocytes
Oligodendrogliomas: oligodendrocytes
Ependymomas: ependymal cells
Mixed gliomas: cells from different types of glia
By grade: Gliomas are further categorized according to their grade, which is determined by pathologic evaluation of the tumor.
Figure 3. Low-grade brain glioma in a 28-year-old male.
Low-grade gliomas [WHO grade II] are well-differentiated (not anaplastic); these
tend to exhibit benign tendencies and portend a better prognosis for the patient.
However, they have a uniform rate of recurrence and increase in grade over time, so they should still be monitored as potentially malignant.
Of numerous grading systems in use, the most common is the World Health
Organization (WHO) grading system for astrocytoma, under which tumors are graded from I (least advanced disease, best prognosis) to IV (most advanced disease, worst prognosis).
By location: Gliomas can be classified according to whether they are above or below a membrane in the brain called the tentorium.
The supratentorial is above the tentorium, in the cerebrum, and mostly found in
adults (70%).
The infratentorial is below the tentorium, in the cerebellum, and mostly found in
children (70%).
The pontine tumors are located in the pons of the brainstem. The brainstem has
three parts (pons, midbrain, and medulla); the pons controls critical functions such as breathing.
2.1.2 Pathophysiology
High-grade gliomas are highly vascular tumors and have a tendency to infiltrate.
They have extensive areas of necrosis and hypoxia. Often, tumor growth causes a
breakdown of the blood–brain barrier in the vicinity of the tumor. As a rule, high-grade
gliomas almost always grow back even after complete surgical excision, so are commonly
called recurrent cancer of the brain. Conversely, low-grade gliomas grow slowly, often
over many years, and can be followed without treatment unless they grow and cause
symptoms. Several acquired (not inherited) genetic mutations have been found in
gliomas. Tumor suppressor protein 53 (p53) is mutated early in the disease. p53 is the
"guardian of the genome", which, during DNA and cell duplication, makes sure the DNA
is copied correctly and destroys the cell (apoptosis) if the DNA is mutated and cannot be
fixed. When p53 itself is mutated, other mutations can survive. Phosphatase and tensin
homolog (PTEN), another tumor suppressor gene, is itself lost or mutated. Epidermal
growth factor receptor, a growth factor that normally stimulates cells to divide, is
amplified and stimulates cells to divide too much. Together, these mutations lead to cells dividing uncontrollably, a hallmark of cancer. Recently, mutations in IDH1 and IDH2 were found to be part of the mechanism and associated with a more
favorable prognosis.
2.1.3 Prognosis
Gliomas are rarely curable. The prognosis for patients with high-grade gliomas is
generally poor, and is especially so for older patients. Of 10,000 Americans diagnosed
each year with malignant gliomas, about half are alive one year after diagnosis, and 25%
after two years. Those with anaplastic astrocytoma survive about three years.
Glioblastoma multiforme has a worse prognosis with less than a 12-month average
survival after diagnosis, though this has extended to 14 months with more recent
treatments.
Low grade: For low-grade tumors, the prognosis is somewhat more optimistic. Patients
diagnosed with a low-grade glioma are 17 times as likely to die as matched patients in the
general population. The age-standardized 10-year relative survival rate was 47%. One
study reported that low-grade oligodendroglioma patients have a median survival of 11.6 years.
High grade: This group comprises anaplastic astrocytomas and glioblastoma multiforme.
Whereas the median overall survival of anaplastic (WHO grade III) gliomas is approximately 3 years, glioblastoma multiforme has a poor median overall survival of approximately 15 months.
Diffuse intrinsic pontine glioma: Diffuse intrinsic pontine glioma primarily affects
children, usually between the ages of 5 and 7. The median survival time with DIPG is
under 12 months. Surgery to attempt tumour removal is usually not possible or advisable
for DIPG. By their very nature, these tumours invade diffusely throughout the brain stem,
growing between normal nerve cells. Aggressive surgery would cause severe damage to
neural structures vital for arm and leg movement, eye movement, swallowing, breathing,
and even consciousness. Trials of drug candidates have been unsuccessful.
IDH1- and IDH2-mutated glioma: Patients with glioma carrying mutations in either IDH1 or IDH2 have a relatively favorable survival, compared with patients with
glioma with wild-type IDH1/2 genes. In WHO grade III glioma, IDH1/2-mutated glioma
have a median prognosis of approximately 3.5 years, whereas IDH1/2 wild-type glioma
perform poorly, with a median overall survival of approximately 1.5 years. In glioblastoma, the
difference is larger. There, IDH1/2 wild-type glioblastoma have a median overall survival
of approximately 1 year, whereas IDH1/2-mutated glioblastoma have a median overall survival of more than 3 years.
2.2 Magnetic Resonance Imaging
Magnetic resonance imaging (MRI) is a test that uses a magnetic field and radio waves to produce detailed pictures of the body's organs and structures. An
MRI differs from a CAT scan (also called a CT scan or a computed axial tomography
scan) because it doesn't use radiation. An MRI scanner consists of a large doughnut-
shaped magnet that often has a tunnel in the center. Patients are placed on a table that
slides into the tunnel. Some centers have open MRI scanners that have larger openings
and are helpful for patients with claustrophobia. MRI scanners are located in hospitals
and radiology centers. During the examination, radio waves manipulate the magnetic
position of the atoms of the body, and the resulting signals are picked up by a powerful antenna and sent to a computer, which produces cross-sectional black-and-white images of the body. These images can be converted into three-
dimensional (3-D) pictures of the scanned area. These images help to pinpoint problems
in the body.
MRI is used to detect a variety of conditions, including problems of the brain, spinal cord,
skeleton, chest, lungs, abdomen, pelvis, wrists, hands, ankles, and feet. In some cases, it
can provide clear images of body parts that can't be seen as well with an X-ray, CAT scan,
or ultrasound. MRI is particularly valuable for diagnosing problems with the eyes and ears, as well as problems with joints, cartilage, ligaments, and tendons. MRI can also be used to identify tumors.
Figure 4. Patient being positioned for MR study of the head and abdomen
To perform a study, the person is positioned within an MRI scanner that forms a strong
magnetic field around the area to be imaged. In most medical applications, protons
(hydrogen atoms) in tissues containing water molecules create a signal that is processed
to form an image of the body. First, energy from an oscillating magnetic field
temporarily is applied to the patient at the appropriate resonance frequency. The excited
hydrogen atoms emit a radio frequency signal, which is measured by a receiving coil.
The radio signal may be made to encode position information by varying the main
magnetic field using gradient coils. As these coils are rapidly switched on and off they
create the characteristic repetitive noise of an MRI scan. The contrast between different
tissues is determined by the rate at which excited atoms return to the equilibrium state.
Exogenous contrast agents may be given to the person to make the image clearer. The
major components of an MRI scanner are the main magnet, which polarizes the sample,
the shim coils for correcting shifts in the homogeneity of the main magnetic field, the
gradient system which is used to localize the MR signal and the RF system, which
excites the sample and detects the resulting NMR signal. The whole system is controlled by one or more computers.
MRI requires a magnetic field that is both strong and uniform. The field strength of the
magnet is measured in teslas – and while the majority of systems operate at 1.5 T,
commercial systems are available between 0.2 and 7 T. Most clinical magnets are
superconducting magnets, which require liquid helium. Lower field strengths can be
achieved with permanent magnets, which are often used in "open" MRI scanners for
claustrophobic patients. Recently, MRI has also been demonstrated at ultra-low fields, in the microtesla-to-millitesla range, where sufficient signal quality is made possible by prepolarization (on the order of 10-100 mT) and by measuring the Larmor precession fields with highly sensitive detectors.
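The resonance frequency referred to above scales linearly with field strength via the Larmor equation, f = (γ/2π)·B, where γ/2π ≈ 42.58 MHz/T for hydrogen. A quick sketch of the proton frequencies at the field strengths mentioned in this section (illustrative, not part of the thesis experiments):

```python
GAMMA_BAR_MHZ_PER_T = 42.58  # gyromagnetic ratio of hydrogen divided by 2*pi, in MHz per tesla

def larmor_frequency_mhz(field_tesla):
    """Proton resonance frequency (MHz) at a given static field strength (T)."""
    return GAMMA_BAR_MHZ_PER_T * field_tesla

for b in (0.2, 1.5, 3.0, 7.0):
    print(f"{b:>4} T -> {larmor_frequency_mhz(b):7.2f} MHz")
# 1.5 T gives about 63.87 MHz, the familiar clinical operating frequency.
```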
2.2.3 T1 and T2
Each tissue returns to its equilibrium state after excitation by the independent relaxation
processes of T1 (spin-lattice; that is, magnetization in the same direction as the static
magnetic field) and T2 (spin-spin; transverse to the static magnetic field). To create a T1-weighted image, magnetization is allowed to recover before measuring the MR signal by changing the repetition time (TR). This image weighting is useful for assessing the
cerebral cortex, identifying fatty tissue, characterizing focal liver lesions and, in general, obtaining morphological information. To create a T2-weighted image, magnetization is allowed to decay before measuring the MR signal by changing the echo time (TE). This image weighting is useful for detecting edema and
inflammation, revealing white matter lesions and assessing zonal anatomy in the prostate
and uterus.
Figure 6. Examples of T1 weighted, T2 weighted and PD weighted MRI scans.
The standard display of MRI images is to represent fluid characteristics in black and white images, where different tissues appear as summarized in Table 1.
2.3 Deep Learning
Deep learning is a subfield of machine learning, which aims to learn a hierarchy of
features from input data. Nowadays, researchers have intensively investigated deep learning methods in many areas, including image processing.
Deep learning methods are a group of machine learning methods that learn feature representations using a deep architecture. Deep learning methods have the ability to automatically learn features at multiple levels, which enables the system to learn complex mapping functions directly from data, without the help of human-crafted features.
The most characterizing feature of deep learning methods is that their models all have deep architectures, that is, many hidden layers in the network. In contrast, a shallow architecture has only a few hidden layers.
2.3.2 Deep Learning Algorithms
Deep learning is a fast-growing research field and, as a consequence, there are a large number of related approaches. Generally speaking, these approaches include the following.
Restricted Boltzmann machines (RBMs): A restricted Boltzmann machine is a generative model. It is composed of one layer of visible units and one layer of hidden
units. The visible units represent the input vector of a data sample and the hidden units
represent features that are abstracted from the visible units. Every visible unit is connected
to every hidden unit, whereas no connection exists within the visible layer or hidden layer.
Convolutional neural networks (CNNs): During the last seven years, the quality of
image classification and object detection has been dramatically improved due to the deep
learning method. Convolutional neural networks (CNNs) brought a revolution in the
computer vision area. They have not only continuously advanced image classification accuracy, but also play an important role in generic feature extraction tasks such as object detection. CNNs are highly effective and commonly used in computer vision applications. The convolution
neural network contains three types of layers: convolution layers, subsampling layers, and
full connection layers. The whole architecture of the convolutional neural network is
shown in Figure 8. A brief introduction to each type of layer is provided in the following
paragraphs.
As Figure 8 shows, in the convolution layer, the left matrix is the input, which is a
digital image, and the right matrix is a convolution matrix. The convolution layer takes
the convolution of the input image with the convolution matrix and generates the output
image.
Usually, the convolution matrix is called a filter and the output image is called a filter response or feature map, as shown in Figure 9. Each time, a block of pixels is convolved with a filter and generates a pixel in a new image.
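The sliding-filter computation described above can be sketched in a few lines (a generic "valid" convolution with no padding, written for clarity rather than speed; it is not the thesis implementation):

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image; each output pixel is the sum of
    elementwise products between the kernel and the block it covers."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9]], dtype=float)
kernel = np.array([[1, 0],
                   [0, 1]], dtype=float)
# yields a 2x2 filter response: rows [6, 8] and [12, 14]
print(convolve2d_valid(image, kernel))
```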
Figure 10. Example convolution calculation.
The subsampling layer reduces the size of the feature maps and gives the neural network more invariance and robustness. The most used method for the subsampling layer in image processing tasks is max pooling, so the subsampling layer is frequently called the max pooling layer. The max pooling method is shown in Figure 11. The image is divided into blocks and the maximum value of each block is taken as the output. The benefit of the subsampling layer is as follows. First, the subsampling layer has fewer parameters and reduces the computational cost of the network.
Figure 11. Example subsampling layer.
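The block-wise maximum described above can be sketched with a single reshape (a minimal illustration assuming the image size is divisible by the block size):

```python
import numpy as np

def max_pool(image, block=2):
    """Divide the image into block x block tiles and keep the maximum of each tile."""
    h, w = image.shape
    # reshape to (h//block, block, w//block, block), then reduce over each tile
    return image.reshape(h // block, block, w // block, block).max(axis=(1, 3))

image = np.array([[1, 3, 2, 4],
                  [5, 6, 1, 0],
                  [7, 2, 9, 8],
                  [1, 0, 3, 4]])
# a 4x4 image pools down to 2x2: rows [6, 4] and [7, 9]
print(max_pool(image))
```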
Full connection layers are similar to the traditional feed-forward neural layer, see Figure 12. They feed the network activations forward into vectors with a predefined length. We can fit the vector into certain categories for classification or take it as a representation vector for further processing.
Chapter 3 Proposed Methods
3.1 Model Architecture
The model we have designed for the task of autonomous brain segmentation has at its
core the basic design of a conditional generative adversarial network. Conditional GANs extend the original GAN formulation by conditioning the networks on an input. Our model builds on the GAN architecture in a different way. In the GAN framework, a generator and a discriminator are trained in an adversarial manner. The generator is trained to create realistic image masks from a noise
input z, and the discriminator is trained to differentiate between the real image mask x and
those produced by the generator, G(z). Using the feedback from the discriminator, our
generator can improve its ability to produce images so as to trick the discriminator into
classifying them as real in the future. Doing so produces more accurate image masks for
segmentation.
The most obvious change when going from a traditional GAN to our model is that instead
of a noise vector z, the generator is fed an actual image x, which we want to convert into
another structurally similar image mask y. Our generator should now produce G(x), which should be as close as possible to the target mask y.
In addition to the traditional GAN losses, we also apply an L1 loss, which is just a pixel-
wise absolute value loss on the generated image masks. In this situation, we force the generated mask to stay close to the ground truth:
𝐿1 = |𝐺(𝑥) − 𝑦|.
In a traditional GAN we would never apply such a loss because it would prevent the
generator from producing new images. In the case of image translation however, we care
about precise image translations rather than new ones. This need for precise images is
also why we don’t entirely throw away the GAN aspect of our network. An L1 loss by
images that are “on average” correct. By keeping the GAN losses, we encourage the
network to produce crisp images that are visually indistinguishable from the real ones.
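Numerically, the generator objective described above combines an adversarial term with a weighted pixel-wise L1 term. A minimal numpy sketch of how the two terms combine (the weight lam and the example values are illustrative assumptions, not the thesis hyperparameters):

```python
import numpy as np

def l1_loss(generated, target):
    """Pixel-wise mean absolute error between generated and target masks."""
    return np.mean(np.abs(generated - target))

def generator_loss(disc_score_on_fake, generated, target, lam=100.0):
    """Adversarial term (-log D(G(x))) plus a weighted L1 term,
    in the style of pix2pix-type objectives."""
    adv = -np.log(disc_score_on_fake + 1e-8)  # epsilon guards against log(0)
    return adv + lam * l1_loss(generated, target)

gen = np.array([[0.9, 0.1], [0.2, 0.8]])  # hypothetical generated mask
tgt = np.array([[1.0, 0.0], [0.0, 1.0]])  # hypothetical ground-truth mask
print(round(l1_loss(gen, tgt), 3))  # mean of |0.1, 0.1, 0.2, 0.2| = 0.15
```

A perfectly fooled discriminator (score near 1) drives the adversarial term toward zero, leaving only the L1 term to pull the mask toward the ground truth.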
The files are organized as shown in Figure 13 below from the file system.
Table 2 below outlines differences in the file types in the dataset.
Modality      TR (msec)   TE (msec)
T1-Weighted   500         14
T2-Weighted   4000        90
3.1.2 Generator
While trying to ensure accurate images, the third addition to our model is the
implementation of a U-Net architecture in the generator. Put simply, the U-Net is an auto-
encoder in which the outputs from the encoder-half of the network are concatenated with
their mirrored counterparts in the decoder-half of the network. By including some of these
skip connections, we prevent the middle part of the network from becoming a bottleneck.
In the case of our model, our input is the image x that we want to convert and the output
G(x) is the image we want it to become. By concatenating mirrored layers, we are able to
ensure that the structure of the original image is passed over to the decoder-half of the
network directly. When thinking about the task of segmentation, the representations
learned at each scale of the encoder are extremely useful for the decoder in terms of localizing and reconstructing the segmentation.
Taking all the points mentioned above about the design of our generator model into account, we came up with a 68-layer deep neural network implementation, which we summarize below.
Due to the complexity of the model, we have color coded the layers and provided legends on the bottom right of the image for reference. Table 3 below gives implementation details of the layers. Please note that the table is not complete; as a summary of the model, only the first 10 layers and the last layer are shown.
#    Name            Type                  Input Shape          Output Shape
2    Convolution_1   Convolution Layer     (n, 1, 256, 256)     (n, 64, 128, 128)
3    Leaky_Relu_1    Leaky ReLU            (n, 64, 128, 128)    (n, 64, 128, 128)
4    Convolution_2   Convolution Layer     (n, 64, 128, 128)    (n, 128, 64, 64)
5    Batch_Norm_2    Batch Normalization   (n, 128, 64, 64)     (n, 128, 64, 64)
6    Leaky_Relu_2    Leaky ReLU            (n, 128, 64, 64)     (n, 128, 64, 64)
7    Convolution_3   Convolution Layer     (n, 128, 64, 64)     (n, 256, 32, 32)
8    Batch_Norm_3    Batch Normalization   (n, 256, 32, 32)     (n, 256, 32, 32)
9    Leaky_Relu_3    Leaky ReLU            (n, 256, 32, 32)     (n, 256, 32, 32)
10   Convolution_4   Convolution Layer     (n, 256, 32, 32)     (n, 512, 16, 16)
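To make the downsampling pattern above concrete, the encoder blocks can be sketched in Keras as follows. This is a minimal illustrative sketch, not our exact implementation: it assumes stride-2 4×4 convolutions and channels-last tensors (Table 3 lists shapes channels-first as (n, C, H, W)), and the variable names are hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def encoder_block(x, filters, use_batchnorm=True):
    # A stride-2 4x4 convolution halves the spatial resolution at each step.
    x = layers.Conv2D(filters, kernel_size=4, strides=2, padding="same")(x)
    if use_batchnorm:
        x = layers.BatchNormalization()(x)
    return layers.LeakyReLU(0.2)(x)

# Shapes mirror Table 3, in channels-last order.
inputs = layers.Input(shape=(256, 256, 1))
x = encoder_block(inputs, 64, use_batchnorm=False)  # -> (128, 128, 64)
x = encoder_block(x, 128)                           # -> (64, 64, 128)
x = encoder_block(x, 256)                           # -> (32, 32, 256)
x = encoder_block(x, 512)                           # -> (16, 16, 512)
encoder = models.Model(inputs, x)
```

The first block omits batch normalization, matching rows 2-3 of Table 3.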
A defining feature of image-to-image translation problems is that they map a high
resolution input grid to a high resolution output grid. In addition, for the problems we
consider, the input and output differ in surface appearance, but both are renderings of the
same underlying structure. Therefore, structure in the input is roughly aligned with
structure in the output. We design the generator architecture around these considerations.
Many previous solutions [1, 2, 3, 4, 5] to problems in this area have used an encoder-
decoder network [6]. In such a network, the input is passed through a series of layers that
progressively downsample, until a bottleneck layer, at which point the process is reversed.
Such a network requires that all information flow pass through all the layers, including
the bottleneck. For many image translation problems, there is a great deal of low-level
information shared between the input and output, and it would be desirable to shuttle this
information directly across the net. For example, in the case of image colorization, the
input and output share the location of prominent edges.
To give the generator a means to circumvent the bottleneck for information like this, we
add skip connections, following the general shape of a “U-Net” [7]. Specifically, we add
skip connections between each layer 𝑖 and layer 𝑛 − 𝑖, where n is the total number of
layers. Each skip connection simply concatenates all channels at layer i with those at layer
n − i.
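The skip-connection scheme described above can be sketched in Keras as follows. This is a small three-level illustration rather than our full 68-layer generator; the filter counts and activations are assumptions made for the sketch.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_unet(size=256, channels=1):
    """Minimal U-Net sketch: encoder layer i is concatenated with decoder layer n - i."""
    inputs = layers.Input(shape=(size, size, channels))
    # Encoder: progressively downsample.
    e1 = layers.Conv2D(64, 4, strides=2, padding="same", activation="relu")(inputs)  # 128x128
    e2 = layers.Conv2D(128, 4, strides=2, padding="same", activation="relu")(e1)     # 64x64
    b = layers.Conv2D(256, 4, strides=2, padding="same", activation="relu")(e2)      # 32x32 bottleneck
    # Decoder: upsample and concatenate with the mirrored encoder output.
    d2 = layers.Conv2DTranspose(128, 4, strides=2, padding="same", activation="relu")(b)
    d2 = layers.Concatenate()([d2, e2])  # skip connection: layer i with layer n - i
    d1 = layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu")(d2)
    d1 = layers.Concatenate()([d1, e1])
    outputs = layers.Conv2DTranspose(channels, 4, strides=2, padding="same",
                                     activation="tanh")(d1)
    return models.Model(inputs, outputs)
```

The concatenations give the decoder direct access to the encoder's feature maps at each resolution, so low-level structure bypasses the bottleneck.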
3.1.3 Discriminator
Our discriminator is a PatchGAN. The concept behind the PatchGAN is that instead of outputting a single
discriminator score for the whole image, a set of separate scores are produced for every
patch of the image, and then an average of the scores is taken to produce a final score.
A precisely tuned patch size can yield improved image quality. A PatchGAN with a patch
size configured to be the same as the image size is therefore equivalent to a traditional
discriminator architecture. For the sake of simplicity in our work, our model's
discriminator follows this patch-based design.
Table 4 below lists the input shape details of the discriminator part of our proposed model:
It is well known that the L1 and L2 losses produce blurry results on image generation
problems [8]. Although these losses fail to encourage high-frequency crispness, in many
cases they nonetheless accurately capture the low frequencies. For problems where this
is the case, we do not need an entirely new framework to enforce correctness at the low
frequencies; L1 will already do. This motivates restricting the GAN discriminator to only
model high-frequency structure, relying on an L1 term to force low-frequency correctness
(Eqn. 4). In order to model high frequencies, it is sufficient to restrict our attention to the
structure in local image patches. Therefore, we design a discriminator architecture,
which we term a PatchGAN, that only penalizes structure at the scale of patches. This
discriminator tries to classify if each N × N patch in an image is real or fake. We run this
discriminator convolutionally across the image, averaging all responses to provide the
ultimate output of D.
In Section 4.4, we demonstrate that N can be much smaller than the full size of the image
and still produce high quality results. This is advantageous because a smaller PatchGAN
has fewer parameters, runs faster, and can be applied on arbitrarily large images.
Such a discriminator effectively models the image as a Markov random field, assuming
32
independence between pixels separated by more than a patch diameter. This connection
was previously explored in [9], and is also the common assumption in models of texture
[10, 11] and style [12, 13, 14, 15]. Our PatchGAN can therefore be understood as a form
of texture/style loss.
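A patch-level discriminator of this kind can be sketched in Keras as follows. The depth and filter counts are illustrative assumptions rather than our exact configuration; the key point is that the output is a map of per-patch scores whose mean gives the final output of D.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_patchgan(size=256, channels=2):
    """Sketch of a PatchGAN discriminator: it outputs one real/fake score per patch
    rather than a single score for the whole image. The input stacks the source
    image and the (real or generated) segmentation along the channel axis."""
    inp = layers.Input(shape=(size, size, channels))
    x = layers.Conv2D(64, 4, strides=2, padding="same")(inp)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(128, 4, strides=2, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Conv2D(256, 4, strides=2, padding="same")(x)
    x = layers.LeakyReLU(0.2)(x)
    # One logit per patch: a 1-channel score map, not a single scalar.
    patch_scores = layers.Conv2D(1, 4, strides=1, padding="same")(x)
    return models.Model(inp, patch_scores)

# The ultimate output of D is the average over all patch responses:
#   tf.reduce_mean(build_patchgan()(batch))
```

Because each output unit only sees a limited receptive field, the network penalizes structure at the scale of patches, consistent with the Markov-random-field view above.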
3.1.4 DCGAN
GANs are a type of generative model that learns a mapping from a random noise vector
z to an output image y, G: z → y. In contrast, conditional GANs learn a mapping
from an observed image x and a random noise vector z to y, G: {x, z} → y. The generator
G is trained to produce outputs that cannot be distinguished from real images by an
adversarially trained discriminator D, which in turn is trained to do as well as possible
at detecting the generator's fakes.
Table 5 outlines the implementation details of the input and output fields of the whole
deep conditional generative adversarial network.
We adapt our generator and discriminator architectures from those in [16]. Both generator
and discriminator use modules of the form convolution-BatchNorm-ReLU [17]. During
training we monitor two metrics: one for checking if we have reached an equilibrium
point in the training, and the other for checking the performance of our model against
previous studies, as mentioned in the related work.
The objective of a conditional GAN can be expressed as

L_cGAN(G, D) = E_{x,y}[log D(x, y)] + E_{x,z}[log(1 − D(x, G(x, z)))],

where G tries to minimize this objective against an adversarial D that tries to maximize
it, i.e., G* = arg min_G max_D L_cGAN(G, D).
Previous approaches have found it beneficial to mix the GAN objective with a more
traditional loss, such as L2 distance [1]. The discriminator’s job remains unchanged, but
the generator is tasked to not only fool the discriminator but also to be near the ground
truth output in an L2 sense. We also explore this option, using L1 distance rather than L2,
as L1 encourages less blurring: L_L1(G) = E_{x,y,z}[||y − G(x, z)||_1].
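The mixed generator objective, adversarial term plus a weighted L1 term, can be sketched as a loss function. This is a sketch under stated assumptions: the sigmoid cross-entropy formulation and the weight lam on the L1 term are illustrative choices, not the exact values used in our experiments.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(disc_fake_logits, generated, target, lam=100.0):
    # Adversarial term: push the discriminator to label the fakes as real.
    gan_term = bce(tf.ones_like(disc_fake_logits), disc_fake_logits)
    # L1 term: stay near the ground truth; L1 encourages less blurring than L2.
    l1_term = tf.reduce_mean(tf.abs(target - generated))
    return gan_term + lam * l1_term
```

The discriminator's job is unchanged; only the generator's objective gains the extra L1 penalty.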
Without z, the net could still learn a mapping from x to y, but would produce deterministic
outputs, and therefore fail to match any distribution other than a delta function.
Past conditional GANs have acknowledged this and provided Gaussian noise z as an input
to the generator, in addition to x (e.g., [2]). In initial experiments, we did not find this
strategy effective – the generator simply learned to ignore the noise – which is consistent
with Mathieu et al. [18]. Instead, for our final models, we provide noise only in the form
of dropout, applied on several layers of our generator at both training and test time.
Despite the dropout noise, we observe only minor stochasticity in the output of our nets.
Designing conditional GANs that produce highly stochastic output, and thereby capture
the full entropy of the conditional distributions they model, is an important question left
open for future work.
To explain our choice of the loss function, we need to look into two types of loss functions
that are suitable for this kind of problem, namely the L1 and L2 loss.
The L1 loss function minimizes the error as the sum of all the absolute differences
between the true values y_i and the predicted values ŷ_i:

L1 = Σ_i |y_i − ŷ_i|

The L2 loss function minimizes the error as the sum of all the squared differences
between the true values and the predicted values:

L2 = Σ_i (y_i − ŷ_i)²
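The two losses can be computed directly; a minimal NumPy sketch (the example values are illustrative):

```python
import numpy as np

def l1_loss(y_true, y_pred):
    # Sum of absolute differences.
    return np.sum(np.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):
    # Sum of squared differences.
    return np.sum((y_true - y_pred) ** 2)

y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 2.0, 1.0])
```

Because L2 squares each difference, it penalizes large outliers much more heavily than L1, which is one reason L2 tends toward blurry, averaged outputs.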
The Dice coefficient (also known as Dice similarity index) is the same as the F1 score,
but it's not the same as accuracy. The main difference might be the fact that accuracy takes
into account true negatives, while the Dice coefficient and many other measures only
handle true positives, false positives and false negatives. Given two binary
segmentations, we create an algorithm to compute the Dice score between the two images,
where:
o Maximum similarity = 1
o No similarity = 0
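A minimal NumPy sketch of such a Dice computation for two binary masks might look like this (the handling of two empty masks is an assumption for the sketch):

```python
import numpy as np

def dice_score(seg_a, seg_b):
    """Dice coefficient between two binary segmentation masks:
    2*|A ∩ B| / (|A| + |B|), from 0 (no similarity) to 1 (maximum similarity)."""
    a = np.asarray(seg_a).astype(bool)
    b = np.asarray(seg_b).astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```

Note that true negatives (background pixels both masks agree on) never enter the formula, which is the difference from plain accuracy described above.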
We also ask human observers to give some feedback on what their perception is of the
results, so that they can issue a score.
Chapter 4 Experimental Setup and Training
Our experiments were run on a machine which was set up with a Linux Ubuntu
distribution. The main libraries used were TensorFlow and Keras. The machine was
equipped with two NVIDIA GTX 1080 Ti GPUs.
Our dataset was obtained from the Medical Image Computing and Computer Assisted
Intervention (MICCAI) BraTS challenge.
The datasets used in this work have been updated, since 2016, with more routine
clinically-acquired 3T multimodal MRI scans, and all the ground truth labels have been
manually revised by expert neuroradiologists. Pre-operative
scans of glioblastoma (GBM/HGG) and lower grade glioma (LGG), 840 and 300 patients
respectively, with pathologically confirmed diagnosis and available OS, are provided as
the training, validation and testing data. These multimodal scans describe a) native (T1),
b) post-contrast T1-weighted (T1c), c) T2-weighted (T2), and d) T2 Fluid
Attenuated Inversion Recovery (FLAIR) volumes, and were acquired with different
clinical protocols and various scanners from about 19 institutions, mentioned as data
contributors in the MICCAI website. All the imaging datasets have been segmented
manually, by one to four raters, following the same annotation protocol, and their
annotations comprise the GD-enhancing tumor, the peritumoral edema, and the necrotic
and non-enhancing tumor
core, as described in the BraTS reference paper (see Figure 19 above) published in IEEE
Transactions on Medical Imaging. The provided data are distributed after their pre-
processing, i.e. co-registered to the same anatomical template, interpolated to the same
resolution (1 mm³) and skull-stripped.
To test the performance of our model and design, we perform four experiments based on
the nature of our dataset. To understand the setup, see Table 6, Figure 20 below and
Table 1:
Table 6 above summarizes the setup of our training data. We split the available dataset
into 50% for training, 30% for testing and optimization, and finally 20% for validation on
unseen data, so that we can test how well our model performs against new data.
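The 50/30/20 split above can be sketched as follows; splitting at the patient-index level and the fixed seed are illustrative assumptions for the sketch.

```python
import numpy as np

def split_dataset(n_samples, seed=0):
    """Shuffle sample indices and split them 50/30/20 into train/test/validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_train = int(0.5 * n_samples)   # 50% for training
    n_test = int(0.3 * n_samples)    # 30% for testing and optimization
    train = idx[:n_train]
    test = idx[n_train:n_train + n_test]
    val = idx[n_train + n_test:]     # remaining 20% for validation
    return train, test, val
```

Shuffling before splitting is important so that each subset draws from the same distribution of patients.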
The test set and cross validation set have different purposes; if we drop either one, we
lose the benefit it provides.
The cross validation set is used to help detect over-fitting and to assist in hyper-
parameter selection.
We cannot use the cross validation set to measure performance of our model accurately,
because we would deliberately tune our results to get the best possible metric, over maybe
hundreds of variations of our parameters. The cross validation result is therefore likely to
be too optimistic. For the same reason, we cannot drop the cross validation set and use
the test set for selecting hyper-parameters, because then we are almost guaranteed
to be overestimating how good our model is. In the ideal world, we use the test set just
once, at the end, to report an unbiased estimate of performance.
If we cross validate, find the best model, and then add in the test data to train, it
is possible (and in some situations perhaps quite likely) our model would be improved.
However, we have no way to be sure whether that has actually happened, and even if it
has, we do not have any unbiased estimate of what the new performance is.
As mentioned above, we perform four experiments, 1) All image types 2) T1 and T1c
image types, 3) T2 image types, and finally 4) FLAIR image types. We transpose the 3D
slice information coming from the MRI scans into three basic viewpoints: transverse
view, sagittal view and coronal view (see Figure 20 below). We perform our experiments
on these views.
Figure 20. MRI 2D viewpoints, transverse, sagittal and coronal.
To perform optimization on our neural network, we follow the standard approach from
[19]: we alternate between one gradient descent step on the discriminator D, then
one step on the generator G. As suggested in the original GAN paper, rather than training
G to minimize log(1 − D(x, G(x, z))), we instead train it to maximize
log D(x, G(x, z)) [19]. In addition, we divide the objective by 2 while optimizing D,
which should slow down the rate at which D learns with respect to G. We utilize minibatch
stochastic gradient descent (SGD) and apply the Adam solver [20], we set the learning
rate to 0.0002, and the momentum parameters β1 = 0.5, β2 = 0.999. During the
inference period, we run the generator network in precisely the same manner as during
the training phase. This differs from the usual protocol in that dropout is applied
during testing, and we apply batch normalization [17] using the statistics of the test
batch, rather than aggregated statistics of the training batch. This kind of approach to
batch normalization, when the batch size is set to 1, has been given the term “instance
normalization” and has been demonstrated to be very effective in image generation tasks
[21]. In our experiments, we use batch sizes between 1 and 5 depending on the experiment.
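The optimizer settings and the halved discriminator objective described above can be sketched as follows. The sigmoid cross-entropy formulation is an assumption made for illustration; only the Adam hyper-parameters and the division by 2 come from the text.

```python
import tensorflow as tf

# Adam with the stated hyper-parameters: lr = 0.0002, beta1 = 0.5, beta2 = 0.999.
gen_opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)
disc_opt = tf.keras.optimizers.Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def discriminator_loss(real_logits, fake_logits):
    real_term = bce(tf.ones_like(real_logits), real_logits)    # real pairs labelled 1
    fake_term = bce(tf.zeros_like(fake_logits), fake_logits)   # generated pairs labelled 0
    # Divide by 2 to slow the rate at which D learns relative to G.
    return 0.5 * (real_term + fake_term)
```

During training, one gradient step with disc_opt on this loss alternates with one step with gen_opt on the generator objective.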
We also add a data generator to perform image augmentations, such that our inputs and
targets are rotated at angles of 90°, 180° and 270° about the origin. Augmentations
also include flipping the images in a top-to-bottom and right-to-left manner.
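The augmentation scheme above can be sketched in NumPy as follows; the function name and the pairing of image with mask are illustrative.

```python
import numpy as np

def augment(image, mask):
    """Yield rotations by 90/180/270 degrees plus top-to-bottom and right-to-left
    flips, applied identically to the image and its segmentation mask."""
    pairs = []
    for k in (1, 2, 3):  # np.rot90 rotates by k * 90 degrees about the origin
        pairs.append((np.rot90(image, k), np.rot90(mask, k)))
    pairs.append((np.flipud(image), np.flipud(mask)))  # top to bottom
    pairs.append((np.fliplr(image), np.fliplr(mask)))  # right to left
    return pairs
```

Applying the same transform to input and target keeps each augmented pair consistent, which is essential for segmentation training.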
Chapter 5 Results and Discussion
Figure 21 shows the training curves for the four different experiment configurations.
The performance seems to differ based on the image type used in the experiment. We
explore the possible causes of this difference in Dice performance in the discussion below.
Figure 21. FLAIR experiment (a) Training losses (b) Dice Coefficient Training.
As can be seen from the graphs in Figure 21 above, the training, testing and validation
curves follow similar trends.
Figure 22. T2 experiment (a) Training losses (b) Dice Coefficient Training.
The FLAIR image type experiment is the one that gives the highest similarity score (Dice
coefficient) of 84.93%. As can be seen from Figure 23 above, the model correctly
identifies the location of the tumor and outlines its boundaries very closely to what an
expert would have done. We believe the reason the FLAIR image type experiment
outperforms the others is that the tumor pixels are much brighter relative to the rest of
the brain tissue.
Consider Figure 24 below, which outlines the erroneous results from the other
experiments, such as the T1 image types. The tumor may be correctly identified as you
progress through the slices; however, because the tumor is almost as bright as
the rest of the brain, some parts of the brain may be returned as part of the tumor, which
lowers the similarity score.
In other cases, some parts of the brain that show up just as bright as the tumor tend to
make it into the segmentation result. This is because, in this work, we have not built in a
mechanism for the model to detect healthy brain structures and ignore them if they are
not affected.
In some other cases, a complete miss occurs and a tumor is predicted yet there isn’t one,
or it is not detected at all even when it is present. Figure 26 shows the former and latter
respectively.
Figure 26. A Sample Complete Miss.
It can be seen that for case (a), the false prediction of a tumor is due to the whole
brain structure being bright; we have established that our model seems to have learned
that bright structures in the images are possible tumors. For case (b), the whole brain
image appears completely dark, with an absence of any outstanding brighter pixels,
hence no tumor is predicted even though, according to the expert annotation, one is present.
Table 7 below displays the individual average Dice scores for all four experiments.
Table 7. Average Similarity Scores from Experiments.
It can be seen from the results that the FLAIR images have the best similarity score, and
that combining all the image types greatly reduces the performance of the model. The bar
chart below gives a better visual comparison of the experiments and the overall average
score.
Following that, we tried to see what actual human beings think of the results, so
ten participants from our lab were picked and given one of the image results after
sampling from the model. They were asked to give an approximate score of how similar
they think the segmentations produced by the model are to the ones created by the
experts.
Table 8. Human Plausibility Score.
The average of this test is approximately 7% below the average of the Dice score, and the
highest score on this test is approximately 1% below the highest score on the Dice score.
We finally compare our work with the related studies mentioned earlier in this document,
to see how it fares against similar approaches. The results are shown in Table 9 below.
We use our best model result, from the FLAIR image types experiment, for this
comparison. Our model ranks second when compared with these similar studies. It is
worth pointing out that these comparisons are on a whole-tumor basis; on sub-tasks such
as identifying the tumor core, the other studies performed very well, but our work was
focused only on the whole tumor, hence we make the comparison on that basis.
Chapter 6 Conclusions and Future Work
6.1 Conclusions
The proposed model is found to be capable of generating image segmentation
masks given a 3D MRI image. The accuracy, or similarity score, at the current level is
only good enough to identify the position of a tumor; the model cannot yet be used in an
actual medical environment, since very high precision is required when it comes to a
patient's well-being. Using all MRI sequence types to train the model at once greatly
diminishes the similarity score of the generated image masks, hence it is better to tackle
such a problem focusing on one image type at a time. The FLAIR MRI sequence image
type is the one best suited to this task. Training for longer epochs and further using the
Dice coefficient as the loss function would be promising directions for future work.
References
[2] X. Wang and A. Gupta. Generative image modeling using style and structure
adversarial networks. ECCV, 2016.
[3] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer
and super-resolution. ECCV, 2016.
[5] D. Yoo, N. Kim, S. Park, A. S. Paek, and I. S. Kweon. Pixel-level domain transfer.
ECCV, 2016.
[11] L. A. Gatys, A. S. Ecker, and M. Bethge. Texture synthesis and the controlled
generation of natural stimuli using convolutional neural networks. arXiv:1505.07376, 2015.
[12] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with
neural networks. Science, 313(5786):504-507, 2006.
[14] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional
neural networks. CVPR, 2016.
[15] C. Li and M. Wand. Combining Markov random fields and convolutional neural
networks for image synthesis. CVPR, 2016.
[16] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with
deep convolutional generative adversarial networks. arXiv:1511.06434, 2015.
[17] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training
by reducing internal covariate shift. ICML, 2015.
[20] D. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.
[22] S. Bakas, et al. Advancing The Cancer Genome Atlas glioma MRI collections
with expert segmentation labels and radiomic features. Nature Scientific Data,
4:170117, 2017.
[23] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B.
Freymann, et al. Segmentation labels and radiomic features for the pre-operative
scans of the TCGA-GBM collection. The Cancer Imaging Archive, 2017.
Multimodal MRI Using Random Forest with Superpixel and Tensor Based
Feature Extraction. In: Crimi A., Bakas S., Kuijf H., Menze B., Reyes M. (eds)
BrainLes 2017. Lecture Notes in Computer Science, vol 10670. Springer, Cham
[27] Guotai Wang, Wenqi Li, Sebastien Ourselin. Automatic brain tumor segmentation
using cascaded anisotropic convolutional neural networks, 2017.
[28] Automatic segmentation of brain tumor from MR images using SegNet: selection