
International Graduate Program, College of Electrical Engineering and Computer Science

Master's Thesis

Automatic Brain Tumor Segmentation with a


3-Dimensional Generative Adversarial
Neural Network.

Student: Mpendulo Mamba

Advisor: Prof. Yo-Ping Huang

January 2018 (Republic of China year 107)
ABSTRACT
Title: Automatic Brain Tumor Segmentation with a 3-Dimensional Generative

Adversarial Neural Network.

Pages: 54

School: National Taipei University of Technology (NTUT)

Department: International Master of Science in Electrical Engineering and Computer

Science (IMEECS)

Time: July 2018

Degree: Master

Author: Mpendulo Mamba

Advisor: Prof. Yo-Ping Huang

Keywords: Deep learning, tumor, magnetic resonance imaging (MRI), generative

adversarial network (GAN), loss function, high grade glioma (HGG), low grade

glioma (LGG)

Brain tumor segmentation is a crucial task in medical image processing. Early

diagnosis of brain tumors plays an important role in improving treatment possibilities and

increases the survival rate of patients. Manual segmentation of brain tumors for cancer diagnosis, from the large amounts of magnetic resonance imaging (MRI) data generated in clinical routine, is a difficult and time-consuming task. There is a need for automatic brain

image segmentation. In this work, we demonstrate a deep neural network for volumetric

segmentation that learns from a series of annotated volumetric images given in the

Neuroimaging Informatics Technology Initiative (NIfTI) format. Recently, automatic

segmentation using deep learning methods proved effective since these methods achieve

state-of-the-art results and can address the problem better than other methods. Deep

learning methods can also enable efficient processing and objective evaluation of the large

amounts of MRI-based images. We investigate 3D conditional adversarial networks as a

novel solution to 3D image segmentation for medical segmentation problems. These

networks not only learn the mapping from input images to output images, but also learn

a loss function to train the mapping between them. This makes it possible to apply the

same generic approach to problems that traditionally would require very different loss

formulations. We show that this method is effective at generating slices of segmentation

data from 3D labelled maps. We utilize a dataset from the Medical Image Computing and Computer Assisted Intervention (MICCAI) society, which consists of MRI scans of high-grade gliomas (HGG), tumors of the central nervous system, and low-grade gliomas (LGG), which are slow-growing tumors. The proposed model is able to

discriminate between well segmented and poorly segmented images and the

generative model can create segmentation image masks around the tumors and

achieves an 84.93% Dice score when compared with the dataset annotations.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to all those who supported and helped

me while I was writing this thesis. Special thanks to my advisor, Professor Yo-Ping Huang, for his guidance and support. Professor Huang has been an incredible mentor through his invaluable academic and professional supervision. I deeply

appreciate him for the useful suggestions, comments, remarks, genuine concern for

my well-being and support through the learning process of this thesis and my overall

stay in the school.

For the financial support of my studies at Taipei Tech, I would like to thank the government of the Kingdom of Eswatini, the Ministry of Education, and the Royal Science and Technology Park (RSTP) in Eswatini for providing the master's degree scholarship in Taiwan.

My sincere appreciation to all my colleagues and lab mates for the support,

guidance, inspiration, freedom and a wonderful working environment, Nontobeko,

Basanta, Howard, James, Yu an, Peter, Danny, Eric, Justin, Lungile, Mthunzi,

Trizaurah, Chun Ming, Sakhile, Mluleki, Mduduzi, Lucky, Likhanyiso, Muzi-wandile,

all Eswatini friends, all Taiwanese friends, and foreign friends. Not only is it an honor to work in such a wonderful place, but it is also a privilege for me to live here amongst you all.

Finally, and most of all, I would love to thank my incredible parents, my family, my best friend Lwazi, and my girlfriend Gugu, who have always patiently supported me morally, emotionally, and financially.

Siyabonga Kakhulu!

Contents
ABSTRACT .................................................................................................................. i

ACKNOWLEDGMENTS ........................................................................................... iii

List of Figures ........................................................................................................... vi

List of Tables ............................................................................................................ vii


Chapter 1 Introduction...................................................................................................1

1.1 Background .....................................................................................................1

1.2 Related Work ...................................................................................................5

1.3 Research Objective ..........................................................................................7

1.4 Limitations.......................................................................................................8

1.5 Thesis Development.........................................................................................8

Chapter 2 Literature Review ................................................................................... 10

2.1 Glioma ........................................................................................................... 10

2.2 Magnetic Resonance Imaging (MRI).............................................................. 14

2.3 Deep Learning ............................................................................................... 19

2.3.1 Deep Learning Methods ..............................................................19

2.3.2 Deep Learning Algorithms ........................................................20

2.3.3 Convolution Layer ....................................................................................21

2.3.4 Subsampling Layer ................................................................................... 23

2.3.5 Fully Connected Layer.............................................................................. 24

Chapter 3 Proposed Methods ....................................................................................... 25

3.1 Model Architecture ........................................................................................ 25

3.1.1 Pre-Processing Layer ................................................................................ 26

3.1.2 Generator .................................................................................................. 28

3.1.3 Discriminator ............................................................................................31

3.1.4 DCGAN ..................................................................................... 33

3.2 Evaluation Methods .......................................................................................34

3.2.1 Objective Function .................................................................................... 34

3.2.2 Loss Function ............................................................................................35

3.2.3 Dice Coefficient .........................................................................................36

3.2.4 Perceptual Evaluation ............................................................................... 36

Chapter 4 Experimental Setup and Training ................................................................. 38

4.1 Experimental Setup ........................................................................................ 38

Chapter 5 Results and Discussion ................................................................................ 42

Chapter 6 Conclusions and Future Work ......................................................................51

6.1 Conclusions ................................................................................................... 51

6.2 Future Work ................................................................................................... 51

References................................................................................................................... 52

List of Figures
Figure 1. Example of an HGG Tumor ............................................................................2

Figure 2. Example of an LGG Tumor ............................................................................3

Figure 3. Low-grade brain glioma in a 28-year-old male.............................................. 11

Figure 4. Patient being positioned for MR study of the head and abdomen .................. 15

Figure 5. Schematic of construction of a cylindrical superconducting MR scanner. ...... 16

Figure 6. Examples of T1 weighted, T2 weighted and PD weighted MRI scans ........... 18

Figure 7. Restricted Boltzmann Machines....................................................................20

Figure 8. Convolutional Neural Network Architecture .................................................21

Figure 9. Image Representation and Convolutional Matrix .......................................... 22

Figure 10. Example Convolution Calculation ..............................................................23

Figure 11. Example subsampling layer ................................................................. 24

Figure 12. Proposed GAN Architecture ................................................................ 25

Figure 13. Dataset file naming structure ....................................................................26

Figure 14. MRI 2D slices after preprocessing from 3D file ..........................................27

Figure 15. Auto-encoder and U-net Architecture .......................................................... 28

Figure 16. 68 Layer Deep Neural Network Generator .................................................. 29

Figure 17. 6 Layer Deep Neural Network Discriminator ............................................ 31

Figure 18. Deep Conditional GAN (DCGAN) .............................................................33

Figure 19. Different sections of tumor annotated in colors ...........................................37

Figure 20. MRI 2D viewpoints, transverse, sagittal and coronal ................................... 39

Figure 21. FLAIR experiment (a) Training losses (b) Dice Coefficient Training ........ 41

Figure 22. T2 experiment (a) Training losses (b) Dice Coefficient Training. ................ 42

Figure 23. A Sample Result From the FLAIR Experiment ........................................... 43

Figure 24. A Sample Erroneous Result from the T1 Experiment ..................................44

Figure 25. A Sample Brain Structure Mistaken for Tumor............................................ 45

Figure 26. A Sample Complete Miss ............................................................................ 46

Figure 27. Bar Chart Comparison of all Experiments ................................................... 47

List of Tables
Table 1. Standard Display of MRI Images ................................................................... 18

Table 2. Summary of MRI sequences in the dataset ..................................................... 27

Table 3. Generator - Layer Implementation Details ...................................................... 30

Table 4. Discriminator Layer Implementation Details .................................................. 32

Table 5. DCGAN Layer Implementation Details..........................................................34

Table 6. Dataset Training, Testing and Validation Ratio ............................................... 38

Table 7. Average Similarity Scores from Experiments ................................................. 47

Table 8. Human Plausibility Score ................................................................48

Table 9. Comparison with Related Studies ................................................................... 48

Chapter 1 Introduction

1.1 Background
Over the last few decades, the rapid development of noninvasive brain imaging

technologies has opened new horizons in analyzing and studying the brain anatomy and

function. Enormous progress in assessing brain injury and exploring brain anatomy has been made using magnetic resonance imaging (MRI). Advances in brain MR imaging have also provided large amounts of data with an increasingly high level of quality. The analysis of these large and complex MRI datasets has become a tedious and complex task for clinicians, who have to manually extract important information. This manual analysis is often time-consuming and prone to errors due to inter- and intra-operator variability. These difficulties in brain MRI data analysis have driven the development of computerized methods to improve disease diagnosis and assessment. Nowadays,

computerized methods for MR image segmentation, registration, and visualization have

been extensively used to assist doctors in qualitative diagnosis. Brain MRI segmentation

is an essential task in many clinical applications because it influences the outcome of the

entire analysis. This is because different processing steps rely on accurate segmentation

of anatomical regions. For example, MRI segmentation is commonly used for measuring

and visualizing different brain structures, for delineating lesions, for analyzing brain

development, and for image-guided interventions and surgical planning. This diversity of

image processing applications has led to development of various segmentation techniques

of different accuracy and degree of complexity. These techniques can be used to perform

automatic segmentation on scans of patients with tumor-related conditions such as dementia. For the purposes of this study, brain tumors are classified into two types: high-grade gliomas (HGG) and low-grade gliomas (LGG). Highly malignant or high-grade gliomas (HGG) are tumors of the central nervous system (CNS). They are solid tumors

arising from transformed cells of the brain and/or the spinal cord. Since they directly

originate from the CNS, they are also called primary CNS tumors, thereby differentiating

them from malignant tumors of other organs that have spread (metastasized) to the CNS.

HGG in children and adolescents are rare. However, they show considerably

malignant behavior since they usually grow fast and frequently destroy healthy brain

tissue. Because they can migrate several centimeters within the CNS, HGG can induce

the development of new tumors. Without the appropriate therapy, HGG can be lethal

within only a few months. Due to the usually rapid and infiltrating growth of these tumors,

treatment is difficult. High-grade gliomas account for approximately 15 to 20 % of CNS

tumors in children and adolescents. They appear in all age groups; yet, children aged

younger than three years are rarely affected. Each year, about 60 to 80 children and

adolescents younger than 15 years of age are newly diagnosed with a high-grade glioma.

This corresponds to an incidence rate of 5 to 10 new diagnoses per 1,000,000 children per

year. Boys and girls are almost equally affected.

Figure 1. Example of an HGG tumor.

Low-grade gliomas are tumors of the CNS as well. They are solid tumors arising from

malignantly transformed cells of the brain or spinal cord. Since they develop directly from

CNS cells, they are also called primary CNS tumors in order to distinguish them from

cancers of other body parts that have spread to the CNS (metastasis).

Low-grade gliomas can be found in all parts of the nervous system, most of them, however,

are situated in the cerebellum and the central regions of the cerebrum, such as the optic

pathway (optic pathway gliomas) and the hypothalamic-pituitary axis. They usually grow

very slowly. Nevertheless, for a growing lesion, the space in the bony skull is limited. As

a consequence, vital areas of the brain may be damaged by the space occupying, growing

tumor. Therefore, low-grade gliomas can become life threatening in the course of the

disease. Accounting for about 30 to 40% of cases, low-grade gliomas are the most common CNS

tumors among children and adolescents. They occur at all ages with a mean age at

diagnosis between five and seven years. In Germany, about 250 children and adolescents

under 18 years of age are newly diagnosed with low-grade glioma each year. This

corresponds to an incidence rate of 2 to 3 per 100,000 children. There is a slight male

preponderance (gender ratio: 1.2 to 1).

Figure 2. Example of an LGG tumor.

Compared to traditional segmentation methods, deep learning does not rely on the

generation of handcrafted features to distinguish tumor from normal brain anatomy.

Instead, raw image intensities are taken as input and passed through many layers of convolutions

to calculate an output signal. The many degrees of freedom and inclusion of non-

linearities allow the algorithm to learn complex patterns with a high level of abstraction.

Up until this point many of the deep learning algorithms that have been applied to brain

tumor segmentation have been 2D Convolutional Neural Networks (CNNs), which do not

take advantage of the full breadth of volumetric information. Recently, there has been an

increase in popularity of 3D CNNs [22, 23], which have been shown to be effective for

this task, though at the expense of additional computational complexity. For example, the

3D U-Net architecture was successfully applied for the segmentation of the Xenopus

kidney, a complex and highly variable structure [23].

The MICCAI 2018 BraTS Challenge asks participants to develop a fully automatic or semi-automatic multi-modality tumor segmentation tool for enhancing tumor, non-enhancing tumor, and edema in glioblastoma patients [24]. The 2018 BraTS Challenge patient

cohort includes glioblastoma and low-grade glioma pre-operative patients. Participants

are provided with co-registered and skull-stripped T2, pre-contrast T1, post-contrast T1,

and FLAIR images, and are then asked to generate segmentations that can then be

compared against ground-truth segmentations of edema, enhancing tumor, and non-

enhancing tumor. Ground-truth segmentations are manually drawn by one to four raters

and then approved by expert neuro-radiologists. Our proposed segmentation method for

BraTS 2018 involves the training of several 3D generative adversarial networks (GANs).

The result is a fully automatic segmentation model requiring no additional data aside from

four coregistered input modalities and a practical computation time for batch processing.

1.2 Related Work
In [25], a fully convolutional neural network (FCNN) was used for the segmentation of gliomas on magnetic resonance images (MRI). The authors achieved fully automatic voxel-based classification by training a 23-layer deep FCNN on 2-D slices obtained from patient volumes.

Their model was trained on slices obtained from 130 patients and was validated on 50

patients. The false positives in the segmentation maps generated by their FCNN were removed by connected-component analysis. On the MICCAI BraTS 2017 validation dataset, this model achieved average whole tumor, tumor core and active tumor Dice scores of 0.83, 0.69 and 0.69, respectively.

In another work [ref], the authors present a solution for brain tumor segmentation on the MICCAI BraTS 2017 dataset. They used three convolutional neural networks with the same 3D U-Net architecture, one trained for each of the tumor segmentation targets (whole tumor, tumor core and enhancing tumor), with 3D patches as inputs. The preprocessing step was performed separately for each case, which allowed histogram equalization for the whole-tumor target and voxel normalization on all modalities. This solution led to Dice coefficients of 0.91, 0.9118 and 0.827 after testing with 30% of the training set, and 0.8844, 0.7674 and 0.7261 on the leaderboard validation set (respectively for each segmentation target) provided by MICCAI for validation.

Another study focused on the MICCAI 2017 BraTS competition, which challenges participants to develop a fully automatic or semi-automatic multi-modality tumor segmentation tool for enhancing tumor, non-enhancing tumor, and edema in glioblastoma patients. The team in [ref] entered the competition with a fully automatic pipeline that involves chaining together several unique 3D U-Nets, a type of 3D patch-based convolutional neural network. Their pipeline takes advantage of the prior
knowledge from previous studies that enhancing and nonenhancing tumors are likely to

be found within regions of edema and within proximity to each other by feeding the

prediction outputs of earlier networks into later networks. They achieved greater context

for this patch-based sampling method by predicting downsampled labels and then

upsampling them using a separate 3D U-Net. They used a fine-tuning network and a

candidate evaluation network to account for tissue border discrepancies and catastrophic

segmentation failure. Early results for an unoptimized version of this pipeline on

validation data with unknown ground truth segmentations had average dice coefficients

of 0.78, 0.67, and 0.68 for whole tumor, enhancing, and non-enhancing tissue respectively.

Another study noted that identification and localization of brain tumor tissues play an important role in the diagnosis and treatment planning of gliomas [26]; hence, a fully automated superpixel-wise tumor tissue segmentation algorithm using random forests was proposed and implemented. They extracted features for the random

forest classifier by constructing a tensor from multi-parametric MRI data and applying

multi-linear singular value decomposition. The method was trained and tested on high

grade glioma (HGG) patients from the BRATS 2017 training dataset. It achieved a

performance of 83%, 76% and 78% Dice scores for whole tumor, enhancing tumor and

tumor core, respectively.

In [27], a cascade of fully convolutional neural networks was proposed to segment multi-modality MR images of brain tumors into background and three subregions: enhancing tumor core, whole tumor and tumor core. The cascade was designed to decompose the multi-class segmentation into a sequence of three binary segmentations according to the subregion hierarchy. The segmentation of the first (second) step is used as a binary mask for the second (third) step. Each network consists of multiple layers of

anisotropic and dilated convolution filters that were obtained by training each network

end-to-end. Residual connections and multi-scale predictions were employed in these

networks to boost the segmentation performance. Experiments on the BraTS 2017 online validation set yielded average Dice scores of 0.764, 0.897 and 0.825 for enhancing tumor

core, whole tumor and tumor core respectively.

A SegNet is a deep encoder-decoder architecture for multi-class pixelwise

segmentation researched and developed by members of the Computer Vision and

Robotics Group at the University of Cambridge, UK. In [28], the authors developed automatic segmentation of brain tumors using a SegNet, a method based on a two-dimensional convolutional neural network, and used the HGG datasets (n = 210) of BraTS 2017 for network training. They compared training schemes including or

excluding slices without labeled tumor regions. The input images were FLAIR images.

From the results, the Dice similarity coefficients (Dice = 0.74, 2-fold cross-validation) obtained with training datasets excluding slices without labeled tumor regions were significantly higher than those obtained with all slices (P < 0.05, paired t-test). In the

preliminary results, they were able to perform fully automated segmentations of whole

tumor region using SegNet.

1.3 Research Objective

Many studies have been discussed in the previous section regarding brain tumor

segmentation methods with their performance scores. With the aim of improving those

performance scores, in this research we propose the use of generative adversarial networks (GANs), a variant architecture of deep neural networks that improves performance through adversarial competition between two networks. Here, we use our own GAN architecture derived from the original GAN paper [19]. Due to the high precision required in medical procedures, we chose a U-Net architecture for the generative half of the model, which is known to eliminate the bottlenecks that cause loss of information in basic auto-encoders [7]. Moreover, we apply a Markovian patch discriminator for the second half of our GAN, which allows a whole image slice to be fed as a single patch during classification [9]. The purpose of this study is to develop an AI model

for MRI brain tumor segmentation, which can be used by experts in the medical field to

speed up and verify their processes of identifying and segmenting tumors and possibly

diagnose them in their early stages. Also, due to the nature of GANs we expect to obtain

a trained model that can not only generate segmentations but can identify incorrectly

segmented images.

1.4 Limitations
There are several limitations in this research, as follows:

1. This research focuses on brain tumor segmentation based on MRI scans only.

2. The MRI scans are transposed to one viewpoint, the transverse view only.

3. The research dataset is annotated such that enhancing tumor core, whole tumor and tumor core can all be trained upon; we focus on the whole tumor only.

4. This research focuses on HGG and LGG tumor types only.

5. This research applies only deep learning techniques to address the problem.

1.5 Thesis Development


The thesis will lead the reader all the way through the stages of the research

development. The outline of this thesis is organized as follows. Chapter 2 presents the literature review and related technology on the subject of brain tumor segmentation. Chapter 3 describes the proposed methodology of this research. The experimental setup and training of the proposed model are discussed in Chapter 4. Chapter 5 covers our results and discussion. Finally, Chapter 6 presents the conclusions and suggestions for future improvements.

Chapter 2 Literature Review

2.1 Glioma
A glioma is a type of tumor that starts in the glial cells of the brain or the spine.

Gliomas comprise about 30 per cent of all brain tumors and central nervous system tumors,

and 80 per cent of all malignant brain tumors.

2.1.1 Classification

Gliomas are classified by cell type, by grade, and by location.

By type of cell: Gliomas are named according to the specific type of cell with which they

share histological features, but not necessarily from which they originate. The main types

of gliomas are:

• Ependymomas: ependymal cells

• Astrocytoma: astrocytes (glioblastoma multiforme is a malignant astrocytoma and the most common primary brain tumor among adults)

• Oligodendrogliomas: oligodendrocytes

• Brainstem glioma: develops in the brain stem

• Optic nerve glioma: develops in or around the optic nerve

• Mixed gliomas, such as oligoastrocytomas, contain cells from different types of glia

By grade: Gliomas are further categorized according to their grade, which is determined

by pathologic evaluation of the tumor. The neuropathological evaluation and diagnostics

of brain tumor specimens is performed according to WHO Classification of Tumors of

the Central Nervous System.

Figure 3. Low-grade brain glioma in a 28-year-old male.

• Low-grade gliomas [WHO grade II] are well-differentiated (not anaplastic); these tend to exhibit benign tendencies and portend a better prognosis for the patient. However, they have a uniform rate of recurrence and increase in grade over time, so should be classified as malignant.

• High-grade [WHO grade III–IV] gliomas are undifferentiated or anaplastic; these are malignant and carry a worse prognosis.

Of numerous grading systems in use, the most common is the World Health

Organization (WHO) grading system for astrocytoma, under which tumors are graded

from I (least advanced disease—best prognosis) to IV (most advanced disease—worst

prognosis).

By location: Gliomas can be classified according to whether they are above or below a

membrane in the brain called the tentorium. The tentorium separates

the cerebrum (above) from the cerebellum (below).

• Supratentorial tumors are above the tentorium, in the cerebrum, and are mostly found in adults (70%).

• Infratentorial tumors are below the tentorium, in the cerebellum, and are mostly found in children (70%).

• Pontine tumors are located in the pons of the brainstem. The brainstem has three parts (pons, midbrain, and medulla); the pons controls critical functions such as breathing, making surgery on these tumors extremely dangerous.

2.1.2 Pathophysiology

High-grade gliomas are highly vascular tumors and have a tendency to infiltrate.

They have extensive areas of necrosis and hypoxia. Often, tumor growth causes a

breakdown of the blood–brain barrier in the vicinity of the tumor. As a rule, high-grade

gliomas almost always grow back even after complete surgical excision, so are commonly

called recurrent cancer of the brain. Conversely, low-grade gliomas grow slowly, often

over many years, and can be followed without treatment unless they grow and cause

symptoms. Several acquired (not inherited) genetic mutations have been found in

gliomas. Tumor suppressor protein 53 (p53) is mutated early in the disease; p53 is the

"guardian of the genome", which, during DNA and cell duplication, makes sure the DNA

is copied correctly and destroys the cell (apoptosis) if the DNA is mutated and cannot be

fixed. When p53 itself is mutated, other mutations can survive. Phosphatase and tensin

homolog (PTEN), another tumor suppressor gene, is itself lost or mutated. Epidermal

growth factor receptor, a growth factor that normally stimulates cells to divide, is

amplified and stimulates cells to divide too much. Together, these mutations lead to cells

dividing uncontrollably, a hallmark of cancer. Recently, mutations

in IDH1 and IDH2 were found to be part of the mechanism and associated with a more

favorable prognosis.

2.1.3 Prognosis

Gliomas are rarely curable. The prognosis for patients with high-grade gliomas is

generally poor, and is especially so for older patients. Of 10,000 Americans diagnosed

each year with malignant gliomas, about half are alive one year after diagnosis, and 25%

after two years. Those with anaplastic astrocytoma survive about three years.

Glioblastoma multiforme has a worse prognosis with less than a 12-month average

survival after diagnosis, though this has extended to 14 months with more recent

treatments.

Low grade: For low-grade tumors, the prognosis is somewhat more optimistic. Patients

diagnosed with a low-grade glioma are 17 times as likely to die as matched patients in the

general population. The age-standardized 10-year relative survival rate was 47%. One

study reported that low-grade oligodendroglioma patients have a median survival of 11.6

years; another reported a median survival of 16.7 years.

High grade: This group comprises anaplastic astrocytomas and glioblastoma multiforme.

Whereas the median overall survival of anaplastic (WHO grade III) gliomas is

approximately 3 years, glioblastoma multiforme has a poor median overall survival of c.

15 months.

Diffuse intrinsic pontine glioma: Diffuse intrinsic pontine glioma primarily affects

children, usually between the ages of 5 and 7. The median survival time with DIPG is

under 12 months. Surgery to attempt tumor removal is usually not possible or advisable for DIPG. By their very nature, these tumors invade diffusely throughout the brain stem,

growing between normal nerve cells. Aggressive surgery would cause severe damage to

neural structures vital for arm and leg movement, eye movement, swallowing, breathing,

and even consciousness. Trials of drug candidates have been unsuccessful. The disease is

primarily treated with radiation therapy alone.

IDH1 and IDH2-mutated glioma: Patients with glioma carrying mutations in

either IDH1 or IDH2 have a relatively favorable survival, compared with patients with

glioma with wild-type IDH1/2 genes. In WHO grade III glioma, IDH1/2-mutated glioma

have a median prognosis of approximately 3.5 years, whereas IDH1/2 wild-type glioma

perform poorly, with a median overall survival of c. 1.5 years. In glioblastoma, the

difference is larger. There, IDH1/2 wild-type glioblastoma have a median overall survival

of 1 year, whereas IDH1/2-mutated glioblastoma have a median overall survival of more

than 3 years.

2.2 Magnetic Resonance Imaging (MRI)


Magnetic resonance imaging (MRI) is a safe and painless test that uses a magnetic

field and radio waves to produce detailed pictures of the body's organs and structures. An

MRI differs from a CAT scan (also called a CT scan or a computed axial tomography

scan) because it doesn't use radiation. An MRI scanner consists of a large doughnut-

shaped magnet that often has a tunnel in the center. Patients are placed on a table that

slides into the tunnel. Some centers have open MRI scanners that have larger openings

and are helpful for patients with claustrophobia. MRI scanners are located in hospitals

and radiology centers. During the examination, radio waves manipulate the magnetic

position of the atoms of the body, which are picked up by a powerful antenna and sent to

a computer. The computer performs millions of calculations, resulting in clear, cross-

sectional black-and-white images of the body. These images can be converted into three-

dimensional (3-D) pictures of the scanned area. These images help to pinpoint problems

in the body.

2.2.1 Why it is done

MRI is used to detect a variety of conditions, including problems of the brain, spinal cord,

skeleton, chest, lungs, abdomen, pelvis, wrists, hands, ankles, and feet. In some cases, it
can provide clear images of body parts that can't be seen as well with an X-ray, CAT scan,

or ultrasound. MRI is particularly valuable for diagnosing problems with the eyes, ears,

heart, and circulatory system.

An MRI's ability to highlight contrasts in soft tissue makes it useful in deciphering

problems with joints, cartilage, ligaments, and tendons. MRI can also be used to identify

infections and inflammatory conditions or to rule out problems such as tumors.

Figure 4. Patient being positioned for MR study of the head and abdomen

2.2.2 Construction and Physics

To perform a study, the person is positioned within an MRI scanner that forms a strong

magnetic field around the area to be imaged. In most medical applications, protons

(hydrogen atoms) in tissues containing water molecules create a signal that is processed

to form an image of the body. First, energy from an oscillating magnetic field is temporarily applied to the patient at the appropriate resonance frequency. The excited

hydrogen atoms emit a radio frequency signal, which is measured by a receiving coil.

The radio signal may be made to encode position information by varying the main

magnetic field using gradient coils. As these coils are rapidly switched on and off they

create the characteristic repetitive noise of an MRI scan. The contrast between different

tissues is determined by the rate at which excited atoms return to the equilibrium state.

Exogenous contrast agents may be given to the person to make the image clearer. The

major components of an MRI scanner are the main magnet, which polarizes the sample,

the shim coils for correcting shifts in the homogeneity of the main magnetic field, the

gradient system which is used to localize the MR signal and the RF system, which

excites the sample and detects the resulting NMR signal. The whole system is controlled

by one or more computers.

Figure 5. Schematic of construction of a cylindrical superconducting MR scanner.

MRI requires a magnetic field that is both strong and uniform. The field strength of the

magnet is measured in teslas – and while the majority of systems operate at 1.5 T,

commercial systems are available between 0.2 and 7 T. Most clinical magnets are

superconducting magnets, which require liquid helium. Lower field strengths can be

achieved with permanent magnets, which are often used in "open" MRI scanners for

claustrophobic patients. Recently, MRI has been demonstrated also at ultra-low fields,

i.e., in the microtesla-to-millitesla range, where sufficient signal quality is made

possible by prepolarization (on the order of 10-100 mT) and by measuring the Larmor

precession fields at about 100 microtesla with highly sensitive superconducting

quantum interference devices (SQUIDs).

2.2.3 T1 and T2

Each tissue returns to its equilibrium state after excitation by the independent relaxation

processes of T1 (spin-lattice; that is, magnetization in the same direction as the static

magnetic field) and T2 (spin-spin; transverse to the static magnetic field). To create a T1-

weighted image, magnetization is allowed to recover before measuring the MR signal by

changing the repetition time (TR). This image weighting is useful for assessing the

cerebral cortex, identifying fatty tissue, characterizing focal liver lesions and in general

for obtaining morphological information, as well as for post-contrast imaging. To create

a T2-weighted image, magnetization is allowed to decay before measuring the MR signal

by changing the echo time (TE). This image weighting is useful for detecting edema and

inflammation, revealing white matter lesions and assessing zonal anatomy in the prostate

and uterus.

Figure 6. Examples of T1 weighted, T2 weighted and PD weighted MRI scans.

The standard display of MRI images is to represent fluid characteristics in black and white

images, where different tissues appear as follows:

Table 1. Standard Display of MRI images.

Signal | T1-weighted | T2-weighted

High | Fat; subacute hemorrhage; melanin; protein-rich fluid; slowly flowing blood; paramagnetic substances, such as gadolinium; cortical pseudolaminar necrosis | More water content, as in edema and tumor; extracellularly located methemoglobin in subacute hemorrhage

Intermediate | Gray matter darker than white matter | White matter darker than gray matter

Low | Bone; urine; CSF; air; more water content, as in edema and tumor; low proton density, as in calcification | Bone; air; fat; low proton density, as in calcification and fibrosis; paramagnetic material, such as deoxyhemoglobin; protein-rich fluid

2.3 Deep Learning
Deep learning is a subfield of machine learning, which aims to learn a hierarchy of

features from input data. Nowadays, researchers have intensively investigated deep

learning algorithms for solving challenging problems in many areas such

as image classification, speech recognition, signal processing, and natural language

processing.

2.3.1 Deep Learning Methods

Deep learning methods are a group of machine learning methods that can

learn features hierarchically from lower level to higher level by building

a deep architecture. Deep learning methods have the ability to automatically learn features at multiple levels, which enables the system to learn complex mapping functions directly from data, without the help of human-crafted features. The most characterizing feature of deep learning methods is that their models all have deep architectures. A deep architecture has multiple hidden layers in the network. In contrast, a shallow architecture has only a few hidden layers (1 to 2 layers). Deep convolutional neural networks have been successfully applied in various areas, including regression, classification, dimensionality reduction, motion modeling, texture modeling, information retrieval, natural language processing, robotics, fault diagnosis, and road crack detection.

2.3.2 Deep Learning Algorithms

Deep learning algorithms have been extensively studied in recent years. As a

consequence, there are a large number of related approaches. Generally speaking, these

algorithms can be grouped into two categories based on their architectures:

1. Restricted Boltzmann machines (RBMs).

2. Convolutional neural networks (CNNs).

Restricted Boltzmann machines (RBMs): RBM is an energy-based probabilistic

generative model. It is composed of one layer of visible units and one layer of hidden

units. The visible units represent the input vector of a data sample and the hidden units

represent features that are abstracted from the visible units. Every visible unit is connected

to every hidden unit, whereas no connection exists within the visible layer or hidden layer.

Figure 7 illustrates the graphical model of restricted Boltzmann machine.

Figure 7. Restricted Boltzmann Machines.

Convolutional neural networks (CNNs): Over the last several years, the quality of image classification and object detection has been dramatically improved by deep learning methods. Convolutional neural networks (CNNs) brought a revolution to the computer vision area. They have not only continuously advanced image classification accuracy, but also play an important role in generic feature extraction tasks such as scene classification, object detection, semantic segmentation, image retrieval, and image captioning. CNNs are one of the most powerful classes of deep neural networks for image processing tasks, and they are highly effective and commonly used in computer vision applications. A convolutional neural network contains three types of layers: convolution layers, subsampling layers, and fully connected layers. The whole architecture of the convolutional neural network is

shown in Figure 8. A brief introduction to each type of layer is provided in the following

paragraphs.

Figure 8. Convolutional Neural Network Architecture.

2.3.3 Convolution Layer

As Figure 9 shows, in the convolution layer, the left matrix is the input, which is a digital image, and the right matrix is a convolution matrix. The convolution layer takes
the convolution of the input image with the convolution matrix and generates the output

image.

Figure 9. Digital Image Representation and Convolutional Matrix.

Usually, the convolution matrix is called a filter and the output image is called the filter response or filter map. An example of the convolution calculation is demonstrated in Figure 10. Each time, a block of pixels is convolved with the filter to generate one pixel in the new image.

Figure 10. Example convolution calculation.
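As a rough numerical companion to the calculation in Figure 10, the short NumPy sketch below slides a 3×3 filter over a small image with stride 1 and no padding; the image and filter values are made up for illustration and are not taken from the thesis.

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid 2-D convolution with stride 1 (no padding, no kernel flipping),
    matching the sliding-window calculation described above."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # element-wise product of the current block and the filter, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1, 2, 0, 1],
                  [0, 1, 3, 2],
                  [2, 1, 0, 1],
                  [1, 0, 2, 3]], dtype=float)
edge_filter = np.array([[1, 0, -1],
                        [1, 0, -1],
                        [1, 0, -1]], dtype=float)
print(convolve2d(image, edge_filter))   # 2x2 filter response (output image)
```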

2.3.4 Subsampling Layer

The subsampling layer is an important layer to the convolutional neural network.

This layer mainly reduces the input image size in order to give the neural network more invariance and robustness. The most commonly used method for the subsampling layer in image processing tasks is max pooling, so the subsampling layer is frequently called the max pooling layer. The max pooling method is shown in Figure 11. The image is divided into blocks and the maximum value of each block becomes the corresponding pixel value of the output image. There are two reasons to use a subsampling layer. First, the subsampling layer has fewer parameters and is faster to train. Second, a subsampling layer helps the convolution layers tolerate translation and rotation in the input pattern.

Figure 11. Example subsampling layer.
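A minimal NumPy sketch of 2×2 max pooling as illustrated in Figure 11; the feature-map values are illustrative only.

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Divide the feature map into size x size blocks and keep the maximum
    of each block, halving the spatial resolution when size=2."""
    h, w = feature_map.shape
    trimmed = feature_map[:h - h % size, :w - w % size]       # drop ragged edges
    blocks = trimmed.reshape(h // size, size, w // size, size)  # group into blocks
    return blocks.max(axis=(1, 3))                            # maximum per block

fm = np.array([[1, 3, 2, 4],
               [5, 6, 1, 0],
               [7, 2, 9, 8],
               [0, 1, 3, 4]])
print(max_pool(fm))   # [[6 4]
                      #  [7 9]]
```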

2.3.5 Fully Connected Layer

Fully connected layers are similar to the traditional feed-forward neural layer (see Figure 8). They feed the network activations forward into vectors with a predefined length. We can fit the vector into certain categories or take it as a representation vector for further processing.

Chapter 3 Proposed Methods
3.1 Model Architecture
The model we have designed for the task of autonomous brain segmentation has at its

core the basic design of a conditional generative adversarial network. Conditional GANs

have the structure illustrated in Figure 12 below:

Figure 12. Proposed GAN Architecture.

Our model builds on the GAN architecture with several modifications. In the GAN structure we have a generator G and a discriminator D, which are trained in an adversarial manner. The generator is trained to create realistic image masks from a noise

input z, and the discriminator is trained to differentiate between the real image mask x and

those produced by the generator G(x). Using the feedback from the discriminator, our

generator can improve its ability to produce images so as to trick the discriminator into

classifying them as real in the future. Doing so produces more accurate image masks for

segmentation.

The most obvious change when going from a traditional GAN to our model is that instead

of a noise vector z, the generator is fed an actual image x, which we want to convert into

another structurally similar image mask y. Our generator should now produce G(x), which

we want to eventually be indistinguishable from y.

In addition to the traditional GAN losses, we also apply an L1 loss, which is just a pixel-

wise absolute value loss on the generated image masks. In this situation, we force the

generator to approximate G(x) = y with the additional loss:

$L_1 = |G(x) - y|$.

In a traditional GAN we would never apply such a loss because it would prevent the

generator from producing new images. In the case of image translation however, we care

about precise image translations rather than new ones. This need for precise images is

also why we don’t entirely throw away the GAN aspect of our network. An L1 loss by

itself would produce blurry or washed-out images by virtue of attempting to generate

images that are “on average” correct. By keeping the GAN losses, we encourage the

network to produce crisp images that are visually indistinguishable from the real ones.

3.1.1 Pre-Processing Layer

The files are organized as shown in Figure 13 below from the file system.

Figure 13. Dataset file naming structure.

Table 2 below outlines differences in the file types in the dataset.

Table 2. Summary of MRI sequences in the dataset.

TR(msec) TE(msec)

T1-Weighted 500 14

(short TR and TE)

T2-Weighted 4000 90

(long TR and TE)

Flair 9000 114

(very long TR and TE)

Figure 14. MRI 2D slices after preprocessing from 3D file.
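As a rough sketch of this preprocessing step, the snippet below loads a 3D NIfTI volume and extracts intensity-normalized 2D transverse slices as in Figure 14. It assumes the nibabel library and a hypothetical BraTS-style file name; neither the library choice nor the exact normalization is specified in the thesis.

```python
import numpy as np
import nibabel as nib  # common library for reading NIfTI files (assumed choice)

def volume_to_slices(nifti_path):
    """Load a 3-D NIfTI volume and return a stack of 2-D transverse slices,
    intensity-normalized to the [0, 1] range."""
    volume = nib.load(nifti_path).get_fdata()               # e.g. shape (240, 240, 155)
    volume = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
    slices = [volume[:, :, k] for k in range(volume.shape[2])]  # transverse view
    return np.stack(slices)                                   # (num_slices, H, W)

# Hypothetical usage with a BraTS-style FLAIR file name:
# slices = volume_to_slices("Brats18_XXXX_1_flair.nii.gz")
```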

3.1.2 Generator

To help ensure accurate images, the third addition to our model is the

implementation of a U-Net architecture in the generator. Put simply, the U-Net is an auto-

encoder in which the outputs from the encoder-half of the network are concatenated with

their mirrored counterparts in the decoder-half of the network. By including some of these

skip connections, we prevent the middle part of the network from becoming a bottleneck

on the nature of information flow.

Figure 15. Auto-encoder and U-net Architecture.

In the case of our model, our input is the image x that we want to convert and the output

G(x) is the image we want it to become. By concatenating mirrored layers, we are able to

ensure that the structure of the original image is passed over to the decoder-half of the

network directly. When thinking about the task of segmentation, the representations

learned at each scale of the encoder are extremely useful for the decoder in terms of

providing the structure of the segmented image mask.

Taking all the points mentioned above into account in the design of our generator model, we arrived at a 68-layer deep neural network implementation, summarized in Figure 16 below:


Figure 16. 68 Layer Deep Neural Network Generator.

Due to the complexity of the model, we have color coded the layers and provided legends

on the bottom right of the image for reference. Table 3 is also provided below with

implementation details of the layers. Please note that the table below is not complete; as this is a summary of the model, only the first 10 layers and the last layer are shown.

Table 3. Generator - Layer Implementation Details.

Layer Layer Name Layer Type Input Shape Output Shape

1 Unet_input Input Layer (n, 1, 256, 256) (n, 1, 256, 256)

2 Convolution_1 Convolution Layer (n, 1, 256, 256) (n, 64, 128, 128)

3 Leaky_Relu_1 Leaky Relu (n, 64, 128, 128) (n, 64, 128, 128)

4 Convolution_2 Convolution Layer (n, 64, 128, 128) (n, 128, 64, 64)

5 Batch_Norm_2 Batch Normalization (n, 128, 64, 64) (n, 128, 64, 64)

6 Leaky_Relu_2 Leaky Relu (n, 128, 64, 64) (n, 128, 64, 64)

7 Convolution_3 Convolution Layer (n, 128, 64, 64) (n, 256, 32, 32)

8 Batch_Norm_3 Batch Normalization (n, 256, 32, 32) (n, 256, 32, 32)

9 Leaky_Relu_3 Leaky Relu (n, 256, 32, 32) (n, 256, 32, 32)

10 Convolution_4 Convolution Layer (n, 256, 32, 32) (n, 512, 16, 16)

A defining feature of image-to-image translation problems is that they map a high

resolution input grid to a high resolution output grid. In addition, for the problems we

consider, the input and output differ in surface appearance, but both are renderings of the

same underlying structure. Therefore, structure in the input is roughly aligned with

structure in the output. We design the generator architecture around these considerations.

Many previous solutions [1, 2, 3, 4, 5] to problems in this area have used an encoder-

decoder network [6]. In such a network, the input is passed through a series of layers that

progressively downsample, until a bottleneck layer, at which point the process is reversed.

Such a network requires that all information flow pass through all the layers, including

the bottleneck. For many image translation problems, there is a great deal of low-level

information shared between the input and output, and it would be desirable to shuttle this

information directly across the net. For example, in the case of image colorization, the

input and output share the location of prominent edges.

To give the generator a means to circumvent the bottleneck for information like this, we

add skip connections, following the general shape of a “U-Net” [7]. Specifically, we add

skip connections between each layer i and layer n − i, where n is the total number of layers. Each skip connection simply concatenates all channels at layer i with those at layer n − i.
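A minimal sketch of one such skip connection in Keras, assuming channels-first tensors: the decoder block upsamples its input and concatenates it with the mirrored encoder output along the channel axis. The layer parameters are illustrative assumptions, not the exact configuration of the thesis.

```python
from tensorflow.keras import layers

def decoder_block(x, skip, filters):
    """Upsampling block of a U-Net: a transposed convolution doubles H and W,
    then the mirrored encoder output is concatenated along the channel axis."""
    x = layers.Conv2DTranspose(filters, kernel_size=4, strides=2, padding="same",
                               data_format="channels_first")(x)
    x = layers.BatchNormalization(axis=1)(x)
    x = layers.ReLU()(x)
    return layers.Concatenate(axis=1)([x, skip])  # skip connection: layer i joined to layer n-i
```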

3.1.3 Discriminator

In our model, we utilize a different kind of discriminator architecture known as a

PatchGAN. The concept behind the PatchGAN is that instead of outputting a single

discriminator score for the whole image, a set of separate scores are produced for every

patch of the image, and then an average of the scores is taken to produce a final score.

This approach is known to improve performance by relying on fewer parameters, and a

precisely tuned patch size can yield improved image quality. A PatchGAN with a patch

size configured to be the same as the image size is therefore equivalent to a traditional

discriminator architecture. For the sake of simplicity in our work, our model's implementation utilizes a traditional discriminator, i.e., a PatchGAN whose patch spans the entire image.

Figure 17. 6 Layer Deep Neural Network Discriminator.

Table 4 below, lists the input shape details of the discriminator part of our proposed model:

Table 4. Discriminator Layer Implementation Details.

Layer Layer Name Layer Type Input Shape Output Shape


1 Patch_gan_input Input Layer (n, 1, 256, 256) (n, 1, 256, 256)
2 Patch_gan Model (n, 1, 256, 256) [(n, 2), (n, 512)]
3 Dense_1 Dense (n, 512) (n, 500)
4 Reshape_1 Reshape (n, 500) (n, 100, 5)
5 Lambda_1 Lambda (n, 100, 5) (n, 100)
6 Merge_8 Merge [(n, 2), (n, 100)] (n, 102)
7 Disc_output Dense (n, 102) (none, 2)

It is well known that the L2 and L1 losses produce blurry results on image generation

problems [8]. Although these losses fail to encourage high frequency crispness, in many

cases they nonetheless accurately capture the low frequencies. For problems where this

is the case, we do not need an entirely new framework to enforce correctness at the low

frequencies. L1 will already do. This motivates restricting the GAN discriminator to only

model high-frequency structure, relying on an L1 term to force low-frequency correctness

(Eqn. 4). In order to model high-frequencies, it is sufficient to restrict our attention to the

structure in local image patches. Therefore, we design a discriminator architecture –

which we term a PatchGAN – that only penalizes structure at the scale of patches. This

discriminator tries to classify whether each N × N patch in an image is real or fake. We run this discriminator convolutionally across the image, averaging all responses to provide the

ultimate output of D.

Prior work on image-to-image translation has demonstrated that N can be much smaller than the full size of the image

and still produce high quality results. This is advantageous because a smaller PatchGAN

has fewer parameters, runs faster, and can be applied on arbitrarily large images.

Such a discriminator effectively models the image as a Markov random field, assuming

independence between pixels separated by more than a patch diameter. This connection

was previously explored in [9], and is also the common assumption in models of texture

[10, 11] and style [12, 13, 14, 15]. Our PatchGAN can therefore be understood as a form

of texture/style loss.
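The sketch below illustrates the PatchGAN idea under these assumptions: a small convolutional discriminator emits a grid of per-patch real/fake scores and averages them into a single decision. The filter counts and resulting patch size are illustrative and are not the exact configuration of the discriminator in Table 4.

```python
from tensorflow.keras import layers, Input, Model

def build_patch_discriminator(input_shape=(1, 256, 256)):
    """Markovian (PatchGAN) discriminator: every spatial position of the final
    feature map scores one local patch; averaging the map gives the image score."""
    inp = Input(shape=input_shape)
    x = inp
    for filters in (64, 128, 256):
        x = layers.Conv2D(filters, 4, strides=2, padding="same",
                          data_format="channels_first")(x)
        x = layers.LeakyReLU(0.2)(x)
    patch_scores = layers.Conv2D(1, 4, padding="same", activation="sigmoid",
                                 data_format="channels_first")(x)   # (n, 1, 32, 32) patch grid
    image_score = layers.GlobalAveragePooling2D(data_format="channels_first")(patch_scores)
    return Model(inp, [image_score, patch_scores])
```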

3.1.4 DCGAN

GANs are a type of generative model that learns a mapping from a random noise vector z to an output image y, G: z → y. On the other hand, conditional GANs learn a mapping from an observed image x and a random noise vector z to y, G: {x, z} → y. The generator G is trained to produce outputs that cannot be distinguished from "real" images by an adversarially trained discriminator, D, which is trained to detect the generator's "fakes" as well as possible. This training procedure is shown in Figure 18.

Figure 18. Deep Conditional GAN (DCGAN).

Table 5 outlines the implementation details of the input and output fields of the whole
deep conditional generative adversarial network.

Table 5. DCGAN Layer Implementation Details.

Layer Layer Name Layer Type Input Shape Output Shape


1 DCGAN_input Input Layer (n, 1, 256, 256) (n, 1, 256, 256)
2 Unet_gen Model (n, 1, 256, 256) (n, 1, 256, 256)
3 Lambda_2 Lambda (n, 1, 256, 256) (n, 1, 256, 256)
4 Disc_nn Model (n, 1, 256, 256) (n, 2)

We adapt our generator and discriminator architectures from those in [16]. Both generator

and discriminator use modules of the form convolution-BatchNorm-ReLu [17].
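A minimal sketch of how the generator and discriminator could be chained into the combined model of Figure 18 and Table 5. The helpers build_generator and build_discriminator are hypothetical stand-ins for the networks described earlier (with the discriminator assumed to return a single score); the discriminator is frozen inside the combined model so that only the generator is updated through it.

```python
from tensorflow.keras import Input, Model

def build_dcgan(build_generator, build_discriminator, input_shape=(1, 256, 256)):
    """Chain G and D into one trainable graph: the combined model maps an input
    MRI slice to (discriminator decision, generated mask), so the generator can
    be trained against both the adversarial loss and the L1 reconstruction loss."""
    generator = build_generator(input_shape)
    discriminator = build_discriminator(input_shape)
    discriminator.trainable = False            # freeze D while the combined model updates G

    mri_slice = Input(shape=input_shape, name="DCGAN_input")
    fake_mask = generator(mri_slice)           # G(x)
    decision = discriminator(fake_mask)        # D(G(x))
    return Model(mri_slice, [decision, fake_mask]), generator, discriminator

# Hypothetical usage, given generator/discriminator builders like the sketches above:
# combined_model, generator, discriminator = build_dcgan(build_generator, build_discriminator)
```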

3.2 Evaluation Methods


To test and evaluate the performance of our model, we utilize two kinds of methods: one for checking whether we have reached an equilibrium point in training, and the other for comparing the performance of our model against the previous studies mentioned in the related work section.

3.2.1 Objective Function

The objective of a conditional GAN can be expressed as

$O_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]$. (1)

where $G$ tries to minimize this objective against an adversarial $D$ that tries to maximize it, i.e. $G^* = \arg\min_G \max_D O_{cGAN}(G, D)$.

To test the importance of conditioning the discriminator, we also compare to an

unconditional variant in which the discriminator does not observe $x$:

$O_{GAN}(G, D) = \mathbb{E}_{y}[\log D(y)] + \mathbb{E}_{x,z}[\log(1 - D(G(x, z)))]$. (2)

Previous approaches have found it beneficial to mix the GAN objective with a more

traditional loss, such as L2 distance [1]. The discriminator’s job remains unchanged, but

the generator is tasked to not only fool the discriminator but also to be near the ground

truth output in an L2 sense. We also explore this option, using L1 distance rather than L2

as L1 encourages less blurring:

$O_{L1}(G) = \mathbb{E}_{x,y,z}[\lVert y - G(x, z) \rVert_1]$. (3)

Our final objective is

$G^* = \arg\min_G \max_D O_{cGAN}(G, D) + \lambda O_{L1}(G)$. (4)

Without z, the net could still learn a mapping from x to y, but would produce deterministic

outputs, and therefore fail to match any distribution other than a delta function.

Past conditional GANs have acknowledged this and provided Gaussian noise z as an input

to the generator, in addition to x (e.g., [2]). In initial experiments, we did not find this

strategy effective – the generator simply learned to ignore the noise – which is consistent

with Mathieu et al. [18]. Instead, for our final models, we provide noise only in the form

of dropout, applied on several layers of our generator at both training and test time.

Despite the dropout noise, we observe only minor stochasticity in the output of our nets.

Designing conditional GANs that produce highly stochastic output, and thereby capture

the full entropy of the conditional distributions they model, is an important question left

open by the present work.

3.2.2 Loss Function

To explain our choice of the loss function, we need to look into two types of loss functions

that are suitable for this kind of problem, namely the L1 and L2 loss.

The L1 loss function is used to minimize the error, defined as the sum of all the absolute differences between the true values and the predicted values:

$L1 = \sum_{i} \lvert y_{true,i} - y_{pred,i} \rvert$

The L2 loss function is used to minimize the error, defined as the sum of all the squared differences between the true values and the predicted values:

$L2 = \sum_{i} (y_{true,i} - y_{pred,i})^{2}$
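The two candidate losses can be written directly, for instance with NumPy; this is a minimal sketch assuming y_true and y_pred are arrays of the same shape.

import numpy as np

def l1_loss(y_true, y_pred):
    # Sum of absolute differences between the true and predicted values.
    return np.sum(np.abs(y_true - y_pred))

def l2_loss(y_true, y_pred):
    # Sum of squared differences between the true and predicted values.
    return np.sum((y_true - y_pred) ** 2)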

3.2.3 Dice Coefficient

The Dice coefficient (also known as the Dice similarity index) is equivalent to the F1 score, but it is not the same as accuracy. The main difference is that accuracy takes true negatives into account, whereas the Dice coefficient and many other measures treat true negatives as uninteresting defaults. The Dice coefficient is given by:


$DSC = \dfrac{2\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert}$
When applied to Boolean data, using the definition of true positive (TP), false positive

(FP), and false negative (FN), it can be written as:


$DSC = \dfrac{2TP}{2TP + FP + FN}$
To evaluate the performance of our model and validate its capability to generate segmentations, we create an algorithm that computes the Dice score between two images and returns a percentage signifying their similarity (a minimal implementation sketch follows the list below). The underlying Dice coefficient is a floating-point value between 0 and 1:

o Maximum similarity = 1

o No similarity = 0
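The sketch below illustrates such a Dice-score routine, assuming the predicted and ground-truth segmentations are binary NumPy masks of the same shape; the function name and the choice to return a percentage are illustrative.

import numpy as np

def dice_score(pred_mask, true_mask):
    # Dice similarity coefficient, returned as a percentage in [0, 100].
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()
    denom = pred.sum() + true.sum()
    if denom == 0:
        return 100.0  # both masks empty: treat as perfect agreement
    return 100.0 * 2.0 * intersection / denom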

3.2.4 Perceptual Evaluation

The method of perceptual evaluation asks a number of human subjects to give feedback on their perception of the results. Each subject issues a human plausibility score, which is an approximate similarity expressed as a percentage.

Chapter 4 Experimental Setup and Training

4.1 Experimental Setup


To perform the experiments in this work, we utilized the Python programming language set up on a Linux Ubuntu distribution. The main libraries used were TensorFlow and Keras. The machine was equipped with two NVIDIA GTX 1080 Ti GPUs.

Figure 19. Different sections of tumor annotated in colors.

Our dataset was obtained from the Medical Image Computing and Computer Assisted

Intervention Society (“The MICCAI Society”).

The datasets used in this work have been updated since 2016 with more routine clinically-acquired 3T multimodal MRI scans, and all the ground-truth labels have been manually revised by expert board-certified neuroradiologists.

Ample multi-institutional, routine clinically-acquired, pre-operative multimodal MRI scans of glioblastoma (GBM/HGG) and lower-grade glioma (LGG), from 840 and 300 patients respectively, with pathologically confirmed diagnoses and available overall survival (OS) data, are provided as the training, validation and testing data. These multimodal scans describe a) native (T1), b) post-contrast T1-weighted (T1Gd), c) T2-weighted (T2), and d) T2 Fluid Attenuated Inversion Recovery (FLAIR) volumes, and were acquired with different clinical protocols and various scanners from about 19 institutions, mentioned as data contributors on the MICCAI website. All the imaging datasets have been segmented manually, by one to four raters, following the same annotation protocol, and their annotations were approved by experienced neuroradiologists. Annotations comprise the GD-enhancing tumor, the peritumoral edema, and the necrotic and non-enhancing tumor core, as described in the BraTS reference paper (see Figure 19 above), published in IEEE Transactions on Medical Imaging. The provided data are distributed after pre-processing, i.e., co-registered to the same anatomical template, interpolated to the same resolution (1 mm³) and skull-stripped.
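For reference, a pre-processed volume in NIfTI format can be loaded into a NumPy array with the nibabel library, as in the hedged sketch below; the file name is hypothetical and depends on the local copy of the dataset.

import nibabel as nib
import numpy as np

volume = nib.load("BraTS_sample_flair.nii.gz")   # hypothetical file name
data = volume.get_fdata().astype(np.float32)     # 3D intensity array
print(data.shape)                                # e.g. (240, 240, 155) for BraTS volumes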

To test the performance of our model and design, we perform four experiments based on

the nature of our dataset. To understand the setup, see Table 6, Figure 20 below and Table

1:

Table 6. Dataset Training, Testing and Validation Ratio.

Type     Total    Training (50%)    Testing (30%)    Validation (20%)
HGG       840          420               252                168
LGG       300          150                90                 60
Total    1140          570               342                228

Table 6 above summarizes the setup of our data: we split the available dataset into 50% for training, 30% for testing and optimization, and finally 20% for validation on unseen data, so that we can test how well our model performs on new data, as illustrated in the sketch below.
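Such a 50/30/20 split can be produced with a short helper like the one below; the function name and the fixed random seed are assumptions for illustration, not the exact preprocessing code of this work.

import random

def split_patients(patient_ids, train=0.5, test=0.3, seed=42):
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_test = int(len(ids) * test)
    return (ids[:n_train],                   # training (50%)
            ids[n_train:n_train + n_test],   # testing / optimization (30%)
            ids[n_train + n_test:])          # validation (remaining ~20%)

# Example: 840 HGG cases give 420 / 252 / 168, matching Table 6.
hgg_train, hgg_test, hgg_val = split_patients(range(840))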

The test set and cross validation set have different purposes. If we drop either one, we

lose its benefits:

 The cross-validation set is used to help detect over-fitting and to assist in hyperparameter search, such as for the learning rate.

 The test set is used to measure the performance of the model.

We cannot use the cross-validation set to measure the performance of our model accurately, because we would deliberately tune our results to get the best possible metric over perhaps hundreds of variations of our parameters. The cross-validation result is therefore likely to be too optimistic. For the same reason, we cannot drop the cross-validation set and use the test set for selecting hyperparameters, because then we would be practically guaranteed to overestimate how good our model is. Ideally, we use the test set just once, or use it in a neutral fashion to compare different experiments.

If we cross-validate, find the best model, and then add the test data to the training set, it is possible (and in some situations perhaps quite likely) that our model would be improved. However, we have no way to be sure whether that has actually happened, and even if it has, we no longer have any unbiased estimate of the new performance.

As mentioned above, we perform four experiments: 1) all image types, 2) T1 and T1c image types, 3) T2 image types, and 4) FLAIR image types. We transpose the 3D slice information coming from the MRI scans into three basic viewpoints: the transverse, sagittal and coronal views (see Figure 20 below). We perform our experiments on the transverse view only.

Figure 20. MRI 2D viewpoints, transverse, sagittal and coronal.

To optimize our neural network, we follow the standard approach from [19]: we alternate between one gradient-descent step on the discriminator D and one step on the generator G. As suggested in the original GAN paper, rather than training G to minimize log(1 − D(x, G(x, z))), we instead train it to maximize log D(x, G(x, z)) [19]. In addition, we divide the objective by 2 while optimizing D, which slows down the rate at which D learns relative to G. We utilize minibatch stochastic gradient descent (SGD) and apply the Adam solver [20], setting the learning rate to 0.0002 and the momentum parameters to β₁ = 0.5 and β₂ = 0.999. During inference, we run the generator network in precisely the same manner as during the training phase. This differs from the usual protocol in that dropout is applied during testing, and batch normalization [17] is applied using the statistics of the test batch rather than aggregated statistics of the training batch. This approach to batch normalization, when the batch size is set to 1, has been termed “instance normalization” and has been demonstrated to be very effective in image generation tasks [21]. In our experiments, we use batch sizes between 1 and 5 depending on the experiment; a minimal sketch of the optimizer configuration follows below.
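The optimizer configuration described above can be written in Keras as in the hedged sketch below; it assumes a recent TensorFlow/Keras version where the argument is named learning_rate (older versions use lr).

from tensorflow.keras.optimizers import Adam

# Adam with learning rate 2e-4 and momentum parameters beta_1 = 0.5, beta_2 = 0.999,
# used for the alternating updates of the discriminator and the generator.
optimizer = Adam(learning_rate=2e-4, beta_1=0.5, beta_2=0.999)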

We also add a data generator to perform image augmentations, such that our inputs and targets are rotated at angles of 90°, 180° and 270° about the origin. The augmentations also include flipping the images top to bottom and right to left, as sketched below.
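A hedged sketch of this augmentation step is given below, assuming a 2D slice and its mask are NumPy arrays; the flip probabilities and the helper name are illustrative.

import numpy as np

def augment_pair(image, mask, rng=np.random):
    # Rotate both image and mask by 0, 90, 180 or 270 degrees about the origin.
    k = rng.choice([0, 1, 2, 3])
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # Top-to-bottom flip.
    if rng.rand() < 0.5:
        image, mask = np.flipud(image), np.flipud(mask)
    # Right-to-left flip.
    if rng.rand() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    return image, mask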

Chapter 5 Results and Discussion

Figure 21 shows the training curves for the four different experiment configurations. The performance differs based on the image type used in the experiment. We explore the possible causes of this difference in Dice performance in the results that follow.

Figure 21. FLAIR experiment: (a) training losses; (b) Dice coefficient during training.

As can be seen from the graphs in Figure 21 above, the training, testing and validation losses decrease over time.

Figure 22. T2 experiment: (a) training losses; (b) Dice coefficient during training.

Consider Figure 23 below; it shows a sample result after a prediction (sampling) operation was run on the FLAIR model:

Figure 23. A Sample Result From the FLAIR Experiment.

The FLAIR image-type experiment gives the highest similarity score (Dice coefficient) of 84.93%. As can be seen from Figure 23 above, the model correctly identifies the location of the tumor and outlines its boundaries very closely to what an expert would have done. We believe the FLAIR image-type experiment outperforms the others because the tumor pixels are much brighter relative to the rest of the brain.

Consider Figure 24 below; it outlines erroneous results from the other experiments, such as the T1 image types. The tumor may be correctly identified as one progresses through the slices; however, because the tumor is almost as bright as the rest of the brain, some healthy parts of the brain may be returned as part of the tumor, which severely degrades the similarity score.

Figure 24. A Sample Erroneous Result from the T1 Experiment.

In other cases, some parts of the brain that show up just as bright as the tumor tend to make it into the segmentation result. This is because, in this work, we have not built in a mechanism for the model to recognize normal brain structures and ignore them when they are not affected. Figure 25 below demonstrates this issue from sampling.

Figure 25. A Sample Brain Structure Mistaken for Tumor.

In some other cases, a complete miss occurs: a tumor is predicted where there is none, or a tumor is not detected at all even though it is present. Figure 26 shows the former and the latter, respectively.

Figure 26. A Sample Complete Miss: (a) a tumor predicted where none exists; (b) an existing tumor not detected.

It can be seen that, for case (a), the predicted presence of a tumor is due to the whole brain structure being bright; we have established that our model seems to have learned that the brighter structures in the images are possible tumors. For case (b), the whole brain image appears completely dark, with no outstanding brighter pixels, hence no tumor is predicted even though, according to the expert annotations, there should be one there.

Table 7 below displays the individual average Dice scores for all four experiments.
Table 7. Average Similarity Scores from Experiments.

It can be seen from the results that the FLAIR images give the best similarity score, and that combining all the image types greatly reduces the performance of the model. The bar chart in Figure 27 below provides a visual comparison of the experiments and the overall average score.

Figure 27. Bar Chart Comparison of all Experiments.

Following that, we investigate what human observers think of the results: ten participants from our lab were each given one of the image results obtained after sampling the model. They were asked to give an approximate score of how similar they think the segmentations produced by the model are to the ones created by the experts. Table 8 below displays their responses.

Table 8. Human Plausibility Scores.

The average of this test is approximately 7% below the average of the Dice score, and the

highest score on this test is approximately 1% below the highest score on the Dice score.

Finally, we perform a comparison with the related works mentioned earlier in this document, to see how our own work compares with similar studies. The results are shown in Table 9 below.

Table 9. Comparison with Related Studies.

We selected our best model result, from the FLAIR image-type experiment, for this comparison. Our model ranks second when compared with these similar studies. It is worth pointing out that these comparisons are made on a whole-tumor basis. On sub-regions such as the tumor core, the other studies performed very well, but our work was focused only on the whole tumor, hence we make the comparison on that basis.

Chapter 6 Conclusions and Future Work

6.1 Conclusions
The proposed model is found to be capable of generating image segmentation masks given a 3D MRI image. The accuracy, or similarity score, at the current level is only good enough to identify the position of a tumor; it cannot yet be used in an actual medical environment, since very high precision is required when it comes to a patient’s well-being. Using all MRI sequence types to train the model at once greatly diminishes the similarity score of the generated image masks, so it is better to tackle such a problem by focusing on one image type at a time. The FLAIR MRI sequence is the image type best suited to identifying and segmenting brain tumors using our methodology.

6.2 Future Work


The current model can be improved by separating the HGG and LGG datasets and training for more epochs. Furthermore, using the Dice coefficient as the loss function would probably be an advantage over the L1 loss; a possible formulation is sketched below.
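As a starting point, a soft Dice loss could be formulated as in the hedged sketch below, written for TensorFlow tensors; the smoothing constant is an assumption and would need tuning.

import tensorflow as tf

def soft_dice_loss(y_true, y_pred, smooth=1.0):
    # 1 - Dice coefficient, computed on flattened soft segmentation maps.
    y_true_f = tf.reshape(y_true, [-1])
    y_pred_f = tf.reshape(y_pred, [-1])
    intersection = tf.reduce_sum(y_true_f * y_pred_f)
    dice = (2.0 * intersection + smooth) / (
        tf.reduce_sum(y_true_f) + tf.reduce_sum(y_pred_f) + smooth)
    return 1.0 - dice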

References

[1] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A.Efros. Context

encoders: Feature learning by inpainting. CVPR, 2016.

[2] X. Wang and A. Gupta. Generative image modeling using style and structure

adversarial networks. ECCV, 2016.

[3] J. Johnson, A. Alahi, and L. Fei-Fei. Perceptual losses for real-time style transfer and super-resolution. In ECCV, 2016.

[4] Y. Zhou and T. L. Berg. Learning temporal transformations from time-lapse

videos. In ECCV, 2016.

[5] D. Yoo, N. Kim, S. Park, A. S. Paek, and I. S. Kweon. Pixellevel domain transfer.

ECCV, 2016.

[6] S. Hwang, J. Park, N. Kim, Y. Choi, and I. So Kweon. Multispectral pedestrian

detection: Benchmark dataset and baseline. In CVPR, pages 1037–1045, 2015.

[7] O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for

biomedical image segmentation. In MICCAI, pages 234–241. Springer, 2015.

[8] A. B. L. Larsen, S. K. Sønderby, and O. Winther. Autoencoding beyond pixels

using a learned similarity metric. arXiv preprint arXiv:1512.09300, 2015.

[9] C. Li and M. Wand. Precomputed real-time texture synthesis with markovian

generative adversarial networks. ECCV, 2016.

[10] A. A. Efros and T. K. Leung. Texture synthesis by nonparametric sampling. In

ICCV, volume 2, pages 1033–1038. IEEE, 1999.

[11] L. A. Gatys, A. S. Ecker, and M. Bethge. Texture synthesis and the controlled

generation of natural stimuli using convolutional neural networks. arXiv preprint

arXiv:1505.07376, 12, 2015.

[12] G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with

neural networks. Science, 313(5786):504–507, 2006.

[13] A. Hertzmann, C. E. Jacobs, N. Oliver, B. Curless, and D. H. Salesin. Image

analogies. In SIGGRAPH, pages 327–340. ACM, 2001.

[14] L. A. Gatys, A. S. Ecker, and M. Bethge. Image style transfer using convolutional

neural networks. CVPR, 2016.

[15] C. Li and M. Wand. Combining markov random fields and convolutional neural

networks for image synthesis. CVPR, 2016.

[16] A. Radford, L. Metz, and S. Chintala. Unsupervised representation learning with

deep convolutional generative adversarial networks. arXiv preprint

arXiv:1511.06434, 2015.

[17] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.

[18] M. Mathieu, C. Couprie, and Y. LeCun. Deep multi-scale video prediction

beyond mean square error. ICLR, 2016.

[19] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.

[20] D. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.

[21] D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing

ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.

[22] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B.

Freymann, K. Farahani, C. Davatzikos, Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features,

Nature Scientific Data, 2017.

[23] S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. S. Kirby, J. B.

Freymann, K. Farahani, C. Davatzikos, Segmentation labels and radiomic

features for the pre-operative scans of the TCGA-GBM collection, The Cancer

Imaging Archive, 2017.

[24] A. Beers, K. Chang, J. Brown, E. Sartor, C. Mammen, E. Gerstner, B. Rosen:

“Sequential 3D U-Nets for Biologically-Informed Brain Tumor Segmentation”,

arXiv preprint arXiv:1709.02967, 2017.

[25] V. Alex, M. Safwan: “Automatic Segmentation and Overall Survival Prediction

in Gliomas using Fully Convolutional Neural Network and Texture Analysis”,

arXiv preprint arXiv:1712.02066, 2017.

[26] H. Bharath, S. Colleman, D. Sima, S. Van Huffel, Tumor Segmentation from

Multimodal MRI Using Random Forest with Superpixel and Tensor Based

Feature Extraction. In: Crimi A., Bakas S., Kuijf H., Menze B., Reyes M. (eds)

Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries.

BrainLes 2017. Lecture Notes in Computer Science, vol 10670. Springer, Cham

[27] Guotai Wang, Wenqi Li, Sebastien Ourselin: “Automatic Brain Tumor

Segmentation using Cascaded Anisotropic Convolutional Neural Networks”,

arXiv preprint arXiv:1709.00382, 2017.

[28] Automatic segmentation of brain tumor from MR images using SegNet: selection

of training data sets (NTUST)

