Thanks to Allah for giving us the strength and the courage to complete
a project of such complexity after all that we have been through. We
would like to thank our supervisor, Dr. Ahmed Elshewy, for his help and advice.
We also have the pleasure of expressing our deep appreciation to all the staff
members of the Faculty of Computers and Information for their advice and
helpful discussions, especially Dr. Mohammed Salah. We would also like to
especially thank our families and our colleagues for their encouragement and
motivation. Finally, we want to thank this team for acting with such
harmony, collaboration, and professionalism while working on our project.
TEAM MEMBERS
Breast cancer is the most frequent cancer among women and the second
most common cancer overall, impacting 2.1 million women each year, and it also
causes the greatest number of cancer-related deaths among women. In 2018, an
estimated 627,000 women died from breast cancer, approximately 15% of all
cancer deaths among women. While breast cancer rates are higher among
women in more developed regions, rates are increasing in nearly every region
globally. Breast cancer is the most prevalent cancer among Egyptian women and
constitutes 29% of National Cancer Institute cases. The median age at diagnosis is
one decade younger than in the countries of Europe and North America.
Machine learning (ML) has become a vital part of medical imaging research. ML
methods have evolved over the years from manually seeded inputs to automatic
initializations, and advances in the field have led to increasingly intelligent systems.
In the proposed system, we present a deep convolutional neural network for breast
cancer screening exam classification, trained and evaluated on the DDSM. The
DDSM is a database of 2,620 scanned film mammography studies. It contains
normal and benign cases with verified pathology information. To increase the
amount of training data we extract the Regions of Interest (ROI) from each image,
perform data augmentation and then train ConvNets on the augmented data. The
ConvNets were trained to predict both whether a scan was normal or abnormal,
and whether abnormalities were calcifications or masses and benign or malignant.
Multiple datasets were created using different ROI extraction techniques and
amounts of data augmentation, ranging in size from 27,000 to 62,000 training
images. To validate our approach, we evaluated different models on different
datasets with different classification methods. The results indicate that the best
model was able to achieve a relatively high accuracy of 99.6%, recall of 95%, and
loss of 0.008, while the others traded off precision and recall. We are doing this to
enhance the efforts of our community leaders fighting cancer, and breast cancer in
particular, such as Baheya Hospital and the Egyptian National Cancer Institute.
TABLE OF CONTENTS
Chapter 1: Introduction
1.1 Artificial Intelligence ……………………………………………………………………………………………….. 8
1.1.1 Difference between Human and Machine Intelligence …………………………………………… 9
1.1.2 Applications of AI ……………………………………………………………………………………………………………… 9
1.1.3 Artificial Intelligence in Medicine ………………………………………………………………………………….. 10
1.2 Machine Learning …………………………………………………………………………………………………… 11
1.2.1 Difference between Machine Learning and Artificial Intelligence ………………………… 11
1.2.2 Types of Machine Learning …………………………………………………………………………………………… 12
1.2.3 Machine Learning: How it works? …………………………………………………………………………………. 13
1.3 Deep Learning ………………………………………………………………………………………………………… 14
1.3.1 Deep Learning: How it works? ……………………………………………………………………………………….. 15
1.4 Deep neural networks ……………………………………………………………………………………..…… 16
1.4.1 What is the difference between neural networks, DL, ML and AI? ……………………… 17
1.4.2 How is deep learning being used? ……………………………………………………………………………… 18
1.4.3 Where best to apply deep learning? ……………………………………………………………………….… 19
1.4.4 How long does it take to train a deep learning model? …………………………….………….. 19
Chapter 4: Preprocessing
4.1 Dataset ………………………………………………………………………………………………….……………. 60
4.1.1 Training dataset ………………………………………………………………………………………….……….……. 61
4.2 Preprocessing ……………………………………………………………………………………….…………. 63
4.2.1 ROI extraction method 1 …………………………………………………………………………….……….… 64
4.2.2 ROI extraction method 2 …………………………………………………………………………….…….…. 64
4.2.3 Normal images ……………………………………………………………………………………………….…….… 65
4.2.4 MIAS images ……………………………………………………………………………………………….……… 66
4.2.5 Image preprocessing techniques ……………………………………………………………………………… 66
4.2.5.1 Standardize images ……………………………………………………………………………………. 68
4.2.5.2 Data augmentation ……………………………………………………………………………………. 68
4.2.5.3 Scaling images ……………………………………………………………………………………………. 69
4.2.5.4 Flipping images …………………………………………………………………………………………… 69
4.2.5.5 Image rotation ……………………………………………………………………………………………. 70
4.2.5.6 Resize images ………………………………………………………………………………………………. 71
4.2.5.7 Threshold ………………………………………………………………………………………………………. 73
4.3 Image segmentation ………………………………………………………………………………………………………… 74
4.3.1 Mask ……………………………………………………………………………………………………………….….. 75
4.3.2 Scan …………………………………………………………………………………………………………………… 76
INTRODUCTION
In this chapter we briefly discuss Artificial Intelligence, Machine
Learning, and Deep Learning, showing how these related fields helped us
determine the most suitable technology and approach to adopt in the
project implementation.
Introduction
Chapter 1
1.1.2 Applications of AI
1. Gaming − AI plays an important role in strategic games, enabling
machines to evaluate a large number of possible positions based on deep
knowledge; for example, chess, river crossing, and the N-queens problem.
The argument for increased use of AI in medicine is that quite a lot of the
above could be automated - automation often means tasks are completed
more quickly, and it also frees up a medical professional’s time when they
could be performing other duties, ones that cannot be automated, and so are
seen as a more valuable use of human resources.
DNNs are typically feedforward networks in which data flows from the input
layer to the output layer without looping back. At first, the DNN creates a map
of virtual neurons and assigns random numerical values, or "weights", to
connections between them. The weights and inputs are multiplied and return
an output between 0 and 1. If the network did not accurately recognize a
particular pattern, an algorithm would adjust the weights. That way the
algorithm can make certain parameters more influential, until it determines
the correct mathematical manipulation to fully process the data.
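The weight-adjustment loop described above can be sketched in a few lines of NumPy. The toy data, learning rate, and iteration count below are illustrative assumptions, not details of any particular network:

```python
import numpy as np

def sigmoid(x):
    # Multiply-and-squash: maps any real value into the (0, 1) range.
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Tiny toy problem: the target is 1 exactly when the first input is active.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

weights = rng.normal(size=2)   # random initial "weights" on the connections
bias = 0.0
lr = 0.5                       # learning rate

for _ in range(2000):
    out = sigmoid(X @ weights + bias)   # weights and inputs multiplied -> (0, 1)
    error = out - y
    # Adjust the weights in the direction that reduces the error
    # (gradient of the squared error through the sigmoid).
    grad = error * out * (1.0 - out)
    weights -= lr * (X.T @ grad)
    bias -= lr * grad.sum()

pred = sigmoid(X @ weights + bias)
print(np.round(pred))  # expected to approach [0, 0, 1, 1]
```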
Deep learning is used for many tasks: recognizing and generating images,
speech, and language, and, in combination with reinforcement learning,
matching human-level performance in games ranging from the ancient, such
as Go, to the modern, such as Dota 2 and Quake III.
Neural Network
Chapter 2
NEURAL NETWORK
A neural network simply consists of neurons (also called nodes). These nodes
are connected in some way. Then each neuron holds a number, and each
connection holds a weight.
These neurons are split between the input, hidden, and output layers. In
practice, there are many layers, and there is no generally best number of
layers.
From another perspective, an artificial neuron is also called a perceptron. It
consists of the following basic terms:
● Input
● Weight
● Bias
● Activation Function
● Output
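A minimal sketch of these five components in plain Python (the input, weight, and bias values are arbitrary illustrations):

```python
# A single perceptron, spelled out in terms of the five components listed above.
inputs = [1.0, 0.5, -0.5]      # Input
weights = [0.4, -0.2, 0.1]     # Weight (one per connection)
bias = 0.1                     # Bias

# Weighted sum of the inputs plus the bias.
z = sum(x * w for x, w in zip(inputs, weights)) + bias

# Activation Function: a simple step (threshold) activation.
output = 1 if z >= 0 else 0    # Output

print(z, output)  # 0.35 1
```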
This model classifies the data point based on its distance from
a center point. If you don’t have training data, for example, you’ll
want to group things and create a center point. The network looks for
data points that are similar to each other and groups them. One of
the applications for this is power restoration systems.
1) Sigmoid Function
2) Threshold Function
The threshold function is used when you don’t want to worry about
the uncertainty in the middle.
The ReLU (rectified linear unit) function passes the input value through
unchanged if it is positive, and outputs 0 if it is negative, i.e. ReLU(x) = max(0, x).
The ReLU function is the most commonly used activation these days.
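The activation functions discussed above can be written in a few lines of Python; this is an illustrative sketch of their behaviour:

```python
import math

def sigmoid(x):
    # Smoothly maps any input to a value in (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def threshold(x):
    # Hard 0/1 decision: no "uncertainty in the middle".
    return 1 if x >= 0 else 0

def relu(x):
    # Passes positive values through unchanged, clips negatives to 0.
    return max(0.0, x)

print(sigmoid(0.0))   # 0.5
print(threshold(-2.0))  # 0
print(relu(-3.0))     # 0.0
print(relu(2.5))      # 2.5 (no upper cap)
```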
● Mean Squared Error (MSE): MSE loss is used for regression tasks.
As the name suggests, this loss is calculated by taking the mean of the
squared differences between the actual (target) and predicted values.
● Binary Cross entropy (BCE): BCE loss is used for the binary
classification tasks. If you are using BCE loss function, you just
need one output node to classify the data into two classes. The
output value should be passed through a sigmoid activation
function and the range of output is (0 – 1).
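Both losses can be computed directly, for example with NumPy; the target and prediction values below are illustrative:

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean of the squared differences between target and prediction.
    return np.mean((y_true - y_pred) ** 2)

def bce(y_true, y_pred, eps=1e-12):
    # Binary cross entropy; y_pred must already be sigmoid outputs in (0, 1).
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8])
print(mse(y_true, y_pred))  # 0.03
print(bce(y_true, y_pred))  # about 0.184
```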
ConvNet architectures make the explicit assumption that the inputs are
images, which allows us to encode certain properties into the architecture.
These then make the forward function more efficient to implement and
vastly reduce the number of parameters in the network.
Convolution Layer
This is the first step in the process of extracting valuable features from
an image. A convolution layer has a number of filters that perform the
convolution operation. Every image is considered as a matrix of pixel values.
Consider the following 5x5 image whose pixel values are either 0 or 1. There’s
also a filter matrix with a dimension of 3x3. Slide the filter matrix over the
image and compute the dot product to get the convolved feature matrix.
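This sliding-window computation can be sketched in NumPy. The 5x5 image and 3x3 filter values below are illustrative, not necessarily the exact matrices from the original figure:

```python
import numpy as np

# A 5x5 binary "image" and a 3x3 filter, as in the example above.
image = np.array([
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
])
kernel = np.array([
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
])

# Slide the filter over the image and take the element-wise
# product-and-sum (the "dot product") at each position.
out_size = image.shape[0] - kernel.shape[0] + 1   # 5 - 3 + 1 = 3
convolved = np.zeros((out_size, out_size), dtype=int)
for i in range(out_size):
    for j in range(out_size):
        patch = image[i:i + 3, j:j + 3]
        convolved[i, j] = np.sum(patch * kernel)

print(convolved)  # the 3x3 convolved feature matrix
```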
ReLU Layer
ReLU stands for the rectified linear unit. Once the feature maps are
extracted, the next step is to move them to a ReLU layer.
ReLU performs an element-wise operation and sets all the negative pixels to
0. It introduces non-linearity to the network, and the generated output is a
rectified feature map. Below is the graph of a ReLU function:
The original image is scanned with multiple convolution and ReLU layers for
locating the features.
Pooling Layer
The pooling layer down-samples the rectified feature maps, reducing their
dimensions while retaining the most important information about the parts of the
image identified by the filters, such as edges, corners, body, feathers, eyes, and beak.
Here’s how the structure of the convolution neural network looks so far:
The next step in the process is called flattening. Flattening is used to convert
all the resultant 2-Dimensional arrays from pooled feature maps into a single
long continuous linear vector.
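Max pooling and flattening can be sketched together in NumPy; the 4x4 feature-map values below are illustrative:

```python
import numpy as np

# A 4x4 rectified feature map (illustrative values).
feature_map = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [2, 1, 0, 2],
    [3, 2, 4, 1],
])

# 2x2 max pooling with stride 2: keep the largest value in each window.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)   # [[6 5] [3 4]]

# Flattening: turn the pooled 2-D map into one long continuous linear
# vector that can be fed to the fully connected layer.
flat = pooled.flatten()
print(flat)     # [6 5 3 4]
```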
The flattened matrix is fed as input to the fully connected layer to classify
the image.
Fully connected
Fully connected layers connect every neuron in one layer to every
neuron in another layer. It is in principle the same as the traditional multi-
layer perceptron neural network (MLP). The flattened matrix goes through a
fully connected layer to classify the images.
Training Set
This dataset corresponds to Step 1 in the previous section. It includes the set
of input examples that the model will be fit into — or trained on — by adjusting
the parameters (i.e. weights in the context of Neural Networks).
Validation Set
In order for the model to be trained, it needs to periodically be evaluated
(Step 2), and that is exactly what the validation set is for. Through calculating
the loss (i.e. error rate) the model yields on the validation set at any given
point, we can know how accurate it is. This is the essence of training.
Subsequently, the model will tune its parameters based on the frequent
evaluation results on the validation set.
Test Set
This corresponds to the final evaluation that the model goes through after
the training phase (utilizing training and validation sets) has been completed.
This step is critical to test the generalizability of the model (Step 3). By using
this set, we can get the working accuracy of our model.
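A minimal sketch of such a three-way split, assuming a dataset of 1,000 examples and an illustrative 70/15/15 ratio:

```python
import numpy as np

# Shuffle 1,000 example indices and split them into training,
# validation, and test sets (the ratios are an illustrative choice).
rng = np.random.default_rng(42)
indices = rng.permutation(1000)

n_train = int(0.70 * len(indices))
n_val = int(0.15 * len(indices))

train_idx = indices[:n_train]               # fit the model on these
val_idx = indices[n_train:n_train + n_val]  # evaluate periodically during training
test_idx = indices[n_train + n_val:]        # final, one-time evaluation

print(len(train_idx), len(val_idx), len(test_idx))  # 700 150 150
```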
Technologies & Tools
Chapter 3
3.1 Python
3.1.1 Python in AI
As AI and ML are being applied across various channels and industries,
big corporations invest in these fields, and the demand for experts in ML and
AI grows accordingly. Jean Francois Puget, from IBM’s machine learning
department, expressed his opinion that Python is the most popular language
for AI and ML.
We have conducted some research on Python's strong sides and found out
why you should opt for Python when bringing your AI and ML projects to
life.
● TensorFlow for working with deep learning by setting up, training, and
utilizing artificial neural networks with massive datasets.
● Caffe for deep learning that allows switching between the CPU and the
GPU and processing 60+ million images a day using a single NVIDIA K40
GPU.
In the PyPI repository, you can discover and compare more Python libraries.
Flexibility
Python for machine learning is a great choice, as this language is very flexible:
Platform independence
Python is not only comfortable to use and easy to learn but also very
versatile. What we mean is that Python for machine learning development
can run on any platform including Windows, MacOS, Linux, Unix, and
twenty-one others. To transfer the process from one platform to another,
developers need to implement several small-scale changes and modify
some lines of code to create an executable form of code for the chosen
platform. Developers can use packages like PyInstaller to prepare their code
for running on different platforms.
Again, this saves time and money for tests on various platforms and makes
the overall process more simple and convenient.
Readability
Python is very easy to read so every Python developer can understand
the code of their peers and change, copy or share it. There’s no confusion,
errors or conflicting paradigms, and this leads to more efficient exchange of
algorithms, ideas, and tools between AI and ML professionals.
There are also tools like IPython available, which is an interactive shell that
provides extra features like testing, debugging, tab-completion, and others,
and facilitates the work process.
Community support
It’s always very helpful when there’s strong community support built
around the programming language. Python is an open-source language
which means that there’s a bunch of resources open for programmers starting
from beginners and ending with pros.
A lot of Python documentation is available online as well as in Python
communities and forums, where programmers and machine learning
developers discuss errors, solve problems, and help each other out.
Python programming language is absolutely free as is the variety of useful
libraries and tools.
Growing popularity
As a result of the advantages discussed above, Python is becoming
more and more popular among data scientists. According to Stack Overflow,
the popularity of Python is predicted to grow until 2020, at least.
This means it’s easier to search for developers and replace team players if
required. Also, the cost of their work may be not as high as when using a less
popular programming language.
3.2 Pytorch
PyTorch is a Python-based scientific computing package that uses the
power of graphics processing units. It is also one of the preferred deep
learning research platforms built to provide maximum flexibility and speed.
It is known for two high-level features: tensor computation with strong
GPU acceleration support, and deep neural networks built on a tape-based
autograd system.
There are many existing Python libraries which have the potential to change
how deep learning and artificial intelligence are performed, and this is one
such library. One of the key reasons behind PyTorch’s success is it is
completely Pythonic and one can build neural network models effortlessly.
It is still a young player when compared to its other competitors, however, it
is gaining momentum fast.
You can always use your favorite Python packages such as NumPy, SciPy, and
Cython to extend PyTorch functionalities and services when required. Now
you might ask, why PyTorch? What's so special in using it to build deep
learning models?
The answer is quite simple: PyTorch is a dynamic library (very flexible, usable
as per your requirements and changes) that is currently adopted by many
researchers, students, and artificial intelligence developers. In a recent Kaggle
competition, the PyTorch library was used by nearly all of the top-10 finishers.
Some of the key highlights of PyTorch include:
Interestingly, PyTorch is still in early-release beta, but the pace at which
everyone is adopting this deep learning framework shows its real potential
and power in the community. Even though it is in beta release, there are
741 contributors on the official GitHub repository working on enhancing
and improving the existing PyTorch functionalities.
PyTorch uses different backends for CPU, GPU, and various functional
features rather than a single back-end. It uses the tensor backend TH for
CPU and THC for GPU, while neural network backends such as THNN and
THCUNN serve CPU and GPU respectively. Using separate backends makes
it very easy to deploy PyTorch on constrained systems.
Imperative style
Highly extensible
PyTorch is deeply integrated with C++ code, and it shares some C++
backend with the deep learning framework Torch. This allows users to
program in C/C++ using an extension API based on cFFI for Python,
compiled for CPU or GPU operation. This feature has extended PyTorch
usage to new and experimental use cases, making it a preferable choice
for research use.
3.2.4 PyTorch-Approach
PyTorch is a native Python package by design. Its functionalities are
built as Python classes, hence all its code can seamlessly integrate with
Python packages and modules. Similar to NumPy, this Python-based library
enables GPU-accelerated tensor computations plus provides rich options of
APIs for neural network applications. PyTorch provides a complete end-to-
end research framework which comes with the most common building
blocks for carrying out everyday deep learning research. It allows chaining of
high-level neural network modules because it supports Keras-like API in its
torch.nn package.
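As a sketch of this Keras-like chaining, together with the tape-based autograd mentioned earlier (the layer sizes here are arbitrary illustrations, not the project's architecture):

```python
import torch
import torch.nn as nn

# Chain high-level modules with torch.nn, Keras-style.
model = nn.Sequential(
    nn.Linear(4, 8),   # fully connected: 4 inputs -> 8 hidden units
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid(),      # squash to (0, 1) for binary classification
)

# GPU-capable tensors with autograd tracking.
x = torch.randn(3, 4, requires_grad=True)
out = model(x)
print(out.shape)       # torch.Size([3, 1])

# The tape-based autograd system records the forward pass and
# replays it backwards to compute gradients.
out.sum().backward()
print(x.grad.shape)    # torch.Size([3, 4])
```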
PyTorch 1.0: The path from research to production
We have been discussing all the strengths PyTorch offers, and how these
make it a go-to library for research work. However, one of its biggest
downsides has been its poor production support. This is expected to
change soon.
This new version promises to handle the tasks one has to deal with while
running deep learning models efficiently at massive scale. Along with
production support, PyTorch 1.0 will bring more usability and optimization
improvements. With PyTorch 1.0, your existing code will continue to work
as-is; there won't be any changes to the existing API.
The beta release of this long-awaited version is expected later this year. Major
vendors like Microsoft and Amazon are expected to provide complete
support to the framework across their cloud products.
Summing up, PyTorch is a compelling player in the field of deep learning and
artificial intelligence libraries, exploiting its unique niche of being a research-
first library. It overcomes the challenges discussed above and provides the
necessary performance to get the job done. If you are a mathematician,
researcher, or student inclined to learn how deep learning is performed,
PyTorch is an excellent choice as your first deep learning framework.
3.3 MATLAB
MATLAB is a high-performance language for technical computing. It
integrates computation, visualization, and programming in an easy-to-use
environment where problems and solutions are expressed in familiar
mathematical notation. Typical uses include:
● Math and computation
● Algorithm development
● Modeling, simulation, and prototyping
● Data analysis, exploration, and visualization
● Scientific and engineering graphics
● Application development, including Graphical User Interface building
The name MATLAB stands for matrix laboratory. MATLAB was originally
written to provide easy access to matrix software developed by the LINPACK
and EISPACK projects, which together represent the state-of-the-art in
software for matrix computation.
MATLAB has evolved over a period of years with input from many users. In
university environments, it is the standard instructional tool for introductory
and advanced courses in mathematics, engineering, and science. In industry,
MATLAB is the tool of choice for high-productivity research, development,
and analysis.
You can exchange models with TensorFlow™ and PyTorch through the
ONNX format and import models from TensorFlow-Keras and Caffe. The
toolbox supports transfer learning with DarkNet-53, ResNet-50, NASNet,
SqueezeNet and many other pretrained models.
while true
    im = snapshot(camera);        % Take a picture
    image(im);                    % Show the picture
    im = imresize(im,[227 227]);  % Resize the picture for alexnet
    label = classify(net,im);     % Classify the picture
    title(char(label));           % Show the class label
    drawnow
end
Eight lines of MATLAB code are all that’s needed to take advantage of a
classification network to identify images using an AlexNet model.
Coding is just part of the story, though. The Deep Network Designer (see
figure) provides a way to use pretrained models including SqueezeNet,
Inception-v3, ResNet-101, GoogLeNet, and VGG-19, as well as developing
new models. It can be used in conjunction with the MATLAB Image Labeler
application, where users can view and label images for semantic
segmentation. Developers are able to create domain-specific workflows for
ground-truth labeling information for images, videos, and audio clips.
The Deep Network Designer can be used to fine-tune pre-trained deep-learning
networks.
The toolbox can also be applied to train large datasets by taking advantage
of distributed computation via multicore processors and GPUs on the
desktop using the Parallel Computing Toolbox. Developers could take
advantage of the cloud, too. The Deep Learning Toolbox supports Amazon
EC2 P2, P3, and G3 GPU instances using the MATLAB Distributed Computing
Server. The toolbox also supports Amazon AWS and Microsoft Azure.
Run Phase or Inference Phase: Once training is done, TensorFlow can be run
on many different platforms. You can run it on:
● Desktop running Windows, macOS or Linux
● Cloud as a web service
● Mobile devices like iOS and Android
You can train it on multiple machines then you can run it on a different
machine, once you have the trained model.
The model can be trained and used on GPUs as well as CPUs. GPUs were
initially designed for video games. In late 2010, Stanford researchers found
that GPUs were also very good at matrix operations and algebra, which makes
them very fast for these kinds of calculations. Deep learning relies on a
lot of matrix multiplication. TensorFlow is very fast at computing matrix
multiplication because it is written in C++. Although it is implemented in C++,
TensorFlow can be accessed and controlled by other languages, mainly
Python.
Finally, a significant feature of TensorFlow is TensorBoard, which enables
you to monitor graphically and visually what TensorFlow is doing.
Graphs
TensorFlow makes use of a graph framework. The graph gathers and
describes all the computations done during training. The graph has
many advantages:
● It is designed to run on multiple CPUs or GPUs, and even on mobile
operating systems.
● The portability of the graph allows the computations to be preserved for
immediate or later use; the graph can be saved and executed in the
future.
● All the computations in the graph are done by connecting tensors
together.
○ The graph consists of nodes and edges. A node carries a
mathematical operation and produces endpoint outputs, while the
edges explain the input/output relationships between nodes.
1. Constants
2. Variables
3. Placeholders
Constants
Constants are parameters with values that do not change. To define a
constant, we use the tf.constant() command.
Variables
Variables allow us to add new trainable parameters to the graph. To
define a variable, we use the tf.Variable() command and initialize it before
running the graph in a session.
Placeholders
Placeholders allow us to feed data to a TensorFlow model from
outside the model. They permit values to be assigned later. To define a
placeholder, we use the tf.placeholder() command.
The Keras high-level API handles the way we make models, define layers,
and set up multiple input-output models. At this level, Keras also compiles
our model with loss and optimizer functions and runs the training process
with the fit function. Keras does not handle the low-level API, such as
building the computational graph or creating tensors and other variables,
because these are handled by the "backend" engine.
Keras Backend
In Keras, the backend carries out all the low-level calculations with the help
of TensorFlow libraries. The backend engine executes the models. In Keras,
we will use TensorFlow as the default backend engine.
Sequential Model
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten, Dropout
model = Sequential()
Convolutional Layer
Adding a convolutional layer, for example with 32 3x3 filters (illustrative parameters):
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(299, 299, 1)))
MaxPooling Layer
Adding a max-pooling layer to down-sample the feature maps, for example:
model.add(MaxPooling2D(pool_size=(2, 2)))
Dense Layer
Adding a fully connected layer by just specifying the output size:
model.add(Dense(256, activation='relu'))
Dropout Layer
Adding a dropout layer to randomly drop a fraction of units during training, for example:
model.add(Dropout(0.5))
Our final step is to evaluate the model with the test data.
Preprocessing
Chapter 4
PREPROCESSING
4.1 Dataset
DDSM is a well-known dataset of normal and abnormal scans, and
one of the few publicly available datasets of mammography imaging.
Unfortunately, the size of the dataset is relatively small. To increase the
amount of training data we extract the Regions of Interest (ROI) from each
image, perform data augmentation and then train ConvNets on the
augmented data. The ConvNets were trained to predict both whether a scan
was normal or abnormal, and to predict whether abnormalities were
calcifications or masses and benign or malignant.
The CBIS-DDSM collection includes a subset of the DDSM data selected and
curated by a trained mammographer. The CBIS-DDSM images have been
pre-processed and saved as DICOM images, and thus are of better quality than
the DDSM images, but this dataset only contains scans with abnormalities. In
order to create a dataset which can be used to predict the presence of
abnormalities, the ROIs were extracted from the CBIS-DDSM dataset and
combined with normal images taken from the DDSM dataset.
For the CBIS-DDSM images the masks were used to isolate and extract the
ROI from each image. For the DDSM images we simply created tiles of each
scan and included them as long as they met certain criteria.
Both offline and online data augmentation was used to increase the size of
the datasets.
Datasets 1 through 5 did not properly separate the training and test data and
thus are not referenced in this work.
As Dataset 9 was the only dataset that did not resize the images based on
the size of the ROI, we felt that it introduced the least artificial
manipulation into the data, and after it was created we focused on training
with this dataset.
The CBIS-DDSM scans were of relatively large size, with a mean height of
5295 pixels and a mean width of 3131 pixels. Masks highlighting the ROIs were
provided. The masks were used to define a square which completely
enclosed the ROI. Some padding was added to the bounding box to provide
context and then the ROIs were extracted at 598x598 and then resized down
to 299x299 so they could be input into the ConvNet.
The ROIs had a mean size of 450 pixels and a standard deviation of 396. We
designed our ConvNets to accept 299x299 images as input. To simplify the
creation of the images, we extracted each ROI to a 598x598 tile, which was
then sized down by half on each dimension to 299x299. 598x598 was just
large enough that the majority of the ROIs could fit into it.
To increase the size of the training data, each ROI was extracted multiple
times using the methodologies described below. The size and variety of the
data was also increased by randomly horizontally flipping each tile, randomly
vertically flipping each tile, randomly rotating each tile, and by randomly
positioning each ROI within the tile.
4.2 Pre-processing
Preprocessing refers to all the transformations on the raw data before
it is fed to the machine learning or deep learning algorithm. For instance,
training a convolutional neural network on raw images will probably lead to
bad classification performances. The preprocessing is also important to
speed up training (for instance, centering and scaling techniques).
The size of the ROI was only used to determine how much padding to add to
the bounding box before extraction. If the ROI was smaller than the 598x598
target we added more padding to provide greater variety when taking the
random crops. If the ROI was larger than 598x598 this was not necessary.
1. If the ROI was smaller than a 598x598 tile it was extracted with 20%
padding on either side.
2. If the ROI was larger than a 598x598 tile it was extracted with 5%
padding.
3. Each ROI was then randomly cropped three times using random
flipping and rotation.
For dataset 9, each DDSM image was cut into 598x598 tiles without being
resized. The tiles were then each resized down to 299x299.
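The padding rules above can be sketched as follows. extract_roi is a hypothetical helper written for illustration, not the project's exact extraction code:

```python
import numpy as np

def extract_roi(image, bbox, tile=598):
    # Cut a square region around a ROI bounding box, padded as described
    # above. bbox is (row0, col0, row1, col1) taken from the mask.
    r0, c0, r1, c1 = bbox
    size = max(r1 - r0, c1 - c0)
    # Smaller ROIs get more padding (20%) so random crops have room to
    # vary; ROIs already bigger than the tile get only 5%.
    pad_frac = 0.20 if size < tile else 0.05
    pad = int(size * pad_frac)
    cr, cc = (r0 + r1) // 2, (c0 + c1) // 2        # ROI centre
    half = (size + 2 * pad) // 2
    # Clamp the square to the image borders.
    top = max(0, min(cr - half, image.shape[0] - 2 * half))
    left = max(0, min(cc - half, image.shape[1] - 2 * half))
    return image[top:top + 2 * half, left:left + 2 * half]

scan = np.zeros((5295, 3131), dtype=np.uint8)      # mean CBIS-DDSM scan size
roi = extract_roi(scan, (1000, 1000, 1400, 1300))
print(roi.shape)  # a square crop around the ROI
```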
Scaling is used to change the size of the image, either down or up.
There are several methods available to interpolate the pixels.
Horzflip: This function returns the original image, or matrix, flipped from left to right.
Vertflip: This function returns the original image, or matrix, flipped upside
down.
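Both flips map directly onto NumPy operations (a minimal sketch; the function names mirror the text):

```python
import numpy as np

def horzflip(m):
    """Return the image (matrix) flipped from left to right."""
    return np.fliplr(m)

def vertflip(m):
    """Return the image (matrix) flipped upside down."""
    return np.flipud(m)

a = np.array([[1, 2],
              [3, 4]])
horzflip(a)  # [[2, 1], [4, 3]]
vertflip(a)  # [[3, 4], [1, 2]]
```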
Pad (default): Resizes the image to fit the given dimensions and pads the
remaining space, preserving the aspect ratio.
Resize BoxPad: When upscaling an image, the image pixels themselves are not
resized; rather, the image is padded to fit the given dimensions.
Resize Crop: Resizes the image to the given dimensions, cropping it if
necessary to match the new aspect ratio.
Resize Min: Resizes the image until the shortest side reaches the given
dimension, preserving the aspect ratio.
Resize Max: Resizes the image until the longest side reaches the given
dimension, preserving the aspect ratio.
Resize Stretch: Stretches the image to the given dimensions without preserving
the aspect ratio.
4.2.5.7 Thresholding
Thresholding is a type of image segmentation in which we change the
pixels of an image to make it easier to analyze. We use thresholding
as a way to select the areas of interest of an image while ignoring the parts
we are not concerned with.
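A minimal NumPy sketch of binary thresholding (the pixel values here are invented for illustration):

```python
import numpy as np

def threshold(image, t):
    """Binary threshold: mark pixels brighter than t as foreground (1)."""
    return (image > t).astype(np.uint8)

scan = np.array([[10,  40, 200],
                 [250, 30,  90],
                 [180, 220, 20]])
mask = threshold(scan, 100)  # 1 where the bright areas of interest are
```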
Thresholding can be complicated by several factors:
o Non-uniform illumination
o No control of the environment
o Inadequate model of the object of interest
o Noise
The mass boundaries for the region of interest (ROI) can be identified using
edge-based mass segmentation, which detects edge pixels and links them to
form a contour. This method involves two steps: edge detection and edge
linking.
4.3.1 Mask
Input: path to a mask image (PNG). The function opens the mask, reduces its
size by half, finds the borders of the mask, and returns the center of the
mass. If the mass is bigger than the slice, it instead returns the upper-left
and lower-right corners of the mask as tuples, which are used to create
multiple slices.
Returns: center_row, an int with the center row of the mask, or a tuple with
the edges of the mask if the mask is bigger than the slice; center_col,
likewise; and too_big, a boolean indicating whether the mask is bigger than
the slice.
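A simplified sketch of this step, assuming the mask has already been loaded into a binary NumPy array (the function name, slice size, and return convention here are illustrative):

```python
import numpy as np

def mask_center(mask, slice_size=299):
    """Locate the mass in a binary mask array (hypothetical sketch).

    Returns (center_row, center_col, too_big); when the mass is bigger
    than the slice, the first two values are instead the upper-left and
    lower-right corners of the mask. The real code loads a PNG and
    halves its size first.
    """
    rows = np.any(mask, axis=1).nonzero()[0]
    cols = np.any(mask, axis=0).nonzero()[0]
    r0, r1 = rows[0], rows[-1]
    c0, c1 = cols[0], cols[-1]
    too_big = (r1 - r0 > slice_size) or (c1 - c0 > slice_size)
    if too_big:
        return (r0, c0), (r1, c1), True
    return (r0 + r1) // 2, (c0 + c1) // 2, False
```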
4.3.2 Scan
The scans from one particular scanner (DBA) have white sections cut
out of them, possibly to hide personal information. This occurs only on the
normal scans, so a ConvNet could use this information to identify the normal
scans. To prevent this, we replace all white pixels with black, as there are
no pure-white pixels in a normal scan.
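This cleanup amounts to a single NumPy operation (a sketch, assuming 8-bit pixels where 255 is pure white):

```python
import numpy as np

def remove_white(scan, white=255):
    """Replace pure-white pixels (e.g. DBA redaction boxes) with black."""
    scan = scan.copy()
    scan[scan == white] = 0
    return scan

arr = np.array([[255, 10],
                [128, 255]], dtype=np.uint8)
remove_white(arr)  # [[0, 10], [128, 0]]
```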
Training ConvNets
Chapter 5
Training ConvNets
5.1 Model_4_a
Model description
We designed the model based on known architectures, for example the VGG
model. The model is quite deep, with 9 convolution layers, all using the same
3x3 filter but with different numbers of filters in different sequences. It
also has 5 max-pooling layers and 3 dense layers, and 43,720,193 trainable
parameters in total.
Model training time was 2h:55m using Google Colab GPUs.
Model architecture
conv2d (3X3)X32
conv2d (3X3)X256
conv2d (3X3)X32
Max Pooling (2X2)
conv2d (3X3)X3
conv2d (3X3)X512
Max Pooling (3X3)
conv2d (3X3)X128
Fully connected 2048
conv2d (3X3)X128
Fully connected 1
Max Pooling (2X2)
These results may look like a case of overfitting, but what actually causes
them is the lack of images and the fact that the number of training images is
not equally divided between normal and malignant cases: the model tends to
classify all images as normal.
5.2 Model_4_c
Model description
An updated version of model_4 with several changes, the two most important
being:
1) Changing the optimizer to Adam, an adaptive learning-rate optimization
algorithm designed specifically for training deep neural networks. Adam can
be viewed as a combination of RMSprop and stochastic gradient descent with
momentum: like RMSprop, it uses the squared gradients to scale the learning
rate, and like SGD with momentum, it uses a moving average of the gradient
instead of the gradient itself.
2) Weight balancing:
Weight balancing balances our data by altering the weight that each training
example carries when computing the loss. Because of the unbalanced nature of
our dataset, we use weight balancing with cross-entropy weights of 1 and 3
for classes 0 and 1 respectively.
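In Keras this is typically supplied as `class_weight={0: 1.0, 1: 3.0}` to `model.fit`. The effect on the loss can be sketched in plain NumPy (an illustration of the idea, not our exact training code):

```python
import numpy as np

def weighted_bce(y_true, y_pred, w0=1.0, w1=3.0, eps=1e-7):
    """Binary cross-entropy where each example's loss is scaled by the
    weight of its true class (1:3 here, as in model_4_c)."""
    y_pred = np.clip(y_pred, eps, 1 - eps)
    weights = np.where(y_true == 1, w1, w0)
    losses = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    return np.mean(weights * losses)

y_true = np.array([0, 0, 0, 1])          # unbalanced batch
y_pred = np.array([0.1, 0.2, 0.1, 0.4])  # model outputs
loss = weighted_bce(y_true, y_pred)
# A missed positive now costs 3x as much as a missed negative.
```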
Model architecture
The same as model_4_a
The weight balancing made a big change in the model's output; most
importantly, the model starts to truly classify positive cases (model_4_a
classified all cases as negative).
The training accuracy does not change, but the training loss decreased, and
the validation accuracy increased from 87% to 93%, a big improvement in the
model's validation accuracy.
The validation recall is 0.75, which means the model starts to detect
positive cases, but unfortunately it also classifies some negative cases as
positive.
5.3 Model_4_d
Model description
After adding weight balancing and changing the optimizer in model_4_c, the
pixel values of the images are normalized to between 0 and 1 in model_4_d.
Image normalization has a big impact when working with machine learning and
deep learning algorithms.
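For 8-bit images the normalization itself is a single rescaling step (a minimal sketch):

```python
import numpy as np

# 8-bit grayscale tiles arrive with pixel values in [0, 255];
# dividing by 255 rescales them to [0, 1] before training.
batch = np.array([[0, 64],
                  [128, 255]], dtype=np.uint8)
normalized = batch.astype(np.float32) / 255.0
```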
Model architecture
The same structure of model_4_a
The testing recall of the model is 0.72 after 30 epochs, while the validation
accuracy is 0.94, which means the model starts to classify most cases as
positive. After five more epochs the recall becomes 0.77 with the same
validation accuracy, but the validation loss increased. This result makes
sense, of course, given the weight-balancing technique used in this model.
After five more epochs (40 epochs) the model's accuracy and recall decreased
to 0.92 and 0.68, so the model gets worse. We mentioned before that the
testing data, which come from the MIAS image dataset, come from a completely
different distribution than the training data, which come from DDSM images.
After five more epochs (45 epochs) the model is overfitting the training data
(training accuracy = 0.96, loss = 0.03) and the validation accuracy decreased
to 0.91, as shown below for epochs 40 to 45:
5.4 Model_4_e
Model description
Model architecture
The same structure of model_4_a
We notice that after changing the class weights the model gets a higher
validation loss and lower accuracy and recall.
5.5 Model_5_a
Model description
In this model we use transfer learning from a pre-trained network. A
pre-trained model is a saved network that was previously trained on a large
dataset, typically on a large-scale image-classification task. You either use the
pre-trained model as is or use transfer learning to customize this model to a
given task. The intuition behind transfer learning for image classification is
that if a model is trained on a large and general enough dataset, this model
will effectively serve as a generic model of the visual world. You can then take
advantage of these learned feature maps without having to start from scratch
by training a large model on a large dataset.
The network used is the MobileNet V2 model developed at Google. This is
pre-trained on the ImageNet dataset, a large dataset consisting of 1.4M
images and 1000 classes. ImageNet is a research training dataset with a wide
variety of categories.
Model structure
In model_5_a we load the network without the classification layers at the
top, and add one classification layer with only one neuron.
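A sketch of this setup in Keras (illustrative: `weights=None` keeps the example offline, whereas the real model loads the pre-trained ImageNet weights with `weights="imagenet"`):

```python
import tensorflow as tf

# Load MobileNetV2 without its ImageNet classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(299, 299, 3), include_top=False, weights=None)
base.trainable = False  # freeze the pre-trained feature extractor

# Add a single one-neuron classification layer on top.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```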
5.6 Model_5_b
Model description
Here we just add class weights to the model.
Adding weight balancing makes a difference, but not a really big one, so we
decided to add more fully connected layers in model_5_c.
5.7 Model_5_c
Model description
Here we add three fully connected layers to the model, with 1024, 2048, and 1
neurons respectively. We also keep the class weights, with a ratio of 1 to 3
for classes 0 and 1 respectively, but after 15 epochs we change the ratio to
4 to 7.
The two graphs above cover epochs 10 to 45. After epoch 5, the weights were
changed to 4:7 for classes 0 and 1 respectively. The model's training
accuracy increases and its loss decreases as the epochs progress, but the
validation metrics go up and down, so the model may be overfitting the data.
Also, when the validation recall is calculated, the model is missing positive
cases.
5.8 Model_5_d
Model description
In this model we use transfer learning from a pre-trained network, i.e. a
saved network that was previously trained on a large dataset.
The network used is the MobileNet V2 model developed at Google, pre-trained
on the ImageNet dataset, a large dataset consisting of 1.4M images and 1000
classes. ImageNet is a research training dataset with a wide variety of
categories.
In this model we train the top 55 layers of the MobileNet model.
Model structure
In this model we train (or "fine-tune") the weights of the top 55 layers of
the MobileNet model alongside the training of the classifier we added. The
training process forces the weights to be tuned from generic feature maps to
features associated specifically with the breast cancer dataset.
The classifier consists of three fully connected layers with 1024, 2048, and
1 neurons respectively. We also keep the class weights, with a ratio of 3 to
7 for classes 0 and 1 respectively; after 35 epochs we change the ratio to
2 to 8, and after more epochs the ratio is changed again.
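A sketch of this fine-tuning setup in Keras (illustrative: `weights=None` keeps the example offline, and "top 55 layers" is taken to mean the last 55 layers by index):

```python
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(299, 299, 3), include_top=False, weights=None)

# Unfreeze only the top 55 layers; earlier layers keep their
# generic pre-trained feature maps.
base.trainable = True
for layer in base.layers[:-55]:
    layer.trainable = False

# Classifier head: 1024, 2048, and 1 neurons.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dense(2048, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
# A low learning rate keeps fine-tuning from destroying the
# pre-trained weights.
model.compile(optimizer=tf.keras.optimizers.Adam(1e-5),
              loss="binary_crossentropy",
              metrics=["accuracy"])
```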
We start with a class weight ratio of 3:7 for classes 0:1. The model easily
reaches a training accuracy of 0.996 with loss 0.008, and a validation
accuracy of 0.973 with loss 0.279, at epoch 20, which we can say are the best
metrics of this model, where recall(0) = 1 and recall(1) = 0.9
(recall = 0.9, accuracy = 0.97).
We continue training this model with the same ratio to epoch 35, where the
training loss decreased to 0.006, but the extra training does not make a big
change in validation; it only decreases the recall of class 1.
After that we change the ratio to 2:8 for 5 more epochs, and finally we
increase the class-1 weight even more, but the model gets worse with more
training.
5.9 Model_5_e
Model description
Model structure
In this model we train (or "fine-tune") the weights of the top 100 layers of
the MobileNet model alongside the training of the classifier we added. The
training process forces the weights to be tuned from generic feature maps to
features associated specifically with the breast cancer dataset.
The classifier consists of three fully connected layers with 1024, 2048, and
1 neurons respectively. We also keep the class weights, with a ratio of 3 to
7 for classes 0 and 1 respectively; after 35 epochs we change the ratio to
2 to 8, and after more epochs the ratio is changed again.
We use the same class weights with a ratio of 3:7. After 15 epochs the
training accuracy is 99%, the validation accuracy is 98%, and the recall is
95%, which are the best metrics of this model. We continued training and
measuring metrics up to 35 epochs, but with more training epochs the model
gets worse at predicting positive cases.
5.10 GitHub
Running all these models and putting them through training took a very long
time. We also created a GitHub repository that contains more information
about our project, the code of all the models we demonstrated, and more.
Here is the link:
https://github.com/GP-FCI-SU/Breast-Cancer-Detection-with-Convolutional-Neural-Networks
Conclusion & Future work
Chapter 6
Conclusion
We were able to achieve better-than-expected results: 98% validation accuracy
and 95% recall. As a proof of concept, we feel that we have demonstrated that
ConvNets can successfully be trained to predict whether mammograms are normal
or abnormal.
The life-and-death nature of diagnosing cancer creates many obstacles to
putting a system like this into practice. We feel that having the system
output probabilities rather than hard predictions would allow it to provide
additional information to radiologists rather than replacing them. In
addition, the ability to adjust the decision threshold would allow
radiologists to focus on more ambiguous scans while devoting less time to
scans with very low probabilities.
This helps eliminate unnecessary waiting time as well as reducing human and
technical errors in diagnosing breast cancer.
Future work
It would include creating a system that takes an entire, unaltered scan as
input and analyzes it for abnormalities, and that classifies mammogram images
into further classes (benign and malignant).
Other networks will be suggested, including the very deep convolutional
network (VGG) and the residual (ResNet) architecture.
It would also include creating a user interface for the specialists, with a
good user experience, to help them in decision making.
Unfortunately, the lack of available training data seems to be the bottleneck
for pursuing this in the future.
GLOSSARY
• Backward pass (backpropagation): The calculation of internal variable
adjustments according to the optimizer algorithm, starting from the
output layer and working back through each layer to the input.
• Batch: The set of examples used during training of the neural network.
• Dense and Fully Connected (FC): Each node in one layer is connected to
each node in the previous layer.
• Early Stopping: In this method, we track the loss on the validation set
during the training phase and use it to determine when to stop training
such that the model is accurate but not overfitting.
• Learning rate: The “step size” for loss improvement during gradient
descent.
• Loss: The discrepancy between the desired output and the actual output.
• Max Pooling: A pooling process in which many values are converted into
a single value by taking the maximum value from among them.
• Max Pooling (RGB): When working with RGB images we perform max pooling
on each color channel using the same window size and stride. Max
pooling on each color channel is performed in the same way as with
grayscale images, i.e. by selecting the max value in each window.
• MSE: Mean squared error, a type of loss function that counts a small
number of large discrepancies as worse than a large number of small
ones.
• Padding: Adding pixels of some value, usually 0, around the input image.
• Resizing: When working with images of different sizes, you must resize
all the images to the same size so that they can be fed into a CNN.
• RGB Image: Color image composed of 3 color channels: Red, Green, and
Blue.
• Stride: the number of pixels to slide the kernel (filter) across the image.
• Test set: The data used for testing the final performance of our neural
network.
• Training Set: The data used for training the neural network.
• Validation dataset: This dataset is not used for training. Instead, it is
used to test the model during training.