
ACKNOWLEDGMENT

Thanks to Allah for giving us the strength and the courage to complete
a project of such complexity after all we have been through. We would
like to thank our supervisor, Dr. Ahmed Elshewy, for his help and advice.
We also have the pleasure of expressing our deep appreciation to all the staff
members of the Faculty of Computers and Information for their advice and
helpful discussions, especially Dr. Mohammed Salah. We would also like to
especially thank our families and our colleagues for their encouragement and
motivation. Finally, we want to thank this team for working with such
harmony, collaboration, and professionalism on our project.
TEAM MEMBERS

1. Ahmed Mohamed Abdulradi


2. Mohamed Fathy Saleh
3. Omar Hamdy Sedek
4. Dina Ragab Elsayed
5. Shaza Hossam El-Deen
6. Asmaa Nasser Youssef
7. Asmaa Nasser Gharieb
ABSTRACT

Breast cancer is the most frequent cancer among women and the second
most common cancer overall, impacting 2.1 million women each year, and it also
causes the greatest number of cancer-related deaths among women. In 2018, an
estimated 627,000 women died from breast cancer – approximately 15% of all
cancer deaths among women. While breast cancer rates are higher among
women in more developed regions, rates are increasing in nearly every region
globally. Breast cancer is the most prevalent cancer among Egyptian women and
constitutes 29% of National Cancer Institute cases, and the median age at
diagnosis is one decade younger than in Europe and North America.

Our goal is to build a convolutional neural network that works on mammography
images to perform a binary classification: normal or abnormal.

Machine learning (ML) has become a vital part of medical imaging research. ML
methods have evolved over the years from manually seeded inputs to automatic
initializations, and advances in the field have led to increasingly intelligent
systems. In the proposed system, we present a deep convolutional neural network
for breast cancer screening exam classification, trained and evaluated on the
DDSM. The DDSM is a database of 2,620 scanned film mammography studies
containing normal, benign, and malignant cases with verified pathology
information. To increase the amount of training data we extract the Regions of
Interest (ROI) from each image, perform data augmentation, and then train
ConvNets on the augmented data. The ConvNets were trained to predict whether
a scan was normal or abnormal. Multiple datasets were created using different
ROI extraction techniques and amounts of data augmentation, ranging in size
from 27,000 to 62,000 training images. Different models performed differently
on different datasets with different classification methods. The results indicate
that the best model achieved an accuracy of 99.6%, a recall of 95%, and a loss
of 0.008, while the others traded off precision and recall. We undertook this
work to support the efforts of institutions in our community fighting cancer,
and breast cancer in particular, such as Baheya Hospital and the Egyptian
National Cancer Institute.
TABLE OF CONTENTS
Chapter 1: introduction
1.1 Artificial Intelligence ……………………………………………………………………………………………….. 8
1.1.1 Difference between Human and Machine Intelligence …………………………………………… 9
1.1.2 Applications of AI ……………………………………………………………………………………………………………… 9
1.1.3 Artificial Intelligence in Medicine ………………………………………………………………………………….. 10
1.2 Machine Learning …………………………………………………………………………………………………… 11
1.2.1 Difference between Machine Learning and Artificial Intelligence ………………………… 11
1.2.2 Types of Machine Learning …………………………………………………………………………………………… 12
1.2.3 Machine Learning: How it works? …………………………………………………………………………. 13
1.3 Deep Learning ………………………………………………………………………………………………………… 14
1.3.1 Deep Learning: How it works? ……………………………………………………………………………….. 15
1.4 Deep neural networks ……………………………………………………………………………………..…… 16
1.4.1 What is the difference between neural networks, DL, ML and AI? ……………………… 17
1.4.2 How is deep learning being used? ……………………………………………………………………………… 18
1.4.3 Where best to apply deep learning? ……………………………………………………………………….… 19
1.4.4 How long does it take to train a deep learning model? …………………………….………….. 19

Chapter 2: Neural network


2.1 Artificial Neural Network ……………………………………………………………………………………… 20
2.1.1 How can ANN recognize patterns? …………………………………………………………………………….. 20
2.1.2 Types of Neural Networks ………………………………………………………………………………………….… 22
2.1.3 Activation Functions ……………………………………………………………………………………………………... 23
2.1.4 Types of activation functions ……………………………………………………………………………………… 24
2.1.5 Gradient descent ……………………………………………………………………………………………………………. 26
2.1.6 Versions of gradient descent ………………………………………………………………………………………. 26
2.1.7 Optimization techniques for Gradient Descent ……………………………………………………… 27
2.1.8 Loss Function ……………………………………………………………………………………………………………….… 27
2.2 Convolutional Neural Networks ……………………………………………………………………… 28
2.2.1 CNN Architecture ………………………………………………………………………………………………………….. 28
2.2.2 Layers in a Convolutional Neural Network ………………………………………………………….…. 29
2.2.3 What do we need to create a CNN Model? ……………………………………………………..….… 34
Chapter 3: Technologies & Tools
3.1 Python ……………………………………………………………………………………………………………..… 35
3.1.1 Python in AI ……………………………………………………………………………………………………………….. 36
3.1.2 Why Python is the most popular programming language used for AI? ………. 36
3.2 Pytorch …………………………………………………………………………………………………………….… 40
3.2.1 A brief history of PyTorch ……………………………………………………………………………………..… 41
3.2.2 PyTorch Community ……………………………………………………………………………………………….. 42
3.2.3 Why use PyTorch in AI? ………………………………………………………………………………………….. 43
3.2.4 PyTorch-Approach …………………………………………………………………………………………………… 44
3.3 MATLAB ……………………………………………………………………………………………………….....… 45
3.4 What is TensorFlow? ……………………………………………………………………………..…….…. 50
3.4.1 TensorFlow Architecture …………………………………………………………………………………………. 50
3.4.2 Where can TensorFlow run? …………………………………………………………………………..…….. 51
3.4.3 Components of TensorFlow ……………………………………………………………………………....… 52
3.4.4 Why is TensorFlow popular? ………………………………………………………………………………… 53
3.4.5 Program Elements in TensorFlow …………………………………………………………….……….... 53
3.5 What is Keras? ………………………………………………………………………………………………..... 54
3.5.1 Keras VS TensorFlow ………………………………………………………………………………………………. 55
3.5.2 Advantages of Keras ……………………………………………………………………………………………….. 56
3.5.3 Keras Fundamentals for Deep Learning …………………………………………………………….. 57
3.5.4 Compiling, Training, and Evaluate ……………………………………………………………………….. 58
3.5.5 Keras Applications ……………………………………………………………………………………………………. 59

Chapter 4: Preprocessing
4.1 dataset ………………………………………………………………………………………………….……………. 60
4.1.1 training dataset ………………………………………………………………………………………….……….……. 61
4.2 preprocessing ……………………………………………………………………………………….…………. 63
4.2.1 ROI extraction method 1 …………………………………………………………………………….……….… 64
4.2.2 ROI extraction method 2 …………………………………………………………………………….…….…. 64
4.2.3 Normal images ……………………………………………………………………………………………….…….… 65
4.2.4 MIAS images ……………………………………………………………………………………………….……… 66
4.2.5 image preprocessing techniques ……………………………………………………………………………… 66
4.2.5.1 standardize images ……………………………………………………………………………………. 68
4.2.5.2 data augmentation ……………………………………………………………………………………. 68
4.2.5.3 scaling images ……………………………………………………………………………………………. 69
4.2.5.4 flipping images …………………………………………………………………………………………… 69
4.2.5.5 image rotation ……………………………………………………………………………………………. 70
4.2.5.6 resize images ………………………………………………………………………………………………. 71
4.2.5.7 threshold ………………………………………………………………………………………………………. 73
4.3 Image segmentation ………………………………………………………………………………………………………… 74
4.3.1 Mask ……………………………………………………………………………………………………………….….. 75
4.3.2 Scan …………………………………………………………………………………………………………………… 76

Chapter 5: Training ConvNets


5.1 Model_4_a …………………………………………………………………………………………….….. 77
5.2 Model_4_c ………………………………………………………………………………………….……. 79
5.3 Model_4_d ………………………………………………………………………………………….….… 81
5.4 Model_4_e …………………………………………………………………………………..…… 84
5.5 Model_5_a ………………………………………………………………………………………………… 85
5.6 Model_5_b ……………………………………………………………………………………………….. 86
5.7 Model_5_c ………………………………………………………………………………………………... 87
5.8 Model_5_d ……………………………………………………………………………………………..… 88
5.9 Model_5_e ………………………………………………………………………………………..……… 91
5.10 GitHub ……………………………………………………………………………………………………… 93

Chapter 6: Conclusion & future work


6.1 conclusion …………………………………………………………………………………………………… 94
6.2 future work …………………………………………………………………………………………………. 94
Glossary
References
Chapter 1

INTRODUCTION
In this chapter we briefly discuss Artificial Intelligence, Machine Learning,
and Deep Learning, showing how these fields and their subtopics helped us
determine the most convenient technology and approach to adopt in the
project implementation.

1.1 Artificial Intelligence


Since the invention of computers and machines, their capability to
perform various tasks has grown exponentially. Humans have developed the
power of computer systems in terms of their diverse working domains and
increasing speed, while reducing their size over time.
Artificial Intelligence is an approach to making a computer, a robot, or
a product think the way intelligent humans think. AI is the study of how the
human brain thinks, learns, decides, and works when it tries to solve problems,
and this study ultimately produces intelligent software systems. The aim of AI
is to improve computer functions related to human knowledge, for example
reasoning, learning, and problem-solving.


Intelligence is intangible, and it is composed of a number of capabilities:


● Reasoning: The set of processes that enables us to provide the basis
for judgement, decision-making, and prediction.
● Learning: The activity of gaining knowledge or skill by studying,
practicing, being taught, or experiencing something.
● Problem Solving: The process in which one perceives and tries to
arrive at a desired solution from a present situation by taking some
path which is blocked by known or unknown hurdles.
● Perception: The process of acquiring, interpreting, selecting, and
organizing sensory information.
● Linguistic Intelligence: One's ability to use, comprehend, speak, and
write verbal and written language. It is important in interpersonal
communication.

1.1.1 Difference between Human and Machine Intelligence


● Humans perceive by patterns whereas machines perceive by a set of
rules and data.
● Humans store and recall information by patterns; machines do it using
search algorithms. For example, the number 40404040 is easy to
remember, store, and recall because its pattern is simple.
● Humans can figure out the complete object even if some part of it is
missing or distorted, whereas machines cannot do this correctly.

1.1.2 Applications of AI

1. Gaming − AI plays an important role in enabling machines to evaluate
a large number of possible positions, based on deep knowledge, in
strategic games, for example chess, river crossing, and the N-queens
problem.

2. Natural Language Processing − Interacting with a computer that
understands the natural language spoken by humans.


3. Expert Systems − Machines or software that provide explanation and
advice to users.

4. Vision Systems − Systems that understand, explain, and describe
visual input on the computer.

5. Speech Recognition − Some AI-based speech recognition systems can
hear a person talking, express what was said as sentences, and
understand the meaning. Examples include Siri and Google Assistant.

6. Handwriting Recognition − Handwriting recognition software reads
the text written on paper, recognizes the shapes of the letters, and
converts it into editable text.

7. Intelligent Robots − Robots able to perform the instructions given by
a human.

1.1.3 Artificial Intelligence in Medicine:


AI in medicine refers to the use of artificial intelligence technology and
automated processes in the diagnosis and treatment of patients who require
care. While diagnosis and treatment may seem like simple steps, many other
background processes must take place in order for a patient to be properly
taken care of, for example:
● Gathering of data through patient interviews and tests
● Processing and analyzing results
● Using multiple sources of data to come to an accurate diagnosis
● Determining an appropriate treatment method (often presenting
options)
● Preparing and administering the chosen treatment method
● Patient monitoring
● Aftercare, follow-up appointments etc.


The argument for increased use of AI in medicine is that quite a lot of the
above could be automated: automation often means tasks are completed
more quickly, and it frees up a medical professional's time for other duties,
ones that cannot be automated and so are seen as a more valuable use of
human resources.

1.2 Machine Learning


Machine Learning is a sub-area of artificial intelligence, whereby the
term refers to the ability of IT systems to independently find solutions to
problems by recognizing patterns in databases. In other words: Machine
Learning enables IT systems to recognize patterns on the basis of existing
algorithms and data sets and to develop adequate solution concepts.
Therefore, in Machine Learning, artificial knowledge is generated on the basis
of experience.

1.2.1 Difference between Machine Learning and Artificial Intelligence
While artificial intelligence (AI) is the broad science of mimicking
human abilities, machine learning is a specific subset of AI that trains a
machine how to learn.
In order to enable the software to independently generate solutions, prior
human action is necessary. For example, the required algorithms and data
must be fed into the systems in advance, and the respective analysis rules
for recognizing patterns in the data stock must be defined. Once these two
steps have been completed, the system can perform the following tasks using
Machine Learning:
• Finding, extracting, and summarizing relevant data
• Making predictions based on the analysis data
• Calculating probabilities for specific results
• Adapting to certain developments autonomously
• Optimizing processes based on recognized patterns


1.2.2 Types of Machine Learning

Algorithms play an important role in Machine Learning: on the one hand,
they are responsible for recognizing patterns, and on the other hand, they
can generate solutions. Algorithms can be divided into different categories:
Supervised learning: Train by using labeled data and learn and predict new
labels for unseen input data.
• Classification is the task of predicting a discrete class label, such as
“black, white, or gray” and “tumor or not tumor”.
• Regression is the task of predicting a continuous quantity, such as
“weight”, “probability”, and “cost”.

Unsupervised learning: Detect patterns and relationships between data
without using labeled data.
• Clustering algorithms: Discover how to split the data set into a
number of groups such that the data points in the same groups are
more similar to each other compared to data points in other groups.

Semi-supervised learning: A machine learning technique that falls between
supervised and unsupervised learning.
• It combines some labeled data with a large amount of unlabeled
data.
• Here is an example that uses pseudo-labeling:
1. Use labeled data to train a model.
2. Use the model to predict labels for the unlabeled data.
3. Use the labeled data and the newly generated labeled data to
create a new model.
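
A minimal sketch of this pseudo-labeling loop in Python, using scikit-learn's
RandomForestClassifier on synthetic data (the classifier choice, the 0.95
confidence threshold, and the labeled/unlabeled split are illustrative
assumptions, not part of the procedure above):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Illustrative data: pretend only 100 of 1,000 samples are labeled.
    X, y = make_classification(n_samples=1000, random_state=0)
    X_lab, y_lab, X_unlab = X[:100], y[:100], X[100:]

    # Step 1: train an initial model on the labeled data only.
    model = RandomForestClassifier(random_state=0).fit(X_lab, y_lab)

    # Step 2: predict labels for the unlabeled data; keep confident ones.
    proba = model.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.95   # assumed confidence threshold
    pseudo_y = proba.argmax(axis=1)[confident]

    # Step 3: retrain on the labeled plus pseudo-labeled data combined.
    X_new = np.vstack([X_lab, X_unlab[confident]])
    y_new = np.concatenate([y_lab, pseudo_y])
    model = RandomForestClassifier(random_state=0).fit(X_new, y_new)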


Reinforcement learning: Reinforcement learning uses trial and error (a
reward-based approach).
• The algorithm discovers an association between the goal and the
sequence of events that leads to a successful outcome.
• Example reinforcement learning applications: robotics (a robot that
must find its way) and self-driving cars.

Machine learning is a data analytics technique that teaches computers to do
what comes naturally to humans and animals: learn from experience.
Machine learning algorithms use computational methods to “learn”
information directly from data without relying on a predetermined equation
as a model.

The algorithms adaptively improve their performance as the number of
samples available for learning increases. Deep learning is a specialized form
of machine learning.
With the rise in big data, machine learning has become a key technique for
solving problems in areas, such as:
• Computational finance, for credit scoring and algorithmic trading.
• Image processing and computer vision, for face recognition, motion
detection, and object detection.
• Computational biology, for tumor detection, drug discovery, and DNA
sequencing.
• Energy production, for price and load forecasting.
• Automotive, aerospace, and manufacturing, for predictive
maintenance.
• Natural language processing, for voice recognition applications.


1.2.3 Machine Learning: How it works?


In a way, Machine Learning works in a similar way to human learning.
For example, if a child is shown images with specific objects on them, they
can learn to identify and differentiate between them. Machine Learning
works in the same way: Through data input and certain commands, the
computer is enabled to "learn" to identify certain objects (persons, objects,
etc.) and to distinguish between them. For this purpose, the software is
supplied with data and trained. For instance, the programmer can tell the
system that a particular object is a human being (="human") and another
object is not a human being (="no human"). The software receives continuous
feedback from the programmer. These feedback signals are used by the
algorithm to adapt and optimize the model. With each new data set fed into
the system, the model is further optimized so that it can clearly distinguish
between "humans" and "non-humans" in the end.

1.3 Deep Learning

● Deep Learning is a subfield of machine learning concerned with
algorithms inspired by the structure and function of the brain, called
artificial neural networks.

● Deep learning is an artificial intelligence function that imitates the
workings of the human brain in processing data and creating patterns
for use in decision making.

● Deep learning learns from vast amounts of unstructured data that
would normally take humans decades to understand and process.


● Deep learning is a field based on learning and improving on its own
by examining computer algorithms. While machine learning uses
simpler concepts, deep learning works with artificial neural networks,
which are designed to imitate how humans think and learn. Until
recently, neural networks were limited by computing power and thus
were limited in complexity. However, advancements in big data
analytics have permitted larger, more sophisticated neural networks,
allowing computers to observe, learn, and react to complex situations
faster than humans. Deep learning has aided image classification,
language translation, and speech recognition; it can be used to solve
any pattern recognition problem without human intervention.

● Also known as deep neural learning or deep neural network.

1.3.1 Deep Learning: How it works?


Deep learning has evolved hand in hand with the digital era, which has
brought about an explosion of data in all forms and from every region of the
world.
This data, known simply as big data, is drawn from sources like social
media, internet search engines, e-commerce platforms, and online cinemas,
among others. This enormous amount of data is readily accessible and can
be shared through applications like cloud computing.
Neural networks are comprised of layers of nodes, much like the human brain
is made up of neurons. Nodes within individual layers are connected to
adjacent layers. The network is said to be deeper based on the number of
layers it has. A single neuron in the human brain receives thousands of signals
from other neurons. In an artificial neural network, signals travel between
nodes and assign corresponding weights. A heavier weighted node will exert
more effect on the next layer of nodes. The final layer compiles the weighted


inputs to produce an output. Deep learning systems require powerful
hardware because they process large amounts of data and involve several
complex mathematical calculations. Even with such advanced hardware,
however, deep learning training computations can take weeks.
Deep learning systems require large amounts of data to return accurate
results; accordingly, information is fed as huge data sets. When processing
the data, artificial neural networks are able to classify data with the answers
received from a series of binary true or false questions involving highly
complex mathematical calculations. For example, a facial recognition
program works by learning to detect and recognize edges and lines of faces,
then more significant parts of the faces, and, finally, the overall
representations of faces. Over time, the program trains itself, and the
probability of correct answers increases. In this case, the facial recognition
program will accurately identify faces with time.

1.4 Deep neural networks


A deep neural network (DNN) is an artificial neural network (ANN) with
multiple layers between the input and output layers. The DNN finds the
correct mathematical manipulation to turn the input into the output, whether
it be a linear relationship or a non-linear relationship.
The network moves through the layers calculating the probability of each
output. For example, a DNN that is trained to recognize dog breeds will go
over the given image and calculate the probability that the dog in the image
is a certain breed. The user can review the results and select which
probabilities the network should display and return the proposed label. Each
mathematical manipulation as such is considered a layer, and complex DNNs
have many layers, hence the name "deep" networks.


DNNs are typically feedforward networks in which data flows from the input
layer to the output layer without looping back. At first, the DNN creates a map
of virtual neurons and assigns random numerical values, or "weights", to
connections between them. The weighted inputs are summed and passed
through an activation that returns an output between 0 and 1. If the network
does not accurately recognize a particular pattern, an algorithm adjusts the
weights. That way the algorithm can make certain parameters more
influential, until it determines the correct mathematical manipulation to fully
process the data.
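
A minimal NumPy sketch of this forward pass for a single layer (the layer
sizes, random weights, and sigmoid activation are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)

    # Random initial "weights" on the connections, as described above.
    W = rng.normal(size=(3, 4))   # 4 inputs -> 3 virtual neurons
    b = np.zeros(3)               # biases

    def sigmoid(z):
        # Squashes each weighted sum into an output between 0 and 1.
        return 1.0 / (1.0 + np.exp(-z))

    x = rng.normal(size=4)        # one input example
    output = sigmoid(W @ x + b)   # multiply weights and inputs, then activate
    print(output)                 # three values, each between 0 and 1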

1.4.1 What is the difference between neural networks, deep learning,
machine learning and AI?

• Artificial intelligence is the study of how to build machines capable of
carrying out tasks that would typically require human intelligence.

• Machine learning is the process of teaching a computer to carry out a
task, rather than programming it how to carry out that task step by
step.


• Deep learning is a subset of machine learning, whose capabilities differ
in several key respects from traditional shallow machine learning,
allowing computers to solve a host of complex problems that couldn't
otherwise be tackled.

• Neural networks are mathematical models whose structure is loosely
inspired by that of the brain.

• Each neuron within a neural network is a mathematical function that
takes in data via an input, transforms that data into a more amenable
form, and then spits it out via an output. You can think of neurons in a
neural network as being arranged in layers.

1.4.2 How is deep learning being used?

Deep learning is used for many tasks: recognizing and generating images,
speech, and language, and, in combination with reinforcement learning,
matching human-level performance in games ranging from the ancient, such
as Go, to the modern, such as Dota 2 and Quake III.

Deep-learning systems are a foundation of modern online services. Such
systems are used by Amazon to understand what you say -- both your
speech and the language you use -- to the Alexa virtual assistant, or by Google
to translate text when you visit a foreign-language website.
Every Google search uses multiple machine-learning systems, from
understanding the language in your query through to personalizing your
results.


1.4.3 Where best to apply deep learning?

Deep learning is best applied when your data is largely unstructured and you
have a lot of it. Deep learning algorithms can take messy and broadly
unlabeled data such as video, images, audio recordings, and text.
It is also best applied to complex problems that would be vastly expensive
to solve with human decision-making; image processing is a great example.
Finally, deep learning is only appropriate if you have the high-end computing
power to make it work, or are partnering with an analytics provider who has
the infrastructure and skills that might be lacking in-house.

1.4.4 How long does it take to train a deep learning model?


The time taken to train a deep-learning model varies hugely, from
hours to weeks or more, and is dependent on factors such as the available
hardware, optimization, the number of layers in the neural network, the
network architecture, the size of the dataset and more.

Chapter 2

NEURAL NETWORK

2.1 Artificial Neural Network (ANN)


Neural networks represent deep learning using artificial intelligence. A
neural network is an algorithm inspired by the neurons in our brain. It is
designed to recognize patterns in complex data, and often performs best
when recognizing patterns in audio, images, or video.

2.1.1 How can ANN recognize patterns?


They interpret sensory data through a kind of machine perception,
labeling or clustering raw input. The patterns they recognize are numerical,
contained in vectors, into which all real-world data, be it images, sound, text,
or time series, must be translated.
Neural networks help us cluster and classify. You can think of them as a
clustering and classification layer on top of the data you store and manage.
They help to group unlabeled data according to similarities among the
example inputs, and they classify data when they have a labeled dataset to
train on.


A neural network simply consists of neurons (also called nodes). These nodes
are connected in some way. Each neuron holds a number, and each
connection holds a weight.
These neurons are split between the input, hidden, and output layers. In
practice, there are many layers, and there is no generally best number of
layers.

Input Layer: The input layer contains the inputs and weights.

Hidden Layer: In a neural network, there can be more than one hidden
layer. A hidden layer contains the summation and activation function.
Output Layer: The output layer consists of the set of results generated by
the previous layer. It also contains the desired values, i.e. values that are
already present in the output layer to check against the values generated by
the previous layer. These may also be used to improve the end results.

From another perspective, an artificial neuron is also called a perceptron. It
consists of the following basic terms:
● Input
● Weight
● Bias
● Activation Function
● Output
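
A minimal sketch of a single perceptron built from these five parts, in NumPy
(the particular weights, bias, and step activation are illustrative assumptions):

    import numpy as np

    def perceptron(x, w, b):
        # Weighted sum of inputs plus bias...
        z = np.dot(w, x) + b
        # ...passed through a step activation function.
        return 1 if z > 0 else 0

    x = np.array([0.5, -1.0, 2.0])   # input
    w = np.array([0.4, 0.3, 0.9])    # weights
    b = -1.0                         # bias
    print(perceptron(x, w, b))       # output: 1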


2.1.2 Types of Neural Networks


The different types of neural networks are discussed below:

1) Feed-forward Neural Network

This is the simplest form of ANN (artificial neural network);


data travels only in one direction (input to output). This is the
example we just looked at. When you actually use it, it`s fast; when
you’re training it, it takes a while. Almost all vision and speech
recognition applications use
some form of this type of neural network.

2) Radial Basis Functions Neural Network

This model classifies the data point based on its distance from
a center point. If you don’t have training data, for example, you’ll
want to group things and create a center point. The network looks for
data points that are similar to each other and groups them. One of
the applications for this is power restoration systems.

3) Kohonen Self-organizing Neural Network

Vectors of random input are input to a discrete map comprised
of neurons. Vectors are also called dimensions or planes. Applications
include using it to recognize patterns in data, such as in medical analysis.

4) Recurrent Neural Network

In this type, the hidden layer saves its output to be used for
future prediction. The output becomes part of its new input.
Applications include text-to-speech conversion.


5) Convolution Neural Network

In this type, the input features are taken in batches, as if they
pass through a filter. This allows the network to remember an
image in parts. Applications include signal and image processing,
such as facial recognition.

6) Modular Neural Network

This is composed of a collection of different neural networks
working together to get the output. This is cutting-edge and is still
in the research phase.

2.1.3 Activation Functions


Activation functions are mathematical equations that determine the
output of a neural network. The function is attached to each neuron in the
network and determines whether it should be activated ("fired") or not, based
on whether each neuron's input is relevant for the model's prediction.
Activation functions also help normalize the output of each neuron to a
range between 0 and 1 or between -1 and 1.


2.1.4 Types of activation functions

1) Sigmoid Function

The sigmoid function is used when the model is predicting
probability.

2) Threshold Function

The threshold function is used when you don’t want to worry about
the uncertainty in the middle.


3) ReLU (rectified linear unit) Function

The ReLU (rectified linear unit) function passes a value through
unchanged if it is positive, and outputs 0 if the value is less than 0. The ReLU
function is the most commonly used these days.

4) Hyperbolic Tangent Function

The hyperbolic tangent function is similar to the sigmoid function but
has a range of -1 to 1.
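
A minimal NumPy sketch of the four activation functions described above
(the sample inputs are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))    # range (0, 1)

    def threshold(z):
        return np.where(z >= 0, 1.0, 0.0)  # hard 0/1 step

    def relu(z):
        return np.maximum(0.0, z)          # 0 for negatives, identity otherwise

    def tanh(z):
        return np.tanh(z)                  # range (-1, 1)

    z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    for f in (sigmoid, threshold, relu, tanh):
        print(f.__name__, f(z))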


2.1.5 Gradient descent


The most used algorithm to train neural networks is gradient descent.
And what is this gradient? The gradient is a numerical calculation that tells
us how to adjust the parameters of a network in such a way that its output
deviation is minimized.

Gradient Descent is an iterative optimization algorithm used to find the
minimum value of a function. The general idea is to initialize the parameters
to random values and then take small steps in the direction of the "slope" at
each iteration. Gradient descent is widely used in supervised learning to
minimize the error function and find the optimal values for the parameters.
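
A minimal sketch of gradient descent minimizing the one-parameter function
f(w) = (w - 3)^2 (the function, learning rate, and iteration count are
illustrative assumptions):

    # Minimize f(w) = (w - 3)**2; its gradient is f'(w) = 2 * (w - 3).
    w = 0.0            # initial parameter value
    lr = 0.1           # learning rate: size of each step along the slope
    for _ in range(100):
        grad = 2 * (w - 3)   # gradient at the current parameter value
        w -= lr * grad       # step in the direction that reduces f
    print(w)           # ~3.0, the minimum of f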

2.1.6 Versions of gradient descent


The algorithm has several versions depending on the number of
samples that we introduce to the network for each iteration:

• Batch gradient descent: all available data is injected at once. This
version implies a high risk of getting stuck, since the gradient is
calculated using all the samples and the variations will become minimal
sooner or later. As a general rule, for a neural network it is always
positive to have an input with some randomness.

• Stochastic gradient descent: a single random sample is introduced on
each iteration. The gradient is calculated for that specific sample
only, which introduces the desired randomness and makes it harder
to get stuck.

• Mini-batch (stochastic) gradient descent: instead of feeding the
network single samples, N random items are introduced on each
iteration. This preserves the advantages of the second version while
also achieving faster training due to the parallelization of operations.
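
A minimal sketch contrasting the three versions, which differ only in batch
size, for a simple linear model trained with NumPy (the data, model, learning
rate, and epoch count are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 1))
    y = 4 * X[:, 0] + rng.normal(scale=0.1, size=1000)  # true slope = 4

    def train(batch_size, epochs=20, lr=0.1):
        w = 0.0
        n = len(X)
        for _ in range(epochs):
            idx = rng.permutation(n)             # random order each epoch
            for start in range(0, n, batch_size):
                batch = idx[start:start + batch_size]
                pred = w * X[batch, 0]
                grad = 2 * np.mean((pred - y[batch]) * X[batch, 0])
                w -= lr * grad
        return w

    print(train(batch_size=len(X)))  # batch gradient descent
    print(train(batch_size=1))       # stochastic gradient descent
    print(train(batch_size=32))      # mini-batch gradient descent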


2.1.7 Optimization techniques for Gradient Descent

Some examples of optimization algorithms include:

● AdaDelta
● AdaGrad
● Adam
● Nesterov momentum
● RMSProp
● SGD
● Conjugate gradient
● Hessian-free optimization
● L-BFGS
● Line gradient descent

2.1.8 Loss Function


The Loss Function is one of the important components of Neural
Networks. Loss is nothing but the prediction error of the neural net, and the
method used to calculate it is called the Loss Function. In simple words, the
loss is used to calculate the gradients, and the gradients are used to update
the weights of the neural net. This is how a neural net is trained.

● Mean Squared Error (MSE): MSE loss is used for regression tasks.
As the name suggests, this loss is calculated by taking the mean of
squared differences between actual(target) and predicted values.

● Binary Cross entropy (BCE): BCE loss is used for binary
classification tasks. If you are using the BCE loss function, you just
need one output node to classify the data into two classes. The
output value should be passed through a sigmoid activation
function, and the range of the output is (0 – 1).


● Categorical Cross entropy (CCE): When we have a multi-class
classification task, this is one of the loss functions you can go with.
If you are using the CCE loss function, there must be the same
number of output nodes as there are classes, and the final layer
output should be passed through a SoftMax activation so that each
node outputs a probability value between 0 and 1.

● Sparse Categorical Cross entropy (SCCE): This loss function is
almost identical to CCE except for one change. When using the
SCCE loss function, you do not need to one-hot encode the target
vector. If the target image is of class 0 (say, a cat), you simply pass
0; basically, whichever class it is, you just pass the index of that
class.
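
A minimal Keras sketch showing how the choice of loss function pairs with
the final layer, as described above (the layer sizes, input shape, and the Adam
optimizer are illustrative assumptions):

    from tensorflow import keras

    # Binary classification: one sigmoid output node + binary cross entropy.
    binary_model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(32,)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    binary_model.compile(optimizer="adam", loss="binary_crossentropy",
                         metrics=["accuracy"])

    # Multi-class: one softmax node per class + (sparse) categorical
    # cross entropy; "sparse" means integer class indices, not one-hot.
    multi_model = keras.Sequential([
        keras.layers.Dense(16, activation="relu", input_shape=(32,)),
        keras.layers.Dense(10, activation="softmax"),  # 10 classes
    ])
    multi_model.compile(optimizer="adam",
                        loss="sparse_categorical_crossentropy",
                        metrics=["accuracy"])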

2.2 Convolutional Neural Networks (CNNs / ConvNets)


Convolutional Neural Networks are very similar to the ordinary Neural
Networks of the previous sections: they are made up of neurons that have
learnable weights and biases. Each neuron receives some inputs, performs a
dot product, and optionally follows it with a non-linearity.

ConvNet architectures make the explicit assumption that the inputs are
images, which allows us to encode certain properties into the architecture.
These then make the forward function more efficient to implement and
vastly reduce the number of parameters in the network.

2.2.1 CNN Architecture


A simple ConvNet is a sequence of layers, and every layer of a ConvNet
transforms one volume of activations to another through a differentiable
function. We use three main types of layers to build ConvNet architectures:
the Convolutional Layer, Pooling Layer, and Fully-Connected Layer (exactly as
seen in regular Neural Networks). We stack these layers to form a full
ConvNet architecture.
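
A minimal Keras sketch of such a stack of Convolutional, Pooling, and
Fully-Connected layers (the filter counts and input size are illustrative
assumptions, not our project's final architecture):

    from tensorflow import keras

    model = keras.Sequential([
        # Convolutional layer: learn 32 filters over the input image.
        keras.layers.Conv2D(32, (3, 3), activation="relu",
                            input_shape=(64, 64, 1)),
        # Pooling layer: down-sample the feature maps.
        keras.layers.MaxPooling2D((2, 2)),
        # Flatten, then Fully-Connected layers for classification.
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # normal vs. abnormal
    ])
    model.summary()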


2.2.2 Layers in a Convolutional Neural Network


A convolution neural network has multiple hidden layers that help in
extracting information from an image. The four important layers in CNN are:
1. Convolution layer
2. ReLU layer
3. Pooling layer
4. Fully connected layer

Convolution Layer
This is the first step in the process of extracting valuable features from
an image. A convolution layer has a number of filters that perform the
convolution operation. Every image is considered as a matrix of pixel values.
Consider the following 5x5 image whose pixel values are either 0 or 1. There’s
also a filter matrix with a dimension of 3x3. Slide the filter matrix over the
image and compute the dot product to get the convolved feature matrix.
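
A minimal NumPy sketch of this sliding dot product for a 5x5 binary image
and a 3x3 filter (the particular pixel values and filter are illustrative
assumptions):

    import numpy as np

    image = np.array([[1, 1, 1, 0, 0],
                      [0, 1, 1, 1, 0],
                      [0, 0, 1, 1, 1],
                      [0, 0, 1, 1, 0],
                      [0, 1, 1, 0, 0]])
    kernel = np.array([[1, 0, 1],
                       [0, 1, 0],
                       [1, 0, 1]])

    # Slide the 3x3 filter over the 5x5 image: the output is 3x3.
    out = np.zeros((3, 3))
    for i in range(3):
        for j in range(3):
            patch = image[i:i + 3, j:j + 3]
            out[i, j] = np.sum(patch * kernel)  # element-wise dot product
    print(out)  # the convolved feature matrix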


ReLU Layer

ReLU stands for the rectified linear unit. Once the feature maps are
extracted, the next step is to move them to a ReLU layer.

ReLU performs an element-wise operation and sets all the negative pixels to
0. It introduces non-linearity to the network, and the generated output is a
rectified feature map.

The original image is scanned with multiple convolution and ReLU layers to
locate the features.


Pooling Layer

Pooling is a down-sampling operation that reduces the dimensionality
of the feature map. The rectified feature map now goes through a pooling
layer to generate a pooled feature map.


The pooling layer uses various filters to identify different parts of the image,
like edges, corners, body, feathers, eyes, and beak.

The next step in the process is called flattening. Flattening is used to convert
all the resultant 2-dimensional arrays from the pooled feature maps into a
single long continuous linear vector.
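
A minimal NumPy sketch of 2x2 max pooling followed by flattening (the
input feature map is an illustrative assumption):

    import numpy as np

    feature_map = np.array([[1, 3, 2, 1],
                            [4, 6, 5, 0],
                            [3, 1, 1, 2],
                            [0, 2, 4, 3]])

    # 2x2 max pooling with stride 2: keep the maximum of each 2x2 block.
    pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
    print(pooled)    # [[6, 5], [3, 4]]

    # Flattening: convert the pooled 2-D map into one long vector.
    flat = pooled.flatten()
    print(flat)      # [6, 5, 3, 4]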


The flattened matrix is fed as input to the fully connected layer to classify
the image.

Fully connected
Fully connected layers connect every neuron in one layer to every
neuron in another layer. It is in principle the same as the traditional
multi-layer perceptron neural network (MLP). The flattened matrix goes
through a fully connected layer to classify the images.


2.2.3 What do we need to create a CNN Model?


Creating a Supervised Machine Learning model is all about making a program
that is able to generalize to input samples that it has never seen before. This
task requires exposing the model — during training — to a certain number of
variations of input examples, which is likely to lead to sufficient accuracy. This
includes multiple steps that the model has to go through before it is available
for use:
Step 1: Making the model examine data.
Step 2: Making the model learn from its mistakes.
Step 3: Making a conclusion on how well the model performs.

Training Set
This dataset corresponds to Step 1 in the previous section. It includes the set
of input examples that the model will be fit into — or trained on — by adjusting
the parameters (i.e. weights in the context of Neural Networks).

Validation Set
In order for the model to be trained, it needs to periodically be evaluated
(Step 2), and that is exactly what the validation set is for. Through calculating
the loss (i.e. error rate) the model yields on the validation set at any given
point, we can know how accurate it is. This is the essence of training.
Subsequently, the model will tune its parameters based on the frequent
evaluation results on the validation set.

Test Set
This corresponds to the final evaluation that the model goes through after
the training phase (utilizing training and validation sets) has been completed.
This step is critical to test the generalizability of the model (Step 3). By using
this set, we can get the working accuracy of our model.
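
A minimal Keras sketch of the three steps and the three datasets described
above (the synthetic arrays, split fractions, and the tiny placeholder network
are illustrative assumptions):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from tensorflow import keras

    # Illustrative data: 1,000 grayscale 64x64 "scans" with binary labels.
    X = np.random.rand(1000, 64, 64, 1).astype("float32")
    y = np.random.randint(0, 2, size=1000)

    # Hold out a test set for the final evaluation (Step 3).
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # A tiny placeholder network for the sake of the example.
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(64, 64, 1)),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Steps 1 and 2: fit on the training set while evaluating, each epoch,
    # on a validation set carved out of the training data.
    model.fit(X_train, y_train, epochs=3, validation_split=0.2)

    # Step 3: a final, unbiased measure of generalization on unseen data.
    test_loss, test_acc = model.evaluate(X_test, y_test)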

Chapter 3

TECHNOLOGIES AND TOOLS

3.1 Python

Python is an interpreted, object-oriented, high-level programming
language with dynamic semantics. Its high-level built-in data structures,
combined with dynamic typing and dynamic binding, make it very attractive
for Rapid Application Development, as well as for use as a scripting or glue
language to connect existing components together. Python's simple, easy to
learn syntax emphasizes readability and therefore reduces the cost of
program maintenance. Python supports modules and packages, which
encourages program modularity and code reuse. The Python interpreter and
the extensive standard library are available in source or binary form without
charge for all major platforms and can be freely distributed.

Often, programmers fall in love with Python because of the increased
productivity it provides. Since there is no compilation step, the edit-test-debug
cycle is incredibly fast. Debugging Python programs is easy: a bug or
bad input will never cause a segmentation fault. Instead, when the interpreter
discovers an error, it raises an exception. When the program doesn't catch the
exception, the interpreter prints a stack trace. A source level debugger allows
inspection of local and global variables, evaluation of arbitrary expressions,
setting breakpoints, stepping through the code a line at a time, and so on. The
debugger is written in Python itself, testifying to Python's introspective
power. On the other hand, often the quickest way to debug a program is to
add a few print statements to the source: the fast edit-test-debug cycle
makes this simple approach very effective.


3.1.1 Python in AI
As AI and ML are being applied across various channels and industries,
big corporations invest in these fields, and the demand for experts in ML and
AI grows accordingly. Jean Francois Puget, from IBM’s machine learning
department, expressed his opinion that Python is the most popular language
for AI and ML.
We have conducted some research on Python's strengths and found out
why you should opt for Python when bringing your AI and ML projects to
life.

3.1.2 Why Python is the most popular programming language used for AI?
A great library ecosystem
A great choice of libraries is one of the main reasons Python is the most
popular programming language used for AI. A library is a module, or a group
of modules, published by different sources like PyPI, which includes a
pre-written piece of code that allows users to reach some functionality or
perform different actions. Python libraries provide base-level items, so
developers don't have to code them from the very beginning every time.
ML requires continuous data processing, and Python’s libraries let you access,
handle, and transform data. These are some of the most widespread libraries
you can use for ML and AI:
● Scikit-learn for handling basic ML algorithms like clustering, linear and
logistic regression, classification, and others.

● Pandas for high-level data structures and analysis. It allows merging
and filtering of data, as well as gathering it from other external sources
like Excel, for instance.

● Keras for deep learning. It allows fast calculations and prototyping, as
it uses the GPU in addition to the CPU of the computer.


● TensorFlow for working with deep learning by setting up, training, and
utilizing artificial neural networks with massive datasets.

● Matplotlib for creating 2D plots, histograms, charts, and other forms of
visualization.

● NLTK for working with computational linguistics, natural language
recognition, and processing.

● Scikit-image for image processing.

● PyBrain for neural networks, unsupervised and reinforcement learning.

● Caffe for deep learning that allows switching between the CPU and the
GPU and processing 60+ million images a day using a single NVIDIA
K40 GPU.

● StatsModels for statistical algorithms and data exploration.

In the PyPI repository, you can discover and compare more Python libraries.

A low entry barrier

Working in the ML and AI industry means dealing with a lot of data
that you need to process in the most convenient and effective way. The low
entry barrier allows more data scientists to quickly pick up Python and start
using it for AI development without wasting too much effort on learning the
language.
The Python programming language resembles everyday English, and that
makes the process of learning easier. Its simple syntax allows you to
comfortably work with complex systems, ensuring clear relations between
the system elements.


Flexibility

Python for machine learning is a great choice, as this language is very flexible:

● It offers the option to use either OOP or scripting.
● There is also no need to recompile the source code; developers can
implement any changes and quickly see the results.
● Programmers can combine Python and other languages to reach their
goals.

Moreover, flexibility allows developers to choose the programming styles
which they are fully comfortable with, or even combine these styles to solve
different types of problems in the most efficient way.

● The imperative style consists of commands that describe how a
computer should perform these commands. With this style, you
define the sequence of computations, which happen like a change of
the program state.
● The functional style is also called declarative because it declares what
operations should be performed. It doesn't consider the program
state; compared to the imperative style, it declares statements in the
form of mathematical equations.
● The object-oriented style is based on two concepts: class and object,
where similar objects form classes. This style is not fully supported by
Python, as it can't fully perform encapsulation, but developers can still
use this style to a finite degree.
● The procedural style is the most common among beginners, as it
approaches tasks in a step-by-step format. It's often used for
sequencing, iteration, modularization, and selection.

The flexibility factor decreases the possibility of errors, as programmers have
a chance to take the situation under control and work in a comfortable
environment.


Platform independence
Python is not only comfortable to use and easy to learn but also very
versatile. What we mean is that Python for machine learning development
can run on any platform including Windows, MacOS, Linux, Unix, and
twenty-one others. To transfer the process from one platform to another,
developers need to implement several small-scale changes and modify
some lines of code to create an executable form of code for the chosen
platform. Developers can use packages like PyInstaller to prepare their code
for running on different platforms.
Again, this saves time and money for tests on various platforms and makes
the overall process more simple and convenient.

Readability
Python is very easy to read so every Python developer can understand
the code of their peers and change, copy or share it. There’s no confusion,
errors or conflicting paradigms, and this leads to more efficient exchange of
algorithms, ideas, and tools between AI and ML professionals.
There are also tools like IPython available, which is an interactive shell that
provides extra features like testing, debugging, tab-completion, and others,
and facilitates the work process.

Good visualization options


We’ve already mentioned that Python offers a variety of libraries, and
some of them are great visualization tools. However, for AI developers, it’s
important to highlight that in artificial intelligence, deep learning, and
machine learning, it’s vital to be able to represent data in a human-readable
format.
Libraries like Matplotlib allow data scientists to build charts, histograms, and
plots for better data comprehension, effective presentation, and
visualization. Different application programming interfaces also simplify the
visualization process and make it easier to create clear reports.


Community support
It's always very helpful when there's strong community support built
around a programming language. Python is an open-source language, which
means that there's a wealth of resources open to programmers, from
beginners to pros.
A lot of Python documentation is available online as well as in Python
communities and forums, where programmers and machine learning
developers discuss errors, solve problems, and help each other out.
Python programming language is absolutely free as is the variety of useful
libraries and tools.

Growing popularity
As a result of the advantages discussed above, Python is becoming
more and more popular among data scientists. According to Stack Overflow,
the popularity of Python is predicted to grow until 2020, at least.
This means it's easier to search for developers and replace team members if
required. Also, the cost of their work may not be as high as with a less
popular programming language.

3.2 Pytorch
PyTorch is a Python-based scientific computing package that uses the
power of graphics processing units. It is also one of the preferred deep
learning research platforms built to provide maximum flexibility and speed.
It is known for providing two of its most notable high-level features: tensor
computations with strong GPU acceleration support, and deep neural
networks built on a tape-based autograd system.
There are many existing Python libraries which have the potential to change
how deep learning and artificial intelligence are performed, and this is one
such library. One of the key reasons behind PyTorch’s success is it is
completely Pythonic and one can build neural network models effortlessly.
It is still a young player when compared to its other competitors, however, it
is gaining momentum fast.


3.2.1 A brief history of PyTorch

Since its release in January 2016, researchers have increasingly adopted
PyTorch. It has quickly become a go-to library because of its ease in building
extremely complex neural networks. It is giving tough competition to
TensorFlow, especially when used for research work. However, there is still
some time before it is adopted by the masses due to its still "new" and
"under construction" tags.

PyTorch's creators envisioned the library to be highly imperative, allowing
all numerical computations to run quickly. This is an ideal methodology that
fits perfectly with the Python programming style. It has allowed deep learning
scientists, machine learning developers, and neural network debuggers to run
and test parts of their code in real time; they don't have to wait for the entire
program to execute to check whether it works or not.

You can always use your favorite Python packages such as NumPy, SciPy, and
Cython to extend PyTorch functionalities and services when required. Now
you might ask, why PyTorch? What's so special in using it to build deep
learning models?
The answer is quite simple, PyTorch is a dynamic library (very flexible and you
can use as per your requirements and changes) which is currently adopted by
many of the researchers, students, and artificial intelligence developers. In
the recent Kaggle competition, PyTorch library was used by nearly all of the
top 10 finishers.
Some of the key highlights of PyTorch includes:

● Simple Interface: It offers an easy-to-use API; thus it is very simple to
operate and run, like Python.


● Pythonic in nature: This library, being Pythonic, smoothly
integrates with the Python data science stack. Thus it can leverage
all the services and functionalities offered by the Python
environment.

● Computational graphs: In addition, PyTorch provides an
excellent platform which offers dynamic computational graphs,
so you can change them during runtime. This is highly useful
when you have no idea how much memory will be required for
creating a neural network model.
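
A minimal PyTorch sketch of a dynamic computational graph: the graph is
built as the code runs, so ordinary Python control flow can change it on the
fly (the tensor value and loop condition are illustrative assumptions):

    import torch

    x = torch.tensor(2.0, requires_grad=True)

    # The graph is defined by running code, so a Python while-loop can
    # change its depth at runtime.
    y = x
    while y < 100:
        y = y * x

    y.backward()       # autograd walks the graph that was just built
    print(y, x.grad)   # y = 2**7 = 128; dy/dx = 7 * 2**6 = 448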

3.2.2 PyTorch Community


The PyTorch community is growing in numbers on a daily basis. In just a
short year and a half, it has shown a great amount of development that has
led to its citation in many research papers and groups. More and more
people are bringing PyTorch into their artificial intelligence research labs to
deliver quality-driven deep learning models.

The interesting fact is, PyTorch is still in early-release beta, but the way
everyone is adopting this deep learning framework at a brisk pace shows its
real potential and power in the community. Even though it is in the beta
release, there are 741 contributors on the official GitHub repository working
on enhancing and providing improvements to the existing PyTorch
functionalities.

PyTorch is not limited to specific applications because of its flexibility and
modular design. It has seen heavy use by leading tech giants such as
Facebook, Twitter, NVIDIA, and Uber in multiple research domains such
as NLP, machine translation, image recognition, neural networks, and other
key areas.


3.2.3 Why use PyTorch in AI?

Anyone who is working in the field of deep learning and artificial
intelligence has likely worked with TensorFlow before, Google's most popular
open source library. However, the latest deep learning framework, PyTorch,
solves major problems in terms of research work. Arguably PyTorch is
TensorFlow's biggest competitor to date, and it is currently a much-favored
deep learning and artificial intelligence library in the research community.

Dynamic Computational graphs

PyTorch avoids the static graphs that are used in frameworks such as
TensorFlow, allowing developers and researchers to change how the network
behaves on the fly. Early adopters prefer PyTorch because it is more intuitive
to learn when compared to TensorFlow.

Different back-end support

PyTorch uses different backends for the CPU, the GPU, and various
functional features, rather than a single back-end. It uses the tensor
backend TH for the CPU and THC for the GPU, while neural network
backends such as THNN and THCUNN serve the CPU and GPU respectively.
Using separate backends makes it very easy to deploy PyTorch on
constrained systems.

Imperative style

PyTorch is specially designed to be intuitive and easy to use. When
you execute a line of code, it gets executed immediately, allowing you to
perform real-time tracking of how your neural network models are built.
Its excellent imperative architecture and fast, lean approach have increased
overall PyTorch adoption in the community.


Highly extensible
PyTorch is deeply integrated with C++ code, and it shares some C++
backends with the deep learning framework Torch. This allows users to
program in C/C++ using an extension API based on cFFI for Python,
compiled for CPU or GPU operation. This feature has extended PyTorch
to new and experimental use cases, making it a preferable choice for
research use.

3.2.4 PyTorch-Approach
PyTorch is a native Python package by design. Its functionalities are
built as Python classes, hence all its code can seamlessly integrate with
Python packages and modules. Similar to NumPy, this Python-based library
enables GPU-accelerated tensor computations plus provides rich options of
APIs for neural network applications. PyTorch provides a complete end-to-
end research framework which comes with the most common building
blocks for carrying out everyday deep learning research. It allows chaining of
high-level neural network modules because it supports Keras-like API in its
torch.nn package.
PyTorch 1.0: The path from research to production
We have been discussing all the strengths PyTorch offers, and how these
make it a go-to library for research work. However, one of its biggest
downsides has been its poor production support. This is expected to
change soon.

PyTorch 1.0 is expected to be a major release which will overcome the
challenges developers face in production. This new iteration of the
framework will merge the Python-based PyTorch with Caffe2, allowing
machine learning developers and deep learning researchers to move from
research to production in a hassle-free way, without the need to deal with
any migration challenges. The new version will unify research and production
capabilities in one framework, providing the required flexibility and
performance optimization for both research and production.


This new version promises to handle the tasks one has to deal with while
running deep learning models efficiently on a massive scale. Along with the
production support, PyTorch 1.0 will have more usability and optimization
improvements. With PyTorch 1.0, your existing code will continue to work
as-is; there won't be any changes to the existing API.

The beta release of this long-awaited version is expected later this year. Major
vendors like Microsoft and Amazon are expected to provide complete
support to the framework across their cloud products.

Summing up, PyTorch is a compelling player in the field of deep learning and
artificial intelligence libraries, exploiting its unique niche of being a
research-first library. It overcomes the challenges discussed above and
provides the necessary performance to get the job done. If you're a
mathematician, researcher, or student who is inclined to learn how deep
learning is performed, PyTorch is an excellent choice as your first deep
learning framework to learn.

3.3 MATLAB
MATLAB is a high-performance language for technical computing. It
integrates computation, visualization, and programming in an easy-to-use
environment where problems and solutions are expressed in familiar
mathematical notation. Typical uses include:
● Math and computation
● Algorithm development
● Modeling, simulation, and prototyping
● Data analysis, exploration, and visualization
● Scientific and engineering graphics
● Application development, including Graphical User Interface building


MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. This allows you to solve many technical computing problems, especially those with matrix and vector formulations, in a fraction of the time it would take to write a program in a scalar noninteractive language such as C or Fortran.

The name MATLAB stands for matrix laboratory. MATLAB was originally
written to provide easy access to matrix software developed by the LINPACK
and EISPACK projects, which together represent the state-of-the-art in
software for matrix computation.

MATLAB has evolved over a period of years with input from many users. In
university environments, it is the standard instructional tool for introductory
and advanced courses in mathematics, engineering, and science. In industry,
MATLAB is the tool of choice for high-productivity research, development,
and analysis.

MATLAB features a family of application-specific solutions called toolboxes. Very important to most users of MATLAB, toolboxes allow you to learn and apply specialized technology. Toolboxes are comprehensive collections of MATLAB functions (M-files) that extend the MATLAB environment to solve particular classes of problems. Areas in which toolboxes are available include signal processing, control systems, neural networks, fuzzy logic, wavelets, simulation, and many others.


Deep learning toolbox

Deep Learning Toolbox™ provides a framework for designing and implementing deep neural networks with algorithms, pretrained models, and
apps. You can use convolutional neural networks (ConvNets, CNNs) and long
short-term memory (LSTM) networks to perform classification and
regression on image, time-series, and text data. You can build network
architectures such as generative adversarial networks (GANs) and Siamese
networks using automatic differentiation, custom training loops, and shared
weights. With the Deep Network Designer app, you can design, analyze, and
train networks graphically. The Experiment Manager app helps you manage
multiple deep learning experiments, keep track of training parameters,
analyze results, and compare code from different experiments. You can
visualize layer activations and graphically monitor training progress.

You can exchange models with TensorFlow™ and PyTorch through the
ONNX format and import models from TensorFlow-Keras and Caffe. The
toolbox supports transfer learning with DarkNet-53, ResNet-50, NASNet,
SqueezeNet and many other pretrained models.

You can speed up training on a single- or multiple-GPU workstation (with Parallel Computing Toolbox™), or scale up to clusters and clouds, including
NVIDIA® GPU Cloud and Amazon EC2® GPU instances (with MATLAB
Parallel Server™).

MathWorks' MATLAB 2018b release serves up a number of new features, including the Deep Learning Toolbox that supports development of machine-learning applications. Other new features include the 5G Toolbox, NVIDIA Cloud and DGX support, plus Sensor Fusion and Tracking.


The Deep Learning Toolbox supports convolutional neural networks (CNNs) and long short-term memory (LSTM) networks for classification and regression on image, time-series, and text data. It spares MATLAB users from having to work with other machine-learning frameworks, although it can also import and export models to frameworks like PyTorch, MXNet, Caffe, and TensorFlow-Keras using ONNX, the open neural network exchange format.

Developers can take advantage of MATLAB's deep-neural-network (DNN) support from MATLAB code (see the code listing below). MATLAB Coder or GPU Coder can be used to generate C++ and CUDA code for deployment on Intel platforms using MKL-DNN, ARM platforms using the ARM Compute Library, and NVIDIA Tegra platforms using NVIDIA's numerous libraries.

% Assumes the camera and the network were created beforehand,
% e.g. camera = webcam; net = alexnet;
while true
    im = snapshot(camera);         % Take a picture
    image(im);                     % Show the picture
    im = imresize(im,[227 227]);   % Resize the picture for alexnet
    label = classify(net,im);      % Classify the picture
    title(char(label));            % Show the class label
    drawnow
end

Eight lines of MATLAB code are all that’s needed to take advantage of a
classification network to identify images using an AlexNet model.


Coding is just part of the story, though. The Deep Network Designer (see
figure) provides a way to use pretrained models including SqueezeNet,
Inception-v3, ResNet-101, GoogLeNet, and VGG-19, as well as developing
new models. It can be used in conjunction with the MATLAB Image Labeler
application, where users can view and label images for semantic
segmentation. Developers are able to create domain-specific workflows for
ground-truth labeling information for images, videos, and audio clips.

The Deep Network Designer can be used to fine-tune pretrained deep-learning networks.

The toolbox can also be applied to train large datasets by taking advantage
of distributed computation via multicore processors and GPUs on the
desktop using the Parallel Computing Toolbox. Developers could take
advantage of the cloud, too. The Deep Learning Toolbox supports Amazon
EC2 P2, P3, and G3 GPU instances using the MATLAB Distributed Computing
Server. The toolbox also supports Amazon AWS and Microsoft Azure.


3.4 What is TensorFlow?


TensorFlow is an open-source library developed by Google, primarily for deep learning applications; it also supports traditional machine learning. TensorFlow was originally developed for large numerical computations without deep learning in mind. However, it proved to be very useful for deep learning development as well, and therefore Google open-sourced it.
Currently, the most famous deep learning library in the world is Google's TensorFlow. Google uses machine learning in all of its products to improve search, translation, image captioning, and recommendations.

3.4.1 TensorFlow Architecture


The TensorFlow architecture works in three parts:
● Preprocessing the data
● Build the model
● Train and estimate the model

It is called TensorFlow because it takes input in the form of a multidimensional array, also known as a tensor. You can construct a sort of flowchart of operations (called a graph) that you want to perform on that input. The input goes in at one end, flows through this system of multiple operations, and comes out the other end as output. Hence the name: the tensor goes in, flows through a list of operations, and then comes out the other side.


3.4.2 Where can TensorFlow run?


TensorFlow's hardware and software requirements can be divided into two phases.

Development phase: this is when you train the model. Training is usually done on your desktop or laptop.

Run phase (or inference phase): once training is done, TensorFlow can be run on many different platforms. You can run it on:
● Desktops running Windows, macOS, or Linux
● The cloud, as a web service
● Mobile devices such as iOS and Android

You can train the model on multiple machines and then run it on a different machine once you have the trained model.

The model can be trained and used on GPUs as well as CPUs. GPUs were initially designed for video games; around 2010, Stanford researchers found that GPUs were also very good at matrix operations and algebra, which makes them very fast for these kinds of calculations, and deep learning relies on a lot of matrix multiplication. TensorFlow is very fast at computing matrix multiplications because it is written in C++. Although it is implemented in C++, TensorFlow can be accessed and controlled by other languages, mainly Python.
Finally, a significant feature of TensorFlow is TensorBoard, which enables you to monitor graphically and visually what TensorFlow is doing.


3.4.3 Components of TensorFlow


Tensor
TensorFlow's name is directly derived from its core framework: the tensor. In TensorFlow, all computations involve tensors. A tensor is an n-dimensional vector or matrix that can represent any type of data. All values in a tensor hold an identical data type with a known (or partially known) shape; the shape of the data is the dimensionality of the matrix or array.
A tensor can originate from the input data or from the result of a computation. In TensorFlow, all operations are conducted inside a graph. The graph is a set of computations that take place successively. Each operation is called an op node, and the nodes are connected to each other.
The graph outlines the ops and the connections between the nodes, but it does not display the values. The edges of the nodes are tensors, i.e., a way to populate the operations with data.

Graphs
TensorFlow makes use of a graph framework. The graph gathers and describes all the series of computations done during training. It has several advantages:
● It is designed to run on multiple CPUs or GPUs, and even on mobile operating systems.
● The portability of the graph makes it possible to preserve the computations for immediate or later use; the graph can be saved to be executed in the future.
● All the computations in the graph are done by connecting tensors together: a node carries the mathematical operation and produces endpoint outputs, while the edges explain the input/output relationships between nodes.


3.4.4 Why is TensorFlow popular?


TensorFlow is built to be accessible to everyone. The library incorporates different APIs to build deep learning architectures at scale, like CNNs or RNNs. Because TensorFlow is based on graph computation, it allows the developer to visualize the construction of the neural network with TensorBoard, a tool that is also helpful for debugging the program. Finally, TensorFlow is built to be deployed at scale: it runs on CPU and GPU, and it attracts the largest popularity on GitHub compared to the other deep learning frameworks.

List of prominent algorithms supported by TensorFlow

Currently, TensorFlow 1.10 has a built-in API for:

● Linear regression: tf.estimator.LinearRegressor
● Classification: tf.estimator.LinearClassifier
● Deep learning classification: tf.estimator.DNNClassifier
● Deep learning wide and deep: tf.estimator.DNNLinearCombinedClassifier
● Boosted tree regression: tf.estimator.BoostedTreesRegressor
● Boosted tree classification: tf.estimator.BoostedTreesClassifier

3.4.5 Program Elements in TensorFlow


TensorFlow programs work on two basic concepts:
1. Building a computational graph

2. Executing a computational graph


In TensorFlow, data can be stored and manipulated using three different programming elements:
1. Constants
2. Variables
3. Placeholders

Constants
Constants are parameters with values that do not change. To define a constant, we use the tf.constant() command.

Variables
Variables allow us to add new trainable parameters to the graph. To define a variable, we use the tf.Variable() command and initialize it before running the graph in a session.

Placeholders
Placeholders allow us to feed data into a TensorFlow model from outside the model; they permit values to be assigned later. To define a placeholder, we use the tf.placeholder() command.
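Putting the three elements together, here is a minimal sketch using the TensorFlow 1.x API described in this chapter:

import tensorflow as tf

a = tf.constant(2.0)            # constant: its value never changes
b = tf.Variable(1.0)            # variable: a trainable parameter
x = tf.placeholder(tf.float32)  # placeholder: value fed in at run time
y = a * x + b                   # building the computational graph

with tf.Session() as sess:      # executing the computational graph
    sess.run(tf.global_variables_initializer())
    print(sess.run(y, feed_dict={x: 3.0}))  # prints 7.0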

3.5 What is Keras?

Keras is an open-source neural network library written in Python that runs on top of Theano or TensorFlow. It is designed to be modular, fast, and easy to use. It was developed by François Chollet, a Google engineer.

Keras doesn't handle low-level computation itself; instead, it uses another library to do it, called the "backend". Keras is thus a high-level API wrapper for the low-level API, capable of running on top of TensorFlow, CNTK, or Theano.


The Keras high-level API handles the way we make models: defining layers or setting up multiple input-output models. At this level, Keras also compiles our model with loss and optimizer functions and runs the training process with the fit function. Keras doesn't handle the low-level API, such as making the computational graph or making tensors and other variables, because that is handled by the "backend" engine.

Keras Backend

In Keras, the backend carries out all the bottom-level calculations with the help of the TensorFlow libraries. The backend engine carries out the development of the models. In this work we use TensorFlow as the default backend engine.

3.5.1 Keras VS TensorFlow

● Objective: Keras is used for developing conventional layers, while TensorFlow is used for developing model layers or calculation tasks.
● Tools: Keras uses API tools like TFDBG, while TensorFlow uses TensorBoard visualization tools.
● Difficulty: With a knowledge of Python, Keras is easy to use; to use TensorFlow, we need to learn the syntax of some TensorFlow functions.
● Type: Keras is a high-level wrapper; TensorFlow is a low-level API.
● Community: Both have many active communities.


3.5.2 Advantages of Keras


Fast Deployment and Easy to Understand
Keras makes it very quick to build a network model; a simple model can be made with just a few lines. Because of the friendly API, we can easily understand the process: the code is written with simple functions, with no need to set multiple parameters.

Large Community Support

Many AI communities use Keras as their deep learning framework, and many of them publish their code as well as tutorials for the general public.

Multiple Backends

You can choose TensorFlow, CNTK, or Theano as your backend with Keras, and you can choose a different backend for different projects depending on your needs. Each backend has its own unique advantages.

Cross-Platform and Easy Model Deployment

With a variety of supported devices and platforms, you can deploy Keras on any device, like:
● iOS, with CoreML
● Android, with TensorFlow Android
● Web browsers, with .js support
● Cloud engines
● Raspberry Pi

Multi-GPU Support

You can train Keras on a single GPU or use multiple GPUs at once. Because Keras has built-in support for data parallelism, it can process large volumes of data and speed up the time needed to train on it.


3.5.3 Keras Fundamentals for Deep Learning


The main structure in Keras is the Model, which defines the complete graph of a network. You can add more layers to an existing model to build the custom model that you need for your project.
Here's how to make a Sequential model, along with a few commonly used layers in deep learning.

Sequential Model
from keras.models import Sequential
from keras.layers import Dense, Activation, Conv2D, MaxPooling2D, Flatten, Dropout

model = Sequential()

Convolutional Layer

This is an example of a convolutional layer used as the input layer, with an input shape of 320x320x3, 48 filters of size 3x3, and ReLU as the activation function.

input_shape = (320, 320, 3)  # the input shape of a 320x320x3 image
model.add(Conv2D(48, (3, 3), activation='relu', input_shape=input_shape))

For layers after the first, the input shape can be omitted:
model.add(Conv2D(48, (3, 3), activation='relu'))

MaxPooling Layer

To downsample the input representation, use MaxPooling2D and specify the kernel size.
model.add(MaxPooling2D(pool_size=(2, 2)))


Dense Layer

Add a fully connected layer by specifying just the output size:
model.add(Dense(256, activation='relu'))

Dropout Layer

Add a dropout layer with a 50% drop probability:
model.add(Dropout(0.5))

3.5.4 Compiling, Training, and Evaluate


After we define our model, let's start training it. The network must first be compiled with a loss function and an optimizer function; this allows the network to change its weights and minimize the loss.
model.compile(loss='mean_squared_error', optimizer='adam')

Now, to start training, use fit to feed the training and validation data to the model. This allows you to train the network in batches and set the number of epochs.
model.fit(X_train, y_train, batch_size=32, epochs=10, validation_data=(x_val, y_val))

Our final step is to evaluate the model with the test data.
score = model.evaluate(x_test, y_test, batch_size=32)


3.5.5 Keras Applications

Keras Applications provides pre-trained models for deep neural networks. These Keras models are used for fine-tuning, prediction, and feature extraction.

Pre-trained Models
A trained model contains two parts: the model architecture and the model weights. Model weights are large files, so they have to be downloaded, with the features having been learned from the ImageNet database. The famous pre-trained models are as follows:
● InceptionV3
● VGG16
● ResNet
● MobileNet
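As a sketch, loading one of these pre-trained models and running a prediction takes only a few lines (the image file name here is hypothetical):

from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image
import numpy as np

model = VGG16(weights='imagenet')  # the weights are downloaded on first use

img = image.load_img('example.jpg', target_size=(224, 224))  # hypothetical file
x = np.expand_dims(image.img_to_array(img), axis=0)
preds = model.predict(preprocess_input(x))
print(decode_predictions(preds, top=3)[0])  # top-3 ImageNet classes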


PREPROCESSING

4.1 Dataset
DDSM is a well-known dataset of normal and abnormal scans, and
one of the few publicly available datasets of mammography imaging.
Unfortunately, the size of the dataset is relatively small. To increase the
amount of training data we extract the Regions of Interest (ROI) from each
image, perform data augmentation and then train ConvNets on the
augmented data. The ConvNets were trained to predict both whether a scan
was normal or abnormal, and to predict whether abnormalities were
calcifications or masses and benign or malignant.

The MIAS dataset is a very small set of mammography images, consisting of 330 scans of all classes. The scans are standardized to a size of 1024x1024 pixels. The size of the dataset made it unusable for training, but it was used for exploratory data analysis and as a supplementary test dataset.

The University of California Irvine Machine Learning Repository contains several datasets related to breast cancer. These consist of one dataset which describes the characteristics of abnormalities and two which describe the characteristics of cell nuclei taken from fine needle biopsies. These were used for exploratory data analysis to gain insight into the characteristics of abnormalities.

The DDSM is a database of 2,620 scanned film mammography studies. It contains normal, benign, and malignant cases with verified pathology information. The DDSM is saved as Lossless JPEGs, an archaic format which has not been maintained for several decades.


The CBIS-DDSM collection includes a subset of the DDSM data selected and curated by a trained mammographer. The CBIS-DDSM images have been pre-processed and saved as DICOM images, and thus are of better quality than the DDSM images, but this dataset only contains scans with abnormalities. In order to create a dataset which can be used to predict the presence of abnormalities, the ROIs were extracted from the CBIS-DDSM dataset and combined with normal images taken from the DDSM dataset.

In order to create a training dataset of adequate size which included both normal and abnormal scans, images from the CBIS-DDSM dataset were combined with images from the DDSM dataset. While the CBIS-DDSM dataset included cropped and zoomed images of the Regions of Interest (ROIs), in order to have greater control over the data, we extracted the ROIs ourselves using the masks provided with the dataset.

For the CBIS-DDSM images the masks were used to isolate and extract the
ROI from each image. For the DDSM images we simply created tiles of each
scan and included them as long as they met certain criteria.
Both offline and online data augmentation was used to increase the size of
the datasets.

4.1.1 Training Datasets

Multiple datasets were created using different ROI extraction techniques and amounts of data augmentation. The datasets ranged in size from 27,000 training images to 62,000 training images.

Datasets 1 through 5 did not properly separate the training and test data and
thus are not referenced in this work.


1. Dataset 6 consisted of 62,764 images. This dataset was created to be as large as possible, and each ROI is extracted multiple times in multiple ways using both ROI extraction methods described below. Each ROI was extracted with fixed context, with padding, at its original size, and if the ROI was larger than our target image it was also extracted as overlapping tiles.

2. Dataset 8 consisted of 40,559 images. This dataset used extraction method 1, described below, to provide greater context for each ROI. This dataset was created for the purpose of classifying the ROIs by their type and pathology.

3. Dataset 9 consisted of 43,739 images. The previous datasets had used zoomed images of the ROIs, which was problematic as it required the ROI to be pre-identified and isolated. This dataset was created using extraction method 2, described below.

As Dataset 9 was the only dataset that did not resize the images based on the size of the ROI, we felt that it introduced the least amount of artificial manipulation into the data, and after it was created we focused on training with this dataset.


4.2 Pre-processing
Preprocessing refers to all the transformations on the raw data before
it is fed to the machine learning or deep learning algorithm. For instance,
training a convolutional neural network on raw images will probably lead to
bad classification performances. The preprocessing is also important to
speed up training (for instance, centering and scaling techniques).

The CBIS-DDSM scans were of relatively large size, with a mean height of
5295 pixels and a mean width of 3131 pixels. Masks highlighting the ROIs were
provided. The masks were used to define a square which completely
enclosed the ROI. Some padding was added to the bounding box to provide
context and then the ROIs were extracted at 598x598 and then resized down
to 299x299 so they could be input into the ConvNet.

The ROIs had a mean size of 450 pixels and a standard deviation of 396. We
designed our ConvNets to accept 299x299 images as input. To simplify the
creation of the images, we extracted each ROI to a 598x598 tile, which was
then sized down by half on each dimension to 299x299. 598x598 was just
large enough that the majority of the ROIs could fit into it.


To increase the size of the training data, each ROI was extracted multiple
times using the methodologies described below. The size and variety of the
data was also increased by randomly horizontally flipping each tile, randomly
vertically flipping each tile, randomly rotating each tile, and by randomly
positioning each ROI within the tile.

4.2.1 ROI Extraction Method 1


The analysis of the UCI data indicated that the edges of an abnormality are important in determining its pathology and type, and this was confirmed by a radiologist. Levy et al. also report that the inclusion of context was an important factor for multi-class accuracy.

To provide maximum context, each ROI was extracted in multiple ways:

1. The ROI was extracted at 598x598 at its original size.
2. The entire ROI was resized to 598x598, with padding to provide context.
3. If one dimension of the ROI was more than 1.5 times the other, it was extracted as two tiles, each centered on the center of one half of the ROI along its largest dimension.

4.2.2 ROI Extraction Method 2


Method 1 relied on the size of the ROI to determine how to extract it, which requires having the ROI pre-identified. While this provided very clear images of each abnormality, using the size of the ROI to extract it introduced an element of artificiality into the data, which kept it from generalizing well to classifying raw scans. This method was designed to eliminate that artificiality by never resizing the images and simply extracting the ROI using its center.


The size of the ROI was only used to determine how much padding to add to
the bounding box before extraction. If the ROI was smaller than the 598x598
target we added more padding to provide greater variety when taking the
random crops. If the ROI was larger than 598x598 this was not necessary.
1. If the ROI was smaller than a 598x598 tile it was extracted with 20%
padding on either side.
2. If the ROI was larger than a 598x598 tile it was extracted with 5%
padding.
3. Each ROI was then randomly cropped three times using random
flipping and rotation.
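A rough sketch of this extraction logic is shown below. This is our own reconstruction from the description above, not the project's actual code, and the function and variable names are illustrative:

import numpy as np

def extract_roi_method2(scan, center_row, center_col, roi_size,
                        rng=np.random, tile=598, n_crops=3):
    # 20% padding for ROIs smaller than the tile, 5% otherwise
    pad = int(roi_size * (0.20 if roi_size < tile else 0.05))
    half = tile // 2 + pad
    r0, c0 = max(center_row - half, 0), max(center_col - half, 0)
    region = scan[r0:center_row + half, c0:center_col + half]

    crops = []
    for _ in range(n_crops):
        # random crop of the target tile size
        r = rng.randint(0, max(region.shape[0] - tile, 1))
        c = rng.randint(0, max(region.shape[1] - tile, 1))
        crop = region[r:r + tile, c:c + tile]
        if rng.rand() < 0.5:
            crop = np.fliplr(crop)             # random horizontal flip
        crop = np.rot90(crop, rng.randint(4))  # random 90-degree rotation
        crops.append(crop)
    return crops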

4.2.3 Normal Images


The normal scans from the DDSM dataset did not have ROIs, so they were processed differently. As these images had not been pre-processed like the CBIS-DDSM images, they contained artifacts such as white borders, overlay text, and white patches of pixels used to cover up identifying personal information. Each image was trimmed by 7% on each side to remove the white borders.
To keep the normal images as similar as possible to the CBIS-DDSM images, different pre-processing was done for each dataset created. As datasets 6 and 8 resized the images based on the ROI size, to create the DDSM images for these datasets each image was sized down by a random factor between 1.8 and 3.2, then segmented into 299x299 tiles with a variable stride between 150 and 200. Each tile was then randomly rotated and flipped.

For dataset 9, each DDSM image was cut into 598x598 tiles without being
resized. The tiles were then each resized down to 299x299.


To avoid the inclusion of images which contained the aforementioned artifacts or which consisted largely of black background, each tile was added to the dataset only if it met upper and lower thresholds on mean and variance. The thresholds were selected by randomly sampling tiles and adjusting them until most of the useless tiles were excluded.
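In code, this filtering rule amounts to something like the following sketch (the threshold values here are placeholders, not the ones actually chosen):

import numpy as np

def keep_tile(tile, mean_lo=10.0, mean_hi=225.0, var_lo=100.0):
    # accept a tile only if its mean and variance fall inside the thresholds
    m, v = float(tile.mean()), float(tile.var())
    return (mean_lo < m < mean_hi) and (v > var_lo)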

4.2.4 MIAS Images

As the MIAS images come from a completely different distribution than the DDSM images, we felt they could provide a good assessment of how well the models would generalize. For this reason we created a supplementary test dataset consisting of the 330 MIAS images.
The MIAS images were a uniform size of 1024x1024, with each scan sized to a height of 1024 and then horizontally padded with black on both sides. To get these images to the same scale as the DDSM images we increased their size by a factor of 2.58, which brought them to half the mean height of the DDSM images. The ROIs were then extracted using the same methods used for the CBIS-DDSM images, except that the ROIs were extracted directly at 299x299 rather than being extracted at 598x598 and then sized down by half.

4.2.5 Image Processing techniques


Image processing is a method of performing operations on an image in order to get an enhanced image or to extract some useful information from it. It is a type of signal processing in which the input is an image and the output may be an image or characteristics/features associated with that image.
Image processing applied to medical research has made many clinical diagnosis protocols and treatment plans more efficient and accurate. For example, a sophisticated nodule detection algorithm applied to digital mammogram images can aid in the early detection of breast cancer.


Data preprocessing techniques might include:

• Converting color images to grayscale to reduce computational complexity: it is sometimes useful to discard unnecessary information from your images to reduce space or computational complexity.


4.2.5.1 Standardize images

One important constraint that exists in some machine learning algorithms, such as CNNs, is the need to resize the images in your dataset to a unified dimension. This implies that our images must be preprocessed and scaled to have identical widths and heights before being fed to the learning algorithm.

4.2.5.2 Data augmentation


Another common pre-processing technique involves augmenting the existing dataset with perturbed versions of the existing images. Scaling, rotations, and other affine transformations are typical. This is done to enlarge your dataset and expose the neural network to a wide variety of variations of your images, making it more likely that your model recognizes objects when they appear in any form and shape.
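A sketch of how such augmentation can be configured in Keras follows; the ranges below are illustrative, not the settings used in this project:

from keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=40,       # random rotations up to 40 degrees
    width_shift_range=0.2,   # random horizontal shifts
    height_shift_range=0.2,  # random vertical shifts
    zoom_range=0.2,          # random zoom (scaling)
    horizontal_flip=True,    # random horizontal flips
    fill_mode='nearest')     # fill pixels created by a transform

# datagen.flow(x_train, y_train, batch_size=32) then yields batches of
# randomly perturbed copies of the training images.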


4.2.5.3 Scaling images

Scaling is used to change the size of the image, either scaling it down or up. There are several methods available to interpolate the pixels.

4.2.5.4 Flipping Images


Use the horzflip and vertflip functions to flip an image around a
horizontal or a vertical center line.


Horzflip: This function returns the original image, or matrix, flipped from left to right.

Vertflip: This function returns the original image, or matrix, flipped upside
down.

4.2.5.5 Image Rotation


Image rotation is a common image processing routine with
applications in matching, alignment, and other image-based algorithms. The
input to an image rotation routine is an image, the rotation angle θ, and a
point about which rotation is done.


4.2.5.6 Resize images


Resizes the current image to the given dimensions.
Six different types of resizing methods are available:
o Pad (default)
o BoxPad (Added v4.4.0)
o Crop
o Min (Added v4.4.0)
o Max
o Stretch

Pad (default)

Passing a single dimension will automatically preserve the aspect ratio of the original image.

Pad mode also offers optional anchor positions:


• center (default)
• top
• right
• bottom
• left
• topleft
• topright
• bottomleft
• bottomright


Resize BoxPad

When upscaling an image the image pixels themselves are not resized,
rather the image is padded to fit the given dimensions.

BoxPad mode also offers optional anchor positions:


• center (default)
• top
• right
• bottom
• left
• topleft
• topright
• bottomleft
• bottomright

Resize Crop
Resizes the image to the given dimensions.

Crop mode also offers optional anchor positions:


• center (default)
• top
• right
• bottom
• left
• topleft
• topright
• bottomleft
• bottomright


Resize Min

Resizes the image until the shortest side reaches the set given
dimension.

Resize Max

Resizes the image to the given dimensions. If the set dimensions do not match the aspect ratio of the original image, then the output is resized to the maximum possible value in each direction while maintaining the original aspect ratio.

Resize Stretch

Resizes the image to the given dimensions. If the set dimensions do not match the aspect ratio of the original image, then the output is stretched to match the new aspect ratio.

4.2.5.7 Thresholding
Thresholding is a type of image segmentation where we change the pixels of an image to make the image easier to analyze. We use thresholding as a way to select the areas of interest of an image while ignoring the parts we are not concerned with.
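A minimal sketch of global thresholding with NumPy (the threshold value 128 is arbitrary):

import numpy as np

def apply_threshold(img, t=128):
    mask = img > t         # binary mask: True marks the areas of interest
    out = np.zeros_like(img)
    out[mask] = img[mask]  # keep pixels above t, ignore the rest
    return out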


4.3 Image Segmentation

Image segmentation is defined as the process of isolating objects of interest from the rest of the scene, or as the process of partitioning an image into non-intersecting regions such that each region is homogeneous and the union of no two adjacent regions is homogeneous.

The segmentation process is one of the most difficult tasks in image processing. Some of the common obstacles behind this difficulty include:

o Non-uniform illumination
o No control of the environment
o Inadequate model of the object of interest
o Noise

Masses are thickenings of breast tissue which appear as lesions in a mammogram. Mass segmentation is defined as the process in which masses are separated from their background, capturing the shape and boundary of the masses.


The mass boundaries for the region of interest (ROI) can be identified using edge-based mass segmentation, which can detect and link the edge pixels to form a contour. Basically, this method involves two steps: edge detection and edge linking.

4.3.1 Mask

Input: the path to a mask image (PNG). The routine opens the mask, reduces its size by half, finds the borders of the mask, and returns the center of the mass. If the mass is bigger than the slice, it returns the upper-left and lower-right corners of the mask as tuples, which will be used to create multiple slices.
Returns: center_row, an int with the center row of the mask (or a tuple with the edges of the mask if the mask is bigger than the slice); center_col, likewise; and too_big, a boolean indicating whether the mask is bigger than the slice.
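A sketch of what this routine might look like, reconstructed by us from the description above (the slice size and the names are illustrative):

import numpy as np
from PIL import Image

def mask_center(mask_path, slice_size=598):
    mask = np.array(Image.open(mask_path).convert('L'))
    mask = mask[::2, ::2]                 # reduce the mask size by half
    rows, cols = np.nonzero(mask)         # pixel coordinates inside the mask
    top, bottom = rows.min(), rows.max()  # borders of the mask
    left, right = cols.min(), cols.max()
    too_big = (bottom - top > slice_size) or (right - left > slice_size)
    if too_big:
        # corners of the mask, used to create multiple slices
        return (top, left), (bottom, right), too_big
    return (top + bottom) // 2, (left + right) // 2, too_big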


4.3.2 Scan

The scans from one particular scanner (DBA) have white sections cut out of them, possibly to hide personal information. This occurs only on the normal scans, so a ConvNet could use this information to identify the normal scans. To prevent this, we replace all white pixels with black, as there are no pure-white pixels in a normal scan.
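In NumPy this is a one-line fix, assuming 8-bit grayscale scans:

import numpy as np

def remove_white(scan):
    scan = scan.copy()
    scan[scan == 255] = 0  # replace pure-white cover-up patches with black
    return scan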


Training ConvNets
5.1 Model_4_a
Model description
We designed the model based on known models, the VGG model for example. The model is quite deep, with 9 convolution layers, all using the same 3x3 filter but with different filter counts in different sequences. It also has 5 max pooling layers and 3 dense layers. The model has 43,720,193 trainable parameters.
Model training time was 2h:55m using Google Colab GPUs.

Model architecture

The architecture diagram shows a VGG-style stack: 3x3 convolution layers with filter counts rising from 32 through 64, 128, and 256 up to 512, interleaved with max pooling layers (2x2, plus one 3x3), followed by a flatten layer and three fully connected layers of 2048, 2048, and 1 neurons.


Model logs and output

epoch Training acc Training loss Val_acc Val_loss
1 0.87 8.08 0.88 0.27
2 0.87 2.48 0.92 0.22
3 0.88 10.11 0.88 0.25
4 0.88 7.9 0.86 0.38
5 0.88 7.99 0.89 0.26
6 0.88 22.14 0.87 0.32
7 0.87 2.42 0.86 1.35
8 0.86 1.31 0.86 0.38
9 0.86 0.38 0.86 0.38
10 0.86 0.38 0.86 0.38
11 0.86 0.38 0.86 0.38
12 0.86 0.38 0.86 0.38
13 0.87 0.39 0.86 0.38
14 0.87 0.38 0.86 0.38
15 0.86 0.38 0.86 0.38
16 0.86 0.38 0.86 0.38
17 0.86 0.38 0.86 0.38
18 0.87 0.39 0.87 0.39
19 0.87 0.39 0.87 0.39
20 0.87 0.39 0.87 0.39
22 0.87 0.39 0.87 0.39
24 0.87 0.39 0.87 0.39
26 0.87 0.39 0.87 0.39
28 0.87 0.39 0.87 0.39
30 0.87 0.39 0.87 0.39


These results may look like a case of overfitting, but what actually produces them is the lack of images and the fact that the number of training images is not equally divided between normal and malignant cases: the model tends to classify all images as normal.

5.2 Model_4_c
Model description

Updated version of model_4 with several changes, the two most important being:
1) Changing the optimizer to Adam, an adaptive learning rate optimization algorithm designed specifically for training deep neural networks. Adam can be seen as a combination of RMSprop and stochastic gradient descent with momentum: it uses squared gradients to scale the learning rate, like RMSprop, and it takes advantage of momentum by using a moving average of the gradient instead of the gradient itself, like SGD with momentum.


2) Weight balancing:
Weight balancing balances our data by altering the weight that each training example carries when computing the loss. Because of the unbalanced nature of our dataset, we use weight balancing with a cross-entropy weight ratio of 1 to 3 for classes 0 and 1.
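A minimal Keras sketch of these two changes, reusing the model and data names from the Keras examples in Chapter 3:

model.compile(optimizer='adam',  # the Adam optimizer
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(X_train, y_train, batch_size=32, epochs=30,
          validation_data=(x_val, y_val),
          class_weight={0: 1, 1: 3})  # weight balancing: 1:3 for classes 0 and 1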

Model architecture
The same as model_4_a

Model logs and output

epoch Training acc Training loss Val_acc Val_loss


17 0.87 0.12 0.80 0.40
18 0.87 0.11 0.89 0.33
20 0.87 0.11 0.93 0.24
22 0.88 0.11 0.75 0.47
24 0.88 0.11 0.92 0.34
26 0.87 0.11 0.92 0.24
28 0.87 0.11 0.90 0.24
30 0.87 0.16 0.93 0.26


The weight balancing made a big change in the model output, the most important being that the model starts to truly classify positive cases (model_4_a classified all cases as negative).

The training accuracy does not change, but the training loss decreased and the validation accuracy increased from 87% to 93%, which is a big move for the model's validation accuracy.

The validation recall is 0.75, which means the model starts to detect positive cases but unfortunately also classifies some negative cases as positive.

5.3 Model_4_d
Model description

Updated version of model_4_c.

After adding weight balancing and changing the optimizer in model_4_c, the pixel values of the images are normalized to between 0 and 1 in model_4_d. Image normalization has a big impact when working with machine learning and deep learning algorithms.
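A sketch of this normalization step:

import numpy as np

def normalize(images):
    # scale 8-bit pixel values from [0, 255] into [0, 1]
    return images.astype(np.float32) / 255.0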

Model architecture
The same structure of model_4_a


Model logs and output

epoch Training acc Training loss Val_acc Val_loss


1 0.84 0.14 0.79 0.47
3 0.87 0.11 0.80 0.42
7 0.88 0.1 0.88 0.42
16 0.91 0.07 0.92 0.18
29 0.95 0.05 0.94 0.19
36 0.95 0.04 0.94 0.22
40 0.95 0.04 0.93 0.23
45 0.96 0.03 0.91 0.30


The testing recall of the model is 0.72 after 30 epochs, while the validation accuracy is 0.94; this means the model has started to classify most of the positive cases. After five more epochs the recall becomes 0.77 with the same validation accuracy, but the validation loss increased. This result makes sense because of the weight balancing technique we used in this model. After five more epochs (40 epochs) the model accuracy and recall decreased to 0.92 and 0.68, where the model gets worse. We mentioned before that the testing data, which are the MIAS images, come from a completely different distribution than the training data, which come from DDSM images. After five more epochs (45 epochs) the model is overfitting the training data (training accuracy = 0.96, loss = 0.03) and the validation accuracy decreased to 0.91, as shown in the table above for epochs 40 to 45.


5.4 Model_4_e
Model description

Updated version of model_4_d.

After adding weight balancing and changing the optimizer in model_4_c, and normalizing the pixel values to between 0 and 1 in model_4_d, in model_4_e we change the class weights to 4:7 for class 0 (negative cases) and class 1 (positive cases) respectively.

Model architecture
The same structure of model_4_a

Model logs and output

We notice that after changing the class weights, the model gets a higher validation loss and lower accuracy and recall.


5.5 Model_5_a
Model description
In this model we use transfer learning from a pre-trained network. A pre-trained model is a saved network that was previously trained on a large dataset, typically on a large-scale image-classification task. You either use the pre-trained model as-is or use transfer learning to customize it to a given task. The intuition behind transfer learning for image classification is that if a model is trained on a large and general enough dataset, it will effectively serve as a generic model of the visual world. You can then take advantage of these learned feature maps without having to start from scratch by training a large model on a large dataset.
The network used is the MobileNet V2 model developed at Google. It is pre-trained on the ImageNet dataset, a large dataset consisting of 1.4M images and 1000 classes. ImageNet is a research training dataset with a wide variety of categories.

Model structure
In model_5_a we load the network without the classification layers at the top and add one classification layer with only one neuron, as sketched below.
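A sketch of this setup with tf.keras (the input size and the compile settings are illustrative, not the exact ones used):

import tensorflow as tf

# load MobileNetV2 pre-trained on ImageNet, without its top classification layers
base = tf.keras.applications.MobileNetV2(input_shape=(224, 224, 3),
                                         include_top=False,
                                         weights='imagenet',
                                         pooling='avg')
base.trainable = False  # freeze the pre-trained convolutional base

# add a single-neuron classifier for normal vs. abnormal
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])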

Model logs and output


Model_5_a recognizes the training images well, with an accuracy of 90% after 10 epochs of training, but it classifies the validation data poorly. So we updated the model to model_5_b.

5.6 Model_5_b
Model description
Here we just add class weights to the model.

Training logs and output

Adding weight balancing makes a difference, but not a really big one, so we decided to add more fully connected layers in model_5_c.


5.7 Model_5_c
Model description
Here we add three fully connected layers to the model, with 1024, 2048, and 1 neurons respectively. We also keep the class weights at a ratio of 1 to 3 for classes 0 and 1 respectively, but after 15 epochs we change the ratio to 4 to 7.

Training logs and output


The first 10 epochs training logs

The logs of epochs 10 to 45


The two graphs above cover epochs 10 to 45. After the class weights were changed to 4:7 for classes 0 and 1 respectively, the model's training accuracy increases and its loss decreases with the number of epochs, but the validation metrics go up and down, so the model may be overfitting the data; moreover, when the validation recall is calculated, the model misses positive cases.

5.8 Model_5_d
Model description
In this model we use transfer learning from a pre-trained network: a saved network that was previously trained on a large dataset.
The network used is the MobileNet V2 model developed at Google, pre-trained on the ImageNet dataset, a large dataset consisting of 1.4M images and 1000 classes. ImageNet is a research training dataset with a wide variety of categories.
In this model we train the top 55 layers of the MobileNet model.

Model structure
In this model we train (or "fine-tune") the weights of the top 55 layers of the MobileNet model alongside the training of the classifier we added. The training process forces the weights to be tuned from generic feature maps to features associated specifically with the breast cancer dataset.
The classifier consists of three fully connected layers with 1024, 2048, and 1 neurons respectively. We also keep the class weights at a ratio of 3 to 7 for classes 0 and 1 respectively, but after 35 epochs we change the ratio to 2 to 8, and after more epochs the ratio was changed again.
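A sketch of the fine-tuning step, continuing from the transfer-learning snippet in section 5.5 (the layer count follows the description above; the learning rate is illustrative):

import tensorflow as tf

base.trainable = True        # unfreeze the base model...
for layer in base.layers[:-55]:
    layer.trainable = False  # ...then re-freeze all but the top 55 layers

# recompile with a small learning rate so the pre-trained weights
# are only gently adapted to the mammography data
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='binary_crossentropy',
              metrics=['accuracy'])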


Training logs and output

epoch acc loss Val_acc Val_loss


1 0.929 0.091 0.87 1.605
3 0.965 0.046 0.955 0.22
5 0.974 0.034 0.936 0.531
9 0.986 0.019 0.968 0.178
16 0.995 0.009 0.962 0.512
20 0.996 0.008 0.973 0.279
26 0.997 0.007 0.968 0.509
30 0.997 0.007 0.967 0.4
35 0.997 0.006 0.967 0.65
40 0.997 0.006 0.962 0.653
45 0.997 0.005 0.969 0.412
50 0.998 0.004 0.964 0.836
55 0.996 0.005 0.968 0.476
60 0.997 0.006 0.966 1.509
65 0.998 0.004 0.972 0.509
67 0.996 0.005 0.97 0.497


We start with a class weight ratio of 3:7 for classes 0:1. The model easily reaches a training accuracy of 0.996 with loss 0.008, and a validation accuracy of 0.973 with loss 0.279, at epoch 20, which we can say gives the best metrics of this model, where recall(0) = 1 and recall(1) = 0.9 (recall = 0.9, accuracy = 0.97).

We continued training this model with the same ratio to epoch 35, where the training loss decreased to 0.006, but the extra training did not make a big change in validation; it only decreased the recall of class 1.

After that we changed the ratio to 2:8 for 5 more epochs, and finally increased the weight of class 1 even more, but the model got worse with more training, so we prefer to use the model weights from epoch 20. We tried to change the ratio from this point, but the accuracy decreased.


5.9 Model_5_e

Model description

Updated version of model_5_d.

In this model we train the top 100 layers of the MobileNet model.

Model structure

In this model we train (or "fine-tune") the weights of the top 100 layers of the MobileNet model alongside the training of the classifier we added. The training process forces the weights to be tuned from generic feature maps to features associated specifically with the breast cancer dataset.

The classifier consists of three fully connected layers with 1024, 2048, and 1 neurons respectively. We also keep the class weights at a ratio of 3 to 7 for classes 0 and 1 respectively, but after 35 epochs we change the ratio to 2 to 8, and after more epochs the ratio was changed again.

Training logs and output


epoch acc loss Val_acc Val_loss


0 0.93 0.096 0.87 8.394
2 0.969 0.045 0.896 6.807
4 0.978 0.035 0.942 2.114
7 0.984 0.027 0.972 0.263
9 0.987 0.022 0.976 1.176
10 0.987 0.024 0.934 0.538
11 0.988 0.028 0.951 1.568
14 0.989 0.031 0.983 0.272
19 0.993 0.016 0.956 2.718
24 0.994 0.015 0.98 0.632
25 0.994 0.012 0.95 2.712
28 0.995 0.012 0.97 1.032
29 0.995 0.011 0.969 0.897
34 0.996 0.009 0.956 1.676

Using the same class weights with ratio 3:7, after 15 epochs the training accuracy is 99%, the validation accuracy 98%, and the recall 95%, which are the best metrics of the model. We continued training and measuring metrics up to 35 epochs, but with more training epochs the model got worse at predicting positive cases.


5.10 GitHub

Running all the models demonstrated above and putting them through training took a very long time. We also created a GitHub repository that contains more information about our project and the code of all the models we demonstrated, and more:

https://github.com/GP-FCI-SU/Breast-Cancer-Detection-with-Convolutional-Neural-Networks


Conclusion
We were able to achieve better-than-expected results: 98% validation accuracy and 95% recall. As a proof of concept, we feel that we have demonstrated that ConvNets can successfully be trained to predict whether mammograms are normal or abnormal.
The life-and-death nature of diagnosing cancer creates many obstacles to putting a system like this into practice. We feel that having the system output probabilities rather than predictions would allow it to provide additional information to radiologists rather than replacing them. In addition, the ability to adjust the decision threshold would allow radiologists to focus on more ambiguous scans while devoting less time to scans which have very low probabilities.
This helps eliminate unnecessary waiting time as well as reducing human and technical errors in diagnosing breast cancer.

Future work
Future work would include creating a system which takes an entire, unaltered scan as input and analyses it for abnormalities, and which classifies mammogram images into further classes (benign and malignant). Other networks will be explored, including the very deep convolutional network (VGG) and the residual (ResNet) architectures. It would also include creating a user interface for the specialists, with a good user experience, to help them in decision making. Unfortunately, the lack of available training data seems to be the bottleneck for pursuing this in the future.


GLOSSARY
• Backward pass (backpropagation): The calculation of internal variable
adjustments according to the optimizer algorithm, starting from the
output layer and working back through each layer to the input.

• Batch: The set of examples used during training of the neural network.

• Classification: A machine learning model used for distinguishing among two or more output categories.

• CNNs: Convolutional neural network. That is, a network which has at least one convolutional layer. A typical CNN also includes other types of layers, such as pooling layers and dense layers.

• Color Images: Computers interpret color images as 3D arrays.

• Convolution: The process of applying a kernel (filter) to an image

• Convolutions: When working with RGB images we convolve each color channel with its own convolutional filter. Convolutions on each color channel are performed in the same way as with grayscale images, i.e. by performing element-wise multiplication of the convolutional filter (kernel) and a section of the input array. The result of each convolution is added up together with a bias value to get the convoluted output.

• Dense and Fully Connected (FC): Each node in one layer is connected to
each node in the previous layer.

• Down sampling: The act of reducing the size of an image.

• Dropout: Removing a random selection of a fixed number of neurons in a neural network during training.

• Early Stopping: In this method, we track the loss on the validation set
during the training phase and use it to determine when to stop training
such that the model is accurate but not overfitting.

• Epoch: A full pass over the entire training dataset.

• Examples: An input/output pair used for training.

• Feature: The input(s) to our model.

• Flattening: The process of converting a 2d image into 1d vector.

• Forward pass: The computation of output values from input.

• Freezing Parameters: Setting the variables of a pre-trained model to non-trainable. By freezing the parameters, we will ensure that only the variables of the last classification layer get trained, while the variables from the other layers of the pre-trained model are kept the same.

• Gradient Descent: An algorithm that changes the internal variables a bit at a time to gradually reduce the loss function.

• Image Augmentation: Artificially boosting the number of images in our training set by applying random image transformations to the existing images in the training set.

• Kernel / filter: A matrix, which is smaller than the input, used to transform the input into chunks.

• Labels: The output of the model.

• Layer: A collection of nodes connected together within a neural network.

• Learning rate: The “step size” for loss improvement during gradient
descent.

• Loss: The discrepancy between the desired output and the actual output.

• Max Pooling: A pooling process in which many values are converted into
a single value by taking the maximum value from among them.

• Max Pooling: When working with RGB images we perform max pooling
on each color channel using the same window size and stride. Max
pooling on each color channel is performed in the same way as with
grayscale images, i.e. by selecting the max value in each window.

• Model: The representation of your neural network.

• MSE: Mean squared error, a type of loss function that counts a small
number of large discrepancies as worse than a large number of small
ones.

• Optimizer: A specific implementation of the gradient descent algorithm. (There are many such algorithms; in this work we only use the "Adam" optimizer, which stands for ADAptive Moment estimation. It is considered the best-practice optimizer.)

• Padding: Adding pixels of some value, usually 0, around the input image.

• Pooling: The process of reducing the size of an image through down sampling. There are several types of pooling layers. For example, average pooling converts many values into a single value by taking the average. However, max pooling is the most common.

• ReLU: An activation function that allows a model to solve nonlinear problems.

• Resizing: When working with images of different sizes, you must resize
all the images to the same size so that they can be fed into a CNN.

• RGB Image: Color image composed of 3 color channels: Red, Green, and
Blue.

• SoftMax: A function that provides probabilities for each possible output class.

• SoftMax activation function: Calculates the probability distribution.

• Stride: the number of pixels to slide the kernel (filter) across the image.

• Session: A session is run to evaluate the nodes. This is called the TensorFlow Runtime.

• Test set: The data used for testing the final performance of our neural
network.

• Training Set: The data used for training the neural network.

• Transfer Learning: A technique that reuses a model that was created by machine learning experts and that has already been trained on a large dataset. When performing transfer learning we must always change the last layer of the pre-trained model so that it has the same number of classes that we have in the dataset we are working with.

• Validation dataset: This dataset is not used for training. Instead, it is used to test the model during training.

REFERENCES
1. https://medical-clinical-reviews.imedpub.com/abstract/breast-cancer-detection-and-screening-23112.html
2. https://colab.research.google.com/drive/1c4LyKBBVnKQgOWCN-
KcTTzTTEglqM4GB?usp=sharing
3. https://www.researchgate.net/
4. https://www.academia.edu/
5. https://www.analyticsvidhya.com/
6. https://github.com/escuccim/mias-mammography
7. https://www.researchgate.net/publication/261200035_Breast_cancer_detection_A_review_on_mammograms_analysis_techniques
8. https://ieeexplore.ieee.org/abstract/document/4454239
9. https://jamanetwork.com/journals/jama/article-abstract/1883018
10. https://inis.iaea.org/search/search.aspx?orig_q=RN:19041135
11. https://freecontent.manning.com/the-computer-vision-pipeline-part-3-image-preprocessing/
12. https://github.com/escuccim/mammography-models
