Creating Neural Network from Scratch in Python

25/11/2019 Creating a Neural Network from Scratch in Python
Introduction
Have you ever wondered how chatbots like Siri, Alexa, and Cortana are able to respond to user queries? Or how
autonomous cars are able to drive themselves without any human help? All of these fancy products have one
thing in common: Artificial Intelligence (AI). It is AI that enables them to perform such tasks without being
supervised or controlled by a human. But the question remains: "What is AI?" A simple answer to this question is:
"AI is a combination of complex algorithms from various mathematical domains such as Algebra, Calculus, and
Probability and Statistics."
In this article, we will study a simple artificial neural network, which is one of the main building blocks of artificial
intelligence. Different variants of Artificial Neural Networks exist, each dedicated to solving a particular kind of
problem. For instance, Convolutional Neural Networks (https://en.wikipedia.org/wiki/Convolutional_neural_network) are
commonly used for Image Recognition problems while Recurrent Neural Networks
(https://en.wikipedia.org/wiki/Recurrent_neural_network) are used to solve sequence problems.
There are many deep learning libraries that can be used to create a neural network in a single line of code.
However, if you really want to understand the in-depth working of a neural network, I suggest you learn how to
code it from scratch in any programming language. Performing this exercise will really clear up many of the
concepts for you. And this is exactly what we will do in this article.
The Problem
Since this is an introductory article, the problem that we are going to solve is pretty simple. Suppose we have some
information about obesity, smoking habits, and exercise habits of five people. We also know whether these people
are diabetic or not. Our dataset looks like this:
Person     Smoking   Obesity   Exercise   Diabetic
Person 1   0         1         0          1
Person 2   0         0         1          0
Person 3   1         0         0          0
Person 4   1         1         0          1
Person 5   1         1         1          1
In the above table, we have five columns: Person, Smoking, Obesity, Exercise, and Diabetic. Here 1 refers to true
and 0 refers to false. For instance, the first person has values of 0, 1, 0 which means that the person doesn't smoke,
is obese, and doesn't exercise. The person is also diabetic.
It is clearly evident from the dataset that a person's obesity is indicative of him being diabetic. Our task is to create
a neural network that is able to predict whether an unknown person is diabetic or not given data about his exercise
habits, obesity, and smoking habits. This is a type of supervised learning problem where we are given inputs and
corresponding correct outputs and our task is to find the mapping between the inputs and the outputs.
Note: This is just a fictional dataset. In real life, obese people are not necessarily diabetic.
The Solution
We will create a very simple neural network with one input layer and one output layer. Before writing any actual
code, let's first see how our neural network will execute, in theory.
Neural Network Theory

A neural network is a supervised learning algorithm which means that we provide it the input data containing the
independent variables and the output data that contains the dependent variable. For instance, in our example our
independent variables are smoking, obesity and exercise. The dependent variable is whether a person is diabetic or
not.
In the beginning, the neural network makes some random predictions. These predictions are compared with the
correct output, and the error, or the difference between the predicted values and the actual values, is calculated. The
function that finds the difference between the actual values and the predicted values is called the cost function.
The cost here refers to the error. Our objective is to minimize the cost function. Training a neural network basically
refers to minimizing the cost function. We will see how we can perform this task.
The neural network that we are going to create has the following visual representation.
A neural network executes in two steps: Feed Forward and Back Propagation. We will discuss both of these steps
in detail.
Feed Forward
In the feed-forward part of a neural network, predictions are made based on the values in the input nodes and the
weights. If you look at the neural network in the above figure, you will see that we have three features in the
dataset: smoking, obesity, and exercise, therefore we have three nodes in the first layer, also known as the input
layer. We have replaced our feature names with the variable x, for generality in the figure above.
The weights of a neural network are basically the knobs that we have to adjust in order to be able to correctly
predict our output. For now, just remember that for each input feature, we have one weight.
The following are the steps that execute during the feedforward phase of a neural network:
Step 1: (Calculate the dot product between inputs and weights)
The nodes in the input layer are connected with the output layer via three weight parameters. In the output layer,
the values in the input nodes are multiplied with their corresponding weights and are added together. Finally, the
bias term is added to the sum. The b in the above figure refers to the bias term.
The bias term is very important here. Suppose we have a person who doesn't smoke, is not obese, and doesn't
exercise. The sum of the products of input nodes and weights will then be zero regardless of the weight values, so
without a bias the prediction for this person would stay fixed no matter how much we train the algorithm. Therefore,
in order to be able to make predictions even when all input values are zero, we need a bias term. The bias term is
necessary to make a robust neural network.
Mathematically, in step 1, we perform the following calculation:
\[ X \cdot W = x_1 w_1 + x_2 w_2 + x_3 w_3 + b \]
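As a rough illustration of Step 1, the sketch below computes this weighted sum for the first person in the table. The weight and bias values (and the names w, b, and weighted_sum) are made-up placeholders chosen for illustration, not trained values.

```python
import numpy as np

# First person from the table: doesn't smoke, is obese, doesn't exercise
x = np.array([0, 1, 0])

# Hypothetical weights and bias, chosen only for illustration
w = np.array([0.2, 0.8, -0.5])
b = 0.1

# x1*w1 + x2*w2 + x3*w3 + b
weighted_sum = np.dot(x, w) + b
print(weighted_sum)
```

Only the obesity weight contributes here, because the other two inputs are zero.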
Step 2: (Pass the result from step 1 through an activation function)
The result from Step 1 can be a set of any values. However, in our output we have the values in the form of 1 and
0. We want our output to be in the same format. To do so we need an activation function
(https://en.wikipedia.org/wiki/Activation_function), which squashes input values between 1 and 0. One such
activation function is the sigmoid (https://en.wikipedia.org/wiki/Sigmoid_function) function.
The sigmoid function returns 0.5 when the input is 0. It returns a value close to 1 if the input is a large positive
number. In case of negative input, the sigmoid function outputs a value close to zero.
Mathematically, the sigmoid function can be represented as:
\[ \theta_{X \cdot W} = \frac{1}{1 + e^{-X \cdot W}} \]
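A minimal sketch of this formula in NumPy, evaluated at a large negative input, at zero, and at a large positive input. The three sample inputs are arbitrary choices for illustration.

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)), applied element-wise
    return 1 / (1 + np.exp(-x))

# Arbitrary sample inputs: large negative, zero, large positive
values = sigmoid(np.array([-10.0, 0.0, 10.0]))
print(values)
```

The middle value is 0.5, while the outer two sit very close to 0 and 1 respectively.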
Let us try to plot the sigmoid function:
import numpy as np
import matplotlib.pyplot as plt

input = np.linspace(-10, 10, 100)

def sigmoid(x):
    # return sigmoid function

# plot sigmoid function
In the script above, we first generate 100 linearly spaced points between -10 and 10. To do so, we use
the linspace method from the NumPy library (/numpy-tutorial-a-simple-example-based-guide/). Next, we define
the sigmoid function. Finally, we use the matplotlib library to plot the input values against the values returned by
the sigmoid function. The output should look like this:
You can see that for large negative inputs the output is close to zero, while for large positive inputs the
output is close to 1. In every case, the output stays between 0 and 1. This is what we want.
This sums up the feedforward part of our neural network. It is pretty straightforward. First we have to find the dot
product (https://en.wikipedia.org/wiki/Dot_product) of the input feature matrix with the weight matrix and add the
bias. Next, we pass the result through an activation function, which in this case is the sigmoid function. The result of
the activation function is basically the predicted output for the input features.
Back Propagation
In the beginning, before you do any training, the neural network makes random predictions which are far from
correct.
The principle behind the working of a neural network is simple. We start by letting the network make random
predictions about the output. We then compare the predicted output of the neural network with the actual output.
Next, we fine-tune our weights and the bias in such a manner that our predicted output becomes closer to the
actual output, which is basically known as "training the neural network".
In the back propagation section, we train our algorithm. Let's take a look at the steps involved in the back
propagation section.
Step 1: (Calculating the cost)
The first step in the back propagation section is to find the "cost" of the predictions. The cost of the prediction can
simply be calculated by finding the difference between the predicted output and the actual output. The higher the
difference, the higher the cost will be.
There are several other ways to find the cost, but we will use the mean squared error
(https://en.wikipedia.org/wiki/Mean_squared_error) cost function. A cost function is simply the function that finds
the cost of the given predictions.
The mean squared error cost function can be mathematically represented as:
\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\text{predicted} - \text{observed})^2 \]
Here n is the number of observations.
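To make the formula concrete, here is a small sketch that computes the mean squared error for the five labels in our dataset. The predicted values are invented for illustration and do not come from a trained network.

```python
import numpy as np

# Observed labels from the dataset (Diabetic column)
observed = np.array([1, 0, 0, 1, 1])

# Hypothetical predictions, made up for illustration
predicted = np.array([0.9, 0.2, 0.1, 0.8, 0.7])

# Average of the squared differences over the n = 5 observations
mse = np.mean((predicted - observed) ** 2)
print(mse)
```

Predictions that are further from the labels produce larger squared differences and therefore a higher cost.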
Step 2: (Minimizing the cost)
Our ultimate purpose is to fine-tune the knobs of our neural network in such a way that the cost is minimized. If
you look at our neural network, you'll notice that we can only control the weights and the bias. Everything else is
beyond our control. We cannot control the inputs, we cannot control the dot products, and we cannot manipulate
the sigmoid function.
In order to minimize the cost, we need to find the weight and bias values for which the cost function returns the
smallest value possible. The smaller the cost, the more correct our predictions are.
This is an optimization problem (https://en.wikipedia.org/wiki/Optimization_problem) where we have to find the
function minima (https://en.wikipedia.org/wiki/Maxima_and_minima).
To find the minima of a function, we can use the gradient descent
(https://en.wikipedia.org/wiki/Optimization_problem) algorithm. The gradient descent algorithm can be
mathematically represented as follows:
\[ \text{repeat until convergence: } \left\{ w_j := w_j - \alpha \frac{\partial}{\partial w_j} J(w_0, w_1, \ldots, w_n) \right\} \qquad (1) \]
Here in the above equation, J is the cost function. Basically what the above equation says is: find the partial
derivative of the cost function with respect to each weight and bias and subtract the result from the existing weight
values to get the new weight values.
The derivative of a function gives us its slope at any given point. To find if the cost increases or decreases, given the
weight value, we can find the derivative of the function at that particular weight value. If the cost increases with the
increase in weight, the derivative will return a positive value which will then be subtracted from the existing value,
decreasing the weight. On the other hand, if the cost is decreasing with an increase in weight, a negative value will be
returned, which will effectively be added to the existing weight value, since subtracting a negative number is the
same as adding.
In Equation 1, we can see there is an alpha symbol, which is multiplied by the gradient. This is called the learning
rate. The learning rate defines how fast our algorithm learns. For more details about how learning rate can be
defined, check out this article
(https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1).
We need to repeat the execution of Equation 1 for all the weights and bias until the cost is minimized to the
desirable level. In other words, we need to keep executing Equation 1 until we get such values for bias and weights,
for which the cost function returns a value close to zero.
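The update rule in Equation 1 can be sketched on a toy problem with a single weight. The cost function J(w) = (w - 3)^2, the starting value, the learning rate of 0.1, and the 100 iterations are all assumptions picked for illustration; they are not part of our neural network.

```python
def cost(w):
    # Hypothetical cost function with its minimum at w = 3
    return (w - 3.0) ** 2

def cost_gradient(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

alpha = 0.1  # learning rate
w = 0.0      # arbitrary starting weight

for step in range(100):
    # Equation 1: w := w - alpha * dJ/dw
    w = w - alpha * cost_gradient(w)

print(w, cost(w))
```

Starting to the left of the minimum, the gradient is negative, so each update moves w upward until the cost is close to zero.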
And that's pretty much it. Now is the time to implement what we have studied so far. We will create a simple
neural network with one input and one output layer in Python.
Neural Network Implementation in Python

Let's first create our feature set and the corresponding labels. Execute the following script:
import numpy as np
feature_set = np.array([[0,1,0],[0,0,1],[1,0,0],[1,1,0],[1,1,1]])
labels = np.array([[1,0,0,1,1]])
labels = labels.reshape(5,1)
In the above script, we create our feature set. It contains five records. Similarly, we created a labels set which
contains corresponding labels for each record in the feature set. The labels are the answers we're trying to predict
with the neural network.
The next step is to initialize the weights and bias and to set the learning rate for our neural network. Execute the
following script to do so:
np.random.seed(42)
weights = np.random.rand(3,1)
bias = np.random.rand(1)
lr = 0.05
In the script above we used the random.seed function so that we can get the same random values whenever the
script is executed.
In the next step, we initialize our weights with random numbers drawn uniformly from the interval [0, 1), which is
what np.random.rand returns. Since we have three features in the input, we have a vector of three weights. We then
initialize the bias value with another random number. Finally, we set the learning rate to 0.05.
Next, we need to define our activation function and its derivative (I'll explain in a moment why we need to find the
derivative of the activation). Our activation function is the sigmoid function, which we covered earlier.
The following Python script creates this function:
def sigmoid(x):
    # return expression for sigmoid function

And the method that calculates the derivative of the sigmoid function is defined as follows:

def sigmoid_der(x):
    # return expression for the derivative of the sigmoid function
    # (note: it can be written as a function of the sigmoid function itself)
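One possible way to fill in the two stubs above is sketched here. It relies on the standard identity that the derivative of the sigmoid equals sigmoid(x) * (1 - sigmoid(x)); treat it as a sketch rather than the only valid solution.

```python
import numpy as np

def sigmoid(x):
    # 1 / (1 + e^(-x)), applied element-wise
    return 1 / (1 + np.exp(-x))

def sigmoid_der(x):
    # Derivative of the sigmoid, written in terms of the sigmoid itself
    s = sigmoid(x)
    return s * (1 - s)

print(sigmoid(0.0), sigmoid_der(0.0))
```

Note that sigmoid_der as written here expects the raw input x (the weighted sum), not a value that has already been passed through sigmoid.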
Now we are ready to train our neural network that will be able to predict whether a person is diabetic or not.
# Write the training algorithm following the explanations below.
In the first step, we define the number of epochs. An epoch is one complete pass of the training algorithm over our
data, so the number of epochs is the number of times we want to train the algorithm on our data. We will train the
algorithm on our data 20,000 times. I have tested this number and found that the error is pretty much minimized
after 20,000 iterations. You can try with a different number. The ultimate goal is to minimize the error.
# start your for loop here
Next we store the values from the feature_set to the input variable.
# assign a new variable input for the feature_set
# Define a new variable as the dot product of the input and the weight vector and add bias to it. This is Step 1
# of the feedforward section.
# Pass the dot product through the sigmoid activation function, as explained in Step 2 of the feedforward
# section. This completes the feed forward part of our algorithm.
# Now is the time to start backpropagation. The variable z contains the predicted outputs. The first step of
# the backpropagation is to find the error. Do so in a line below.
# Print the error on the screen.
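The feed-forward steps and the error calculation described in the comments above could look roughly like this for a single pass, before any loop is added. The variable names inputs and XW are assumptions; z follows the comment above and holds the predicted outputs.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

feature_set = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 1]])
labels = np.array([[1, 0, 0, 1, 1]]).reshape(5, 1)

np.random.seed(42)
weights = np.random.rand(3, 1)
bias = np.random.rand(1)

inputs = feature_set

# Feed forward, Step 1: dot product of inputs and weights, plus the bias
XW = np.dot(inputs, weights) + bias

# Feed forward, Step 2: pass the result through the sigmoid
z = sigmoid(XW)

# Backpropagation, Step 1: predicted minus observed
error = z - labels
print(error.sum())
```

With untrained random weights the predictions are still far from the labels, so the individual errors are large at this point.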
Now is the time to execute Step 2 of backpropagation, which is the gist of this code.
We know that our cost function is:
\[ \text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (\text{predicted} - \text{observed})^2 \]
We need to differentiate this function with respect to each weight. Let's suppose "d_cost" is the derivative of
our cost function with respect to weight "w". We can use the chain rule to find this derivative, as shown
below:
\[ \frac{d\_cost}{dw} = \frac{d\_cost}{d\_pred} \cdot \frac{d\_pred}{dz} \cdot \frac{dz}{dw} \]
The first factor,
\[ \frac{d\_cost}{d\_pred} \]
can be calculated as:
2(predicted − observed)
Here, 2 is a constant factor that only rescales the gradient, so it can be ignored (its effect is absorbed into the
learning rate). What remains is basically the error which we already calculated.
# Translate this into a line of code.
Next we have to find:

\[ \frac{d\_pred}{dz} \]
Here "d_pred" is simply the sigmoid function and we have differentiated it with respect to input dot product "z".
# Translate this into a line of code.
Finally, we have to find:
\[ \frac{dz}{dw} \]
We know that:
\[ z = x_1 w_1 + x_2 w_2 + x_3 w_3 + b \]

Therefore, derivative with respect to any weight is simply the corresponding input. Hence, we can write our final
derivative of the cost function with respect to any weight.
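Putting the three factors together for a single record gives the expression below. It is a sketch under two assumptions: the constant 2 is dropped as discussed above, and the derivative of the sigmoid is written in its standard form sigmoid(z)(1 - sigmoid(z)). The index j and the bias derivative are added here for completeness.

```latex
\frac{d\_cost}{dw_j}
  = \underbrace{(\text{predicted} - \text{observed})}_{d\_cost / d\_pred}
    \cdot \underbrace{\text{sigmoid}(z)\,\bigl(1 - \text{sigmoid}(z)\bigr)}_{d\_pred / dz}
    \cdot \underbrace{x_j}_{dz / dw_j},
\qquad
\frac{dz}{db} = 1
```

Because dz/db is 1, the bias update uses the same first two factors without the input term.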
Now we can have a z_delta variable, for example, which contains the product of dcost_dpred and dpred_dz.
Instead of looping through each record and multiplying the input with the corresponding z_delta, we take the
transpose of the input feature matrix and multiply it with z_delta. Finally, we multiply the derivative by the learning
rate variable lr, which controls the size of each update step, and subtract the result from the current weights, as in
Equation 1.
Once the loop starts, you will see that the total error starts decreasing.
# Write these lines of code and start the loop.
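For reference, here is one way the whole training loop might be assembled from the steps described above. It is a sketch, not the only valid solution: the names XW and mse_history, the summed bias update, and printing the error only every 2,000 epochs are choices made here. z holds the predictions, as in the earlier comments, and the sigmoid derivative is evaluated at the weighted sum XW.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_der(x):
    s = sigmoid(x)
    return s * (1 - s)

feature_set = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 1]])
labels = np.array([[1, 0, 0, 1, 1]]).reshape(5, 1)

np.random.seed(42)
weights = np.random.rand(3, 1)
bias = np.random.rand(1)
lr = 0.05

mse_history = []  # kept only to inspect how the cost evolves

for epoch in range(20000):
    inputs = feature_set

    # Feed forward
    XW = np.dot(inputs, weights) + bias
    z = sigmoid(XW)

    # Backpropagation, Step 1: error of the current predictions
    error = z - labels
    mse_history.append(np.mean(error ** 2))
    if epoch % 2000 == 0:
        print(epoch, error.sum())

    # Backpropagation, Step 2: chain rule factors
    dcost_dpred = error
    dpred_dz = sigmoid_der(XW)
    z_delta = dcost_dpred * dpred_dz

    # Gradient step for the weights (transpose trick) and for the bias
    weights -= lr * np.dot(inputs.T, z_delta)
    bias -= lr * z_delta.sum()

print(mse_history[0], mse_history[-1])
```

Comparing the first and last entries of mse_history is a quick way to confirm that training reduced the cost.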

The error should be extremely small at the end of the training of our neural network. At this point our
weights and bias will have values that can be used to detect whether a person is diabetic or not, based on their
smoking habits, obesity, and exercise habits.
You can now try and predict the value of a single instance. Let's suppose we have a record of a patient that comes
in who smokes, is not obese, and doesn't exercise. Let's find if he is likely to be diabetic or not. The input feature
will look like this: [1,0,0].
Execute the following script:
#Write script that will tell you how likely it is that this person is diabetic (in other words, the result).
In the output you will see:
[0.00707584]
You can see that the person is likely not diabetic since the value is much closer to 0 than 1.
Now let's test another person who doesn't smoke, is obese, and doesn't exercise. The input feature vector will be
[0,1,0]. Execute this script:
#Write script that will tell you how likely it is that this person is diabetic (in other words, the result).
In the output you will see the following value:
[0.99837029]
You can see that the value is very close to 1, which is likely due to the person's obesity.
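The two predictions above can be reproduced in spirit with the sketch below. It repeats the training in a compact helper so that it runs on its own; the train function and the variable names are hypothetical, and the exact printed numbers depend on implementation details, so they may differ from the values quoted above.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def train(features, labels, lr=0.05, epochs=20000, seed=42):
    # Hypothetical helper: trains the single-layer network, returns (weights, bias)
    np.random.seed(seed)
    weights = np.random.rand(features.shape[1], 1)
    bias = np.random.rand(1)
    for _ in range(epochs):
        z = sigmoid(np.dot(features, weights) + bias)
        # error times the sigmoid derivative, which equals z * (1 - z)
        z_delta = (z - labels) * z * (1 - z)
        weights -= lr * np.dot(features.T, z_delta)
        bias -= lr * z_delta.sum()
    return weights, bias

feature_set = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0], [1, 1, 0], [1, 1, 1]])
labels = np.array([[1, 0, 0, 1, 1]]).reshape(5, 1)
weights, bias = train(feature_set, labels)

# Smokes, is not obese, doesn't exercise
smoker = np.array([1, 0, 0])
result_smoker = sigmoid(np.dot(smoker, weights) + bias)
print(result_smoker)

# Doesn't smoke, is obese, doesn't exercise
obese = np.array([0, 1, 0])
result_obese = sigmoid(np.dot(obese, weights) + bias)
print(result_obese)
```

A result below 0.5 can be read as "likely not diabetic" and a result above 0.5 as "likely diabetic".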
Conclusion
In this article we created a very simple neural network with one input and one output layer from scratch in Python.
Such a neural network is simply called a perceptron (https://en.wikipedia.org/wiki/Perceptron). A perceptron is able
to classify linearly separable data. Linearly separable data is the type of data which can be separated by a
hyperplane in n-dimensional space.
Real-world artificial neural networks are much more complex and powerful, and consist of multiple hidden layers with
multiple nodes in each hidden layer. Such neural networks are able to identify the non-linear decision boundaries
found in real data. I will explain how to create a multi-layer neural network from scratch in Python in an upcoming
article.
