
Machine Learning

With
Neural Networks

Anuj Saxena
Software Consultant
Knoldus Software LLP
Knolx Etiquettes
1) Respect Knolx session timings; you are requested not to join sessions more than 5 minutes after the session start time
2) Make sure to submit constructive feedback for all sessions, as it is very helpful for the presenter
3) Keep your mobile devices in silent mode; feel free to step out of the session if you need to attend an urgent call
4) Avoid unwanted chit-chat during the session
Artificial Intelligence: Brief History
Agenda
• Machine Learning – what and why?
• ANN - Introduction
• Activation Function
• Train & Error
• Gradient Descent
• Importance of layers
• Backpropagation
• Cons
• Demo
SKYNET
Machine learning
● Machine learning is the subfield of computer science that gives computers the ability to learn without being explicitly programmed.
Machine Learning techniques
● Decision Trees
● Random Forests
● K-means Clustering
● Naive Bayes Classifier
● Artificial Neural Networks
Artificial Neural Network
How the brain works
At a granular level

Perceptron
What is a perceptron?
● A perceptron is an artificial unit that mimics a
biological neuron.
● Using multiple perceptrons we create an
Artificial Neural Network.
● In an ANN, every unit in each layer (except the input layer) is a perceptron.
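To make this concrete, here is a minimal Python sketch of a single perceptron; the function signature and the example values are illustrative assumptions, not taken from the slides (the activation choices are covered a few slides ahead):

def perceptron(inputs, weights, bias, activation):
    # Weighted sum of inputs times weights, plus a bias,
    # passed through an activation function.
    summation = sum(x * w for x, w in zip(inputs, weights))
    return activation(summation + bias)

# With the identity as a stand-in activation, the unit simply
# reports its weighted sum:
print(perceptron([1.0, 0.5], weights=[0.4, -0.2], bias=0.1,
                 activation=lambda s: s))  # 0.4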
Perceptron
A simple neural network
A bit simpler
Self Drive Car: ALVINN
● Stands for Autonomous Land Vehicle In a Neural Network
● Steers a vehicle
● Takes input from a 30×32 sensor grid, hence 30×32 units in the input layer
● These inputs are fed to our neural net, and the output tells us which output neuron to fire (where each neuron defines a steering direction)
Activation Function
● The activation function is the last step of processing in a perceptron.
● It takes the sum of the inputs multiplied by their corresponding weights and turns it into the unit's output.
Need for activation
• Consider the weighted sum a perceptron computes:
Y = Σ (input_i * weight_i)
• Here the value of Y ranges from -∞ to +∞
• So how do we decide whether the neuron should be fired (activated) or not?
• This is where activation functions come in:
• Step Function
• Linear Function
• Sigmoid Function
Step function
• A threshold-based activation function
• “Activated” if Y > threshold, else not
• In this picture the output is 1 (activated) when the value > 0 (the threshold) and 0 (not activated) otherwise
• Drawbacks:
• Can go wrong with more than two classes (more than one neuron may output “activated”)
• Multiple layers are not supported
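A tiny sketch of the step activation in Python; the threshold of 0 matches the picture described above, and the example inputs are made up:

def step(summation, threshold=0.0):
    # "Activated" (1) if the weighted sum exceeds the threshold, else 0.
    return 1 if summation > threshold else 0

print(step(0.7))   # 1 -> the neuron fires
print(step(-0.3))  # 0 -> the neuron stays silent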
Linear Function
● Y = c * (summation + bias)
where summation = sum(input*weight)
● A linear function in form of
y = mx
● Not binary in nature
● Drawbacks
– Unbounded
– Cannot use multiple layers with this either
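A one-line sketch of the linear activation; the constant c and the sample inputs are arbitrary illustration values:

def linear(summation, c=1.0):
    # Y = c * summation: proportional to the input and unbounded,
    # so it gives no clear fire / don't-fire decision.
    return c * summation

print(linear(0.7))    # 0.7
print(linear(100.0))  # 100.0 -- no upper bound, the drawback noted above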
Sigmoid
● Looks like a smoothed-out step function
● Most widely used
● Benefits
– Nonlinear
– Bounded values
Sigmoid contd.
● As we are working with bounded outputs, our activation function has the range (0, 1), i.e. our activations are bounded
● Bounded, but not binary in nature,
● i.e. we can take the max (or softmax) in case more than one neuron is activated
● As it is nonlinear, we can effectively use multiple layers
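A sketch of the sigmoid, plus a softmax helper for the "more than one neuron activated" case mentioned above; both are standard formulas, though the example values are invented:

import math

def sigmoid(summation):
    # Squashes any real value into the bounded range (0, 1); nonlinear.
    return 1.0 / (1.0 + math.exp(-summation))

def softmax(outputs):
    # Turns several bounded activations into shares that sum to 1,
    # so we can pick the "most activated" output neuron.
    exps = [math.exp(o) for o in outputs]
    return [e / sum(exps) for e in exps]

print(sigmoid(0.0))              # 0.5
print(softmax([2.0, 1.0, 0.1]))  # the first neuron gets the largest share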
What is bias?
● The main function of a bias is to provide every node with a
trainable constant value (in addition to the normal inputs
that the node receives)
● Let's consider a simple network with 1 input and 1 output
● The output of the network is computed by multiplying the input (x) by the weight (w0) and passing the result through some kind of activation function (e.g. a sigmoid function), i.e. output = sig(w0*x)
Bias(contd.)
● If we change the value of w0, the graph changes like this
● Changing the weight w0 essentially changes the "steepness" of the sigmoid
● But what if you wanted the network to output 0 when the input (x) is 2?
● Changing the steepness of the sigmoid won't really work; we need to shift the entire curve to the right
Bias(contd.)
● Now consider this network with an added bias
● The output of the network becomes sig(w0*x + w1*1.0)
● Here the bias input is fixed at 1.0, and w1 is its trainable weight
Bias(contd.)
● Now the graph shifts like this as the bias changes
● Having a weight of -5 for w1 shifts the curve to the right, which allows us to have a network that outputs 0 when x is 2
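The shift is easy to verify numerically. In this sketch the input weight w0 = 1.0 is an assumed value, while the bias weight w1 = -5 comes from the slide:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

w0, x = 1.0, 2.0

# Without a bias, the output at x = 2 is far from 0:
print(sigmoid(w0 * x))             # ~0.88

# With a bias weight w1 = -5 the curve shifts right, so the
# network outputs (almost) 0 when x is 2:
w1 = -5.0
print(sigmoid(w0 * x + w1 * 1.0))  # ~0.05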
Train & Error
● We now know that a perceptron depends on its weight vector to produce an output
● In the training phase we shift the weights for each input until we get the desired output
● In simple cases with a small number of inputs, we can manually adjust the weights until the training data satisfies the outputs
● But what if there are very many inputs and the training data is really big too (a realistic scenario)?
Error
● Finding the error means checking, once we have set the weights in our ANN model, whether they are correct or not
● An ideal case with no error in the weight vector cannot be reached, so there will always be some error in our model,
● i.e. Error = (expected output - obtained output)
● Here comes the tolerance (how much error is acceptable),
● i.e. until when we need to keep updating the weights
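A toy sketch of the tolerance idea for a single weight; the training example, learning rate, and update rule are placeholder assumptions (gradient descent, next, gives the principled update):

expected, x = 1.0, 0.5   # one training example: input x, target output
w, lr, tolerance = 0.0, 0.1, 0.01

# Keep nudging the weight until the error is acceptably small.
while abs(expected - w * x) > tolerance:
    error = expected - w * x   # Error = expected output - obtained output
    w += lr * error * x        # move the weight so the error shrinks

print(w)  # close to 2.0, since 2.0 * 0.5 == expected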
Minimizing Error through Gradient
Descent
● What is a gradient?
Ans: An increase or decrease in the magnitude of a property observed in passing from one point or moment to another.
Or: In mathematics, the gradient is a multi-variable generalization of the derivative.

● Error on a training example d: (t_d - o_d), where t_d is the target (expected) output and o_d is the obtained output
● Squared error function: E(w) = 1/2 * Σ_d (t_d - o_d)²
● Gradient: ∇E(w) = [∂E/∂w_0, ∂E/∂w_1, …, ∂E/∂w_n], with ∂E/∂w_i = -Σ_d (t_d - o_d) * x_id
● Weight update: w_i ← w_i + Δw_i, where Δw_i = -η * ∂E/∂w_i = η * Σ_d (t_d - o_d) * x_id (η is the learning rate)
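A small batch gradient descent sketch that follows the formulas above for a single linear unit; the training data and the learning rate η = 0.1 are invented for illustration:

# Learn a linear unit o = w1*x1 + w2*x2 from examples of t = 2*x1 - 1*x2.
data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0), ([1.0, 1.0], 1.0)]
w = [0.0, 0.0]
eta = 0.1  # learning rate

for epoch in range(200):
    delta = [0.0, 0.0]
    for x, t in data:
        o = sum(wi * xi for wi, xi in zip(w, x))  # obtained output o_d
        for i in range(len(w)):
            delta[i] += eta * (t - o) * x[i]      # eta * (t_d - o_d) * x_id
    w = [wi + di for wi, di in zip(w, delta)]     # w_i <- w_i + delta_w_i

print(w)  # approaches [2.0, -1.0]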


Issue with gradient descent
● Gradient descent works fine only with single-layer models. Why? Because the update rule needs the error at each unit, and target outputs are only known for the output layer
● But what about multilayer networks, whose hidden units have no target values?
● Here comes backpropagation
Leftovers
Layers
● Problems that require two hidden layers are rarely encountered, as neural networks with two hidden layers can represent functions of any shape
● There is currently no theoretical reason to use neural networks with more than two hidden layers
● Most problems can be solved using only one hidden layer
Standards
● Common rules of thumb for the number of neurons in a hidden layer:
– between the size of the input layer and the size of the output layer
– about 2/3 the size of the input layer, plus the size of the output layer
– less than twice the size of the input layer
Backpropagation
● We can find the error in the weights between the hidden layer and the output layer
● The problem is finding the error in the weights between the input layer and the hidden layer (and between one hidden layer and the next in case of multiple hidden layers)
● For that we have backpropagation
● In backpropagation we find the error at the output layer and then use that error to calculate the error at the hidden layer
Algorithm
Algorithm contd.
Output layer: for each output unit k, δ_k = o_k (1 - o_k) (t_k - o_k)
Hidden layer: for each hidden unit h, δ_h = o_h (1 - o_h) Σ_k (w_kh * δ_k)
Weight change: Δw_ji = η * δ_j * x_ji
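Putting the three update rules together, here is a compact backpropagation sketch for one hidden layer of sigmoid units learning XOR; the layer sizes, learning rate, epoch count, and random seed are all illustrative assumptions:

import math, random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

n_in, n_hid = 2, 3
# Each weight row carries the bias as its last entry (its input is fixed at 1.0).
w_hid = [[random.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hid)]
w_out = [random.uniform(-1, 1) for _ in range(n_hid + 1)]
eta = 0.5

data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]

for epoch in range(10000):
    for x, t in data:
        # Forward pass.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in w_hid]
        o = sigmoid(sum(w * hi for w, hi in zip(w_out, h + [1.0])))

        # Output-layer error term: delta_o = o(1 - o)(t - o).
        delta_o = o * (1 - o) * (t - o)
        # Hidden-layer error terms, from propagating delta_o backwards.
        delta_h = [h[j] * (1 - h[j]) * w_out[j] * delta_o for j in range(n_hid)]

        # Weight change: delta_w = eta * delta * input.
        hb = h + [1.0]
        for j in range(n_hid + 1):
            w_out[j] += eta * delta_o * hb[j]
        xb = x + [1.0]
        for j in range(n_hid):
            for i in range(n_in + 1):
                w_hid[j][i] += eta * delta_h[j] * xb[i]

for x, t in data:
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0]))) for row in w_hid]
    o = sigmoid(sum(w * hi for w, hi in zip(w_out, h + [1.0])))
    print(x, t, round(o, 2))  # outputs should approach the XOR targets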
Cons
● Google’s Photos app mistakenly tagged two
black people in a photograph as “gorillas.”
● Flickr’s smart new image recognition tool,
powered by Yahoo’s neural network, also
tagged a black man as an “ape.”
Demo
References
● Machine Learning – Tom Mitchell
● http://www.theprojectspot.com/tutorial-post
