http://blog.hackerearth.com/3-types-gradient-descent-algorithms-small-large-data-sets 1/17
1/20/2018 3 Types of Gradient Descent Algorithms for Small & Large Data Sets | HackerEarth Blog
Introduction
Gradient descent is an iterative algorithm for finding a minimum of an objective function (cost function) J(Θ); for convex costs such as the one used in linear regression, this minimum is the global minimum. The variants of gradient descent are categorized by how they trade off accuracy against computation time, as discussed in detail below. The algorithm is widely used in machine learning for the minimization of cost functions, and this tutorial shows how it achieves that objective.
We will use linear regression as the running example in this article while talking about gradient descent, although the ideas apply to other algorithms too, such as
Logistic regression
Neural networks
For linear regression, the hypothesis is hθ(x) = θ0 + θ1x, where θ0 and θ1 are the parameters and x is the input feature. To solve the model, we try to find the parameters such that the hypothesis fits the data in the best possible way. To find those parameter values, we define a cost function J(θ) and use gradient descent to minimize it.
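For concreteness, the cost function can be written out in code. The sketch below uses the common 1/(2m) scaling of the squared-error cost; the function and variable names are illustrative, not from the article:

```python
import numpy as np

def cost(theta0, theta1, x, y):
    """Squared-error cost J(theta) for the hypothesis h(x) = theta0 + theta1 * x."""
    m = x.shape[0]
    predictions = theta0 + theta1 * x
    return (1.0 / (2 * m)) * np.sum((predictions - y) ** 2)

# a hypothesis that fits the data exactly has zero cost
x = np.array([1.0, 2.0, 3.0])
y = 2.0 + 3.0 * x
print(cost(2.0, 3.0, x, y))  # → 0.0
```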
1. Initialize the parameters with arbitrary values.
2. Keep changing these values iteratively in such a way that the objective function J(Θ) is minimized.
If you have 300,000,000 records, you cannot store them all in memory, so every iteration has to read all of the records back in from disk.
This is especially costly because disk I/O is typically a system bottleneck anyway, and this inevitably requires a huge number of reads.
Batch gradient descent is not suitable for huge datasets. The code below implements batch gradient descent in Python.
```python
import numpy as np


def gradient_descent(alpha, x, y, ep=0.0001, max_iter=10000):
    converged = False
    iter = 0
    m = x.shape[0]  # number of samples

    # initial theta
    t0 = np.random.random()
    t1 = np.random.random()

    # total error, J(theta)
    J = sum([(t0 + t1 * x[i] - y[i]) ** 2 for i in range(m)])

    # iterate until the change in error falls below the tolerance ep
    while not converged:
        # using all training samples, compute the gradient (d/d_theta J(theta))
        grad0 = 1.0 / m * sum([(t0 + t1 * x[i] - y[i]) for i in range(m)])
        grad1 = 1.0 / m * sum([(t0 + t1 * x[i] - y[i]) * x[i] for i in range(m)])

        # compute the updates in temporaries so both use the old theta
        temp0 = t0 - alpha * grad0
        temp1 = t1 - alpha * grad1

        # update theta simultaneously
        t0 = temp0
        t1 = temp1

        # sum of squared errors with the new theta
        e = sum([(t0 + t1 * x[i] - y[i]) ** 2 for i in range(m)])

        if abs(J - e) <= ep:
            print('Converged, iterations:', iter)
            converged = True

        J = e          # update error
        iter += 1      # update iteration counter

        if iter == max_iter:
            print('Max iterations exceeded!')
            converged = True

    return t0, t1
```
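The per-sample Python loops above can also be vectorized with NumPy, which computes each gradient over all samples in one array operation. A sketch of the same batch update (the synthetic data, starting values, and learning rate are illustrative):

```python
import numpy as np

def batch_gd(alpha, x, y, ep=1e-6, max_iter=10000):
    t0, t1 = 0.0, 0.0
    J = np.sum((t0 + t1 * x - y) ** 2)
    for _ in range(max_iter):
        err = t0 + t1 * x - y          # residuals for all samples at once
        grad0 = err.mean()             # d/d_theta0 of the averaged cost
        grad1 = (err * x).mean()       # d/d_theta1 of the averaged cost
        t0, t1 = t0 - alpha * grad0, t1 - alpha * grad1
        e = np.sum((t0 + t1 * x - y) ** 2)
        if abs(J - e) <= ep:
            break
        J = e
    return t0, t1

x = np.linspace(0, 1, 100)
y = 4.0 + 3.0 * x                      # noiseless line y = 4 + 3x
t0, t1 = batch_gd(0.5, x, y)           # recovers values close to 4.0 and 3.0
```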
First step: pick the first training example and update the parameters using only that example.
Second step: pick the second training example and update the parameters using only that one, and so on through all ' m ' examples.
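The one-example-at-a-time update described above can be sketched as follows. This mirrors the batch code but applies an update immediately after each sample; the data, learning rate, and epoch count are illustrative:

```python
import numpy as np

def sgd(alpha, x, y, epochs=50):
    m = x.shape[0]
    t0, t1 = 0.0, 0.0
    for _ in range(epochs):
        for i in np.random.permutation(m):   # shuffle, then visit samples one by one
            err = t0 + t1 * x[i] - y[i]
            t0 -= alpha * err                # update immediately from this example
            t1 -= alpha * err * x[i]
    return t0, t1

x = np.linspace(0, 1, 100)
y = 4.0 + 3.0 * x                            # noiseless line y = 4 + 3x
t0, t1 = sgd(0.1, x, y)                      # ends up close to 4.0 and 3.0
```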
Unlike batch gradient descent, SGD never actually converges; instead, it ends up wandering around some region close to the global minimum. Mini-batch gradient descent, by contrast:
reduces the variance of the parameter updates, which can lead to more stable convergence.
can make use of highly optimized matrix operations, which makes computing the gradient very efficient.
After initializing the parameters with arbitrary values, we calculate the gradient of the cost function using the following relation:
http://blog.hackerearth.com/3-types-gradient-descent-algorithms-small-large-data-sets 10/17
1/20/2018 3 Types of Gradient Descent Algorithms for Small & Large Data Sets | HackerEarth Blog
where ' b ' is the number of batches and ' m ' is the number of training examples.
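A mini-batch update can be sketched in Python as follows. Here `batch_size` denotes the number of examples per batch (a common convention; it differs from the ' b ' above), and all names and values are illustrative:

```python
import numpy as np

def minibatch_gd(alpha, x, y, batch_size=10, epochs=200):
    m = x.shape[0]
    t0, t1 = 0.0, 0.0
    for _ in range(epochs):
        idx = np.random.permutation(m)              # reshuffle each epoch
        for start in range(0, m, batch_size):
            batch = idx[start:start + batch_size]
            err = t0 + t1 * x[batch] - y[batch]
            t0 -= alpha * err.mean()                # gradient averaged over the batch
            t1 -= alpha * (err * x[batch]).mean()
    return t0, t1

x = np.linspace(0, 1, 100)
y = 4.0 + 3.0 * x                                   # noiseless line y = 4 + 3x
t0, t1 = minibatch_gd(0.2, x, y)                    # close to 4.0 and 3.0
```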
1. If α is too large, the algorithm takes larger steps, may overshoot the minimum, and may not converge.
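The effect of α is easy to see on a toy objective J(θ) = θ², whose gradient is 2θ: the update θ ← θ − α·2θ shrinks θ only when |1 − 2α| < 1, i.e. α < 1. A small illustration (not from the article):

```python
def gd_on_square(alpha, theta=1.0, steps=30):
    """Run gradient descent on J(theta) = theta**2, whose gradient is 2*theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

small = gd_on_square(0.1)   # per-step factor |1 - 0.2| = 0.8: shrinks toward 0
large = gd_on_square(1.1)   # per-step factor |1 - 2.2| = 1.2: blows up
print(abs(small) < 1e-2, abs(large) > 1e2)  # prints: True True
```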
" J(Θ) should decrease after every iteration and should become constant (or converge ) after some iterations."
Above statement is because after every iteration of gradient descent and Θ takes values such that J(Θ) moves towards depth
i.e. value of J(Θ) decreases after every iteration.
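This behavior can be checked numerically by recording J(Θ) at every iteration and verifying that the sequence never increases; a sketch with illustrative data and learning rate:

```python
import numpy as np

x = np.linspace(0, 1, 50)
y = 4.0 + 3.0 * x
t0, t1 = 0.0, 0.0
alpha = 0.3
history = []
for _ in range(100):
    err = t0 + t1 * x - y
    history.append(np.sum(err ** 2))                        # J(theta) before the update
    t0, t1 = t0 - alpha * err.mean(), t1 - alpha * (err * x).mean()

# for a well-chosen alpha, J decreases monotonically
print(all(a >= b for a, b in zip(history, history[1:])))    # prints: True
```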
Summary
In this article, we learned the basics of the gradient descent algorithm and its three variants. These optimization algorithms are widely used for training neural networks these days, so they are important to learn. The image below shows a quick comparison of all three types of gradient descent algorithms:
(http://blog.hackerearth.com/wp-content/uploads/2017/02/Gradient_Descent_Types.png)