
Deep Learning

Neural Network

2019/2/25 1
Course Details

• Contents:
• Introduction
• Programming frameworks
• Applications, data collection, data preprocessing, feature selection
• Neural Network and Deep Learning Architecture
• Convolutional Neural Network
• Sequence Model
• Introduction to Reinforcement Learning

• Reference: “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville


• Most lecture notes are based on Andrew Ng's notes

Today's discussion focuses on:

• Recap of last week's discussion and project ideas for the final project

• Introduction to Neural Network

• Logistic regression in a neural network mindset

• Setting up your notebooks for programming in Python

Final project ideas

• Think of a problem you need to solve in your research area


• Remaining Useful Life of robot arm joint
• Automated visual inspection of production processes (laser, electronics)
• Health care
• Cyber security
• Automated driving (self-driving cars)
• Intelligent conversational interfaces / chatbots
• Energy use and cost
• Understanding intentions / bad behaviors
• Automated reading
• Customer service and troubleshooting

Project timing

• Proposal
• Progress and plan for next step
• Poster presentation (short)
• Final report

Summary of last week

• Mitchell’s definition
• “A computer program is said to learn from experience E with respect to some class of tasks T and
performance measure P, if its performance at tasks in T, as measured by P, improves with
experience E”.
• Machine learning performance P
• Accuracy:

  Accuracy = (TP + TN) / (TP + TN + FP + FN)

• Confusion matrix (Precision, Recall, F1 measures):

  Precision = TP / (TP + FP)
  Recall = TP / (TP + FN)
  F1 = 2 · (P · R) / (P + R)

• A=99.9%

• Common machine learning tasks T
• Classification
• Regression
• Machine translation
• Anomaly detection
• Machine learning experience E:
• Supervised
• Unsupervised
• Some machine learning algorithms interact with the environment
(feedback in the loop) – reinforcement learning

Underfitting and overfitting

• Generalization ability – generalization error (or test error)

• Solving the problem of overfitting:

• Reduce the number of features
• Regularization:

  J(θ) = (1/(2m)) [ Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))² + λ Σ_{j=1}^{n} θ_j² ]

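A minimal NumPy sketch of this regularized cost, assuming a linear hypothesis h_θ(x) = θᵀx and leaving θ₀ (the bias term) out of the penalty, as is conventional; names and data are illustrative:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) = 1/(2m) * [sum((h(x)-y)^2) + lam * sum(theta_j^2, j >= 1)]."""
    m = len(y)
    h = X @ theta                           # linear hypothesis h_theta(x)
    sq_err = np.sum((h - y) ** 2)
    penalty = lam * np.sum(theta[1:] ** 2)  # theta_0 is not penalized
    return (sq_err + penalty) / (2 * m)

X = np.array([[1.0, 1.0], [1.0, 2.0]])  # first column is the intercept feature
theta = np.array([0.0, 1.0])
y = np.array([1.0, 2.0])
J = regularized_cost(theta, X, y, lam=2.0)  # perfect fit, so only the penalty remains
```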
Let's dive into deep learning

Neural Network and Deep Learning Architecture

• Introduction
• Basics of Neural Network Architecture
• One Layer Neural Network
• Deep Neural Network

What is a Neural Network?

[Figure: house price vs. size of house, fitted by a single neuron mapping size → price]
Sensor representation in the brain

• [BrainPort; Welsh & Blasch, 1997; Nagel et al., 2005; Constantine-Paton & Law, 2009]

Housing Price Prediction

Inputs: x1 = size, x2 = #bedrooms, x3 = zip code, x4 = wealth
These feed a network whose output is the price, y.
Training data: pairs (x, y)
Supervised Learning

| Input (x)            | Output (y)                              | Application         | Network |
|----------------------|-----------------------------------------|---------------------|---------|
| House features       | Price                                   | Real estate         | Std NN  |
| Ad, user info        | Click on ad? (0/1)                      | Advertising         | Std NN  |
| Image                | Object (1, …, 100)                      | Object recognition  | CNN     |
| Audio                | Text transcript                         | Speech recognition  | RNN     |
| English              | French                                  | Machine translation | RNN     |
| Image, location info | Position of other cars and pedestrians  | Autonomous driving  | combo   |
Supervised Learning
Structured Data:

| Size | #bedrooms | … | Floor No | Price (1000$s) |
|------|-----------|---|----------|----------------|
| 2104 | 3         | … | 3        | 400            |
| 1600 | 3         | … | 5        | 330            |
| 2400 | 3         | … | 6        | 369            |
| ⋮    | ⋮         |   | ⋮        | ⋮              |
| 3000 | 4         | … | 2        | 540            |

| User Age | Ad Id | … | Click |
|----------|-------|---|-------|
| 41       | 93242 | … | 1     |
| 80       | 93287 | … | 0     |
| 18       | 87312 | … | 1     |
| ⋮        | ⋮     |   | ⋮     |
| 27       | 71244 | … | 1     |

Unstructured Data: audio/vibration, image, text (e.g., “Neural Network and neuroscience study…”)
Feed forward networks

• Standard NN
• Recurrent NN
• Convolutional NN

What drives deep learning

• Large amounts of data available
• Faster computation
• Innovation in neural network algorithms
Scale drives deep learning

[Andrew Ng's graph: performance vs. amount of data — with a small training set the methods are hard to tell apart; larger networks pull ahead as data grows]
History

• Trend

Gartner hype cycle graph for analyzing the history of artificial neural network technology

Break

Binary Classification

1 (cat) vs 0 (non cat)

The blue, green, and red pixel-intensity matrices of the input image are unrolled into a single feature vector:

x = [255, 231, …, 42, 22, …, 255, 134, …]ᵀ, with n_x = 12288

and the input x maps to the label y ∈ {0, 1}.
Notation
(x, y): x ∈ R^(n_x), y ∈ {0, 1}

m training examples: {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))}

X = [x^(1) x^(2) ⋯ x^(m)] ∈ R^(n_x × m)

Y = [y^(1) y^(2) ⋯ y^(m)] ∈ R^(1 × m)
Logistic Regression

• Logistic regression is a learning algorithm used in supervised learning problems where the output labels y are all either zero or one (binary).

• Given x ∈ R^(n_x), want ŷ = P(y = 1 | x), where 0 ≤ ŷ ≤ 1

• Sigmoid function: σ(z) = 1 / (1 + e^(−z))

• Parameters: w ∈ R^(n_x), b ∈ R

• Output: ŷ = σ(wᵀx + b)

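A few lines of NumPy make the definition concrete (function names and sample numbers are illustrative):

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + e^(-z)), squashing any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    """y_hat = sigma(w^T x + b): the model's estimate of P(y = 1 | x)."""
    return sigmoid(np.dot(w, x) + b)

w = np.array([0.5, -0.25])
x = np.array([2.0, 4.0])
b = 0.0
y_hat = predict_proba(w, b, x)  # sigma(0.5*2 - 0.25*4 + 0) = sigma(0) = 0.5
```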
Logistic Regression cost function
ŷ = σ(wᵀx + b), where σ(z) = 1/(1 + e^(−z)) and z^(i) = wᵀx^(i) + b

Given: (x^(1), y^(1)), …, (x^(m), y^(m)), want ŷ^(i) ≈ y^(i)

Squared-error loss: L(ŷ, y) = ½(ŷ − y)²  — this makes the optimization non-convex for logistic regression, so it is not used

Loss (error) function: L(ŷ, y) = −(y log ŷ + (1 − y) log(1 − ŷ))
  if y = 1: L(ŷ, y) = −log ŷ        (want ŷ close to 1)
  if y = 0: L(ŷ, y) = −log(1 − ŷ)   (want ŷ close to 0)

Cost function: J(w, b) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i)) = −(1/m) Σ_{i=1}^{m} [y^(i) log ŷ^(i) + (1 − y^(i)) log(1 − ŷ^(i))]
Gradient Descent
ŷ = σ(wᵀx + b), σ(z) = 1/(1 + e^(−z))

J(w, b) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i)) = −(1/m) Σ_{i=1}^{m} [y^(i) log ŷ^(i) + (1 − y^(i)) log(1 − ŷ^(i))]

Want to find w, b that minimize J(w, b)

[Figure: the surface J(w, b) plotted over the (w, b) plane]
Gradient descent

• Let J be a function of w: J(w)

repeat {
  w := w − α dJ(w)/dw
}

• If J(w, b), repeat {
    w := w − α ∂J(w, b)/∂w
    b := b − α ∂J(w, b)/∂b
  }
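The repeat loop above, sketched on an illustrative one-dimensional cost J(w) = (w − 3)² (not from the lecture), whose minimum is at w = 3:

```python
def gradient_descent(dJ_dw, w0, alpha, steps):
    """repeat { w := w - alpha * dJ(w)/dw }"""
    w = w0
    for _ in range(steps):
        w = w - alpha * dJ_dw(w)
    return w

# J(w) = (w - 3)^2  =>  dJ/dw = 2 * (w - 3)
w = gradient_descent(lambda w: 2 * (w - 3), w0=0.0, alpha=0.1, steps=100)
```

Each step shrinks the distance to the minimum by the factor (1 − 2α), so w converges toward 3.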
Computation graph

• Example graph: z = xy, with inputs x and y feeding a multiplication node
• Logistic regression as a graph: u = wx, z = u + b, a = σ(z), so ŷ = σ(wx + b)
• Remember the chain rule of differentiation: if z depends on u and u depends on x, then dz/dx = (dz/du)(du/dx)

Let's apply this to logistic regression
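A tiny numeric check of the chain rule on this graph (the values of w, x, b are arbitrary illustrations):

```python
import math

# Forward pass through the graph: u = w*x, z = u + b, a = sigma(z)
w, x, b = 2.0, 1.5, -1.0
u = w * x
z = u + b
a = 1.0 / (1.0 + math.exp(-z))

# Backward pass via the chain rule: da/dw = (da/dz) * (dz/du) * (du/dw)
da_dz = a * (1 - a)   # derivative of the sigmoid
dz_du = 1.0
du_dw = x
da_dw = da_dz * dz_du * du_dw

# Finite-difference check: perturb w slightly and compare slopes
eps = 1e-6
a_eps = 1.0 / (1.0 + math.exp(-((w + eps) * x + b)))
numeric = (a_eps - a) / eps
```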

Optimization algorithms

• Momentum and Root Mean Square Prop (RMSProp):
  both result in minimizing oscillations and faster convergence.

• Adaptive Moment Estimation (Adam):
  combines ideas from both RMSProp and Momentum.
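A sketch of one Adam step for a single parameter, using the standard update equations; the hyperparameter defaults (α = 0.001, β₁ = 0.9, β₂ = 0.999, ε = 1e-8) are the commonly quoted ones, not values specified in the lecture:

```python
import numpy as np

def adam_step(w, dw, m, v, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: Momentum-style first moment + RMSProp-style second moment."""
    m = beta1 * m + (1 - beta1) * dw        # first moment (Momentum)
    v = beta2 * v + (1 - beta2) * dw ** 2   # second moment (RMSProp)
    m_hat = m / (1 - beta1 ** t)            # bias corrections for the
    v_hat = v / (1 - beta2 ** t)            # zero-initialized moments
    w = w - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Minimize J(w) = w^2 (gradient 2w) starting from w = 1.0
w, m, v = 1.0, 0.0, 0.0
for t in range(1, 2001):
    w, m, v = adam_step(w, 2 * w, m, v, t)
```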
2019/2/25 28
Logistic regression
[Diagram: inputs x1…x4 with weights w1…w4 and bias b (via a +1 node) feed z, then σ(z) produces ŷ]

z = wᵀx + b
ŷ = a = σ(z)
L(a, y) = −(y log(a) + (1 − y) log(1 − a))

Derivatives (chain rule, working backward through the graph):

da = dL/da = −y/a + (1 − y)/(1 − a)
dz = dL/dz = (dL/da)(da/dz) = a − y
dw1 = dL/dw1 = (dL/da)(da/dz)(dz/dw1) = x1 dz
dw2 = dL/dw2 = x2 dz
db = dL/db = dz

Updates:
w1 := w1 − α dw1
w2 := w2 − α dw2
b := b − α db
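The forward and backward pass on this slide for one training example, as runnable NumPy (the data values are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0, -1.0, 0.5])   # x1..x4
w = np.array([0.1, -0.2, 0.3, 0.0])   # w1..w4
b, y, alpha = 0.0, 1.0, 0.5

# Forward pass: z = w^T x + b, a = sigma(z), L(a, y)
z = np.dot(w, x) + b
a = sigmoid(z)
L = -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Backward pass: dz = a - y, dw_j = x_j * dz, db = dz
dz = a - y
dw = x * dz
db = dz

# Gradient-descent update: w_j := w_j - alpha*dw_j, b := b - alpha*db
w = w - alpha * dw
b = b - alpha * db
```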
Logistic regression

• For m training examples:


J(w, b) = (1/m) Σ_{i=1}^{m} L(ŷ^(i), y^(i)) = −(1/m) Σ_{i=1}^{m} [y^(i) log ŷ^(i) + (1 − y^(i)) log(1 − ŷ^(i))]

For each training example (x^(i), y^(i)):

a^(i) = ŷ^(i) = σ(z^(i)) = σ(wᵀx^(i) + b), giving per-example gradients dw1^(i), dw2^(i), db^(i)

∂J(w, b)/∂w1 = (1/m) Σ_{i=1}^{m} ∂L(a^(i), y^(i))/∂w1 = (1/m) Σ_{i=1}^{m} dw1^(i)

Logistic regression

• Python implementation of the logistic regression gradients for m examples (with n_x = 2 features, so x has shape (2, m)):

J = 0; dw1 = 0; dw2 = 0; db = 0
for i in range(m):
    z = np.dot(w.T, x[:, i]) + b
    a = sigmoid(z)
    J += -(y[i] * np.log(a) + (1 - y[i]) * np.log(1 - a))
    dz = a - y[i]
    dw1 += x[0, i] * dz
    dw2 += x[1, i] * dz
    db += dz
J = J / m
dw1 = dw1 / m; dw2 = dw2 / m; db = db / m

Then one step of gradient descent:
w1 := w1 − α dw1
w2 := w2 − α dw2
b := b − α db

(dw1 accumulates the per-example gradients, so after dividing by m it equals ∂J/∂w1.)
Vector Valued functions

z = [z1 ⋯ zn]ᵀ,  A = [[a11, a12], [a21, a22], …, [an1, an2]]

Vector implementation:

import numpy as np
u = np.exp(z)         # element-wise exponential
u = np.log(A)         # element-wise logarithm
u = np.maximum(0, z)  # element-wise max with 0 (ReLU)

Loop implementation:

import math
v = np.zeros((n, 1), dtype=np.float32)
u = np.zeros((n, 1), dtype=np.float32)
for i in range(n):
    u[i] = math.exp(v[i])
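The point of the slide is that one ufunc call replaces the whole loop and produces identical numbers; a small equivalence check with illustrative data:

```python
import math
import numpy as np

v = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
n = len(v)

# Loop version: one math.exp call per element
u_loop = np.zeros(n)
for i in range(n):
    u_loop[i] = math.exp(v[i])

# Vectorized version: a single NumPy call, no Python-level loop
u_vec = np.exp(v)

same = np.allclose(u_loop, u_vec)
```

On large arrays the vectorized call is typically orders of magnitude faster, because the loop runs in compiled code instead of the Python interpreter.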
Vectorization of logistic regression
J(w, b) = −(1/m)[yᵀ log ŷ + (1 − y)ᵀ log(1 − ŷ)],  w ∈ R^(n_x)

Instead of looping over the m examples explicitly (previous slide), compute all of them at once:

for iter in range(10000):
    Z = np.dot(w.T, X) + b     # (1, m): z^(i) for every example
    A = sigmoid(Z)
    J = -(1 / m) * (np.dot(Y, np.log(A).T) + np.dot(1 - Y, np.log(1 - A).T))
    dZ = A - Y
    dw = (1 / m) * np.dot(X, dZ.T)
    db = (1 / m) * np.sum(dZ)
    w = w - alpha * dw
    b = b - alpha * db
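Putting the vectorized update together as a runnable sketch; the synthetic data, learning rate, and iteration count here are illustrative choices, not specified in the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
nx, m, alpha = 2, 200, 0.5
X = rng.normal(size=(nx, m))                        # X in R^(nx x m)
Y = (X[0] + X[1] > 0).astype(float).reshape(1, m)   # linearly separable labels

w = np.zeros((nx, 1))
b = 0.0
for _ in range(1000):
    Z = np.dot(w.T, X) + b     # (1, m): all m examples at once
    A = sigmoid(Z)
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m   # (nx, 1)
    db = np.sum(dZ) / m
    w = w - alpha * dw
    b = b - alpha * db

train_accuracy = np.mean((sigmoid(np.dot(w.T, X) + b) > 0.5) == (Y == 1))
```

Because the labels are linearly separable, gradient descent should reach a high training accuracy.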
