
Lecture Series: AI is the New Electricity

Deep Learning - SCOPING, EVOLUTION & FUTURE TRENDS

Dr. Chiranjit Acharya


AILABS Academy
J-3, GP Block, Sector V, Salt Lake City, Kolkata, West Bengal 700091

Presented at AILABS Academy, Kolkata on April 18th, 2018

Confidential, unpublished property of aiLabs. Do not duplicate or distribute. Use and distribution limited solely to authorized personnel. (c) Copyright 2018
A Journey into Deep Learning

▪Cutting-edge technology

▪Has garnered traction in both industry and academia
▪Achieves near-human-level performance in many pattern recognition tasks
▪Excels in
▪structured, relational data
▪unstructured rich-media data such as image, video, audio and text

AILABS (c) Copyright 2018 2


A Journey into Deep Learning

▪What is Deep Learning? Where is the “deepness”?

▪Where does Deep Learning come from?

▪What are the models and algorithms of Deep Learning?

▪What is the trajectory of evolution of Deep Learning?

▪What are the future trends of Deep Learning?

AILABS (c) Copyright 2018 3


A Journey into Deep Learning

AILABS (c) Copyright 2018 4


Artificial Intelligence

Holy Grail of AI Research

▪Understanding the neuro-biological and neuro-physical basis of human intelligence
▪science of intelligence
▪Building intelligent machines which can think and act like humans
▪engineering of intelligence

AILABS (c) Copyright 2018 5


Artificial Intelligence

Facets of AI Research
▪knowledge representation
▪reasoning
▪natural language understanding
▪natural scene understanding

AILABS (c) Copyright 2018 6


Artificial Intelligence

Facets of AI Research
▪natural speech understanding
▪problem solving
▪perception
▪learning
▪planning

AILABS (c) Copyright 2018 7


Machine Learning

Basic Doctrine of Learning


▪learning from examples
Outcome of Learning
▪rules of inference for some predictive task
▪embodiment of the rules = the model
▪a model is an abstract computing device
•e.g. kernel machine, decision tree, neural network

AILABS (c) Copyright 2018 8


Machine Learning

Connotations of Learning

▪process of generalization

▪discovering nature/traits of data

▪unraveling patterns and anti-patterns in data

AILABS (c) Copyright 2018 9


Machine Learning

Connotations of Learning

▪knowing distributional characteristics of data

▪identifying causal effects and propagation

▪identifying non-causal co-variations & correlations

AILABS (c) Copyright 2018 10


Machine Learning

Design Aspects of Learning System


▪ Choose the training experience

▪ Choose exactly what is to be learned, i.e. the target function / machine

▪ Choose the objective function & optimality criteria

▪ Choose a learning algorithm to infer the target function from the experience

AILABS (c) Copyright 2018 11


Learning Work Flow

▪Stage 1: Feature Extraction, Feature Subset Selection, Feature Vector Representation

▪Stage 2: Training / Testing Set Creation and Augmentation

▪Stage 3: Training the Inference Machine

▪Stage 4: Running the Inference Machine on Test Set

▪Stage 5: Stratified Sampling and Validation


AILABS (c) Copyright 2018 12
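A minimal sketch of the five-stage workflow above, assuming a scikit-learn toolchain; the digits dataset, the k=32 feature selection, and the decision-tree classifier are illustrative choices, not taken from the slides.

```python
# Illustrative five-stage learning workflow (assumed scikit-learn toolchain).
from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score

# Stage 1: feature extraction / subset selection / feature vector representation
X_raw, y = load_digits(return_X_y=True)           # each row is already a feature vector
selector = SelectKBest(f_classif, k=32)            # keep the 32 most informative features
X = selector.fit_transform(X_raw, y)

# Stage 2: training / testing set creation (augmentation omitted for brevity)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Stage 3: train the inference machine
model = DecisionTreeClassifier(max_depth=10, random_state=0)
model.fit(X_train, y_train)

# Stage 4: run the inference machine on the test set
y_pred = model.predict(X_test)

# Stage 5: stratified sampling and validation (precision & recall per stratum)
for fold, (_, idx) in enumerate(StratifiedKFold(n_splits=3).split(X_test, y_test)):
    p = precision_score(y_test[idx], y_pred[idx], average="macro")
    r = recall_score(y_test[idx], y_pred[idx], average="macro")
    print(f"stratified sub-sample {fold}: precision={p:.3f} recall={r:.3f}")
```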
Feature Extraction / Selection

(Diagram: a domain expert and a knowledge engineer extract cognitive elements from the corpus (low-level, mid-level and high-level parts plus additional descriptors); a sparse coder turns these into a sparse representation.)

AILABS (c) Copyright 2018 13


Training Set Augmentation

(Diagram: a random sampler draws samples from the sparse representation; after review, the samples are merged with the existing training set to form the augmented training set.)

AILABS (c) Copyright 2018 14


Training and Prediction / Recognition

(Diagram: the training set feeds an adaptive learner that produces a prediction / recognition model; the model is then run on the unlabelled residual corpus to yield the predicted / recognized corpus.)

AILABS (c) Copyright 2018 15


Sampling , Validation & Convergence

(Diagram: a stratified sampler draws stratified sub-samples from the predicted corpus; a human reviewer produces reviewed sub-samples; a precision & recall calculator compares the two. If the scores have converged, relevance scoring ends; otherwise the loop goes back to training set augmentation.)
AILABS (c) Copyright 2018 16
Evolution of Connectionist Models

1943: Artificial neuron model (McCulloch & Pitts)

▪ "A logical calculus of the ideas immanent in nervous activity"

▪ simple artificial “neurons” could be made to perform basic logical operations such as AND, OR and NOT

▪ known as Linear Threshold Gate

▪ NO learning

AILABS (c) Copyright 2018 17


Evolution of Connectionist Models

1943: Artificial neuron model (McCulloch & Pitts)


(Diagram: inputs x_1 … x_n feed neuron j through weights w_1j, w_2j, …, w_nj, with bias b_j; the neuron emits output y_j.)

  s_j = \sum_{i=0}^{n} w_{ij} x_i + b_j, \qquad y_j = f(s_j)

AILABS (c) Copyright 2018 18
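A small sketch of the Linear Threshold Gate above: with hand-picked weights and bias (no learning, as the slide notes) it reproduces AND, OR and NOT. The particular weight values are illustrative assumptions.

```python
# McCulloch-Pitts linear threshold gate: y = 1 if sum(w_i * x_i) + b >= 0 else 0.
# Weights and biases are hand-picked (no learning), purely to illustrate the idea.
def threshold_gate(weights, bias):
    def gate(*inputs):
        s = sum(w * x for w, x in zip(weights, inputs)) + bias
        return 1 if s >= 0 else 0
    return gate

AND = threshold_gate([1, 1], bias=-2)   # fires only when both inputs are 1
OR  = threshold_gate([1, 1], bias=-1)   # fires when at least one input is 1
NOT = threshold_gate([-1], bias=0)      # inverts its single input

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
print("NOT 0:", NOT(0), "NOT 1:", NOT(1))
```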
Evolution of Connectionist Models

1957: Perceptron model (Rosenblatt)


▪ invention of learning rules inspired by ideas from neuroscience

  if Σ_i (input_i × weight_i) > threshold, output = +1
  if Σ_i (input_i × weight_i) < threshold, output = −1

▪ learns to classify input into two output classes
▪ Sigmoid transfer function: boundedness, graduality

  y → 1 as x → +∞
  y → 0 as x → −∞

AILABS (c) Copyright 2018 19
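A sketch of the perceptron's mistake-driven learning rule on a toy linearly separable problem (logical AND in ±1 form); the data, learning rate and epoch count are illustrative assumptions, not from the slides.

```python
# Rosenblatt-style perceptron update on a toy linearly separable problem.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([-1, -1, -1, +1])           # targets: logical AND in +/-1 form
w, b, lr = np.zeros(2), 0.0, 0.1

for epoch in range(20):
    for x, target in zip(X, t):
        y = 1 if w @ x + b > 0 else -1   # thresholded output
        if y != target:                  # update weights only on mistakes
            w += lr * target * x
            b += lr * target

print("weights:", w, "bias:", b)
print("predictions:", [1 if w @ x + b > 0 else -1 for x in X])
```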


Evolution of Connectionist Models

1957: Perceptron model (Rosenblatt), with sigmoid transfer function

(Diagram: inputs x_1 … x_n feed neuron j through weights w_1j, w_2j, …, w_nj, with bias b_j; the neuron emits output y_j.)

  s_j = \sum_{i=0}^{n} w_{ij} x_i + b_j, \qquad y_j = f(s_j) = \frac{1}{1 + e^{-s_j}}

AILABS (c) Copyright 2018 20
Evolution of Connectionist Models

1960s: Delta Learning Rule (Widrow & Hoff)


▪ Define the error as the squared residuals summed over all training cases:

  E = \frac{1}{2} \sum_{n} (y_n - \hat{y}_n)^2

▪ Now differentiate to get error derivatives for the weights:

  \frac{\partial E}{\partial w_i} = \frac{1}{2} \sum_{n} \frac{\partial \hat{y}_n}{\partial w_i} \frac{\partial E_n}{\partial \hat{y}_n} = -\sum_{n} x_{i,n} \, (y_n - \hat{y}_n)

▪ The batch delta rule changes the weights in proportion to their error derivatives summed over all training cases:

  \Delta w_i = -\varepsilon \, \frac{\partial E}{\partial w_i}
AILABS (c) Copyright 2018 21
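A sketch of the batch delta rule written above, applied to an assumed toy linear problem; the data, learning rate ε and epoch count are illustrative choices.

```python
# Batch delta rule (Widrow-Hoff / LMS) on a toy linear regression problem.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # 100 training cases, 3 inputs
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)    # noisy linear targets

w = np.zeros(3)
eps = 0.002                                    # learning rate (epsilon)
for epoch in range(500):
    y_hat = X @ w                              # linear neuron output
    dE_dw = -X.T @ (y - y_hat)                 # error derivatives summed over all cases
    w -= eps * dE_dw                           # delta rule: w += -eps * dE/dw

print("learned weights:", np.round(w, 3))      # should approach true_w
```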
Evolution of Connectionist Models

1969: Minsky's objection to Perceptrons

▪ Marvin Minsky & Seymour Papert: Perceptrons

▪ Unless input categories are linearly separable, a perceptron cannot learn to discriminate between them.

▪ Unfortunately, it appeared that many important categories were not linearly separable.

AILABS (c) Copyright 2018 22


Evolution of Connectionist Models
1969: Minsky's objection to Perceptrons
Perceptrons are good at linear classification but ...
(Figure: points of class +1 and class −1 in the x1-x2 plane, separable by a straight line.)

AILABS (c) Copyright 2018 23


Evolution of Connectionist Models

1969: Minsky's objection to Perceptrons


Perceptrons are incapable of simple nonlinear classification like XOR

X1  X2  Output
0   0   0
0   1   1
1   0   1
1   1   0
(XOR operation)

(Figure: the four XOR points plotted in the x1-x2 plane; the two classes cannot be separated by a single straight line.)

AILABS (c) Copyright 2018 24


Universal Approximation Theorem

Existential Version (Kolmogorov)

▪ There exists a finite combination of superposition and addition of continuous functions of single variables which can approximate any continuous, multivariate function on compact subsets of R^d.

Constructive Version (Cybenko)


▪ The standard multilayer feed-forward network with a single hidden layer, containing a finite number of hidden neurons, is a universal approximator among continuous functions on compact subsets of R^d, under mild assumptions on the activation function.

AILABS (c) Copyright 2018 25
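In symbols, the constructive version says the approximator can be written in the single-hidden-layer form below; this is the standard notation for Cybenko's result, not taken from the slides.

```latex
% Single-hidden-layer approximator: for a continuous f on a compact set
% K \subset \mathbb{R}^d and any \varepsilon > 0 there exist N, v_i, w_i, b_i with
F(x) = \sum_{i=1}^{N} v_i \, \sigma\!\left( w_i^{\top} x + b_i \right),
\qquad
\sup_{x \in K} \bigl| F(x) - f(x) \bigr| < \varepsilon .
```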


Evolution of Connectionist Models

1986: Backpropagation for Multi-Layer Perceptrons


(Rumelhart, Hinton & Williams)
▪ solution to Minsky's objection regarding the perceptron's limitation

▪ nonlinear classification is achieved by fully connected, multilayer, feedforward networks of perceptrons (MLP)

▪ MLP can be trained by backpropagation

▪ Two-pass algorithm
▪ forward propagation of activation signals from input to output
▪ backward propagation of error derivatives from output to input

AILABS (c) Copyright 2018 26
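A minimal two-pass backpropagation sketch: a small sigmoid MLP learning XOR, the very problem a single perceptron cannot solve. The 2-4-1 architecture, learning rate and epoch count are illustrative choices, not from the slides.

```python
# Two-pass backpropagation on a tiny MLP (2-4-1, sigmoid units) solving XOR.
import numpy as np

rng = np.random.default_rng(42)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)        # XOR targets

W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

lr = 0.5
for _ in range(20000):
    # forward pass: propagate activation signals from input to output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass: propagate error derivatives from output to input
    d_out = (out - y) * out * (1 - out)                 # dE/ds at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)                  # dE/ds at the hidden layer

    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))   # typically close to [[0], [1], [1], [0]]
```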


Evolution of Connectionist Models

1986: Backpropagation for Multi-Layer Perceptrons


(Rumelhart, Hinton & Williams)

(Diagram: inputs x_1 … x_N pass through an input layer, hidden layers (Layer 1, Layer 2) and an output layer producing y_1 … y_M; the layers are fully connected.)

AILABS (c) Copyright 2018 27




Machine Learning Example

Handwritten Digit Recognition

(Figure: an image of a handwritten digit is fed to the machine, which outputs “2”.)

AILABS (c) Copyright 2018 29


Handwritten Digit Recognition

Input: a 16 × 16 image flattened into 256 values x_1 … x_256 (inked pixel → 1, blank pixel → 0).
Output: ten values y_1 … y_10, one per digit; each output represents the confidence of a digit.
Example: y_1 = 0.1 (“is 1”), y_2 = 0.7 (“is 2”), …, y_10 = 0.2 (“is 0”); the image is recognized as “2”.

AILABS (c) Copyright 2018 30
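A sketch of the forward pass of such a 256-input, 10-output network. The 30-unit hidden layer and the random (untrained) weights are placeholders for illustration; a real recognizer would learn these weights by backpropagation.

```python
# Forward pass of an illustrative 256-30-10 network for 16x16 digit images.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(256, 30)), np.zeros(30)
W2, b2 = rng.normal(scale=0.1, size=(30, 10)), np.zeros(10)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def predict(image_16x16):
    x = image_16x16.reshape(256)          # flatten: inked pixel = 1, blank = 0
    h = np.tanh(x @ W1 + b1)              # hidden layer
    return softmax(h @ W2 + b2)           # 10 confidences, one per digit

x = (rng.random((16, 16)) > 0.5).astype(float)   # stand-in for a real digit image
y = predict(x)
print("confidences:", np.round(y, 2), "-> digit index", int(np.argmax(y)))
```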


Example Application

Handwritten Digit Recognition

(Figure: the 256 inputs x_1 … x_256 go into the machine, which produces the ten outputs y_1 … y_10 and recognizes the digit “2”.)

AILABS (c) Copyright 2018 31


Evolution of Connectionist Models
1989: Convolutional Neural Network (LeCun)
(Diagram: inputs x_1 … x_N pass through hidden layers Layer 1, Layer 2, …, Layer L to outputs y_1 … y_M; each node is a neuron.)

“Deep” means many hidden layers.


AILABS (c) Copyright 2018 32
Convolutional Neural Network

▪ The input can have very high dimension.
▪ Using a fully-connected neural network would need a large number of parameters.
▪ CNNs are a special type of neural network whose hidden units are connected only to a local receptive field.
▪ The number of parameters needed by CNNs is much smaller.

Example: 200 × 200 image
a) fully connected: 40,000 hidden units => 1.6 billion parameters
b) CNN: 5 × 5 kernel (filter), 100 feature maps => 2,500 parameters

AILABS (c) Copyright 2018 33
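The counts in the example follow from simple products (bias terms ignored):

```latex
% fully connected: each of the 40,000 hidden units connects to all 200 x 200 pixels
200 \times 200 \times 40\,000 = 1.6 \times 10^{9} \ \text{parameters}
% CNN: 100 feature maps, each sharing a single 5 x 5 kernel across the whole image
5 \times 5 \times 100 = 2\,500 \ \text{parameters}
```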


Convolution Operation

(Figure: a convolution kernel slides over the input image one patch at a time.)

AILABS (c) Copyright 2018 34


Convolution Operation in CNN
▪ Input: an image (2-D array): x
▪ Convolution kernel (2-D array of learnable parameters): w
▪ Feature map (2-D array of processed data): s
▪ Convolution operation in 2-D domains:

  s(i, j) = (x * w)(i, j) = \sum_{m} \sum_{n} x(m, n) \, w(i - m, j - n)

AILABS (c) Copyright 2018 35
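A minimal sketch of the 2-D operation above, implemented as cross-correlation (the convention most CNN frameworks use and still call "convolution"); the 6 × 6 image and 3 × 3 edge-detecting kernel are illustrative.

```python
# Minimal 2-D "convolution" (cross-correlation) producing a feature map.
import numpy as np

def conv2d(x, w):
    H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((H - kH + 1, W - kW + 1))        # "valid" output size
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * w)
    return out

image = np.zeros((6, 6)); image[:, 3:] = 1.0        # left half dark, right half bright
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                   # vertical-edge detector

print(conv2d(image, kernel))                         # non-zero response along the edge
```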


Convolution Filters

AILABS (c) Copyright 2018 36


Convolution Operation with Filters

AILABS (c) Copyright 2018 37


Convolution Layers

(Figure: a convolution layer maps the input channels to output feature maps.)

AILABS (c) Copyright 2018 38


3 Stages of a Convolutional Layer

(Figure: the three stages: convolution, nonlinear activation (detector), pooling.)

AILABS (c) Copyright 2018 39


Non Linear Stage

(Figure: plots of the tanh(x) and ReLU activation functions.)

AILABS (c) Copyright 2018 40
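The two nonlinearities in the figure, written out for reference (plotting omitted):

```python
# The tanh and ReLU nonlinearities used in the detector stage.
import numpy as np

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)      # rectified linear unit: max(0, x)

xs = np.linspace(-3, 3, 7)
print("x:   ", xs)
print("tanh:", np.round(tanh(xs), 2))
print("relu:", relu(xs))
```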


Evolution of Connectionist Models
2006: Deep Belief Networks (Hinton), Stacked Auto-Encoders
(Bengio)
(Diagram: inputs x_1 … x_N pass through hidden layers Layer 1, Layer 2, …, Layer L to outputs y_1 … y_M; each node is a neuron.)

“Deep” means many hidden layers.


AILABS (c) Copyright 2018 41
Deep Learning

Traditional pattern recognition models use hand-crafted features and a relatively simple trainable classifier.

(Diagram: input → hand-crafted feature extractor → “simple” trainable classifier → output)

This approach has the following limitations:

• It is very tedious and costly to develop hand-crafted features
• The hand-crafted features are usually highly dependent on one application, and cannot be transferred easily to other applications

AILABS (c) Copyright 2018 42


Deep Learning

Deep learning = representation learning


Seeks to learn hierarchical representations (i.e. features) automatically through a multi-stage feature learning process.

(Diagram: low-level features → mid-level features → high-level features → trainable classifier → output)

Feature visualization of convolutional net trained on ImageNet (Zeiler and Fergus, 2013)

AILABS (c) Copyright 2018 43


Learning Hierarchical Representations

(Diagram: low-level features → mid-level features → high-level features → trainable classifier → output, with increasing level of abstraction.)

Hierarchy of representations with increasing level of abstraction.


Each stage is a kind of trainable nonlinear feature transformation
Image recognition
Pixel → edge → motif → part → object
Text
Character → word → word group → clause → sentence → story

AILABS (c) Copyright 2018 44


Pooling
Common pooling operations:
Max pooling
Report the maximum output within a rectangular neighborhood.
Average pooling
Report the average output of a rectangular neighborhood (possibly weighted by the distance from the central pixel).

AILABS (c) Copyright 2018 45
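A sketch of both pooling operations on an assumed 4 × 4 feature map, with a 2 × 2 window and stride 2; the values are illustrative.

```python
# 2x2 max pooling and average pooling with stride 2 on a toy feature map.
import numpy as np

def pool2d(s, size=2, mode="max"):
    H, W = s.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = s[i * size:(i + 1) * size, j * size:(j + 1) * size]
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

feature_map = np.array([[1., 3., 2., 0.],
                        [4., 6., 1., 1.],
                        [0., 2., 5., 7.],
                        [1., 1., 8., 6.]])

print("max pooled:\n", pool2d(feature_map, mode="max"))
print("avg pooled:\n", pool2d(feature_map, mode="avg"))
```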


CIFAR-10

AILABS (c) Copyright 2018 46


Deep CNN on CIFAR-10

AILABS (c) Copyright 2018 47


Deep CNN on CIFAR-10

AILABS (c) Copyright 2018 48


Deep CNN on CIFAR-10

AILABS (c) Copyright 2018 49


Future Trends

▪ A different and wider range of problems is being addressed
▪ natural language understanding
▪ natural scene understanding
▪ natural speech understanding

▪ Feature learning is being investigated at a deeper level

▪ Manifold learning

▪ Reinforcement learning

▪ Integration with other paradigms of machine learning

AILABS (c) Copyright 2018 50


Thank You
