MIT Introduction to Deep Learning Course Lecture 1 Slides


Nick Locascio

2016: year of deep learning

MIT 6.S191 | Intro to Deep Learning | IAP 2017

Deep Learning Success

- Image Classification: better than humans (AlexNet; Krizhevsky, Sutskever, Hinton 2012)
- Machine Translation
- Speech Recognition
- Speech Synthesis
- Game Playing

...and many, many more

6.S191 Goals

1. Fundamentals

2. Practical skills

3. Up to speed on current state of the field

4. Foster an open and collaborative deep learning community within MIT

Class Information

1 week, 5 sessions

P/F, 3 credits

2 TensorFlow Tutorials

In-class Monday + Tuesday

1 Assignment: (more info in a few slides)

Typical Schedule

10:30am-11:15am Lecture #1

11:15am-12:00pm Lecture #2

12:00pm-12:30pm Coffee Break

12:30pm-1:30pm Tutorial / Proposal Time

Assignment Information

1 Assignment, 2 options:

Present a novel deep learning research idea or application

OR

Write a 1-page review of a deep learning paper

Option 1: Novel Proposal

Proposal Presentation

Groups of 3 or 4

Present a novel deep learning research idea or application

1 slide, 1 minute

List of example proposals on website: introtodeeplearning.com

Presentations on Friday

Submit groups by Wednesday 5pm to be eligible

Submit slide by Thursday 9pm to be eligible

Option 2: Paper Review

Write a 1-page review of a deep learning paper

Suggested papers listed on website introtodeeplearning.com

We will read + grade based on clarity of writing and technical communication of main ideas.

Class Support

Piazza: https://piazza.com/class/iwmlwep2fnd5uu

Course Website: introtodeeplearning.com

Lecture slides: introtodeeplearning.com/schedule

Email us: introtodeeplearning-staff@mit.edu

OH by request

Staff: Lecturers

Staff: TA + Admin

Our Fantastic Sponsors!

Why Deep Learning and why now?

Why Deep Learning?

Why Now?

1. Large Datasets

2. GPU Hardware Advances + Price Decreases

3. Improved Techniques

Fundamentals of Deep Learning

The Perceptron

1. Invented in 1958 by Frank Rosenblatt

2. Inspired by neurobiology

The Perceptron

[Diagram: inputs x0 ... xn are multiplied by weights w0 ... wn, summed together with a bias b, and passed through a non-linearity g to produce the output]

output = g(w0*x0 + w1*x1 + ... + wn*xn + b)

Perceptron Forward Pass

[Diagram: the perceptron computes its output step by step: multiply each input xi by its weight wi, sum the products together with the bias b, and apply the non-linearity (the activation function) to produce the output]

Sigmoid Activation

[Diagram: the same perceptron, with the non-linearity chosen to be the sigmoid]

sigmoid(z) = 1 / (1 + e^(-z))

Common Activation Functions

Importance of Activation Functions

Activation functions add non-linearity to our network's function.

Most real-world problems + data are non-linear.

Perceptron Forward Pass: Example

inputs: [2, 3, -1, 5], weights: [0.1, 0.5, 2.5, 0.2], bias weight: 3.0 (bias input 1)

sum = (2*0.1) + (3*0.5) + (-1*2.5) + (5*0.2) + (1*3.0) = 3.2

output = g(3.2)
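The worked example above can be sketched in Python (a minimal sketch; the sigmoid is assumed as the non-linearity, matching the earlier slide):

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def perceptron(x, w, b):
    # Weighted sum of inputs plus bias, passed through the non-linearity
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

# Values from the slide: inputs [2, 3, -1, 5], weights [0.1, 0.5, 2.5, 0.2], bias 3.0
out = perceptron([2, 3, -1, 5], [0.1, 0.5, 2.5, 0.2], 3.0)
print(round(out, 3))  # sigmoid(3.2), roughly 0.96
```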

How do we build neural networks with perceptrons?

Perceptron Diagram Simplified

[Diagram: the full inputs/weights/sum/non-linearity picture collapses into a single node: inputs x0 ... xn feed one unit o0, which produces the output]

Multi-Output Perceptron

[Diagram: input layer x0 ... xn fully connected to an output layer with units o0 and o1]

Multi-Layer Perceptron (MLP)

[Diagram: input layer x0 ... xn, a hidden layer h0 ... hn, and an output layer o0 ... on, with each layer fully connected to the next]

Deep Neural Network

[Diagram: input layer x0 ... xn, a stack of hidden layers h0 ... hn, and an output layer o0 ... on]
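A deep network's forward pass is the perceptron computation repeated layer by layer. A minimal pure-Python sketch (the layer sizes, weight values, and sigmoid non-linearity are illustrative assumptions):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(x, weights, biases):
    # Each row of `weights` holds one unit's incoming weights;
    # every unit computes g(w . x + b), exactly like a single perceptron.
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(x, layers):
    # `layers` is a list of (weights, biases) pairs, applied in order.
    for weights, biases in layers:
        x = layer_forward(x, weights, biases)
    return x

# Toy network: 2 inputs -> 3 hidden units -> 1 output (weights chosen arbitrarily)
layers = [
    ([[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[0.3, -0.6, 0.9]], [0.05]),                                # output layer
]
print(mlp_forward([1.0, 2.0], layers))  # a single number in (0, 1)
```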

Applying Neural Networks

Example Problem: Will my Flight be Delayed?

Temperature: -20 F

Input feature vector: [-20, 45]

[Diagram: the two features feed x0 and x1 of a network with hidden units h0, h1, h2 and a single output o0]

Predicted: 0.05
Actual: 1

Quantifying Loss

The loss measures how far the prediction is from the true answer.

[Diagram: the same network, with Predicted: 0.05 compared against Actual: 1]

Total Loss

The total loss averages the loss over every example in the dataset:

J(W) = (1/n) * Σ_i loss(predicted_i, actual_i)

Input       Predicted   Actual
[-20, 45]   0.05        1
[80, 0]     0.02        0
[4, 15]     0.96        1
[45, 60]    0.35        1

Binary Cross Entropy Loss

Used for binary classification, where predictions lie in (0, 1):

J(W) = -(1/n) * Σ_i [ y_i * log(ŷ_i) + (1 - y_i) * log(1 - ŷ_i) ]

Input       Predicted   Actual
[-20, 45]   0.05        1
[80, 0]     0.02        0
[4, 15]     0.96        1
[45, 60]    0.35        1

Mean Squared Error (MSE) Loss

Used for regression, where predictions are continuous values:

J(W) = (1/n) * Σ_i (ŷ_i - y_i)^2

Input       Predicted   Actual
[-20, 45]   10          40
[80, 0]     45          42
[4, 15]     100         110
[45, 60]    15          55
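Both losses can be checked directly against the tables above (a minimal sketch; averaging over examples is the convention assumed here):

```python
import math

def bce(preds, actuals):
    # Binary cross entropy, averaged over examples
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p)
                for p, y in zip(preds, actuals)) / len(preds)

def mse(preds, actuals):
    # Mean squared error, averaged over examples
    return sum((p - y) ** 2 for p, y in zip(preds, actuals)) / len(preds)

# Classification table from the slides
print(bce([0.05, 0.02, 0.96, 0.35], [1, 0, 1, 1]))
# Regression table from the slides
print(mse([10, 45, 100, 15], [40, 42, 110, 55]))  # 652.25
```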

Training Neural Networks

Training Neural Networks: Objective

Find the network weights that achieve the lowest loss:

W* = argmin_W J(W)

Loss is a function of the model's parameters.

How to minimize loss?

Compute the gradient of the loss with respect to the weights, dJ/dW, then take a small step in the opposite direction of the gradient to a new point.

Repeat!

This is called Stochastic Gradient Descent (SGD)

Stochastic Gradient Descent (SGD)

Initialize W randomly
For N Epochs:
    For each training example (x, y):
        Compute the gradient dJ/dW
        Update W := W - η * dJ/dW
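The loop above can be sketched for a single sigmoid neuron trained with squared error (a minimal sketch; the toy data, learning rate, and loss choice are illustrative assumptions):

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd(data, epochs=500, lr=0.5, seed=0):
    # Single neuron: prediction = sigmoid(w*x + b), squared-error loss
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)  # initialize randomly
    for _ in range(epochs):                         # for N epochs
        for x, y in data:                           # for each training example
            p = sigmoid(w * x + b)
            # Gradient of (p - y)^2 via the chain rule; sigmoid'(z) = p*(1-p)
            dz = 2 * (p - y) * p * (1 - p)
            w -= lr * dz * x                        # step opposite the gradient
            b -= lr * dz
    return w, b

# Toy data: label is 1 when x > 0
data = [(-2, 0), (-1, 0), (1, 1), (2, 1)]
w, b = sgd(data)
print(sigmoid(w * 2 + b), sigmoid(w * -2 + b))  # high for x=2, low for x=-2
```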


Calculating the Gradient: Backpropagation

[Diagram: a two-layer network x0 -> h0 -> o0 with weights W1 and W2, feeding the loss J(W)]

Apply the chain rule, working backwards from the loss:

dJ/dW2 = dJ/do0 * do0/dW2
dJ/dW1 = dJ/do0 * do0/dh0 * dh0/dW1
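The chain rule can be checked numerically on the tiny x0 -> h0 -> o0 network (a minimal sketch; linear units are assumed here to keep the derivatives short):

```python
# Tiny network from the diagram: h0 = W1*x0, o0 = W2*h0, J = (o0 - y)^2
# With linear units the chain-rule factors are simple products.
def backprop(x0, y, W1, W2):
    h0 = W1 * x0                 # forward pass, layer 1
    o0 = W2 * h0                 # forward pass, layer 2
    dJ_do0 = 2 * (o0 - y)        # dJ/do0
    dJ_dW2 = dJ_do0 * h0         # dJ/dW2 = dJ/do0 * do0/dW2
    dJ_dW1 = dJ_do0 * W2 * x0    # dJ/dW1 = dJ/do0 * do0/dh0 * dh0/dW1
    return dJ_dW1, dJ_dW2

g1, g2 = backprop(x0=1.0, y=1.0, W1=0.5, W2=0.5)

# Check against a finite-difference estimate of dJ/dW1
eps = 1e-6
J = lambda W1, W2: (W2 * W1 * 1.0 - 1.0) ** 2
numeric = (J(0.5 + eps, 0.5) - J(0.5 - eps, 0.5)) / (2 * eps)
print(abs(g1 - numeric) < 1e-6)
```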

Training Neural Networks In Practice

The loss function can be difficult to optimize.

Update Rule:

W := W - η * dJ/dW

Learning Rate & Optimization

Small learning rate: converges slowly and can get stuck in a local minimum.

Large learning rate: overshoots the minimum and can diverge.

How to deal with this?

1. Try lots of different learning rates to see what is "just right"

2. Do something smarter: Adaptive Learning Rate

Adaptive Learning Rate

The learning rate is no longer fixed. It can be made larger or smaller depending on:

- how large the gradient is
- how fast learning is happening
- the size of particular weights
- etc.

Adaptive Learning Rate Algorithms

- Adam
- Momentum
- NAG
- Adagrad
- Adadelta
- RMSProp

Escaping Saddle Points

Training Neural Networks In Practice 2: MiniBatches

Why is it Stochastic Gradient Descent?

The gradient computed on a single training example is only an estimate of the true gradient!

Initialize W randomly
For N Epochs:
    For each training example (x, y):
        Compute the gradient (only an estimate of the true gradient!)
        Update the weights

Minibatches Reduce Gradient Variance

Averaging over a batch of examples gives a more accurate estimate of the true gradient:

Initialize W randomly
For N Epochs:
    For each training batch {(x0, y0), ..., (xB, yB)}:
        Compute the batch gradient (a more accurate estimate!)
        Update the weights

Advantages of Minibatches

- More accurate estimation of the gradient
- Smoother convergence
- Allows for larger learning rates
- Minibatches lead to fast training: can parallelize computation + achieve significant speed increases on GPUs
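The variance reduction can be seen by comparing single-example gradients with batch-averaged ones (a minimal sketch; the linear model, synthetic data, and batch size of 32 are illustrative assumptions):

```python
import random

# Model: prediction = w*x, loss = (w*x - y)^2, so dJ/dw = 2*(w*x - y)*x
def grad(w, x, y):
    return 2 * (w * x - y) * x

rng = random.Random(0)
# Synthetic data with y = 3x plus noise, so the true optimum is near w = 3
xs = [rng.uniform(-1, 1) for _ in range(1000)]
data = [(x, 3 * x + rng.gauss(0, 1)) for x in xs]

w = 0.0
single = [grad(w, x, y) for x, y in data]                 # per-example gradients
batches = [single[i:i + 32] for i in range(0, 1000, 32)]  # minibatches of 32
batched = [sum(b) / len(b) for b in batches]              # batch-averaged gradients

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

print(variance(batched) < variance(single))  # batch averages vary much less
```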

Training Neural Networks In Practice 3: Fighting Overfitting

The Problem of Overfitting

Regularization Techniques

1. Dropout

2. Early Stopping

3. Weight Regularization

4. ...many more

Regularization I: Dropout

During training, randomly set some activations to 0

Typically drop 50% of activations in a layer

Forces the network to not rely on any 1 node

[Diagram: the deep network from before, with a random subset of hidden units zeroed out on each training pass]
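Dropout can be sketched as a random mask over a layer's activations during training (a minimal sketch; the 50% drop rate matches the slide, and the "inverted" scaling by 1/(1-p) is a common convention assumed here):

```python
import random

def dropout(activations, p=0.5, rng=random):
    # During training: zero each activation with probability p and scale
    # the survivors by 1/(1-p) so the expected value is unchanged.
    return [0.0 if rng.random() < p else a / (1 - p) for a in activations]

rng = random.Random(42)
h = [0.2, 0.9, 0.4, 0.7, 0.1, 0.8]
print(dropout(h, p=0.5, rng=rng))  # roughly half the entries become 0.0
```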

Regularization II: Early Stopping

Don't give the network time to overfit: stop training when validation accuracy stops improving.

...
Epoch 15: Train: 85% Validation: 80%
Epoch 16: Train: 87% Validation: 82%
Epoch 17: Train: 90% Validation: 85%   <- Stop here!
Epoch 18: Train: 95% Validation: 83%
Epoch 19: Train: 97% Validation: 78%
Epoch 20: Train: 98% Validation: 75%
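The stopping rule can be sketched as tracking the best validation score and halting once it stops improving (a minimal sketch; the `patience` parameter is an assumption, and the accuracy table from the slide is reused for illustration):

```python
def early_stop_epoch(val_scores, patience=2):
    # Returns the index of the epoch to stop at: the best validation
    # score seen before `patience` consecutive non-improving epochs.
    best_epoch, best_score, waited = 0, float("-inf"), 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_epoch, best_score, waited = epoch, score, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Validation accuracies for epochs 15..20 from the slide
val = [80, 82, 85, 83, 78, 75]
print(15 + early_stop_epoch(val))  # 17, the epoch where validation peaked
```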

Regularization III: Weight Regularization

Large weights typically mean the model is overfitting

Add the size of the weights to our loss function:

J_reg(W) = J(W) + λ * ||W||^2

Perform well on the task + keep weights small
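Adding the penalty changes both the loss and the gradient; a minimal sketch with an L2 penalty (the λ value and example weights are illustrative assumptions):

```python
def l2_regularized_loss(weights, base_loss, lam=0.01):
    # Total loss = task loss + lambda * sum of squared weights
    return base_loss + lam * sum(w * w for w in weights)

def l2_grad_term(w, lam=0.01):
    # The penalty adds 2*lambda*w to each weight's gradient,
    # which nudges every weight toward zero ("weight decay").
    return 2 * lam * w

print(l2_regularized_loss([3.0, -4.0], base_loss=1.0))  # 1.0 + 0.01*25 = 1.25
```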

Core Fundamentals Review

- Perceptron classifier
- Stacking perceptrons to form neural networks
- How to formulate problems with neural networks
- Training neural networks with backpropagation
- Techniques for improving training of deep neural networks

Questions?
