
Deep Learning – a cookbook view

or “Comparative Analysis of Different Deep Learning Solutions”


The evolution of artificial intelligence

Early artificial intelligence (1940 – 1980)
Small data sets; statistical and mathematical models applied to solve problems
– ENIAC Heralded the “Giant Brain”; used for WW II ballistics
– Industrial robots

Machine learning (1990 – 2000s)
Massive structured data sets; advanced analytics and heuristics
– Deep Blue Beat World Chess Champion Kasparov
– DARPA Challenge Autonomous vehicle drove 132 miles

Deep learning (Today)
Massive unstructured big data; predictive models defined by machines, based on neural networks
– Unsupervised training
– Generic code
– Pattern recognition
Systems can observe, test and refine
Successes
– AlphaGo First computer Go program to beat a human
– DeepFace Facial verification
– Libratus AI poker app
– Digital virtual assistants (Siri)
– Google self-driving cars


2
Traditional machine learning
Requires feature engineering

[Diagram: nested sets Artificial Intelligence ⊃ Machine Learning ⊃ Deep Learning; HPC]

Training: Data → Feature engineering → Machine learning algorithm → Learned model (prediction function)
Prediction: Data → Feature extraction → Learned model → Prediction

3
Deep learning
Efficient data representations, no more feature engineering

Training: Data → Deep learning algorithm → Learned model (transformation and prediction function)
Prediction (inference): Data → Learned model → Prediction

4
Types of artificial neural networks
Topology to fit data characteristics
Convolutional: images
Fully connected: speech, text, sensor data
Recurrent: speech, text, sensor data

[Diagram: each topology drawn as input → hidden layer 1 → hidden layer 2 → output; the recurrent network with a single recurrent hidden layer]

5
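The parameter-count arithmetic behind matching topology to data can be sketched in a few lines. The layer sizes below are illustrative assumptions, not figures from this deck: a convolutional layer shares one small kernel across the whole image, which is why it suits image data far better than a fully connected layer.

```python
def dense_params(n_in, n_out):
    """Weights plus biases for a fully connected layer."""
    return n_in * n_out + n_out

def conv_params(k_h, k_w, c_in, c_out):
    """One k_h x k_w kernel per (input, output) channel pair, plus biases;
    the kernels are shared across all spatial positions."""
    return k_h * k_w * c_in * c_out + c_out

# A 256x256 RGB image flattened into a 4096-unit fully connected layer:
print(dense_params(256 * 256 * 3, 4096))   # 805310464 (~805M parameters)

# The same input fed through 64 3x3 convolution filters:
print(conv_params(3, 3, 3, 64))            # 1792 parameters
```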
Terminology

[Diagram: training – a model maps training data (e.g. images labeled “Flower”, “House”) to predictions; comparing predictions with the true labels yields the errors used to update the model]

– Batch: the subset of the training data processed in one pass through the model
– Iteration: one model update on one batch
– Epoch: one full pass over the entire training data set

[Diagram: distributed training across workers]
– Strong scaling: the total workload is fixed and split across workers
– Weak scaling: the workload per worker is fixed, so the total grows with the number of workers

6
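The relationship between these terms can be made concrete with a little arithmetic; the data set and batch sizes below are illustrative (roughly ImageNet-1k scale), not figures from the deck.

```python
import math

num_samples = 1_281_167   # training samples (illustrative)
batch_size = 256          # samples processed per iteration

# Epoch: one full pass over the training data.
# Iteration: one model update on one batch.
iterations_per_epoch = math.ceil(num_samples / batch_size)
print(iterations_per_epoch)   # 5005

# Strong scaling: the global batch is fixed and split across workers.
# Weak scaling: each worker keeps a full batch, so the global batch grows.
workers = 8
strong_per_worker = batch_size // workers   # 32 samples per worker
weak_global_batch = batch_size * workers    # 2048 samples in total
print(strong_per_worker, weak_global_batch)
```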
Why deep learning?
Applications

Vision
‒ Search & information extraction
‒ Security/Video surveillance
‒ Self-driving cars
‒ Medical imaging
‒ Robotics

Speech
‒ Interactive voice response (IVR) systems
‒ Voice interfaces (Mobile, Cars, Gaming, Home)
‒ Security (speaker identification)
‒ Health care
‒ People with disabilities

Text
‒ Search and ranking
‒ Sentiment analysis
‒ Machine translation
‒ Question answering

Other
‒ Recommendation engines
‒ Advertising
‒ Fraud detection
‒ AI challenges
‒ Drug discovery
‒ Sensor data analysis
‒ Diagnostic support

7
Applications break down

Data types and example applications
– Images: image analysis
– Video: video surveillance
– Speech: speech recognition
– Text: sentiment analysis
– Sensor: predictive maintenance
– Other: fraud detection

Task types
– Detection: look for a known object/pattern
– Generation: generate content
– Classification: assign a label from a predefined set of labels
– Anomaly detection: look for abnormal, unknown patterns

8
How an individual customer’s AI evolves

Explore: How can AI help me?
Do things better
– Product development
– Customer experience
– Productivity
– Employee experience
Do new things
– New disruptions

Experiment: How can I get started?
Boundary constraints (regulations, etc.)
Data
– Data model? Location?
– How to create a model?
– Homegrown solution or open source?
– Simple ML or scalable DL?
Design
– How to design and deploy the PoC?
– On-prem, cloud?
– How to think about inference

Scale up and Optimize: How can I scale and optimize?
Provisioning for inference
Infrastructure scale up
– Training
– Inference
– On-prem / cloud / hybrid
Data management
– Between edge and core
– Security
– Updates
– Regulations
– Tracing
Performance
– What is the best config to run?
– How to tune the model to improve accuracy?
9
Key IT challenges are constraining deep learning adoption
Limited knowledge, resources and capabilities
How to get started?
“I need simple infrastructure and software capabilities to rapidly and efficiently support deep learning app development.”
→ Immature, sub-optimal foundation

How to go to production?
“I could use more expert advice and tailored solutions for migrating and integrating apps in a production environment.”
→ Inability to scale and integrate

How to optimize?
“I need help integrating the latest technologies into my deep learning environment to accelerate actionable insights.”
→ Lack of technology integration capabilities

Content under embargo until Oct 10, 2017
10


What about AI consumers?

Do it yourself
– Current wave of AI / machine learning is core to their business. All in-house.
– Google, Baidu, Facebook, Microsoft, Apple, etc.

How do I do it?
– Could benefit from better data science and machine learning, but it is not historically their core competency.
– Banks, advertisers, healthcare, manufacturing, food, automotive, etc.
– Not ready for an ASIC. Don’t know exactly what they need. Many still developing on CPUs. Can’t use solutions that can’t be verified or understood.

I know better
– Super-experts: the current wave is woefully inadequate.
– Government: DoD, DoE, NSA, NASA, etc.
– Begging for higher-performance ASICs. Know exactly what they want to do. Strong technology pull.
Where to start?
Recommended DL stack by vertical application

Verticals: voice interfaces, social media, manufacturing, oil & gas, connected cars
Data type: speech, images, video, sensor data
Data size: small, moderate, large
Typical layers: convolutional, fully connected, recurrent, … (the neural network sits here)
Frameworks: TensorFlow, Caffe2, CNTK, Torch, …
Infrastructure: x86, GPUs, FPGAs, TPU?, …

12
Neural networks: popular networks

Network     Model size (# params)   Model size (MB)   GFLOPs (forward pass)
AlexNet      60,965,224             233                0.7
GoogleNet     6,998,552              27                1.6
VGG-16      138,357,544             528               15.5
VGG-19      143,667,240             548               19.6
ResNet50     25,610,269              98                3.9
ResNet101    44,654,608             170                7.6
ResNet152    60,344,387             230               11.3

13
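The model-size column above follows directly from the parameter counts: at 32-bit precision each parameter occupies 4 bytes. A quick sanity check:

```python
# size in MB ≈ params * 4 bytes / 2**20
def model_size_mb(num_params, bytes_per_param=4):
    return num_params * bytes_per_param / 2**20

print(round(model_size_mb(60_965_224)))    # AlexNet  -> 233
print(round(model_size_mb(138_357_544)))   # VGG-16   -> 528
print(round(model_size_mb(25_610_269)))    # ResNet50 -> 98
```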
Today’s scale
Model size, data size, compute requirements

Application   Model (params)       Training data                          FLOPs per epoch
Vision        1.7*10^9 (~6.8 GB)   14*10^6 images: ~2.5 TB (256x256),     6*1.7*10^9*14*10^6 ~ 1.4*10^17
                                   ~10 TB (512x512)
Speech        60*10^6 (~240 MB)    100K hours of audio: ~34*10^9 frames,  6*60*10^6*34*10^9 ~ 1.2*10^19
                                   ~50 TB
Text          6.5*10^6 (~260 MB)   856*10^6 words                         6*6.5*10^6*856*10^6 ~ 3.3*10^16
Signals       1.2*10^6 (~4.8 MB)   3*10^6 frames                          6*1.2*10^6*3*10^6 ~ 6.5*10^13
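The FLOPs column uses the common rule of thumb of roughly 6 FLOPs per parameter per training sample (about 2 for the forward pass and 4 for the backward pass). Reproducing the first three rows:

```python
# ~6 FLOPs per parameter per sample: forward pass (~2) plus backward (~4).
def flops_per_epoch(num_params, num_samples):
    return 6 * num_params * num_samples

vision = flops_per_epoch(1.7e9, 14e6)     # ~1.4e17
speech = flops_per_epoch(60e6, 34e9)      # ~1.2e19
text   = flops_per_epoch(6.5e6, 856e6)    # ~3.3e16
print(f"{vision:.2g} {speech:.2g} {text:.2g}")
```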
Today’s hardware
Model size, data size, compute requirements

Vision example: 1.7*10^9 parameters (~6.8 GB); 14*10^6 images, ~2.5 TB (256x256) or ~10 TB (512x512); 6*1.7*10^9*14*10^6 ~ 1.4*10^17 FLOPs per epoch

1 epoch per hour requires ~39 TFLOPS sustained.

Today’s hardware:
Google TPU2: 180 TFLOPS Tensor ops (FP16?)
NVIDIA Tesla V100: 15 TFLOPS SP (30 TFLOPS FP16, 120 TFLOPS Tensor ops), 16 GB memory
NVIDIA Tesla P100: 10.6 TFLOPS SP, 16 GB memory
NVIDIA Tesla K40: 4.29 TFLOPS SP, 12 GB memory
NVIDIA Tesla K80: 5.6 TFLOPS SP (8.74 TFLOPS SP with GPU boost), 24 GB memory
Intel Xeon Phi: 2.4 TFLOPS SP

Superdome X: ~21 TFLOPS SP, 24 TB memory
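The “~39 TFLOPS” target above is just the vision epoch cost divided by one hour of wall-clock time:

```python
flops_per_epoch = 1.4e17     # vision workload from the scale table
seconds_per_hour = 3600

required_tflops = flops_per_epoch / seconds_per_hour / 1e12
print(round(required_tflops, 1))   # 38.9 TFLOPS sustained
```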


So what to recommend?

Software

Hardware

16
Building performance models

Models: AlexNet, GoogleNet, VGG-16, VGG-19, ResNet 50/101/152, Eng Acoustic Model
Frameworks: TensorFlow, Caffe2, TensorRT, BVLC Caffe
Hardware: systems populated with 8 GPUs; measured with strong and weak scaling across workers
Goal: scalable, automated real-time intelligence
17
TensorFlow – weak scaling – training: performance of different models in TensorFlow, scaling up to 8 GPUs
[Chart: speedup vs. number of GPUs (1, 2, 4, 8) for DeepMNIST, EngAcousticModel, GoogleNet, ResNet50, ResNet101, ResNet152, SensorNet, VGG16, VGG19]

18
TensorFlow – inference (inferences per second): different models with different batch sizes
[Charts: inferences per second vs. batch size (1 to 8192) for DeepMNIST, GoogleNet, ResNet50, VGG19]

How to analyze all the different numbers? As we add more options and more technologies, it becomes impossible to do by hand.
19
HPE demystifies deep learning for faster intelligence across all organizations
New IT expertise, blueprints and technologies to get started, scale, integrate and optimize

Get started rapidly: develop deep learning models
IT expertise and solutions to “get started” with deep learning
– Expertise: rapid technology selection guides; state-of-the-art training
– Solutions: integrated purpose-built solutions; out-of-the-box solutions

Scale and integrate: deliver attractive returns
Proven blueprints and services for “scalable” production deployments
– Proven blueprints: Reference Architectures; innovation labs for best practices
– Services: deploy, integrate and support; flexible, on-demand capacity

Optimize environment: enhance competitive advantage
Technology integration capabilities to maximize performance
– Integration capabilities: enhanced global Centers of Excellence; next-gen technology integration

20
Get started
Select ideal technology configurations with the HPE Deep Learning Cookbook

“Book of recipes” for deep learning workloads
– Comprehensive tool set based on extensive benchmarking
– Includes 11 workloads with 8 DL frameworks and 8 HPE hardware systems
– Estimates workload performance and recommends an optimal HW/SW stack for that workload

Expert advice to get you started
– Informed decision making: optimal hardware and software configurations
– Eliminates the “guesswork”: validated methodology and data
– Improves efficiency: detects bottlenecks in deep learning workloads

Availability of complete toolset
– Deep Learning Benchmarking Suite: available on GitHub Dec 2018
– Deep Learning Performance Analysis Tool: planned for release in the beginning of 2018
– Reference configurations: available soon on the HPE.com website

21
Deep Learning Cookbook helps to pick the right HW/SW stack

Benchmarking Suite
• Benchmarking scripts
• Reference models
• Performance metrics

Knowledgebase
• Performance results
• 11 reference models
• 8 frameworks
• 8 hardware systems

Performance and scalability models
• Machine learning (SVR) to predict performance of core operations
• Analytical communication models
• Analytical models for overall performance

Reporting tool
• Performance results
• Performance prediction for arbitrary ANNs
• Scalability prediction
• Optimal HW/SW configuration for a given workload

Reference configurations
• Image classification
• Others to come

The Benchmarking Suite and Reporting tool will be available externally; the knowledgebase and models remain internal assets.
22
23
Thank you
Natalia Vassilieva – nvassilieva@hpe.com
Sorin Cheran – sorin.cheran@hpe.com
Sergey Serebryakov – sergey.serebryakov@hpe.com
Bruno Monnet – bruno.monnet@hpe.com

24
