using Deep Learning
• Models need to be:
• Fast -> low latency, so the vehicle can react quickly
• Small -> able to run on embedded systems
Autonomous driving
• Benefits of autonomy in driving
• Save lives (~1 million people die in road traffic every year; causes include human recklessness such as DUI, distraction, …)
• Mobility is increased
• Labor-cost savings
• Reduction in carbon dioxide (CO2) emissions through optimized driving
• Personalized transportation
Autonomous Driving
SEDRIC is the first concept car from the Volkswagen Group, a prototype of an autonomous vehicle
Intelligent System
• Communicate
• Predict intent
Predict pedestrian intentions
• Need to recognize each pedestrian’s activity over time (in video context)
…
[Figure: pipeline Keypoints (skeleton) Estimation -> Gesture/Activity Recognition -> Intent Prediction]
Keypoint Detection
Approach: Top-down
• Top-down
• Detect a human first
• Run a pose estimation model on each detected person
• Slightly better accuracy, at the cost of higher latency in crowded scenes (one pose-model pass per person)
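The top-down steps above can be sketched as a short loop. This is a minimal sketch under stated assumptions: `detect_people` and `estimate_pose` are hypothetical placeholders for a person detector and a single-person pose network, and the image is a plain nested list. The point is structural: the pose model runs once per detected person.

```python
from typing import Callable, List, Tuple

# Hypothetical types for this sketch only.
Image = List[List[float]]            # grayscale image as rows of pixels
Box = Tuple[int, int, int, int]      # (x0, y0, x1, y1), end-exclusive
Keypoint = Tuple[float, float]       # (x, y)

def top_down_pose(
    image: Image,
    detect_people: Callable[[Image], List[Box]],
    estimate_pose: Callable[[Image], List[Keypoint]],
) -> List[List[Keypoint]]:
    poses = []
    # Step 1: run the person detector once on the full image.
    for x0, y0, x1, y1 in detect_people(image):
        # Step 2: crop the person and run the single-person pose model on the crop.
        crop = [row[x0:x1] for row in image[y0:y1]]
        local = estimate_pose(crop)  # one forward pass per detected person
        # Shift keypoints from crop coordinates back to image coordinates.
        poses.append([(x + x0, y + y0) for x, y in local])
    return poses
```

With N detected people, `estimate_pose` is called N times, which is why latency grows in crowded scenes.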
Approach: Bottom-up
• Bottom-up
• Find all joints/keypoints in the image
• Post-process the joints to assign each one to an individual person
• Real-time capable even in crowded scenes
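The first bottom-up step, finding all keypoints, is commonly done by taking local maxima in one confidence heatmap per body part. A minimal sketch, assuming a heatmap given as a nested list; the threshold of 0.1 and the 4-neighbourhood suppression are illustrative choices, not values from the slides.

```python
from typing import List, Tuple

def find_peaks(heatmap: List[List[float]], threshold: float = 0.1) -> List[Tuple[int, int, float]]:
    """Return (x, y, score) for each local maximum of one body-part heatmap."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue  # too weak to be a keypoint
            neighbours = [
                heatmap[ny][nx]
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                if 0 <= ny < h and 0 <= nx < w
            ]
            # Simple non-maximum suppression over the 4-neighbourhood.
            if all(v >= n for n in neighbours):
                peaks.append((x, y, v))
    return peaks
```

Every person contributes peaks to the same heatmap, so this step alone does not say which keypoint belongs to whom; that is the job of the post-processing.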
Multi-Person Pose Estimation using Part Affinity Fields [Cao, Zhe 2017]
Approach
• Jointly Learning Parts Detection and Parts Association
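For parts association, [Cao, Zhe 2017] scores a candidate limb between two detected joints by integrating the dot product of the predicted part affinity field (PAF) with the unit vector pointing from one joint to the other. A minimal sketch that approximates the integral by uniform sampling along the segment; modelling the PAF as a Python callable (rather than a 2-channel map) and the sample count of 10 are assumptions.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float]
# Hypothetical representation: returns the 2-D PAF vector at image position (x, y).
Field = Callable[[float, float], Tuple[float, float]]

def paf_score(paf: Field, p1: Point, p2: Point, samples: int = 10) -> float:
    """Average alignment between the PAF and the segment p1 -> p2."""
    if samples < 2:
        raise ValueError("need at least two samples")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0.0
    ux, uy = dx / norm, dy / norm  # unit vector of the candidate limb
    total = 0.0
    for i in range(samples):
        u = i / (samples - 1)       # position along the segment, from 0 to 1
        vx, vy = paf(p1[0] + u * dx, p1[1] + u * dy)
        total += vx * ux + vy * uy  # dot product with the limb direction
    return total / samples
```

A field that points along the limb gives a score near 1, a perpendicular field gives 0, and a field pointing the other way gives a negative score.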
Pose Methods Comparison
• For real-time use, one of the best approaches is OpenPose
• Post-processing is needed; its cost grows quadratically with the number of people
• Inference time is mostly spent on the CNN
• Even for a large number of people (~20), the post-processing time is insignificant
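The quadratic post-processing comes from scoring every pair of candidate joints for a limb type. A minimal sketch of one common way to turn those scores into limbs: a greedy matching that takes the best-scoring pairs first and uses each joint at most once. This greedy variant is an assumption for illustration (the paper formulates a bipartite matching per limb type); the `score` callable could be the PAF score from the approach above.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]

def greedy_limb_matching(
    parts_a: List[Point],
    parts_b: List[Point],
    score: Callable[[Point, Point], float],
    min_score: float = 0.0,
) -> List[Tuple[int, int, float]]:
    """Return (index_a, index_b, score) for each accepted limb of one limb type."""
    # len(parts_a) * len(parts_b) evaluations: quadratic in the number of people.
    candidates = [
        (score(a, b), i, j)
        for i, a in enumerate(parts_a)
        for j, b in enumerate(parts_b)
    ]
    candidates.sort(reverse=True)  # best-scoring connections first
    used_a, used_b, limbs = set(), set(), []
    for s, i, j in candidates:
        if s <= min_score:
            break  # remaining candidates are too weak
        if i in used_a or j in used_b:
            continue  # each joint belongs to at most one limb of this type
        used_a.add(i)
        used_b.add(j)
        limbs.append((i, j, s))
    return limbs
```

Scoring is cheap compared with the CNN forward pass, which is consistent with the observation that post-processing stays insignificant even for ~20 people.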
San Francisco
San Francisco
Munich
blurred/anonymized image:
OpenPose Demo
Compression
• Modular build-ups
• Several CPUs and GPUs
• No space left for the customer
Compression
• Compress deep neural networks to make faster predictions on embedded hardware and reduce the memory footprint
[Figure: driving pipeline with the stages Data Collection, Data Processing, Perception (Detection of cars, pedestrians and bikes; Semantic Segmentation; corner cases) and Path Planning; sensor inputs: multiple cameras, LiDAR, CAN. Example: object detection only runs at 10 fps.]
Deep Network Compression
• Memory-efficient architectures
  • Rethinking the design of the architecture (SqueezeNet)
• TensorRT from Nvidia
  • Platform-dependent compression
  • No out-of-the-box support for custom layers
  • Needs specific hardware
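The "rethinking the architecture" idea behind SqueezeNet can be made concrete by counting weights. A SqueezeNet fire module first squeezes the input with 1x1 filters, then expands with a mix of 1x1 and 3x3 filters. The channel counts below (96 in, squeeze 16, expand 64 + 64) are illustrative assumptions, compared against a plain 3x3 convolution with the same 128 output channels; biases are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Number of weights in a k x k convolution (biases ignored)."""
    return c_in * c_out * k * k

def fire_params(c_in: int, s1x1: int, e1x1: int, e3x3: int) -> int:
    """Weights of a fire module: 1x1 squeeze, then parallel 1x1 and 3x3 expands."""
    squeeze = conv_params(c_in, s1x1, 1)
    expand = conv_params(s1x1, e1x1, 1) + conv_params(s1x1, e3x3, 3)
    return squeeze + expand

plain = conv_params(96, 128, 3)      # 110592 weights
fire = fire_params(96, 16, 64, 64)   # 1536 + 1024 + 9216 = 11776 weights
print(f"plain 3x3: {plain}, fire: {fire}, ratio: {plain / fire:.1f}x")
```

The saving comes from the squeeze layer: the expensive 3x3 filters only see 16 input channels instead of 96. Unlike TensorRT, this kind of compression is part of the model itself and does not depend on the target platform.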
Convolutional Neural Network
Network Structure
Process
• Read Sensor data
• Feature extraction and fusion
• Link parts together
Unstructured pruning [Han2015]
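A minimal sketch of magnitude-based unstructured pruning in the spirit of [Han2015]: individual weights with the smallest absolute value are set to zero. Selecting a fixed fraction (rather than using a threshold) and the flat list of weights are simplifying assumptions.

```python
from typing import List

def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the `sparsity` fraction of weights with the smallest |w|."""
    n_prune = int(len(weights) * sparsity)
    # Indices sorted by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0.0 if i in pruned else w for i, w in enumerate(weights)]

w = [0.5, -0.1, 0.3, -0.8, 0.05, 0.2]
print(magnitude_prune(w, 0.5))  # [0.5, 0.0, 0.3, -0.8, 0.0, 0.0]
```

The tensor keeps its shape and only becomes sparse, so a speed-up on embedded hardware needs sparse kernels. Structured pruning avoids this by removing whole channels.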
Structured pruning: remove channel slices by importance
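Structured pruning needs a per-channel importance measure. The results below use Fisher pruning. A minimal sketch, assuming the Fisher pruning signal as formulated by Theis et al. (2018): for channel k, Δ_k = 1/(2N) · Σ_n (Σ_ij a_nkij · g_nkij)², where a are the channel's activations and g the loss gradients with respect to them. The nested-list layout and the helper names are hypothetical. Removing channel k deletes output filter k of the layer and the matching input slice of the next layer.

```python
from typing import Any, List, Tuple

def fisher_importance(acts: List[List[List[float]]], grads: List[List[List[float]]]) -> List[float]:
    """Fisher pruning signal per channel: 1/(2N) * sum_n (sum_ij a * g)^2.

    acts[n][k] / grads[n][k]: flattened spatial activations of channel k for sample n
    and the gradient of the loss with respect to them.
    """
    n_samples, n_channels = len(acts), len(acts[0])
    scores = []
    for k in range(n_channels):
        total = 0.0
        for n in range(n_samples):
            s = sum(a * g for a, g in zip(acts[n][k], grads[n][k]))
            total += s * s
        scores.append(total / (2 * n_samples))
    return scores

def remove_channel(w_this: List[Any], w_next: List[List[Any]], k: int) -> Tuple[List[Any], List[List[Any]]]:
    """Drop output channel k of one layer and the matching input slice of the next layer.

    w_this[out] holds one filter per output channel; w_next[out][in] is indexed by input channel.
    """
    new_this = [f for i, f in enumerate(w_this) if i != k]
    new_next = [[s for i, s in enumerate(f) if i != k] for f in w_next]
    return new_this, new_next
```

The least important channel is `scores.index(min(scores))`; after removal both layers are genuinely smaller, so no sparse kernels are needed.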
Benchmark
Multiple pruning rounds: compress the model by [5, 10, …, 50]%
• Pruning rounds (10, 100, 1); one round is one complete loop (prune + retrain + evaluate)
• Filters to prune in each round (93, 9, 930)
• Pruning batch size (32)
• Number of pruning steps (4)
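The benchmark settings can be checked with a little arithmetic. A minimal sketch, assuming that 93 filters correspond to one 5% step, i.e. roughly 1860 prunable filters in total (inferred from the 10 × 93 setting reaching 50%, not stated on the slide).

```python
from typing import List

def cumulative_compression(rounds: int, per_round: int, total_filters: int) -> List[float]:
    """Percentage of prunable filters removed after each round."""
    return [100.0 * per_round * r / total_filters for r in range(1, rounds + 1)]

TOTAL = 1860  # assumption: 93 filters == 5%
for rounds, per_round in ((10, 93), (100, 9), (1, 930)):
    final = cumulative_compression(rounds, per_round, TOTAL)[-1]
    print(f"{rounds:>3} rounds x {per_round:>3} filters -> {final:.1f}% removed")
```

Under this assumption, 10 × 93 and 1 × 930 both end at 50%, while 100 × 9 removes 900 filters (about 48.4%).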
Different pruning strategies
0. Baseline (no retraining)
1. Retrain after each pruning round: 10 rounds with 93 channels each (10x prune -> retrain)
2. 100 rounds with 9 channels each (100x)
3. Retrain once at the end: 1 round with 930 channels (1x prune -> retrain)
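The strategies differ only in when retraining happens inside the prune/retrain/evaluate loop. A minimal sketch, assuming hypothetical `prune`, `retrain` and `evaluate` callables that operate on some model. The optional `retrain_below` parameter mirrors the later note that a model was only retrained when accuracy fell below 98%.

```python
from typing import Callable, List, Optional

def run_pruning(
    rounds: int,
    per_round: int,
    prune: Callable[[int], None],
    retrain: Callable[[], None],
    evaluate: Callable[[], float],
    strategy: str = "each",
    retrain_below: Optional[float] = None,
) -> List[float]:
    """Prune `per_round` filters per round and return the accuracy after each round.

    strategy: "none" (baseline, never retrain), "each" (retrain after every round)
    or "end" (retrain once after the last round).
    retrain_below: with "each", only retrain when accuracy drops below this value.
    """
    if strategy not in ("none", "each", "end"):
        raise ValueError(f"unknown strategy: {strategy}")
    history = []
    for r in range(rounds):
        prune(per_round)
        if strategy == "each" and (retrain_below is None or evaluate() < retrain_below):
            retrain()
        elif strategy == "end" and r == rounds - 1:
            retrain()
        history.append(evaluate())
    return history
```

For example, strategy 1 corresponds to `run_pruning(10, 93, ..., strategy="each")` and strategy 3 to `run_pruning(1, 930, ..., strategy="end")`; the baseline is `strategy="none"`.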
Results: Structured Fisher pruning (10 rounds, no retraining)
[Figure: two panels plotted against % compression; run fulleval/baseline_prune_10x93]
Results: Structured Fisher pruning (10 rounds, with retraining)
[Figure: Loss [abs] vs. % compression; legend: Series1, Series2]
Only retrained if accuracy < 98%
Results: Structured Fisher pruning (1 round, with retraining)