
Advanced Topics in Autonomous Driving using Deep Learning

Presenter: Nasim Souly


Electronics Research Laboratory https://aid-driving.eu/
www.vwerl.com/
New challenges in autonomous driving
• AVs have to gain people's trust
• AVs need a better social understanding of pedestrians

Intent prediction and communication

• Models need to be
• Fast -> low latency to react quickly
• Small -> to run on embedded systems
Autonomous driving
• Benefits of autonomy in driving
• Save lives (~1 million people die every year, largely due to human error: DUI, distraction, …)
• Increased mobility
• Labor-cost savings
• Reduced carbon dioxide (CO2) emissions through optimized driving
• Personalized transportation
Autonomous Driving
SEDRIC is the Volkswagen Group's first concept car and the prototype of an autonomous vehicle

DARPA Grand Challenge


‘Stanley’

Audi A8 Traffic Jam Pilot

Intelligent System

Figure from Artificial Intelligence: A Modern Approach, Global Edition, by Russell and Norvig

Utilizing smart functional modules for driving

Sensor data (e.g. images) -> Intelligent module (e.g. deep networks, planning) -> Actions (e.g. accelerate, brake)
Why deep learning?

• All about the features

• Driving, too, depends on certain features


• Features can determine what action to take next
Why deep learning?
• Feature extraction
• Representation learning
• Scalable
Deep Neural Networks
• Universal function approximators (can learn special-purpose functions)
• Breakthrough enablers:
• GPUs
• Large datasets (e.g. ImageNet)
• Research (algorithms)
• Infrastructure (AWS, …)
• Software frameworks (Caffe, TensorFlow, PyTorch, …)
Common perception models
• Object recognition, object detection, semantic segmentation
• Cars need these models to perceive the surroundings
Need - Rise of Machines

• AVs have to gain people's trust
• AVs need a better social understanding of pedestrians

Predict intent <-> Communicate intent

Intent prediction and communication

Predict pedestrian intentions

Predict intent | Communicate intent

• Recognition of pedestrians' movements and prediction of their intent

• Need to recognize each pedestrian's activity over time (in a video context)

• Sensor-agnostic representation of human movements


street smart (noun): the experience and knowledge necessary to deal with the potential difficulties or dangers of life in an urban environment.
Modular AV Architecture

Sensors -> Perception -> Interpretation & Prediction -> Path Planning
• Perception: keypoint (skeleton) estimation
• Interpretation & Prediction: gesture/activity recognition, intent prediction
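As a toy illustration of how these modules chain together, here is a minimal Python sketch; all function bodies are hypothetical placeholders standing in for the real perception, interpretation, and planning modules:

```python
# Toy sketch of the modular pipeline: Sensors -> Perception (keypoints) ->
# Interpretation & Prediction (activity, intent) -> Path Planning.
# Every function below is a placeholder, not the real module.
from typing import List

def perceive(frame) -> List[dict]:
    """Detect pedestrians and estimate their keypoints / skeletons (placeholder)."""
    return [{"id": 0, "keypoints": []}]

def interpret(tracks: List[dict]) -> List[dict]:
    """Recognize activity over time and predict crossing intent (placeholder)."""
    return [{**t, "intent": "crossing", "confidence": 0.7} for t in tracks]

def plan(intents: List[dict]) -> str:
    """Pick a driving action given the predicted intents (placeholder)."""
    return "slow_down" if any(i["intent"] == "crossing" for i in intents) else "keep_lane"

frame = None                                   # stand-in for a camera frame
print(plan(interpret(perceive(frame))))        # -> "slow_down"
```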

Keypoints Detection

Figure from OpenPose [Cao, Z 2017]

Approach: Top-down
• Top-down
• Detect humans first
• Run a pose estimation model on each detected person
• Slightly better accuracy, at the cost of higher latency in crowded scenes

Figures from [Cao, Z 2017]
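The detect-then-estimate pattern above can be sketched with an off-the-shelf top-down model. A minimal sketch, assuming torchvision's Keypoint R-CNN (recent torchvision API) as a stand-in for the pose network rather than the OpenPose model from the slides; the confidence threshold is an arbitrary choice:

```python
# Rough sketch of the top-down pipeline: detect people, then read out
# per-person keypoints. Keypoint R-CNN is an assumed stand-in model.
import torch
import torchvision

model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

frame = torch.rand(3, 480, 640)       # stand-in for a camera image (C, H, W), values in [0, 1]
with torch.no_grad():
    out = model([frame])[0]           # one result dict per input image

keep = out["scores"] > 0.8            # arbitrary confidence threshold
for box, kpts in zip(out["boxes"][keep], out["keypoints"][keep]):
    # kpts has shape (17, 3): 17 COCO keypoints as (x, y, visibility)
    print(box.tolist(), kpts.shape)
```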

Approach: Bottom-up
• Bottom-up
• Find all joints/keypoints in the image
• Post-process the joints to group them into individual people
• Real-time capable for crowded scenes

Figures from CMU-Pose [Cao, Z. 2017]

Multi-Person Pose Estimation using Part Affinity Fields [Cao, Zhe 2017]

Approach
• Jointly learning part detection and part association

Pose methods comparison
• For real-time use, one of the best approaches is OpenPose
• Post-processing (part association) is needed; it scales quadratically with the number of people
• Inference time is dominated by the CNN
• Even for a large number of people (~20), the post-processing time is insignificant.

• Some other bottom-up approaches


• Deep(er) Cut [Insafutdinov]
• PersonLab [Papandreou, G 2018]

San Francisco

San Francisco

Munich
(blurred/anonymized image)

OpenPose Demo
Compression
• Modular build-ups
• Several CPUs and GPUs
• No space left for the customer
Compression
• Compress deep neural networks to make faster predictions on embedded hardware and reduce the memory footprint

Example: object detection only, at 10 fps

Data Collection (multiple cameras, LiDAR, CAN) -> Data Processing -> Perception -> Path Planning
• Perception tasks: detection of cars, pedestrians, bikes; corner case detection; semantic segmentation
Deep Network Compression
• Memory-efficient architectures
• Rethinking the design of the architecture (SqueezeNet)

• Platform-dependent compression (e.g. TensorRT from Nvidia)
• Does not support custom layers
• Needs specific hardware

• Generic network compression
• Can be applied to different architectures
• Does not depend on a particular platform
Convolutional Neural Network
Network structure and process:
• Read sensor data
• Feature extraction and fusion
• Link the parts together
Unstructured pruning [Han2015]

• Conv layer: 512 x 512 x 3 x 3 ≈ 2.4M parameters

• FC layer: feeding the 512 7x7 feature maps into the first 4096-node FC layer accounts for 512 x 7 x 7 x 4096 ≈ 102.8M parameters alone
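These counts are easy to reproduce, and magnitude-based unstructured pruning in the spirit of [Han2015] amounts to zeroing the smallest weights. A minimal sketch (the 90% sparsity target is an illustrative choice, not a value from the talk):

```python
# Parameter counts from the slide, plus magnitude-based unstructured pruning
# in the spirit of Han et al. 2015. The sparsity level is illustrative.
import torch
import torch.nn as nn

conv = nn.Conv2d(512, 512, kernel_size=3)       # 512*512*3*3 ≈ 2.4M weights
fc = nn.Linear(512 * 7 * 7, 4096)               # 512*7*7*4096 ≈ 102.8M weights
print(conv.weight.numel(), fc.weight.numel())   # 2,359,296 and 102,760,448

def magnitude_prune(weight: torch.Tensor, sparsity: float = 0.9) -> torch.Tensor:
    """Zero out the `sparsity` fraction of weights with the smallest magnitude."""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return weight * (weight.abs() > threshold)

with torch.no_grad():
    fc.weight.copy_(magnitude_prune(fc.weight))
print((fc.weight == 0).float().mean())          # ≈ 0.90 of the weights are now zero
```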
Low rank decompositions 2D [Xue2013]

• Replace a weight matrix by the product of two smaller weight matrices
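A minimal sketch of this idea, assuming an SVD split of a single fully connected layer; the layer size and the rank are illustrative choices, not values from the talk:

```python
# SVD-based low-rank decomposition of a fully connected layer,
# in the spirit of Xue et al. The rank of 256 is illustrative.
import torch
import torch.nn as nn

fc = nn.Linear(4096, 4096, bias=False)            # 16.8M parameters
rank = 256

# W ≈ (U * S) @ Vt, keeping only the top `rank` singular values
U, S, Vt = torch.linalg.svd(fc.weight.data, full_matrices=False)
A = U[:, :rank] * S[:rank]                         # (4096, 256)
B = Vt[:rank, :]                                   # (256, 4096)

# Replace the single layer with two smaller ones: x -> B x -> A (B x)
low_rank = nn.Sequential(nn.Linear(4096, rank, bias=False),
                         nn.Linear(rank, 4096, bias=False))
low_rank[0].weight.data = B
low_rank[1].weight.data = A

params_before = fc.weight.numel()                  # 16,777,216
params_after = A.numel() + B.numel()               # 2,097,152 (~8x smaller)
print(params_before, params_after)
```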


Comparison of compression methods for fully connected layers

Approach                          | Size reduction | Speed       | Accuracy
Unstructured pruning [Han2015]    | 100x smaller   | Similar     | Comparable
Low-rank decompositions [Xue2013] | 40x smaller    | 2.4x faster | Comparable
Structured Fisher pruning [Molchanov2017, Theis2018]

[Figure: feature tensor (height x width x channels) with per-channel importance scores, e.g. 1.4, 0.8, 2.7]
• Remove channel slices ranked by importance
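A simplified sketch of structured channel pruning with a gradient-based importance score, in the spirit of the Fisher/Taylor criteria [Molchanov2017, Theis2018]; the importance estimate, layer sizes, and the number of kept channels are illustrative assumptions, not the exact method used in the talk:

```python
# Simplified structured pruning: score each output channel of conv1 by a
# gradient-based importance and physically remove the weakest channel slices.
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 64, 3, padding=1)
conv2 = nn.Conv2d(64, 64, 3, padding=1)

x = torch.rand(8, 3, 32, 32)
act = conv1(x)
act.retain_grad()
loss = conv2(act).pow(2).mean()                       # stand-in for a task loss
loss.backward()

# Per-channel importance ≈ sum of (activation * gradient)^2 over batch and space
importance = (act * act.grad).pow(2).sum(dim=(0, 2, 3))    # shape (64,)

# Keep the 32 most important output channels of conv1 (and matching inputs of conv2)
keep = importance.argsort(descending=True)[:32]
pruned_conv1 = nn.Conv2d(3, 32, 3, padding=1)
pruned_conv1.weight.data = conv1.weight.data[keep]
pruned_conv1.bias.data = conv1.bias.data[keep]
pruned_conv2 = nn.Conv2d(32, 64, 3, padding=1)
pruned_conv2.weight.data = conv2.weight.data[:, keep]
pruned_conv2.bias.data = conv2.bias.data
```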
Benchmark
Multiple pruning rounds: compress the model by [5, 10, …, 50] %
• Pruning rounds (10, 100, 1); each round is one complete loop (prune + retrain + evaluate)
• Filters to prune in each round (93, 9, 930)
• Pruning batch size (32)
• Number of pruning steps (4)
Different pruning strategies
0. Baseline (no retraining)
1. Retrain after each pruning round: 10 rounds with 93 channels each (10x)
2. Retrain after each pruning round: 100 rounds with 9 channels each (100x)
3. Retrain once at the end: 1 round with 930 channels (1x)
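The loop behind strategies 1-3 can be sketched as follows; `model`, `train_loader`, `evaluate`, and `prune_channels` are hypothetical placeholders, and only the round/channel counts mirror the slide:

```python
# Sketch of the prune -> retrain -> evaluate loop behind strategies 1-3.
# All callables passed in are placeholders for the real benchmark code.
import torch

def compression_loop(model, train_loader, evaluate, prune_channels,
                     rounds=10, channels_per_round=93,
                     retrain_epochs=1, accuracy_floor=0.98):
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    loss_fn = torch.nn.CrossEntropyLoss()
    for r in range(rounds):
        prune_channels(model, n=channels_per_round)    # drop the n least important channels
        # Only retrain when accuracy falls below the floor (cf. the 10-round setting)
        if evaluate(model) < accuracy_floor:
            for _ in range(retrain_epochs):
                for images, labels in train_loader:
                    optimizer.zero_grad()
                    loss_fn(model(images), labels).backward()
                    optimizer.step()
        print(f"round {r + 1}: accuracy {evaluate(model):.3f}")
    return model
```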
Results: Structured Fisher pruning (10 rounds, no retraining)

[Plots: loss (abs) and accuracy (%) vs. % compression; run fulleval/baseline_prune_10x93]
Results: Structured Fisher pruning (10 rounds, with retraining)

[Plot: loss (abs) vs. % compression]
Only retrained if accuracy < 98%
Results: Structured Fisher pruning (1 round, with retraining)

Metric   | Baseline model              | Compressed model (before retrain) | Compressed model (after retrain)
Size     | 3,726,801 parameters, 45 MB | 1,725,092 parameters, 21 MB       | 1,725,092 parameters, 21 MB
Accuracy |                             | dropped 28%                       | improved 13%
Loss     | 0.097977                    | 0.352622                          | 0.086778

Compressed to 50%; retraining: 40 epochs, ~18 hours

Summary
• Autonomous driving has come a long way
• Deep learning shows promising results for autonomous driving (and it scales)
• Common perception models are not enough -> more complex analysis is needed
• Typical networks are too big for cars -> generic compression reduces model size and inference time
References
• Cao, Z., Simon, T., Wei, S.-E., and Sheikh, Y. "Realtime multi-person 2D pose estimation using part affinity fields." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 7291-7299.
• Han, S., Pool, J., Tran, J., and Dally, W. "Learning both weights and connections for efficient neural network." NIPS, 2015.
• Xue, J., et al. "Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014.
• Theis, L., et al. "Faster gaze prediction with dense networks and Fisher pruning." arXiv preprint arXiv:1801.05787, 2018.
• Insafutdinov, E., et al. "DeeperCut: A deeper, stronger, and faster multi-person pose estimation model." European Conference on Computer Vision (ECCV), Springer, 2016.
• Papandreou, G., et al. "PersonLab: Person pose estimation and instance segmentation with a bottom-up, part-based, geometric embedding model." Proceedings of the European Conference on Computer Vision (ECCV), 2018.
