Saxena ICML2005

http://ai.stanford.edu/~asaxena/rccar/
High Speed Obstacle Avoidance using Monocular Vision and Reinforcement Learning
Jeff Michels
Ashutosh Saxena
Andrew Y. Ng
ICML 2005.
Problem
Drive a remote control car at high speeds
Unstructured outdoor environments
Off-the-shelf hardware, inexpensive cameras and little processing power
Vision and Driving Control
Prior Work: Vision
Estimating depth from multiple images:
Stereovision (e.g., Scharstein & Szeliski, 2002)
Depth from Defocus (e.g., Klarquist et al., 1995)
Optical Flow/Structure from motion (e.g., Barron et al., 1994)
Motivation #1: Monocular vision.
Stereo vision has limits:
baseline distance between cameras
vibration and blur
We would like to explore the use of monocular cues.
Prior Work: Driving Control
Driving
Stereo-vision for driving (LeCun, 2003)
Highways with clear lane markings (Pomerleau, 1989)
Single camera for indoor robot, but known color and texture of ground (Gini & Marchi, 2002)
Motivation #2: Reinforcement learning
Many past successes used model-based RL.
Does model-based RL still make sense even for tasks requiring complex perception?
(To simulate vision input, we need to use computer graphics!)
Approach
Vision System
Estimate distance to nearest obstacle in each possible steering direction.
Driving Control
Map from the output of the vision system into steering commands for the car.
Use reinforcement learning to learn the policy.
Vision System: Training Data
Image divided into vertical columns corresponding to possible steering directions.
Image labeled with depth for each vertical column
Laser range finder: ground truth distances
Vision System: Monocular Cues
Monocular cues used by humans for depth perception
Texture Variations - Laws
Texture Gradient (Linear Perspective) - Radon, Harris
Haze - Color
Occlusion
Known Object Size
(Loomis, Nature 2001)
Feature Vector: Monocular Cues
Texture Variation
Texture Gradient
Occlusion, Object Size, Global structure
Overlapping windows
Appending the feature vectors of adjacent stripes
The feature vector has 858 dimensions.
Learning Algorithm
Supervised learning to estimate the distance d in each column of the image.
Learn weights w via ordinary least squares with quadratic cost:
$$w^* = \arg\min_w \sum_i (d_i - w^T x_i)^2$$
where d_i is the depth, w the weights, and x_i the features; i ranges over columns and images.
Other regression methods (SVR, robust regression) gave similar results
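
The least-squares fit above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes NumPy is available, and the function names, the toy 5-dimensional features, and the synthetic depths are all hypothetical (the slides use 858-dimensional features and laser-labeled depths).

```python
import numpy as np

def fit_depth_weights(X, d):
    """Ordinary least squares: minimize sum_i (d_i - w^T x_i)^2 over w.

    X has one row of features x_i per (image, column) pair;
    d holds the matching ground-truth distances d_i.
    """
    w, *_ = np.linalg.lstsq(X, d, rcond=None)
    return w

def predict_depth(X, w):
    """Predicted distance for each row of features."""
    return X @ w

# Toy data (hypothetical): 200 image columns, 5 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
d = X @ true_w  # noise-free depths, so the fit should recover true_w

w = fit_depth_weights(X, d)
d_hat = predict_depth(X, w)
```

With noise-free toy data the recovered weights match the generating weights; on real data the residual would reflect how much of the depth signal the features capture.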
Results: Learning Depth Estimates
Errors on a log scale: $E = |\log_{10}(d) - \log_{10}(d_{\text{estimated}})|$
Able to predict depth with an average error of 0.26 orders of magnitude.
[Figure: bar chart of error E by feature set: Radon (Texture Gradient), Harris (Texture Gradient), Laws (Texture Variations), All]
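
As a small worked example of the log-scale error, the sketch below computes E = |log10(d) - log10(d_estimated)| for individual distances and averages it over a set. The function names are illustrative, not from the slides.

```python
import math

def log10_depth_error(d_true, d_est):
    """Absolute depth error in orders of magnitude (base-10 log scale)."""
    return abs(math.log10(d_true) - math.log10(d_est))

def mean_log10_error(true_depths, estimated_depths):
    """Average log-scale error over paired true and estimated distances."""
    errors = [log10_depth_error(t, e)
              for t, e in zip(true_depths, estimated_depths)]
    return sum(errors) / len(errors)

# Estimating 1 m or 100 m for a true distance of 10 m is off by one order
# of magnitude either way, so the metric treats both mistakes equally.
under = log10_depth_error(10.0, 1.0)
over = log10_depth_error(10.0, 100.0)
```

On this scale, an error of 0.26 corresponds to an estimate that is off by a factor of about 10^0.26, roughly 1.8.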
Synthetic Graphics Data
Graphics images for training the vision system.
Variable degree of graphical realism
Can a system trained on synthetic images predict distances on real images?
Results: Combined Vision System
When the distance to the nearest obstacle in the chosen direction is less than 5 m, it is a hazard.
The hazard rate improves by combining the systems trained on real and synthetic images.
24% hazard rate reduction over using only real images.
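
The hazard criterion can be made concrete with a short sketch. It assumes, for illustration only, that the chosen direction is the image column with the largest predicted distance; the slides do not state the selection rule, and all names and sample values here are hypothetical.

```python
HAZARD_DISTANCE_M = 5.0  # threshold stated on the slide

def choose_direction(predicted_distances):
    """Assumed rule: steer toward the column predicted to be most open."""
    return max(range(len(predicted_distances)),
               key=predicted_distances.__getitem__)

def hazard_rate(frames, threshold=HAZARD_DISTANCE_M):
    """Fraction of frames whose chosen direction is truly closer than threshold.

    frames is a list of (predicted_distances, true_distances) pairs,
    one distance per steering direction, in meters.
    """
    hazards = 0
    for predicted, true in frames:
        column = choose_direction(predicted)
        if true[column] < threshold:
            hazards += 1
    return hazards / len(frames)

# Two made-up frames with three steering directions each.
frames = [
    ([2.0, 8.0, 4.0], [3.0, 9.0, 4.0]),    # picks column 1, truly 9 m: safe
    ([6.0, 3.0, 2.0], [4.0, 10.0, 12.0]),  # picks column 0, truly 4 m: hazard
]
rate = hazard_rate(frames)
```

The metric scores the vision system by the consequences of its choice rather than by raw depth error, so a large error in a direction that is never chosen costs nothing.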
Control: Reinforcement Learning
Model-based RL: hard perception problem
Randomly generated environment in graphics simulator
Pegasus (Ng & Jordan, 2000) to learn control policy
Car initialized at (0,0) and ran for fixed time horizon.
Learning algorithm converged after 1674 iterations of policy search.
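
The core idea of Pegasus is to fix the random numbers used by the simulator, so that the estimated value of a policy becomes a deterministic function of its parameters and can be optimized by ordinary search. The sketch below illustrates that idea only: the 1-D dynamics, the single gain parameter, and the hill-climbing search are hypothetical stand-ins, not the graphics simulator or the search procedure used in this work.

```python
import random

def simulate(theta, seed, horizon=50):
    """Total reward of a toy controller under one fixed random scenario.

    Toy dynamics (hypothetical): x is a lateral offset pushed around by
    random disturbances; the policy steers back toward 0 with gain theta.
    """
    rng = random.Random(seed)  # same seed -> same disturbances on every call
    x, total = 0.0, 0.0
    for _ in range(horizon):
        x = x - theta * x + rng.gauss(0.0, 1.0)
        total -= x * x  # penalize distance from the center line
    return total

def estimate_value(theta, seeds):
    """Average return over a fixed set of scenarios (deterministic in theta)."""
    return sum(simulate(theta, s) for s in seeds) / len(seeds)

def hill_climb(seeds, theta=0.0, step=0.5, iterations=40):
    """Simple local search; halves the step when neither neighbor improves."""
    best = estimate_value(theta, seeds)
    for _ in range(iterations):
        improved = False
        for candidate in (theta + step, theta - step):
            value = estimate_value(candidate, seeds)
            if value > best:
                theta, best, improved = candidate, value, True
        if not improved:
            step /= 2.0
    return theta, best

SCENARIOS = list(range(20))  # fixed seeds reused for every policy evaluation
theta, value = hill_climb(SCENARIOS)
```

Because every candidate policy is evaluated on the same scenarios, differences in estimated value come from the policy parameters rather than from sampling noise, which is what makes a plain search like this usable.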
Reinforcement Learning: Parameters
1: spatial smoothing of predicted distances
2: threshold distance for evasive action
3: steering angle parameter
4, 5: evasive action parameters
6: throttle parameter
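
To make the role of these six parameters concrete, here is one possible shape for a controller that uses them. The slides list the parameters but not the functional form, so everything below (the smoothing rule, the steering range, and what the two evasive-action parameters mean) is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class PolicyParams:
    smoothing: float        # 1: spatial smoothing of predicted distances
    evade_threshold: float  # 2: threshold distance for evasive action
    steering_gain: float    # 3: steering angle parameter
    evade_steer: float      # 4: evasive action (assumed: steering magnitude)
    evade_throttle: float   # 5: evasive action (assumed: reduced throttle)
    throttle: float         # 6: throttle parameter

def smooth(distances, alpha):
    """Blend each column with the mean of its neighbors; alpha in [0, 1]."""
    n = len(distances)
    out = []
    for i, d in enumerate(distances):
        left = distances[max(i - 1, 0)]
        right = distances[min(i + 1, n - 1)]
        out.append((1 - alpha) * d + alpha * 0.5 * (left + right))
    return out

def act(distances, p):
    """Return (steering, throttle); steering in [-1, 1], negative = left."""
    s = smooth(distances, p.smoothing)
    best = max(range(len(s)), key=s.__getitem__)
    center = (len(s) - 1) / 2
    offset = (best - center) / center if center else 0.0
    if s[best] < p.evade_threshold:
        # Even the most open direction is too close: turn hard and slow down.
        direction = 1.0 if offset >= 0 else -1.0
        return direction * p.evade_steer, p.evade_throttle
    steering = max(-1.0, min(1.0, p.steering_gain * offset))
    return steering, p.throttle

params = PolicyParams(0.0, 5.0, 1.0, 1.0, 0.2, 0.8)
clear_ahead = act([2.0, 3.0, 20.0, 3.0, 2.0], params)
boxed_in = act([1.0, 2.0, 3.0, 2.0, 4.0], params)
```

A low-dimensional policy like this keeps the search space small, which suits a policy search that has to evaluate every candidate in simulation.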
Results: Actual Driving Experiments
Results: Driving Times
Summary
Monocular depth estimation is an interesting and important problem.
Supervised learning for depth estimation.
Model-based RL, using computer graphics
simulator, to learn controller.
Extensions/Future Work
Learn complete depth maps
Markov Random Field (MRF) to estimate depths.
Learning depth from single monocular images, Ashutosh Saxena, Sung H. Chung, Andrew Y. Ng. In NIPS 2005.
[Figure panels: Image, Ground Truth, Predicted]
[also with Sung Chung.]
Contact:
Ashutosh Saxena, asaxena@cs.stanford.edu
http://ai.stanford.edu/~asaxena/rccar/
http://ai.stanford.edu/~asaxena/learningdepth/