
Generalizing Backpropagation to Include Sparse Coding


David M. Bradley (dbradley@cs.cmu.edu)
and Drew Bagnell
Robotics Institute
Carnegie Mellon University

Outline
Discuss the value of modular and deep gradient-based systems, especially in robotics
Introduce a new and useful family of modules
Properties of the new family:
Online training with non-Gaussian priors (e.g. encourage sparsity, multi-task weight sharing)
Modules internally solve continuous optimization problems
Captures interesting nonlinear effects, such as inhibition, that involve coupled outputs
Sparse approximation
Modules can be jointly optimized by a generalization of backpropagation

Deep Modular Learning Systems
Efficiently represent complex functions
Particularly efficient for closely related tasks
Recently shown to be powerful learning machines
Greedy layer-wise training improves initialization
Greedy module-wise training is useful for designing complex systems
Design and initialize modules independently
Jointly optimize the final system with backpropagation
Gradient methods allow the incorporation of diverse data sources and losses

G. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 2006.
Y. Bengio, P. Lamblin, and H. Larochelle. Greedy layer-wise training of deep networks. NIPS 2007.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 1998.

Mobile Robot Perception
Sensors: Ladar, RGB camera, NIR camera
Lots of unlabeled data
Hard to define a traditional supervised-learning dataset
The target task is defined by weakly-labeled structured output data

Perception Problem: Scene Labeling
[Goal-system diagram: camera, laser, IMU, and webcam data flow forward through modules such as a Point Classifier, Ground Plane Estimator, and Max Margin Planner, producing a cost for each 2-D cell that feeds the motion planner. Gradients flow backward from several training losses: an object classification cost and lighting variance cost on LabelMe-labeled webcam data, a classification cost on labeled 3-D points, a proprioception prediction cost on observed wheel heights, and a planning loss on motion plans from human-driven example paths.]
New Modules
Modules that are important in this system require two new abilities:
Induce new priors on weights
Allow modules to solve internal optimization problems

Standard Backpropagation Assumes an L2 Prior
Gradient descent with convex loss functions:
Small steps with early stopping imply L2 regularization
Minimizes a bound on the regret by solving, at each step, the optimization sketched below; the bound controls the true regret

M. Zinkevich. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. ICML 2003.
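As a hedged sketch (following Zinkevich's analysis, not necessarily the slide's exact notation), the per-step optimization behind online gradient descent is a linearized loss plus a squared proximal penalty around the current weights:

% One online-gradient-descent step, written as a proximal optimization (sketch)
w^{t+1} \;=\; \arg\min_{w}\; \eta\,\langle \nabla c_t(w^t),\, w \rangle \;+\; \tfrac{1}{2}\,\lVert w - w^{t} \rVert_2^{2}
        \;=\; w^{t} \;-\; \eta\, \nabla c_t(w^{t})

The squared-distance penalty is what ties small gradient steps with early stopping to L2 regularization of the weights.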

Alternate Priors
KL-divergence
Useful if many features are irrelevant
Approximately solved with exponentiated gradient descent (update rule sketched below)
Multi-task priors (encourage sharing between related tasks)

Argyriou and Evgeniou. Multi-task Feature Learning. NIPS 2007.
Bradley and Bagnell, 2008.
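A minimal sketch of the exponentiated-gradient update induced by an (unnormalized) KL prior, applied elementwise to a nonnegative weight vector:

% Exponentiated-gradient step for weight j (sketch)
w_j^{t+1} \;=\; w_j^{t}\, \exp\!\big(-\eta\, \nabla_{w_j} c(w^t)\big)

Multiplicative updates of this form keep weights nonnegative and quickly drive irrelevant weights toward zero.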

L2 Backpropagation
[Diagram: the input passes through a chain of modules (M1, M2, M3) into the loss function; the loss gradient \nabla c flows back through each module, and each module updates its weights by a gradient step, w_i^{t+1} = w_i^t - \eta \nabla_{w_i} c.]

With KL Prior Modules
[Diagram: the same module chain, but each module updates its weights with an exponentiated-gradient step, w_i^{t+1} = w_i^t \exp(-\eta \nabla_{w_i} c).]

General Mirror Descent
[Diagram: the same module chain, with each module updating its weights by a mirror-descent step through its own link function f_i, w_i^{t+1} = f_i^{-1}( f_i(w_i^t) - \eta \nabla c(w_i^t) ); the L2 and KL updates above are the special cases where the link is the identity and the elementwise logarithm, respectively.]
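A minimal Python sketch of the generic mirror-descent weight update (the function names and specific link pairs below are illustrative assumptions, not the talk's code), showing how the L2 and KL cases fall out of the choice of link:

import numpy as np

def mirror_descent_step(w, grad, eta, link, link_inv):
    """One mirror-descent update: map the weights through the link function,
    take a gradient step in that transformed space, and map back."""
    return link_inv(link(w) - eta * grad)

# Identity link: recovers ordinary gradient descent, w <- w - eta * grad (the L2 prior).
def gd_step(w, grad, eta):
    return mirror_descent_step(w, grad, eta, lambda x: x, lambda x: x)

# Log/exp link on positive weights: recovers exponentiated gradient,
# w <- w * exp(-eta * grad) (the KL prior).
def eg_step(w, grad, eta):
    return mirror_descent_step(w, grad, eta, np.log, np.exp)

In the generalized backpropagation described here, each module applies its own update rule to its weights, while error gradients are passed between modules exactly as in ordinary backpropagation.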

New Modules
Modules that are important in this system require two new abilities:
Induce new priors on weights
Allow modules to solve internal optimization problems
Captures interesting nonlinear effects, such as inhibition, that involve coupled outputs
Sparse approximation

Inhibition
[Figure: the same input is encoded against a basis in two ways, by a simple projection and by the KL-regularized optimization; the optimization couples the outputs, producing inhibition between basis elements.]

Sparse Approximation
Assumes the input is a sparse combination of elements, plus observation noise (see the sketch below)
Many possible elements
Only a few present in any particular example
True for many real-world signals
Many applications: compression (JPEG), sensing (MRI), machine learning
Produces effects observed in biology: V1 receptive fields, inhibition

Tropp et al. Algorithms for Simultaneous Sparse Approximation. 2005.
Raina et al. Self-taught Learning: Transfer Learning from Unlabeled Data. ICML 2007.
Olshausen and Field. Sparse Coding of Natural Images Produces Localized, Oriented, Bandpass Receptive Fields. Nature, 1996.
Doi and Lewicki. Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units. NIPS 2004.
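A tiny NumPy sketch of this generative assumption (the sizes, sparsity level, and noise scale below are illustrative choices, not values from the talk):

import numpy as np

rng = np.random.default_rng(0)
n_elements, input_dim, k_active = 256, 64, 4     # many possible elements, few active

B = rng.normal(size=(input_dim, n_elements))     # dictionary of candidate elements
w = np.zeros(n_elements)                         # coefficients: mostly zero
active = rng.choice(n_elements, size=k_active, replace=False)
w[active] = rng.random(k_active)                 # only a few elements present
x = B @ w + 0.01 * rng.normal(size=input_dim)    # observation = sparse combination + noise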

Sparse Approximation
Semantic meaning is sparse
The visual representation is sparse (JPEG)

MNIST Digits Dataset
60,000 28x28-pixel handwritten digits
10,000 reserved for a validation set
Separate 10,000-digit test set

Sparse Approximation
[Diagram: basis coefficients w_1 produce a reconstruction r_1 = B w_1 of the input; the reconstruction error (cross entropy) sends an error gradient back to the coefficients.]

Sparse Approximation
[Diagram: KL-regularized coefficients on a KL-regularized basis map the input to the output.]
Sparse Coding
[Diagram: each training example i is reconstructed as r = B w^(i) from a shared basis B and per-example coefficients w^(i); the reconstruction error (cross entropy) is minimized jointly over the coefficients W and the basis B, as sketched in code below.]
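A minimal alternating-minimization sketch in Python (an illustrative assumption: it uses squared reconstruction error and fixed step sizes for simplicity, whereas the modules described in this talk use a cross-entropy reconstruction loss):

import numpy as np

def sparse_coding(X, n_basis=64, lam=0.1, step=0.01, n_iters=100, seed=0):
    """Alternate between a proximal-gradient (ISTA) step on the coefficients W
    with the basis B fixed, and a gradient step on B with W fixed."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    B = rng.normal(scale=0.1, size=(n_basis, d))   # shared basis
    W = np.zeros((n, n_basis))                     # per-example coefficients
    for _ in range(n_iters):
        # Coefficient step: gradient of 0.5 * ||X - W B||^2 w.r.t. W, then soft-threshold (L1 prox).
        W -= step * ((W @ B - X) @ B.T)
        W = np.sign(W) * np.maximum(np.abs(W) - step * lam, 0.0)
        # Basis step: gradient w.r.t. B, then project each basis row back into the unit ball.
        B -= step * (W.T @ (W @ B - X))
        B /= np.maximum(np.linalg.norm(B, axis=1, keepdims=True), 1.0)
    return W, B

With B held fixed, the inner update of W is the convex sparse-approximation problem; updating B as well is the non-convex sparse-coding problem.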

Optimization Modules
L1-regularized sparse approximation: reconstruction loss plus a regularization term (convex; both objectives are written out below)
L1-regularized sparse coding (not convex)

Lee et al. Efficient Sparse Coding Algorithms. NIPS 2006.
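As a hedged sketch, the two objectives can be written with a generic reconstruction loss \ell (squared error in Lee et al.; cross entropy elsewhere in this talk):

% L1-regularized sparse approximation: convex in w for a fixed basis B
\min_{w}\;\; \ell\!\left(x,\, Bw\right) \;+\; \lambda \lVert w \rVert_1

% L1-regularized sparse coding: joint over the basis and all coefficients; not convex
\min_{B,\,\{w^{(i)}\}}\;\; \sum_i \ell\!\left(x^{(i)},\, Bw^{(i)}\right) \;+\; \lambda \sum_i \lVert w^{(i)} \rVert_1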

KL-regularized Sparse Approximation
Objective: reconstruction loss plus an unnormalized KL regularization term
Since this objective is continuous and differentiable, at the minimum the gradient with respect to the coefficients is zero.
Differentiating both sides of that optimality condition with respect to B, and solving for the kth row, gives the sensitivity of the optimal coefficients to the basis (sketched below).
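A generic implicit-differentiation sketch of this step (standard implicit-function-theorem reasoning, not necessarily the exact expressions used in the talk):

% First-order optimality at the module's internal minimizer w^*(B):
\nabla_{w} L\big(w^{*}(B),\, B\big) \;=\; 0

% Differentiating this identity with respect to a basis row B_k and solving:
\frac{\partial w^{*}}{\partial B_k}
  \;=\; -\,\big(\nabla^{2}_{ww} L\big)^{-1}\, \nabla^{2}_{w B_k} L

% The chain rule then backpropagates an external loss c through the module:
\frac{\partial c}{\partial B_k}
  \;=\; \frac{\partial c}{\partial w^{*}}\; \frac{\partial w^{*}}{\partial B_k}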

Preliminary Results
[Plots comparing L1 sparse coding against KL sparse coding with backpropagation.]
KL regularization improves classification performance
Backpropagation further improves performance

Main Points
Modular, gradient-based systems are an important design tool for large-scale learning systems
New tools are needed to include a family of modules with important properties
Presented a generalized backpropagation technique that:
Allows priors that encourage, e.g., sparsity (KL prior), using mirror descent to modify weights
Uses implicit differentiation to compute gradients through modules (e.g. sparse approximation) that internally solve optimization problems
Demonstrated work in progress on building deep sparse coders using generalized backpropagation

Acknowledgements
The authors would like to thank the UPI team, especially Cris Dima, David Silver, and Carl Wellington.
Thanks to DARPA and the Army Research Office for supporting this work through the UPI program and the NDSEG fellowship.
