
Sparse Coding: An Overview

Brian Booth

SFU Machine Learning Reading Group

November 12, 2013

The aim of sparse coding

Every column of D is a prototype.

Similar to, but more general than, PCA.
Example: Sparse Coding of Images (figure)

Sparse Coding in V1 (figure)

Example: Image Denoising (figure)

Example: Image Restoration (figure)

Sparse Coding and Acoustics (figure)
The inner ear (cochlea) also performs a sparse coding of frequencies.

Sparse Coding and Natural Language Processing (figure)

Outline

1 Introduction: Why Sparse Coding?

2 Sparse Coding: The Basics

3 Adding Prior Knowledge

4 Conclusions

The aim of sparse coding, revisited

We assume our data x satisfies

x = \sum_{i=1}^{n} \alpha_i d_i = D\alpha

Learning:
Given training data x^j, j \in \{1, \dots, m\}
Learn the dictionary D and the sparse codes \alpha^j

Encoding:
Given test data x and the dictionary D
Learn the sparse code \alpha
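A minimal numpy sketch of this generative assumption (the sizes and the random dictionary below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

k, n = 64, 256                      # data dimension, number of atoms (n > k: over-complete)
D = rng.standard_normal((k, n))
D /= np.linalg.norm(D, axis=0)      # normalize: every column of D is a prototype

alpha = np.zeros(n)                 # sparse code: only 5 of 256 entries active
active = rng.choice(n, size=5, replace=False)
alpha[active] = rng.standard_normal(5)

x = D @ alpha                       # x = sum_i alpha_i d_i = D alpha
print(f"{np.count_nonzero(alpha)} of {n} code entries are non-zero")
```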

Learning: The Objective Function

Dictionary learning involves optimizing:

\arg\min_{\{d_i\},\{\alpha^j\}} \sum_{j=1}^{m} \Big\| x^j - \sum_{i=1}^{n} \alpha_i^j d_i \Big\|^2 + \beta \sum_{j=1}^{m} \sum_{i=1}^{n} |\alpha_i^j|

subject to \|d_i\|^2 \le c, \; i = 1, \dots, n.

In matrix notation:

\arg\min_{D,A} \|X - AD\|_F^2 + \beta \sum_{i,j} |A_{i,j}|

subject to \sum_j D_{i,j}^2 \le c, \; i = 1, \dots, n,

where the rows of X are the data points x^j, the rows of D are the atoms d_i, and A_{j,i} = \alpha_i^j.

Split the optimization over D and A into two alternating steps.
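A small numpy helper to evaluate this objective and its constraint, under the conventions above (rows of X are data points, rows of D are atoms); a sketch, not code from the presentation:

```python
import numpy as np

def objective(X, A, D, beta):
    """Evaluate ||X - AD||_F^2 + beta * sum_ij |A_ij|."""
    fit = np.linalg.norm(X - A @ D, "fro") ** 2
    return fit + beta * np.abs(A).sum()

def feasible(D, c):
    """Check the norm constraint on every atom (row of D): sum_j D_ij^2 <= c."""
    return bool(np.all((D ** 2).sum(axis=1) <= c))
```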


Step 1: Learning the Dictionary

Reduced optimization problem:

\arg\min_{D} \|X - AD\|_F^2

subject to \sum_j D_{i,j}^2 \le c, \; i = 1, \dots, n.

Introduce Lagrange multipliers:

\mathcal{L}(D, \vec{\lambda}) = \operatorname{tr}\big((X - AD)^T (X - AD)\big) + \sum_{i=1}^{n} \lambda_i \Big( \sum_j D_{i,j}^2 - c \Big)

where each \lambda_i \ge 0 is a dual variable...

Step 1: Moving to the dual

From the Lagrangian

\mathcal{L}(D, \vec{\lambda}) = \operatorname{tr}\big((X - AD)^T (X - AD)\big) + \sum_{i=1}^{n} \lambda_i \Big( \sum_j D_{i,j}^2 - c \Big),

minimize over D to obtain the Lagrange dual

\mathcal{D}(\vec{\lambda}) = \min_D \mathcal{L}(D, \vec{\lambda}) = \operatorname{tr}\Big( X^T X - X^T A \big(A^T A + \Lambda\big)^{-1} \big(X^T A\big)^T - c\Lambda \Big),

where \Lambda = \operatorname{diag}(\vec{\lambda}).

The dual can be optimized using conjugate gradient.

Only n values of \lambda, compared to the n \times k entries of D.
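A sketch of evaluating this dual in numpy, assuming the conventions above (the maximization loop over \lambda \ge 0, e.g. a Newton or conjugate-gradient method, is omitted):

```python
import numpy as np

def lagrange_dual(lam, X, A, c):
    """Evaluate D(lambda) = tr(X^T X - X^T A (A^T A + Lam)^{-1} (X^T A)^T - c*Lam)."""
    Lam = np.diag(lam)
    XtA = X.T @ A                                   # k x n
    inner = np.linalg.solve(A.T @ A + Lam, XtA.T)   # (A^T A + Lam)^{-1} (X^T A)^T
    return np.trace(X.T @ X) - np.trace(XtA @ inner) - c * lam.sum()
```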
Step 1: Dual to the Dictionary

With the optimal \vec{\lambda}, our dictionary is

D = \big(A^T A + \Lambda\big)^{-1} A^T X, \quad \text{equivalently} \quad D^T = X^T A \big(A^T A + \Lambda\big)^{-1}.

Key point: Moving to the dual reduces the number of optimization variables, speeding up the optimization.
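In code, recovering the dictionary from the optimal dual variables is a single linear solve (again a sketch under the same conventions):

```python
import numpy as np

def dictionary_from_dual(lam, X, A):
    """D = (A^T A + diag(lam))^{-1} A^T X."""
    return np.linalg.solve(A.T @ A + np.diag(lam), A.T @ X)
```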
Step 2: Learning the Sparse Code

With D now fixed, optimize for A:

\arg\min_{A} \|X - AD\|_F^2 + \beta \sum_{i,j} |A_{i,j}|

An unconstrained, convex optimization problem (\ell_1-regularized least squares).

Many solvers for this (e.g. interior point methods, the in-crowd algorithm, fixed-point continuation).

Note:
Same problem as the encoding problem.
Runtime of the optimization in the encoding stage?
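One simple solver for this subproblem is iterative shrinkage-thresholding (ISTA); a minimal sketch, not one of the solvers named above:

```python
import numpy as np

def soft_threshold(Z, t):
    """Elementwise prox of t*|.|: shrink every entry toward zero by t."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def ista(X, D, beta, n_iter=500):
    """Minimize ||X - AD||_F^2 + beta * sum_ij |A_ij| over A."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the smooth part
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = 2.0 * (A @ D - X) @ D.T         # gradient of ||X - AD||_F^2
        A = soft_threshold(A - grad / L, beta / L)
    return A
```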
Speeding up the testing phase

Fair amount of work on speeding up the encoding stage:

H. Lee et al., "Efficient sparse coding algorithms",
http://ai.stanford.edu/~hllee/nips06-sparsecoding.pdf

K. Gregor and Y. LeCun, "Learning Fast Approximations of Sparse Coding",
http://yann.lecun.com/exdb/publis/pdf/gregor-icml-10.pdf

S. Hawe et al., "Separable Dictionary Learning",
http://arxiv.org/pdf/1303.5244v1.pdf


Relationships between Dictionary atoms

Dictionaries are over-complete bases.

We can dictate relationships between atoms.

Example: hierarchical dictionaries.

Example: Image Patches (figure)

Example: Document Topics (figure)

Problem Statement

Goal:
Have sub-groups of the sparse code all be non-zero (or zero).

Hierarchical:
If a node is non-zero, its parent must be non-zero.
If a node's parent is zero, the node must be zero.

Implementation:
Change the regularization.
Enforce sparsity differently...
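A small check of this hierarchical sparsity pattern, using a hypothetical parent-array encoding of the tree:

```python
def hierarchy_ok(alpha, parent, tol=1e-12):
    """True iff every non-zero code entry has a non-zero parent.
    parent[i] is the parent of node i, or -1 at the root."""
    return all(
        abs(alpha[p]) > tol
        for i, p in enumerate(parent)
        if p >= 0 and abs(alpha[i]) > tol
    )
```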

Grouping Code Entries

A code entry at level k is included in k + 1 groups (one per ancestor, plus its own); see the sketch below.

Add |\alpha_i| to the objective function once for each group containing it, so deeper entries are penalized more heavily.
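A sketch of building such groups from a hypothetical parent-array tree, where each group is a node's subtree (so a node at level k lands in k + 1 groups):

```python
def subtree_groups(parent):
    """Map each node g to the group of code indices in its subtree."""
    children = {i: [] for i in range(len(parent))}
    root = 0
    for i, p in enumerate(parent):
        if p >= 0:
            children[p].append(i)
        else:
            root = i

    groups = {}
    def collect(g):
        nodes = [g]
        for ch in children[g]:
            nodes += collect(ch)
        groups[g] = nodes
        return nodes

    collect(root)
    return groups

# Hypothetical 7-node binary tree: node 0 is the root.
print(subtree_groups([-1, 0, 0, 1, 1, 2, 2]))
# Node 3 (level 2) appears in groups 3, 1, and 0 -> k + 1 = 3 groups.
```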

Group Regularization

Updated objective function:

\arg\min_{D,\{\alpha^j\}} \sum_{j=1}^{m} \Big( \big\| x^j - D\alpha^j \big\|^2 + \lambda\,\Omega(\alpha^j) \Big)

where

\Omega(\alpha) = \sum_{g \in \mathcal{P}} w_g \big\| \alpha|_g \big\|

\alpha|_g are the code values for group g.

w_g weights the enforcement of the hierarchy.

Solve using proximal methods.
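A sketch of the regularizer and of one block soft-thresholding step, the building block of those proximal methods (group indices as produced by the subtree_groups sketch above):

```python
import numpy as np

def omega(alpha, groups, w):
    """Omega(alpha) = sum over groups g of w_g * ||alpha restricted to g||_2."""
    return sum(w[g] * np.linalg.norm(alpha[idx]) for g, idx in groups.items())

def group_soft_threshold(alpha, idx, t):
    """Prox of t * ||.||_2 on one group: shrink the whole block toward zero."""
    out = alpha.copy()
    nrm = np.linalg.norm(out[idx])
    out[idx] *= max(0.0, 1.0 - t / nrm) if nrm > 0 else 0.0
    return out
```

For non-overlapping groups, one pass of group_soft_threshold over all groups is exactly the proximal operator of t\,\Omega; for the overlapping, tree-structured groups used here, proximal methods compose such block updates in a suitable group order (see the structured-sparsity references below).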

Other Examples

Other examples of structured sparsity:

M. Stojnic et al., "On the Reconstruction of Block-Sparse Signals With an Optimal Number of Measurements",
http://dx.doi.org/10.1109/TSP.2009.2020754

J. Mairal et al., "Convex and Network Flow Optimization for Structured Sparsity",
http://jmlr.org/papers/volume12/mairal11a/mairal11a.pdf

Summary

Two interesting directions:

Increasing the speed of the testing phase

Optimizing dictionary structure
