
Sparse Coding: An Overview

Brian Booth

SFU Machine Learning Reading Group

November 12, 2013

The aim of sparse coding

Every column of D is a prototype.

Similar to, but more general than, PCA.
Example: Sparse Coding of Images (figure)

Sparse Coding in V1 (figure)

Example: Image Denoising (figure)

Example: Image Restoration (figure)

Sparse Coding and Acoustics (figure)
The inner ear (cochlea) also performs a sparse coding of frequencies.

Sparse Coding and Natural Language Processing (figure)

Outline

1 Introduction: Why Sparse Coding?

2 Sparse Coding: The Basics

3 Adding Prior Knowledge

4 Conclusions

The aim of sparse coding, revisited

We assume our data x satisfies

x = \sum_{i=1}^{n} \alpha_i d_i = D\alpha

Learning:
Given training data x^j, j \in \{1, \dots, m\}
Learn the dictionary D and the sparse codes \alpha^j

Encoding:
Given test data x and the dictionary D
Learn the sparse code \alpha
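A minimal numpy sketch of this generative assumption (the sizes and the random dictionary below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

k, n = 64, 256                      # data dimension, number of atoms (n > k: over-complete)
D = rng.standard_normal((k, n))
D /= np.linalg.norm(D, axis=0)      # normalize: every column of D is a prototype

alpha = np.zeros(n)                 # sparse code: only 5 of 256 entries active
active = rng.choice(n, size=5, replace=False)
alpha[active] = rng.standard_normal(5)

x = D @ alpha                       # x = sum_i alpha_i d_i = D alpha
print(f"{np.count_nonzero(alpha)} of {n} code entries are non-zero")
```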

Learning: The Objective Function

Dictionary learning involves optimizing:

\arg\min_{\{d_i\},\{\alpha^j\}} \sum_{j=1}^{m} \Big\| x^j - \sum_{i=1}^{n} \alpha_i^j d_i \Big\|^2 + \beta \sum_{j=1}^{m} \sum_{i=1}^{n} |\alpha_i^j|

subject to \|d_i\|^2 \le c, \; i = 1, \dots, n.

In matrix notation:

\arg\min_{D,A} \|X - AD\|_F^2 + \beta \sum_{i,j} |A_{i,j}|

subject to \sum_j D_{i,j}^2 \le c, \; i = 1, \dots, n,

where the rows of X are the data points x^j, the rows of D are the atoms d_i, and A_{j,i} = \alpha_i^j.

Split the optimization over D and A into two alternating steps.
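A small numpy helper to evaluate this objective and its constraint, under the conventions above (rows of X are data points, rows of D are atoms); a sketch, not code from the presentation:

```python
import numpy as np

def objective(X, A, D, beta):
    """Evaluate ||X - AD||_F^2 + beta * sum_ij |A_ij|."""
    fit = np.linalg.norm(X - A @ D, "fro") ** 2
    return fit + beta * np.abs(A).sum()

def feasible(D, c):
    """Check the norm constraint on every atom (row of D): sum_j D_ij^2 <= c."""
    return bool(np.all((D ** 2).sum(axis=1) <= c))
```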


Step 1: Learning the Dictionary

Reduced optimization problem:

\arg\min_{D} \|X - AD\|_F^2

subject to \sum_j D_{i,j}^2 \le c, \; i = 1, \dots, n.

Introduce Lagrange multipliers:

\mathcal{L}(D, \vec{\lambda}) = \operatorname{tr}\big((X - AD)^T (X - AD)\big) + \sum_{i=1}^{n} \lambda_i \Big( \sum_j D_{i,j}^2 - c \Big)

where each \lambda_i \ge 0 is a dual variable...

Step 1: Moving to the dual

From the Lagrangian

\mathcal{L}(D, \vec{\lambda}) = \operatorname{tr}\big((X - AD)^T (X - AD)\big) + \sum_{i=1}^{n} \lambda_i \Big( \sum_j D_{i,j}^2 - c \Big),

minimize over D to obtain the Lagrange dual

\mathcal{D}(\vec{\lambda}) = \min_D \mathcal{L}(D, \vec{\lambda}) = \operatorname{tr}\Big( X^T X - X^T A \big(A^T A + \Lambda\big)^{-1} \big(X^T A\big)^T - c\Lambda \Big),

where \Lambda = \operatorname{diag}(\vec{\lambda}).

The dual can be optimized using conjugate gradient.

Only n values of \lambda, compared to the n \times k entries of D.
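A sketch of evaluating this dual in numpy, assuming the conventions above (the maximization loop over \lambda \ge 0, e.g. a Newton or conjugate-gradient method, is omitted):

```python
import numpy as np

def lagrange_dual(lam, X, A, c):
    """Evaluate D(lambda) = tr(X^T X - X^T A (A^T A + Lam)^{-1} (X^T A)^T - c*Lam)."""
    Lam = np.diag(lam)
    XtA = X.T @ A                                   # k x n
    inner = np.linalg.solve(A.T @ A + Lam, XtA.T)   # (A^T A + Lam)^{-1} (X^T A)^T
    return np.trace(X.T @ X) - np.trace(XtA @ inner) - c * lam.sum()
```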
Step 1: Dual to the Dictionary

With the optimal \vec{\lambda}, our dictionary is

D = \big(A^T A + \Lambda\big)^{-1} A^T X, \quad \text{equivalently} \quad D^T = X^T A \big(A^T A + \Lambda\big)^{-1}.

Key point: Moving to the dual reduces the number of optimization variables, speeding up the optimization.
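In code, recovering the dictionary from the optimal dual variables is a single linear solve (again a sketch under the same conventions):

```python
import numpy as np

def dictionary_from_dual(lam, X, A):
    """D = (A^T A + diag(lam))^{-1} A^T X."""
    return np.linalg.solve(A.T @ A + np.diag(lam), A.T @ X)
```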
Step 2: Learning the Sparse Code

With D now fixed, optimize for A:

\arg\min_{A} \|X - AD\|_F^2 + \beta \sum_{i,j} |A_{i,j}|

An unconstrained, convex optimization problem (\ell_1-regularized least squares).

Many solvers for this (e.g. interior point methods, the in-crowd algorithm, fixed-point continuation).

Note:
Same problem as the encoding problem.
Runtime of the optimization in the encoding stage?
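One simple solver for this subproblem is iterative shrinkage-thresholding (ISTA); a minimal sketch, not one of the solvers named above:

```python
import numpy as np

def soft_threshold(Z, t):
    """Elementwise prox of t*|.|: shrink every entry toward zero by t."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def ista(X, D, beta, n_iter=500):
    """Minimize ||X - AD||_F^2 + beta * sum_ij |A_ij| over A."""
    L = 2.0 * np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the smooth part
    A = np.zeros((X.shape[0], D.shape[0]))
    for _ in range(n_iter):
        grad = 2.0 * (A @ D - X) @ D.T         # gradient of ||X - AD||_F^2
        A = soft_threshold(A - grad / L, beta / L)
    return A
```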
Speeding up the testing phase

Fair amount of work on speeding up the encoding stage:

H. Lee et al., "Efficient sparse coding algorithms",
http://ai.stanford.edu/~hllee/nips06-sparsecoding.pdf

K. Gregor and Y. LeCun, "Learning Fast Approximations of Sparse Coding",
http://yann.lecun.com/exdb/publis/pdf/gregor-icml-10.pdf

S. Hawe et al., "Separable Dictionary Learning",
http://arxiv.org/pdf/1303.5244v1.pdf


Relationships between Dictionary atoms

Dictionaries are over-complete bases.

We can dictate relationships between atoms.

Example: hierarchical dictionaries.

Example: Image Patches (figure)

Example: Document Topics (figure)

Problem Statement

Goal:
Have sub-groups of the sparse code all be non-zero (or zero).

Hierarchical:
If a node is non-zero, its parent must be non-zero.
If a node's parent is zero, the node must be zero.

Implementation:
Change the regularization.
Enforce sparsity differently...
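A small check of this hierarchical sparsity pattern, using a hypothetical parent-array encoding of the tree:

```python
def hierarchy_ok(alpha, parent, tol=1e-12):
    """True iff every non-zero code entry has a non-zero parent.
    parent[i] is the parent of node i, or -1 at the root."""
    return all(
        abs(alpha[p]) > tol
        for i, p in enumerate(parent)
        if p >= 0 and abs(alpha[i]) > tol
    )
```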

Grouping Code Entries

A code entry at level k is included in k + 1 groups (one per ancestor, plus its own); see the sketch below.

Add |\alpha_i| to the objective function once for each group containing it, so deeper entries are penalized more heavily.
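A sketch of building such groups from a hypothetical parent-array tree, where each group is a node's subtree (so a node at level k lands in k + 1 groups):

```python
def subtree_groups(parent):
    """Map each node g to the group of code indices in its subtree."""
    children = {i: [] for i in range(len(parent))}
    root = 0
    for i, p in enumerate(parent):
        if p >= 0:
            children[p].append(i)
        else:
            root = i

    groups = {}
    def collect(g):
        nodes = [g]
        for ch in children[g]:
            nodes += collect(ch)
        groups[g] = nodes
        return nodes

    collect(root)
    return groups

# Hypothetical 7-node binary tree: node 0 is the root.
print(subtree_groups([-1, 0, 0, 1, 1, 2, 2]))
# Node 3 (level 2) appears in groups 3, 1, and 0 -> k + 1 = 3 groups.
```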

Group Regularization

Updated objective function:

\arg\min_{D,\{\alpha^j\}} \sum_{j=1}^{m} \Big( \big\| x^j - D\alpha^j \big\|^2 + \lambda\,\Omega(\alpha^j) \Big)

where

\Omega(\alpha) = \sum_{g \in \mathcal{P}} w_g \big\| \alpha|_g \big\|

\alpha|_g are the code values for group g.

w_g weights the enforcement of the hierarchy.

Solve using proximal methods.
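A sketch of the regularizer and of one block soft-thresholding step, the building block of those proximal methods (group indices as produced by the subtree_groups sketch above):

```python
import numpy as np

def omega(alpha, groups, w):
    """Omega(alpha) = sum over groups g of w_g * ||alpha restricted to g||_2."""
    return sum(w[g] * np.linalg.norm(alpha[idx]) for g, idx in groups.items())

def group_soft_threshold(alpha, idx, t):
    """Prox of t * ||.||_2 on one group: shrink the whole block toward zero."""
    out = alpha.copy()
    nrm = np.linalg.norm(out[idx])
    out[idx] *= max(0.0, 1.0 - t / nrm) if nrm > 0 else 0.0
    return out
```

For non-overlapping groups, one pass of group_soft_threshold over all groups is exactly the proximal operator of t\,\Omega; for the overlapping, tree-structured groups used here, proximal methods compose such block updates in a suitable group order (see the structured-sparsity references below).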

Other Examples

Other examples of structured sparsity:

M. Stojnic et al., "On the Reconstruction of Block-Sparse Signals With an Optimal Number of Measurements",
http://dx.doi.org/10.1109/TSP.2009.2020754

J. Mairal et al., "Convex and Network Flow Optimization for Structured Sparsity",
http://jmlr.org/papers/volume12/mairal11a/mairal11a.pdf

Summary

Two interesting directions:

Increasing the speed of the testing phase

Optimizing dictionary structure
