Compressive Sampling

and Frontiers in Signal Processing


Emmanuel Candès

New directions short course, IMA, University of Minnesota, June 2007

Lecture 4: The uniform uncertainty principle and its implications


The uniform uncertainty principle (UUP)
The UUP and general signal recovery from undersampled data
Examples of measurements obeying the UUP
Gaussian measurements
Binary measurements
Random orthonormal projections
Bounded orthogonal systems

So far...

Last time: exact recovery of sparse signals



We need to deal with compressible signals (not exactly sparse)

We need to deal with noise


We deal with the first issue today
We will deal with the second issue next time

Last time: uncertainty relation

Key by-product of signal recovery result:


$\Phi \in \mathbb{R}^{m \times n}$ (sensing matrix)
$T$ arbitrary set of size $S$
Near isometry: for all $x$ supported on $T$
$$\frac{m}{2n}\,\|x\|_{\ell_2}^2 \le \|\Phi x\|_{\ell_2}^2 \le \frac{3m}{2n}\,\|x\|_{\ell_2}^2$$

We interpreted this as an uncertainty relation; e.g.
If $x$ is supported on $T$
Then the energy of $\hat{x}$ on a set of $m$ frequencies is just about proportional to $m$

The uniform uncertainty principle

Definition (Restricted isometry constant $\delta_S$)
For each $S = 1, 2, \ldots$, $\delta_S$ is the smallest quantity such that
$$(1 - \delta_S)\,\|x\|_{\ell_2}^2 \le \|\Phi x\|_{\ell_2}^2 \le (1 + \delta_S)\,\|x\|_{\ell_2}^2$$
for all $S$-sparse $x$

Or equivalently, for all $T$ with $|T| \le S$
$$1 - \delta_S \le \lambda_{\min}(\Phi_T^* \Phi_T) \le \lambda_{\max}(\Phi_T^* \Phi_T) \le 1 + \delta_S$$
where $\Phi_T \in \mathbb{R}^{m \times |T|}$ has the columns with indices in $T$, $|T| \le S$

Sparse subsets of column vectors are approximately orthonormal
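Computing $\delta_S$ exactly would require checking all $\binom{n}{S}$ supports; as a quick illustration (not from the slides, all sizes arbitrary), one can lower-bound $\delta_S$ empirically by sampling random supports $T$ and computing the extreme eigenvalues of $\Phi_T^* \Phi_T$ for a Gaussian $\Phi$, in MATLAB:

m = 64; n = 256; S = 8; ntrials = 500;
Phi = randn(m,n)/sqrt(m);           % Gaussian sensing matrix (see examples below)
delta = 0;
for t = 1:ntrials
    p = randperm(n); T = p(1:S);    % random support of size S
    lam = eig(Phi(:,T)'*Phi(:,T));  % spectrum of the S x S Gram matrix
    delta = max([delta, 1 - min(lam), max(lam) - 1]);
end
fprintf('empirical lower bound on delta_S: %.3f\n', delta);

Sampling supports only yields a lower bound on $\delta_S$; certifying an upper bound is computationally hard in general.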

Why is this an uncertainty principle?

Suppose $\Phi = \sqrt{n/m}\; R_\Omega F$
$F$ is the $n \times n$ Fourier isometry
$\Omega$ is a set of $m$ frequencies

Suppose $\delta_S = 1/2$
Arbitrary support $T$ with $|T| \le S$
Arbitrary signal $x$ supported on $T$:
$$\frac{m}{2n}\,\|x\|_{\ell_2}^2 \le \|1_\Omega\, \hat{x}\|_{\ell_2}^2 \le \frac{3m}{2n}\,\|x\|_{\ell_2}^2$$

Uniform because it holds for all $T$
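A quick numerical check of this relation (not from the slides; the $1/\sqrt{n}$ factor below makes MATLAB's fft the Fourier isometry $F$, and the sizes are arbitrary):

n = 512; m = 128; S = 10;
Omega = randperm(n); Omega = Omega(1:m);  % random set of m frequencies
p = randperm(n);
x = zeros(n,1); x(p(1:S)) = randn(S,1);   % signal on an arbitrary support of size S
xhat = fft(x)/sqrt(n);                    % Fourier isometry: norm(xhat) = norm(x)
e = norm(xhat(Omega))^2/norm(x)^2;        % fraction of the energy captured on Omega
fprintf('energy fraction %.3f, m/n = %.3f\n', e, m/n);
% when delta_S <= 1/2 holds for this Omega, e lies in [m/(2n), 3m/(2n)]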

Foundational result of CS?

$$\min \|s\|_{\ell_1} \quad \text{s.t.} \quad \Phi s = y \ (= \Phi x)$$

$x_S$: best $S$-term approximation of $x$ ($S$ largest entries)

Theorem (C., Tao (2004); the statement below is due to C. (2007))
Assume $\delta_{2S} < \sqrt{2} - 1 \approx .414$. Then
$$\|x^\star - x\|_{\ell_1} \lesssim \|x - x_S\|_{\ell_1}, \qquad \|x^\star - x\|_{\ell_2} \lesssim \frac{\|x - x_S\|_{\ell_1}}{\sqrt{S}}$$

Deterministic: nothing is random here!

Exact if $x$ is $S$-sparse
Otherwise, essentially reconstructs the $S$ largest entries of $x$
Powerful if $S$ is close to $m$

The condition $\delta_{2S} < 1$ is necessary

Exact recovery if $x$ is $S$-sparse
With $\delta_{2S} = 1$, one can have $h \in \mathbb{R}^n$, $2S$-sparse, with $\Phi h = 0$
Decompose $h = x - x'$, each $S$-sparse, with
$$\Phi(x - x') = 0 \quad \Longleftrightarrow \quad \Phi x = \Phi x'$$

Two $S$-sparse signals produce the same measurements $\Rightarrow$ need $\delta_{2S} < 1$

$\ell_1$ recovery condition is only slightly stronger: $\delta_{2S} < \sqrt{2} - 1$

Formal equivalence

Suppose there is an $S$-sparse solution to $y = \Phi x$

Combinatorial optimization problem
$$(P_0) \qquad \min_{s \in \mathbb{R}^n} \|s\|_{\ell_0}, \qquad \Phi s = y$$

Convex optimization problem (Linear Program)
$$(P_1) \qquad \min_{s \in \mathbb{R}^n} \|s\|_{\ell_1}, \qquad \Phi s = y$$

Equivalence
If $\delta_{2S} < 1$, the solution to $(P_0)$ is unique
If $\delta_{2S} < \sqrt{2} - 1$, the solutions to $(P_0)$ and $(P_1)$ are unique and the same!
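Since $(P_1)$ is a linear program, any LP solver applies after the standard split $-u \le s \le u$, minimizing $\mathbf{1}^T u$. A minimal MATLAB sketch (assuming the Optimization Toolbox's linprog; all sizes arbitrary):

m = 80; n = 200; S = 8;
Phi = randn(m,n)/sqrt(m);                 % Gaussian sensing matrix
p = randperm(n);
x = zeros(n,1); x(p(1:S)) = randn(S,1);   % S-sparse signal
y = Phi*x;                                % measurements
f   = [zeros(n,1); ones(n,1)];            % objective: sum of u, variables z = [s; u]
A   = [eye(n) -eye(n); -eye(n) -eye(n)];  % s - u <= 0 and -s - u <= 0
b   = zeros(2*n,1);
Aeq = [Phi zeros(m,n)]; beq = y;          % Phi*s = y
z   = linprog(f, A, b, Aeq, beq);
xhat = z(1:n);                            % at the optimum, u = |s|, so 1'*u = ||s||_1
fprintf('recovery error: %.2e\n', norm(xhat - x));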

Proof of general $\ell_1$ recovery

$$\min \|s\|_{\ell_1} \quad \text{s.t.} \quad y = \Phi s \ (= \Phi x)$$

Proof is elementary but nontrivial

Solution $x^\star = x + h$, $\Phi h = 0$
$x^\star$ solution gives $\|x + h\|_{\ell_1} \le \|x\|_{\ell_1}$
$T_0$: location of the $S$ largest coefficients of $x$

$$\|x + h\|_{\ell_1} = \sum_{i \in T_0} |x_i + h_i| + \sum_{i \in T_0^c} |x_i + h_i| \ge \sum_{i \in T_0} \big(|x_i| - |h_i|\big) + \sum_{i \in T_0^c} \big(|h_i| - |x_i|\big)$$
$$= \|x\|_{\ell_1} + \|h\|_{\ell_1(T_0^c)} - \|h\|_{\ell_1(T_0)} - 2\,\|x\|_{\ell_1(T_0^c)}$$

Implication
$$\|h\|_{\ell_1(T_0^c)} \le \|h\|_{\ell_1(T_0)} + 2\,\|x\|_{\ell_1(T_0^c)} = \|h\|_{\ell_1(T_0)} + 2\,\|x - x_S\|_{\ell_1}$$

Uniform uncertainty principle implies
$$\|h\|_{\ell_1(T_0)} \le \rho\,\|h\|_{\ell_1(T_0^c)}, \qquad \rho < 1 \qquad (\star)$$

This gives
$$\|h\|_{\ell_1(T_0^c)} \le \frac{2}{1 - \rho}\,\|x - x_S\|_{\ell_1}$$

and
$$\|x^\star - x\|_{\ell_1} \le 2\,\frac{1 + \rho}{1 - \rho}\,\|x - x_S\|_{\ell_1}$$

We are done! Well, almost: we still need to check $(\star)$

By Cauchy-Schwarz
$$\|h\|_{\ell_1(T_0)} \le \sqrt{S}\,\|h\|_{\ell_2(T_0)}$$

Divide $T_0^c$ into subsets of size $S_0$: enumerate $T_0^c$ as $k_1, k_2, \ldots, k_{n-S}$ in decreasing order of magnitude of $h_{T_0^c}$ and set
$$T_j = \{k_\ell :\ (j-1)S_0 + 1 \le \ell \le jS_0\}$$

$T_1$: indices of the $S_0$ largest coefficients of $h_{T_0^c}$
$T_2$: indices of the next $S_0$ largest coefficients
and so on...

Now $\Phi h = 0$ gives
$$\Phi h_{T_0 \cup T_1} = -\Phi h_{(T_0 \cup T_1)^c} = -\sum_{j \ge 2} \Phi h_{T_j}$$

This is interesting because
$$\sqrt{1 - \delta_{S+S_0}}\;\|h\|_{\ell_2(T_0 \cup T_1)} \le \|\Phi h_{T_0 \cup T_1}\|_{\ell_2} \le \sum_{j \ge 2} \|\Phi h_{T_j}\|_{\ell_2} \le \sqrt{1 + \delta_{S_0}}\,\sum_{j \ge 2} \|h_{T_j}\|_{\ell_2}$$

For each $j \ge 2$,
$$\|h_{T_j}\|_{\ell_2} \le \sqrt{S_0}\,\|h_{T_j}\|_{\ell_\infty} \le \frac{1}{\sqrt{S_0}}\,\|h_{T_{j-1}}\|_{\ell_1}$$

and thus
$$\sum_{j \ge 2} \|h_{T_j}\|_{\ell_2} \le \frac{1}{\sqrt{S_0}}\,\big(\|h_{T_1}\|_{\ell_1} + \|h_{T_2}\|_{\ell_1} + \ldots\big) \le \frac{\|h_{T_0^c}\|_{\ell_1}}{\sqrt{S_0}}$$

In short
$$\|h\|_{\ell_2(T_0 \cup T_1)} \le \sqrt{\frac{1 + \delta_{S_0}}{1 - \delta_{S+S_0}}}\;\frac{\|h_{T_0^c}\|_{\ell_1}}{\sqrt{S_0}}$$

It then follows that
$$\|h\|_{\ell_1(T_0)} \le \sqrt{S}\,\|h\|_{\ell_2(T_0)} \le \sqrt{S}\,\|h\|_{\ell_2(T_0 \cup T_1)} \le \rho\,\|h_{T_0^c}\|_{\ell_1}, \qquad \rho^2 = \frac{S}{S_0}\,\frac{1 + \delta_{S_0}}{1 - \delta_{S+S_0}}$$

Pick $S_0 = 2S$; then
$$\rho^2 = \frac{1}{2}\,\frac{1 + \delta_{2S}}{1 - \delta_{3S}} < 1 \quad \text{iff} \quad \delta_{2S} + 2\,\delta_{3S} < 1$$

Slight improvement

Lemma
$$\|h_{T_0}\|_{\ell_1} \le \frac{\sqrt{2}\,\delta_{2S}}{1 - \delta_{2S}}\,\|h_{T_0^c}\|_{\ell_1}$$

A slight improvement

Definition (Restricted orthogonality numbers)
$\theta_{S,S'}$: smallest quantity for which
$$|\langle \Phi x, \Phi x' \rangle| \le \theta_{S,S'}\,\|x\|_{\ell_2}\,\|x'\|_{\ell_2}$$
for all $x, x'$ supported on disjoint subsets $T, T' \subset \{1, \ldots, n\}$ with $|T| \le S$, $|T'| \le S'$

Suppose $x$ and $x'$ are unit vectors as above. Then $x \pm x'$ is $(S+S')$-sparse with $\|x \pm x'\|_{\ell_2}^2 = 2$, so
$$2\,(1 - \delta_{S+S'}) \le \|\Phi x + \Phi x'\|_{\ell_2}^2 \le 2\,(1 + \delta_{S+S'})$$
$$2\,(1 - \delta_{S+S'}) \le \|\Phi x - \Phi x'\|_{\ell_2}^2 \le 2\,(1 + \delta_{S+S'})$$

Parallelogram identity
$$|\langle \Phi x, \Phi x' \rangle| = \frac{1}{4}\,\Big|\|\Phi x + \Phi x'\|_{\ell_2}^2 - \|\Phi x - \Phi x'\|_{\ell_2}^2\Big| \le \delta_{S+S'}$$

Therefore $\theta_{S,S'} \le \delta_{S+S'}$

Go back to the proof of the recovery result and set $S_0 = S$:
$$\Phi h_{T_0 \cup T_1} = -\Phi h_{(T_0 \cup T_1)^c} = -\sum_{j \ge 2} \Phi h_{T_j}$$

gives
$$\|\Phi h_{T_0 \cup T_1}\|_{\ell_2}^2 = \Big\langle \Phi h_{T_0 \cup T_1},\, -\sum_{j \ge 2} \Phi h_{T_j} \Big\rangle = -\Big\langle \Phi h_{T_0}, \sum_{j \ge 2} \Phi h_{T_j} \Big\rangle - \Big\langle \Phi h_{T_1}, \sum_{j \ge 2} \Phi h_{T_j} \Big\rangle$$
$$\le \theta_{S,S}\,\big(\|h_{T_0}\|_{\ell_2} + \|h_{T_1}\|_{\ell_2}\big)\,\sum_{j \ge 2} \|h_{T_j}\|_{\ell_2}$$

In other words, and since $\|h_{T_0}\|_{\ell_2} + \|h_{T_1}\|_{\ell_2} \le \sqrt{2}\,\|h_{T_0 \cup T_1}\|_{\ell_2}$,
$$(1 - \delta_{2S})\,\|h_{T_0 \cup T_1}\|_{\ell_2}^2 \le \|\Phi h_{T_0 \cup T_1}\|_{\ell_2}^2 \le \sqrt{2}\,\theta_{S,S}\,\|h_{T_0 \cup T_1}\|_{\ell_2} \sum_{j \ge 2} \|h_{T_j}\|_{\ell_2}$$

and thus
$$\|h_{T_0 \cup T_1}\|_{\ell_2} \le \rho_0 \sum_{j \ge 2} \|h_{T_j}\|_{\ell_2}, \qquad \rho_0 = \frac{\sqrt{2}\,\theta_{S,S}}{1 - \delta_{2S}}$$

We finish as before and obtain
$$\|h_{T_0}\|_{\ell_1} \le \rho_0\,\|h_{T_0^c}\|_{\ell_1}$$

About the constant

Recall $h = x^\star - x$ and
$$\|h\|_{\ell_1(T_0)} \le \rho\,\|h_{T_0^c}\|_{\ell_1} \ \Longrightarrow\ \|h\|_{\ell_1} \le 2\,\frac{1 + \rho}{1 - \rho}\,\|x - x_S\|_{\ell_1}$$

Since $\rho \le \dfrac{\sqrt{2}\,\delta_{2S}}{1 - \delta_{2S}}$, we have
$$\frac{1 + \rho}{1 - \rho} \le \frac{1 + (\sqrt{2} - 1)\,\delta_{2S}}{1 - (1 + \sqrt{2})\,\delta_{2S}}$$

and
$$\|x^\star - x\|_{\ell_1} \le 2\,\frac{1 + (\sqrt{2} - 1)\,\delta_{2S}}{1 - (1 + \sqrt{2})\,\delta_{2S}}\,\|x - x_S\|_{\ell_1}$$

If $\delta_{2S} \le 1/4$, then
$$\|x^\star - x\|_{\ell_1} \le 5.6\,\|x - x_S\|_{\ell_1}!$$
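A quick numeric check of this constant (not on the slides):

delta = 1/4;                        % RIC value delta_{2S} from the slide
rho   = sqrt(2)*delta/(1 - delta);  % bound rho <= sqrt(2) delta/(1 - delta)
C     = 2*(1 + rho)/(1 - rho);      % l1 error constant
fprintf('C = %.2f\n', C);           % prints C = 5.57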

What if one is interested in the $\ell_2$ error?

The $k$th largest value of $h_{T_0^c}$ obeys
$$|h_{T_0^c}|_{(k)} \le \frac{\|h_{T_0^c}\|_{\ell_1}}{k}$$

and, therefore, choosing $S_0 = S$ gives
$$\|h_{(T_0 \cup T_1)^c}\|_{\ell_2}^2 \le \sum_{k=S+1}^{n} \frac{\|h_{T_0^c}\|_{\ell_1}^2}{k^2} \le \frac{\|h_{T_0^c}\|_{\ell_1}^2}{S}$$

With $\rho \le \dfrac{\sqrt{2}\,\delta_{2S}}{1 - \delta_{2S}}$, the previous analysis gives
$$\|h_{T_0 \cup T_1}\|_{\ell_2} \le \rho\,\frac{\|h_{T_0^c}\|_{\ell_1}}{\sqrt{S}}, \qquad \|h_{T_0^c}\|_{\ell_1} \le \frac{2}{1 - \rho}\,\|x - x_S\|_{\ell_1}$$

And thus
$$\|h\|_{\ell_2} \le (1 + \rho)\,\frac{\|h_{T_0^c}\|_{\ell_1}}{\sqrt{S}} \le 2\,\frac{1 + \rho}{1 - \rho}\,\frac{\|x - x_S\|_{\ell_1}}{\sqrt{S}}$$

This is exactly the same constant!

Examples of matrices obeying the UUP

Gaussian matrices
Random projections
Binary matrices
Bounded orthogonal systems

Gaussian matrices

Phi = randn(m,n)/sqrt(m); indep. normal entries with mean 0 and std. $1/\sqrt{m}$

Condition holds if
$$m \gtrsim S \log(n/S)$$

Random projections

X = randn(n,m); (sample m Gaussian vectors in R^n)
[Q,R] = qr(X,0); Phi = Q'; (orthonormalize the columns and transpose)
Phi = sqrt(n/m)*Phi; (renormalize by sqrt(n/m), as for the partial Fourier matrix)

Condition holds if
$$m \gtrsim S \log(n/S)$$

Binary matrices

$\Phi$ with symmetric i.i.d. entries $P\big(\Phi_{ij} = \pm 1/\sqrt{m}\big) = 1/2$

Condition holds if
$$m \gtrsim S \log(n/S)$$
DeVore et al., Pajor et al.
Same result is true for large classes of distributions, e.g. sub-Gaussian distributions

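A one-line MATLAB construction of such a matrix (a sketch; sizes arbitrary):

m = 64; n = 256;
Phi = sign(randn(m,n))/sqrt(m);  % P(Phi_ij = +-1/sqrt(m)) = 1/2: the sign of a Gaussian is +-1 with equal probability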
Bounded orthogonal systems

$U$: $n \times n$ orthogonal matrix; $U^* U = I$
$\Phi$ obtained by selecting $m$ rows from $U$ uniformly at random (and renormalizing by $\sqrt{n/m}$)
Recall coherence
$$\mu(U) = \sqrt{n}\,\sup_{j,k} |U_{j,k}|$$

Theorem (C. and Tao (2004))
Condition holds if
$$m \gtrsim \mu^2(U)\, S\, (\log n)^6$$

Rudelson and Vershynin improved this and got $(\log n)^5$ instead
This is not trivial at all
Certainly not optimal
Improvements will be hard
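As a concrete instance (not from the slides), the randomly subsampled Fourier matrix of the earlier slides fits this framework with $\mu(U) = 1$, the smallest possible value; a MATLAB sketch:

n = 512; m = 100;
U = fft(eye(n))/sqrt(n);               % n x n Fourier isometry, U'*U = I
mu = sqrt(n)*max(abs(U(:)));           % coherence mu(U); equals 1 for the DFT
rows = randperm(n); rows = rows(1:m);  % m rows selected uniformly at random
Phi = sqrt(n/m)*U(rows,:);             % renormalized partial Fourier sensing matrix
fprintf('mu(U) = %.2f\n', mu);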

Last time: incoherence

Sparse in basis $\Psi_1$
Undersample in basis $\Psi_2$
System
$$U = \Psi_2^* \Psi_1$$
Coherence
$$\mu(U) = \sqrt{n}\,\sup_{j,k} |U_{j,k}| = \sqrt{n}\,\sup_{j,k} \big|\langle \psi_j^{(1)}, \psi_k^{(2)} \rangle\big| = \mu(\Psi_1, \Psi_2)$$

Condition holds if
$$m \gtrsim \mu^2(\Psi_1, \Psi_2)\, S\, (\log n)^5$$
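As a toy computation (not from the slides; the basis pair is an arbitrary choice), the coherence of a random orthonormal sparsity basis against the spike (identity) measurement basis, which is nearly as small as it gets:

n = 256;
[Psi1, R] = qr(randn(n));     % random orthonormal sparsity basis
Psi2 = eye(n);                % spike (identity) measurement basis
U  = Psi2'*Psi1;
mu = sqrt(n)*max(abs(U(:)));  % of order sqrt(log n), far below the worst case sqrt(n)
fprintf('mu(Psi1,Psi2) = %.2f\n', mu);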

What is Compressed Sensing?

Classical viewpoint
Measure everything (all the pixels, all the coefficients)
Keep the largest coefficients: distortion is $\|f - f_S\|$

Compressed sensing
Take $m$ incoherent (e.g. Gaussian) measurements: $y = \Phi f$
Reconstruct by linear programming
$$f^\star = \arg\min \|\tilde{f}\|_{\ell_1} \quad \text{subject to} \quad \Phi \tilde{f} = y$$

Same performance!
(Sketch) If $m \gtrsim S \log(n/S)$,
$$\|f^\star - f\| \asymp \|f - f_S\|$$
Incoherent measurements $\approx$ compressed version of the object

Simultaneous signal acquisition and compression!
All that is needed is to decompress...

"This suggests the possibility of compressed data acquisition protocols which perform as if it were possible to directly acquire just the important information about the image of interest."
Dennis Healy, DARPA
