
UNIVERSITÀ DEGLI STUDI DI BARI ALDO MORO

MASTER THESIS

Compressive Sensing for Super Resolution

Author: Paolo Sylos Labini
Supervisor: Prof. Sebastiano Stramaglia

September 27, 2018

Contents

1 Overview
  1.1 Compressive acquisition
  1.2 Recovery conditions
  1.3 Algorithms
  1.4 Applications
    1.4.1 Super-Resolution
    1.4.2 1-pixel camera
    1.4.3 Nonlinear predictions

2 Compressive Sensing Theory
  2.1 Sparsity
    2.1.1 Compressibility
    2.1.2 Dictionaries
  2.2 Conditions for Recovery
    2.2.1 ℓ0 minimization
    2.2.2 ℓ1 minimization
    2.2.3 Noisy recovery
    2.2.4 Coherence
    2.2.5 Restricted Isometry Constant
    2.2.6 Recovery with random matrices
  2.3 Algorithms
    2.3.1 Orthogonal Matching Pursuit
    2.3.2 Iterative Hard Thresholding
    2.3.3 NIHT
    2.3.4 Other thresholding algorithms
  2.4 A toy problem

3 Digital Imaging
  3.1 Representing digital images
    3.1.1 Discrete Fourier Transform
    3.1.2 Discrete Cosine Transform
    3.1.3 Wavelets
  3.2 Manipulating Digital Images
    3.2.1 Filtering
    3.2.2 Blurring and Deblurring
    3.2.3 Sampling
    3.2.4 Affine transforms

4 Super Resolution
  4.1 The forward acquisition model
  4.2 Registration
  4.3 Reconstruction
    4.3.1 The difficulties of reconstruction
  4.4 Super-resolution methods

5 Compressive Sensing for Super Resolution
  5.1 The SR acquisition model revisited
  5.2 Recovery through ASR
    5.2.1 Sparsifying Basis
    5.2.2 Structurally Random Matrices
    5.2.3 The super-resolution factor
    5.2.4 Blur
  5.3 Normalised Iterative Hard Thresholding for Super Resolution
    5.3.1 Evaluating A and A∗
    5.3.2 Applying the NIHT

6 Experimental results
  6.1 Synthetic images
  6.2 Natural images
  6.3 Real world images
    6.3.1 Results on "Target" dataset
    6.3.2 Results on "Car" dataset

7 Conclusions

List of Figures

2.1 ℓp unit spheres in two dimensions
2.2 A signal x in the canonical basis. N = 300
2.3 The signal x of fig. 2.2 is 10-sparse when represented in the cosine basis. N = 300, k = 10
2.4 The measured vector y. N = 300, M = 60
2.5 The recovered coefficient vector w. N = 300, k = 10, e < 10^−14
2.6 Measured MSE vs relative number of measurements. Note that the error drops to zero around M = 4k. (N = 300, k = 10)
2.7 Percentage of perfect reconstructions vs relative number of measurements. Note that accuracy approaches one around M = 4k. (N = 300, k = 10)
3.1 A 250 · 250 black and white digital image
3.2 A digital image (left); its DCT coefficients, sorted by magnitude (right)
3.3 Four different periodic and symmetric extensions give rise to four different types of DCT
3.4 An image (left); its Haar coefficients arranged in decreasing order (right)
3.5 A digital image (left); its Gaussian-blurred version, σ = 3 (right)
6.1 Synthetic dataset: percentage of perfectly reconstructed coefficients (DCT basis), plotted against sparsity k, for various numbers of acquisitions m. (N = 10000, λ = 4)
6.2 Synthetic dataset: percentage of perfectly reconstructed coefficients (db8 wavelet basis), plotted against sparsity k, for various numbers of acquisitions m. (N = 10000, λ = 4)
6.3 Synthetic dataset: percentage of perfectly reconstructed coefficients (DCT basis), plotted against λ for a variety of m and high k. (N = 10000, k = 3000)
6.4 Synthetic dataset: percentage of perfectly reconstructed coefficients (DCT basis), plotted against λ for a variety of m and low k. (N = 10000, k = 300)
6.5 Natural dataset: an example of the reconstruction process. (m = 6, λ = 4, N = 256 · 256)
6.6 Natural dataset: MSE and reconstruction accuracy plotted vs the number of acquisitions, m, for the DCT reconstruction scheme. (λ = 4, N = 256 · 256)
6.7 Natural dataset: MSE and reconstruction accuracy plotted vs the number of acquisitions, m, for the wavelet (db8) reconstruction scheme. (λ = 4, N = 256 · 256)
6.8 Real world dataset: DCT-based reconstruction of the "Target" HR image (left) from m = 5 LR samples (right). (λ = 4, N = 458864)
6.9 Real world dataset: DCT-based reconstruction of the "Car" HR image (left) from m = 5 LR samples (right). (λ = 4, N = 219296)

List of Abbreviations

CS Compressive Sensing
SR Super Resolution
HR High Resolution
LR Low Resolution
IHT Iterative Hard Thresholding
NIHT Normalised Iterative Hard Thresholding
OMP Orthogonal Matching Pursuit
DCT Discrete Cosine Transform

Chapter 1

Overview

The ability to easily acquire, store and manipulate very high dimensional data is crucial in many fields, from magnetic resonance imaging to network tomography, from the study of dynamical systems to seismic prediction and radio astronomy. When we measure such signals, we usually capture all of their many components and then store only the most important ones, knowing that reconstructions of the signal from those few coefficients will suffer from minimal error. This strategy is far from optimal, particularly when acquisition is costly: we capture all the information up front, even though we are going to ignore most of it. Couldn't we measure only those few, important coefficients in the first place?

Compressive sensing (CS), also known as sparse recovery, refers to the idea that certain signals can indeed be precisely reconstructed from a small number of non-adaptive linear measurements, provided such measurements satisfy certain conditions. The topic of CS, originating from the seminal works of Donoho [13] and of Candès and Tao [5], has rapidly developed into a wide research area; the discovery of efficient and reliable algorithms with solid theoretical guarantees has touched all the above-mentioned fields of study and many others, causing a blossoming of new and exciting applications.
The mathematical tools developed for compressive sensing have been widely applied to all kinds of inverse problems, and in this thesis we will apply a collection of techniques born in this field to the problem of super-resolution (SR) - the task of reconstructing a high-resolution image from its low-resolution acquisitions. Our first task will be that of casting the multiple image super-resolution problem as a CS problem. Then, we will be confronted with evaluating and manipulating the measurement matrices, so that they are well conditioned for compressive sensing. Finally, we will be able to recover high-resolution images from collections of low-resolution samples.
The thesis is organized as follows:

1. In Chapter 1, we provide an overview of the main concepts in compressive sensing, sketching three applications: super-resolution, our main focus in this thesis; 1-pixel cameras [14], a special kind of image acquisition device which relies heavily on CS ideas; and finally a method [26] for identifying high-dimensional, loosely coupled nonlinear dynamical systems.

2. In Chapter 2, we review the mathematical framework of Compressive Sensing, recalling the key concepts we will be using in the rest of the thesis, along with normalized Iterative Hard Thresholding [2] - our algorithm of choice for tackling the super-resolution problem. We end the chapter with a classic toy problem in CS - retrieving a sum of sinusoids from few random measures - in order to show the methods and possibilities of the theory.

3. In Chapter 3, we introduce some concepts and methods from digital signal processing [22], with particular attention to digital images and sparse representations. This will be useful when, later, we define the super-resolution task and cast it as a CS problem.

4. In Chapter 4, we present the super-resolution problem, reviewing some of the best known and most successful approaches [17][19][22][27].

5. In Chapter 5, we present our theoretical approach to super-resolution - we restate it as a CS problem and show how to solve it through the application of normalized Iterative Hard Thresholding.

6. In Chapter 6, we test our approach on super-resolution problems for three datasets of low-resolution images: synthetic images, obtained from known transformations of a randomly generated, artificially sparse vector; natural images, obtained from known transformations of a given digital image; and finally real world images, obtained from a real camera.

1.1 Compressive acquisition

A linear acquisition process of a signal x ∈ C^N - resulting in a measurement vector y ∈ C^M - is modeled by a measurement matrix A ∈ C^{M×N},

y = Ax + n    (1.1)

where the noise term n accounts for any measurement error. For N > M, that is, when the number of measurements is less than the signal length, equation 1.1 is underdetermined, thus not uniquely solvable in general: priors need to be assumed on the domain of A for the reconstruction to be even theoretically feasible.
Compressive sensing works precisely on a special kind of restricted domain, that of sparse vectors. A vector is said to be k-sparse in a basis when it is spanned by at most k vectors of that basis. This property was first appreciated in the field of signal compression: a sparse vector can be stored or transmitted just by recording the locations and magnitudes of its non-zero entries. A more versatile object, a compressible vector, need not be sparse, but its sorted components must decay fast enough that it is well approximated by a sparse vector.

Sparse vectors live in a small, non-convex subset of their native linear space. Intuitively, this means that the complexity or "intrinsic" information content of such signals is much smaller than their length. In addition to storage and transmission, one may hope that acquisition too can be sped up by sparsity, using a number of measurements proportional to this intrinsic information content rather than to the signal length.

Two questions must be answered in order to turn this intuition into a well-built theory. First, we will ask ourselves which linear measurement processes are well suited for the recovery of sparse vectors. Then, once we are sure that all the relevant information about a signal is stored in a measurement vector y, we should know how to faithfully and efficiently reconstruct the original signal from this information, even in the presence of noise or sparsity defects; that is, we will need to find reliable, stable and robust reconstruction algorithms.

1.2 Recovery conditions

We will see in Chapter 2 that not every matrix is well suited for sparse recovery. If different sparse signals are sent by a measurement matrix A to the same measurement vector, no amount of computation will be able to distinguish them. This insight is formalized by results regarding the mutual coherence of A's columns, that is, the correlations between the images through A of the basis vectors. If the coherence is below a certain threshold, recovery is indeed possible. Since coherence bounds are often pessimistic, recovery guarantees are sometimes refined by introducing the Restricted Isometry Property (RIP) for matrices. A number of related recovery theorems [7] state that A should act roughly as an isometry on k-sparse vectors for recovery to be possible. They quantify the distance from a perfect isometry with an appropriate restricted isometry constant (RIC) δ_k, and demand that such constant be small.

Explicitly constructing matrices that are provably optimal in a compressive sensing setting is still an open problem: the RIP is not computationally easy to verify for a given matrix, nor is it easy to build ad-hoc matrices satisfying it [24]. In this thesis, we will have little freedom in building our matrix, since the measurement process we will study, the low-resolution acquisition of real-world images, is already fixed by the application. Yet, it is worth mentioning a key result in compressive sensing stating that, with high probability on the random draw of a Gaussian or Bernoulli matrix A ∈ C^{M×N}, A will satisfy the RIP condition as long as the number of measurements M satisfies

M > Ck log(N/k),    (1.2)

where k is the maximal sparsity of the vectors to recover, N is the length of the original signal and C is a constant independent of both N and k. This result, whose discovery by Emmanuel Candès and Terence Tao [5] essentially kick-started the field of compressive sensing from theory into the real world, provides a universal way to implement compressive sensing ideas in a variety of settings, and has far reaching implications. It also gives an optimal bound for the minimum number of measurements one needs in order to reconstruct a k-sparse vector, and will provide a benchmark for the performance in our own task.

1.3 Algorithms

In addition to fine theoretical results, a number of efficient algorithms have been developed for reconstruction in practice. These algorithms are also stable, meaning their error stays within known and acceptable bounds when the signals are not exactly sparse and the measurements are inaccurate. The central problem of compressive sensing is called ℓ0-minimization,

x = arg min_{z ∈ C^N} ‖z‖_0  subject to  y = Az    (1.3)

where ‖x‖_0 is a shorthand for the sparsity of x. The equation asks us to find the sparsest vector compatible with the given measurements. Despite its simple statement, it is provably an NP-hard problem [1] - no polynomial time algorithm is expected to solve it, making it infeasible to use for recovering the large vectors usually involved in signal processing. Fortunately, its convex relaxation, ℓ1-minimization,

x = arg min_{z ∈ C^N} ‖z‖_1  subject to  y = Az    (1.4)

is not plagued by the same limitations, and yet its implementations find the right solution in a number of different settings. (1.4), called basis pursuit, is one of the most studied problems [9][16][18] in optimization, and comes with a number of theoretical guarantees and efficient solving algorithms. It is easily extended to the noisy setting, where it is closely linked to the famous LASSO problem.

While ℓ1 recovery is simple and theoretically pleasing, it is rarely the right choice for very big problems. In Chapter 2 we will briefly describe a greedy method, Orthogonal Matching Pursuit [25], which iteratively enlarges the estimated support by identifying at each step the column most correlated with the current residual, but we will concentrate most of our attention on the (normalized) Iterative Hard Thresholding (IHT), our algorithm of choice for the Super Resolution problem. IHT is concisely represented in the following box.

Algorithm 1 Normalized Iterative Hard Thresholding

1: repeat
2:   x_{n+1} = H_k(x_n + α_n A∗(y − Ax_n))
3: until stop criterion is met

Iterative hard thresholding performs a gradient descent on the error function ‖Ax − y‖_2², and enforces k-sparsity at every step through H_k, a strongly non-linear operator that keeps only the k largest entries of its argument. The factor α_n is equal to one in the standard version of IHT, but we will instead adjust it at every step to ensure convergence.
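For concreteness, here is a minimal numpy sketch of this iteration in its standard form (fixed step α = 1, with the matrix rescaled so that this step is safe). The sizes, names and stopping rule are illustrative assumptions, not the thesis implementation.

import numpy as np

def hard_threshold(z, k):
    # H_k: keep the k largest-magnitude entries of z, set the rest to zero
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-k:]
    out[idx] = z[idx]
    return out

def iht(A, y, k, n_iter=1000):
    # x <- H_k(x + A*(y - Ax)): standard IHT update with alpha = 1
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        x = hard_threshold(x + A.T @ (y - A @ x), k)
    return x

# Toy usage: recover a k-sparse vector from M random measurements
rng = np.random.default_rng(0)
N, M, k = 300, 60, 10
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, 2)            # rescale so the fixed step alpha = 1 cannot diverge
x_hat = iht(A, A @ x_true, k)
print(np.linalg.norm(x_hat - x_true))   # typically tiny: M = 6k measurements suffice here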

1.4 Applications

Having surveyed the theory, we move to the applications. As we have seen, for a measurement problem to be cast in the CS framework two conditions are needed: that the signal of interest is sparse in some basis, and that the measurement matrix is able to extract all the relevant information from the signal in that basis. We will now provide some intuition on how this applies to the problem of super-resolution, the main subject of this thesis, and then end the chapter with two more application examples.

1.4.1 Super-Resolution

Super-resolution (SR) is the name for a number of techniques that recover a high-resolution (HR) image from one or several of its low-resolution (LR) acquisitions, removing the degradations caused by the imaging process of a camera. SR aims to combine the non-redundant information contained in multiple low-resolution frames to generate a high-resolution image.

In order to apply compressive sensing to this problem, we will need to find a basis that sparsifies the image set of interest. A number of bases are known to sparsify "natural" images, the best known being the discrete cosine basis - the representation at the heart of JPEG compression - and the wavelet basis. Whatever the chosen basis, the HR image will be decomposed as x_HR = B w_HR, where B is the basis change and w_HR a sparse vector.

The acquisition process of the i-th image is easily formalized as an affine transform T_i followed by a filter H_i and a downsampling matrix D_i. So, a set of sparse coefficients w_HR in the wavelet or cosine basis gives the measurement vectors

y_i = D_i H_i T_i B w_HR,    i = 1, ..., m

If multiple images are taken, we can stack their acquisition processes on top of each other - for example, setting T̄ = [T_1, T_2, ..., T_m] - and group these stacked matrices into a single measurement matrix

A = D̄ H̄ T̄ B

If the matrix A satisfies the appropriate recovery conditions, we should be able to recover the coefficient vector w_HR (and thus, the HR image). This is far from trivial, for we will have little freedom in building the matrix.
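As a minimal illustration of this forward model (an illustrative sketch with assumed shift amounts, blur width and decimation factor - not the thesis pipeline), the snippet below generates m shifted, blurred and downsampled LR frames from an HR array; each frame plays the role of one y_i = D_i H_i T_i x_HR, and inverting this stacked map is the SR task.

import numpy as np
from scipy.ndimage import gaussian_filter, shift

def lr_frame(x_hr, dx, dy, sigma=1.0, factor=4):
    # One LR acquisition: translation T_i, Gaussian blur H_i, downsampling D_i
    shifted = shift(x_hr, (dy, dx), order=1, mode="nearest")   # affine transform (pure translation here)
    blurred = gaussian_filter(shifted, sigma)                   # optical blur (PSF)
    return blurred[::factor, ::factor]                          # decimation by the SR factor

rng = np.random.default_rng(0)
x_hr = rng.random((64, 64))                  # stand-in HR image
offsets = rng.uniform(-2, 2, size=(5, 2))    # m = 5 sub-pixel camera shifts
frames = [lr_frame(x_hr, dx, dy) for dx, dy in offsets]
y = np.concatenate([f.ravel() for f in frames])   # stacked measurement vector
print(y.shape)                               # m * (64/4)**2 measurements in total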

1.4.2 1-pixel camera

1-pixel cameras [14] are a special kind of image acquisition device where the measurement process is carried out by a number of random projections of the whole image, each measured by a single pixel at different (but very close) times. This is achieved through a microarray of mirrors that can be individually turned on and off. An image that arrives on the camera lens is directed to the microarray, and from there to a lens that combines all the incoming light onto a single sensor.

Formally, the microarray and lens combined realize an inner product between the image v ∈ R^N and a specific configuration of the mirrors, a ∈ {0, 1}^N. By fast changes of the mirrors, one can observe multiple inner products y_i = ⟨v, a_i⟩ as intensities on the single pixel at times t_i. This is restated as a linear equation y = Av, where the rows of A are the configurations a_i; A is akin to a random Bernoulli matrix if the mirrors are switched on and off randomly in each configuration. If v is sparse in some basis B other than the canonical one, such as the wavelet basis, we may write y = ABw, and recover w (and thus v = Bw) from only a few measurements. In situations where achieving a high pixel density is technically impossible or simply too expensive, such a setup is expected to really pay off.

1.4.3 Nonlinear predictions

In complex system identification, the aim is to infer the mathematical equations that govern the dynamical evolution of a system from measurements of that system's location in phase space at different times. This may be useful for various reasons, from predicting the system's future behavior to estimating the causal relationships among its components. We consider a first order dynamical system

x′ = F(x)

where x ∈ R^N represents the set of externally accessible dynamical variables and F is a smooth nonlinear function.
In order to understand how CS applies to this task, consider, as suggested in [26], the expansion of the j-th component of F into a power series up to order m:

F_j(x) = Σ_{l_1,...,l_N = 0}^{m} (a_j)_{l_1...l_N} · x_1^{l_1} x_2^{l_2} ··· x_N^{l_N}    (1.5)

with a total of (1 + m)^N coefficients to determine. If we probe the system at times (t_1, ..., t_n), we will have a collection of positions x(t_i) and velocities v(t_i) = F(x(t_i)). Defining the row vector of all monomials up to order m,

g(t) = [x_1(t)^0 x_2(t)^0 ··· x_N(t)^0,  x_1(t)^1 x_2(t)^0 ··· x_N(t)^0,  ...,  x_1(t)^m x_2(t)^m ··· x_N(t)^m]

and

G = [g(t_1), ..., g(t_n)]

we can write

[F(x(t_i))]_j = g(t_i) · a_j.

The measurement process is now concisely expressed by

v_j = G a_j

It is a reasonable assumption - equivalent to that of a loosely coupled system - that the coefficient vector a_j should be sparse, so that only a few combinations of powers contribute to the behavior of the system. Also, if we are careful (or just lucky) with the choice of the measurement times, the columns of G will be uncorrelated with each other. Thus, compressed sensing algorithms can be applied to determine a_j, and the same procedure can be repeated for every other component, thus unveiling the power expansion of F. These powerful techniques have been applied to a wide range of problems in nonlinear dynamics, such as predicting the location in parameter space of catastrophic bifurcations, tipping points or attractors, and in the study of complex networked systems.
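For intuition, here is a minimal numpy sketch (a toy two-variable system, not the setting of [26]) that builds the monomial matrix G and checks that the velocities are indeed linear in the sparse coefficient vector a_j; any CS solver could then be used to recover a_j from a few rows of G.

import numpy as np
from itertools import product

N, m = 2, 3                                          # two variables, powers up to order 3
exponents = list(product(range(m + 1), repeat=N))    # (1 + m)**N exponent tuples

def g(x):
    # Row of all monomials x1**l1 * x2**l2 * ... for one state x
    return np.array([np.prod(x ** np.array(l)) for l in exponents])

def F(x):
    # Toy system: x1' = -x2,  x2' = x1 - x1**3 (each F_j is sparse in the monomial basis)
    return np.array([-x[1], x[0] - x[0] ** 3])

rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(20, N))            # sampled phase-space points
G = np.vstack([g(x) for x in states])                # measurement matrix, 20 x (1+m)**N
V = np.vstack([F(x) for x in states])                # observed velocities

a2 = np.zeros(len(exponents))                        # sparse coefficients of F_2
a2[exponents.index((1, 0))] = 1.0                    # monomial x1
a2[exponents.index((3, 0))] = -1.0                   # monomial x1**3
print(np.allclose(G @ a2, V[:, 1]))                  # True: v_j = G a_j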

Chapter 2

Compressive Sensing Theory

We will now formalize and expand on the concepts of Chapter 1. For most of the basic concepts appearing in this chapter we will follow the elegant mathematical treatment of Foucart and Rauhut [16].

2.1 Sparsity

The support of a vector x ∈ C^N is the index set of its non-zero components:

supp(x) := { j ∈ [1, ..., N] : x_j ≠ 0 }    (2.1)

The support obviously depends on the choice of basis. A basis vector e_j, for example, always has its own index j as the only element of its support. Given a basis (we will often omit this specification henceforth), a vector with at most k non-zero coefficients is said to be k-sparse in that basis iff

‖x‖_0 := |supp(x)| ≤ k

The notation ‖x‖_0 is sometimes referred to as the ℓ0 pseudo-norm, and we will use it to indicate the sparsity of a vector. This "norm" notation comes from the observation that ‖·‖_0 can be seen as the limit of the ‖·‖_p^p quasi-norm for p → 0. Figure 2.1 provides a visual demonstration of this fact for 2-dimensional ℓp-spheres.

FIGURE 2.1: ℓp unit spheres in two dimensions

Given a basis, we may want to study the set of k-sparse vectors of a given linear space V. We refer to this set as V_k:

V_k := {x ∈ V : ‖x‖_0 ≤ k}.

This is not a linear subspace of V, since the sum of two k-sparse vectors can have as many as 2k non-zero components - this happens when the supports of the vectors being summed are disjoint. V_k is a non-convex subset obtained as the union of linear subspaces - those spanned by collections of k basis vectors. We denote the subspace spanned by a specific index set S by V_S:

V_S := {x ∈ V : supp(x) ⊂ S}.

In a similar fashion, we define v_S to be the projection of a vector v onto the set V_S.

2.1.1 Compressibility

In practice, sparsity can be a strong constraint to impose, and we may prefer the weaker concept of compressibility. For instance, we may consider vectors that are nearly k-sparse, as measured by the ℓp error of their best k-term approximation,

σ(x)_{p,k} := inf_{z ∈ V_k} ‖x − z‖_p

The infimum is achieved by a k-sparse vector z ∈ C^N that shares with x its k largest absolute entries, and is zero otherwise. We may call x a compressible vector if the error of its best k-term approximation decays quickly (as a power law) in k.
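A small numpy sketch of this quantity (illustrative only): the best k-term approximation keeps the k largest-magnitude entries, and σ(x)_{p,k} is the ℓp norm of what is left.

import numpy as np

def best_k_term_error(x, k, p=2):
    # sigma(x)_{p,k}: l_p error of the best k-term approximation of x
    z = np.zeros_like(x)
    keep = np.argsort(np.abs(x))[-k:]      # indices of the k largest entries
    z[keep] = x[keep]
    return np.linalg.norm(x - z, ord=p)

# A compressible vector: power-law decay of the sorted coefficients
j = np.arange(1, 1001)
x = j ** -1.5
for k in (10, 50, 200):
    print(k, best_k_term_error(x, k))      # the error decays quickly with k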
Here are some examples of sparse and compressible vectors appearing in various scientific fields:

1. Many natural signals are approximately sparse in the frequency or wavelet bases (see Chapter 3).

2. In network theory, sparse matrices represent loosely coupled systems.

3. Vectors belonging to the unit sphere of R^N equipped with the ℓp norm, 0 < p < 1, are compressible - their sorted coefficients x_j decay with a power law in j.

4. In signal processing, errors can sometimes be modeled as sparse or compressible vectors - the difference between the original signal and the received one is non-zero only for a few faulty coefficients.

2.1.2 Dictionaries

The notion of sparsity is not limited to bases, but extends to dictionaries as well. A dictionary is a spanning set for a given linear space. Vectors belonging to a dictionary, called atoms, need not be independent - if they are, the dictionary is simply a basis. Dictionaries are thus usually redundant or overcomplete: they admit multiple decompositions for the same vector. Overcompleteness is a great sparsity enhancer, as a richer dictionary helps to build shorter sentences. For a trivial example, consider that the biggest possible dictionary, the linear space itself, is also the one providing the sparsest representations - it uses a single "word" for every single vector. Examples of overcomplete dictionaries are concatenations of orthonormal bases.

One may also ask, given a certain subset S of a linear space, whether there exists a dictionary (or a basis) that "sparsifies" its elements, that is, one that minimizes the maximum or cumulative sparsity of the vectors in S. This is, unfortunately, an NP-hard problem in general, though procedures exist to find sub-optimal substitutes for such dictionaries.

2.2 Conditions for Recovery

Now that we have defined sparse vectors, we will act on them with a linear operator in order to obtain a measurement vector. We will then be interested in how to uniquely recover the original sparse vector from few measurements. Let us consider again V, a linear space over the field C. Let B ∈ C^{N×N} be a basis change and P ∈ C^{M×N} a measurement matrix. We will be interested in solving the linear equation

y = Ax,    A = PB    (2.2)

For N > M, (2.2) is underdetermined. An underdetermined linear system is one with more degrees of freedom than constraints, that is, it has more variables than equations. Such a system has either infinitely many solutions or none at all; thus, its action on a vector cannot be inverted unambiguously.

2.2.1 ℓ0 minimization

In general, we are not interested in every solution of (2.2), but only in those belonging to V_k, the set of k-sparse vectors in the basis B. We state this requirement in the following optimization problem,

x = arg min_{z ∈ C^N} ‖z‖_0  subject to  y = Az,    (2.3)

so that a k-sparse vector is the unique minimizer of (2.3) if and only if it is also the unique k-sparse solution of y = Ax.

We now ask: for what choices of A, if any, is our intuition correct, so that we are able (for now, let us not concern ourselves with how) to uniquely recover a k-sparse signal x from M measurements, M < N? We may argue that the information content of V_k, or its "degrees of freedom", should roughly equal its sparsity - in some applications, a much smaller number than its full length. Thus, we may hope to recover an N-long, k-sparse vector in V_k with a number of measurements of the order of k, rather than N.
As an example, let us construct A from {e_{l_1}, ..., e_{l_m}}, m arbitrarily chosen columns of the identity matrix. A is then a simple projection that picks m out of N components and discards the others. When such a matrix is applied to a k-sparse vector, we will be able to recover that vector if and only if its support lies in the span of {e_{l_1}, ..., e_{l_m}}: any other component will be destroyed by the measurement process. In general we are interested in recovering sparse vectors of unknown support, so such a matrix will never have the desired recovery property.
Any matrix A that maps a linear space to a lower-dimensional linear space has a non-trivial null space, ker A ≠ {0}. From our last observation we now know that the null space of A cannot contain any basis vector b_i (and indeed, any k-sparse vector), for otherwise necessary information would be destroyed, and recovery would be impossible. We can refine this intuition in the following theorem, showing that not even 2k-sparse vectors should be allowed in the null space of A:

Theorem 2.2.1 The following two propositions are equivalent:

1. Every k-sparse vector z ∈ V_k is the unique k-sparse solution of Ax = Az.

2. The only 2k-sparse vector in the null space of A is 0: ker(A) ∩ V_{2k} = {0}.

Proof:
(2) → (1) Suppose that Ax = Az, with x, z ∈ V_k. Then A(x − z) = 0; but since v = x − z is 2k-sparse, and since there are no 2k-sparse vectors in ker(A) apart from 0, this means x = z.
(1) → (2) Consider v ∈ V_{2k} ∩ ker(A). As for any 2k-sparse vector, we can write v = x − z for some x, z ∈ V_k. Since v is in ker(A), we can write Ax = Az, and (1) then implies x = z, that is, v = 0.
Theorem 2.2.1 demonstrates that, in order to achieve recovery, every submatrix formed from 2k columns of A must be injective - thus providing a lower bound m ≥ 2k for the number of measurements needed to ensure reconstruction.
There exist algorithms that perfectly recover an unknown k-sparse vector from 2k exact measurements. However, in real world problems x is never exactly sparse, but at most compressible. If we also require the reconstruction scheme to be stable (i.e., to recover a compressible vector with an error bounded by its sparse approximation error), then the minimal number of required measurements involves an additional factor of ln(N), so that recovery will never be stable with only 2k measurements.

2.2.2 ℓ1 minimization

Given its NP-hardness, we should find an alternative approach to the optimization problem 2.3. We start by recalling that the ‖·‖_p^p quasi-norm approaches ‖·‖_0 as p → 0, so that we may approximate the first problem by

x = arg min_{z ∈ C^N} ‖z‖_q  subject to  y = Az,    (2.4)

and hope it is easier to solve than what we started with. For q > 1, even basis vectors are not solutions of 2.4. For 0 < q < 1, it is again a nonconvex problem, which is also NP-hard in general. q = 1, on the other hand, gives rise to the convex optimization problem

x = arg min_{z ∈ C^N} ‖z‖_1  subject to  Az = y    (2.5)

Problem 2.5 is called basis pursuit or ℓ1-minimization. A number of efficient algorithms exist to actually compute its minimizer. It can be shown that this minimizer is the same found by 2.3, provided A satisfies a simple property akin to the second proposition in 2.2.1.

Theorem 2.2.2 Given a matrix A ∈ C^{M×N} and a sparsity k, every k-sparse vector is the unique minimizer of the ℓ1 minimization problem 2.5 iff

‖v‖_1 < 2 σ(v)_{1,k}  for all v ∈ ker A, v ≠ 0    (null space property)

We recall that σ(v)_{1,k} is the ℓ1 error of the best k-term approximation of v, so that recovery is ensured if the vectors in ker(A) are bad approximations of k-sparse vectors. ℓ1 minimization, also known as basis pursuit, was quite important to the development of compressed sensing theory, as it was the object of study of Candès, Romberg and Tao [5] and Donoho [13] in their seminal works, and was later refined in a number of works such as [4][6][9][18][23]. In these works it is demonstrated how linear programming techniques can be used to efficiently recover sparse signals from few measurements.

2.2.3 Noisy recovery

We have established conditions for the recovery of exactly sparse vectors from perfectly known measurements. As already suggested, this is not a plausible situation in real-world problems, where we want our algorithms to be stable (against sparsity defects) and robust (against noisy measurements).
Problem 2.5 is easily extended to the noisy setting:

x = arg min_{z ∈ C^N} ‖z‖_1  subject to  ‖Az − y‖ < η    (2.6)

When cast in this form, it is called quadratically constrained basis pursuit or basis pursuit denoising. It bears a strong similarity to the famous LASSO optimization problem. The null space property (2.2.2) can also be refined to enforce exact recovery in this noisy case, but we will instead now define two useful tools for determining the recovery properties of a matrix: mutual coherence and the restricted isometry property.

2.2.4 Coherence

Mutual coherence is a measure of the correlations among the images of 1-sparse vectors through A.

Definition 2.2.2.1 For a matrix A ∈ C^{M×N} with ℓ2-normalized columns {a_i}, i = 1, ..., N, the mutual coherence, or simply coherence, µ(A) is defined as

µ(A) := max_{i<j} |⟨a_i, a_j⟩|

It follows from the Cauchy–Schwarz inequality, |⟨a_i, a_j⟩| ≤ ‖a_i‖_2 ‖a_j‖_2, that the coherence of a matrix with normalized columns never exceeds one, µ ≤ 1.

The coherence of a matrix A ∈ C^{M×N} also has a lower bound:

µ(A) ≥ sqrt( (N − M) / (M(N − 1)) )    (2.7)

Coherence was first analyzed in the context of signal processing, and then entered the field of sparse representation in [8]. In the compressed sensing setting we will seek incoherent matrices, that is, matrices with low coherence. Incoherence is a useful thing to have when solving an underdetermined problem: when two columns of the measurement matrix A are highly correlated, it is in general impossible to determine whether the energy of a measurement comes from one or the other.
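A direct numpy sketch of this quantity (illustrative): normalize the columns, form the Gram matrix, and read off the largest off-diagonal correlation.

import numpy as np

def mutual_coherence(A):
    # mu(A) = max_{i<j} |<a_i, a_j>| over l2-normalized columns a_i
    A = A / np.linalg.norm(A, axis=0)          # normalize each column
    G = np.abs(A.conj().T @ A)                 # matrix of pairwise correlations
    np.fill_diagonal(G, 0.0)                   # ignore <a_i, a_i> = 1
    return G.max()

rng = np.random.default_rng(0)
M, N = 60, 300
A = rng.standard_normal((M, N))
welch = np.sqrt((N - M) / (M * (N - 1)))       # lower bound (2.7)
print(mutual_coherence(A), welch)              # random matrices are fairly incoherent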

2.2.5 Restricted Isometry Constant

A finer indicator of the recovery properties of a matrix is the Restricted Isometry Constant (RIC) [7].

Definition 2.2.2.2 The restricted isometry constant of order k for a matrix A ∈ C^{M×N} is the smallest δ_k ≥ 0 such that

(1 − δ_k)‖x‖_2² ≤ ‖Ax‖_2² ≤ (1 + δ_k)‖x‖_2²    (RIC)

for every k-sparse vector x.

A matrix with small δ_k acts roughly as an isometry on k-sparse vectors. The RIC is also related to coherence: it can be shown that

δ_k ≤ (k − 1)µ,

so a small coherence keeps the RIC under control. Matrices with small enough RICs guarantee perfect or error-bounded recovery for a number of reconstruction strategies, as we will see in section 2.3. Before that, we will briefly consider the problem of finding such matrices.
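Computing δ_k exactly is intractable, but one can probe it empirically. The sketch below (an illustrative Monte Carlo probe, giving only a lower bound on the true constant) measures how far ‖Ax‖_2²/‖x‖_2² strays from 1 over random k-sparse vectors.

import numpy as np

def empirical_ric(A, k, trials=2000, rng=None):
    # Monte Carlo lower bound on delta_k: worst deviation of ||Ax||^2 / ||x||^2 from 1
    # over random k-sparse x (the true constant maximizes over *all* k-sparse x)
    rng = rng or np.random.default_rng()
    N = A.shape[1]
    worst = 0.0
    for _ in range(trials):
        x = np.zeros(N)
        support = rng.choice(N, k, replace=False)
        x[support] = rng.standard_normal(k)
        ratio = np.linalg.norm(A @ x) ** 2 / np.linalg.norm(x) ** 2
        worst = max(worst, abs(ratio - 1.0))
    return worst

rng = np.random.default_rng(0)
M, N, k = 60, 300, 10
A = rng.standard_normal((M, N)) / np.sqrt(M)   # columns have unit norm in expectation
print(empirical_ric(A, k, rng=rng))            # typically well below 1 for M around 6k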

2.2.6 Recovery with random matrices

One of the most fundamental (and useful) results in compressed sensing theory is that a random projection of a compressible, high-dimensional signal onto a lower-dimensional space will almost certainly contain enough information to enable signal reconstruction. This usefulness comes from the fact that no efficient procedure exists that deterministically produces matrices with small coherence or RIC. Yet a number of results, such as [6], showed that a random Gaussian or Bernoulli matrix A ∈ C^{M×N} will with high probability have a RIC small enough for the recovery of k-sparse vectors if

M > Ck ln(N/k),    (2.8)

where C is a constant that depends on the particular recovery strategy, but not on N or k. Similar results hold for the wider class of subgaussian random matrices. Tao also argues that this bound is in a certain sense optimal, that is, non-adaptive strategies cannot reconstruct k-sparse vectors with a number of measurements that grows slower than k log(N). Numerical experiments in [6] showed that the prescription

M > 4k    (2.9)

permits reconstruction in almost all practical applications, with the factor growing beyond 4 for sub-optimal measurement processes.

2.3 Algorithms

We have already explored the possibilities of basis pursuit, even if no specific algorithm was presented. We will now move on to different methods, better suited to the analysis of big datasets. After briefly considering Orthogonal Matching Pursuit as an example of a greedy algorithm, we will concern ourselves with Iterative Hard Thresholding, our algorithm of choice for the Super Resolution problem.

2.3.1 Orthogonal Matching Pursuit

The Orthogonal Matching Pursuit (OMP) algorithm [21][25] iteratively enlarges the target support S_n, and then updates the target vector x_n as the vector supported on S_n that minimizes the measurement error. Here it is:

Algorithm 2 Orthogonal Matching Pursuit

input: A, y
start: S_0 = ∅, x_0 = 0
repeat
  j_{n+1} = arg max_j |A∗(y − Ax_n)|_j    (OMP1)
  S_{n+1} = S_n ∪ {j_{n+1}}    (OMP2)
  x_{n+1} = arg min_{z ∈ V_{S_{n+1}}} ‖Az − y‖_2    (OMP3)
until |S_n| = k or ‖Ax_n − y‖_2 < ε

The rule used by OMP to select which index to add to the support is dictated by a greedy strategy that aims to minimize at each step the residual norm ‖Az − y‖_2; we now give a result, due to [ref], suggesting why adding to the support an index j maximizing |A∗(y − Ax_n)|_j forces a decrease in the ℓ2-norm of the residual.

Theorem 2.3.1 Consider an ℓ2-normalized matrix A ∈ C^{M×N}, an index set S, an index j outside S, and a vector v ∈ V_S. If

x := arg min_{z ∈ V_{S ∪ {j}}} ‖y − Az‖_2

then

‖y − Ax‖_2² ≤ ‖y − Av‖_2² − |(A∗(y − Av))_j|²

It is shown in [25] that if a matrix A ∈ C^{M×N} satisfies

δ_{k+1}(A) < 1 / (1 + 2√k)    (2.10)

then Algorithm 2 exactly recovers an arbitrary k-sparse signal (in k steps) from measurements through A. In the same paper, reasonable error bounds are obtained in the case of inaccurate measurements.
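A compact numpy sketch of the OMP iteration (illustrative; step OMP3 is realized as a least-squares fit on the current support):

import numpy as np

def omp(A, y, k, tol=1e-10):
    # Orthogonal Matching Pursuit: greedily grow the support, then refit by least squares
    N = A.shape[1]
    support, x = [], np.zeros(N)
    residual = y.copy()
    while len(support) < k and np.linalg.norm(residual) > tol:
        j = int(np.argmax(np.abs(A.conj().T @ residual)))   # index most correlated with the residual
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)  # best fit on the support
        x = np.zeros(N)
        x[support] = coef
        residual = y - A @ x
    return x

# Toy usage on a random compressive sensing instance
rng = np.random.default_rng(0)
M, N, k = 60, 300, 10
A = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x_hat = omp(A, A @ x_true, k)
print(np.linalg.norm(x_hat - x_true))    # typically ~1e-14: exact recovery in k steps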

2.3.2 Iterative Hard Thresholding

The next algorithm we consider, Iterative Hard Thresholding [3][15][20], makes use of the same quantity studied in theorem (2.3.1) - A∗(y − Av) - but in a slightly different manner.

Algorithm 3 Iterative Hard Thresholding

input: measurement matrix A, measurement vector y, desired sparsity k
start: x_0 = 0
repeat
  z_{n+1} = x_n + A∗(y − Ax_n)    (IHT1)
  x_{n+1} = H_k(z_{n+1})    (IHT2)
until stop criterion for the residual is met

Here, H_k is the non-linear operator that sets to zero all but the k largest (in magnitude) coefficients of a vector. If there is no unique such set, a set can be selected either randomly or based on a predefined ordering of the elements. The stopping criterion can involve the attainment of a fixed number of iterations, a fixed ℓ2 difference between consecutive iterates of x, or a fixed ℓ2 norm of the residual.
The motivation behind IHT is this: instead of solving Ax = y, we solve the square system

A∗Ax = A∗y

Such a system can be recast as the fixed-point equation

x = (Id − A∗A)x + A∗y.    (2.11)

Classical iterative methods then suggest the fixed-point iteration x_{n+1} = (Id − A∗A)x_n + A∗y, which is equivalent to step IHT1.
To gain some more intuition on this first step, note that the quantity A∗(y − Ax) is, in fact, proportional to the gradient (with respect to x) of ‖Ax − y‖_2². The second step, IHT2, which only requires the application of the thresholding operator H_k, is a simple way to enforce k-sparsity at each iteration. We can look at this step as a projection onto the non-convex set of k-sparse vectors. Thus, IHT is a form of gradient descent that regularizes the current state at each step by forcing k-sparsity on it.

The convergence of Algorithm 3 was studied in [3], where it was shown that if

δ_{3k}(A) < 1/√32    (2.12)

the algorithm almost certainly converges to the right solution.

IHT is a conceptually and computationally very simple algorithm. Its computational bottleneck is due to the operators A and A∗: if they are general matrices, evaluating their action on a vector requires O(M × N) operations per step. Fortunately, for large problems we can use structured operators, such as the fast Fourier transform, that substantially reduce the number of operations. For our simulations we will use a slightly modified version of IHT, the normalized IHT, as it provides some additional guarantees, along with faster results.

2.3.3 NIHT

Numerous variants of thresholding algorithms have been proposed in the literature, especially in the field of image processing, and denoising in particular.
The most relevant to this thesis, as it will be employed in the numerical simulations of Chapter 6, is normalized iterative hard thresholding (NIHT), first defined in [2]; there, it is noted that IHT as defined in the previous section suffers from a relevant flaw - it is sensitive to rescaling of the matrix A. The usual remedy is to normalize the columns of A, but it is shown that the algorithm's performance in this setup falls short of the near-ideal bounds achievable with other strategies such as ℓ1-minimization. To overcome this weakness, NIHT autonomously adjusts the step size of the gradient descent in Algorithm 3, accelerating it and ensuring convergence even when the columns of the matrix A are not normalized. That is, it substitutes step IHT1 with

z_{n+1} = x_n + α_n A∗(y − Ax_n)    (NIHT1)

The same authors propose a criterion for choosing α_{n+1}:

α_{n+1} < ‖x_{n+1} − x_n‖_2² / ‖A(x_{n+1} − x_n)‖_2²    (2.13)

The procedure requires checking whether α_n stays under the desired value and, if not, resizing it appropriately and repeating the step. NIHT has the great advantage of always converging to a local minimum, irrespective of the isometry constant and of the scaling of the matrix A. Furthermore, if the bound in (2.12) holds, this minimum is guaranteed to be (within an error-bounded distance from) the solution of the ℓ0 minimization problem.
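As an illustration of this step-size rule, here is a simplified numpy sketch of one NIHT iteration (not the exact procedure of [2]): the step is first set by a quotient on the current support, then shrunk until the acceptance condition (2.13) holds or the support is unchanged.

import numpy as np

def hard_threshold(z, k):
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-k:]
    out[idx] = z[idx]
    return out

def niht_step(A, y, x, k, shrink=0.5):
    # One normalized-IHT iteration: adaptive step alpha, checked against (2.13)
    g = A.T @ (y - A @ x)                        # negative gradient direction
    support = np.flatnonzero(x) if np.any(x) else np.argsort(np.abs(g))[-k:]
    gs = np.zeros_like(g)
    gs[support] = g[support]
    alpha = gs @ gs / (np.linalg.norm(A @ gs) ** 2 + 1e-30)   # step size on the current support
    while True:
        x_new = hard_threshold(x + alpha * g, k)
        d = x_new - x
        # accept if the support did not change, or if the bound (2.13) is satisfied
        if set(np.flatnonzero(x_new)) == set(support) or \
           alpha <= (d @ d) / (np.linalg.norm(A @ d) ** 2 + 1e-30):
            return x_new
        alpha *= shrink                          # otherwise shrink alpha and retry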

2.3.4 Other thresholding algorithms

Another possibility is to use the operator H_θ, defined for θ > 0 as

H_θ(x)_j = 0 if |x_j| < θ;    H_θ(x)_j = x_j if |x_j| ≥ θ

H_θ nullifies every coefficient below a certain threshold θ.

As another example, we consider the iterative soft thresholding algorithm; it makes use of the soft thresholding operator S_θ,

S_θ(x)_j = 0 if |x_j| < θ;    S_θ(x)_j = sign(x_j)(|x_j| − θ) if |x_j| ≥ θ
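Both operators are one-liners in numpy (illustrative sketch):

import numpy as np

def hard_threshold_theta(x, theta):
    # H_theta: zero out every coefficient below the threshold, keep the rest unchanged
    return np.where(np.abs(x) < theta, 0.0, x)

def soft_threshold(x, theta):
    # S_theta: shrink every surviving coefficient toward zero by theta
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

x = np.array([0.2, -1.5, 0.7, -0.1, 3.0])
print(hard_threshold_theta(x, 0.5))   # [ 0.  -1.5  0.7  0.   3. ]
print(soft_threshold(x, 0.5))         # [ 0.  -1.   0.2  0.   2.5]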

2.4 A toy problem

In this section, we provide a brief demonstration of the compressive sensing framework by retrieving an N-long signal that is the sum of a few sinusoids from a small number M of measurements, comparable with its sparsity k:

M ≈ k ≪ N.

Let us consider a signal x ∈ C^N, fig. 2.2. The signal is k-sparse in the discrete cosine basis if its decomposition

FIGURE 2.2: A signal x in the canonical basis. N = 300

x_i = Σ_ω a_ω cos(ω · i)    (2.14)

has at most k nonzero coefficients {a_1, ..., a_k}; see fig. 2.3.

FIGURE 2.3: The signal x of fig. 2.2 is 10-sparse when represented in the cosine basis. N = 300, k = 10

We acquire M random measurements of the signal, M ≪ N; fig. 2.4.

We now use Iterative Hard Thresholding to recover the original signal x from the measurement vector y. The last iteration of the process is displayed in fig. 2.5, showing that the signal has been faithfully reconstructed.

FIGURE 2.4: The measured vector y. N = 300, M = 60

Figures 2.6 and 2.7 show how the algorithm performs for different parameter settings, by measuring the mean square error (MSE) and the reconstruction accuracy against the number of measurements M for random k-sparse, N-long signals.
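The whole toy experiment fits in a few lines of numpy/scipy. The following is an illustrative re-creation (not the thesis code): build a signal that is k-sparse in the DCT basis, keep M random samples of it, and run plain IHT on the subsampled operator.

import numpy as np
from scipy.fft import dct, idct

rng = np.random.default_rng(1)
N, M, k = 300, 60, 10

# k-sparse coefficient vector in the DCT domain -> the signal is a sum of a few cosines
w_true = np.zeros(N)
w_true[rng.choice(N, k, replace=False)] = rng.standard_normal(k)
x = idct(w_true, norm="ortho")

rows = rng.choice(N, M, replace=False)   # M randomly chosen samples of x
y = x[rows]

def A_fw(w):                             # A w = P (B w): synthesize the signal, keep M samples
    return idct(w, norm="ortho")[rows]

def A_adj(r):                            # A* r = B* (P* r): zero-fill, analyze with the DCT
    full = np.zeros(N)
    full[rows] = r
    return dct(full, norm="ortho")

def hard_threshold(z, k):
    out = np.zeros_like(z)
    idx = np.argsort(np.abs(z))[-k:]
    out[idx] = z[idx]
    return out

w = np.zeros(N)
for _ in range(500):                     # plain IHT; step 1 is safe since A has spectral norm <= 1
    w = hard_threshold(w + A_adj(y - A_fw(w)), k)

print(np.max(np.abs(w - w_true)))        # typically negligible: faithful recovery (cf. figs. 2.6-2.7)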
FIGURE 2.5: The recovered coefficient vector w. N = 300, k = 10, e < 10^−14

FIGURE 2.6: Measured MSE vs relative number of measurements. Note that the error drops to zero around M = 4k. (N = 300, k = 10)

FIGURE 2.7: Percentage of perfect reconstructions vs relative number of measurements. Note that accuracy approaches one around M = 4k. (N = 300, k = 10)

Chapter 3

Digital Imaging

Digital images are a class of discrete signals. A black and white digital image is represented as a function x : D → [0, 1] from a discrete, two dimensional array D of equally spaced points called pixels, for example {1, 2, ..., N}², into the continuous range [0, 1]. In such images, x[i, j] = 0 indicates a black pixel at position (i, j), x[i, j] = 1 indicates a white pixel, and all values in between represent shades of gray.

FIGURE 3.1: A 250 · 250 black and white digital image

In many situations, it is convenient to treat an L-wide image as a simple vector, x ∈ [0, 1]^{L×L}, relegating the task of keeping track of the 2D structure of the image to the basis indices. Most digital images are obtained from a continuous signal f(t) by sampling - a linear approximation that "discretizes" the signal, introducing an error dependent on the sampling rate. This is why we will sometimes talk, for example, about the smoothness of a discrete signal - we will be referring implicitly to its analog counterpart. It should be clear from context which object is being considered, and we will make the relationship between the two more precise when talking about sampling theory in section 3.2.3.

3.1 Representing digital images

Digital images, as vectors in a linear space, have quite a special basis, the pixel one. Pixel indices keep track of the 2D geometric structure of the image, and pixel coefficients are easily interpreted as the light intensity emitted by the corresponding geometric points. Working in the pixel basis makes digital images very intuitive to display. Yet, it is obviously possible to represent images in other bases, such as the Fourier one, in order to highlight other useful characteristics or to further facilitate storage.
Some of these bases have been developed precisely for the study of natural signals and images. Natural is here a fuzzy label which refers to structured characteristics usually found in real world images. How exactly we formalize this structure is going to determine which basis is best suited for our purpose.

3.1.1 Discrete Fourier Transform

One of the most relevant bases in signal processing is the Fourier one. We start by recalling the classic Fourier transform of f ∈ L²(R),

f̂(ω) = ∫_{−∞}^{+∞} f(t) e^{−iωt} dt.    (3.1)

The discrete counterpart of (3.1) is a Fourier series,

f̂(ω) = Σ_{n=−∞}^{+∞} f[n] e^{−iωn}.    (3.2)

Fourier series behave essentially as a Fourier transform, since they are particular instances of the latter, acting on Dirac sums f(t) = Σ_{n=−∞}^{+∞} f[n] δ(t − n).

Since digital signals are not only discrete, but also finite, we should further modify (3.2) to account for functions f[n] defined only on a finite domain, i.e. 0 ≤ n < N.

The standard procedure is to prolong f[n] outside its domain in a periodic (or "circular") way, that is,

f[n] = f[n + j · N],  ∀j ∈ Z    (3.3)

We will see in a few pages, when discussing the discrete cosine transform, that this is just one of many ways of extending f outside its domain.
We now apply 3.2 to a function satisfying 3.3. To stress the finiteness of the signal, we shall use the vector form x_k instead of the function x[k].

Definition 3.1.0.1 The discrete Fourier transform (DFT) of a signal x ∈ C^N is defined as the signal x̂ ∈ C^N such that

(Fx)_k = x̂[k] = ⟨x, f_k⟩ = Σ_{n=0}^{N−1} x[n] · e^{−2πi·nk/N}.    (DFT)

Considered as a function of k, the transformed signal can be defined even outside the interval {0, ..., N − 1}, and is again periodic with period N, so that the DFT maps periodic signals into periodic signals.
We have introduced the matrix F. Its rows are the Fourier basis vectors,

f_k(n) = e^{2πi·kn/N},  0 ≤ k < N    (3.4)

and they form an orthogonal basis of C^N, so that F is an orthogonal matrix.


For two dimensional signals, like images, the 2D discrete Fourier transform is simply obtained by applying the transform twice, once per dimension,

x̂[k, l] = ⟨x, f_{k,l}⟩ = Σ_{n_1=0}^{N_1−1} Σ_{n_2=0}^{N_2−1} x[n_1, n_2] · e^{−2πi·n_1 k/N_1} · e^{−2πi·n_2 l/N_2},    (3.5)

and is again an orthogonal basis change in C^{N_1 N_2}.


A quality of many real world images is their smoothness - they vary slowly in space, with large patches of the same or similar intensity. For this very reason, we expect their Fourier coefficients to decay rapidly, so that they can be well approximated by a few low-frequency components: the more regular x(t), the faster the decay of the sinusoidal wave amplitude |x̂(k)| as the frequency increases. For example, it is a well known result in Fourier analysis that

Theorem 3.1.1 If there exist two constants K and ε such that

|x̂(ω)| < K / (1 + |ω|^{p+1+ε}),  then x ∈ C^p.    (3.6)

The coefficients of such signals decay with a power law and are thus compressible.

FIGURE 3.2: A digital image (left); its DCT coefficients, sorted by magnitude (right)

3.1.2 Discrete Cosine Transform

To obtain the DFT from the regular Fourier transform, one implicitly extends the function of interest periodically. But it is worth noting that other extensions give rise to different basis changes. The Discrete Cosine Transform (DCT) is related to the Fourier series coefficients of a 2N-periodic and symmetrically extended sequence. Four slightly different types of DCT exist, related to different ways of realizing the symmetric extension.
Type-II DCT is both the most intuitive and the most useful.

Definition 3.1.1.1 The Type-II Discrete Cosine Transform (DCT) of a finite signal x[n] ∈ C^N is defined as

x̂[k] = ⟨x, f_k⟩ = Σ_{n=0}^{N−1} x[n] cos[ (π/N)(n + 1/2)k ],  k = 0, ..., N − 1    (3.7)

FIGURE 3.3: Four different periodic and symmetric extensions give rise to four different types of DCT

The 2D DCT is obtained in a similar fashion to the 2D DFT - by applying the transform twice, once for each dimension. When treating real valued functions, the DCT has the advantage of keeping them real through the transform. It is often used in signal and image processing, especially for lossy compression as in MP3 and JPEG. The reason for this popularity is its strong "energy compaction" property - in typical applications, most of the signal information tends to be concentrated in a few low-frequency components. This is usually attributed to its well-behaved boundaries - a symmetric extension tends to be "smoother" than a periodic extension, as figure 3.3 suggests.
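The energy compaction property is easy to check numerically. The sketch below is illustrative (a synthetic smooth array stands in for the image; any real grayscale photo works even better): keep only the largest 1% of 2D DCT coefficients and measure the reconstruction error.

import numpy as np
from scipy.fft import dctn, idctn

rng = np.random.default_rng(0)
xx, yy = np.meshgrid(np.linspace(0, 1, 256), np.linspace(0, 1, 256))
img = np.sin(3 * xx) * np.cos(2 * yy) + 0.01 * rng.standard_normal((256, 256))

c = dctn(img, norm="ortho")                    # 2D Type-II DCT
flat = np.abs(c).ravel()
k = flat.size // 100                           # keep only the largest 1% of coefficients
thresh = np.sort(flat)[-k]
c_k = np.where(np.abs(c) >= thresh, c, 0.0)
approx = idctn(c_k, norm="ortho")

rel_err = np.linalg.norm(approx - img) / np.linalg.norm(img)
print(rel_err)   # small: most of the energy sits in a few low-frequency coefficients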

3.1.3 Wavelets

When transient phenomena are present, the Fourier transform is not adequate, as it requires many coefficients to represent localized events. The phrase "a zone of high frequency" makes no sense in the Fourier domain, but in a lot of applications we would like to talk about such events. In order to capture transient structures in the signal, other transforms are required that decompose signals over elementary functions with narrow (but never perfect, for uncertainty principles get in the way) localization in both time and frequency.
A very general way of doing so requires the use of wavelets. A wavelet dictionary is defined starting from a mother wavelet function w ∈ L²(R), usually normalized to unity and centered on zero. A dictionary of time-frequency atoms (or child wavelets) is then obtained by scaling w by s ∈ R and translating it by u ∈ R:

φ_{s,u}(t) = K(s) · w((t − u)/s)    (3.8)

where K(s) is a normalization factor. Depending on w, it may be possible to construct an orthogonal basis of L²(R) from such a collection of wavelets. Dictionaries of discrete and finite wavelets can be obtained by appropriate substitution of u and s with discrete parameters, and again, some of them provide orthogonal bases for the relevant R^N.
The first and most widely known such wavelet was proposed by Haar. Its mother wavelet "sees" the differences between adjacent zones of the signal:

φ(t) = 1 for 0 ≤ t < 1/2,  φ(t) = −1 for 1/2 ≤ t < 1,  φ(t) = 0 otherwise

This mother wavelet is then translated and rescaled, giving the pairwise orthogonal Haar functions

φ_{k,l}(t) = 2^{k/2} φ(2^k t − l),  k, l ∈ Z.    (3.10)

The collection {φ_{k,l}}_{k,l ∈ Z} is an orthonormal basis of L²(R). As opposed to a Fourier basis, a wavelet basis defines a sparse representation of piecewise regular signals, which may include transients and singularities. In images, large wavelet coefficients are located in the neighborhood of edges and irregular textures.

FIGURE 3.4: An image (left); its Haar coefficients arranged in decreasing order (right)

In Chapter 6 we will perform simulations using the collection of compactly supported and orthonormal Daubechies wavelets, defined in [11] and since then a classic in image processing. They are slightly less localized than Haar wavelets, and thus more incoherent with point sampling.
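For intuition, here is a minimal single-level 2D Haar decomposition in numpy (an illustrative sketch; the Chapter 6 experiments use the deeper Daubechies db8 transform instead). On a piecewise-constant image, only the detail coefficients straddling the edge are non-zero - a sparse representation.

import numpy as np

def haar2d_level(img):
    # One level of the 2D Haar transform: approximation + 3 detail sub-bands
    a = (img[0::2, :] + img[1::2, :]) / np.sqrt(2)   # row averages
    d = (img[0::2, :] - img[1::2, :]) / np.sqrt(2)   # row differences
    ll = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)      # approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)      # horizontal details
    hl = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2)      # vertical details
    hh = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2)      # diagonal details
    return ll, lh, hl, hh

img = np.zeros((64, 64))
img[:, 33:] = 1.0                                    # flat image with one vertical edge
ll, lh, hl, hh = haar2d_level(img)
print(np.count_nonzero(np.abs(lh) > 1e-9), lh.size)  # only the coefficients at the edge survive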

3.2 Manipulating Digital Images

There are a number of transformations that happen, sometimes against our will, to digital images. Among the most relevant to this thesis are filtering, downsampling and affine transformations.

3.2.1 Filtering

A great number of standard signal-processing operations are implemented through linear and time invariant operators L : L²(R) → L²(R); these can be implemented through convolution with a filter h_L, that is,

L(f) = f ∗ h_L := ∫_{−∞}^{+∞} f(u) · h_L(t − u) du    (3.11)

The well known property of Fourier functions,

F(h ∗ g) = F(h) · F(g),    (3.12)

states that convolution in the spatial domain is equivalent to multiplication in the frequency domain. These two equations taken together imply that Fourier functions are the eigenfunctions of the convolution operator. This is the reason why filtering is often performed in the Fourier domain, where it is diagonal.
For discrete signals, the integral in 3.11 is simply discretized into an infinite sum. But for discrete and finite signals, some assumptions must be made, in a fashion similar to those of section 3.1.1. Circular convolution is defined by extending the signals of interest, say x, y : {1, ..., N} → C, in a periodic fashion outside their domain, and then computing their discrete convolution:

(x ∗ y)[n] := Σ_{i=0}^{N−1} x[i] · y[n − i]    (3.13)

Note that the discrete Fourier functions are the eigenfunctions of circular convolution.
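A quick numerical check of this fact with numpy (illustrative): circular convolution computed directly and through the DFT coincide.

import numpy as np

def circular_convolution(x, y):
    # Direct evaluation of (3.13) with periodic index wrap-around
    N = len(x)
    return np.array([sum(x[i] * y[(n - i) % N] for i in range(N)) for n in range(N)])

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
y = rng.standard_normal(16)

direct = circular_convolution(x, y)
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)).real   # convolution theorem (3.12)
print(np.allclose(direct, via_fft))                         # True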

3.2.2 Blurring and Deblurring

As an example of a linear filter, let us consider image degradation due to optical blur; in typical applications this is locally modeled with a Gaussian Point Spread Function (PSF),

g_σ(x) = (1 / (√(2π) σ)) exp{ −x² / (2σ²) }    (3.14)

Since Gaussian functions maintain their shape when transformed to the Fourier domain, applying a Gaussian blur has the effect of reducing the magnitude of the image's high-frequency components; a Gaussian blur is thus a low pass filter.

FIGURE 3.5: A digital image (left); its Gaussian-blurred version, σ = 3 (right)

It is a natural request to try to invert the action of a filter on an image, that is, to recover the unfiltered image from its filtered version. This task, which takes the name of deconvolution, is possible in theory when no frequency is totally annihilated by the filter - it suffices to multiply each frequency back by the inverse of the corresponding filter coefficient. In practice, deconvolution is extremely prone to error magnification - when frequencies that have been damped by the filter are amplified back, errors get amplified too.
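The sketch below illustrates this error magnification with a circular Gaussian blur and naive inverse filtering; the image, noise level and σ are illustrative assumptions.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    img = np.random.rand(128, 128)                          # stand-in image
    blurred = gaussian_filter(img, sigma=3, mode='wrap')    # circular Gaussian blur
    noisy = blurred + 1e-3 * np.random.randn(*img.shape)    # tiny measurement noise

    # Transfer function of the blur: blur an impulse and take its 2-D FFT.
    impulse = np.zeros_like(img); impulse[0, 0] = 1.0
    H = np.fft.fft2(gaussian_filter(impulse, sigma=3, mode='wrap'))

    # Naive deconvolution: divide every frequency by the filter coefficient.
    # Frequencies almost annihilated by the blur amplify the noise enormously.
    naive = np.real(np.fft.ifft2(np.fft.fft2(noisy) / H))
    print("relative error:", np.linalg.norm(naive - img) / np.linalg.norm(img))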

3.2.3 Sampling

If we want to store or perform computations on analog signals, we need to turn them into discrete objects. The most straightforward way is to store the value of the signal at equal intervals ∆, {f(j∆)}_{j∈ℤ}. We can describe this uniform sampling through multiplication by a train of Dirac deltas:

f_∆(t) = Σ_{j=−∞}^{+∞} f(j∆) δ(t − j∆).    (3.15)

f_∆(t) is now zero except on a discrete, ∆-spaced set of points, where it carries the values of f. Obviously, it is not possible in general to reconstruct perfectly and unambiguously an analog signal from its digital version, nor a high-resolution discrete signal from its low-resolution samples. One of the first results in sampling theory, due to Whittaker and rediscovered by Shannon, states the conditions for the recovery of f from its discrete version f_∆. Before showing the result, we note that the Fourier transform of equation 3.15 is:

f̂_∆(ω) = (1/∆) Σ_{k=−∞}^{+∞} f̂(ω − 2kπ/∆).    (3.16)

We are now ready to state the Shannon-Whittaker sampling theorem.

Theorem 3.2.1 If f̂ has support included in [−π/∆, π/∆], then f can be reconstructed from its sampled version f_∆ as

f(t) = Σ_{n=−∞}^{+∞} f(n∆) · sin(π(t − n∆)/∆) / (π(t − n∆)/∆).    (3.17)

Thus, an acquisition rate of at least 2/∆ is needed to perfectly determine a function whose highest frequency is 1/∆. Given an acquisition rate, the highest perfectly reconstructed frequency is usually called the Nyquist frequency. Conversely, given a frequency range, the lowest acquisition rate that ensures perfect reconstruction is called the Nyquist rate.
When sampling a signal containing frequencies higher than the Nyquist frequency, it would be nice to know that, at least on the frequencies covered by the Shannon sampling, the acquisition is faithful. Unfortunately, frequencies inside the sampling range are corrupted by those outside of it. This phenomenon is called aliasing, and it happens every time we try to acquire, without prior filtering, a low-resolution version of an image whose highest frequency exceeds the Nyquist frequency.
Aliasing can be removed by use of filters. An obvious strategy is to lowpass the
signal before sampling, so that it’s forced to fulfill Shannon’s requirements. This
strategy enforces recovery in the desired range of frequencies, at the cost of com-
pletely destroying any high-frequency information in the signal.

Most real-life acquisition devices don't satisfy the restrictive requirements of Shannon's theorem. But we will see in the next chapter that, provided some information is given on the signal, recovery may be possible at sampling rates much smaller than those required by Shannon sampling, and that even destroyed frequencies can sometimes be recovered.

3.2.4 Affine transforms

In addition to their simple vector structure, images also have a two-dimensional affine structure we may want to tamper with. In the context of image processing, we define affine transformations to act on the image domain preserving structural geometric elements such as lines and distance ratios. In the following chapters, we will use these transformations to model the small distortions that happen when a camera acquires images from slightly different points of view.
For a signal, intended as a function f : ℂ^N → ℂ, a general affine transform T looks like this:

T_{M,b} f(x) = f(Mx + b)

with the linear operator M and the vector b being the parameters of the transform. The simplest non-trivial 2D affine transformation is a translation, T_{x₀,y₀}:

T_{x₀,y₀} f(x, y) = f(x − x₀, y − y₀)

All 2D affine transformations are indexed by 6 parameters: 4 for the matrix M and 2 for the translation vector b. They can always be described as a combination of translation, shearing, scaling and rotation, though not all of these are independent transformations.
When performing affine transformations on discrete and finite grids, a number of complications arise. First, even a simple translation uses points outside the domain of the function to determine points inside it. Again, a common strategy is to define the finite transform to be circular, and periodize the function outside its domain. Furthermore, except in trivial cases, like translations by an integer number of pixels, points of the original grid will fall off the new grid when transformed.

Some kind of interpolation will then be needed to determine the value of pixels on
the new grid.
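As a small illustration of these complications, the sketch below rotates a discrete image using circular boundary handling and bilinear interpolation; the library call and parameters are illustrative assumptions, not part of the pipeline used later.

    import numpy as np
    from scipy.ndimage import affine_transform

    img = np.random.rand(64, 64)                    # stand-in for a digital image

    theta = np.deg2rad(5)                           # a small rotation
    M = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    centre = (np.array(img.shape) - 1) / 2
    offset = centre - M @ centre                    # rotate about the image centre

    # mode='wrap' periodizes the image outside its domain; order=1 is bilinear
    # interpolation for the points that fall off the original grid.
    warped = affine_transform(img, M, offset=offset, order=1, mode='wrap')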

Chapter 4

Super Resolution

Spatial resolution refers to the pixel density in an image and is measured in pixels per unit area. It is determined during acquisition by the sensor size, or the sensor density per unit area. The higher the sensor density, the higher the spatial resolution attainable by the imaging system. The need to acquire images at the highest possible resolution is the need to capture as much information as possible. In order to increase the spatial resolution, one can straightforwardly try to increase the sensor density by reducing the sensor size. However, as the sensor size decreases, the amount of light incident on each sensor also decreases, causing so-called shot noise. Also, the sensor size is limited by the optics, due to lens blur, aberration effects, diffraction and motion blurring.
A modern way to address this limitation is to post-process the captured images, realizing a trade-off between computational and hardware cost. These techniques are specifically referred to as Super-Resolution (SR) reconstruction. The aim of Super Resolution is to generate a high-resolution (HR) image from one or multiple low-resolution (LR) images of the same scene, as one can obtain from one camera with several captures or from multiple cameras located in different positions. SR image reconstruction becomes possible when such motions can be estimated with sub-pixel accuracy. What SR accomplishes, apart from enhanced visual quality, is the recovery of information that was lost or degraded during the acquisition process.

4.1 The forward acquisition model

In a quite general setting, m low-resolution images are obtained by affine-transforming, degrading and then downsampling the same high-resolution image. In this thesis we formalize the process as follows:

y_i = D H T_i x_HR + n_i,    i ∈ {1, ..., m},    (4.1)

where
y_i ∈ ℝ^M is the i-th LR measure, arranged as a vector in lexicographical order;
D ∈ ℝ^{M×N} is the downsampling matrix;
H ∈ ℝ^{N×N} is the degrading or filtering matrix;
T_i ∈ ℝ^{N×N} is the i-th affine transform - what makes every measure different;
x_HR ∈ [0, 1]^N is the original HR image, arranged as a vector in lexicographical order;
n_i ∈ ℝ^M is a noise term, usually i.i.d. white Gaussian noise.
The set of equations 4.1 can be rearranged in a single linear system,

    ⎡ y_1 ⎤   ⎡ D H T_1 ⎤
    ⎢  ⋮  ⎥ = ⎢    ⋮    ⎥ x_HR + n
    ⎣ y_m ⎦   ⎣ D H T_m ⎦

or equivalently

y = D̄ H̄ T̄ x_HR + n    (4.2)

with y ∈ ℝ^{mM} being the vector obtained as the concatenation of the y_i, and with the overline representing concatenation of matrices in the obvious way. We will refer to equation (4.2) as the forward acquisition model, or simply the forward model; inverting this equation and retrieving the original image x_HR is the goal of super-resolution.
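To make the forward model concrete, here is a minimal sketch of (4.2) under simplifying assumptions: the T_i are pure sub-pixel circular translations, H is a circular Gaussian blur, and D keeps one pixel out of every λ in each direction. All names and parameter values are hypothetical.

    import numpy as np
    from scipy.ndimage import gaussian_filter, shift

    def forward_model(x_hr, offsets, lam, sigma=1.0, noise_std=0.0):
        # y = concatenation of D H T_i x_HR + n_i for the given sub-pixel offsets.
        frames = []
        for o in offsets:
            t = shift(x_hr, o, order=1, mode='wrap')       # T_i: sub-pixel translation
            h = gaussian_filter(t, sigma, mode='wrap')     # H: optical blur
            d = h[::lam, ::lam]                            # D: downsampling by lambda
            frames.append(d.ravel() + noise_std * np.random.randn(d.size))
        return np.concatenate(frames)

    x_hr = np.random.rand(128, 128)                        # stand-in HR image
    y = forward_model(x_hr, offsets=[(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5)], lam=4)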
If from a square HR image of side l_HR we capture low-resolution images of side l_LR, we define λ := l_HR / l_LR to be the super-resolution factor. The bigger it is, the harder the super-resolution problem becomes, as we will have to guess more information. It does not keep track of the number of low-res measures we have access to, so it is not a totally faithful indicator of our knowledge. We define

ω := Mm / N = m / λ²    (4.3)

to be the fraction of total points covered by our measures; it still does not account for information that is redundant (such as points mapped by two different acquisitions) or corrupted (such as when the image is subject to optical blur). While SR algorithms are as varied as they come, virtually all of them face two fundamental issues in order to invert the forward model: registration and reconstruction.

4.2 Registration

Registration amounts to determining the transformations T_i (sometimes assumed to be pure translations) that link the HR image x_HR to each LR image y_i. In other words, registration is the task of transforming and superposing the LR images with sub-pixel accuracy. The images to be geometrically aligned can be taken at different times, from different viewpoints and/or by different sensors. In a video of a still scene, for example, registration of the various frames will require determining the spatial movements of the camera from the video itself.

4.3 Reconstruction

Reconstruction, on the other hand, consists in recovering the HR image from the LR sample(s), knowing or assuming the whole chain of transformations that created the latter from the former. With enough perfectly registered low-res measures, reconstruction would be a trivial task. Unfortunately, this is rarely the case - most of the time, we know the value of just a fraction of the HR pixels, and with accuracy very far from perfect. This is where prior information on the signal structure is put into action.
In this thesis, we are going to focus on the reconstruction step, and assume the registration parameters are given. Thus, our data will consist of a set of LR images

and of the affine transformations linking them to the same HR frame; our task will
be to reconstruct a single high-resolution image from those data.

4.3.1 The difficulties of reconstruction

In general, the forward model is a mapping from a linear space to another of lower dimension, that is, N > Mm, and it is impossible to recover without ambiguity the HR image given the LR ones - super-resolution with few measures is an ill-posed problem, where we aim to recover information that has been destroyed.

4.4 Super-resolution methods

A wide variety of approaches [19] has been developed since the birth of the SR discipline, each one representing a different way of guessing which HR image, among the many compatible with the LR measures, is the right one. These approaches, at least with regard to the reconstruction step, can be roughly divided into three broad categories: interpolation-based, learning-based and reconstruction-based.
Interpolation algorithms are the most straightforward approach to reconstruction: they try to recover the unknown value of a pixel from the values of its known neighbors. "Nearest-neighbor" interpolation is the simplest of these procedures: it simply assigns to an unknown pixel the value of the closest known one. It is well known [22], and not surprising, that these algorithms generally over-smooth the high-frequency details. Other approaches [10] perform their interpolation in domains other than the spatial one, such as the wavelet or the Fourier domain.
Learning-based (or example-based) super-resolution algorithms use a dictionary
generated from an image database [17] to invert the process of down-sampling in a
reasonable manner. They use known correlations to guess the unobservable parts
of an image from the visible ones. The effectiveness of learning-based algorithms
highly depends on the training image database.
Reconstruction-based super-resolution algorithms don’t use a training set but
rather define reasonable constraints for the target high-resolution image to improve
the quality of the reconstruction. This is usually achieved either by incorporating

priors in a Bayesian framework or by introducing regularizers into the ill-posed inverse problem. A number of priors and regularizations have already been proposed for the SR problem; these include Tikhonov regularization [27] and total variation [19] methods.
Our approach, which will be explained in detail in the next two chapters, belongs
to this last kind.

Chapter 5

Compressive Sensing for Super


Resolution

In this thesis we try to overcome the ill-posedness of the super-resolution problem by means of compressive sensing - namely, by assuming that the high-resolution image is sparse in some basis, and then solving the under-determined acquisition problem accordingly through our algorithm of choice - the normalised Iterative Hard Thresholding. We will now delve into the relatively simple task of restating the multiple-image SR problem as a CS problem.

5.1 The SR acquisition model revisited

We have seen how natural images, as opposed to general white noise, are sparse in a number of bases. We can use this property to build a forward acquisition model that transforms a sparse vector into a whole set of LR acquisitions.
We start by assuming that the high-resolution image x_HR ∈ [0, 1]^N is sparse in some known basis B ∈ ℝ^{N×N}, that is

x_HR = B w_HR,    ‖w_HR‖₀ < k;    (5.1)

In most situations, w_HR will be compressible rather than sparse, but we have seen that good compressive sensing algorithms are robust against sparsity defects. Following 4.2,

our measurement matrix will be defined by

y = D̄ H̄ T̄ x_HR = D̄ H̄ T̄ B w_HR := A_SR w_HR    (5.2)

so that our measurement matrix is now A_SR = D̄ H̄ T̄ B, and it maps the sparse coefficient vector w_HR ∈ ℝ^N into a concatenation of m low-res acquisitions, y ∈ ℝ^{Mm}.
Given A_SR and y, our task will be to recover w_HR, and thus x_HR, using the compressive sensing framework. Before applying any solving algorithm to this inverse problem, we should inquire whether A_SR permits recovery at all.

5.2 Recovery through A_SR

As surveyed in Chapter 2, not every matrix is a good CS matrix. In order to study the suitability of A_SR, as defined in the forward model (4.2), our principal guidance should be the reduction of its coherence. We will not check whether A_SR satisfies the RIP condition relevant to this problem - we can informally argue why it should be roughly an isometry on sparse vectors, but its suitability to the task is to be judged by its experimental results. Fortunately, we have some freedom in determining the shape and structure of A_SR.

5.2.1 Sparsifying Basis

The basis choice is the first and most important variable when establishing the coherence properties of A_SR. Intuitively, we can see that the better a basis is at representing localized features, the more coherent it will be with point sampling, because it can represent small spatial features (e.g., point samples) with only a few coefficients. Thus, we will benefit from a basis with enough localized features to infer unobserved pixels from nearby ones, but not so localized as to suffer from point sampling. Among the sparsifying bases for natural images, we selected two: DCT and wavelet.

5.2.2 Structurally Random Matrices

To further reduce the coherence of the sensing matrix, we employ a trick due to [12]: we randomize the sparse coefficient vector w_HR, that is, we substitute the matrix B with the alternative basis B̃ = BR, where R is a randomizing matrix. We randomize a target signal by either flipping its sample signs or uniformly permuting its sample locations. The resulting sensing matrix is called a structurally random matrix (SRM).
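Both randomizers can be applied matrix-free, as in the following sketch (the dimension and seed are placeholders):

    import numpy as np

    rng = np.random.default_rng(0)
    N = 10_000

    # Sign-flipping randomizer: R = diag(signs); it is its own adjoint and inverse.
    signs = rng.choice([-1.0, 1.0], size=N)
    R_sign     = lambda w: signs * w
    R_sign_adj = lambda v: signs * v

    # Permutation randomizer: R permutes coefficient locations;
    # its adjoint is the inverse permutation.
    perm = rng.permutation(N)
    inv_perm = np.argsort(perm)
    R_perm     = lambda w: w[perm]
    R_perm_adj = lambda v: v[inv_perm]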

5.2.3 The super-resolution factor

Another variable relevant to the task of SR is the choice of the target resolution l_HR for the HR image or, equivalently, of the super-resolution factor λ. The latter will greatly influence our algorithm: it determines the dimensions of the matrices B, T and D, with greater dimensions meaning greater computation time. We would like to set λ as big as possible (as we are searching for the highest possible resolution), but in practice there is only so much information we can extract from the measures y, even with the additional assumption of sparsity - so that choosing an unnecessarily high λ will result in unnecessarily high computation time and, possibly, in image artifacts.
The choice of λ is determined by two major factors: the length Mm = m·l²_LR of the measurement vector and the (presumed) sparsity k of the original HR image.
Through affine transformations, we can acquire points whose relative distance is smaller than the LR pixel spacing, thus recovering information on frequencies higher than the acquisition rate of a single low-resolution image. As long as we limit ourselves to translations and rotations, we can imagine x_HR being divided into squares of side λ: in this setting, each transformation amounts to choosing (ignoring filtering) one pixel in each square for the downsampling matrix to measure. Translations, for example, always choose the same pixel in each square, while for rotations the chosen pixel varies.
On average, m out of every λ² points will be known. This number is to be reduced by the occasional redundancy of the acquisition, e.g. when two LR measures map the same point of the original HR image. These considerations guide our choice of λ. We know from Chapter 2 that, in order to recover a k-sparse signal, the

dimension Mm of a random acquisition should be approximately 4k. If we define K = k/N, we will need Mm > 4NK measures. Since M = l²_LR = l²_HR/λ² = N/λ², we have

λ_max = √(m / (4K)).    (5.3)

This bound is the main guidance in performing compressive sensing super-resolution. The maximum achievable increase in resolution thus depends only on the number of measures and on the relative sparsity of the original image. In our problem, the acquisition matrix is far from purely random, so the optimal M for a desired λ was slightly higher, but comparable.
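As a quick numerical illustration of the bound (5.3) (the numbers below are hypothetical and not taken from the experiments):

    import numpy as np

    def lambda_max(m, K):
        # Upper bound (5.3): m low-res frames, relative sparsity K = k/N.
        return np.sqrt(m / (4 * K))

    # E.g. 8 LR frames of an image whose relative sparsity is about 2%:
    print(lambda_max(m=8, K=0.02))   # 10.0: at most a tenfold increase in side length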

5.2.4 Blur

While filtering surely reduces our information on high frequencies, encompassing it in the matrix A_SR helps in delocalizing the acquisition. This is an important step when working in a wavelet basis. As already discussed in Chapter 4, some form of regularization is to be applied in order to avoid error magnification.

5.3 Normalised Iterative Hard Thresholding for Super Resolution

Convinced (or hoping, depending on the setting) that A_SR permits recovery in general, we can choose a specific algorithm for the recovery task. The choice falls on the normalised Iterative Hard Thresholding for a variety of reasons, some of which have already been discussed in Chapter 2.
The first and most relevant reason for choosing NIHT is that it is fast compared to greedy methods and ℓ1 minimization, because it does not require any projection step. When working with vectors that are relatively big (N ≈ 10⁵) and not excessively sparse (s ≈ 10³), and with consequently big matrices, this is no little help. The second is that its implementation does not require explicit knowledge of the entries of A_SR or of its adjoint - we just need to be able to evaluate these operators on a generic vector of the appropriate dimension. The third and last one is that we know IHT

will eventually converge to a local minimum even if the sensing matrix coherence or
RIC fall outside the bounds of theorem 2.12.

5.3.1 Evaluating A and A∗

Using the Iterative Hard Thresholding requires us to compute the matrix A_SR and its transpose, A∗_SR. For reference, we now go into the details of how every part of A_SR = D̄ H̄ T̄ B and of A∗_SR is evaluated in practice.

1. B ∈ ℝ^{N×N}, basis change.

B transforms w, a sparse coefficient vector, into x, an HR image vector. We are fortunate that, for most Fourier-like bases, a computationally fast transform akin to the Fast Fourier Transform exists. For orthogonal bases, like the Discrete Cosine and Discrete Wavelet ones, the transpose equals the inverse, so that we can use the fast inverse transform to efficiently obtain B∗. The SRM trick of section 5.2.2 is implemented in this step through the substitutions B ← BR and B∗ ← R∗B∗. (A matrix-free sketch of all the operator pairs listed here is given after this list.)

2. M ∈ ℝ^{mN×N}, stacking matrix.

M is here just a reference to the stacking process, the one that concatenates the different acquisitions. It is composed of m independent copies of the identity matrix, so that it creates m copies x_i of x, ready to be transformed and downsampled. Its transpose, M∗, is easily evaluated: x = M∗[x_1 ... x_m]^T = Σ_{i=1}^m x_i. Apart from this matrix, each acquisition is processed independently. We will limit our discussion to the single matrices, it being understood that each acts on a single copy of x, and that the results are then simply concatenated.

3. T_i ∈ ℝ^{N×N}, affine transform.

This is straightforward as long as we work on infinitely fine grids, but in the discrete setting some kind of interpolation is necessary. This will introduce at worst (when using nearest-neighbor interpolation) a positional error of one pixel, a problem that grows as the super-resolution factor is reduced. If we restrict ourselves to (circular) translations and rotations, the T_i will be orthogonal, so that their transpose is simply the inverse transformation.

4. H, the filter, and D, the downsampling matrix.

These two operations actually happen together in a variety of settings, such as when the finite size of a camera sensor introduces an error in the measured LR image. Their transpose simply involves applying the transposed filter to the upsampled image. If the filter is a circular Gaussian, as in our case, it is also symmetric, so that the transposed filter coincides with the filter itself.
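The following sketch shows one possible matrix-free implementation of these operator pairs, assuming a DCT basis, pure sub-pixel circular translations as the T_i, and hypothetical registration offsets; the adjoint of the interpolated shift is only approximated by the opposite shift.

    import numpy as np
    from scipy.fft import dctn, idctn
    from scipy.ndimage import gaussian_filter, shift

    n, lam, m = 128, 4, 4                 # HR side, SR factor, number of LR frames
    offsets = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.5, 0.5)]   # assumed registration

    # B / B*: orthonormal 2-D DCT, so the adjoint equals the inverse transform.
    B     = lambda w: idctn(w.reshape(n, n), norm='ortho')
    B_adj = lambda x: dctn(x, norm='ortho').ravel()

    # T_i / T_i*: circular sub-pixel translations; the adjoint is approximated
    # by the opposite shift (exact only for integer shifts).
    T     = lambda x, o: shift(x, o, order=1, mode='wrap')
    T_adj = lambda x, o: shift(x, (-o[0], -o[1]), order=1, mode='wrap')

    # H / H*: circular Gaussian blur, symmetric and hence self-adjoint.
    H = H_adj = lambda x: gaussian_filter(x, sigma=1.0, mode='wrap')

    # D / D*: downsampling and its adjoint (zero-filled upsampling).
    D = lambda x: x[::lam, ::lam]
    def D_adj(y):
        up = np.zeros((n, n))
        up[::lam, ::lam] = y
        return up

    def A(w):
        # A_SR w: from sparse coefficients to the stacked LR measurements.
        x = B(w)
        return np.concatenate([D(H(T(x, o))).ravel() for o in offsets])

    def A_adj(y):
        # A_SR* y: back-project each LR frame and sum the copies (the M* step).
        side = n // lam
        frames = y.reshape(m, side, side)
        x = sum(T_adj(H_adj(D_adj(f)), o) for f, o in zip(frames, offsets))
        return B_adj(x)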

5.3.2 Applying the NIHT

Given: the acquisitions y_i and their corresponding affine transforms T_i.

1. Choose the sparsifying basis B and estimate the coefficient density K.

2. Apply the "SRM trick", that is, randomize B;

3. Evaluate the minimum viable λ (depends on K and on the number of acquisitions m);

4. Estimate the degrading filter H;

5. Build A_SR and A∗_SR (depend on B, T_i, H and λ);

6. Recover the coefficient estimate w through NIHT(A_SR, y);

7. Finally, recover an estimate of x_HR as Bw.
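A minimal sketch of step 6, with A and A_adj the matrix-free operators of the previous sketch; this simplified version keeps a fixed number of iterations and omits the step-size acceptance check of the full normalised IHT [2].

    import numpy as np

    def niht(A, A_adj, y, k, n_iter=300):
        # Simplified normalised IHT: w <- H_k( w + mu * A*(y - A w) ),
        # with the step size mu computed on the current support.
        w = np.zeros_like(A_adj(y))
        support = np.argsort(np.abs(A_adj(y)))[-k:]
        for _ in range(n_iter):
            g = A_adj(y - A(w))                       # gradient of 0.5*||y - Aw||^2
            gs = np.zeros_like(g); gs[support] = g[support]
            Ags = A(gs)
            mu = (gs @ gs) / max(Ags @ Ags, 1e-12)    # normalised step size
            w = w + mu * g
            support = np.argsort(np.abs(w))[-k:]      # hard thresholding: keep k largest
            mask = np.zeros(w.size, dtype=bool); mask[support] = True
            w[~mask] = 0.0
        return w

    # Steps 6 and 7: w_est = niht(A, A_adj, y, k=500); x_est = B(w_est)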



Chapter 6

Experimental results

We tested our algorithm on three kinds of images.

1. Synthetic images, where the LR dataset was obtained from known, random, artificially sparse images.

2. Natural images, where the LR dataset was obtained from known, real-life digital images.

3. Real-world images, where the LR dataset was all we had access to.

6.1 Synthetic images

Here we present some experimental results with synthetic datasets, i.e. those where the LR images are obtained through known transformations from a purposely created HR image - handcrafted to be k-sparse in the basis of choice.
For these images, we had the freedom to choose the original HR image dimension N, the sparsity k, the number m and dimension M of the LR acquisitions, and the affine transforms T_i.
LR data were obtained by generating a random k-sparse image in the selected basis and subsequently applying A_SR to it. After that, we used IHT to reconstruct the original coefficients from knowledge of the LR measures (and of A_SR) only. We finally compared the estimated HR image with the original, recording the percentage of perfectly reconstructed non-zero coefficients and the MSE between the original and the reconstructed image. Coefficients were considered "perfectly reconstructed" if the reconstructed value was close enough (δ < 10⁻⁴) to the original one.
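For reference, a minimal sketch of the two error metrics as used in these experiments (names are illustrative; B is the basis-change operator of Chapter 5):

    import numpy as np

    def reconstruction_metrics(w_true, w_est, B, delta=1e-4):
        # Fraction of non-zero coefficients recovered to within delta,
        # and MSE between the original and the reconstructed images.
        support = np.flatnonzero(w_true)
        perfect = np.mean(np.abs(w_true[support] - w_est[support]) < delta)
        mse = np.mean((B(w_true) - B(w_est)) ** 2)
        return perfect, mse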

We performed tests with both wavelet(db8)-sparse and DCT-sparse images, evaluating the two error metrics for a range of possible k, m, M, and N. In what follows, we show the most interesting results.
First, we evaluated the percentage of perfect reconstructions against growing relative sparsity, K = k/N, for a fixed number m and dimension M of the acquisitions, for the cosine and wavelet bases, figures 6.1 and 6.2.
For the DCT basis, as expected, as k grows the odds of a coefficient being reconstructed drop to zero. We can try to identify a region of perfect reconstruction up to 4k ≈ Mm; for example, the zone should end at K ≈ 0.15 for m = 10, and at K ≈ 0.21 for m = 14, which is consistent with the simulations even if, not surprisingly, a little too optimistic.
For the wavelet basis, we observe a much slower decay in accuracy, with worse performance at low sparsity but greater (even if still low) accuracy at high sparsity. This is due to the higher coherence of the wavelet basis with point sampling: if a measurement point falls in a zone highly correlated with a certain wavelet coefficient, that particular coefficient will be accurately reconstructed, so that the accuracy will never fall below a certain threshold. On the other hand, this very fact implies that a lot of the signal energy will not be measured, because no measurement point falls in the relevant zone, so that perfect reconstruction will be harder to achieve even at low sparsity.
Subsequently, we moved on to studying how accuracy depends on the super-resolution factor λ for different numbers of acquisitions, figures 6.3 and 6.4. We may compare these results with λ_max, as defined in 5.3. In the low-sparsity example, fig. 6.4, the relative sparsity is K = k/N = 1/30; for m = 6 (purple line), this implies λ_max ≈ 8. In the high-sparsity example, fig. 6.3, the relative sparsity is K = k/N = 1/3; for m = 10 (purple line), this implies λ_max ≈ 3, which is quite accurate.

Figure 6.1: Synthetic dataset: percentage of perfectly reconstructed coefficients (DCT basis), plotted against sparsity k, for various numbers of acquisitions m. (N = 10000, λ = 4)

Figure 6.2: Synthetic dataset: percentage of perfectly reconstructed coefficients (db8 wavelet basis), plotted against sparsity k, for various numbers of acquisitions m. (N = 10000, λ = 4)

6.2 Natural images

We worked with natural images, such as "boats". Here, the original HR image's maximum dimension and sparsity were fixed from the beginning, so that our freedom amounted to choosing the downsampling factor, the number of measures and the affine transforms.

Figure 6.3: Synthetic dataset: percentage of perfectly reconstructed coefficients (DCT basis), plotted against λ for a variety of m and high k. (N = 10000, k = 3000)

Figure 6.4: Synthetic dataset: percentage of perfectly reconstructed coefficients (DCT basis), plotted against λ for a variety of m and low k. (N = 10000, k = 300)

We studied reconstruction both in the wavelet and in the cosine domain. As figures 6.6 and 6.7 suggest, around 1/3 of the original coefficients were not recovered by either method. This is not surprising, since these images were not sparse in the first place, but as the two MSE plots show, these errors amount to around 2%

Figure 6.5: Natural dataset: an example of the reconstruction process. (m = 6, λ = 4, N = 256 · 256)

of the total energy of the image when enough measures are available. We also note that additional measures are less and less effective at reducing the MSE, and that they leave the accuracy virtually unchanged, even if slightly increasing.

Figure 6.6: Natural dataset: MSE and reconstruction accuracy plotted against the number of acquisitions m, for the DCT reconstruction scheme. (λ = 4, N = 256 · 256)

Figure 6.7: Natural dataset: MSE and reconstruction accuracy plotted against the number of acquisitions m, for the wavelet (db8) reconstruction scheme. (λ = 4, N = 256 · 256)

6.3 Real world images

In this part of the thesis, we present some experimental results with real datasets, i.e. those where the LR images are obtained from an unknown HR image by transformations which we can only estimate. Though no precise performance benchmark can be obtained from these results, as we have nothing to compare them with, they give hints of the achievable improvements in visual quality when real acquisition processes are in play.
Two datasets were studied:

1. "Target", a series of photos of a printed target, designed as a way to determine the effective resolving power of our techniques.

2. "Car", a series of frames from a video of a moving car, illustrating the versatility of this Super-Resolution technique.

6.3.1 Results on "Target" dataset

The "Target" dataset consists of a series of photos of a sheet of paper with a target printed on it. The target's arrangement of converging lines and shrinking symbols makes it easy to visually evaluate the algorithm's performance. Results are shown in figure 6.8.

Figure 6.8: Real-world dataset: DCT-based reconstruction of the "Target" HR image (left) from m = 5 LR samples (right). (λ = 4, N = 458864)

6.3.2 Results on "Car" dataset

"Car" LR images are the frames of a video footage of the back of a car. Results are shown in figure 6.9.

Figure 6.9: Real-world dataset: DCT-based reconstruction of the "Car" HR image (left) from m = 5 LR samples (right). (λ = 4, N = 219296)

Chapter 7

Conclusions

The purpose of this work has been to demonstrate the feasibility of a multi-frame super-resolution reconstruction process - the recovery of a high-resolution image from several low-resolution samples - through the normalised Iterative Hard Thresholding algorithm, with a setup borrowed from the field of Compressive Sensing.
The classical CS acquisition strategy - using a random subgaussian sampling matrix - was not an option in this setting, since the acquisition matrix of the super-resolution problem, A_SR, was already constrained to contain a downsampling process. The only randomness came from the subpixel shifts between different low-res acquisitions, determined by the transforms T_i which were, again, fixed by the dataset. Our freedom lay at the other end of the acquisition process: the basis in which the high-resolution signal would be reconstructed, along with the dimension of such a signal, was entirely ours to decide, limited only by the optimal bounds of CS theory. Thus, we selected the discrete cosine transform (DCT) and Daubechies wavelets (db8) as sparsifying basis changes, and proceeded to the reconstruction on three different datasets.

The first dataset was entirely generated by us and consisted of random, artificially sparse images. It was the ideal benchmark for our procedure and enabled simulations with arbitrary parameters over a wide range of configurations. We tested both the DCT and the db8 bases, finding that the DCT performed nearly as well as CS theory would suggest, with the db8 doing slightly worse. More interestingly,

the two reconstruction processes showed a totally different behaviour, due to the different coherence properties of the two bases with regard to point sampling.

For the second dataset, we generated low-resolution samples from digital images depicting real-world objects, and tried to reconstruct the original image from those samples. These images had the additional difficulty of not being precisely sparse, but just compressible. This fact did not allow for a perfect reconstruction of the original image coefficients, but we noted that the missing coefficients only carried a small fraction of the image's total energy.

Finally, we faced noisy, unknown acquisition processes - we tested the algorithm on low-resolution images taken from the real world. These data did not allow for any proper performance measure, but the visual increase in quality - proven by distinguishable lines and readable text in the recovered images - is a good indicator that the proposed reconstruction scheme is feasible even in this setting.
References
[1] C. Blair. Problem complexity and method efficiency in optimization (a.
s. nemirovsky and d. b. yudin). SIAM Review, 27(2):264–265, 1985.

[2] T. Blumensath and M. E. Davies. Normalized iterative hard threshold-


ing: Guaranteed stability and performance. IEEE Journal of Selected
Topics in Signal Processing, 4(2):298–309, April 2010.

[3] T. Blumensath and Michael Davies. Iterative thresholding for sparse ap-
proximations. Journal of Fourier Analysis and Applications, 14(5):629–
654, 2008.

[4] E. Candes, J. Romberg, and T. Tao. Stable Signal Recovery from


Incomplete and Inaccurate Measurements. ArXiv Mathematics e-prints,
March 2005.

[5] E. Candes, J. Romberg, and T. Tao. Robust uncertainty principles:


exact signal reconstruction from highly incomplete frequency informa-
tion. IEEE Transactions on Information Theory, 52(2):489–509, Feb
2006.

[6] E. Candes and T. Tao. Near Optimal Signal Recovery From Random
Projections: Universal Encoding Strategies? ArXiv Mathematics e-
prints, October 2004.

[7] E. Candès. The restricted isometry property and its implications for
compressed sensing. Comptes Rendus Mathematique, 346(9):589 – 592,
2008.

[8] E. Candès and J. Romberg. Sparsity and incoherence in compressive


sampling. Inverse Problems, 23(3):969, 2007.

[9] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis


pursuit. SIAM Review, 43(1):129–159, 2001.

[10] J. Clark, M. Palmer, and P. Lawrence. A transformation method for the


reconstruction of functions from nonuniformly spaced samples. IEEE
Transactions on Acoustics, Speech, and Signal Processing, 33(5):1151–
1165, October 1985.

[11] I. Daubechies. Orthonormal Bases of Compactly Supported Wavelets
(appeared in Ten Lectures on Wavelets).

[12] T. T. Do, L. Gan, N. H. Nguyen, and T. D. Tran. Fast and efficient
compressive sensing using structurally random matrices. IEEE Trans-
actions on Signal Processing, 60(1):139–154, Jan 2012.

[13] D. L. Donoho. Compressed sensing. IEEE Transactions on Information


Theory, 52(4):1289–1306, April 2006.

[14] M. F. Duarte, M. A. Davenport, D. Takhar, J. N. Laska, T. Sun, K. F.


Kelly, and R. G. Baraniuk. Single-pixel imaging via compressive sam-
pling. IEEE Signal Processing Magazine, 25(2):83–91, March 2008.

[15] S. Foucart. Hard thresholding pursuit: An algorithm for compressive


sensing. SIAM Journal on Numerical Analysis, 49(6):2543–2563, 2011.

[16] Simon Foucart and Holger Rauhut. A Mathematical Introduction to
Compressive Sensing. Birkhäuser Basel, 2013.

[17] W. T. Freeman, T. R. Jones, and E. C. Pasztor. Example-based super-


resolution. IEEE Computer Graphics and Applications, 22(2):56–65,
March 2002.

[18] D. L. Donoho and M. Elad. Optimally sparse representation in general
(nonorthogonal) dictionaries via l1 minimization. Proceedings of the
National Academy of Sciences, 100(5):2197–2202, 2003.

[19] Linwei Yue, Huanfeng Shen, Jie Li, Qiangqiang Yuan, Hongyan Zhang,
and Liangpei Zhang. Image super-resolution: The techniques, applications,
and future. Signal Processing, 128:389–408, 2016.

[20] A. Maleki. Coherence analysis of iterative thresholding algorithms.


pages 236–243, Sept 2009.

[21] S. G. Mallat and Zhifeng Zhang. Matching pursuits with time-frequency


dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–
3415, Dec 1993.

[22] Stéphane Mallat. A Wavelet Tour of Signal Processing, Third Edition:
The Sparse Way. Academic Press, Inc., Orlando, FL, USA, 3rd edition, 2008.

[23] M. Rudelson and R. Vershynin. Sparse reconstruction by convex re-


laxation: Fourier and Gaussian measurements. ArXiv Mathematics
e-prints, February 2006.

[24] A. M. Tillmann and M. E. Pfetsch. The computational complexity of
the restricted isometry property, the nullspace property, and related
concepts in compressed sensing. IEEE Transactions on Information
Theory, 60(2):1248–1259, Feb 2014.

[25] J. A. Tropp. Greed is good: algorithmic results for sparse approxi-


mation. IEEE Transactions on Information Theory, 50(10):2231–2242,
Oct 2004.

[26] W.-X. Wang, R. Yang, Y.-C. Lai, V. Kovanis, and C. Grebogi. Pre-
dicting Catastrophes in Nonlinear Dynamical Systems by Compressive
Sensing. Physical Review Letters, 106(15):154101, April 2011.

[27] Xin Zhang, Edmund Y. Lam, Ed X. Wu, and Kenneth K. Y. Wong. Ap-
plication of tikhonov regularization to super-resolution reconstruction
of brain mri images. 2007.
