I. INTRODUCTION
We review the nonparametric formulation of a signal estimation problem and the common minimum norm solutions. We work in a complex vector space with the usual inner product and norms defined. We carry out the development in the discrete domain because it significantly simplifies the presentation and is relevant to most signal processing applications, as most computations are carried out in discrete form. The results are directly extensible to analog (continuous) signals.
Linear extrapolation (reconstruction, estimation, approximation, interpolation) problems can be expressed in the matrix equation form

$$Ax = b \qquad (1)$$

where $A$ is the $m \times n$ matrix operator from an unknown signal $x \in \mathbb{C}^{n}$ to a limited data set $b \in \mathbb{C}^{m}$. The conditions for the existence of $A$ are given by the Riesz representation theorem. The problem is to find (reconstruct, estimate, approximate, extrapolate) the signal $x$ from its representation $b$.
We use an example of harmonic retrieval to facilitate the presentation. In this example, each column of $A$ represents a sampled exponential sinusoid of some frequency $\omega_i$, i.e., $a_i = [1, e^{j\omega_i}, \ldots, e^{j(m-1)\omega_i}]^{T}$. The columns of $A$ are generated by selecting the values of $\omega_i$ from within the frequency range of interest to sample the frequency axis with the desired density. The values of $\omega_i$ may be non-uniformly spaced and can be chosen to reflect prior knowledge. The data $b$ is a sample of a process consisting of a few sinusoids. The chosen frequencies $\omega_i$ may not match exactly the harmonics contained in $b$. We define the real solution to be the solution whose nonzero entries pick out the columns of $A$ that are the closest (in the 1-norm) to the harmonics contained in $b$. Thus, nonparametric estimation of sparse signals can be thought of as a basis selection process for a signal. It is important to note that the sinusoids represented by each column of $A$ must be sampled at the same sampling density as that used to generate $b$.
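As a concrete illustration, such a dictionary $A$ can be assembled as follows. This is a minimal numpy sketch; the dimensions, the uniform frequency grid, and all variable names are our own choices, not values from the paper.

```python
import numpy as np

m, n = 20, 128                        # m data samples, n candidate frequencies (our choice)
t = np.arange(m)                      # sample times; b must use the same grid
omega = np.linspace(0.0, np.pi, n)    # frequency grid; may also be non-uniformly spaced
A = np.exp(1j * np.outer(t, omega))   # column i is the sampled sinusoid exp(j*omega_i*t)

# Data: a few sinusoids; their true frequencies need not fall exactly on the grid.
x_true = np.zeros(n, dtype=complex)
x_true[[17, 40]] = [1.0, 0.5]
b = A @ x_true
```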
Non-uniqueness of solutions to (1) is a well-known problem. The infinite set of solutions can be expressed as $x = x_{MN} + v$, where $v$ is any vector in the null space of $A$ and $x_{MN}$ is the minimum norm solution, which is defined next.
The minimum norm solution is the most widely used estimate for (1) and is found by imposing the minimum Euclidean or 2-norm criterion $\min \|x\|$ subject to $Ax = b$ on the solution.¹ This solution is unique and is computed as

$$x_{MN} = A^{+}b \qquad (2)$$

where $A^{+} = A^{H}(AA^{H})^{-1}$ denotes the Moore–Penrose inverse [29]. The solution has a number of computational advantages, but it does not provide sparse solutions. Rather, it has the tendency to spread the energy among a large number of entries of $x$ instead of putting all the energy into just a few entries.

¹ Unless specified otherwise, $\|\cdot\|$ denotes the 2-norm.
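A two-line sketch makes the energy spreading concrete (numpy's pinv computes the Moore–Penrose inverse; A and b are reused from the previous sketch):

```python
x_mn = np.linalg.pinv(A) @ b            # x_MN = A^+ b, per (2)
print(np.linalg.norm(A @ x_mn - b))     # ~0, so x_mn is an exact solution
print(np.sum(np.abs(x_mn) > 1e-6))      # count of active entries: near n, not 2
```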
A closely related weighted minimum norm solution, on which FOCUSS iterations are based, is defined as the solution minimizing a weighted norm $\|W^{-1}x\|$, where $W$ is an $n \times n$ matrix. It is given by

$$x_{W} = W(AW)^{+}b. \qquad (3)$$

To accommodate singular $W$, we extend the definition of the weighted minimum norm solution to be the solution minimizing $\|W^{+}x\|$. By changing $W$, every possible solution to (1) can be generated. When $W$ is diagonal, the cost objective simply becomes $\sum_{i}(x[i]/w_{ii})^{2}$, where $w_{ii}$ are the diagonal entries of $W$.

For future discussion, it is useful to restate the definition of a weighted minimum norm solution as follows:

$$\text{find } x = Wq, \quad \text{where } q:\ \min\|q\| \ \text{subject to}\ AWq = b. \qquad (4)$$
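Equations (3) and (4) translate directly into code. The sketch below is our own and assumes a diagonal $W$ stored as a vector w; the pseudoinverse handles a singular $AW$ automatically:

```python
def weighted_min_norm(A, b, w):
    """Solve Ax = b minimizing ||W^+ x|| for diagonal W = diag(w), per (3)-(4)."""
    AW = A * w                          # A @ diag(w): scale the columns of A
    q = np.linalg.pinv(AW) @ b          # q: min ||q|| subject to AWq = b
    return w * q                        # x = Wq

x_w = weighted_min_norm(A, b, np.abs(x_mn))   # weights favor already-prominent entries
```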
It may appear that we superficially induce non-uniqueness of sparse solutions by including the $m$-dimensional solutions, since an underdetermined system is guaranteed to have at least one artifactual $m$-dimensional solution. We show, however, that solutions of dimension less than $m$ can also be non-unique. Hence, the definition of sparsity cannot depend on the uniqueness argument. Rather, the $m$-dimensional solutions must be included in the definition because they provide valid minimum support representations.

Sparse solutions also arise in LP and 1-norm minimization problems, and we borrow some useful terminology from that area.
Definition [30]: Given a set of $m$ simultaneous linear equations in $n$ unknowns (1), let $B$ be any nonsingular $m \times m$ submatrix made up of $m$ columns of $A$. Then, if all components of $x$ not associated with the columns of $B$ are set equal to zero, the solution to the resulting set of equations is said to be a basic solution to (1) with respect to the basis $B$. The components of $x$ associated with the columns of $B$ are called basic variables. If one or more of the basic variables in a basic solution has value zero, the solution is said to be a degenerate basic solution.

The sparse solutions then are equivalently the basic and degenerate basic solutions. We also refer to the basic and degenerate basic solutions as low-dimensional solutions and to the rest as high-dimensional solutions.
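For small systems, the definition can be made concrete by brute-force enumeration (a sketch of our own; the search is exponential in $n$ and is meant only to illustrate the definition):

```python
from itertools import combinations
import numpy as np

def basic_solutions(A, b):
    """Yield (support, x) for every basic solution of Ax = b per the definition above."""
    m, n = A.shape
    for cols in combinations(range(n), m):
        B = A[:, cols]
        if np.linalg.matrix_rank(B) < m:
            continue                    # singular submatrix: not a valid basis
        x = np.zeros(n, dtype=A.dtype)
        x[list(cols)] = np.linalg.solve(B, b)
        yield cols, x                   # degenerate if a basic variable is zero
```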
B. Uniqueness Conditions for Sparse Solutions
The following uniqueness/non-uniqueness results are derived for systems satisfying the following property.
Unique Representation Property (URP): A system (1) is said to have the URP if any $m$ columns of $A$ are linearly independent.
The URP basically guarantees that every basis component of the real signal is uniquely represented by a column of $A$. In many problems, the URP can be achieved by using a sufficiently dense sampling rate to create an $A$ that unambiguously captures all of the components of the real signal. This density does not need to correspond to the Nyquist frequency used in spectral estimation, as explained below.
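Whether a given dictionary satisfies the URP can be checked numerically by the same kind of enumeration (again a sketch, feasible only for small $n$; the rank tolerance is our own choice and matters when columns are highly coherent):

```python
from itertools import combinations
import numpy as np

def has_urp(A, tol=1e-10):
    """True if every set of m columns of A is linearly independent (the URP)."""
    m, n = A.shape
    return all(
        np.linalg.matrix_rank(A[:, cols], tol=tol) == m
        for cols in combinations(range(n), m)
    )
```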
In other problems, such as physical tomography problems,
the URP can never be satisfied. An example of such a
problem is the extrapolation of electric currents inside a
volume conductor from externally measured electromagnetic
fields. Even when the sampling set is completely dense,
i.e., the field is completely known everywhere outside the
conducting volume, the current inside the volume cannot be
uniquely found [31]. Given such intrinsic ill-posedness, sparse
solutions, including the maximally sparse solutions, are never
unique. However, depending on the physics, the net effect
of the intrinsic ill-posedness on the uniqueness of sparse
solutions may be limited and must be considered in the context
of an individual problem. For example, in the case of the
electromagnetic extrapolation problem, its effect is limited to
the uncertainty in the neighborhood of each real solution point
[7]. How this affects the uniqueness results for sparse solutions
is discussed in Section VIII, when we present an example of
the neuroimaging problem.
A. Basic FOCUSS

The basic FOCUSS algorithm consists of the following three steps, repeated at each iteration $k$:

Step 1: $W_{pk} = \operatorname{diag}(x_{k-1})$

Step 2: $q_{k} = (AW_{pk})^{+}b \qquad (6)$

Step 3: $x_{k} = W_{pk}\,q_{k}$
Since entries that are zero at initialization remain zero for all iterations, we assume without loss of generality that the number of nonzero components in an initial vector, which defines the dimension of the problem, is always $n$. The final solutions produced by the algorithm will be denoted by $\hat{x}$ to differentiate them from the feasible solutions $x$ to the linear system (1).
Steps 2 and 3 of the algorithm together represent the
weighted minimum norm computation (3). The algorithm is
written in three steps solely for exposition purposes. In the
implementation, all the steps can be combined into one.
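The following is a minimal sketch of such a combined implementation. The iteration count and pruning tolerance are our own choices (the paper does not prescribe them), and the pruning line anticipates the truncation operations recommended later in this section:

```python
import numpy as np

def focuss_basic(A, b, x0, n_iter=20, prune_tol=1e-8):
    """Basic FOCUSS (6): W_pk = diag(x_{k-1}), q_k = (A W_pk)^+ b, x_k = W_pk q_k."""
    x = np.asarray(x0, dtype=complex).copy()
    for _ in range(n_iter):
        w = x                               # diagonal of W_pk, stored as a vector
        q = np.linalg.pinv(A * w) @ b       # steps 1-2: the weighted minimum norm of (3)
        x = w * q                           # step 3; entries already zero stay zero
        x[np.abs(x) < prune_tol * np.abs(x).max()] = 0.0  # prune diminishing entries
    return x
```

Initialized with the minimum norm solution, `focuss_basic(A, b, np.linalg.pinv(A) @ b)` typically recovers the two active columns in the synthetic example sketched earlier.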
To understand how the AST constraint leads to pruning of the solution space, we consider the objective minimized at each step

$$\sum_{i=1}^{n}\left(\frac{x_{k}[i]}{x_{k-1}[i]}\right)^{2}. \qquad (7)$$

The relatively large entries in $x_{k-1}$ reduce the contribution of the corresponding elements of $x_{k}$ to the cost (7), and vice versa. Thus, larger entries in $x_{k-1}$ result in larger corresponding entries in $x_{k}$ if the respective columns in $A$ are significant in fitting $b$ as compared to the rest of the columns of $A$. By starting with some feasible approximate solution to (1), minimization of (7) gradually reinforces some of the already prominent entries in $x$ while suppressing the rest until they reach machine precision and become zeros.² The algorithm stops when a minimal set of the columns of $A$ that describe $b$ is obtained. Note that the algorithm does not simply increase the largest entries in the initial $x_{0}$. In fact, the largest entries in $x_{0}$ can become zeros in the final $\hat{x}$. Note also that (7) is never explicitly evaluated in (6). The weights $W_{pk}$ and the corresponding subspaces are eliminated from the computation through the product $AW_{pk}$.
While the entries of $x_{k}$ converge to zero and nonzero values, the corresponding entries in $q_{k}$ converge to zeros or ones, i.e., $q_{k}[i] \to 0$ as $x_{k}[i] \to 0$, and $q_{k}[i] \to 1$ as $x_{k}[i]$ approaches a nonzero value. Fig. 1 illustrates the convergence of the elements of $q$. The example uses a matrix $A$ and the vector $b$ equal to the ninth column of $A$. The correct sparse solution then is a vector of zeros with a one in the ninth entry. Each line in Fig. 1 shows an element of the vector $q_{k}$ as a function of the iteration index from initialization to the fifth iteration. The ninth element converges to one, whereas the rest become zero. This indicates that the desired sparse solution with only one nonzero element in the ninth position was found. The minimum norm solution (2) was used for the initialization. Note that the ninth element was not the largest in the initialization.

² Theoretically, the elements of a solution asymptotically converge to zeros but never reach zero. In finite precision, the asymptotically diminishing elements become zeros.
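This experiment is easy to replay with the sketches above (the matrix dimensions are ours, since the size used in the paper's example did not survive extraction; the ninth element of q is expected to approach one):

```python
b9 = A[:, 8]                            # b equal to the ninth column of A
x = np.linalg.pinv(A) @ b9              # minimum norm initialization, per (2)
for k in range(1, 6):
    q = np.linalg.pinv(A * x) @ b9      # entries of q head toward zeros and ones
    x = x * q
    print(k, abs(q[8]), np.count_nonzero(np.abs(q) > 0.5))
```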
From our experience, the pattern of change in $x_{k}$ emerges after a few iterations, from which it is possible to identify the entries converging to ones and zeros. Significant savings in computation, as well as better convergence and performance properties, are gained by eliminating the diminishing entries of $x$ that are indicated by $q$ at each iteration. Further savings can be achieved by implementing a hard thresholding operation to obtain the final result once the convergence pattern becomes clear. Although for the purposes of the analysis we do not explicitly include these truncation operations in the algorithm, they should always be an integral part of a FOCUSS implementation.
B. General FOCUSS
We extend basic FOCUSS into a class of recursively constrained optimization algorithms by introducing two parameters. In the first extension, we allow the entries of $W_{pk}$ to be raised to some power $l$, as shown in (8). The second extension is the use of an additional weight matrix, denoted $W$, which is independent of the a posteriori constraints. This extension makes the algorithm flexible enough to be used in many different applications. It also provides a way to input a priori information. The general form of the algorithm then is

Step 1: $W_{pk} = \operatorname{diag}(x_{k-1}[i]^{l}), \quad l \in \mathbb{Z}^{+}$

Step 2: $q_{k} = (AWW_{pk})^{+}b \qquad (8)$

Step 3: $x_{k} = WW_{pk}\,q_{k}$
where $\mathbb{Z}^{+}$ denotes the set of all positive integers. For the uses of the algorithm considered here, it is sufficient to assume $l$ to be constant for all iterations.
In the applications where the positivity constraint $x > 0$ is imposed, we can expand the range of $l$ to all real values above a lower bound; the lower bound on $l$ is explained in Section V-A. The positivity constraint on $x$ can be enforced by incorporating a step size into the algorithm, as is done in many LP methods. The iterative solution then becomes a partial step from $x_{k-1}$ toward the FOCUSS update, where the step size is chosen to keep all entries of $x_{k}$ positive.
More generally, other nondecreasing functions of $x_{k-1}$ can be used to define the weights in (8), although the need for more complicated weight functions is not evident for the applications we have considered.
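A sketch of (8) with both extensions follows. Here the a priori weight is taken to be diagonal and passed as a vector w, and magnitudes are used in the a posteriori weight so that it stays real for complex-valued x; both are implementation choices of ours:

```python
import numpy as np

def focuss_general(A, b, x0, w=None, l=1, n_iter=20):
    """General FOCUSS (8): W_pk = diag(x_{k-1}^l) with a priori weight W = diag(w)."""
    w = np.ones(A.shape[1]) if w is None else w
    x = np.asarray(x0, dtype=complex).copy()
    for _ in range(n_iter):
        wk = w * np.abs(x) ** l             # diagonal of the composite weight W W_pk
        q = np.linalg.pinv(A * wk) @ b      # q_k = (A W W_pk)^+ b
        x = wk * q                          # x_k = W W_pk q_k
    return x
```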
A cumulative form of the FOCUSS algorithm can be derived by using cumulative a posteriori weights in (8) that are a function of more than one iteration, e.g., $W_{pk} = \operatorname{diag}(x_{k-1}[i]\,x_{k-2}[i])$. This form may prove to be more robust in terms of convergence to solutions near the initialization, as was found to be the case for the neuromagnetic imaging problem. The convergence analysis of general FOCUSS (8), which is presented next, is extensible to the cumulative form of the algorithm.
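Only the weight construction changes in the cumulative variant; a sketch of the two-iteration version mentioned above:

```python
import numpy as np

def focuss_cumulative(A, b, x0, n_iter=20):
    """FOCUSS with cumulative weights W_pk = diag(x_{k-1}[i] x_{k-2}[i])."""
    x_prev = np.asarray(x0, dtype=complex).copy()
    x = x_prev.copy()
    for _ in range(n_iter):
        wk = np.abs(x * x_prev)             # weight accumulated over two iterations
        q = np.linalg.pinv(A * wk) @ b
        x_prev, x = x, wk * q
    return x
```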
V. ANALYSIS
We concentrate our analysis on the form (8) of FOCUSS, unless indicated otherwise. The results are extensible to the other forms. Since $W$ is constant for all iterations, we assume that $W = I$ without affecting the results of the analysis. The steps of the FOCUSS algorithm always exist and are unique, since the transformation (8) is a one-to-one mapping.
We next consider the global behavior of the algorithm. For
an algorithm to be a useful estimation tool, it must converge
to point solutions from all or at least a significant number
of initialization states and not exhibit other nonlinear system
behaviors, such as divergence or oscillation. Global convergence analysis is used to investigate this behavior. The term
global convergence, however, is sometimes used to imply
convergence to a global minimum, which is not the appropriate
meaning here. To avoid confusion, we use the term fixed
point convergence or absolute convergence to describe the
convergence properties of the algorithm. These terms mean
that an algorithm converges to a point solution from any
starting condition. The term absolute stability has also been
used for this property.
Global convergence analysis is not sufficient to understand
the complete behavior of even an absolutely convergent nonlinear algorithm. Typically, not all the convergence points
form a valid solution set, but this cannot be revealed by the
global convergence analysis alone. This point is sometimes
overlooked. Here, we add local convergence to our analysis to
characterize the different convergence points. We first provide
some background in nonlinear systems to motivate our analysis
steps. This material is a compilation from several sources. For
references, see, for example, [33] and the references therein.
A. Global Convergence
Theorem 2: The FOCUSS algorithm (8) is absolutely convergent, i.e., for any starting point $x_{0}$, it converges asymptotically to a fixed point. The descent function associated with the algorithm is

$$E(x) = \prod_{i=1}^{n}\lvert x[i]\rvert. \qquad (9)$$

The set of fixed points of the algorithm are the solutions to (1) that have one or more zero entries.

Proof: See Appendix A.
The modification can be viewed equivalently as using a modified Hessian of the objective, in which the signs of its negative eigenvalues are reversed, together with a positive scaling, from which a modified Newton search criterion for the constrained minimization follows.
The factors that control the shape of the basins are the relative sizes of the entries in the columns of $A$ and the total number of sparse solutions in a given problem, as shown next.

1) Effect of $A$ on the Basins: In any minimum norm based solution, the magnitude differences in the entries of different columns of $A$ act analogously to the weights of a weighted minimum norm solution. This can be seen as follows. Suppose we can express the matrix $A$ as a product of two matrices $\tilde{A}$ and $\Lambda$ so that (1) becomes

$$\tilde{A}\Lambda x = b \qquad (15)$$

where $\tilde{A}$ is such that the entries in each of its columns span exactly the same range of values. $\Lambda$ is then a diagonal matrix that reflects the size differences between the columns of the original $A$.
Fig. 4. DOA estimation from single snapshots, with an array of eight unevenly spaced sensors, of three nonstationary sources with DOA [46, 29, 60] for the first snapshot, [43, 33, 56] for the second snapshot, and [40, 36, 52] for the third. (a) MVDR estimates from the first snapshot (dashed line) and from the three snapshots combined (solid line). (b) FOCUSS solution for the first snapshot. (c) FOCUSS solution for the second snapshot. (d) FOCUSS solution for the third snapshot.
Fig. 5. DOA estimation using the same example as in Fig. 4 with random Gaussian noise added to the data. The SNR is 10 dB with respect to the weakest source. (a) MVDR estimates using all three snapshots. (b) FOCUSS solution for the first snapshot. (c) FOCUSS solution for the second snapshot. (d) FOCUSS solution for the third snapshot.
The FOCUSS solutions were independent of the MVDR solution that was used for the initialization. FOCUSS solutions for this example with an SNR of 10 dB are shown in Fig. 5.
B. Neuroimaging
Functional imaging of the brain using the scalp electric potentials (EEG) or the brain magnetic fields measured outside the head (MEG) is an extrapolation problem, where the objective is to find the current inside the head that generates the measured fields. The problem is physically ill-posed as well as underdetermined. Physical ill-posedness means that the current cannot be determined uniquely even with absolute knowledge of the fields outside the head, i.e., when the number of data is infinite. The only a priori constraint available is the knowledge that current distributions imaged by EEG/MEG are spatially compact, i.e., these currents are produced by synchronous firing of neurons clustered in 1- to 4-cm areas. Because neuronal activity is highly dynamic, i.e., the intensity and location of the current flow can change fairly rapidly, and because simple models, such as electric or magnetic dipoles, do not accurately describe extended and irregularly shaped areas of activity, EEG/MEG solutions are difficult to model either parametrically or through cost functions. The problem is further compounded by complex statistical properties of the noise and by the physical ill-posedness. As a result, the success of conventional algorithms under realistic conditions has been limited.

Fig. 6. MEG reconstructions of three neuronal ensembles represented by 3, 2, and 4 active nodes, respectively, with a 150-sensor array: (a) Solid black circles mark the nodes of the three active sites. The weighted minimum norm solution that includes a compensation for the bias as described in the text is marked by gray transparent circles. (b) FOCUSS reconstruction of the active nodes shown in solid black circles.
APPENDIX A
PROOF OF THEOREM 2
To show fixed-point convergence, we use the solution set as defined in Theorem 2 and show that the conditions of the global convergence theorem [30] hold:

i) The set of iterates $\{x_{k}\}$ is compact.

For the finite-dimensional case, this condition is equivalent to showing that a sequence of points $\{x_{k}\}$ generated by the mapping (8) of the algorithm is bounded. At each iterative step, the solution is finite; therefore, we need only to examine the convergence limits of the algorithm. From ii), the convergence limits of $\{x_{k}\}$ are the minima of the descent function $E(x)$, which occur only when at least one of the entries of $x$ becomes zero. The limit points of $\{x_{k}\}$ that are sparse solutions are clearly bounded. We later show that the non-sparse limit points are only reachable through special trajectories for which the limit of convergence has the same properties as a sparse solution. Therefore, these solutions are also bounded.
ii) There is a continuous descent function $E(x)$ such that

$$E(x_{k}) < E(x_{k-1}) \quad \text{for all } x_{k} \text{ outside the solution set.} \qquad (A.2)$$
We have $E(x_{k}) = E(x_{k-1})$ iff $x_{k} = x_{k-1}$, as follows from the equality conditions below. To show that (A.2) holds outside the solution set, we note that $\ln(\cdot)$ is a purely concave function, that is,

$$\frac{1}{n}\sum_{i=1}^{n}\ln a_{i} \le \ln\!\left(\frac{1}{n}\sum_{i=1}^{n}a_{i}\right), \qquad a_{i} > 0 \qquad (A.3)$$

with equality iff all the $a_{i}$ are equal. Setting $a_{i} = (x_{k}[i]/x_{k-1}[i])^{2}$, we can rewrite this as

$$\prod_{i=1}^{n}\left(\frac{x_{k}[i]}{x_{k-1}[i]}\right)^{2} \le \left(\frac{1}{n}\sum_{i=1}^{n}\left(\frac{x_{k}[i]}{x_{k-1}[i]}\right)^{2}\right)^{n}. \qquad (A.4)$$

From (7) and the minimizing property of each iterative step, we have

$$\sum_{i=1}^{n}\left(\frac{x_{k}[i]}{x_{k-1}[i]}\right)^{2} \le n. \qquad (A.5)$$
Substituting (A.5) into (A.4), we obtain the desired result

$$\prod_{i=1}^{n}\left(\frac{x_{k}[i]}{x_{k-1}[i]}\right)^{2} \le 1, \quad \text{i.e., } E(x_{k}) \le E(x_{k-1})$$

with equality iff $x_{k} = x_{k-1}$, which establishes (A.2).
APPENDIX B
PROOF OF THEOREM 3
Consider the system (1) partitioned as follows:

$$\begin{bmatrix} A_{1} & A_{2} \end{bmatrix} \begin{bmatrix} x_{1} \\ x_{2} \end{bmatrix} = b \qquad (B.1)$$

where $A_{1}$ contains any $m$ columns of $A$ such that $A_{1}x_{1} = b$ is compatible, and $A_{2}$ contains the remaining columns. Similarly, let $W_{1}$ and $W_{2}$ be the two partitions of the weighted matrix $W_{pk}$ at an iteration $k$, with $W_{1}$ the diagonal block formed from the entries of $x_{k-1}$ associated with $A_{1}$ and $W_{2}$ the diagonal block formed from the remaining entries.

In the proof, we also make use of the general form of the pseudoinverse of partitioned matrices [29] and of the fact that a matrix of the form $(AW)^{+}AW$ is a projection operator, so that $\|(AW)^{+}AWv\| \le \|v\|$ for any vector $v$.

Consider an arbitrary point in a neighborhood of a sparse solution $x^{*}$ whose nonzero entries are, without loss of generality, associated with the columns of $A_{1}$. Writing the $k$th iterate of the basic algorithm in terms of the partitions above shows that the entries associated with $A_{2}$ are suppressed quadratically in the size of the neighborhood. In the next iteration, we repeat the same calculations, starting with $x_{1}$, to obtain the corresponding bounds for $x_{2}$. For a sufficiently small neighborhood, this shows asymptotic convergence from an arbitrary point in the neighborhood to $x^{*}$ with at least a quadratic rate of convergence.

The rate of convergence can be similarly derived for the general algorithm (8) with $l > 1$; the corresponding powers in the bounds are higher in the general case.
REFERENCES

[19] G. Thomas and S. D. Cabrera, "Resolution enhancement in time-frequency distributions based on adaptive time extrapolations," in Proc. IEEE-SP Int. Symp. Time-Frequency Time-Scale Anal., Oct. 1994, pp. 104–107.

[20] A. A. Ioannides, J. P. R. Bolton, and C. J. S. Clarke, "Continuous probabilistic solutions to the biomagnetic inverse problem," Inverse Problems, vol. 6, pp. 523–542, 1990.

[21] I. F. Gorodnitsky, B. D. Rao, and J. S. George, "Source localization in magnetoencephalography using an iterative weighted minimum norm algorithm," in Proc. 26th Asilomar Conf. Signals, Syst. Comput., Oct. 1992, vol. 1, pp. 167–171.

[22] R. Srebro, "Iterative refinement of the minimum norm solution of the bioelectric inverse problem," IEEE Trans. Biomed. Eng., vol. 43, pp. 547–552, May 1996.

[23] I. F. Gorodnitsky and B. D. Rao, "A new iterative weighted norm minimization algorithm and its applications," in Proc. 6th SP Workshop Stat. Signal Array Processing, Oct. 1992, pp. 412–415.

[24] ——, "A recursive weighted minimum norm algorithm: Analysis and applications," in Proc. Int. Conf. Acoust., Speech, Signal Processing, Apr. 1993, vol. III, pp. 456–459.

[25] B. D. Rao and I. F. Gorodnitsky, "A novel recurrent network for signal processing," in Proc. 1993 IEEE Workshop: Neural Net. Signal Processing, Sept. 1993, pp. 108–117.

[26] I. F. Gorodnitsky, "A novel class of recursively constrained algorithms for localized energy solutions: Theory and application to magnetoencephalography and signal processing," Ph.D. dissertation, Univ. California, San Diego, La Jolla, 1995.

[27] B. D. Rao and I. F. Gorodnitsky, "Affine scaling transformation based methods for computing low complexity sparse solutions," in Proc. Int. Conf. Acoust., Speech, Signal Processing, May 1996.

[28] I. F. Gorodnitsky and D. Beransky, "Fast algorithms for biomedical tomography problems," in Proc. 30th Asilomar Conf. Signals, Syst. Comput., Nov. 1996.

[29] S. L. Campbell and C. D. Meyer, Jr., Generalized Inverses of Linear Transformations. London, U.K.: Pitman, 1979.

[30] D. Luenberger, Linear and Nonlinear Programming. Reading, MA: Addison-Wesley, 1989.