arXiv:1211.4909v7 [cs.IT] 29 Sep 2013

Abstract—The performance of sparse signal recovery from noise-corrupted, underdetermined measurements can be improved if both the sparsity and the correlation structure of signals are

rich structures. One structure widely used is the block/group sparse structure [2]–[5], which refers to the case when the nonzero entries of a signal cluster around some locations. Existing algorithms exploiting such information have shown improved recovery performance.

Recently, noticing that intra-block correlation widely exists in real-world signals, Zhang and Rao [3], [6] proposed the block sparse Bayesian learning (BSBL) framework. A number of algorithms have been derived from this framework and have shown superior ability to recover block sparse signals or even non-sparse signals [7]. But these BSBL algorithms are not fast, and thus cannot be applied to large-scale datasets.

In this work, we propose an efficient implementation using the fast marginalized likelihood maximization (FMLM) method [8]. Thanks to the BSBL framework, it can exploit

¹ Benyuan Liu, Hongqi Fan and Qiang Fu are with the Science and Technology on Automatic Target Recognition Laboratory, National University of Defense Technology, Changsha, Hunan, P. R. China, 410074. E-mail: liubenyuan@gmail.com
² Zhilin Zhang is with the Emerging Technology Lab, Samsung R&D Institute America - Dallas, 1301 E. Lookout Drive, Richardson, TX 75082, USA. E-mail: zhilinzhang@ieee.org
B. Liu and H. Fan were supported in part by the National Natural Science Foundation of China under Grant 61101186.
This is a technical report. An extended version was submitted to the IEEE Journal of Biomedical and Health Informatics with the title "Energy Efficient Telemonitoring of Physiological Signals via Compressed Sensing: A Fast Algorithm and Systematic Evaluation".

which means x has g blocks, and only a few blocks are nonzero. Here d_i is the block size of the ith block. Moreover, for time-series data x, the samples within each block are usually correlated. To model the block sparsity and the intra-block correlation, the BSBL framework [3] suggests using the parameterized Gaussian distribution

    p(x_i; γ_i, B_i) = N(x_i; 0, γ_i B_i),    (2)

with unknown deterministic parameters γ_i and B_i. γ_i represents the confidence of the relevance of the ith block, and B_i captures the intra-block correlation. The framework further assumes that the blocks are mutually independent. Therefore, we write the signal model as

    p(x; {γ_i}, {B_i}) = N(x; 0, Γ),    (3)

where Γ is a block diagonal matrix whose ith principal diagonal block is γ_i B_i. The observation y is obtained by

    y = Φx + n,    (4)

where Φ is an M × N sensing matrix and n is the observation noise. The sensing matrix Φ is an underdetermined matrix.
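As a concrete illustration, the block sparse model (2)-(4) can be sketched in NumPy. This is a hypothetical toy instantiation: the dimensions, the number of active blocks, and the AR(1)-Toeplitz choice for B_i (a structure used later in Section III-B) are our own settings, not values from this report.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: g blocks of size d, N = g*d, M < N (underdetermined).
g, d, M = 16, 4, 32
N = g * d
active = rng.choice(g, size=3, replace=False)  # only a few blocks are nonzero

# Intra-block correlation: one possible B_i is an AR(1)-style Toeplitz matrix.
r = 0.9
B = np.array([[r ** abs(j - k) for k in range(d)] for j in range(d)])

x = np.zeros(N)
for i in active:
    gamma_i = rng.uniform(0.5, 1.5)          # block relevance gamma_i
    L = np.linalg.cholesky(gamma_i * B)      # factor of the block covariance
    x[i * d:(i + 1) * d] = L @ rng.standard_normal(d)  # x_i ~ N(0, gamma_i * B_i)

Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # M x N sensing matrix
y = Phi @ x + 0.01 * rng.standard_normal(M)     # observation model (4)
```

The resulting x is nonzero on exactly three blocks, and the samples within each active block are strongly correlated, matching the structure that BSBL exploits.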
TECHNICAL REPORT
The observation noise is assumed to be independent and Gaussian with zero mean and variance equal to β^{-1}, where β is also unknown. Thus the likelihood is given by

    p(y|x; β) = N(y; Φx, β^{-1} I).    (5)

The main body of the BSBL algorithm iterates between the estimation of the posterior p(x|y; {γ_i, B_i}, β) = N(µ, Σ), with Σ ≜ (Γ^{-1} + Φ^T βΦ)^{-1} and µ ≜ ΣΦ^T βy, and the maximization of the likelihood p(y; {γ_i, B_i}, β) = N(y; 0, C), with C = β^{-1} I + ΦΓΦ^T. The update rules for the parameters {γ_i, B_i} and β are derived using the Type II maximum likelihood method [3], [10], which leads to the following cost function:

    L({γ_i, B_i}, β) = log |C| + y^T C^{-1} y.    (6)

Once all the parameters, namely {γ_i, B_i} and β, are estimated, the MAP estimate of the signal x can be directly obtained from the mean of the posterior, i.e.,

    x = ΣΦ^T βy.    (7)

B. The Extension to the BSBL Framework

In the original BSBL framework [3], γ_i and B_i together formed the covariance matrix of the ith block x_i, which can be conveniently modeled using a symmetric, positive semi-definite matrix A_i:

    p(x_i; A_i) = N(x_i; 0, A_i).    (8)

The jth diagonal element of A_i, denoted by A_i^{jj}, models the variance of the jth entry x_i^j of the ith block x_i. This variance parameter A_i^{jj} determines the relevance [10] of x_i^j. In the extension to the BSBL framework, we may conveniently introduce the term block relevance, defined as the average of the estimated variances of the entries of x_i:

    γ_i ≜ (1/d_i) Tr(A_i).    (9)

The notion of block relevance casts the Relevance Vector Machine [10] as a specialized form of one BSBL variant in which we assume A_i has the structure

    A_i = diag(A_i^{1}, · · · , A_i^{d_i}).    (10)

This variant of BSBL is for signals with block structure in which the entries of each block are spiky and do not correlate with each other.

The off-diagonal elements A_i^{jk}, j ≠ k, model the covariance of the block signal. It has been shown that exploiting such structural information is beneficial in recovering piecewise smooth block sparse signals [3], [7]. There are rich classes of covariance matrices, namely Compound Symmetric, Auto-Regressive (AR), Moving-Average (MA), etc. These structures can be inferred during the learning process of A_i.

The signal model and the observation model are built in a similar way:

    p(x; {A_i}) = N(x; 0, Γ),    (11)
    p(y; {A_i}, β) = N(y; Φx, β^{-1} I).    (12)

The parameters {A_i} and β can be estimated from the cost function (6) using the Type II maximization method.

III. THE FAST MARGINALIZED BLOCK SBL ALGORITHM

There are several methods to minimize the cost function (6). In [3], the authors provided a bound optimization method and a hybrid ℓ1 method to derive the update rules for γ_i and B_i. In the following we extend the marginalized likelihood maximization method to the BSBL framework. This method was used by Tipping et al. [8] for their fast SBL algorithm and later by Ji et al. [11] for their Bayesian compressive sensing algorithm.

A. The Main Body of the Algorithm

The cost function (6) can be optimized in a block fashion. We denote by Φ_i the ith block of Φ, i.e., the columns of Φ corresponding to the ith block of the signal x. Then C can be rewritten as

    C = β^{-1} I + Σ_{m≠i} Φ_m A_m Φ_m^T + Φ_i A_i Φ_i^T    (13)
      = C_{-i} + Φ_i A_i Φ_i^T,    (14)

where C_{-i} ≜ β^{-1} I + Σ_{m≠i} Φ_m A_m Φ_m^T. Using the Woodbury identity,

    |C| = |A_i| |C_{-i}| |A_i^{-1} + s_i|,    (15)
    C^{-1} = C_{-i}^{-1} − C_{-i}^{-1} Φ_i (A_i^{-1} + s_i)^{-1} Φ_i^T C_{-i}^{-1},    (16)

where s_i ≜ Φ_i^T C_{-i}^{-1} Φ_i and q_i ≜ Φ_i^T C_{-i}^{-1} y. Equation (6) can be rewritten as

    L = log |C_{-i}| + y^T C_{-i}^{-1} y + log |I_{d_i} + A_i s_i| − q_i^T (A_i^{-1} + s_i)^{-1} q_i    (17)
      = L(−i) + L(i),    (18)

where L(−i) ≜ log |C_{-i}| + y^T C_{-i}^{-1} y, and

    L(i) = log |I_{d_i} + A_i s_i| − q_i^T (A_i^{-1} + s_i)^{-1} q_i,    (19)

which only depends on A_i. Setting ∂L(i)/∂A_i = 0, we obtain the updating rule

    A_i = s_i^{-1} (q_i q_i^T − s_i) s_i^{-1}.    (20)

The block relevance γ_i is calculated using (9), and the correlation structure is inferred from (20) by investigating the symmetric matrix B_i calculated as

    B_i = A_i / γ_i.    (21)

The diagonal elements of B_i are regularized to unity (i.e., diag(B_i) = 1) to maintain the spectrum of A_i.

B. Imposing the Structural Regularization on B_i

As noted in [3], regularization of B_i is required due to limited data. It has been shown [6] that in noiseless cases the regularization does not affect the global minimum of the cost function (6), i.e., the global minimum still corresponds to the true solution; the regularization only affects the probability of the algorithm converging to a local minimum. A good regularization can largely reduce the probability of local convergence.
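For intuition, a minimal NumPy sketch of scoring one candidate block — forming s_i and q_i, applying (20), (9), and (21), and evaluating L(i) via (19) — is given below. This is a naive illustration, not the report's implementation: it uses the simple regularization B_i* = I (the BSBL-FM(0) choice described below) when evaluating L(i), and it forms C_{-i}^{-1} explicitly, whereas the actual algorithm maintains s_i and q_i incrementally as in [8]. The function and variable names are ours.

```python
import numpy as np

def candidate_block_step(Phi_i, Cinv_minus_i, y, d_i):
    """Score one block: eqs. (9), (19)-(21) with the SIM choice B_i* = I."""
    s_i = Phi_i.T @ Cinv_minus_i @ Phi_i      # s_i = Phi_i^T C_{-i}^{-1} Phi_i
    q_i = Phi_i.T @ Cinv_minus_i @ y          # q_i = Phi_i^T C_{-i}^{-1} y

    s_inv = np.linalg.inv(s_i)
    A_raw = s_inv @ (np.outer(q_i, q_i) - s_i) @ s_inv  # update rule (20)

    gamma_i = np.trace(A_raw) / d_i           # block relevance (9)
    A_star = gamma_i * np.eye(d_i)            # regularized A_i* with B_i* = I (21)

    # L(i) of eq. (19) at A_i*; the identity (A^{-1} + s)^{-1} = (I + A s)^{-1} A
    # avoids inverting A_i* directly.
    M_i = np.eye(d_i) + A_star @ s_i
    L_i = np.linalg.slogdet(M_i)[1] - q_i @ np.linalg.solve(M_i, A_star @ q_i)
    return A_star, gamma_i, L_i
```

In the full algorithm this score is computed for every block, and only the block giving the deepest descent of the cost is updated in each iteration.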
Although theories on regularization strategies are lacking, some empirical methods [3], [6] have been presented.

The proposed algorithm is extensible in that it can incorporate different time-series correlation models. In this paper we focus on the following forms of correlation models.

1) Simple (SIM) Correlation Model: In the simple (SIM) correlation model, we fix B_i = I (∀i) and ignore the correlation within each signal block. Such regularization is appropriate for recovering signals in a transformed domain, e.g., via the Fourier or discrete cosine transform. In these cases, the coefficients are spiky sparse and may cluster into a few nonzero blocks. Our algorithm using this regularization is denoted by BSBL-FM(0).

2) Auto-Regressive (AR) Correlation Model: We model the entries in each block as an AR(1) process with coefficient r_i. As a result, B_i has the form

    B_i = Toeplitz([1, r_i, · · · , r_i^{d_i − 1}]),    (22)

where Toeplitz(·) is a MATLAB command expanding a real vector into a symmetric Toeplitz matrix. The level of the intra-block correlation is thus reflected by the value of r_i. r_i is empirically calculated [3] by r_i ≜ m_1^i / m_0^i, where m_0^i (resp. m_1^i) is the average of the entries along the main diagonal (resp. the main sub-diagonal) of the matrix B_i. This calculation cannot ensure that r_i has a feasible value, i.e., |r_i| < 0.99. Thus in practice, we calculate r_i by

    r_i = sign(m_1^i / m_0^i) · min{ |m_1^i / m_0^i|, 0.99 }.    (23)

Our algorithm using the regularization (22)-(23) is denoted by BSBL-FM(1).

3) Additional Average Step: In many real-world applications, the intra-block correlations of all blocks tend to be positive and high together. Thus, one can further constrain all blocks to share the same AR coefficient r [3],

    r = (1/g) Σ_{i=1}^{g} r_i,    (24)

where r is the average of {r_i}. Then B_i is reconstructed as

    B_i = Toeplitz([1, r, · · · , r^{d_i − 1}]).    (25)

Our algorithm using the regularization (24)-(25) is denoted by BSBL-FM(2).

C. Remarks on β

The parameter β^{-1} is the noise variance in our model. It can be estimated by [3]

    β = M / ( Tr[ΣΦ^T Φ] + ||y − Φµ||_2^2 ).    (26)

However, the resulting updating rule is not robust due to the constructive and reconstructive nature [5], [8] of the proposed algorithm. In practice, one treats it as a regularizer and assigns it some specific value.¹ Similar to [11], we select β^{-1} = 10^{-6} in noiseless simulations, β^{-1} = 0.1 ||y||_2^2 in general noisy scenarios (e.g., SNR < 20 dB), and β^{-1} = 0.01 ||y||_2^2 in high-SNR scenarios (e.g., SNR ≥ 20 dB).

D. The BSBL-FM Algorithm

The proposed algorithm (denoted BSBL-FM) is given in Fig. 1. Within each iteration, it only updates the block signal that yields the deepest descent of L(i). The detailed procedures for re-calculating µ, Σ, {s_i}, {q_i} are similar to [8]. The algorithm terminates when the maximum change of the cost function is smaller than a threshold η. In the experiments thereafter, we set η = 10^{-4}.

 1: procedure BSBL-FM(y, Φ, η)
 2:   Outputs: x, Σ
 3:   Initialize β^{-1} = 0.01 ||y||_2^2
 4:   Initialize {s_i}, {q_i}
 5:   while not converged do
 6:     Calculate A'_i = s_i^{-1}(q_i q_i^T − s_i) s_i^{-1}, ∀i
 7:     Calculate the block relevance γ_i = (1/d_i) Tr(A'_i)
 8:     Infer the correlation model B*_i from A'_i / γ_i
 9:     Re-build A*_i = γ_i B*_i
10:     Calculate ΔL(i) = L(A*_i) − L(A_i), ∀i
11:     Select the îth block s.t. ΔL(î) = min{ΔL(i)}
12:     Re-calculate µ, Σ, {s_i}, {q_i}
13:   end while
14: end procedure

Fig. 1. The proposed BSBL-FM algorithm.

IV. EXPERIMENTS

In the experiments² we compared the proposed algorithm with state-of-the-art block-based recovery algorithms. For comparison, two BSBL algorithms, i.e., BSBL-BO and BSBL-ℓ1 [3], were used (BSBL-ℓ1 used the Group Basis Pursuit [12] in its inner loop). Besides, a variational-inference-based SBL algorithm (denoted VBGS [5]) was selected; it used its default parameters. Model-CoSaMP [2] and Block-OMP [9] (given the true sparsity) were used as benchmarks in noiseless situations, while the Group Basis Pursuit [12] was used as the benchmark in noisy situations.

The performance indexes were the normalized mean square error (NMSE) in noisy situations and the success rate in noiseless situations. The NMSE was defined as ||x̂ − x_gen||_2^2 / ||x_gen||_2^2, where x̂ was the estimate of the true signal x_gen. The success rate was defined as the percentage of successful trials among all trials (a successful trial was defined as one with NMSE ≤ 10^{-5}).

In all the experiments except for the last one, the sensing matrix was a random Gaussian matrix, generated anew in each trial of each experiment. The computer used in the experiments had a 2.5 GHz CPU and 2 GB RAM.

¹ For example, one can see this by examining the published codes of the algorithms in [5], [11].
² Available on-line: http://nudtpaper.googlecode.com/files/bsbl_fm.zip
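The AR(1) regularizations (22)-(25) can be sketched directly. The helper names below are ours, and the Toeplitz construction is written in plain NumPy instead of the MATLAB Toeplitz command:

```python
import numpy as np

def ar1_toeplitz(r, d):
    """B_i = Toeplitz([1, r, ..., r^{d-1}]), as in eqs. (22) and (25)."""
    idx = np.arange(d)
    return r ** np.abs(np.subtract.outer(idx, idx))

def estimate_r(B_hat):
    """Empirical AR(1) coefficient with the feasibility clip of eq. (23)."""
    m0 = np.mean(np.diag(B_hat))          # mean of the main diagonal
    m1 = np.mean(np.diag(B_hat, k=1))     # mean of the first sub-diagonal
    ratio = m1 / m0
    return np.sign(ratio) * min(abs(ratio), 0.99)

def shared_r(B_hats):
    """Additional average step of eq. (24): one r shared across all blocks."""
    return float(np.mean([estimate_r(B) for B in B_hats]))
```

For a B_hat that is already AR(1)-Toeplitz, `estimate_r` recovers its coefficient exactly; an infeasible ratio (|m1/m0| ≥ 0.99) is clipped to ±0.99 as in BSBL-FM(1).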
[Figure: empirical 96% phase-transition curves in the (indeterminacy δ = M/N, sparsity ρ = K/M) plane for BSBL-FM(0), BSBL-FM(1), BSBL-FM(2), BSBL-BO, VBGS, Model-CoSaMP, and Block-OMP.]

Fig. 2. Empirical 96% phase transitions of all algorithms. Each point on the plotted phase transition curve corresponds to a success rate larger than or equal to 0.96. Above the curve the success rate drops sharply.

[Figure: average NMSE and CPU time versus N ∈ {512, 768, 1024, 1280, 2048} for BSBL-FM(0), BSBL-FM(1), BSBL-FM(2), BSBL-BO, BSBL-L1, Group BP, VBGS, and the Oracle.]

Here we repeated the experiment in Section III.B of [7]³ using the same dataset, the same sensing matrix (a sparse binary matrix of size 256 × 512, with each column consisting of 12 entries of 1s at random locations), and the same block partition (d_i = 32, ∀i).

We compared our algorithm BSBL-FM with VBGS, Group-BP and BSBL-BO. All the algorithms first recovered the discrete cosine transform (DCT) coefficients θ of the recordings according to

    y = (ΦD)θ    (28)

using y and ΦD, where D was the basis of the DCT transform such that x = Dθ. Then we reconstructed the original raw FECG recordings according to x = Dθ using D and θ.

The NMSE measured on the recovered FECG recordings is shown in Fig. 4. We can see that although BSBL-FM had slightly poorer recovery accuracy than BSBL-BO, it was much faster. In fact, the ICA decomposition of the recordings recovered by BSBL-FM also presented a clean FECG (see Fig. 5), and the decomposition was almost the same as the ICA decomposition of the original recordings. In this experiment we noticed that VBGS took a long time to recover the FECG recordings and had the largest NMSE.

Fig. 5. ICA decomposition of the original dataset and the dataset recovered by BSBL-FM(0) (only the first 1000 sampling points of the datasets are shown). The fourth ICs are the extracted FECGs.

³ Available on-line: https://sites.google.com/site/researchbyzhang/bsbl

REFERENCES

[1] E. Candès and M. Wakin, "An introduction to compressive sampling," IEEE Signal Processing Magazine, vol. 25, no. 2, pp. 21–30, Mar. 2008.
[2] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, "Model-based compressive sensing," IEEE Transactions on Information Theory, vol. 56, no. 4, pp. 1982–2001, 2010.
[3] Z. Zhang and B. Rao, "Extension of SBL algorithms for the recovery of block sparse signals with intra-block correlation," IEEE Transactions on Signal Processing, vol. 61, no. 8, pp. 2009–2015, 2013.
[4] M. Yuan and Y. Lin, "Model selection and estimation in regression with grouped variables," J. R. Statist. Soc. B, vol. 68, pp. 49–67, 2006.
[5] S. D. Babacan, S. Nakajima, and M. N. Do, "Bayesian group-sparse modeling and variational inference," submitted to IEEE Transactions on Signal Processing, 2012.
[6] Z. Zhang and B. D. Rao, "Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning," IEEE Journal of Selected Topics in Signal Processing, vol. 5, no. 5, pp. 912–926, 2011.
[7] Z. Zhang, T.-P. Jung, S. Makeig, and B. Rao, "Compressed sensing for energy-efficient wireless telemonitoring of noninvasive fetal ECG via block sparse Bayesian learning," IEEE Transactions on Biomedical Engineering, vol. 60, no. 2, pp. 300–309, 2013.
[8] M. E. Tipping and A. C. Faul, "Fast marginal likelihood maximisation for sparse Bayesian models," in Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, C. M. Bishop and B. J. Frey, Eds., Key West, FL, 2003, pp. 3–6.
[9] Y. C. Eldar, P. Kuppinger, and H. Bölcskei, "Block-sparse signals: uncertainty relations and efficient recovery," IEEE Transactions on Signal Processing, vol. 58, no. 6, pp. 3042–3054, 2010.
[10] M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine," Journal of Machine Learning Research, vol. 1, pp. 211–244, 2001.
[11] S. Ji, Y. Xue, and L. Carin, "Bayesian compressive sensing," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2346–2356, 2008.
[12] E. van den Berg and M. Friedlander, "Probing the Pareto frontier for basis pursuit solutions," SIAM Journal on Scientific Computing, vol. 31, no. 2, pp. 890–912, 2008.
[13] D. Donoho and J. Tanner, "Observed universality of phase transitions in high-dimensional geometry, with implications for modern data analysis and signal processing," Philosophical Transactions of the Royal Society A, vol. 367, no. 1906, pp. 4273–4293, 2009.
[14] H. Mamaghanian, N. Khaled, D. Atienza, and P. Vandergheynst, "Compressed sensing for real-time energy-efficient ECG compression on wireless body sensor nodes," IEEE Transactions on Biomedical Engineering, vol. 58, no. 9, pp. 2456–2466, 2011.
[15] A. Hyvärinen, "Fast and robust fixed-point algorithms for independent component analysis," IEEE Transactions on Neural Networks, vol. 10, no. 3, pp. 626–634, 1999.