IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 43, NO. 1, JANUARY 1995
Projection Approximation Subspace Tracking
Bin Yang
Abstract: Subspace estimation plays an important role in a
variety of modern signal processing applications. In this paper,
we present a new approach for tracking the signal subspace
recursively. It is based on a novel interpretation of the signal
subspace as the solution of a projection-like unconstrained
minimization problem. We show that recursive least squares
techniques can be applied to solve this problem by making an
appropriate projection approximation. The resulting algorithms have
a computational complexity of $O(nr)$, where $n$ is the input vector
dimension and $r$ is the number of desired eigencomponents.
Simulation results demonstrate that the tracking capability of
these algorithms is similar to, and in some cases more robust than,
that of the computationally expensive batch eigenvalue decomposition.
Relations of the new algorithms to other subspace tracking
methods and numerical issues are also discussed.
I. INTRODUCTION
SUBSPACE-based high-resolution methods have been applied
successfully to both temporal and spatial domain
spectral analysis. Typical examples are the multiple signal
classification (MUSIC) algorithm [1], the minimum-norm method
[2], the ESPRIT estimator [3], and the weighted subspace
fitting (WSF) algorithm [4] for estimating frequencies of
sinusoids or directions of arrival (DOA) of plane waves impinging
on an antenna array. Another application is data compression
based on the Karhunen-Loève (KL) transformation, where a
sequence of data vectors is coded by their principal
components [5]. Implementations of these techniques, however, have
been based on batch eigenvalue decomposition (ED) of the
sample correlation matrix or on singular value decomposition
(SVD) of the data matrix. This approach is unsuitable for
adaptive processing because it requires repeated ED/SVD,
which is very time consuming.
In order to overcome this difficulty, a number of adaptive
algorithms for subspace tracking have been developed in the
past. Most of these techniques can be grouped into three
families. In the first one, classical batch ED/SVD methods like
the QR algorithm, Jacobi rotation, power iteration, and the Lanczos
method have been modified for use in adaptive processing
[6]-[10]. In the second family, variations of Bunch's rank-one
updating algorithm [11] such as subspace averaging [12], [13]
have been proposed. The third class of algorithms considers the
ED/SVD as a constrained or unconstrained optimization
problem. Gradient-based methods [7], [14]-[19], Gauss-Newton
iterations [20], [21], and conjugate gradient techniques [22]
can then be applied to seek the largest or smallest eigenvalues
and their corresponding eigenvectors adaptively. Recently,
rank-revealing URV decomposition [23] and rank-revealing QR
factorization [24] have been proposed to track the signal or
noise subspace.

Manuscript received July 3, 1993; revised July 4, 1994. The associate editor
coordinating the review of this paper and approving it for publication was Prof.
Henrik V. Sorensen.
The author is with the Department of Electrical Engineering, Ruhr University
Bochum, Bochum, Germany.
IEEE Log Number 9406922.

From the computational point of view, we may distinguish
between methods requiring $O(n^2 r)$, $O(n^2)$, $O(nr^2)$, or $O(nr)$
operations every update, where $n$ is the input vector dimension
and $r$ ($r < n$) is the number of desired eigencomponents. The
wide range of the computational complexity is due to the fact
that some algorithms update the complete eigenstructure, with
or without the explicit computation of the sample correlation
matrix, whereas other ones track only the signal or the noise
subspace. For example, a straightforward generalization of
the power method [25] to adapt the $r$ dominant eigenvectors
updates the sample correlation matrix at each time instant
and then applies one power iteration to it [7], [17]. This
method requires $O(n^2 r)$ operations. Stewart's URV updating
algorithm [23] tracks both the number of signal sources and
the signal and noise subspaces without an evaluation of the
sample correlation matrix. It has a computational complexity
of $O(n^2)$. The same order of computations is also required by
the parallel method proposed by Moonen et al. [10], which
updates the SVD by interlaced QR triangularization and Jacobi
rotations. The gradient-type algorithms track either the signal
or the noise subspace. They demand $O(nr)$ operations for
the gradient-ascent or gradient-descent step and additional
$O(nr^2)$ operations for the orthonormalization of the
eigenvector estimates. DeGroat [13] developed the rank-one signal
averaged (ROSA) algorithm. This method averages both the
signal and noise eigenvalues, making both subspaces spherical.
As a result, subspace tracking becomes a noniterative task and
requires only $O(nr)$ operations every update.
In this paper, we present a new approach for tracking the
signal subspace. It relies on a novel interpretation of the signal
subspace as the solution of an unconstrained minimization
problem. We first discuss some gradient-descent methods.
The rest of the paper is focused on a different approach. We
show that the minimization task simplifies to the well-known
exponentially weighted least squares problem by making an
appropriate projection approximation. Recursive least squares
(RLS) methods can then be applied to track the signal subspace
efficiently. The resulting algorithms have a computational
complexity of $O(nr)$, and we obtain a (not exactly orthonormal)
basis of the signal subspace or estimates of the signal
eigenvectors.
This paper is organized as follows. In Section II, the
signal and noise subspaces are defined and some of their
applications in signal processing are briefly described. Section
III introduces an unconstrained cost function and proves that
it has no local maxima and minima except for a unique
global minimum which corresponds to the signal subspace.
Based on this observation, various adaptive algorithms for
tracking the signal subspace are developed in Section IV.
Section V presents some simulation results to demonstrate the
applicability and the performance of these algorithms.

1053-587X/95$04.00 © 1995 IEEE

The following notations are used in this paper. Matrices and
vectors are represented by boldface and underline characters.
The superscripts $^*$, $^T$, and $^H$ denote complex conjugation,
transposition, and Hermitian transposition, respectively. $\mathbf{I}$ is
an identity matrix. $\|\cdot\|$ is the Euclidean vector norm. $E[\cdot]$
and $\mathrm{tr}[\cdot]$ denote the expectation and the trace operator. The
notations $\{a_{ij}\}$ and $\{\mathbf{A}_{ij}\}$ are used to denote matrices with
the elements $a_{ij}$ or block matrices with the blocks $\mathbf{A}_{ij}$.
Similarly, $\mathrm{diag}(d_1, \ldots, d_n)$ is a diagonal matrix consisting of
the diagonal elements $d_i$ and $\mathrm{diag}(\mathbf{D}_1, \ldots, \mathbf{D}_n)$ is a block
diagonal matrix comprising the diagonal blocks $\mathbf{D}_i$. One
operation is defined as one multiplication plus an optional
addition. For simplicity, we will make no distinction between
real and complex numbers. The inner product of two $n \times 1$
vectors (either real or complex), for example, will be said to
require $n$ operations.
II. SUBSPACE AND SUBSPACE APPLICATIONS
Let $\mathbf{x}(t) \in \mathbb{C}^n$ be the data vector observed at the
$t$th snapshot. In spatial domain spectral analysis,
$\mathbf{x}(t) = [x_1(t), \ldots, x_n(t)]^T$ consists of the samples of an array
of $n$ sensors. In time domain spectral analysis,
$\mathbf{x}(t) = [x(t), x(t-1), \ldots, x(t-n+1)]^T$ is a vector of $n$ consecutive
samples of a time series. We assume $\mathbf{x}(t)$ to be composed of
$r$ narrow-band signal waves impinging on an antenna array or
$r$ incoherent complex sinusoids corrupted by additive noise.
It is given by
$$\mathbf{x}(t) = \sum_{i=1}^{r} s_i(t)\,\mathbf{a}(\omega_i) + \mathbf{n}(t) = \mathbf{A}\,\mathbf{s}(t) + \mathbf{n}(t) \tag{1}$$
with $\mathbf{A} = [\mathbf{a}(\omega_1), \ldots, \mathbf{a}(\omega_r)]$ and $\mathbf{s}(t) = [s_1(t), \ldots, s_r(t)]^T$.
$\mathbf{A}$ is a deterministic $n \times r$ matrix and $\mathbf{s}(t)$ is a random
source vector with the nonsingular correlation matrix
$\mathbf{C}_s = E[\mathbf{s}(t)\mathbf{s}^H(t)]$. The vector
$\mathbf{a}(\omega_i) = [1, e^{j\omega_i}, \ldots, e^{j(n-1)\omega_i}]^T$ is
the steering or frequency vector. In frequency retrieval, $\omega_i$ is
the angular frequency of the $i$th sinusoid. In array processing,
$$\omega_i = 2\pi \frac{d}{\lambda} \sin\theta_i \tag{2}$$
holds when plane waves impinge on a linear uniform sensor array.
Here $d$ is the spacing between adjacent sensor elements, $\lambda$
is the wavelength, and $\theta_i$ is the DOA relative to the array
broadside. If the noise $\mathbf{n}(t)$ is spatially white (possibly after
a prewhitening step) with equal variance $\sigma^2$ and uncorrelated
with $\mathbf{s}(t)$, the observation correlation matrix is given by
$$\mathbf{C} = E[\mathbf{x}(t)\mathbf{x}^H(t)] = \mathbf{A}\mathbf{C}_s\mathbf{A}^H + \sigma^2\mathbf{I}. \tag{3}$$
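Let $\lambda_i$ and $\mathbf{u}_i$ ($i = 1, \ldots, n$) be the eigenvalues and the
corresponding orthonormal eigenvectors of $\mathbf{C}$. In matrix
notation, $\mathbf{C} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{U}^H$ with
$\boldsymbol{\Sigma} = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and
$\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_n]$. If $r$ is less than $n$, the nonincreasingly ordered
eigenvalues are given by
$$\lambda_1 \ge \cdots \ge \lambda_r > \lambda_{r+1} = \cdots = \lambda_n = \sigma^2. \tag{4}$$
The dominant eigenpairs $(\lambda_i, \mathbf{u}_i)$ for $i = 1, \ldots, r$ are termed
the signal eigenvalues and signal eigenvectors, while $(\lambda_i, \mathbf{u}_i)$
for $i = r+1, \ldots, n$ are referred to as the noise eigenvalues
and noise eigenvectors. The column spans of
$$\mathbf{U}_S = [\mathbf{u}_1, \ldots, \mathbf{u}_r], \qquad \mathbf{U}_N = [\mathbf{u}_{r+1}, \ldots, \mathbf{u}_n] \tag{5}$$
are defined as the signal and noise subspace, respectively.
It is easy to verify that $\mathbf{U}_S$ and $\mathbf{A}$ have the same column
span and that the noise subspace is the orthogonal complement
of the signal subspace. This observation motivates the use of
various subspace-based high-resolution estimators like MUSIC,
minimum-norm, ESPRIT, and WSF.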
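The structure described by (1) and (3) can be checked numerically. The following is a minimal sketch, assuming NumPy and purely illustrative values for the frequencies, the source correlation matrix, and the noise variance (none of these numbers come from the paper). It builds the correlation matrix from the model and confirms that the $n - r$ smallest eigenvalues equal the noise variance and that the noise eigenvectors are orthogonal to the columns of the steering matrix.

```python
import numpy as np

def steering_vector(omega, n):
    """a(omega) = [1, e^{j omega}, ..., e^{j (n-1) omega}]^T."""
    return np.exp(1j * omega * np.arange(n))

# Illustrative (assumed) parameters, not taken from the paper
n, r = 6, 2
sigma2 = 0.1                                  # noise variance
omegas = 2 * np.pi * np.array([0.1, 0.3])     # angular frequencies
A = np.column_stack([steering_vector(w, n) for w in omegas])  # n x r
Cs = np.diag([2.0, 1.0])                      # nonsingular source correlation

# Model correlation matrix C = A Cs A^H + sigma^2 I
C = A @ Cs @ A.conj().T + sigma2 * np.eye(n)

# eigh returns eigenvalues in ascending order; reverse to nonincreasing
eigvals, U = np.linalg.eigh(C)
eigvals, U = eigvals[::-1], U[:, ::-1]
Us, Un = U[:, :r], U[:, r:]                   # signal / noise eigenvectors

noise_eigs = eigvals[r:]                      # should all equal sigma2
leak = np.linalg.norm(Un.conj().T @ A)        # should be numerically zero
print(noise_eigs, leak)
```

Because the noise eigenvectors are orthogonal to the steering vectors, the span of `Us` coincides with the span of `A`, which is the property the subspace estimators listed above exploit.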
In data compression, we are interested in coding a sequence
of random vector samples $\{\mathbf{x}(t) \mid t = 1, \ldots, L\}$ using a minimum
amount of memory. Here $\{\mathbf{x}(t)\}$ may represent a sequence
of images or speech phonemes. The optimum solution to this
problem in the least squares sense is given by the truncated
KL expansion. After computing the SVD of the $L \times n$ data
matrix $\mathbf{X} = [\mathbf{x}(1), \ldots, \mathbf{x}(L)]^H$ or the ED of the corresponding
sample correlation matrix $\mathbf{C} = \mathbf{X}^H\mathbf{X}/L$, each sample vector
$\mathbf{x}(t) \in \mathbb{C}^n$ is coded by a lower dimensional vector
$$\mathbf{y}(t) = \mathbf{U}_S^H\,\mathbf{x}(t) \qquad (t = 1, \ldots, L) \tag{6}$$
where $\mathbf{U}_S$ contains the $r$ dominant right singular vectors of
$\mathbf{X}$ or, equivalently, the $r$ dominant eigenvectors of $\mathbf{C}$. Now
the $n \times r$ matrix $\mathbf{U}_S$ and $\{\mathbf{y}(t) \mid t = 1, \ldots, L\}$ are stored
instead of $\{\mathbf{x}(t) \mid t = 1, \ldots, L\}$. The signal reconstruction is
computed by
$$\hat{\mathbf{x}}(t) = \mathbf{U}_S\,\mathbf{y}(t). \tag{7}$$
Clearly, subspace applications are characterized by the
following features:
• Only a few eigenvectors are required. Since the input vector
  dimension $n$ is often larger than $2r$, it is more efficient
  to work with the lower dimensional signal subspace than
  with the noise subspace.
• In many applications, we do not need eigenvalues.
  Exceptions are the WSF estimator and situations in which
  the eigenvalues are used to estimate the number of signal
  sources $r$. We assume $r$ to be known in this paper.
• Sometimes it is not necessary to know the eigenvectors
  exactly. In the MUSIC, minimum-norm, or ESPRIT
  estimator, for example, the use of an arbitrary orthonormal
  basis of the signal subspace is sufficient.
The fact that we only need one small part of the eigenstructure
enables us to develop subspace tracking algorithms of reduced
computation and storage requirements.
III. A NOVEL SIGNAL SUBSPACE INTERPRETATION
Let $\mathbf{x} \in \mathbb{C}^n$ be a complex valued random vector process
with the correlation matrix $\mathbf{C} = E[\mathbf{x}\mathbf{x}^H]$. We consider the
following scalar function
$$J(\mathbf{W}) = E\|\mathbf{x} - \mathbf{W}\mathbf{W}^H\mathbf{x}\|^2 = \mathrm{tr}(\mathbf{C}) - 2\,\mathrm{tr}(\mathbf{W}^H\mathbf{C}\mathbf{W}) + \mathrm{tr}(\mathbf{W}^H\mathbf{C}\mathbf{W}\cdot\mathbf{W}^H\mathbf{W}) \tag{8}$$
with a matrix argument $\mathbf{W} \in \mathbb{C}^{n \times r}$ ($r < n$). Without loss of
generality, we assume $\mathbf{W}$ to have full rank $r$. Otherwise, if
the rank of $\mathbf{W}$ is $\tilde{r} < r$, $\mathbf{W}$ in (8) can always be replaced by
a full rank $n \times \tilde{r}$ matrix $\tilde{\mathbf{W}}$ satisfying $\tilde{\mathbf{W}}\tilde{\mathbf{W}}^H = \mathbf{W}\mathbf{W}^H$.
Note that we do not impose any other constraints on $\mathbf{W}$.
In particular, there is no restriction on the norm of $\mathbf{W}$. Hence
it does not make sense to maximize $J(\mathbf{W})$ because $J(\mathbf{W})$ is
unbounded if the elements of $\mathbf{W}$ approach infinity. We are
interested in the minimum of $J(\mathbf{W})$. We want to know:
• Is there a global minimum of $J(\mathbf{W})$?
• What is the relation between this minimum and the signal
  subspace of $\mathbf{C}$?
• Are there any other local minima of $J(\mathbf{W})$?
These questions are answered by the following theorems.
Theorem 1: $\mathbf{W}$ is a stationary point of $J(\mathbf{W})$ if and only
if $\mathbf{W} = \mathbf{U}_r\mathbf{Q}$ where $\mathbf{U}_r \in \mathbb{C}^{n \times r}$ contains any $r$ distinct
eigenvectors of $\mathbf{C}$ and $\mathbf{Q} \in \mathbb{C}^{r \times r}$ is an arbitrary unitary matrix.
At each stationary point, $J(\mathbf{W})$ equals the sum of eigenvalues
whose eigenvectors are not involved in $\mathbf{U}_r$.
Proof: See the Appendix.
Theorem 2: All stationary points of $J(\mathbf{W})$ are saddle points
except when $\mathbf{U}_r$ contains the $r$ dominant eigenvectors of $\mathbf{C}$.
In this case, $J(\mathbf{W})$ attains the global minimum.
Proof: See the Appendix.
In the following, some remarks are given to comment on
the theorems:
• Since $J(\mathbf{W})$ has a global minimum at which the column
  span of $\mathbf{W}$ equals the signal subspace and no other
  local minima, global convergence is guaranteed if one
  seeks the signal subspace of $\mathbf{C}$ by minimizing $J(\mathbf{W})$ via
  iterative methods.
• We do not impose any constraints on the orthonormality
  of the columns of $\mathbf{W}$. The two theorems show that
  minimizing $J(\mathbf{W})$ in (8) will automatically result in a
  solution $\mathbf{W}$ with orthonormal columns. This makes our
  signal subspace interpretation different from those
  in the literature, where the orthonormality $\mathbf{W}^H\mathbf{W} = \mathbf{I}$
  is often required explicitly in terms of an optimization
  constraint. Yang and Kaveh [18], for instance, proposed
  to maximize/minimize $\mathrm{tr}(\mathbf{W}^H\mathbf{C}\mathbf{W})$ subject to
  $\mathbf{W}^H\mathbf{W} = \mathbf{I}$ to seek the signal/noise subspace of $\mathbf{C}$. This is a
  constrained optimization problem. The result is that one
  has to reorthonormalize $\mathbf{W}$ or do some similar approximative
  orthonormalization [16], [19] after each update (or
  periodically) in order to force the algorithms to converge.
  In (8), we are dealing with an unconstrained minimization
  problem. The use of an iterative algorithm to minimize
  $J(\mathbf{W})$ will always converge to an orthonormal basis
  of the signal subspace without any orthonormalization
  operations during the iteration.
• It is important to note that at the global minimum of
  $J(\mathbf{W})$, $\mathbf{W}$ does not contain the signal eigenvectors.
  Instead, we obtain an arbitrary orthonormal basis of the
  signal subspace as indicated by the unitary matrix $\mathbf{Q}$
  in Theorem 1. This is not surprising because $J(\mathbf{W})$ is
  invariant with respect to rotation of the parameter space,
  i.e., $J(\mathbf{W}) = J(\mathbf{W}\mathbf{Q})$ when $\mathbf{Q}\mathbf{Q}^H = \mathbf{I}$. In other words,
  $\mathbf{W}$ is not uniquely determined when we minimize $J(\mathbf{W})$.
  The outer product $\mathbf{W}\mathbf{W}^H$, however, is unique. It equals
  the signal subspace projection matrix.
• For the simple one-vector ($r = 1$) case, the solution
  of minimizing $J(\mathbf{W})$ is given by the most dominant,
  normalized eigenvector of $\mathbf{C}$.
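The statements of Theorems 1 and 2 can be illustrated on a small example. The sketch below assumes NumPy and an arbitrary diagonal correlation matrix chosen only for illustration; the function names `cost_J` and `grad_J` are hypothetical helpers. It evaluates the trace form of (8) and the matrix $[-2\mathbf{C} + \mathbf{C}\mathbf{W}\mathbf{W}^H + \mathbf{W}\mathbf{W}^H\mathbf{C}]\mathbf{W}$ at two stationary points of the form $\mathbf{U}_r\mathbf{Q}$.

```python
import numpy as np

def cost_J(W, C):
    """Trace form of J(W): tr(C) - 2 tr(W^H C W) + tr(W^H C W W^H W)."""
    WhCW = W.conj().T @ C @ W
    WhW = W.conj().T @ W
    return np.real(np.trace(C) - 2 * np.trace(WhCW) + np.trace(WhCW @ WhW))

def grad_J(W, C):
    """[-2C + C W W^H + W W^H C] W, which vanishes at stationary points."""
    WWh = W @ W.conj().T
    return (-2 * C + C @ WWh + WWh @ C) @ W

# Illustrative correlation matrix: eigenvalues 5, 3, 1, 0.5 and
# eigenvectors equal to the unit vectors (an assumption for this demo)
C = np.diag([5.0, 3.0, 1.0, 0.5])
I4 = np.eye(4)

# An arbitrary 2 x 2 rotation plays the role of the unitary matrix Q
theta = 0.7
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

W_min = I4[:, [0, 1]] @ Q      # the two dominant eigenvectors, rotated
W_saddle = I4[:, [0, 2]] @ Q   # a non-dominant pair of eigenvectors

J_min = cost_J(W_min, C)       # sum of excluded eigenvalues: 1 + 0.5
J_saddle = cost_J(W_saddle, C) # sum of excluded eigenvalues: 3 + 0.5
print(J_min, J_saddle)
```

Both points have a vanishing gradient, but only the one built from the dominant eigenvectors attains the smaller cost, and in both cases $\mathbf{W}\mathbf{W}^H$ is independent of the rotation angle.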
IV. SUBSPACE TRACKING
In real applications with the snapshots $\mathbf{x}(t)$ ($t = 1, 2, \ldots$),
we are interested in estimating the signal subspace recursively.
Our aim is to develop efficient algorithms which compute the
signal subspace estimate at the time instant $t$ from the subspace
estimate at $t-1$ and the newly arriving sample vector $\mathbf{x}(t)$.
Starting from the above signal subspace interpretation, there
exist various possibilities to do this.
A. Gradient Methods
Since (8) describes an unconstrained cost function to be
minimized, it is straightforward to apply the gradient-descent
technique for subspace tracking. The gradient of $J(\mathbf{W})$ with
respect to $\mathbf{W}$ is given by (see Appendix)
$$\nabla J = [-2\mathbf{C} + \mathbf{C}\mathbf{W}\mathbf{W}^H + \mathbf{W}\mathbf{W}^H\mathbf{C}]\,\mathbf{W}. \tag{9}$$
The subspace update can thus be written as
$$\mathbf{W}(t) = \mathbf{W}(t-1) - \mu\,[-2\hat{\mathbf{C}}(t) + \hat{\mathbf{C}}(t)\mathbf{W}(t-1)\mathbf{W}^H(t-1) + \mathbf{W}(t-1)\mathbf{W}^H(t-1)\hat{\mathbf{C}}(t)]\,\mathbf{W}(t-1) \tag{10}$$
where $\mu > 0$ is a step size to be suitably chosen and
$\hat{\mathbf{C}}(t)$ is an estimate for the correlation matrix $\mathbf{C}$ at the time
instant $t$. We may use an exponentially weighted or a sliding
window estimate for $\hat{\mathbf{C}}(t)$. The simplest choice, however, is
the instantaneous estimate $\hat{\mathbf{C}}(t) = \mathbf{x}(t)\mathbf{x}^H(t)$ as used in the
least-mean-square (LMS) algorithm for adaptive filtering. The
resulting subspace update is given by
$$\mathbf{y}(t) = \mathbf{W}^H(t-1)\,\mathbf{x}(t),$$
$$\mathbf{W}(t) = \mathbf{W}(t-1) + \mu\,[2\mathbf{x}(t)\mathbf{y}^H(t) - \mathbf{x}(t)\mathbf{y}^H(t)\mathbf{W}^H(t-1)\mathbf{W}(t-1) - \mathbf{W}(t-1)\mathbf{y}(t)\mathbf{y}^H(t)]. \tag{11}$$
A careful counting of the operations required shows that this
algorithm has a computational complexity of $O(nr)$.
We note that a further simplification of the above algorithm
can be achieved by approximating $\mathbf{W}^H(t-1)\mathbf{W}(t-1)$ in
(11) by the identity matrix $\mathbf{I}$:
$$\mathbf{W}(t) = \mathbf{W}(t-1) + \mu\,[\mathbf{x}(t) - \mathbf{W}(t-1)\mathbf{y}(t)]\,\mathbf{y}^H(t). \tag{12}$$
This approximation is justified by our observation that $\mathbf{W}(t)$
will converge to a matrix with orthonormal (when $\mu = \mu(t) \to 0$
as $t \to \infty$) or nearly orthonormal (when $\mu$ is a small constant)
columns for stationary signals. If we consider the one-vector
($r = 1$) case of (12), we obtain
$$\mathbf{w}(t) = \mathbf{w}(t-1) + \mu\,[\mathbf{x}(t) - \mathbf{w}(t-1)y(t)]\,y^*(t) \tag{13}$$
with $y(t) = \mathbf{w}^H(t-1)\mathbf{x}(t)$. This update formula is identical to
the Oja learning rule [15], [19] designed for extracting the first
principal component by means of a single linear unit neural
network.
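As an illustration of the simplified gradient update, the following minimal sketch (assuming NumPy; the function name `gradient_subspace_update` and the numbers in the usage lines are hypothetical) performs one step of $\mathbf{W} \leftarrow \mathbf{W} + \mu\,(\mathbf{x} - \mathbf{W}\mathbf{y})\,\mathbf{y}^H$ with $\mathbf{y} = \mathbf{W}^H\mathbf{x}$. With a single column it reduces to the Oja-type rule for one vector.

```python
import numpy as np

def gradient_subspace_update(W, x, mu):
    """One simplified gradient step: W + mu * (x - W y) y^H, y = W^H x.

    W  : (n, r) current subspace estimate
    x  : (n,)   new snapshot
    mu : step size (must be tuned by the user; no value is prescribed here)
    """
    y = W.conj().T @ x                 # projection coefficients, shape (r,)
    e = x - W @ y                      # part of x not explained by W
    return W + mu * np.outer(e, y.conj())

# One hand-checkable step for r = 1 (illustrative numbers)
W0 = np.array([[1.0], [0.0]])
x0 = np.array([1.0, 1.0])
W1 = gradient_subspace_update(W0, x0, mu=0.1)
print(W1)
```

Here $y = 1$ and the error vector is $[0, 1]^T$, so only the second entry of the column changes, by $\mu$. Each step costs one matrix-vector product with $\mathbf{W}^H$, one with $\mathbf{W}$, and one rank-one update, which is consistent with the $O(nr)$ count stated above.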
B. PAST and PASTd Algorithms
Although the applicability of both gradient-based subspace
update schemes (11) and (12) has been verified in our
simulations (not reported in this paper), it is not the aim of this
paper to study them in detail. Our main interest in this work
is focused on a different approach, the so-called projection
approximation subspace tracking (PAST) approach [26]. Replacing the
expectation in (8) with the exponentially weighted sum yields
$$J(\mathbf{W}(t)) = \sum_{i=1}^{t} \beta^{t-i}\,\|\mathbf{x}(i) - \mathbf{W}(t)\mathbf{W}^H(t)\mathbf{x}(i)\|^2 = \mathrm{tr}[\mathbf{C}(t)] - 2\,\mathrm{tr}[\mathbf{W}^H(t)\mathbf{C}(t)\mathbf{W}(t)] + \mathrm{tr}[\mathbf{W}^H(t)\mathbf{C}(t)\mathbf{W}(t)\mathbf{W}^H(t)\mathbf{W}(t)]. \tag{14}$$
All sample vectors available in the time interval $1 \le i \le t$ are
involved in estimating the signal subspace at the time instant
$t$. The use of the forgetting factor $0 < \beta \le 1$ is intended to
ensure that data in the distant past are downweighted in order
to afford the tracking capability when the system operates in a
nonstationary environment. $\beta = 1$ corresponds to the growing
sliding window case. The effective window length for $\beta < 1$
is $1/(1-\beta)$ when $t \gg 1$.
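The effective window length quoted above is simply the limit of the sum of the exponential weights $\beta^{t-i}$. A short sketch in plain Python (the function name is a hypothetical helper and the value of $\beta$ is only an example) makes this concrete:

```python
def effective_window_length(beta, t):
    """Sum of the weights beta^(t-i) for i = 1, ..., t."""
    return sum(beta ** (t - i) for i in range(1, t + 1))

beta = 0.97                       # example forgetting factor
length = effective_window_length(beta, 2000)
print(length, 1.0 / (1.0 - beta)) # both are close to 33.3
```

For $\beta = 1$ every weight equals one and the sum grows linearly with $t$, which is the growing sliding window case mentioned in the text.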
$J(\mathbf{W}(t))$ in (14) is obviously identical to $J(\mathbf{W})$ in (8)
except for the use of the exponentially weighted sample
correlation matrix
$$\mathbf{C}(t) = \sum_{i=1}^{t} \beta^{t-i}\,\mathbf{x}(i)\mathbf{x}^H(i) = \beta\,\mathbf{C}(t-1) + \mathbf{x}(t)\mathbf{x}^H(t) \tag{15}$$
instead of $\mathbf{C} = E[\mathbf{x}\mathbf{x}^H]$. Therefore both theorems in Section
III also hold for $J(\mathbf{W}(t))$. In other words, the columns of $\mathbf{W}(t)$
which minimizes $J(\mathbf{W}(t))$ form an orthonormal basis of the
signal subspace spanned by the $r$ dominant eigenvectors of
$\mathbf{C}(t)$.
$J(\mathbf{W}(t))$ is a fourth-order function of the elements of $\mathbf{W}(t)$.
Iterative algorithms are thus necessary to minimize $J(\mathbf{W}(t))$.
The key idea of the PAST approach is to approximate
$\mathbf{W}^H(t)\mathbf{x}(i)$ in (14), the unknown projection of $\mathbf{x}(i)$ onto the
columns of $\mathbf{W}(t)$, by the expression $\mathbf{y}(i) = \mathbf{W}^H(i-1)\mathbf{x}(i)$,
which can be calculated for $1 \le i \le t$ at the time instant $t$.
This results in a modified cost function
$$J'(\mathbf{W}(t)) = \sum_{i=1}^{t} \beta^{t-i}\,\|\mathbf{x}(i) - \mathbf{W}(t)\mathbf{y}(i)\|^2 \tag{16}$$
which is quadratic in the elements of $\mathbf{W}(t)$.
This projection approximation, hence the name PAST,
changes the error performance surface of $J(\mathbf{W}(t))$. For
stationary or slowly varying signals, however, the difference
between $\mathbf{W}^H(t)\mathbf{x}(i)$ and $\mathbf{W}^H(i-1)\mathbf{x}(i)$ is small, in particular
when $i$ is close to $t$. This difference may be larger in the distant
past with $i \ll t$. But the contribution of past data to the cost
function is decreasing for growing $t$. We therefore expect
$J'(\mathbf{W}(t))$ to be a good approximation for $J(\mathbf{W}(t))$ and the
matrix $\mathbf{W}(t)$ minimizing $J'(\mathbf{W}(t))$ to be a good estimate for
the signal subspace of $\mathbf{C}(t)$. In case of sudden parameter
changes, the algorithms derived from the PAST approach also
converge, as will be illustrated in some numerical experiments.

TABLE I
THE PAST ALGORITHM FOR TRACKING THE SIGNAL SUBSPACE

Choose $\mathbf{P}(0)$ and $\mathbf{W}(0)$ suitably
FOR $t = 1, 2, \ldots$ DO
    $\mathbf{y}(t) = \mathbf{W}^H(t-1)\,\mathbf{x}(t)$
    $\mathbf{h}(t) = \mathbf{P}(t-1)\,\mathbf{y}(t)$
    $\mathbf{g}(t) = \mathbf{h}(t) / [\beta + \mathbf{y}^H(t)\mathbf{h}(t)]$
    $\mathbf{e}(t) = \mathbf{x}(t) - \mathbf{W}(t-1)\,\mathbf{y}(t)$
    $\mathbf{W}(t) = \mathbf{W}(t-1) + \mathbf{e}(t)\,\mathbf{g}^H(t)$
    $\mathbf{P}(t) = \frac{1}{\beta}\,\mathrm{Tri}\{\mathbf{P}(t-1) - \mathbf{g}(t)\,\mathbf{h}^H(t)\}$

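The main advantage of the PAST approach is the
exponentially weighted least squares criterion (16), which is well
studied in adaptive filtering. $J'(\mathbf{W}(t))$ is minimized if
$$\mathbf{W}(t) = \mathbf{C}_{xy}(t)\,\mathbf{C}_{yy}^{-1}(t), \tag{17}$$
$$\mathbf{C}_{xy}(t) = \sum_{i=1}^{t} \beta^{t-i}\,\mathbf{x}(i)\mathbf{y}^H(i) = \beta\,\mathbf{C}_{xy}(t-1) + \mathbf{x}(t)\mathbf{y}^H(t), \tag{18}$$
$$\mathbf{C}_{yy}(t) = \sum_{i=1}^{t} \beta^{t-i}\,\mathbf{y}(i)\mathbf{y}^H(i) = \beta\,\mathbf{C}_{yy}(t-1) + \mathbf{y}(t)\mathbf{y}^H(t). \tag{19}$$
A recursive computation of the $n \times r$ matrix $\mathbf{C}_{xy}(t)$ and the
$r \times r$ matrix $\mathbf{C}_{yy}(t)$ requires $O(nr)$ and $O(r^2)$ operations.
The computation of $\mathbf{W}(t)$ from $\mathbf{C}_{xy}(t)$ and $\mathbf{C}_{yy}(t)$ demands
additional $O(nr^2) + O(r^3)$ operations. A more efficient and
numerically more robust way is to apply the matrix inversion
lemma to compute the inverse of $\mathbf{C}_{yy}(t)$ or to use the QR
updating technique to calculate the Cholesky factor of $\mathbf{C}_{yy}(t)$
recursively. This results in various RLS algorithms [27], [28].
Since these algorithms are known to most readers, we just use
one of them for updating $\mathbf{W}(t)$ without derivations.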
Table I summarizes the so-called PAST algorithm for tracking
the signal subspace. The operator $\mathrm{Tri}\{\cdot\}$ indicates that
only the upper (or lower) triangular part of $\mathbf{P}(t) = \mathbf{C}_{yy}^{-1}(t)$
is calculated and its Hermitian transposed version is copied to
the other lower (or upper) triangular part. This RLS scheme
reduces the number of operations and preserves the Hermitian
symmetry of $\mathbf{P}(t)$ in the presence of rounding errors.
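The steps listed in Table I translate almost line by line into code. The following is a minimal sketch, assuming NumPy; the function names, the noise-free test signal, and all numerical values are illustrative assumptions rather than the author's implementation. The $\mathrm{Tri}\{\cdot\}$ operator is replaced here by an explicit symmetrization, which serves the same purpose of keeping $\mathbf{P}(t)$ Hermitian but does not save operations.

```python
import numpy as np

def past_update(W, P, x, beta=0.97):
    """One PAST iteration following the steps of Table I."""
    y = W.conj().T @ x                          # y(t) = W^H(t-1) x(t)
    h = P @ y                                   # h(t) = P(t-1) y(t)
    g = h / (beta + np.vdot(y, h))              # g(t) = h / (beta + y^H h)
    e = x - W @ y                               # e(t) = x(t) - W(t-1) y(t)
    W_new = W + np.outer(e, g.conj())           # W(t) = W(t-1) + e g^H
    P_new = (P - np.outer(g, h.conj())) / beta  # P(t) before symmetrization
    P_new = (P_new + P_new.conj().T) / 2        # stands in for Tri{.}
    return W_new, P_new

def run_past_demo(n=6, steps=1000, beta=0.97):
    """Track the span of two steering vectors from noise-free snapshots.

    The frequencies and source waveforms below are assumptions chosen
    only to obtain a deterministic, easily checked example.
    """
    k = np.arange(n)
    A = np.column_stack([np.exp(2j * np.pi * f * k) for f in (0.1, 0.3)])
    r = A.shape[1]
    W = np.eye(n, r, dtype=complex)             # leading unit vectors
    P = np.eye(r, dtype=complex)                # identity matrix
    for t in range(1, steps + 1):
        s = np.array([np.exp(0.3j * t), np.exp(1.1j * t)])
        W, P = past_update(W, P, A @ s, beta)
    # Relative part of W lying outside the true signal subspace span(A)
    proj = A @ np.linalg.pinv(A)
    residual = np.linalg.norm(W - proj @ W) / np.linalg.norm(W)
    return W, residual

W_est, residual = run_past_demo()
print(residual)
```

In this noise-free setting the component of the estimate outside the span of the steering vectors decays with the forgetting factor, so the printed residual should be very small. With noisy data one would instead compare against the dominant eigenvectors of the exponentially weighted sample correlation matrix.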
The PAST algorithm requires $3nr + O(r^2)$ operations every
update. One reason for this low computational complexity
is that all columns of $\mathbf{W}(t)$ share a common gain vector
$\mathbf{g}(t) = \mathbf{C}_{yy}^{-1}(t)\,\mathbf{y}(t)$. Once it has been computed via $O(r^2)$
operations, the $n \times r$ matrix $\mathbf{W}(t)$ can be updated by a rank-one
modification using $O(nr)$ operations.
The initial values P(0) and W( 0) have to be chosen
suitably. P( 0) must be a Hermitian positive definite matrix.
W( 0) should contain r orthonormal vectors. Both matrices can
be calculated from an initial block of data or from arbitrary
initial data. The simplest way, however, is to set P(0) to the
r x r identity matrix and the columns of W(0) to the r leading
unit vectors of the n x n identity matrix. The choice of these
initial values affects the transient behavior but not the steady
state performance of the algorithm.
We note that the PAST algorithm is derived by minimizing
the modified cost function $J'(\mathbf{W}(t))$ in (16) instead of
the original one $J(\mathbf{W}(t))$ in (14). Hence, the columns of
$\mathbf{W}(t)$ are not exactly orthonormal. The deviation from
orthonormality depends on the signal-to-noise ratio (SNR)
and the forgetting factor $\beta$. This lack of orthonormality does
not mean that we need a reorthonormalization of $\mathbf{W}(t)$ after
each update. In many subspace tracking algorithms, such
an orthonormalization (or approximative orthonormalization)
at the expense of $O(nr^2)$ operations is required to prevent
an unlimited growth of $\mathbf{W}(t)$ in norm and to prevent the
columns of $\mathbf{W}(t)$ from converging to the same eigenvector.
In the PAST algorithm, this is not necessary because the
postmultiplication of $\mathbf{C}_{xy}(t)$ by $\mathbf{C}_{yy}^{-1}(t)$ in (17) behaves just like
an approximative orthonormalization. The difference is that
this operation can be performed by RLS schemes and requires
only $O(nr) + O(r^2)$ operations. In the simulations in Section V,
we will see that the deviation of $\mathbf{W}(t)$ from orthonormality is
small in the PAST algorithm. For the growing sliding window
case ($\beta = 1$), $\mathbf{W}(t)$ will converge to a matrix with exactly
orthonormal columns under stationary signal conditions.
For the PAST algorithm, the necessity of orthonormalization
depends solely on the postprocessing method which uses
the signal subspace estimate to extract the desired signal
information. If we are using the ESPRIT method for
calculating DOAs or frequencies from the signal subspace, no
orthonormalization is necessary. If we are using the MUSIC or
minimum-norm estimator, for which an orthonormal basis of
the signal subspace is required, $\mathbf{W}(t)$ has to be
reorthonormalized. In the block processing mode (e.g., estimation and data
compression), we only need to orthonormalize $\mathbf{W}(t)$ after its
last update, not at each intermediate time instant.
In the following, we present a second subspace tracking
algorithm derived from the PAST approach. It is based on
the deflation technique and is referred to as the PASTd
algorithm. The basic idea of the deflation technique is the
sequential estimation of the principal components [18], [21].
First the most dominant eigenvector is updated by applying the
PAST algorithm with $r = 1$. Then we remove the projection
of the current data vector $\mathbf{x}(t)$ onto this eigenvector from
$\mathbf{x}(t)$ itself. Because now the second dominant eigenvector
becomes the most dominant one in the updated data vector,
it can be extracted in the same way as before. Applying
this procedure repeatedly, all desired eigencomponents are
estimated sequentially.
TABLE II
THE PASTd ALGORITHM FOR TRACKING THE SIGNAL SUBSPACE

Table II summarizes the PASTd algorithm. Actually, the
main body within the second FOR loop corresponds to the
PAST algorithm in the one-vector case $r = 1$. The quantity
$d_i(t)$ plays the same role as the $r \times r$ matrix $\mathbf{C}_{yy}(t) = \mathbf{P}^{-1}(t)$
in Table I, and the gain vector $\mathbf{g}(t) = \mathbf{C}_{yy}^{-1}(t)\,\mathbf{y}(t)$ there
becomes a scalar gain $y_i(t)/d_i(t)$ in the deflation case. The last
equation in Table II describes the deflation step. It subtracts the
component of $\mathbf{x}_i(t)$ along the direction of the $i$th eigenvector
$\mathbf{w}_i(t)$ from $\mathbf{x}_i(t)$.
The PASTd algorithm requires $4nr + O(r)$ operations per
update. The $O(r^2)$ term is missing from the computational
complexity because no $r \times r$ matrices have to
be computed. In contrast to the PAST algorithm, this method
enables the explicit computation of the eigencomponents. To
be more specific, $\mathbf{w}_i(t)$ is an estimate of the $i$th eigenvector
of $\mathbf{C}(t)$ and $d_i(t)$ is an exponentially weighted estimate of the
corresponding eigenvalue. These eigenvalues may be used to
estimate the number of signals $r$ if it is not known a priori
[29], [30]. On the other hand, the deflation technique causes a
stronger loss of orthonormality between the $\mathbf{w}_i(t)$ and a slightly
increased computational complexity if $n \gg r$.
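The deflation procedure described above can be sketched as follows. This is a minimal illustration assuming NumPy, not a transcription of Table II: the per-component update is the one-vector PAST recursion with the scalar gain $y_i(t)/d_i(t)$, followed by the deflation step that subtracts the component along the updated $\mathbf{w}_i(t)$. The function names, the test signal, and the parameter values are hypothetical.

```python
import numpy as np

def pastd_update(W, d, x, beta=0.97):
    """One PASTd iteration (sketch): sequential one-vector updates.

    W : (n, r) eigenvector estimates, d : (r,) eigenvalue estimates.
    """
    W = W.copy()
    d = d.copy()
    xi = x.astype(complex)                     # x_1(t) = x(t)
    for i in range(W.shape[1]):
        w = W[:, i].copy()
        y = np.vdot(w, xi)                     # y_i = w_i^H(t-1) x_i
        d[i] = beta * d[i] + abs(y) ** 2       # weighted power of y_i
        e = xi - w * y                         # a priori error
        W[:, i] = w + e * (np.conj(y) / d[i])  # scalar-gain update
        xi = xi - W[:, i] * y                  # deflation with w_i(t)
    return W, d

def run_pastd_demo(n=6, steps=1000, beta=0.97):
    """Noise-free demo with two assumed complex sinusoidal sources."""
    k = np.arange(n)
    A = np.column_stack([np.exp(2j * np.pi * f * k) for f in (0.1, 0.3)])
    r = A.shape[1]
    W = np.eye(n, r, dtype=complex)            # leading unit vectors
    d = np.ones(r)                             # initial eigenvalue estimates
    for t in range(1, steps + 1):
        s = np.array([np.exp(0.3j * t), np.exp(1.1j * t)])
        W, d = pastd_update(W, d, A @ s, beta)
    proj = A @ np.linalg.pinv(A)
    residual = np.linalg.norm(W - proj @ W) / np.linalg.norm(W)
    return W, d, residual

W_d, d_est, residual_d = run_pastd_demo()
print(d_est, residual_d)
```

No $r \times r$ matrix appears anywhere in the loop, which is why the operation count has no $O(r^2)$ term. The entries of `d_est` are the exponentially weighted eigenvalue estimates referred to in the text.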
It is interesting to compare the PAST and PASTd algorithms
with the gradient methods described in the previous subsection.
For convenience, we consider the one-vector case ($r = 1$). Both
PAST and PASTd algorithms then simplify to
$$\mathbf{w}(t) = \mathbf{w}(t-1) + \frac{1}{d(t)}\,[\mathbf{x}(t) - \mathbf{w}(t-1)y(t)]\,y^*(t) \tag{20}$$
with $y(t) = \mathbf{w}^H(t-1)\mathbf{x}(t)$ and $d(t) = \beta\,d(t-1) + |y(t)|^2$.
A comparison with the Oja learning rule (13) shows that both
equations are identical except for the step size. While the Oja
learning rule uses a fixed step size $\mu$ which needs careful
tuning, (20) implies a time-varying, self-tuning step size
$1/d(t)$. Because $d(t)$ is an exponentially weighted estimate of
the corresponding eigenvalue, (20) may also be interpreted as a
gradient method with a power-normalized step size. The results
are superior convergence properties and improved robustness
with respect to the conditioning of the input signal.
C. Some Modifications of the PAST Algorithm
In this subsection, we briefly discuss two modifications of
the PAST algorithm.
Equation (14) describes an exponentially weighted sum. For
sudden signal parameter changes, the use of a sliding window
version of the cost function may offer faster tracking. Following
the same idea of the PAST approach, we approximate
$$J(\mathbf{W}(t)) = \sum_{i=t-l+1}^{t} \|\mathbf{x}(i) - \mathbf{W}(t)\mathbf{W}^H(t)\mathbf{x}(i)\|^2 \tag{21}$$
by
$$J'(\mathbf{W}(t)) = \sum_{i=t-l+1}^{t} \|\mathbf{x}(i) - \mathbf{W}(t)\mathbf{y}(i)\|^2 \tag{22}$$
with $\mathbf{y}(i) = \mathbf{W}^H(i-1)\mathbf{x}(i)$. $l > 0$ is the length of the sliding
window. $\mathbf{x}(i)$ is assumed to be $\mathbf{0}$ for $i \le 0$ (prewindowing).
The solution of minimizing $J'(\mathbf{W}(t))$ is given by
$$\mathbf{W}(t) = \mathbf{C}_{xy}(t)\,\mathbf{C}_{yy}^{-1}(t), \tag{23}$$
$$\mathbf{C}_{xy}(t) = \sum_{i=t-l+1}^{t} \mathbf{x}(i)\mathbf{y}^H(i) = \mathbf{C}_{xy}(t-1) + \mathbf{x}(t)\mathbf{y}^H(t) - \mathbf{x}(t-l)\mathbf{y}^H(t-l), \tag{24}$$
$$\mathbf{C}_{yy}(t) = \sum_{i=t-l+1}^{t} \mathbf{y}(i)\mathbf{y}^H(i) = \mathbf{C}_{yy}(t-1) + \mathbf{y}(t)\mathbf{y}^H(t) - \mathbf{y}(t-l)\mathbf{y}^H(t-l). \tag{25}$$
$\mathbf{W}(t)$ can be calculated recursively by applying the
matrix inversion lemma twice to (25). This results in a sliding
window PAST algorithm requiring $5nr + O(r^2)$ operations
per update. Since its derivation is straightforward, we omit an
explicit summary of this algorithm.
A second variation of the PAST algorithm arises when the
applications considered need a perfectly orthonormal basis of
the signal subspace. In this case, we have to reorthonormalize
$\mathbf{W}(t)$ after each update. This means that the PAST algorithm
computes $\mathbf{W}(t) = \mathrm{Ortho}[\mathbf{C}_{xy}(t)\mathbf{C}_{yy}^{-1}(t)]$ at each time
instant, where the matrices $\mathbf{C}_{xy}(t)$ and $\mathbf{C}_{yy}(t)$ are defined
in (18) and (19) and $\mathrm{Ortho}[\cdot]$ stands for any procedure to
orthonormalize the columns of its argument. We know that
the postmultiplication of $\mathbf{C}_{xy}(t)$ by $\mathbf{C}_{yy}^{-1}(t)$ does not change
its column span at all. We can thus omit $\mathbf{C}_{yy}^{-1}(t)$ and define
$\mathbf{W}(t)$ to be $\mathrm{Ortho}[\mathbf{C}_{xy}(t)]$. This idea leads to the modified
PAST algorithm
$$\mathbf{y}(t) = \mathbf{W}^H(t-1)\,\mathbf{x}(t), \tag{26}$$
$$\mathbf{C}_{xy}(t) = \beta\,\mathbf{C}_{xy}(t-1) + \mathbf{x}(t)\mathbf{y}^H(t), \tag{27}$$
$$\mathbf{W}(t) = \mathrm{Ortho}[\mathbf{C}_{xy}(t)]. \tag{28}$$
It requires $O(nr^2)$ operations because of the orthonormalization.
Unfortunately, we found that the modified PAST algorithm
offers poor numerical properties in some situations. The reason
is the bad conditioning of the matrix $\mathbf{C}_{xy}(t)$ when the signal
DOAs or frequencies are closely spaced or the SNR is low.
This in turn causes an increased sensitivity of the recursion
for updating $\mathbf{C}_{xy}(t)$ in (27) with respect to rounding errors.
In contrast, the PAST algorithm is based on the recursive
computation of the matrix $\mathbf{W}(t)$ defined as $\mathbf{C}_{xy}(t)\mathbf{C}_{yy}^{-1}(t)$. The
postmultiplication of $\mathbf{C}_{xy}(t)$ by $\mathbf{C}_{yy}^{-1}(t)$ forces the columns
of $\mathbf{W}(t)$ to be nearly orthonormal, resulting in good
conditioning. Updating $\mathbf{C}_{xy}(t)\mathbf{C}_{yy}^{-1}(t)$ is hence less sensitive
to rounding errors and the corresponding PAST algorithm
provides more robust subspace estimates. Nevertheless, the
modified PAST algorithm is useful in tracking applications
with well separated signal sources and high SNR.
V. SIMULATIONS
In the following, we show some simulation results to
demonstrate the applicability and the performance of the
PAST and PASTd algorithms. We test them in two subspace
applications, DOA tracking and image compression. Though
we have also done experiments with a large number of
other subspace tracking algorithms, it is not the aim of this
paper to give a comparative study. The large number of
subspace tracking algorithms available in the literature and
the extreme difficulty in comparing their convergence rate,
steady state performance, accuracy, and robustness from a
theoretical point of view are two reasons which make a
dedicated publication necessary. The only reference used in
this paper for comparison is the exact ED. After the sample
correlation matrix $\mathbf{C}(t)$ has been updated via (15), its complete
ED is computed by a standard batch method for obtaining the
signal subspace.
A. DOA Tracking
We used a linear uniform array with $n = 9$ sensors.
The distance between adjacent sensor elements is half of the
wavelength. Hence, (2) simplifies to $2\pi f_i = \omega_i = \pi \sin\theta_i$,
where $\theta_i$ is the DOA of the $i$th signal source and $f_i$ with
$|f_i| \le 1/2$ is the corresponding frequency. For simplicity,
we work with the frequency $f_i$ instead of $\theta_i$. For a given
number of sources $r$, $f_i$ ($i = 1, \ldots, r$), and SNR, a sequence
of snapshots $\mathbf{x}(t)$ ($t = 1, 2, \ldots$) is generated according to the
signal model (1) and used to track the signal subspace. After
each subspace update, we apply TLS-ESPRIT [3] to compute
$f_i$ from the signal subspace estimate. For the startup, we do
not use any a priori information. The initial values, $\mathbf{P}(0)$ and
$\mathbf{W}(0)$ in the PAST algorithm, $\mathbf{W}(0)$ in the PASTd algorithm,
and $\mathbf{C}(0)$ for the ED are set to identity matrices or their leading
submatrices. The initial values of the eigenvalue estimates
$d_i(0)$ in the PASTd algorithm are chosen to be one. We
note that throughout the experiments no orthonormalization
of $\mathbf{W}(t)$ is used unless specified explicitly.
In the first experiment, two sources have linearly time
varying frequency tracks. They start at 0.2 and 0.3, cross
at 0.25, and finish at 0.3 and 0.2 over a span of 1000
snapshots. They have an SNR of 5 dB. The third source has a
constant DOA with f =0.2 and 0 dB SNR. The forgetting
factor is set equal to 0.97. Fig. 1 shows the simulation
results. It contains four plots labeled (a) to (d).
Fig. 1. Tracking slowly varying and crossing signals. Panels (a) to (d) are plotted versus time; the legends of (b) and (d) list PAST, PASTd, PAST with reorthonormalization, and PASTd with reorthonormalization.
Fig. 1(a) shows the results obtained by updating the signal subspace
via ED. The dotted lines give the true frequency tracks. As
expected, the algorithm fails to work when both signals cross.
Fig. 1(b) depicts the results obtained by the PAST and PASTd
algorithms. It also shows the results of both algorithms with a
reorthonormalization of $W(t)$ after each update. We observe
that the difference between the four algorithms in Fig. 1(b) is
negligible. In particular, the signal subspace estimated by the
PAST and PASTd algorithms is as good as that given by the
PAST and PASTd algorithms with reorthonormalization. This
demonstrates that no orthonormalization is necessary during
the subspace tracking. In addition, the PAST and PASTd
algorithms provide much more robust estimates than ED.
This phenomenon can be explained as follows. The sample
correlation matrix C( t ) contains both signal and noise infor
mation. While computing its ED, there is a need to separate
them. Hence, the accuracy of the estimated signal subspace
strongly depends on the separation between the signal and the
noise subspace as represented by the gap between the smallest
signal eigenvalue $\lambda_r$ and the largest noise eigenvalue $\lambda_{r+1}$.
If the signals get closer, $\lambda_r$ approaches the level of $\lambda_{r+1}$.
To be more specific, Lee [31] has shown that $\lambda_r - \lambda_{r+1}$ is
asymptotically proportional to $\delta\theta^2$ as the DOA separation $\delta\theta$
of two sources approaches zero. From the numerical point
of view, eigenvectors with closely spaced eigenvalues are
known to be more sensitive to perturbations like rounding
errors. This results in bad subspace and DOA estimates. In
the PAST and PASTd algorithms, we do not form the sample
correlation matrix $C(t)$. Instead, $W(t)$ is updated recursively
from $W(t-1)$. Hence, there is no explicit need to separate the
signal subspace from the noise one. The accuracy of the current
subspace estimate is affected not only by the eigenvalue gap
but also by the previous subspace estimates. The result is an
improved robustness in resolving closely spaced signals.
For verification, we computed all ($r = 3$) principal angles
between the subspaces [25] spanned by the columns of $W(t)$
and of the matrix $A$ in the signal model (1). The principal
angles are zero if the subspaces compared are identical.
Fig. 1(c) and (d) confirm the argument above. For closely
spaced signals, the signal subspace estimate via ED deviates
substantially from the true one while the PAST and PASTd
algorithms work satisfactorily.
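The principal angles used for this verification can be computed as in the following sketch. It assumes the standard QR-plus-SVD procedure (orthonormalize both bases, then take the arccosines of the singular values of their cross product); the function name is hypothetical and the routine is not taken from the paper.

```python
import numpy as np

def principal_angles(W, A):
    """Principal angles in radians, in ascending order, between span(W) and span(A)."""
    Qw, _ = np.linalg.qr(W)      # orthonormal basis of span(W)
    Qa, _ = np.linalg.qr(A)      # orthonormal basis of span(A)
    cosines = np.linalg.svd(Qw.conj().T @ Qa, compute_uv=False)
    return np.arccos(np.clip(cosines, -1.0, 1.0))
```

For example, any `W = A @ M` with an invertible `M` spans the same subspace as `A`, so all angles are zero up to rounding. Because the arccosine is ill-conditioned near one, very small angles are only resolved to roughly the square root of the machine precision with this formulation.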
The second experiment compares the performance of the
algorithms in tracking weak signals. Two signals have constant
DOAs with $f_1 = 0$, $f_2 = 0.2$, and 15 dB SNR. The forgetting
factor $\beta$ is set to 0.97. The DOA estimates in Fig. 2(a) and
the principal angles in Fig. 2(b) reveal the poor performance
of ED again. In contrast, the PASTd algorithm (as well as
the PAST algorithm, which is not shown for clarity) is able to
capture the DOAs after a certain number of snapshots.
The explanation for this observation is similar to that for the first
experiment. A lower SNR leads to a smaller eigenvalue gap
between the signal and the noise subspace, which in turn makes
an accurate estimation of the signal subspace via ED more
difficult.
Fig. 2. Tracking weak signals. Panels (a) and (b) are plotted versus time and compare ED and PASTd.
In the third experiment, we study the convergence behavior
of the PAST and PASTd algorithms in signal scenarios with
sudden parameter changes. While one DOA is constant at
$f = 0.2$, the other one changes from $f = 0$ to $f = 0.1$ at the time
instant $t = 100$. The SNR is 5 dB. We see in Fig. 3 that the PAST
algorithm has a lower convergence rate than the ED when
using the same forgetting factor. This is easily understood from
the projection approximation approach. $W(t)$ minimizing the
cost function $J(W(t))$ in (14) has the same convergence rate
as when computed via ED because both methods calculate the
signal subspace from the same sample correlation matrix $C(t)$.
The projection approximation, i.e., replacing $W^H(t)x(i)$ by
$W^H(i-1)x(i)$, adds additional memory for past data to the
PAST algorithm. It hence causes an increased transition time
in tracking parameter changes. In such situations, the sliding
window PAST algorithm is more appropriate. Fig. 3 shows
that the sliding window PAST algorithm with an equivalent
window length $l = 33 \approx 1/(1-\beta)$ converges much faster
than the exponential window algorithm.
Fig. 3. Tracking sudden DOA changes. Panels (a) and (b) are plotted versus time; the legend includes the sliding window PAST algorithm with $l = 33$.
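As a brief check of the equivalent window length quoted above, assume the common convention that the effective memory of an exponential window equals the sum of its weights. Under that assumption:

```latex
\sum_{k=0}^{\infty} \beta^{k} = \frac{1}{1-\beta},
\qquad
\beta = 0.97 \;\Rightarrow\; \frac{1}{1-0.97} \approx 33.3 ,
```

which rounds to the sliding window length $l = 33$ used in this experiment.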
In the last experiment, the deviation of $W(t)$ from orthonormality
is investigated. For this purpose, three signals
with $f = 0.2$, 0, and 0.1 are used. In Fig. 4, the Frobenius
norm $\|W^H(t)W(t) - I\|_F$ is computed for different values
of SNR and of the forgetting factor $\beta$. In general, a larger
SNR and a larger forgetting factor (a longer window) will lead
to a smaller deviation from orthonormality. In the growing
sliding window case ($\beta = 1$), $W(t)$ seems to converge to a
matrix with perfectly orthonormal columns. The explanation
is a decreasing difference between the original cost function
$J(W(t))$ in (14) and the modified one $J'(W(t))$ in (16)
for growing $t$. Minimizing $J'(W(t))$ is hence asymptotically
equivalent to minimizing $J(W(t))$, whose solution $W(t)$ is
known to have orthonormal columns. If $\beta < 1$, we have a
finite window length and the deviation from orthonormality
reaches a nonzero steady state value. Recall that a similar
convergence behavior can also be found in RLS algorithms for
adaptive filtering [27] where the misadjustment is proportional
to $(1 - \beta)/(1 + \beta)$.
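The deviation measure used in this experiment is simple to evaluate. The helper below is a small sketch with a hypothetical function name; it computes the Frobenius norm of $W^H W - I$ for a given basis estimate.

```python
import numpy as np

def orthonormality_deviation(W):
    """Frobenius norm of W^H W - I for an n x r matrix W."""
    r = W.shape[1]
    return np.linalg.norm(W.conj().T @ W - np.eye(r), ord="fro")
```

A matrix with orthonormal columns gives a value at rounding level. Scaling such a matrix by 2 gives $\|4I - I\|_F = 3\sqrt{r}$, which is a convenient sanity check.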
Fig. 4. Deviation from orthonormality for different SNR and forgetting factors. The curves are plotted versus time; the legends list $\beta = 1$, $\beta = 0.95$, and $\beta = 0.90$.
B. Image Compression
In this experiment, image compression is used to test the
PASTd algorithm. For this purpose, we take the 256 x 256
image in Fig. 6(a) with 256 gray levels. It is segmented
into 8 x 8 image blocks and each of them is rearranged
into a 64-element vector. Starting with this sequence of
$L = 1024$ data vectors $x(t) \in \mathbb{R}^{64}$ $(t = 1, \ldots, L)$, the PASTd
algorithm is used to estimate the $r$ dominant eigenvectors of
the sample correlation matrix $C = \sum_{t=1}^{L} x(t)x^T(t)/L$. The
forgetting factor $\beta$ is set equal to one and the initial values
of the algorithm are derived from identity matrices. The final
subspace estimate $W = W(L) \in \mathbb{R}^{64 \times r}$ is orthonormalized
and used in place of $U_s$ for image coding as described in
Section II. The reconstructed data vector is given by $\hat{x}(t) =
WW^T x(t)$. For each $r$, we compute
$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{t=1}^{L} \|x(t)\|^2}{\sum_{t=1}^{L} \|x(t) - \hat{x}(t)\|^2}$$
as a measure for the quality of the reconstructed image and
compare it with the optimum SNR which we obtain if $W$
is calculated via ED of $C$ or SVD of the data matrix $X =
[x(1), \ldots, x(L)]^T$.
Fig. 5. SNR of KL image compression via subspace tracking. The SNR is plotted versus $r$; the legend lists PASTd and PASTd with renormalization.
TABLE III
SNR OF KL IMAGE COMPRESSION FOR SOME SELECTED r
r          1      2      4      8      16     32
SNR (dB)   19.11  20.71  24.05  26.90  29.92  33.95
Fig. 5 shows the simulation results. The SNR values for some
selected $r$ are given in Table III. We see that the difference
between the ED and the PASTd algorithm in SNR is less than
1 dB. A renormalization of all columns of $W(t)$ after each
update at the expense of $2nr + O(r)$ additional operations
reduces this difference to less than 0.27 dB over a wide
range of compression rates $1 \le r \le n/2 = 32$. The performance
of the PASTd algorithm is nearly identical to the exact ED. For
illustration, three reconstructed images with $r = 8$, which are
calculated by ED, PASTd, and PASTd with renormalization,
are compared in Fig. 6.
We note that the PAST algorithm is not suitable for this
kind of application. In data compression, each eigenvector makes
a distinctive contribution to the data coding according to its
eigenvalue. It is hence meaningful to use the highest accuracy
for estimating the most dominant eigenvector, the second
highest accuracy for the computation of the second dominant
eigenvector, etc. PASTd supports this sequential computation
while the PAST algorithm just computes an arbitrary linear
combination of the dominant eigenvectors.
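The KL coding experiment of this subsection can be sketched as follows. This is an illustration under stated assumptions, not the code behind Fig. 5: the PASTd recursion is the usual sequential (deflation) form for real data, the SNR is taken as the ratio of signal energy to reconstruction error energy in dB, and a synthetic 256 x 256 test image stands in for the image of Fig. 6(a). All function names are hypothetical.

```python
import numpy as np

def make_test_image(size=256, seed=0):
    """Synthetic smooth image plus noise (a stand-in for the real test image)."""
    rng = np.random.default_rng(seed)
    u = np.linspace(0.0, 1.0, size)
    smooth = np.sin(2 * np.pi * 3 * u)[:, None] * np.cos(2 * np.pi * 2 * u)[None, :]
    return 128 + 60 * smooth + 10 * rng.standard_normal((size, size))

def image_to_blocks(img, b=8):
    """Cut the image into non-overlapping b x b blocks, one flattened block per row."""
    h, w = img.shape
    return img.reshape(h // b, b, w // b, b).transpose(0, 2, 1, 3).reshape(-1, b * b)

def pastd(X, r, beta=1.0):
    """PASTd with deflation over the rows of X (real data)."""
    n = X.shape[1]
    W = np.eye(n, r)                 # W(0): leading columns of the identity
    d = np.ones(r)                   # d_i(0) = 1
    for x in X:
        xi = x.astype(float)         # working copy of the data vector
        for i in range(r):
            y = W[:, i] @ xi
            d[i] = beta * d[i] + y * y
            e = xi - W[:, i] * y
            W[:, i] += e * (y / d[i])
            xi = xi - W[:, i] * y    # deflation: remove the i-th component
    return W, d

def ed_basis(X, r):
    """r dominant eigenvectors of the sample correlation matrix (batch ED)."""
    C = X.T @ X / X.shape[0]
    _, U = np.linalg.eigh(C)         # eigenvalues in ascending order
    return U[:, ::-1][:, :r]

def snr_db(X, W):
    """SNR in dB of the reconstruction X Q Q^T, with Q an orthonormalized W."""
    Q, _ = np.linalg.qr(W)
    err = X - X @ Q @ Q.T
    return 10 * np.log10(np.sum(X ** 2) / np.sum(err ** 2))
```

A typical use is `X = image_to_blocks(make_test_image())`, then `W, d = pastd(X, 8)` and a comparison of `snr_db(X, W)` with `snr_db(X, ed_basis(X, 8))`. Since the batch ED yields the best rank-$r$ orthogonal projection for the sample correlation matrix, its SNR is an upper bound for any tracked basis; how close PASTd comes on the synthetic image need not match the figures reported for the real image.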
Finally, we point out that data compression is in fact a block
operation. From the computational point of view, the use of
tracking algorithms for computing $W$ is only more advantageous
than batch ED or SVD if $r \ll n$. Computing the right
singular vectors of the $L \times n$ data matrix $X = [x(1), \ldots, x(L)]^T$
requires at least $Ln^2 + O(n^3)$ operations [25].$^1$ In
comparison, the PASTd algorithm demands on the order of $4Lnr$
or $6Lnr$ (if with reorthonormalization) operations. Therefore,
the PASTd algorithm is only recommended for use if $r < 6$.
Nevertheless, the PASTd algorithm is much simpler to
implement and requires far fewer memory cells to store the
computed eigenvectors than SVD or ED.
$^1$The number of operations given in [25] is $2Ln^2 + 11n^3$. The reason for
the factor 2 is a different definition of "one operation."
Fig. 6. Original and reconstructed images ($r = 8$): (a) original image; (b) using ED; (c) using PASTd; (d) using PASTd with renormalization.
VI. CONCLUSION
In this paper, we introduced a novel interpretation of the
signal subspace as the solution of an unconstrained minimization
problem. Based on this observation, we derived various
adaptive algorithms for tracking the signal subspace recursively.
After the discussion of some gradient-descent methods,
we focused on the projection approximation approach which
enables the use of RLS techniques to update the subspace
estimate. The PAST algorithm computes an arbitrary basis
of the signal subspace by $3nr + O(r^2)$ operations every
update. The PASTd algorithm is able to update the signal
eigenvectors and eigenvalues via $4nr + O(r)$ operations.
Both algorithms deliver nearly orthonormal subspace basis
or eigenvector estimates. If a perfectly orthonormal basis is
required by the postprocessing method, a reorthonormalization
is necessary.
Simulation results show that the tracking capability of these
new algorithms is comparable with and, in some critical
situations like closely spaced signals and/or low SNR, more
robust than the computationally expensive batch ED. In image
compression, we achieve virtually the same performance as the
ED. Relations of the new algorithms to other subspace tracking
methods and numerical issues are also discussed.
Future work includes a theoretical analysis of the convergence
property and a comparative study with other subspace
tracking algorithms.
APPENDIX
PROOFS OF BOTH THEOREMS IN SECTION III
For simplicity, we first prove that both theorems hold for
real valued data. In this case, the unconstrained cost function
becomes
$$J(W) = \mathrm{tr}(C) - 2\,\mathrm{tr}(W^T C W) + \mathrm{tr}(W^T C W W^T W) \tag{A1}$$
where both $C$ and $W$ are real. Then we sketch similar proofs
for complex valued data with emphasis on the major
differences between the real and the complex valued case.
A. The Real Valued Case
Lemma: For real symmetric matrices $A$ and $B$, $AB + BA = 0$
implies $B = 0$ if $A$ is positive definite.
Proof: Let $B = U_B \Sigma_B U_B^T$ be the ED with eigenvalues
$\lambda_i^B$. Let $\tilde{A} = U_B^T A U_B = \{\tilde{a}_{ij}\}$. $AB + BA = 0$ implies
then $\tilde{A}\Sigma_B + \Sigma_B\tilde{A} = 0$, or in terms of the diagonal elements,
$2\tilde{a}_{ii}\lambda_i^B = 0$. Since $A$ is positive definite and hence $\tilde{a}_{ii} > 0\ \forall i$,
we conclude $\lambda_i^B = 0\ \forall i$ and $B = 0$.
Proof of Theorem 1: Let $W = [w_1, \ldots, w_r]$ and $\nabla =
[\nabla_1, \ldots, \nabla_r]$. $\nabla_i$ is the gradient operator with respect to $w_i$.
After some calculations, we get
$$\tfrac{1}{2}\nabla_i J = [-2C + CWW^T + WW^TC]\,w_i \quad (i = 1, \ldots, r) \tag{A2}$$
$$\tfrac{1}{2}\nabla J = [-2C + CWW^T + WW^TC]\,W. \tag{A3}$$
If $W = U_r Q$, where $Q$ is orthogonal and $U_r$ contains any
$r$ distinct eigenvectors of $C$, it is straightforward to show
$\nabla J = 0$.
Conversely, $\nabla J = 0$ implies
$$\tfrac{1}{2}W^T\nabla J = W^TCW(W^TW - I) + (W^TW - I)W^TCW = 0. \tag{A4}$$
Since both $W^TCW$ and $W^TW - I$ are symmetric and $W^TCW$
is positive definite, we conclude $W^TW = I$ using the
Lemma. Accordingly, $\nabla J = 0$ is equivalent to $CW =
WW^TCW$. Let $W^TCW = Q^T\Sigma_r Q$ be the ED. We then
obtain $CU_r = U_r\Sigma_r$ with $U_r = WQ^T$. Since $\Sigma_r$ is a
diagonal matrix, the full rank matrix $U_r$ must contain $r$ distinct
eigenvectors of $C$. In the following, we assume without loss
of generality that the eigenvalues of $C$ are arbitrarily ordered
(in contrast to the nonincreasing order in (4)) and $U_r$ contains
the first $r$ eigenvectors, i.e., $U_r = [u_1, \ldots, u_r]$ and $\Sigma_r =
\mathrm{diag}(\lambda_1, \ldots, \lambda_r)$. It is then easy to show
$$J(W) = J(U_rQ) = \mathrm{tr}(\Sigma) - \mathrm{tr}(\Sigma_r) = \sum_{i=r+1}^{n} \lambda_i. \tag{A5}$$
Proof of Theorem 2: Let $H = \{\nabla_i\nabla_j^TJ\}$ be the $nr \times nr$
Hessian matrix of $J(W)$ with respect to the $nr$-dimensional
vector $[w_1^T, \ldots, w_r^T]^T$. Using (A2), we get the following
expression for the $(i, j)$th block of the block matrix $H$
$$\tfrac{1}{2}\nabla_i\nabla_j^TJ = \delta_{ij}(-2C + CWW^T + WW^TC) + w_j^TCw_i\,I + w_j^Tw_i\,C + Cw_jw_i^T + w_jw_i^TC \tag{A6}$$
where $\delta_{ij}$ is the Kronecker delta. An evaluation of this
expression at the stationary point $[w_1, \ldots, w_r] = [u_1, \ldots, u_r]Q$
results in
$$\tfrac{1}{2}\nabla_i\nabla_j^TJ = \delta_{ij}(\lambda_iI - C + 2C_r) + (\lambda_i + \lambda_j)u_ju_i^T \tag{A7}$$
where
$$C_r = U_r\Sigma_rU_r^T = U\,\mathrm{diag}(\lambda_1, \ldots, \lambda_r, 0, \ldots, 0)\,U^T \tag{A8}$$
is an approximation of $C$ by taking its first $r$ eigencomponents.
Accordingly, the Hessian matrix $H$ can be written as a sum
of two symmetric matrices
$$\tfrac{1}{2}H = \tfrac{1}{2}H_1 + \tfrac{1}{2}H_2 = \mathrm{diag}(\lambda_1I - C + 2C_r, \ldots, \lambda_rI - C + 2C_r) + \{(\lambda_i + \lambda_j)u_ju_i^T\}. \tag{A9}$$
The ED of the block diagonal matrix $\tfrac{1}{2}H_1$ is given by
$$\tfrac{1}{2}H_1 = \bar{U}D_1\bar{U}^T \tag{A10}$$
with
$$\bar{U} = \mathrm{diag}(U, \ldots, U), \tag{A11}$$
$$D_1 = \mathrm{diag}(D_1^1, \ldots, D_1^r), \tag{A12}$$
$$D_1^i = \mathrm{diag}(\lambda_i + \lambda_1, \ldots, \lambda_i + \lambda_r, \lambda_i - \lambda_{r+1}, \ldots, \lambda_i - \lambda_n). \tag{A13}$$
The second term of the Hessian matrix can be expressed by
$$\tfrac{1}{2}H_2 = \{(\lambda_i + \lambda_j)u_ju_i^T\} = \bar{U}\{(\lambda_i + \lambda_j)e_je_i^T\}\bar{U}^T \tag{A14}$$
where $e_i$ is the $i$th unit vector in $\mathbb{R}^n$. $\{(\lambda_i + \lambda_j)e_je_i^T\}$ is not
a diagonal matrix. All elements of the $(i, j)$th block of this
matrix are zero except for the element $\lambda_i + \lambda_j$ at the position
$(j, i)$ within the block. For illustration, we give one example
for $n = 3$ and $r = 2$:
$$\{(\lambda_i + \lambda_j)e_je_i^T\} = \begin{bmatrix} 2\lambda_1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & \lambda_1 + \lambda_2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & \lambda_1 + \lambda_2 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 2\lambda_2 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix}. \tag{A15}$$
In order to compute the eigenvalues of $\tfrac{1}{2}H_2$, Jacobi rotations
[25] are used to eliminate the pairs of off-diagonal elements
$\lambda_i + \lambda_j$. Note that
$$\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} 0 & \lambda_i + \lambda_j \\ \lambda_i + \lambda_j & 0 \end{bmatrix} \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} \lambda_i + \lambda_j & 0 \\ 0 & -(\lambda_i + \lambda_j) \end{bmatrix}. \tag{A16}$$
Denoting the whole orthogonal transformation for diagonalizing
$\{(\lambda_i + \lambda_j)e_je_i^T\}$ by $G \in \mathbb{R}^{nr \times nr}$, we obtain
$$\tfrac{1}{2}H_2 = \bar{U}GD_2G^T\bar{U}^T \tag{A17}$$
with
$$D_2 = \mathrm{diag}(D_2^1, \ldots, D_2^r), \tag{A18}$$
$$D_2^i = \mathrm{diag}(-(\lambda_i + \lambda_1), \ldots, -(\lambda_i + \lambda_{i-1}), 2\lambda_i, \lambda_i + \lambda_{i+1}, \ldots, \lambda_i + \lambda_r, 0, \ldots, 0). \tag{A19}$$
Interestingly, an application of the same Jacobi rotations to
the eigenvalue matrix $D_1$ in (A12) does not change it at all,
i.e., $GD_1G^T = D_1$. The reason is that all Jacobi rotations are
applied to 2 x 2 submatrices of the type
$$\begin{bmatrix} \lambda_i + \lambda_j & 0 \\ 0 & \lambda_i + \lambda_j \end{bmatrix},$$
as can be seen from the structure of $D_1$.
Therefore, the Hessian matrix of $J(W)$ evaluated at the
stationary point $W = U_rQ$ has the ED
$$\tfrac{1}{2}H = (\bar{U}G)(D_1 + D_2)(\bar{U}G)^T \tag{A21}$$
and the $nr$ eigenvalues of $\tfrac{1}{2}H$, i.e., the diagonal elements of
$D_1 + D_2$, are given for $i = 1, \ldots, r$ by
$$\underbrace{0, \ldots, 0}_{i-1},\ 4\lambda_i,\ 2(\lambda_i + \lambda_{i+1}), \ldots, 2(\lambda_i + \lambda_r),\ \lambda_i - \lambda_{r+1}, \ldots, \lambda_i - \lambda_n. \tag{A22}$$
Clearly, $H$ is nonnegative definite if and only if $\lambda_i \ge
\lambda_j$ $(i = 1, \ldots, r;\ j = r+1, \ldots, n)$. Since this is the only
local minimum and the continuous function $J(W)$ is unbounded
when $W$ approaches infinity, it is also the global minimum.
The zero eigenvalues in (A22) indicate the nonuniqueness
of the matrix $W$ because of the rotational invariance of the
cost function $J(W)$. However, the signal subspace and the
corresponding projection matrix $WW^T$ are unique if $\lambda_i >
\lambda_j$ $(i = 1, \ldots, r;\ j = r+1, \ldots, n)$.
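The statements of Theorem 1 for real data can be checked numerically. The sketch below assumes the cost function $J(W) = \mathrm{tr}(C) - 2\,\mathrm{tr}(W^TCW) + \mathrm{tr}(W^TCWW^TW)$ and the gradient expression $\tfrac{1}{2}\nabla J = [-2C + CWW^T + WW^TC]W$. It builds a random positive definite $C$ and a stationary point $W = U_rQ$, and also offers a central-difference gradient for comparison. All names are hypothetical.

```python
import numpy as np

def cost(C, W):
    """J(W) = tr(C) - 2 tr(W^T C W) + tr(W^T C W W^T W) for real C and W."""
    M = W.T @ C @ W
    return np.trace(C) - 2 * np.trace(M) + np.trace(M @ W.T @ W)

def half_gradient(C, W):
    """(1/2) grad J = [-2C + C W W^T + W W^T C] W."""
    P = W @ W.T
    return (-2 * C + C @ P + P @ C) @ W

def numerical_gradient(C, W, h=1e-5):
    """Central-difference approximation of grad J, entry by entry."""
    G = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        E = np.zeros_like(W)
        E[idx] = h
        G[idx] = (cost(C, W + E) - cost(C, W - E)) / (2 * h)
    return G

def stationary_point(n=6, r=2, seed=0):
    """Random positive definite C, its eigenvalues (descending), and W = U_r Q."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((n, n))
    C = B @ B.T + n * np.eye(n)
    lam, U = np.linalg.eigh(C)               # ascending order
    lam, U = lam[::-1], U[:, ::-1]           # reorder to descending
    Q, _ = np.linalg.qr(rng.standard_normal((r, r)))   # random orthogonal Q
    return C, lam, U[:, :r] @ Q
```

At the point returned by `stationary_point`, `half_gradient` vanishes up to rounding and `cost` equals the sum of the $n - r$ smallest eigenvalues, which is the minimum value stated in the proof. For an arbitrary `W`, `numerical_gradient` should agree with twice `half_gradient`.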
B. The Complex Valued Case
Proof of Theorem 1: For complex valued data, we define
$\nabla_{R,i}$ and $\nabla_{I,i}$ to be the gradient operators with respect to the
real and imaginary part of $w_i$. Following [27] and [32], the
complex gradient operator is defined as $\nabla_i = \tfrac{1}{2}[\nabla_{R,i} + j\nabla_{I,i}]$.
Clearly, $\nabla_{R,i}J = \nabla_{I,i}J = 0\ \forall i$ is equivalent to $\nabla J = 0$ with
$\nabla = [\nabla_1, \ldots, \nabla_r]$. After some calculations, we obtain from
(8)
$$\nabla J = [-2C + CWW^H + WW^HC]\,W \tag{A23}$$
which has the same form as the real gradient (A3) except for
a factor 1/2. The proof of Theorem 1 for the complex case is
hence identical to that for real valued data.
Proof of Theorem 2: We inspect the $2nr \times 2nr$ real Hessian
matrix $H = \tilde{\nabla}\tilde{\nabla}^TJ$ with $\tilde{\nabla} = [\nabla_{R,1}^T, \ldots, \nabla_{R,r}^T,
\nabla_{I,1}^T, \ldots, \nabla_{I,r}^T]^T$. After lengthy but similar computations as
in the real case, the eigenvalue matrix of $\tfrac{1}{2}H$ evaluated at
stationary points can be calculated to be
$$\begin{bmatrix} D_1 + D_2 & 0 \\ 0 & D_1 - D_2 \end{bmatrix} \tag{A24}$$
where $D_1$ and $D_2$ are the same matrices defined in (A12),
(A13), (A18), and (A19). The $2nr$ eigenvalues are thus
the diagonal elements of $D_1 + D_2$ listed in (A22) together with
those of $D_1 - D_2$, which are given for $i = 1, \ldots, r$ by
$$2(\lambda_i + \lambda_1), \ldots, 2(\lambda_i + \lambda_{i-1}),\ \underbrace{0, \ldots, 0}_{r-i+1},\ \lambda_i - \lambda_{r+1}, \ldots, \lambda_i - \lambda_n. \tag{A25}$$
Again, $H$ is nonnegative definite if and only if $\lambda_i$ $(i = 1, \ldots, r)$
are the dominant eigenvalues. This completes the proof.
REFERENCES
[1] R. O. Schmidt, "Multiple emitter location and signal parameter estimation," in Proc. RADC Spectrum Estimation Workshop, 1979, pp. 243-258.
[2] R. Kumaresan and D. W. Tufts, "Estimating the angles of arrival of multiple plane waves," IEEE Trans. Aerospace Electron. Syst., vol. AES-19, pp. 134-139, 1983.
[3] R. Roy and T. Kailath, "ESPRIT: Estimation of signal parameters via rotational invariance techniques," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, 1989.
[4] M. Viberg, B. Ottersten, and T. Kailath, "Detection and estimation in sensor arrays using weighted subspace fitting," IEEE Trans. Signal Processing, vol. 39, pp. 2436-2449, 1991.
[5] W. K. Pratt, Digital Image Processing. New York: Wiley, 1978.
[6] P. Comon and G. H. Golub, "Tracking a few extreme singular values and vectors in signal processing," Proc. IEEE, pp. 1327-1343, Aug. 1990.
[7] N. L. Owsley, "Adaptive data orthogonalization," in Proc. IEEE ICASSP, 1978, pp. 109-112.
[8] D. W. Tufts and C. D. Melissinos, "Simple, effective computation of principal eigenvectors and their eigenvalues and applications to high resolution estimation of frequencies," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 1046-1053, 1986.
[9] K. C. Sharman, "Adaptive algorithms for estimating the complete covariance eigenstructure," in Proc. IEEE ICASSP (Tokyo, Japan), Apr. 1986, pp. 1401-1404.
[10] M. Moonen, P. van Dooren, and J. Vandewalle, "Updating singular value decompositions: A parallel implementation," in Proc. SPIE Adv. Algorithms Architectures Signal Processing (San Diego, CA), Aug. 1989, pp. 80-91.
[11] J. R. Bunch, C. P. Nielsen, and D. Sorensen, "Rank-one modification of the symmetric eigenproblem," Numerische Mathematik, vol. 31, pp. 31-48, 1978.
[12] I. Karasalo, "Estimating the covariance matrix by signal subspace averaging," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 8-12, 1986.
[13] R. D. DeGroat, "Noniterative subspace tracking," IEEE Trans. Signal Processing, vol. 40, pp. 571-577, 1992.
[14] P. A. Thompson, "An adaptive spectral analysis technique for unbiased frequency estimation in the presence of white noise," in Proc. 13th Asilomar Conf. Circ. Syst. Comput., Nov. 1980, pp. 529-533.
[15] E. Oja, "A simplified neuron model as a principal component analyzer," J. Math. Bio., vol. 15, pp. 267-273, 1982.
[16] J. Karhunen and E. Oja, "New methods for stochastic approximation of truncated Karhunen-Loeve expansions," in Proc. 6th Int. Conf. Patt. Recog., Oct. 1982, pp. 550-553.
[17] J. Karhunen, "Adaptive algorithms for estimating eigenvectors of correlation type matrices," in Proc. IEEE ICASSP (San Diego, CA), Mar. 1984, pp. 14.6.1-14.6.3.
[18] J. Yang and M. Kaveh, "Adaptive eigensubspace algorithms for direction or frequency estimation and tracking," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 241-251, 1988.
[19] S. Y. Kung, Digital Neural Processing. Englewood Cliffs, NJ: Prentice Hall, 1993.
[20] V. U. Reddy, B. Egardt, and T. Kailath, "Least squares type algorithm for adaptive implementation of Pisarenko's harmonic retrieval method," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, 1982.
[21] S. Bannour and M. R. Azimi-Sadjadi, "An adaptive approach for optimal data reduction using recursive least squares learning method," in Proc. IEEE ICASSP (San Francisco, CA), Mar. 1992, pp. II-297-II-300.
[22] X. Yang, T. K. Sarkar, and E. Arvas, "A survey of conjugate gradient algorithms for solution of extreme eigenproblems of a symmetric matrix," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 1550-1556, 1989.
[23] G. W. Stewart, "An updating algorithm for subspace tracking," IEEE Trans. Signal Processing, vol. 40, pp. 1535-1541, 1992.
[24] C. H. Bischof and G. M. Shroff, "On updating signal subspaces," IEEE Trans. Signal Processing, vol. 40, pp. 96-105, 1992.
[25] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins University Press, 1989, 2nd ed.
[26] B. Yang, "Subspace tracking based on the projection approach and the recursive least squares method," in Proc. IEEE ICASSP (Minneapolis, MN), Apr. 1993, pp. IV-145-IV-148.
[27] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice Hall, 1991, 2nd ed.
[28] B. Yang and J. F. Böhme, "Rotation based RLS algorithms: Unified derivations, numerical properties and parallel implementations," IEEE Trans. Signal Processing, vol. 40, pp. 1151-1167, 1992.
[29] M. Wax and T. Kailath, "Detection of signals by information theoretic criteria," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 387-392, 1985.
[30] B. Yang and F. Gersemsky, "An adaptive algorithm of linear computational complexity for both rank and subspace tracking," in Proc. IEEE ICASSP (Adelaide), Apr. 1994.
[31] H. B. Lee, "Eigenvalues and eigenvectors of covariance matrices for signals closely spaced in frequency," IEEE Trans. Acoust., Speech, Signal Processing, vol. 40, pp. 2518-2535, 1992.
[32] D. H. Brandwood, "A complex gradient operator and its application in adaptive array theory," Proc. Inst. Elect. Eng., vol. 130, pp. 11-16, 1983.
Bin Yang received the Dipl.-Ing. degree in 1986
and the Dr.-Ing. degree in 1991, both in electrical
engineering from the Ruhr University Bochum, Germany.
Since 1986, he has served as a teaching and
research staff member in the Department of Electrical
Engineering at the Ruhr University Bochum, Germany.
His current research interests include adaptive signal
processing, array processing, subspace tracking, fast
and parallel algorithms, and neural networks.