We extend the transient analysis of the earlier chapters to more general filter recursions,
starting with the normalized LMS algorithm.
25.1 NLMS FILTER

We thus consider the ε-NLMS recursion, for which the data normalization in (22.2) is given by

    g[u_i] = ε + ||u_i||²    (25.1)

so that

    w_i = w_{i-1} + μ u_i^* e(i) / (ε + ||u_i||²)    (25.2)

In this case, relations (22.26)-(22.27) and (22.29) become

    E ||w̃_i||²_Σ = E ||w̃_{i-1}||²_{Σ'} + μ² σ_v² E ( ||u_i||²_Σ / (ε + ||u_i||²)² )
    Σ' = Σ - μ E ( u_i^* u_i Σ / (ε + ||u_i||²) ) - μ E ( Σ u_i^* u_i / (ε + ||u_i||²) )
             + μ² E ( u_i^* u_i Σ u_i^* u_i / (ε + ||u_i||²)² )    (25.3)

and

    E w̃_i = [ I - μ E ( u_i^* u_i / (ε + ||u_i||²) ) ] E w̃_{i-1}

and we see that we need to evaluate moments of the form

    E ( u_i^* u_i / (ε + ||u_i||²) )  and  E ( u_i^* u_i Σ u_i^* u_i / (ε + ||u_i||²)² )

Unfortunately, closed-form expressions for these moments are not available in general, even for Gaussian regressors. Still, we will be able to show that the filter is convergent in the mean and is also mean-square stable for step-sizes satisfying μ < 2, regardless of the input distribution (Gaussian or otherwise); see App. 25.B. We therefore treat the general case directly. Since the arguments are similar to those in Chapter 24 for LMS, we shall be brief. Thus, introduce the M² × 1 vectors

    σ = vec(Σ),  r = vec(R_u)

Adaptive Filters, by Ali H. Sayed. Copyright © 2008 John Wiley & Sons, Inc.

as well as the M² × M² matrices

    A = [ E ( u_i^* u_i / (ε + ||u_i||²) ) ]^T ⊗ I + I ⊗ E ( u_i^* u_i / (ε + ||u_i||²) )    (25.4)

    B = E [ ( u_i^* u_i / (ε + ||u_i||²) )^T ⊗ ( u_i^* u_i / (ε + ||u_i||²) ) ]    (25.5)

and the M × M matrix

    P = E ( u_i^* u_i / (ε + ||u_i||²) )    (25.6)

The matrix A is positive-definite, while B is nonnegative-definite; see Prob. V.6. Applying the vec notation to both sides of the expression for Σ' in (25.3), we find that it reduces to

    σ' = F σ    (25.7)

where F is M² × M² and given by

    F = I - μ A + μ² B    (25.8)

Moreover, the recursion for E w̃_i can be written as

    E w̃_i = ( I - μ P ) E w̃_{i-1}    (25.9)

The same arguments that were used in Sec. 24.1 will show that the mean-square behavior of ε-NLMS is characterized by the following M²-dimensional state-space model:

    𝓦_i = 𝓕 𝓦_{i-1} + μ² σ_v² 𝓨    (25.10)

where 𝓕 is the M² × M² companion matrix

    𝓕 = [   0     1     0    ...    0
            0     0     1    ...    0
           ...
            0     0     0    ...    1
          -p_0  -p_1  -p_2   ...  -p_{M²-1} ]

with p(x) = det(xI - F) = x^{M²} + Σ_{k=0}^{M²-1} p_k x^k denoting the characteristic polynomial of F, the state vector is

    𝓦_i = col{ E ||w̃_i||²_σ, E ||w̃_i||²_{Fσ}, ..., E ||w̃_i||²_{F^{M²-1} σ} }    (25.11)

and the k-th entry of 𝓨 is given by

    y(k) = E ( ||u_i||²_{F^k σ} / (ε + ||u_i||²)² ),  k = 0, 1, ..., M² - 1    (25.12)

The definitions of {𝓦_i, 𝓨} are in terms of any σ of interest, e.g., most commonly, σ = q or σ = r. It is shown in App.
25.B that any μ < 2 is a sufficient condition for the stability of (25.10).

Theorem 25.1 (Stability of ε-NLMS) Consider the ε-NLMS recursion (25.2) and assume the data {d(i), u_i} satisfy model (22.1) and the independence assumption (22.23). The regressors need not be Gaussian. Then the filter is convergent in the mean and is also mean-square stable for any μ < 2. Moreover, the transient behavior of the filter is characterized by the state-space recursion (25.10)-(25.12), and the mean-square deviation and the excess mean-square error are given by

    MSD = μ² σ_v² E ( ||u_i||²_{(I-F)^{-1} q} / (ε + ||u_i||²)² )
    EMSE = μ² σ_v² E ( ||u_i||²_{(I-F)^{-1} r} / (ε + ||u_i||²)² )

where r = vec(R_u) and q = vec(I).

The expressions for the MSD and EMSE in the statement of the theorem are derived in a manner similar to (23.51) and (23.56). They can be rewritten as

    MSD = μ² σ_v² Tr( S Σ_msd )  and  EMSE = μ² σ_v² Tr( S Σ_emse )

where

    S = E ( u_i^* u_i / (ε + ||u_i||²)² )

and the weighting matrices {Σ_msd, Σ_emse} correspond to the vectors σ_msd = (I - F)^{-1} q and σ_emse = (I - F)^{-1} r. That is,

    Σ_msd = vec^{-1}(σ_msd)  and  Σ_emse = vec^{-1}(σ_emse)

Learning Curve

Observe that since E |e_a(i)|² = E ||w̃_{i-1}||²_{R_u}, we again find that the time evolution of E |e_a(i)|² is described by the top entry of the state vector 𝓦_i in (25.10)-(25.12) with σ chosen as σ = r. The learning curve of the filter will be E |e(i)|² = σ_v² + E |e_a(i)|².

Small Step-Size Approximations

Several approximations for the EMSE and MSD expressions that appear in the above theorem are derived in Probs. V.18-V.20. The ultimate conclusion from these problems is that, for small enough μ and ε, we get

    EMSE = ( μ σ_v² / (2 - μ) ) E ( 1 / (ε + ||u_i||²) ) Tr(R_u)  and  MSD = ( μ σ_v² M / (2 - μ) ) E ( 1 / (ε + ||u_i||²) )    (25.13)

The expression for the EMSE is the same one we derived earlier in Lemma 17.1.

Gaussian Regressors

If the regressors happen to be Gaussian, then it can be shown that the M²-dimensional state-space model (25.10)-(25.12) reduces to an M-dimensional model; this assertion is proved in Probs. V.16 and V.17.
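Expression (25.13) is easy to check in simulation. The sketch below is a hypothetical setup (all parameter values are illustrative): it runs ε-NLMS with white Gaussian regressors, so that R_u = I and Tr(R_u) = M, time-averages |e_a(i)|² past the transient, and compares the result against the small step-size EMSE formula, with the moment E(1/(ε + ||u_i||²)) itself estimated by Monte Carlo:

```python
import numpy as np

rng = np.random.default_rng(1)
M, eps, mu, sv2 = 8, 1e-3, 0.05, 1e-2        # illustrative values
w_o = rng.standard_normal(M) / np.sqrt(M)    # unknown system w^o

w = np.zeros(M)
acc, cnt = 0.0, 0
for i in range(120000):
    u = rng.standard_normal(M)               # white input: R_u = I
    ea = u @ (w_o - w)                       # a priori error e_a(i)
    e = ea + np.sqrt(sv2) * rng.standard_normal()
    w = w + mu * e * u / (eps + u @ u)       # eps-NLMS update (25.2)
    if i >= 40000:                           # discard the transient
        acc += ea * ea
        cnt += 1
emse_sim = acc / cnt

# small step-size formula (25.13); E[1/(eps+||u||^2)] by Monte Carlo
Einv = np.mean(1.0 / (eps + np.sum(rng.standard_normal((50000, M))**2, axis=1)))
emse_th = (mu * sv2 / (2 - mu)) * Einv * M   # Tr(R_u) = M here
print(emse_sim, emse_th)
```

For a small step-size such as μ = 0.05 the two values agree closely; the residual gap reflects both the approximation in (25.13) and simulation noise, and it grows with μ.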
25.2 DATA-NORMALIZED FILTERS

The arguments that were employed in the last two sections for LMS and ε-NLMS are general enough to apply to adaptive filters with generic data nonlinearities of the form (22.2)-(22.3). To see this, consider again the variance and mean relations (22.26)-(22.27) and (22.29), which are reproduced below:

    E ||w̃_i||²_Σ = E ||w̃_{i-1}||²_{Σ'} + μ² σ_v² E ( ||u_i||²_Σ / g²[u_i] )
    Σ' = Σ - μ E ( u_i^* u_i Σ / g[u_i] ) - μ E ( Σ u_i^* u_i / g[u_i] ) + μ² E ( u_i^* u_i Σ u_i^* u_i / g²[u_i] )    (25.14)
    E w̃_i = [ I - μ E ( u_i^* u_i / g[u_i] ) ] E w̃_{i-1}

If we now introduce the M² × M² matrices

    A = [ E ( u_i^* u_i / g[u_i] ) ]^T ⊗ I + I ⊗ E ( u_i^* u_i / g[u_i] )    (25.15)

    B = E [ ( u_i^* u_i / g[u_i] )^T ⊗ ( u_i^* u_i / g[u_i] ) ]    (25.16)

and the M × M matrix

    P = E ( u_i^* u_i / g[u_i] )    (25.17)

then E w̃_i = (I - μP) E w̃_{i-1}, and the expression for Σ' can be written in terms of the linear vector relation

    σ' = F σ    (25.18)

where F is M² × M² and given by

    F = I - μ A + μ² B    (25.19)

Let

    H = [ A/2  -B/2 ; I  0 ]    (2M² × 2M²)    (25.20)

Then the same arguments that were used in Chapter 24 will lead to the statement of Thm. 25.2 listed further ahead. The expressions for the MSD and EMSE in the statement of the theorem are derived in a manner similar to (23.51) and (23.56). They can be rewritten as

    MSD = μ² σ_v² Tr( S Σ_msd )  and  EMSE = μ² σ_v² Tr( S Σ_emse )

where

    S = E ( u_i^* u_i / g²[u_i] )

and the weighting matrices {Σ_msd, Σ_emse} correspond to the vectors σ_msd = (I - F)^{-1} q and σ_emse = (I - F)^{-1} r. That is,

    Σ_msd = vec^{-1}(σ_msd)  and  Σ_emse = vec^{-1}(σ_emse)

Theorem 25.2 (Stability of data-normalized filters) Consider data-normalized adaptive filters of the form (22.2)-(22.3), and assume the data {d(i), u_i} satisfy model (22.1) and the independence assumption (22.23). Then the filter is convergent in the mean and is mean-square stable for step-sizes satisfying

    0 < μ < min{ 2/λ_max(P), 1/λ_max(A^{-1}B), 1/max{λ(H) ∈ R⁺} }

where the matrices {A, B, P, H} are defined by (25.15)-(25.17) and (25.20), and B is assumed finite.
Moreover, the transient behavior of the filter is characterized by the M²-dimensional state-space recursion

    𝓦_i = 𝓕 𝓦_{i-1} + μ² σ_v² 𝓨

where 𝓕 is the M² × M² companion matrix

    𝓕 = [   0     1     0    ...    0
            0     0     1    ...    0
           ...
            0     0     0    ...    1
          -p_0  -p_1  -p_2   ...  -p_{M²-1} ]

with

    p(x) = det( xI - F ) = x^{M²} + Σ_{k=0}^{M²-1} p_k x^k

denoting the characteristic polynomial of F in (25.19). Also,

    𝓦_i = col{ E ||w̃_i||²_σ, E ||w̃_i||²_{Fσ}, ..., E ||w̃_i||²_{F^{M²-1} σ} },  𝓨 = col{ E ( ||u_i||²_{F^k σ} / g²[u_i] ) }_{k=0}^{M²-1}

for any σ of interest, e.g., σ = q or σ = r. In addition, the mean-square deviation and the excess mean-square error are given by

    MSD = μ² σ_v² E ( ||u_i||²_{(I-F)^{-1} q} / g²[u_i] )  and  EMSE = μ² σ_v² E ( ||u_i||²_{(I-F)^{-1} r} / g²[u_i] )

where r = vec(R_u) and q = vec(I).

Learning Curve

As before, since E |e_a(i)|² = E ||w̃_{i-1}||²_{R_u}, we find that the time evolution of E |e_a(i)|² is described by the top entry of the state vector 𝓦_i with σ chosen as σ = r. The learning curve of the filter will be E |e(i)|² = σ_v² + E |e_a(i)|².

Small Step-Size Approximations

In Prob. V.39 it is shown that, under a boundedness requirement on the matrix B of fourth moments, data-normalized adaptive filters can be guaranteed to be mean-square stable for sufficiently small step-sizes. That is, there always exists a small-enough step-size that lies within the stability range described in Thm. 25.2.

Now observe that the performance results of Thm. 25.2 are given in terms of the moment matrices {A, B, P}. These moments are generally hard to evaluate for arbitrary input distributions and data nonlinearities g[·]. However, some simplifications occur when the step-size is sufficiently small. This is because, in this case, we may ignore the quadratic term in μ that appears in the expression for Σ' in (25.14), and thereby approximate the variance and mean relations by

    E ||w̃_i||²_Σ = E ||w̃_{i-1}||²_{Σ'} + μ² σ_v² E ( ||u_i||²_Σ / g²[u_i] ),  Σ' = Σ - μ ( P Σ + Σ P )
    E w̃_i = ( I - μ P ) E w̃_{i-1}

where P is as in (25.17).
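The stability range in Thm. 25.2 can be evaluated numerically once the moments {A, B, P} are estimated, whatever the normalization g[·]. The following sketch is a hypothetical Monte Carlo setup with illustrative parameters; it uses the choice g[u_i] = ε + ||u_i||², for which the result of App. 25.B predicts that the computed bound comes out no smaller than 2:

```python
import numpy as np

rng = np.random.default_rng(2)
M, eps, N = 3, 1e-3, 3000        # illustrative values
I = np.eye(M)

# Monte Carlo estimates of P, A, B for g[u_i] = eps + ||u_i||^2
P = np.zeros((M, M)); B = np.zeros((M*M, M*M))
for _ in range(N):
    u = rng.standard_normal(M)
    R1 = np.outer(u, u) / (eps + u @ u)
    P += R1 / N
    B += np.kron(R1, R1) / N
A = np.kron(P, I) + np.kron(I, P)

# block matrix H from (25.20)
H = np.block([[A / 2, -B / 2], [np.eye(M*M), np.zeros((M*M, M*M))]])
lam = np.linalg.eigvals(H)
pos = [l.real for l in lam if abs(l.imag) < 1e-8 and l.real > 1e-8]

terms = [2 / np.linalg.eigvalsh(P).max(),
         1 / np.linalg.eigvals(np.linalg.solve(A, B)).real.max()]
if pos:                          # the H-term is dropped when H has no
    terms.append(1 / max(pos))   # positive real eigenvalue
mu_max = min(terms)
print(mu_max)
```

The same recipe applies to any other normalization g[·]; only the sample averages forming {P, A, B} change.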
Using the weighting vector notation, we can write

    E w̃_i = ( I - μ P ) E w̃_{i-1}    (25.21)

    E ||w̃_i||²_σ = E ||w̃_{i-1}||²_{Fσ} + μ² σ_v² E ( ||u_i||²_σ / g²[u_i] )    (25.22)

where now

    F = I - μ A    (25.23)

with

    A = ( P^T ⊗ I ) + ( I ⊗ P )

The variance relation (25.22) would then lead to the following approximate expressions for the filter EMSE and MSD:

    EMSE = μ² σ_v² Tr( S Σ_emse )  and  MSD = μ² σ_v² Tr( S Σ_msd )

where

    S = E ( u_i^* u_i / g²[u_i] )

and the weighting matrices {Σ_emse, Σ_msd} correspond to the vectors σ_emse = A^{-1} vec(R_u)/μ and σ_msd = A^{-1} vec(I)/μ. That is, {Σ_emse, Σ_msd} are the unique solutions of the Lyapunov equations

    μ P Σ_msd + μ Σ_msd P = I  and  μ P Σ_emse + μ Σ_emse P = R_u

It is easy to verify that Σ_msd = μ^{-1} P^{-1} / 2, so that the performance expressions can be rewritten as

    EMSE = μ² σ_v² Tr( S Σ_emse )  and  MSD = μ σ_v² Tr( S P^{-1} ) / 2

Remark 25.1 (Filters with error nonlinearities) There is more to say about the transient performance of adaptive filters, especially for filters with error nonlinearities in their update equations. This is a more challenging class of filters to study, and their performance is examined in App. 9.C of Sayed (2003) by using the same energy-conservation arguments of this part. The derivation used in that appendix to study adaptive filters with error nonlinearities can also be used to provide an alternative, simplified transient analysis for data-normalized filters. The derivation is based on a long-filter assumption in order to justify a Gaussian condition on the distribution of the a priori error signal. Among other results, it is shown in App. 9.C of Sayed (2003) that the transient behavior of data-normalized filters can be approximated by an M-dimensional linear time-invariant state-space model even for non-Gaussian regressors. Appendix 9.E of the same reference further examines the learning abilities of adaptive filters and shows, among other interesting results, that the learning behavior of LMS cannot be fully captured by relying solely on mean-square analysis.

25.A APPENDIX: STABILITY BOUND

Consider a matrix F of the form F = I - μA + μ²B with A > 0, B ≥ 0, and μ > 0.
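Before developing the bound, it helps to see it on a small concrete instance. The values below are purely illustrative: with A = diag(4, 3, 2) and B = I, the condition λ(F) < 1 derived next gives μ < 1/λ_max(A^{-1}B) = 2, while the condition λ(F) > -1, expressed through the block matrix H = [A/2, -B/2; I, 0] introduced later in this appendix, gives the tighter bound μ < 2 - √2:

```python
import numpy as np

A = np.diag([4.0, 3.0, 2.0])     # A > 0 (hypothetical example)
B = np.eye(3)                    # B >= 0

# condition lambda(F) < 1: mu < 1/lambda_max(A^{-1} B)
mu1 = 1 / np.linalg.eigvals(np.linalg.solve(A, B)).real.max()   # = 2

# condition lambda(F) > -1: via the block matrix H = [[A/2, -B/2], [I, 0]]
H = np.block([[A / 2, -B / 2], [np.eye(3), np.zeros((3, 3))]])
pos = [l.real for l in np.linalg.eigvals(H) if abs(l.imag) < 1e-9 and l.real > 0]
mu2 = 1 / max(pos)               # = 2 - sqrt(2) ~ 0.5858

mu_max = min(mu1, mu2)
rho = lambda mu: max(abs(np.linalg.eigvals(np.eye(3) - mu * A + mu**2 * B)))
print(mu_max, rho(0.99 * mu_max) < 1, rho(1.01 * mu_max) >= 1)
```

The spectral radius of F crosses one exactly at μ_max = 2 - √2, and it does so through an eigenvalue at -1, which is precisely the case the matrix H is designed to detect.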
Matrices of this form arise frequently in the study of the mean-square stability of adaptive filters (see, e.g., (25.19)). The purpose of this appendix is to find conditions on μ in terms of {A, B} that guarantee that all eigenvalues of F lie strictly inside the unit circle, i.e., that -1 < λ(F) < 1.

To begin with, in order to guarantee λ(F) < 1, the step-size μ should be such that (cf. the Rayleigh-Ritz characterization of eigenvalues from Sec. B.1)

    max_{||x||=1} x^* ( I - μA + μ²B ) x < 1

or, equivalently, A - μB > 0. The argument in parts (b) and (c) of Prob. V.3 then shows that this condition holds if, and only if,

    μ < 1/λ_max(A^{-1}B)    (25.24)

Moreover, in order to guarantee λ(F) > -1, the step-size μ should be such that

    min_{||x||=1} x^* ( I - μA + μ²B ) x > -1

or, equivalently, G(μ) = 2I - μA + μ²B > 0. When μ = 0, the eigenvalues of G are all positive and equal to 2. As μ increases, the eigenvalues of G vary continuously with μ. Indeed, the eigenvalues of G(μ) are the roots of det[λI - G(μ)] = 0; this is a polynomial equation in λ, and its coefficients are functions of μ. A fundamental result in function theory and matrix analysis states that the zeros of a polynomial depend continuously on its coefficients and, consequently, the eigenvalues of G(μ) vary continuously with μ. This means that G(μ) will first become singular before becoming indefinite. For this reason, there is an upper bound on μ, say μ_max, such that G(μ) > 0 for all μ < μ_max. This bound on μ is equal to the smallest value of μ that makes G(μ) singular, i.e., for which det[G(μ)] = 0.

Now note that the determinant of G(μ) is equal to the determinant of the block matrix

    K(μ) = [ 2I - μA   μB ; -μI   I ]

since

    det [ X  W ; Y  Z ] = det(Z) det( X - W Z^{-1} Y )

whenever Z is invertible.
Moreover, since we can write

    K(μ) = [ 2I  0 ; 0  I ] - μ [ A  -B ; I  0 ] = [ 2I  0 ; 0  I ] ( I - μ [ A/2  -B/2 ; I  0 ] )

we find that the condition det[K(μ)] = 0 is equivalent to det(I - μH) = 0, where

    H = [ A/2  -B/2 ; I  0 ]    (2M² × 2M²)

In this way, the smallest positive μ that results in det[K(μ)] = 0 is equal to

    1 / max{ λ(H) ∈ R⁺ }    (25.25)

in terms of the largest positive real eigenvalue of H, when it exists. The results (25.24)-(25.25) can be grouped together to yield the condition

    μ < min{ 1/λ_max(A^{-1}B), 1/max{λ(H) ∈ R⁺} }    (25.26)

If H does not have any real positive eigenvalue, then the corresponding condition is removed and we only require μ < 1/λ_max(A^{-1}B). The result (25.26) is valid for general A > 0 and B ≥ 0; the derivation does not exploit any particular structure in the matrices A and B defined by (25.15)-(25.16).

25.B APPENDIX: STABILITY OF NLMS

The purpose of this appendix is to show that for ε-NLMS, any μ < 2 is sufficient to guarantee mean-square stability. Thus, refer again to the discussion in Sec. 25.1 and to the definitions of the matrices {A, B, P, F} in (25.4)-(25.8). We already know from the result in App. 25.A that stability in the mean and mean-square senses is guaranteed for step-sizes in the range

    μ < min{ 2/λ_max(P), 1/λ_max(A^{-1}B), 1/max{λ(H) ∈ R⁺} }

where the third condition is in terms of the largest positive real eigenvalue of the block matrix

    H = [ A/2  -B/2 ; I  0 ]

The first condition on μ, namely, μ < 2/λ_max(P), guarantees convergence in the mean. The second condition, μ < 1/λ_max(A^{-1}B), guarantees λ(F) < 1. The last condition, μ < 1/max{λ(H) ∈ R⁺}, enforces λ(F) > -1. The point now is that these conditions on μ are met by any μ < 2 (i.e., F is stable for any μ < 2). This is because there are some important relations between the matrices {A, B, P} in the ε-NLMS case. To see this, observe first that the term

    u_i^* u_i / (ε + ||u_i||²)    (25.27)

which appears in the expression (25.6) for P, is generally a rank-one matrix (unless u_i = 0); it has M - 1 zero eigenvalues and one possibly nonzero eigenvalue that is equal to ||u_i||² / (ε + ||u_i||²).
Since ε > 0, this nonzero eigenvalue ||u_i||²/(ε + ||u_i||²) is less than unity, so that

    λ_max ( u_i^* u_i / (ε + ||u_i||²) ) < 1    (25.28)

(Recall that every rank-one matrix of the form x x^*, where x is a column vector of size M, has M - 1 zero eigenvalues and one nonzero eigenvalue that is equal to ||x||².) Now recall the Rayleigh-Ritz characterization of the maximum eigenvalue of any Hermitian matrix R (from Sec. B.1):

    λ_max(R) = max_{||x||=1} x^* R x    (25.29)

so that we conclude from (25.28) that

    x^* ( u_i^* u_i / (ε + ||u_i||²) ) x ≤ ||u_i||² / (ε + ||u_i||²) < 1  for every unit-norm x

Applying the same characterization (25.29) to the matrix P in (25.6), and using the above inequality, we find that

    λ_max(P) = max_{||x||=1} x^* P x ≤ E ( ||u_i||² / (ε + ||u_i||²) ) < 1    (25.30)

In other words, the maximum eigenvalue of P is bounded by one, so that the condition μ < 2/λ_max(P) can be met by any μ < 2.

Let us now examine the condition μ < 1/λ_max(A^{-1}B). Using again the fact that the matrix in (25.27) has rank one, it is shown in Prob. V.8 that

    2 [ ( u_i^* u_i / (ε + ||u_i||²) )^T ⊗ ( u_i^* u_i / (ε + ||u_i||²) ) ]
        ≤ [ ( u_i^* u_i / (ε + ||u_i||²) )^T ⊗ I ] + [ I ⊗ ( u_i^* u_i / (ε + ||u_i||²) ) ]    (25.31)

Taking expectations of both sides, and using the definitions (25.4)-(25.5) for A and B, we conclude that 2B - A ≤ 0, so that λ_max(A^{-1}B) ≤ 1/2 and the condition μ < 1/λ_max(A^{-1}B) can be met by any μ satisfying μ < 2.

What about the third condition on μ, in terms of the positive real eigenvalues of the matrix H? It turns out that μ < 2 is also sufficient, since it already guarantees mean-square convergence of the filter, as can be seen from the following argument. Choosing Σ = I in the variance relation (25.3) we get

    E ||w̃_i||² = E ||w̃_{i-1}||²_{Σ'} + μ² σ_v² E ( ||u_i||² / (ε + ||u_i||²)² )

where

    Σ' = I - 2μP + μ²S,  S = E ( ||u_i||² u_i^* u_i / (ε + ||u_i||²)² )

Obviously, S ≤ P (since ||u_i||²/(ε + ||u_i||²) ≤ 1), so that Σ' ≤ I - 2μP + μ²P and, hence,

    E ||w̃_i||² ≤ E ||w̃_{i-1}||²_{( I - 2μP + μ²P )} + μ² σ_v² E ( ||u_i||² / (ε + ||u_i||²)² )

Now from the result of part (a) of Prob. V.6 we know that R_u > 0 implies P > 0. We also know from (25.30) that λ_max(P) < 1. Therefore, all the eigenvalues of P are positive and lie inside the open interval (0, 1). Moreover, over the interval 0 < μ < 2, the quadratic function of μ

    k(μ) = 1 - 2μλ + μ²λ

assumes values between 1 - λ and 1 for each of the eigenvalues λ of P.
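Both facts are easy to confirm numerically for ε-NLMS: each sample matrix u_i^* u_i/(ε + ||u_i||²) is rank one with eigenvalue below one, and averaging the two sides of (25.31) gives 2B ≤ A. The sketch below is a hypothetical Monte Carlo setup with illustrative parameters; it estimates the moments {P, A, B} of (25.4)-(25.6) by sample averages, checks both inequalities, and confirms that the resulting F = I - μA + μ²B is stable even for a step-size as large as μ = 1.9:

```python
import numpy as np

rng = np.random.default_rng(5)
M, eps, N, mu = 3, 1e-3, 2000, 1.9   # illustrative values
I = np.eye(M)

P = np.zeros((M, M))
A = np.zeros((M * M, M * M)); B = np.zeros_like(A)
for _ in range(N):
    u = rng.standard_normal(M)
    R1 = np.outer(u, u) / (eps + u @ u)        # rank one, eigenvalue < 1
    P += R1 / N
    A += (np.kron(R1, I) + np.kron(I, R1)) / N
    B += np.kron(R1, R1) / N

print(np.linalg.eigvalsh(P).max() < 1)             # lambda_max(P) < 1
print(np.linalg.eigvalsh(A - 2 * B).min() > -1e-10)  # 2B - A <= 0
F = np.eye(M * M) - mu * A + mu**2 * B
print(max(abs(np.linalg.eigvals(F))) < 1)          # F stable at mu = 1.9 < 2
```

Since both inequalities hold for every individual sample, they hold for the sample averages as well, which is why the check succeeds regardless of the number of Monte Carlo draws.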
Therefore, it holds that

    I - 2μP + μ²P ≤ [ 1 - 2μλ_min(P) + μ²λ_min(P) ] I

from which we conclude that

    E ||w̃_i||² ≤ α E ||w̃_{i-1}||² + μ² σ_v² E ( ||u_i||² / (ε + ||u_i||²)² )

where the scalar coefficient α = 1 - 2μλ_min(P) + μ²λ_min(P) is positive and strictly less than one for 0 < μ < 2. It then follows that E ||w̃_i||² remains bounded for all i.
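The boundedness conclusion can be observed directly in simulation. The sketch below is a hypothetical setup with illustrative parameters: it averages ||w̃_i||² = ||w° - w_i||² over an ensemble of ε-NLMS runs with the aggressive step-size μ = 1.9 < 2, and the ensemble-averaged squared deviation decays from ||w°||² and settles at a bounded steady-state value:

```python
import numpy as np

rng = np.random.default_rng(6)
M, eps, mu, sv = 5, 1e-3, 1.9, 0.1     # large step-size, still < 2
w_o = np.ones(M)                       # unknown system w^o (illustrative)

trials, iters = 200, 400
msd = np.zeros(iters)                  # ensemble average of ||w~_i||^2
for _ in range(trials):
    w = np.zeros(M)
    for i in range(iters):
        u = rng.standard_normal(M)
        e = u @ (w_o - w) + sv * rng.standard_normal()
        w = w + mu * e * u / (eps + u @ u)   # eps-NLMS update
        msd[i] += np.sum((w_o - w) ** 2) / trials

print(msd[0], msd[-1])   # decays from about ||w^o||^2 and stays bounded
```

Repeating the experiment with μ slightly above 2 produces a diverging curve, consistent with the sufficiency (and near tightness) of the μ < 2 condition.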