
Pattern Recognition 37 (2004) 1509-1518
www.elsevier.com/locate/patcog
doi:10.1016/j.patcog.2003.11.010

On incremental and robust subspace learning


Yongmin Li
Department of Information Systems and Computing, Brunel University, Uxbridge, Middlesex UB8 3PH, UK
Received 7 April 2003; accepted 6 November 2003

Abstract

Principal Component Analysis (PCA) has been of great interest in computer vision and pattern recognition. In particular, incrementally learning a PCA model, which is computationally efficient for large-scale problems as well as adaptable to reflect the variable state of a dynamic system, is an attractive research topic with numerous applications such as adaptive background modelling and active object recognition. In addition, the conventional PCA, in the sense of least mean squared error minimisation, is susceptible to outlying measurements. To address these two important issues, we present a novel algorithm of incremental PCA, and then extend it to robust PCA. Compared with the previous studies on robust PCA, our algorithm is computationally more efficient. We demonstrate the performance of these algorithms with experimental results on dynamic background modelling and multi-view face modelling.
© 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Principal Component Analysis; Incremental PCA; Robust PCA; Background modelling; Multi-view face modelling

1. Introduction

Principal Component Analysis (PCA), or the subspace method, has been extensively investigated in the field of computer vision and pattern recognition [1-4]. One of the attractive characteristics of PCA is that a high-dimensional vector can be represented by a small number of orthogonal basis vectors, i.e. the principal components. The conventional methods of PCA, such as singular value decomposition (SVD) and eigen-decomposition, perform in batch mode with a computational complexity of $O(m^3)$, where $m$ is the minimum of the data dimension and the number of training examples. Undoubtedly these methods are computationally expensive when dealing with large-scale problems where both the dimension and the number of training examples are large. To address this problem, many researchers have been working on incremental algorithms. Early work on this topic includes Refs. [5,6].
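As a point of reference for the incremental method developed in Section 2, a conventional batch-mode PCA can be sketched in a few lines of NumPy. This is our own illustrative code, not taken from the paper (the function name and interface are assumptions); the $O(m^3)$ cost referred to above is hidden inside the SVD call.

```python
import numpy as np

def batch_pca(X, p):
    """Batch-mode PCA: mean, first p eigenvectors and eigenvalues of the
    sample covariance of the columns of X (n x N, one observation per column)."""
    mu = X.mean(axis=1)
    Xc = X - mu[:, None]                          # mean-normalise each column
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    lam = s**2 / X.shape[1]                       # singular values -> eigenvalues
    return mu, U[:, :p], lam[:p]
```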

Tel.: +44-1895-203397; fax: +44-1895-251686. E-mail address: yongmin.li@brunel.ac.uk (Y. Li).

Gu and Eisenstat [7] developed a stable and fast algorithm for SVD which performs in an incremental way by appending a new row to the previous matrix. Chandrasekaran et al. [8] presented an incremental eigenspace update algorithm using SVD. Hall et al. [9] derived an eigen-decomposition-based incremental algorithm. In their extended work, a method for merging and splitting eigenspace models was developed [10]. Recently, Liu and Chen [11] also introduced an incremental algorithm for PCA model updating and applied it to video shot boundary detection. Franco et al. [12] presented an approach to merging multiple eigenspaces which can be used for incremental PCA learning by successively adding new sets of elements.

In addition, the traditional PCA, in the sense of least mean squared error minimisation, is susceptible to outlying measurements. To build a PCA model which is robust to outliers, Xu and Yuille [13] treated an entire contaminated vector as an outlier by introducing an additional binary variable. Gabriel and Odoroff [14] addressed the general case where each element of a vector is assigned a different weight. More recently, De la Torre and Black [15] presented a method of robust subspace learning based on robust



M-estimation. Brand [16] also designed a fast incremental SVD algorithm which can deal with missing/untrusted data; however, the missing part must be known beforehand. Skocaj and Leonardis [17] proposed an approach to incremental and robust eigenspace learning by detecting outliers in a new image and replacing them with values from the current model.

One limitation of the previous robust PCA methods is that they are usually computationally intensive because the optimisation problem has to be computed iteratively,¹ e.g. the self-organising algorithms in Ref. [13], the criss-cross regressions in Ref. [14] and the expectation-maximisation algorithm in Ref. [15]. Although a robust updating was proposed in Ref. [17], the outlier detection is performed by either a global threshold, which assumes the same variability over the whole image, or a local threshold by median absolute deviation proposed by De la Torre and Black [15], which is based on an iterative process. This computational inefficiency restricts their use in many applications, especially when real-time performance is crucial.

To address the issue of incremental and robust PCA learning, we present two novel algorithms in this paper: an incremental algorithm for PCA and an incremental algorithm for robust PCA. In both algorithms, the PCA model updating is performed directly from the previous eigenvectors and a new observation vector. The real-time performance can be significantly improved over the traditional batch-mode algorithm. Moreover, in the second algorithm, by introducing a simplified robust analysis scheme, the PCA model is robust to outlying measurements without adding much extra computation (only filtering each element of a new observation with a weight which can be returned from a look-up table).

The rest of the paper is organised as follows. The new incremental PCA algorithm is introduced in Section 2. It is then extended to robust PCA in Section 3 as a result of adding a scheme of robust analysis. Applications of using the above algorithms for adaptive background modelling and multi-view face modelling are described in Sections 4 and 5, respectively. Conclusions and discussions are presented in Section 6.

2. Incremental PCA

Note that in this context we use $x$ to denote the mean-normalised observation vector, i.e.

$x = x' - \mu$,  (1)

where $x'$ is the original vector and $\mu$ is the current mean vector. For a new $x$, if we assume the updating weights on the previous PCA model and the current observation vector are $\alpha$ and $1-\alpha$, respectively, the mean vector can be updated as

$\mu_{new} = \alpha \mu + (1-\alpha) x' = \mu + (1-\alpha) x$.  (2)

Construct $p+1$ vectors from the previous eigenvectors and the current observation vector

$y_i = \sqrt{\alpha \lambda_i} \, u_i, \quad i = 1, 2, \dots, p$,  (3)

$y_{p+1} = \sqrt{1-\alpha} \, x$,  (4)

where $\{u_i\}$ and $\{\lambda_i\}$ are the current eigenvectors and eigenvalues. The PCA updating problem can then be approximated as an eigen-decomposition problem on the $p+1$ vectors. An $n \times (p+1)$ matrix $A$ can then be defined as

$A = [y_1, y_2, \dots, y_{p+1}]$.  (5)

Assume the covariance matrix $C$ can be approximated by the first $p$ significant eigenvectors and their corresponding eigenvalues,

$C \approx U_{n \times p} \Lambda_{p \times p} U_{n \times p}^T$,  (6)

where the columns of $U_{n \times p}$ are the eigenvectors of $C$, and the diagonal matrix $\Lambda_{p \times p}$ is comprised of the eigenvalues of $C$. With a new observation $x$, the new covariance matrix is expressed by

$C_{new} = \alpha C + (1-\alpha) x x^T \approx \sum_{i=1}^{p} \alpha \lambda_i u_i u_i^T + (1-\alpha) x x^T$.  (7)

Substituting Eqs. (3)-(5) into Eq. (7) gives

$C_{new} = A A^T$.  (8)

Instead of the $n \times n$ matrix $C_{new}$, we eigen-decompose a smaller $(p+1) \times (p+1)$ matrix

$B = A^T A$,  (9)

yielding eigenvectors $\{v_i^{new}\}$ and eigenvalues $\{\lambda_i^{new}\}$ which satisfy

$B v_i^{new} = \lambda_i^{new} v_i^{new}, \quad i = 1, 2, \dots, p+1$.  (10)

Left multiplying by $A$ on both sides and using Eq. (9), we have

$A A^T A v_i^{new} = \lambda_i^{new} A v_i^{new}$.  (11)

Defining

$u_i^{new} = A v_i^{new}$  (12)

¹ It is important to distinguish an incremental algorithm from an iterative algorithm. The former performs in the manner of prototype growing from training examples $1, 2, \dots, t$, the current training example, while the latter iterates on each learning step with all the training examples $1, 2, \dots, N$ until a certain stop condition is satisfied. Therefore, for the PCA problem discussed in this paper, the complexity of the algorithms in order from lowest to highest is: incremental, batch-mode and iterative.


and then using Eqs. (8) and (12) in Eq. (11) leads to

$C_{new} u_i^{new} = \lambda_i^{new} u_i^{new}$,  (13)

i.e. $u_i^{new}$ is an eigenvector of $C_{new}$ with eigenvalue $\lambda_i^{new}$.

Algorithm 1. The incremental algorithm of PCA
1: Construct the initial PCA from the first $q$ ($q \geq p$) observations.
2: for all new observations $x$ do
3: Update the mean vector (2);
4: Compute $y_1, y_2, \dots, y_p$ from the previous PCA (3);
5: Compute $y_{p+1}$ (4);
6: Construct matrix $A$ (5);
7: Compute matrix $B$ (9);
8: Eigen-decompose $B$ to obtain eigenvectors $\{v_i^{new}\}$ and eigenvalues $\{\lambda_i^{new}\}$;
9: Compute the new eigenvectors $\{u_i^{new}\}$ (12).
10: end for

The algorithm is formally presented in Algorithm 1. It is important to note the following:

(1) Incrementally learning a PCA model is a well-studied subject [5,6,8-11,16]. The main difference between the algorithms, including this one, is how to express the covariance matrix incrementally (e.g. Eq. (7)) and the formulation of the algorithm. The accuracy of these algorithms is similar because updating is based on approximating the covariance with the current $p$-ranked model. Also, the speed of these algorithms is similar because they all perform an eigen-decomposition or SVD at rank $p+1$. However, we believe the algorithm as presented in Algorithm 1 is concise and easy to implement. Also, it is ready to be extended to the robust PCA which will be discussed in the next section.

(2) The actual computation for matrix $B$ only occurs for the elements of the $(p+1)$th row and the $(p+1)$th column since $\{u_i\}$ are orthogonal unit vectors, i.e. only the elements on the diagonal and the last row/column of $B$ have non-zero values.

(3) The update rate $\alpha$ determines the weights on the previous information and the new information. Like most incremental algorithms, it is application-dependent and has to be chosen experimentally. Also, with this updating scheme, the old information stored in the model decays exponentially over time.
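To make the procedure concrete, the following NumPy sketch implements one pass of the loop in Algorithm 1. It is our own illustrative code, not the author's implementation: the function name and interface are assumptions, $B$ is formed densely rather than exploiting the sparsity described in note (2), and the model is truncated back to rank $p$ after each update.

```python
import numpy as np

def incremental_pca_update(U, lam, mu, x_raw, alpha):
    """One update of a p-ranked PCA model with a new observation.

    U: n x p eigenvectors, lam: p eigenvalues, mu: current mean,
    x_raw: new (un-normalised) observation, alpha: update rate."""
    x = x_raw - mu                            # mean-normalise, Eq. (1)
    mu_new = mu + (1.0 - alpha) * x           # update the mean, Eq. (2)
    n, p = U.shape
    A = np.empty((n, p + 1))
    A[:, :p] = np.sqrt(alpha * lam) * U       # y_i, Eq. (3)
    A[:, p] = np.sqrt(1.0 - alpha) * x        # y_{p+1}, Eq. (4)
    B = A.T @ A                               # (p+1) x (p+1) matrix, Eq. (9)
    lam_new, V = np.linalg.eigh(B)            # eigen-decompose B, Eq. (10)
    keep = np.argsort(lam_new)[::-1][:p]      # retain the p largest modes
    U_new = A @ V[:, keep]                    # u_i = A v_i, Eq. (12)
    U_new /= np.linalg.norm(U_new, axis=0)    # rescale to unit eigenvectors
    return U_new, lam_new[keep], mu_new
```

Since $\|A v_i\| = \sqrt{\lambda_i^{new}}$ for a unit eigenvector $v_i$ of $B$, the final rescaling recovers orthonormal eigenvectors of $C_{new}$.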

3. Robust PCA

Recall that PCA, in the sense of least-squared reconstruction error, is susceptible to contaminated outlying measurements. Several algorithms of robust PCA have been reported to solve this problem, e.g. Refs. [13-15]. However, the limitation of these algorithms is that they mostly perform in an iterative way which is computationally intensive. The reason for having to use an iterative algorithm for robust PCA is that one normally does not know which parts of a sample are likely to be outliers. However, if a prototype model, which does not need to be perfect, is available for the problem to be solved, it is much easier to detect the outliers from the data. For example, we can easily pick out a cat image as an outlier from a set of human face images because we know what human faces look like, and for the same reason we can also tell that the white blocks in Fig. 5 (the first column) are outlying measurements. Now if we assume that the updated PCA model at each step of an incremental algorithm is good enough to function as this prototype model, then we can solve the problem of robust PCA incrementally rather than iteratively. Based on this idea, we develop the following incremental algorithm of robust PCA.

3.1. Robust PCA with M-estimation

We define the residual error of a new vector $x_i$ by

$r_i = U_{n \times p} U_{n \times p}^T x_i - x_i$,  (14)

where $U_{n \times p}$ is defined as in Eq. (6) and, again, $x_i$ is mean-normalised. We know that the conventional non-robust PCA is the solution of a least-squares problem²

$\min \sum_i \|r_i\|^2 = \min \sum_i \sum_j (r_i^j)^2$.  (15)

Instead of a sum of squares, the robust M-estimation method [18] seeks to solve the following problem via a robust function $\rho(r)$:

$\min \sum_i \sum_j \rho(r_i^j)$.  (16)

Differentiating Eq. (16) by $\theta_k$, the parameters to be estimated, i.e. the elements of $U_{n \times p}$, we have

$\sum_i \sum_j \psi(r_i^j) \, \partial r_i^j / \partial \theta_k = 0, \quad k = 1, 2, \dots, np$,  (17)

where $\psi(t) = d\rho(t)/dt$ is the influence function. By introducing a weight function

$w(t) = \psi(t)/t$,  (18)

Eq. (17) can be written as

$\sum_i \sum_j w(r_i^j) r_i^j \, \partial r_i^j / \partial \theta_k = 0, \quad k = 1, 2, \dots, np$,  (19)

which can be regarded as the solution of a new least-squares problem if $w$ is fixed at each step of incremental updating:

$\min \sum_i \sum_j w(r_i^j)(r_i^j)^2$.  (20)

² In this context, we use subscripts to denote the index of vectors, and superscripts to denote the index of their elements.


If we define

$z_i^j = \sqrt{w(r_i^j)} \, x_i^j$,  (21)

then substituting Eqs. (14) and (21) into Eq. (20) leads to a new eigen-decomposition problem

$\min \sum_i \|U_{n \times p} U_{n \times p}^T z_i - z_i\|^2$.  (22)

It is important to note that $w$ is a function of the residual error $r_i^j$, which needs to be computed for each individual training vector (subscript $i$) and each of its elements (superscript $j$). The former maintains the adaptability of the algorithm, while the latter ensures that the algorithm is robust to every element of a vector. If we choose the robust function to be the Cauchy function

$\rho(t) = \frac{c^2}{2} \log\left(1 + (t/c)^2\right)$,  (23)

where $c$ controls the convexity of the function, then we have the weight function

$w(t) = \frac{1}{1 + (t/c)^2}$.  (24)

Now it seems that we have arrived at a typical iterative solution to the problem of robust PCA: compute the residual error with the current PCA model (14), evaluate the weight function $w(r_i^j)$ (24), compute $z_i$ (21), and eigen-decompose (22) to update the PCA model. Obviously, an iterative algorithm like this would be computationally expensive. In the rest of this section, we propose an incremental algorithm to solve the problem.

3.2. Robust parameter updating

One important parameter needs to be determined before performing the algorithm: $c$ in Eqs. (23) and (24), which controls the sharpness of the robust function and hence determines the likelihood of a measurement being an outlier. In previous studies, the parameters of a robust function are usually computed at each step of an iterative robust algorithm [18,19] or by the median absolute deviation method [15]. Both methods are computationally expensive. Here we present an approximate method to estimate the parameters of a robust function.

The first step is to estimate $\sigma^j$, the standard deviation of the $j$th element of the observation vectors $\{x_i^j\}$. Assuming that the current PCA model (including its eigenvalues and eigenvectors) is already a robust estimation from an adaptive algorithm, we approximate $\sigma^j$ with

$\sigma^j = \max_{i=1,\dots,p} \lambda_i |u_i^j|$,  (25)

i.e. the maximal projection of the current eigenvectors on the $j$th dimension (weighted by their corresponding eigenvalues). This is a reasonable approximation if we consider that PCA actually represents the distribution of the original training vectors by a hyper-ellipse in a subspace of the original space, and thus the variation in the original dimensions can be approximated by the projections of the ellipse onto the original space.

The next step is to express $c$, the parameter of Eqs. (23) and (24), as

$c^j = \beta \sigma^j$,  (26)

where $\beta$ is a fixed coefficient; for example, $\beta = 2.3849$ is obtained with the 95% asymptotic efficiency on the normal distribution [20]. $\beta$ can be set to a higher value for fast model updating, but at the risk of accepting outliers into the model. To our knowledge, there are no ready solutions so far for estimating the optimal value of the coefficient $\beta$.

We use an example of background modelling to illustrate the performance of the parameter estimation described above. A video sequence of 200 frames is used in this experiment. The conventional PCA is applied to the sequence to obtain 10 eigenvectors of the background images. The variation $\sigma^j$ computed from the PCA model by Eq. (25) is shown in Fig. 1(a). We also compute the pixel variation directly over the whole sequence, as shown in Fig. 1(b). Since no foreground object appears in this sequence, we do not need to consider the influence of outliers. Therefore, Fig. 1(b) can be regarded as the ground-truth pixel variation of the background image. For a quantitative measurement, we compute the ratio of $\sigma^j$ by Eq. (25) to its ground truth (subject to a fixed scaling factor for all pixels), and plot the histogram in Fig. 1(c). It is noted that (1) the variation computed from the low-dimensional PCA model is a good approximation of the ground truth, with most ratio values close to 1 as shown in Fig. 1(c); and (2) pixels around image edges, valleys and corners normally demonstrate large variation, while those in smooth areas have small variation.
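In code, the parameter estimation of Eqs. (25) and (26) and the Cauchy weighting of Eq. (24) reduce to a few vectorised lines. This is a sketch under the same naming assumptions as the earlier snippet; the small epsilon guard against zero-variance dimensions is our own addition.

```python
def robust_weights(U, lam, x, beta=2.3849, eps=1e-12):
    """Per-element Cauchy weights w(r^j) for a mean-normalised
    observation x, given the current eigen-model (U, lam)."""
    r = U @ (U.T @ x) - x                     # residual error, Eq. (14)
    sigma = np.max(lam * np.abs(U), axis=1)   # sigma^j, Eq. (25)
    c = np.maximum(beta * sigma, eps)         # c^j, Eq. (26)
    return 1.0 / (1.0 + (r / c) ** 2)         # Cauchy weight, Eq. (24)
```

The filtered observation of Eq. (21) is then simply `z = np.sqrt(robust_weights(U, lam, x)) * x`.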


Fig. 1. Standard deviation of individual pixels $\sigma^j$ computed from (a) the low-dimensional PCA model (approximated) and (b) the whole image sequence (ground truth). All values are multiplied by 20 for illustration purposes. Large variation is shown in dark intensity. (c) Histogram of the ratios of the approximated $\sigma^j$ to its ground-truth value.

3.3. The incremental algorithm of robust PCA

By incorporating the process of robust analysis, we obtain the incremental algorithm of robust PCA listed in Algorithm 2. The difference from the non-robust algorithm (Algorithm 1) is that the robust analysis (lines 3-6) has been added and $x$ is replaced by $z$, the weighted vector, in lines 7 and 9.

Algorithm 2. The incremental algorithm of robust PCA
1: Construct the initial PCA from the first $q$ ($q \geq p$) observations.
2: for all new observations $x$ do
3: Estimate $c^j$, the parameters of the robust function, from the current PCA (25), (26);
4: Compute the residual error $r$ (14);
5: Compute the weight $w(r^j)$ for each element of $x$ (24);
6: Compute $z$ (21);
7: Update the mean vector (2), replacing $x$ by $z$;
8: Compute $y_1, y_2, \dots, y_p$ from the previous PCA (3);
9: Compute $y_{p+1}$ (4), replacing $x$ by $z$;
10: Construct matrix $A$ (5);
11: Compute matrix $B$ (9);
12: Eigen-decompose $B$ to obtain eigenvectors $\{v_i^{new}\}$ and eigenvalues $\{\lambda_i^{new}\}$;
13: Compute the new eigenvectors $\{u_i^{new}\}$ (12).
14: end for

It is important to note the following:

(1) The algorithm is much faster than the conventional batch-mode PCA algorithm for large-scale problems, not to mention the iterative robust algorithms.

(2) The model can be updated online over time with new observations. This is especially important for modelling dynamic systems where the system state is variable.

(3) The extra computation over the non-robust algorithm (Algorithm 1) is only to filter a new observation with a weight function. If the Cauchy function is adopted, this extra computation is reasonably mild. Even when more intensive computation such as exponentials and logarithms is involved in the weight function $w$, a look-up table can be built for the weight term $w(\cdot)$ in Eq. (21), which can remarkably reduce the computation. Note that the look-up table should be indexed by $r/c$ rather than $r$.
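Combining the two earlier sketches gives one full step of Algorithm 2. The wrapper below, including the trick of adding the mean back so that the inner routine re-normalises to exactly $z$, is our own illustrative glue code.

```python
def robust_incremental_pca_update(U, lam, mu, x_raw, alpha, beta=2.3849):
    """One step of Algorithm 2: robustly filter the observation
    (lines 3-6), then run the incremental update of Algorithm 1 on it."""
    x = x_raw - mu                            # mean-normalise, Eq. (1)
    w = robust_weights(U, lam, x, beta)       # lines 3-5
    z = np.sqrt(w) * x                        # weighted vector, Eq. (21)
    # Pass z + mu so that the mean subtraction inside the inner
    # routine recovers z itself (lines 7-13).
    return incremental_pca_update(U, lam, mu, z + mu, alpha)
```

If a costlier robust function were used, the weight could instead be read from a precomputed table indexed by $r/c$, as note (3) suggests.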

4. Robust background modelling

Modelling the background with PCA was first proposed by Oliver et al. [21]. By performing PCA on a sample of $N$ images, the background can be represented by the mean image and the first $p$ significant eigenvectors. Once this model is constructed, one projects an input image into the $p$-dimensional PCA space and reconstructs it from the $p$-dimensional PCA vector. The foreground pixels can then be obtained by computing the difference between the input image and its reconstruction. Although Oliver et al. claimed that this background model can be adapted over time, it is computationally intensive to perform model updating using the conventional PCA. Moreover, without a mechanism of robust analysis, outliers or foreground objects may be absorbed into the background model. Apparently this is not what we expect.

To address the two problems stated above, we extend the PCA background model by introducing (1) the incremental PCA algorithm described in Section 2 and (2) the robust analysis of new observations discussed in Section 3.

We applied the algorithms introduced in the previous sections to an image sequence from the PETS2001 data sets.³ This sequence was taken from a university site and has a length of 3061 frames.
³ A benchmark database for video surveillance which can be downloaded at http://www.cvg.cs.rdg.ac.uk/PETS2001/pets2001dataset.html


Fig. 2. Sample results of background modelling. From left to right: the original input frame, the reconstruction and the weights computed by Eq. (24) (dark intensity for low weight) of the robust algorithm, and the reconstruction and the absolute difference images (dark intensity for large difference) of the conventional batch-mode algorithm. The background changes are highlighted by white boxes. Results are shown for every 500 frames of the test sequence.

There are mainly two kinds of activities in the sequence: (1) moving objects, e.g. pedestrians, bicycles and vehicles, and (2) new objects being introduced into or removed from the background. The parameters in the experiments are: image size 192 × 144 (grey level), PCA dimension $p = 10$, size of initial training set $q = 20$, update rate $\alpha = 0.95$ and coefficient $\beta = 10$.
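Given the model maintained by Algorithm 2, the foreground detection described above amounts to a projection, a reconstruction and a difference. The following sketch makes this concrete; the binary threshold and its value are our own illustrative assumptions, as the paper reports difference images rather than a fixed threshold.

```python
def detect_foreground(U, mu, frame, thresh=30.0):
    """Flag pixels of a grey-level frame that the background
    model cannot reconstruct well."""
    x = frame.ravel().astype(float) - mu      # mean-normalised input
    recon = U @ (U.T @ x)                     # project and reconstruct
    mask = np.abs(recon - x) > thresh         # large residual => foreground
    return mask.reshape(frame.shape)
```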

4.1. Comparing to the batch-mode method

In the first experiment, we compared the performance of our robust algorithm (Algorithm 2) with the conventional batch-mode PCA algorithm. It is infeasible to run the conventional batch-mode PCA algorithm on the whole sequence since the data are too big to fit in the computer memory. We therefore randomly selected 200 frames from the sequence to perform a conventional batch-mode PCA, and the trained PCA was then used as a fixed background model. Sample results are illustrated in Fig. 2. It is noted that our algorithm successfully captured the background changes. An interesting example is that, between the 1000th and 1500th frames (the first and second rows in Fig. 2), a car entered the scene and became part of the background, and another background car left the scene. The background changes are highlighted by white boxes in the figure. The model was gradually updated to reflect the changes of the background.


Fig. 3. The first three eigenvectors obtained from the robust algorithm (upper row) and the non-robust algorithm (lower row). The intensity values have been normalised to [0, 255] for illustration purposes.

Fig. 4. The first dimension of the PCA vector computed on the same sequence as in Fig. 2 using the robust algorithm (a) and the non-robust algorithm (b).

In this experiment, the incremental algorithm achieved a frame rate of 5 fps on a 1.5 GHz Pentium IV computer (with JPEG image decoding and image displaying). On the other hand, the fixed PCA model failed to capture the dynamic changes of the background. Most noticeable are the ghost effects around the areas of the two cars in the reconstructed images and the false foreground detections.

4.2. Comparing to the non-robust method

In the second experiment, we compared the performance of the non-robust algorithm (Algorithm 1) and the robust algorithm (Algorithm 2). After applying both algorithms to the same sequence used above, we illustrate the first three eigenvectors of each PCA model in Fig. 3. It is noted that the non-robust algorithm unfortunately captured the variation of the outliers, most noticeably the traces of pedestrians and cars on the walkway appearing in the images of the eigenvectors.

This is exactly the limitation of the conventional PCA (in the sense of least-squared error minimisation), as the outliers usually contribute more to the overall squared error and thus pull the results away from the desired solution. On the other hand, the robust algorithm performed very well: the outliers have been successfully filtered out and the PCA modes generally reflect the variation of the background only, i.e. greater values at highly textured image positions.

The importance of applying robust analysis is further illustrated in Fig. 4, which shows the values of the first dimension of the PCA vectors computed with the two algorithms. A PCA vector is a $p$-dimensional vector obtained by projecting a sample vector onto the $p$ eigenvectors of a PCA model, and its first dimension corresponds to the projection onto the most significant eigenvector. It is observed that the non-robust algorithm presents a fluctuating result, especially when significant activities happened during frames 1000-1500, while the robust algorithm achieves a steady performance.
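For clarity, the quantity plotted in Fig. 4 can be computed in one line from the model of the earlier sketches (again our own illustrative code, not the author's):

```python
def pca_vector(U, mu, frame):
    """Project a frame onto the p eigenvectors; entry [0] is the
    coefficient of the most significant eigenvector, as in Fig. 4."""
    return U.T @ (frame.ravel().astype(float) - mu)
```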


Generally, we would expect that a background model (1) should not demonstrate abrupt changes when there are continuous foreground activities involved, and (2) should evolve smoothly when new components are being introduced or old components removed. The results shown in Fig. 4 indicate that the robust algorithm performed well in terms of these criteria, while the non-robust algorithm struggled to compensate for the large error from outliers by severely adjusting the values of the model parameters.

5. Multi-view face modelling

Modelling faces across multiple views is a challenging problem. One of the difficulties is that rotation in depth causes non-linear variation in the 2D image appearance. The well-known eigenface method, which has been successfully applied to frontal face detection and recognition, can hardly provide a satisfactory solution to this problem as the multi-view face images are largely out of alignment. One possible solution, as presented in Ref. [4], is to build a set of view-based eigenface models; however, the pose information of the faces needs to be known, and the division of the view space is often arbitrary and coarse.

In the following experiments we compare the results of four methods: (1) the view-based eigenface method [4], (2) Algorithm 2 (robust), (3) Algorithm 1 (non-robust), and (4) batch-mode PCA. The image sequences were captured using an electromagnetic tracking system which provides the position of a face in an image and the pose angles of the face. The images are of size 384 × 288 pixels and contain faces of about 80 × 80 pixels. As face detection is beyond the domain of this work, we directly used the face images cropped with the position information provided by the tracking system. We also added uniformly distributed random noise to the data by generating high-intensity blocks with a size of 4 × 8 pixels at various image positions. Note that the first 20 frames do not contain generated noise, in order to obtain a clean initial model for the robust method. We will discuss this issue in the last section.

For method (1), we divide the view space into five segments: left profile, left, frontal, right, and right profile, so the pose information is used additionally for this method. Five view-based PCA models are trained, respectively, on these segments with the uncontaminated data, because we want to use the results of this method as the ground truth for comparison. For methods (2) and (3), the algorithms perform incrementally through the sequences. For method (4), the batch-mode PCA is trained from the whole sequence. The images are scaled to 80 × 80 pixels. The parameters for the robust method are the same as those in the previous section: $p = 10$, $q = 20$, $\alpha = 0.95$ and $\beta = 10$.

Fig. 5 shows the results of these methods. It is evident that (1) the batch-mode method failed to capture the large variation caused by pose change (most noticeable is the ghost effect in the reconstructions); (2) although the view-based method is trained from clean data and uses extra pose information, the reconstructions are noticeably blurry owing to the coarse segmentation of the view space; (3) the non-robust algorithm corrupted quickly owing to the influence of the high-intensity outliers; and (4) the proposed incremental algorithm of robust PCA performed very well: the outliers have been filtered out and the model has been adapted with respect to the view change.

6. Conclusions

PCA is a widely applied technique in pattern recognition and computer vision. However, the conventional batch-mode PCA suffers from two limitations: it is computationally intensive and it is susceptible to outlying measurements. Unfortunately, these two issues have only been addressed separately in previous studies. In this work, we developed a novel incremental PCA algorithm, and extended it to robust PCA.

We do not intend to claim that our incremental algorithm is superior to other incremental algorithms in terms of accuracy and speed. Actually, the basic ideas of all these incremental algorithms are very similar, and so are their performances. However, we believe that our incremental algorithm has been presented in a more concise way, that it is easy to implement, and, more importantly, that it is ready to be extended to the robust algorithm. The main contribution of this paper is the incremental and robust algorithm for PCA. In previous work, the problem of robust PCA was mostly solved by iterative algorithms which are computationally expensive. The reason for having to do so is that one does not know which parts of a sample are outliers. However, the updated model at each step of an incremental PCA algorithm can be used for outlier detection, i.e. given this prototype model, one does not need to go through the expensive iterative process, and the robust analysis can be performed in one go. This is the starting point of our proposed algorithm.

We have provided detailed derivations of the algorithms. Moreover, we have discussed several implementation issues, including (1) approximating the standard deviation using the previous eigenvectors and eigenvalues, (2) the selection of robust functions, and (3) a look-up table for robust weight computation. These can be helpful for further improving the performance.

Furthermore, we applied the algorithms to the problems of dynamic background modelling and multi-view face modelling.


Fig. 5. Sample results of multi-view face modelling. From left to right: the original face image, and the mean vectors and reconstructions of (1) the view-based eigenface method, (2) Algorithm 2, (3) Algorithm 1, and (4) batch-mode PCA, respectively. Results are shown for every 20 frames of the test sequence.

These two applications have their own significance: the former extends the static method of PCA background modelling to a dynamic and adaptive method by introducing an incremental and robust model-updating scheme, and the latter makes it possible to model faces with large pose variation using a simple, adaptive model.

Nevertheless, we have experienced problems when the initial PCA model contains significant outliers. Under these circumstances, the assumption that the prototype model is good enough for outlier detection does not hold, and the model can take a long time to recover. Although the model can recover more quickly by choosing a smaller update rate $\alpha$, we argue that the update rate should be determined by the application rather than by the robust analysis process. A possible solution to this problem is to learn the initial model using the traditional robust methods. Owing to the small size of the initial data set, the performance in terms of computation should not degrade seriously.

References
[1] G. Golub, C. Van Loan, Matrix Computations, Johns Hopkins University Press, Baltimore, 1989.
[2] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cognitive Neurosci. 3 (1) (1991) 71-86.
[3] H. Murase, S.K. Nayar, Illumination planning for object recognition using parametric eigenspaces, IEEE Trans. Pattern Anal. Mach. Intell. 16 (12) (1994) 1219-1227.
[4] B. Moghaddam, A. Pentland, Probabilistic visual learning for object representation, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 696-710.
[5] P. Gill, G. Golub, W. Murray, M. Saunders, Methods for modifying matrix factorizations, Math. Comput. 28 (126) (1974) 505-535.
[6] J. Bunch, C. Nielsen, Updating the singular value decomposition, Numer. Math. 31 (2) (1978) 131-152.
[7] M. Gu, S.C. Eisenstat, A fast and stable algorithm for updating the singular value decomposition, Technical Report YALEU/DCS/RR-966, Department of Computer Science, Yale University, 1994.
[8] S. Chandrasekaran, B. Manjunath, Y. Wang, J. Winkeler, H. Zhang, An eigenspace update algorithm for image analysis, Graphical Models Image Process. 59 (5) (1997) 321-332.
[9] P.M. Hall, A.D. Marshall, R.R. Martin, Incremental eigenanalysis for classification, in: P.H. Lewis, M.S. Nixon (Eds.), British Machine Vision Conference, Southampton, UK, 1998, pp. 286-295.
[10] P.M. Hall, A.D. Marshall, R.R. Martin, Merging and splitting eigenspace models, IEEE Trans. Pattern Anal. Mach. Intell. 22 (9) (2000) 1042-1049.
[11] X. Liu, T. Chen, Shot boundary detection using temporal statistics modeling, in: IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA, May 2002.
[12] A. Franco, A. Lumini, D. Maio, Eigenspace merging for model updating, in: International Conference on Pattern Recognition, Quebec, Canada, Vol. 2, August 2002, pp. 156-159.
[13] L. Xu, A. Yuille, Robust principal component analysis by self-organizing rules based on statistical physics approach, IEEE Trans. Neural Networks 6 (1) (1995) 131-143.
[14] K. Gabriel, C. Odoroff, Resistant lower rank approximation of matrices, in: J. Gentle (Ed.), Proceedings of the 15th Symposium on the Interface, Amsterdam, Netherlands, 1983, pp. 307-308.
[15] F. De la Torre, M. Black, Robust principal component analysis for computer vision, in: IEEE International Conference on Computer Vision, Vol. 1, Vancouver, Canada, 2001, pp. 362-369.
[16] M. Brand, Incremental singular value decomposition of uncertain data with missing values, in: European Conference on Computer Vision, Copenhagen, Denmark, May 2002.
[17] D. Skocaj, A. Leonardis, Incremental approach to robust learning of eigenspaces, in: Workshop of the Austrian Association for Pattern Recognition, September 2002, pp. 111-118.
[18] P.J. Huber, Robust Statistics, Wiley, New York, 1981.
[19] F.R. Hampel, E.M. Ronchetti, P.J. Rousseeuw, W.A. Stahel, Robust Statistics, Wiley, New York, 1986.
[20] Z. Zhang, Parameter estimation techniques: a tutorial with application to conic fitting, Image Vision Comput. 15 (1) (1997) 59-76.
[21] N. Oliver, B. Rosario, A. Pentland, A Bayesian computer vision system for modeling human interactions, IEEE Trans. Pattern Anal. Mach. Intell. 22 (8) (2000) 831-841.
About the Author: YONGMIN LI received his B.Eng. and M.Eng. in control engineering from Tsinghua University, China, in 1990 and 1992, respectively, and his Ph.D. in computer vision from Queen Mary, University of London in 2001. He is currently a faculty member in the Department of Information Systems and Computing at Brunel University, UK. He was a research scientist in the Content and Coding Lab at BT Exact (formerly the BT Laboratories) from 2001 to 2003. He gained several years of experience in control systems, information systems and software engineering during 1992-1998. His research interests cover the areas of machine learning, pattern recognition, computer vision, image processing and video analysis. He won the 2001 British Machine Vision Association (BMVA) best scientific paper award and the best paper prize of the 2001 IEEE International Workshop on Recognition, Analysis and Tracking of Faces and Gestures in Real-Time Systems. Dr. Li is a senior member of the IEEE.
