
Principal Component Analysis

Part I

Based on
1. "Statistics and Data Analysis in Geology", J.C. Davis, John Wiley & Sons, New York, 2nd ed., 1996
2. "Chemometrics: A Textbook", D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte, and L. Kaufman, Elsevier, Amsterdam, 1988
3. "Course note: Multivariate Data Analysis and Chemometrics", B. Jørgensen, Department of Statistics, University of Southern Denmark, 2003
4. "Multi- and Megavariate Data Analysis: Principles and Applications", L. Eriksson, E. Johansson, N. Kettaneh-Wold, and S. Wold, UMETRICS, 2001
5. "Matlab" Online Manual, The MathWorks, Inc.
PART I:
General Description of Principal Component Analysis

1. Principal Component Analysis (PCA) is…


2. Projection and Maximum Variance
3. General Steps of PCA
4. Mathematical Steps of PCA
5. Eigenvalue Decomposition of covariance matrix, S
6. Properties of scores and loading vectors
1. Principal Component Analysis (PCA) is
• a way to describe multivariate data by maximizing variances
• a useful tool for data compression and information extraction
• a way to find combinations of variables (factors) that describe major trends in the data
• an eigenvalue decomposition of the covariance (or correlation) matrix of the variables
2. Projection and Maximum Variance
Adapted from "Chemometrics: A Textbook", D.L. Massart, B.G.M. Vandeginste, S.N. Deming, Y. Michotte, and L. Kaufman, Elsevier, Amsterdam, 1988

For two variables, y1 and y2, measured on six samples, the data can be plotted in 2-dimensional space. "0" is the centroid of the 6 samples after mean centering.

[Figure: the six mean-centered samples plotted in the (y1, y2) plane, with the centroid at "0"]

The spread of the data is the sum of squared distances from the centroid to each point:

    spread of data = |01|² + |02|² + ⋯ + |06|²

where |0i| denotes the distance from the centroid 0 to point i.

If 1', 2', …, 6' on the line shown in the figure below are the projections of each point from the graph above, then, by the Pythagorean theorem,

[Figure: the same six points with their projections 1', 2', …, 6' onto a line (new axis x1) through the centroid]

    |0i|² = |0i'|² + |ii'|²

where |0i'| is the distance from the centroid to the projection of point i, and |ii'| is the perpendicular distance from point i to the line. Summing over all six points, the total spread therefore splits into two parts:

    spread of data = (|01'|² + |02'|² + ⋯ + |06'|²) + (|11'|² + |22'|² + ⋯ + |66'|²)
                      (to be maximized)                (to be minimized)

Because the total spread is fixed, the line that maximizes the variance of the projections simultaneously minimizes the sum of squared perpendicular distances.

The direction of maximum variance describes the trend of the 6 data points.


PCA is a way to describe multivariate data by maximizing variances.
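The projection argument above can be sketched numerically with NumPy. The six sample values below are made up for illustration (they are not the points in the figure); the direction of maximum variance is taken as the leading eigenvector of the covariance matrix, and the Pythagorean split of the spread is checked.

```python
import numpy as np

# Hypothetical 6 samples of two variables y1 and y2 (illustrative data only)
Y = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [1.1, 1.5]])

Yc = Y - Y.mean(axis=0)            # mean centering: the centroid "0" moves to the origin

# Spread of the data = sum of squared distances from the centroid to each point
total_spread = np.sum(Yc ** 2)

# Direction of maximum variance = eigenvector of the covariance matrix
# with the largest eigenvalue
S = np.cov(Yc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(S)   # eigh returns eigenvalues in ascending order
p1 = eigvecs[:, -1]                    # leading eigenvector

proj = Yc @ p1                         # projections 1', 2', ..., 6' onto the line
residual = Yc - np.outer(proj, p1)     # perpendicular components ii'

# Pythagoras per point: |0i|^2 = |0i'|^2 + |ii'|^2, summed over all points
spread_projected = np.sum(proj ** 2)       # term to be maximized
spread_residual = np.sum(residual ** 2)    # term to be minimized
assert np.isclose(total_spread, spread_projected + spread_residual)
```

Projecting onto any other direction gives a smaller (or equal) projected spread, which is what "maximum variance" means here.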
3. General Steps of PCA

1) Pre-treatment of data: Scaling


2) Calculation of covariance matrix
3) Calculation of eigenvalues and eigenvectors of covariance matrix
4) Calculation of scores

(Combinatorial) Experiment → Multivariate Data Set (input) → PCA (analysis) → Knowledge (information)
4. Mathematical Steps of PCA

Scaling (for example, UV scaling, dividing each entry by its column's scaling factor):

        ⎡ a11  …  a1n ⎤              ⎡ a11/sk1  …  a1n/skn ⎤
    A = ⎢  ⋮   ⋱   ⋮  ⎥    →    X = ⎢    ⋮     ⋱     ⋮    ⎥
        ⎣ am1  …  amn ⎦              ⎣ am1/sk1  …  amn/skn ⎦

Covariance matrix:

    S = cov(X) = XᵀX / (m − 1)
Eigenvalue Decomposition:

    X = t1 p1ᵀ + t2 p2ᵀ + ⋯ + tk pkᵀ + E,   where k ≤ min{m, n}

    ti: scores, pi: loadings
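The mathematical steps above can be sketched in NumPy. The data matrix A below is random, made-up input; this sketch mean-centers each column before dividing by its standard deviation (the usual convention for UV scaling, and an assumption here, since the equations above only show the division):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))                 # m = 6 samples, n = 3 variables (made up)

# 1) Pre-treatment: UV scaling (mean-center, then divide by column std. dev.)
X = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

# 2) Covariance matrix S = X^T X / (m - 1)
m = X.shape[0]
S = X.T @ X / (m - 1)

# 3) Eigenvalues (variances along PCs) and eigenvectors (loadings p_i) of S
eigvals, P = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]           # sort by decreasing eigenvalue
eigvals, P = eigvals[order], P[:, order]

# 4) Scores: t_i = X p_i, collected as the columns of T
T = X @ P

# With k = n components the residual E vanishes: X = t1 p1^T + ... + tn pn^T
X_reconstructed = sum(np.outer(T[:, i], P[:, i]) for i in range(P.shape[1]))
assert np.allclose(X, X_reconstructed)
```

Keeping only the first k < n terms gives the compressed model, with everything left over collected in the residual matrix E.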
5. Eigenvalue Decomposition of covariance matrix, S
From the Matlab Online Manual (The MathWorks, Inc.):

    S p = λ p      (p: eigenvector, λ: eigenvalue)

Collecting the eigenvectors as the columns of a matrix P and the eigenvalues on the diagonal of a diagonal matrix Λ:

    S P = P Λ

If P is nonsingular, this becomes the eigenvalue decomposition:

    S = P Λ P⁻¹

where Λ is the eigenvalue matrix and P, the loading matrix, is the eigenvector matrix.
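The decomposition can be checked directly with NumPy's `eig`, which mirrors the Matlab `eig` function referenced above (the matrix S below is a small, made-up symmetric example standing in for a covariance matrix):

```python
import numpy as np

# Small made-up symmetric matrix standing in for a covariance matrix S
S = np.array([[2.0, 0.8, 0.3],
              [0.8, 1.5, 0.5],
              [0.3, 0.5, 1.0]])

lam, P = np.linalg.eig(S)        # lam: eigenvalues; columns of P: eigenvectors
Lambda = np.diag(lam)            # eigenvalues on the diagonal of a diagonal matrix

assert np.allclose(S @ P, P @ Lambda)                  # S P = P Λ
assert np.allclose(S, P @ Lambda @ np.linalg.inv(P))   # S = P Λ P^(-1)
```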
6. Properties of ti and pi vectors

    S = P Λ P⁻¹    or    cov(X) pi = λi pi

(Λ: eigenvalue matrix; P: loading matrix = eigenvector matrix)

    X = t1 p1ᵀ + t2 p2ᵀ + ⋯ + tk pkᵀ + E,   where k ≤ min{m, n}

pi vectors (loadings):
- carry information about the variables
- form an orthonormal set:
      piᵀ pj = 0 for i ≠ j
      piᵀ pj = 1 for i = j

ti vectors (scores):
- carry information about the samples
- form an orthogonal set:
      tiᵀ tj = 0 for i ≠ j
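These properties can be verified numerically on made-up data. `eigh` is used because S is symmetric, so its eigenvector matrix P is orthogonal (P⁻¹ = Pᵀ), which is what makes the loadings orthonormal:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(8, 4))                         # made-up 8 x 4 data matrix
X = (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)    # UV-scaled data

S = X.T @ X / (X.shape[0] - 1)   # covariance matrix
_, P = np.linalg.eigh(S)         # loading vectors p_i as the columns of P
T = X @ P                        # score vectors t_i as the columns of T

# Loadings form an orthonormal set: P^T P = I
assert np.allclose(P.T @ P, np.eye(P.shape[1]))

# Scores form an orthogonal (but not orthonormal) set: t_i^T t_j = 0 for i != j,
# so T^T T is diagonal (its diagonal entries are (m-1) times the eigenvalues)
G = T.T @ T
assert np.allclose(G - np.diag(np.diag(G)), 0, atol=1e-8)
```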
