
Linear systems Matrix decompositions Linear least squares Some useful functions in R

Lecture 4 - Computational linear algebra

Björn Andersson (w/ Jianxin Wei)


Department of Statistics, Uppsala University

February 5, 2015


Table of Contents

1 Linear systems
Existence and uniqueness
Solving linear systems
2 Matrix decompositions
LU decomposition
Cholesky decomposition
QR decomposition
Singular value decomposition
Statistical applications
3 Linear least squares

4 Some useful functions in R


Existence and uniqueness

Existence and uniqueness of solutions

An n × n matrix A is said to be non-singular if it satisfies any of the
following equivalent conditions:
det(A) ≠ 0
$A^{-1}$ exists
rank(A) = n
For a given square matrix A and vector b, the linear system
Ax = b has:
A unique solution: A is non-singular and b is arbitrary
Infinitely many solutions: A is singular and b ∈ span(A), the column space of A
No solution: A is singular and b ∉ span(A)
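
The three cases can be checked numerically. The following is a minimal sketch, assuming two made-up 2 × 2 matrices `A` and `S` that are not part of the lecture; `qr()$rank` is used here as one convenient way to obtain the rank.

```r
# Hypothetical example matrices (not from the lecture)
A <- matrix(c(2, 1, 1, 3), 2, 2)   # non-singular
S <- matrix(c(1, 2, 2, 4), 2, 2)   # singular: second column = 2 * first column

det_A  <- det(A)                   # non-zero
rank_A <- qr(A)$rank               # equals n = 2
x      <- solve(A, c(1, 2))        # the unique solution of Ax = b

det_S  <- det(S)                   # zero
rank_S <- qr(S)$rank               # 1 < n

# If b lies in the column space of S, appending b does not raise the rank
b_in  <- c(1, 2)                   # in span(S): infinitely many solutions
b_out <- c(1, 0)                   # not in span(S): no solution
rank_in  <- qr(cbind(S, b_in))$rank
rank_out <- qr(cbind(S, b_out))$rank
```

Comparing the rank of S with the rank of the augmented matrix is only one possible check; it is chosen here because it mirrors the span condition directly.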


Solving linear systems

Forward substitution and backward substitution


Let L be a lower triangular matrix. Then the linear system Lx = b
can be solved by forward substitution:
1 Let $x_1 = b_1 / l_{11}$.
2 For $i \in \{2, \dots, n\}$, let
$$x_i = \frac{b_i - \sum_{j=1}^{i-1} l_{ij} x_j}{l_{ii}}.$$
Let U be an upper triangular matrix. Then the linear system
Ux = b can be solved by backward substitution:
1 Let $x_n = b_n / u_{nn}$.
2 For $i \in \{n-1, \dots, 1\}$, let
$$x_i = \frac{b_i - \sum_{j=i+1}^{n} u_{ij} x_j}{u_{ii}}.$$
Note that if $l_{ii} = 0$ or $u_{ii} = 0$ for some i, then the matrix L or U is
singular and thus the system does not have a unique solution.
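
The two recursions can be written out directly. This is a sketch for illustration only; the function names `forward_sub` and `back_sub` are hypothetical, and in practice the built-in forwardsolve() and backsolve() would normally be used.

```r
# Forward substitution for a lower triangular system Lx = b
forward_sub <- function(L, b) {
  n <- length(b)
  x <- numeric(n)
  for (i in seq_len(n)) {
    if (L[i, i] == 0) stop("zero diagonal entry: L is singular")
    # sum over j < i of l_ij * x_j (empty sum when i = 1)
    s <- if (i > 1) sum(L[i, 1:(i - 1)] * x[1:(i - 1)]) else 0
    x[i] <- (b[i] - s) / L[i, i]
  }
  x
}

# Backward substitution for an upper triangular system Ux = b
back_sub <- function(U, b) {
  n <- length(b)
  x <- numeric(n)
  for (i in rev(seq_len(n))) {
    if (U[i, i] == 0) stop("zero diagonal entry: U is singular")
    # sum over j > i of u_ij * x_j (empty sum when i = n)
    s <- if (i < n) sum(U[i, (i + 1):n] * x[(i + 1):n]) else 0
    x[i] <- (b[i] - s) / U[i, i]
  }
  x
}

L <- matrix(c(1, 2, 0, 1), 2, 2)
x_fwd <- forward_sub(L, c(1, 4))
x_bwd <- back_sub(t(L), c(4, 1))
```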

Forward and backward substitution in R

Functions forwardsolve() and backsolve().


R> L <- matrix(c(1,2,0,1), 2, 2)
R> forwardsolve(L, c(1,4))
[1] 1 2
R> backsolve(t(L), c(4,1))
[1] 2 1

2 Matrix decompositions

LU decomposition

LU decomposition

Let A be a square matrix such that A = LU, where L is a
lower triangular matrix and U is an upper triangular matrix.
A linear system Ax = b can be solved using forward and backward
substitution:
Let y = Ux. Hence Ly = b.
Solve Ly = b for y by forward substitution
Solve Ux = y for x by backward substitution


Gaussian elimination and LU decomposition

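Consider Gaussian elimination written in matrix notation:

$$M_1 \begin{pmatrix} \times & \times & \times \\ \times & \times & \times \\ \times & \times & \times \end{pmatrix} = \begin{pmatrix} \times & \times & \times \\ 0 & * & * \\ 0 & * & * \end{pmatrix}$$

$$M_2 \begin{pmatrix} \times & \times & \times \\ 0 & * & * \\ 0 & * & * \end{pmatrix} = \begin{pmatrix} \times & \times & \times \\ 0 & * & * \\ 0 & 0 & + \end{pmatrix}$$

The LU decomposition then corresponds to

$$M_2 M_1 A = U \Rightarrow A = (M_2 M_1)^{-1} U = LU.$$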
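
The elimination steps can be sketched in R. The function `lu_nopivot` below is a hypothetical helper written for illustration: it assumes that no zero pivot is encountered and does no row exchanges, so it is not a substitute for a pivoted routine. The example matrix is made up.

```r
# Gaussian elimination without pivoting, storing the multipliers in L
lu_nopivot <- function(A) {
  n <- nrow(A)
  L <- diag(n)
  U <- A
  for (k in seq_len(n - 1)) {
    for (i in (k + 1):n) {
      L[i, k] <- U[i, k] / U[k, k]          # multiplier used in step k
      U[i, ] <- U[i, ] - L[i, k] * U[k, ]   # eliminate entry (i, k)
    }
  }
  list(L = L, U = U)
}

# Hypothetical, well-conditioned example matrix
A <- matrix(c(4, 2, 1,
              2, 5, 2,
              1, 2, 6), 3, 3, byrow = TRUE)
b <- c(2, 1, 3)

f <- lu_nopivot(A)
y <- forwardsolve(f$L, b)   # solve Ly = b
x <- backsolve(f$U, y)      # solve Ux = y
```

The multipliers stored in L are exactly the entries that undo the elimination matrices, which is why the product of L and U recovers A.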


LU decomposition in R

R> A <- matrix(runif(9), 3, 3)
R> b <- c(2,1,3)
Find x in Ax = b by LU decomposition:
R> solve(A, b)
[1] 17.579906 -2.520983 -2.940866
Find $A^{-1}$ by LU decomposition:
R> solve(A)
          [,1]      [,2]       [,3]
[1,] -1.138959 -6.554862  8.8042286
[2,]  2.662184  1.148671 -2.9980071
[3,] -1.335980  2.352416 -0.8737737

Cholesky decomposition

Cholesky decomposition
The Cholesky decomposition is the decomposition of a symmetric
and positive-definite matrix into the product of an upper triangular
matrix and its transpose:

$$A = U'U.$$

From the Cholesky decomposition it is possible to calculate the
inverse of A in the following way:

$$A^{-1} = U^{-1} (U^{-1})',$$

which is a more stable way than using Gaussian elimination.

In R, the function chol() gives the Cholesky factor U and
chol2inv() computes the inverse of A from that Cholesky factor.
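
A short sketch of both functions, assuming a made-up symmetric positive-definite matrix `A`. Note that chol2inv() takes the Cholesky factor, not A itself, as its argument.

```r
# Hypothetical symmetric positive-definite matrix
A <- matrix(c(4, 2, 2, 3), 2, 2)

U <- chol(A)                       # upper triangular factor, A = U'U
A_inv <- chol2inv(U)               # inverse of A computed from the factor U

# The same inverse written out as U^{-1} (U^{-1})'
U_inv  <- backsolve(U, diag(2))    # U^{-1} by backward substitution
A_inv2 <- U_inv %*% t(U_inv)
```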

QR decomposition

QR decomposition

Let A = QR, where Q is an orthogonal matrix and R is an
invertible upper triangular matrix.
Then the linear system Ax = b can be written

$$Ax = b \Rightarrow QRx = b \Rightarrow Rx = Q'b,$$

and x can be found by backward substitution.


QR decomposition in R

We have the system Ax = b. Solve for x:
R> qr.solve(A, b)
[1] 17.579906 -2.520983 -2.940866
Find the inverse $A^{-1}$ by QR decomposition:
R> qr.solve(A)
          [,1]      [,2]       [,3]
[1,] -1.138959 -6.554862  8.8042286
[2,]  2.662184  1.148671 -2.9980071
[3,] -1.335980  2.352416 -0.8737737
qr.Q(qr(A)) and qr.R(qr(A)) retrieve the matrices Q and R
from the QR decomposition.
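
The relation Rx = Q'b can also be carried out by hand with the extracted factors. The sketch below assumes a made-up non-singular matrix `A`, since the random matrix used on the slides cannot be reproduced without its seed.

```r
# Hypothetical non-singular example matrix
A <- matrix(c(2, 1, 1,
              1, 3, 2,
              1, 0, 0), 3, 3, byrow = TRUE)
b <- c(2, 1, 3)

qrA <- qr(A)
Q <- qr.Q(qrA)                              # orthogonal factor
R <- qr.R(qrA)                              # upper triangular factor

# Solve Rx = Q'b by backward substitution
x <- drop(backsolve(R, crossprod(Q, b)))
```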

Calculating eigenvalues of a matrix

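We want to calculate the eigenvalues of a real matrix A. To do so
we can apply the QR algorithm. Let $A_0 = A$. For k = 1, 2, . . .
1 Compute the QR decomposition $A_{k-1} = Q_k R_k$
2 Let $A_k = R_k Q_k$
3 Continue until convergence of $A_k$ to a triangular matrix
containing the eigenvalues of A on the diagonal
Each $A_k = Q_k' A_{k-1} Q_k$ is similar to A and therefore has the same eigenvalues. Convergence to a triangular matrix holds under suitable conditions, for example for a symmetric matrix whose eigenvalues have distinct absolute values.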



R> qr.eigen <- function(A){
+   # Unshifted QR iteration with a fixed number of steps (100)
+   for(i in 1:100){
+     QR <- qr(A)
+     R <- qr.R(QR)
+     Q <- qr.Q(QR)
+     A <- R %*% Q   # next iterate, similar to the previous A
+   }
+   return(diag(A))
+ }
R> qr.eigen(h3)
[1] 1.40831893 0.12232707 0.00268734
R> eigen(h3)$values
[1] 1.40831893 0.12232707 0.00268734
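
The matrix `h3` is not defined in this section. The printed eigenvalues are consistent with the 3 × 3 Hilbert matrix, so the self-contained sketch below assumes that definition. It also replaces the fixed 100 iterations with a stopping rule on the entries below the diagonal; the name `qr_eigen_tol` and the tolerance are illustrative choices, not part of the lecture.

```r
# Assumption: h3 is the 3 x 3 Hilbert matrix, h3[i, j] = 1 / (i + j - 1)
h3 <- outer(1:3, 1:3, function(i, j) 1 / (i + j - 1))

qr_eigen_tol <- function(A, tol = 1e-12, max_iter = 1000) {
  for (k in seq_len(max_iter)) {
    QR <- qr(A)
    A <- qr.R(QR) %*% qr.Q(QR)            # A_k = R_k Q_k
    # stop once everything below the diagonal is negligible
    if (max(abs(A[lower.tri(A)])) < tol) break
  }
  diag(A)
}

ev_qr    <- qr_eigen_tol(h3)
ev_eigen <- eigen(h3)$values
```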

Singular value decomposition

Singular value decomposition

The singular value decomposition (SVD) of an m × n matrix A has
the form
$$A = U\Sigma V',$$
where U is an m × m orthogonal matrix, V is an n × n orthogonal
matrix and Σ is an m × n diagonal matrix with entries

$$\sigma_{ij} = 0 \text{ for } i \neq j, \qquad \sigma_{ii} \geq 0.$$

singular values - the diagonal entries $\sigma_{ii}$ of Σ
left singular vectors - the columns $u_i$ of U
right singular vectors - the columns $v_i$ of V


Reduced form of the SVD

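For an m × n matrix A, m > n, the reduced form of the SVD is

$$A = U\Sigma V' = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} \Sigma_1 \\ 0 \end{pmatrix} V' = U_1 \Sigma_1 V'.$$

The decomposition can also be expressed as

$$A = \sum_{\sigma_{ii} \neq 0} \sigma_{ii} u_i v_i'.$$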
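
In R, svd() returns the reduced form by default. The sketch below, using a made-up 4 × 3 matrix, rebuilds A both from the matrix product and from the sum of rank-one terms.

```r
# Hypothetical 4 x 3 example matrix
A <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14), 4, 3)

s <- svd(A)   # s$u is 4 x 3 (U_1), s$d the singular values, s$v is 3 x 3 (V)
A_reduced <- s$u %*% diag(s$d) %*% t(s$v)   # U_1 Sigma_1 V'

# Sum of the rank-one terms sigma_ii * u_i v_i'
A_sum <- matrix(0, nrow(A), ncol(A))
for (i in which(s$d != 0)) {
  A_sum <- A_sum + s$d[i] * outer(s$u[, i], s$v[, i])
}
```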


SVD and the spectral decomposition

The spectral decomposition says that a real, symmetric m × m
matrix A can be decomposed as

$$A = \lambda_1 P_1 P_1' + \dots + \lambda_m P_m P_m',$$

where $P_1, \dots, P_m$ are pairwise orthogonal unit vectors,
$I = P_1 P_1' + \dots + P_m P_m'$ and $\lambda_1, \dots, \lambda_m$ are the eigenvalues of A.
In matrix form we have

$$P'AP = \Lambda \iff A = P\Lambda P',$$

where P is the matrix with columns $P_1, \dots, P_m$ and
$\Lambda = \mathrm{diag}(\lambda_1, \dots, \lambda_m)$.
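
These identities can be verified with eigen(). The symmetric matrix below is a made-up example.

```r
# Hypothetical symmetric example matrix
A <- matrix(c(2, 1, 0,
              1, 3, 1,
              0, 1, 4), 3, 3, byrow = TRUE)

e <- eigen(A, symmetric = TRUE)
P <- e$vectors                  # columns are the eigenvectors P_1, ..., P_m
Lambda <- diag(e$values)        # diag(lambda_1, ..., lambda_m)

A_rebuilt <- P %*% Lambda %*% t(P)   # P Lambda P'
I_rebuilt <- P %*% t(P)              # P_1 P_1' + ... + P_m P_m'
```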


If A is symmetric and all the eigenvalues are non-negative, then:
the SVD equals the spectral decomposition
the singular values are the eigenvalues
the left and right singular vectors are eigenvectors
For any matrix A,
The squares of the non-zero singular values, $\sigma_i^2$, are the non-zero eigenvalues of
AA' and A'A
The left singular vectors $u_i$ are eigenvectors of AA'
The right singular vectors $v_i$ are eigenvectors of A'A
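
A quick numerical check of these relations, assuming a made-up 3 × 2 matrix of full column rank:

```r
# Hypothetical 3 x 2 example matrix of rank 2
A <- matrix(c(1, 0, 2, 1, 3, 1), 3, 2)
s <- svd(A)

ev_AtA <- eigen(crossprod(A), symmetric = TRUE)$values    # eigenvalues of A'A (2 values)
ev_AAt <- eigen(tcrossprod(A), symmetric = TRUE)$values   # eigenvalues of AA' (3 values)

# Right singular vectors satisfy (A'A) v_i = sigma_i^2 v_i
lhs <- crossprod(A) %*% s$v
rhs <- s$v %*% diag(s$d^2)
```

Here AA' is 3 × 3 and therefore has one additional eigenvalue, which is zero up to rounding.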


Rank determination

In theory, the rank of A is the number of non-zero singular values.
In practice, the rank might not be well-determined in that
some singular values may be very small but non-zero.
For many purposes it's better to regard any singular values
falling below a certain threshold as negligible in determining
the numerical rank.
The R package corpcor contains the function rank.condition()
which can determine the numerical rank of a matrix.
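
The thresholding idea can be sketched in base R without any package. The helper `numerical_rank` is hypothetical, and its default threshold (largest dimension times largest singular value times machine epsilon) is a common convention assumed here rather than something stated in the lecture.

```r
numerical_rank <- function(A, tol = NULL) {
  d <- svd(A)$d
  # Assumed default threshold: max(dim(A)) * largest singular value * machine epsilon
  if (is.null(tol)) tol <- max(dim(A)) * max(d) * .Machine$double.eps
  sum(d > tol)
}

# Hypothetical nearly rank-deficient matrix: the third column is almost 2 * c1
c1 <- c(1, 2, 3)
c2 <- c(1, 0, 1)
A <- cbind(c1, c2, 2 * c1 + 1e-10 * c(1, -1, 1))

d <- svd(A)$d                                 # the smallest singular value is tiny but non-zero
rank_default <- numerical_rank(A)             # strict threshold
rank_loose   <- numerical_rank(A, tol = 1e-8) # treats the tiny singular value as zero
```

Which threshold is appropriate depends on how accurately the data are known; the value 1e-8 above is only an example.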


Statistical applications

Principal component analysis

(Adapted from Linear Statistical Inference and Its Applications,
Rao 1973)
Let x be a random p-vector. Let X denote the n × p matrix of
observations of x. In some cases, it is desired to reduce the
dimensions of the data matrix for interpretative purposes. One
dimension reduction technique is principal component analysis
(PCA).
In PCA, the objective is to reduce the dimensionality by finding the
directions in the data which have the most variation. This is
accomplished by finding the eigenvectors corresponding to the
largest eigenvalues of the covariance (or correlation) matrix of the
data and then projecting the original data onto these vectors.


Let $X_c$ be the centered X matrix, i.e. the column mean has been
subtracted from each column. The principal components Y are defined as

$$Y = X_c W,$$

where W is an orthogonal matrix of eigenvectors of $X_c'X_c$ ordered
according to the size of the corresponding eigenvalue (starting
from the largest). The matrix W is called the loading matrix.



Note that the covariance matrix of $X_c$ is
$$\Sigma = \frac{1}{n-1} X_c' X_c.$$
We can write
$$\Sigma = \frac{1}{n-1} \left( \lambda_1 w_1 w_1' + \dots + \lambda_p w_p w_p' \right),$$
where $w_k$, $k \in \{1, \dots, p\}$, are eigenvectors corresponding to
eigenvalue $\lambda_k$ such that they are orthogonal to each other. Hence
we may conclude that the components $y_i = X_c w_i$ are uncorrelated
with each other, since
$$\mathrm{Cov}(y_i, y_j) = w_i' \Sigma w_j = 0,$$
for $i, j \in \{1, \dots, p\}$, $i \neq j$. Also note that
$$\mathrm{Var}(y_k) = w_k' \Sigma w_k = \lambda_k / (n - 1).$$


Thus, one way of obtaining the matrix W is to calculate the
eigenvectors of the matrix $X_c'X_c$. However, this is rather inefficient
computationally.
Instead, the singular value decomposition can be used: for p < n,
we can write the matrix $X_c$ as
$$X_c = UDW',$$
where U is an n × n orthogonal matrix, W is a p × p orthogonal
matrix and D is an n × p diagonal matrix whose diagonal entries are called
the singular values of $X_c$. Now, remember that the squares of the
singular values of $X_c$ are the eigenvalues of $X_c'X_c$ and that the
columns of W are the eigenvectors of $X_c'X_c$. Hence we can retrieve
the loading matrix and the covariance matrix for the principal
components from the SVD.
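
A minimal sketch of PCA through the SVD, assuming simulated data (the seed, dimensions and mixing matrix are arbitrary choices made for illustration). The variances obtained from the singular values are compared with the eigenvalues of the sample covariance matrix.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n <- 50
p <- 3
# Hypothetical data: independent normals mixed to induce correlation
X <- matrix(rnorm(n * p), n, p) %*% matrix(c(2, 1, 0,
                                             0, 1, 1,
                                             0, 0, 0.5), p, p)

Xc <- scale(X, center = TRUE, scale = FALSE)   # centered data matrix

s <- svd(Xc)
W <- s$v                             # loading matrix
Y <- Xc %*% W                        # principal components
pc_var <- s$d^2 / (n - 1)            # Var(y_k) = lambda_k / (n - 1)

# Same variances from the eigenvalues of the covariance matrix
ev <- eigen(cov(X), symmetric = TRUE)$values
```

The built-in prcomp() follows the same SVD route, so `prcomp(X)$sdev^2` should agree with `pc_var`.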

Factor analysis

A technique which is similar to PCA is factor analysis, where the
objective is to extract a number of factors $F_i$ from the data. The
model is
$$X = AF + G,$$
where F are the common factors, G are the unique factors and A is
a matrix of unknown constants. Let $\mathrm{Var}(F) = I_m$ and
$\mathrm{Var}(G) = \mathrm{diag}(\delta_1, \dots, \delta_p) = \Delta$, with F and G uncorrelated. Then $\mathrm{Cov}(X) = AA' + \Delta$.

3 Linear least squares

Vector norms
The vector norm is a measure of the size or magnitude of a vector.
For an integer p > 0 and an n-vector x, the vector norm is
$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.$$
Important special cases are:
1-norm: $\|x\|_1 = \sum_{i=1}^{n} |x_i|$
2-norm: $\|x\|_2 = \left( \sum_{i=1}^{n} |x_i|^2 \right)^{1/2}$, the Euclidean norm
∞-norm: $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$
Two properties of a vector 2-norm:
$\left\| \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} \right\|_2^2 = \|x_1\|_2^2 + \|x_2\|_2^2$
$\|Qx\|_2 = \|x\|_2$ for Q orthogonal, since $\|Qx\|_2^2 = x'Q'Qx = \|x\|_2^2$
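
The norms are simple to compute directly. The helper `p_norm` below is a hypothetical function written for illustration; the rotation matrix Q is one example of an orthogonal matrix.

```r
# p-norm of a vector; p = Inf gives the maximum norm
p_norm <- function(x, p) {
  if (is.infinite(p)) max(abs(x)) else sum(abs(x)^p)^(1 / p)
}

x <- c(3, -4)
n1   <- p_norm(x, 1)      # |3| + |-4|
n2   <- p_norm(x, 2)      # sqrt(9 + 16)
ninf <- p_norm(x, Inf)    # max(|3|, |-4|)

# A rotation is orthogonal, so it leaves the 2-norm unchanged
theta <- pi / 6
Q <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2, 2)
nQx <- p_norm(Q %*% x, 2)
```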

Linear least squares

A typical linear least squares problem can be characterized as

$$Ax \cong b,$$

where A is an m × n matrix with m > n, x is an n-vector and b is an
m-vector.
For such over-determined systems, there is usually no exact
solution. The closest match possible in the 2-norm can however be
found. This is the linear least squares problem, formulated as

$$\min_x \|b - Ax\|_2 = \min_x \|r\|_2,$$

where r = b − Ax is the residual vector.


Existence and uniqueness

The solution to a least squares problem always exists.


The solution is unique iff A has full column rank, i.e. iff
rank(A) = n.


Normal equations

A least squares problem can be treated using methods from
calculus. The objective is to minimize the norm of the residual vector r = b − Ax.
We define a function

$$\phi(x) = \|r\|_2^2 = (b - Ax)'(b - Ax) = b'b - 2x'A'b + x'A'Ax.$$

A necessary condition for a minimum is that the gradient is equal
to zero:
$$\nabla\phi(x) = 2A'Ax - 2A'b = 0,$$
so that the solution satisfies the linear system $A'Ax = A'b$.
A sufficient condition is that $A'A$ is positive definite, which is
equivalent to rank(A) = n.
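
A small sketch of solving the normal equations in R, assuming a made-up over-determined system (a straight-line fit with six observations). Since A'A is positive definite here, its Cholesky factor can be used as well.

```r
# Hypothetical data: intercept column plus one regressor, m = 6, n = 2
A <- cbind(1, 1:6)
b <- c(1.1, 1.9, 3.2, 3.9, 5.1, 6.0)

AtA <- crossprod(A)                  # A'A
Atb <- crossprod(A, b)               # A'b
x_ne <- drop(solve(AtA, Atb))        # solve A'Ax = A'b

# Same system via the Cholesky factor of A'A (A'A = U'U)
U <- chol(AtA)
x_chol <- drop(backsolve(U, forwardsolve(t(U), Atb)))

grad <- 2 * AtA %*% x_ne - 2 * Atb   # gradient at the solution, zero up to rounding
```

Forming A'A squares the condition number of the problem, which is one reason the QR and SVD approaches are often preferred for ill-conditioned A.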


Solving least squares by QR decomposition

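With the QR decomposition $A = Q \begin{pmatrix} R_1 \\ 0 \end{pmatrix}$, where $Q = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix}$ is an m × m orthogonal matrix and $R_1$ is an n × n upper triangular matrix, the least squares
problem can be re-written as
$$\|b - Ax\|_2^2 = \|Q_1'b - R_1 x\|_2^2 + \|Q_2'b\|_2^2$$
and the minimum is then attained when $R_1 x = Q_1'b$. Then, x can
be found by back substitution.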
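
The split into $Q_1$ and $Q_2$ can be made explicit in R. The sketch below assumes the same kind of made-up straight-line data as a test case and checks that the minimal residual sum of squares equals $\|Q_2'b\|_2^2$.

```r
# Hypothetical data: m = 6 observations, n = 2 parameters
A <- cbind(1, 1:6)
b <- c(1.1, 1.9, 3.2, 3.9, 5.1, 6.0)
m <- nrow(A)
n <- ncol(A)

qrA <- qr(A)
Q  <- qr.Q(qrA, complete = TRUE)     # full m x m orthogonal matrix
Q1 <- Q[, 1:n, drop = FALSE]         # first n columns
Q2 <- Q[, (n + 1):m, drop = FALSE]   # remaining m - n columns
R1 <- qr.R(qrA)                      # n x n upper triangular block

x <- drop(backsolve(R1, crossprod(Q1, b)))   # solve R1 x = Q1'b

rss         <- sum((b - A %*% x)^2)          # ||b - Ax||^2 at the minimum
rss_from_Q2 <- sum(crossprod(Q2, b)^2)       # ||Q2'b||^2
```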


Solving least squares by SVD

The reduced form of the SVD is

$$A = \begin{pmatrix} U_1 & U_2 \end{pmatrix} \begin{pmatrix} \Sigma_1 \\ 0 \end{pmatrix} V' = U_1 \Sigma_1 V'.$$

Hence, the solution of $Ax \cong b$ is

$$x = V \Sigma_1^{-1} U_1' b = \sum_{\sigma_{ii} \neq 0} \frac{u_i' b}{\sigma_{ii}} v_i.$$

The SVD is especially useful for ill-conditioned or nearly
rank-deficient problems, since the very small singular values can be
dropped from the summation. This makes the solution much less
sensitive to small changes in the data.
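
The truncated sum can be sketched as follows. The helper `svd_lstsq` and its relative tolerance are hypothetical choices for illustration; the data are made up, and the rank-deficient case is constructed by duplicating a column.

```r
# Least squares via the SVD, dropping singular values below tol * largest
svd_lstsq <- function(A, b, tol = 1e-10) {
  s <- svd(A)
  keep <- s$d > tol * s$d[1]
  # coefficients u_i'b / sigma_ii for the retained singular values
  coef <- crossprod(s$u[, keep, drop = FALSE], b) / s$d[keep]
  drop(s$v[, keep, drop = FALSE] %*% coef)
}

A <- cbind(1, 1:6)
b <- c(1.1, 1.9, 3.2, 3.9, 5.1, 6.0)
x_svd <- svd_lstsq(A, b)

# Rank-deficient variant: the third column duplicates the second
A_def <- cbind(A, A[, 2])
x_def <- svd_lstsq(A_def, b)   # still well defined after truncation
```

In the rank-deficient case the coefficients are no longer unique, but the fitted values `A_def %*% x_def` agree with those of the full-rank fit.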
4 Some useful functions in R

Applying functions to matrices and arrays

It is sometimes convenient and more efficient to avoid loops and
instead use the function apply() to conduct calculations on the
entries of a matrix. For example, suppose you want to calculate
the mean of each of the columns in a matrix.
R> A <- matrix(c(rnorm(10), rnorm(10, 2), rnorm(10, 5)),
+ ncol=3)
R> apply(A, 2, mean)
[1] 0.01823443 2.19343645 4.80973298
The apply function works with any function you specify. For a
matrix, the second argument denotes whether the function should operate
on the rows (1) or the columns (2).
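
A deterministic sketch of the row and column margins, using a made-up 2 × 3 matrix and an anonymous function as the third argument:

```r
A <- matrix(1:6, nrow = 2)     # rows: (1, 3, 5) and (2, 4, 6)

col_means  <- apply(A, 2, mean)                            # one value per column
row_ranges <- apply(A, 1, function(r) max(r) - min(r))     # one value per row
```

For column means specifically, the built-in colMeans() gives the same result.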


Additional built-in functions


colSums(), rowSums() - calculate the column sums and row
sums of a given matrix
cbind(), rbind() - combine matrices by their columns
or rows
crossprod(x, y) - calculates t(x) %*% y but is typically
faster than transposing and multiplying the matrices/vectors
lower.tri(), upper.tri() - return a logical matrix marking the lower or upper
triangular part of a matrix, which can be used to extract or modify those entries
scale(x, center = TRUE, scale = TRUE) - centers and/or scales the columns of x
scale(x, scale=FALSE) - returns the centered matrix of x
(the mean of each column subtracted from the columns)
scale(x, scale=TRUE) - returns the centered and scaled
matrix of x (the columns are mean subtracted and divided by
the standard deviation of the columns)
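
A brief sketch exercising these functions on made-up matrices:

```r
# Hypothetical example matrices
x <- matrix(c(1, 2, 3, 4, 5, 7), 3, 2)
y <- matrix(c(1, 0, 1, 2, 1, 0), 3, 2)

cs <- colSums(x)                 # column sums: 6 and 16
xy <- cbind(x, y)                # 3 x 4 matrix, columns of x then columns of y
cp <- crossprod(x, y)            # same as t(x) %*% y

U <- x[1:2, ]                    # 2 x 2 block of x
U[lower.tri(U)] <- 0             # lower.tri() gives a logical index; zero out below the diagonal

xc <- scale(x, scale = FALSE)    # centered columns
xs <- scale(x)                   # centered and scaled columns
```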