Lec 08 Triangular Factorization

LU Factorization
A system of linear equations is given as

$Ax = b$

where $A$ is an $n \times n$ matrix, $b$ is a vector of length $n$, and $x$ is an unknown vector of length $n$; solving the system involves determining the value of each element of $x$. A nonsingular matrix $A$ whose leading principal submatrices are all nonsingular can be expressed as the product of a lower triangular matrix $L$ and an upper triangular matrix $U$ (for other nonsingular matrices the rows must first be reordered; see Pivoting), such that

$A = LU$

and the linear system can be written as

$LUx = b$

Using this form, the linear system can be solved as follows:
Solve $Ly = b$
Solve $Ux = y$
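As a minimal sketch of the two triangular solves (Python with plain lists, not code from the course), assuming $L$ is unit lower triangular and $U$ is upper triangular with a nonzero diagonal; the function names and the small example matrices are illustrative:

```python
def forward_substitution(L, b):
    """Solve L y = b, assuming L is unit lower triangular (L[i][i] == 1)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        # y[i] = b[i] - sum over k < i of L[i][k] * y[k]
        y[i] = b[i] - sum(L[i][k] * y[k] for k in range(i))
    return y


def back_substitution(U, y):
    """Solve U x = y, assuming U is upper triangular with nonzero diagonal."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # x[i] = (y[i] - sum over k > i of U[i][k] * x[k]) / U[i][i]
        s = sum(U[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


# Example: A = L U = [[2, 1, 1], [4, 3, 1], [-2, 2, 0]], b = A times (1, 2, 3)
L = [[1.0, 0.0, 0.0],
     [2.0, 1.0, 0.0],
     [-1.0, 3.0, 1.0]]
U = [[2.0, 1.0, 1.0],
     [0.0, 1.0, -1.0],
     [0.0, 0.0, 4.0]]
b = [7.0, 13.0, 2.0]

y = forward_substitution(L, b)   # step 1: L y = b
x = back_substitution(U, y)      # step 2: U x = y
print(x)                         # [1.0, 2.0, 3.0]
```

The intermediate vector is $y = (7, -1, 12)$; each solve touches every entry of its triangle once, which is where the $n^2$ cost of the solve stages comes from.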
CPSC 659 Spring 2011
2011 Vivek Sarin
Gaussian Elimination
Gaussian elimination can be used to solve the linear system with the $n \times n$ matrix $A$ and the right-hand side vector $b$. The first stage converts $A$ to upper triangular form; identical operations are applied to the right-hand side. This stage overwrites $A$ with $L^{-1}A$ (the upper triangular $U$, with the multipliers stored below the diagonal) and $b$ with $L^{-1}b$.
Gaussian Elimination
for k = 1:n-1
  for i = k+1:n
    m(i) = A(i,k)/A(k,k);              % multipliers for column k
  end
  for i = k+1:n
    for j = k+1:n
      A(i,j) = A(i,j) - m(i)*A(k,j);   % update row i of the active part
    end
    b(i) = b(i) - m(i)*b(k);           % same operation on b
  end
  A(k+1:n,k) = m(k+1:n);               % store column k of L
end
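A direct 0-based translation of this loop into Python, offered as a sketch: it assumes no zero pivot is encountered (there is no pivoting here), and the name `gaussian_elimination` is illustrative.

```python
def gaussian_elimination(A, b):
    """In-place elimination without pivoting, following the loop above (0-based).

    On return the upper triangle of A (with the diagonal) holds U, the strict
    lower triangle holds the multipliers (L without its unit diagonal), and
    b holds y = L^{-1} b. Assumes every pivot A[k][k] is nonzero.
    """
    n = len(A)
    for k in range(n - 1):
        m = [0.0] * n
        for i in range(k + 1, n):
            m[i] = A[i][k] / A[k][k]        # multipliers for column k
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] -= m[i] * A[k][j]   # update row i of the active part
            b[i] -= m[i] * b[k]             # same operation on b
        for i in range(k + 1, n):
            A[i][k] = m[i]                  # store column k of L
    return A, b


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 1.0], [-2.0, 2.0, 0.0]]
b = [7.0, 13.0, 2.0]
gaussian_elimination(A, b)
print(A)   # [[2.0, 1.0, 1.0], [2.0, 1.0, -1.0], [-1.0, 3.0, 4.0]]
print(b)   # [7.0, -1.0, 12.0]
```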
Back Substitution
At the end of Gaussian elimination, the upper triangular part of $A$, including the diagonal, stores $U$; the lower triangular part of $A$, excluding the diagonal, stores $L$ ($L$ is unit lower triangular, i.e., the diagonal entries of $L$ are 1); and $b$ contains $y = L^{-1}b$. Next, back substitution is used to solve $Ux = y$; $b$ is overwritten with $x$.
Back Substitution
b(n) = b(n)/A(n,n);
for k = n-1:-1:1
  for i = 1:k
    b(i) = b(i) - A(i,k+1)*b(k+1);   % remove contribution of x(k+1)
  end
  b(k) = b(k)/A(k,k);
end
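The same column-oriented loop in 0-based Python, as a sketch: it reads $U$ from the upper triangle of the array left by elimination and overwrites `b` (holding $y$) with $x$. The function name is illustrative, and the inputs are what elimination produces for the small example $A = [[2, 1, 1], [4, 3, 1], [-2, 2, 0]]$, $b = (7, 13, 2)$.

```python
def back_substitution_inplace(A, b):
    """Column-oriented back substitution (0-based version of the loop above).

    Reads U from the upper triangle of A and overwrites b (holding y) with x.
    """
    n = len(b)
    b[n - 1] /= A[n - 1][n - 1]
    for k in range(n - 2, -1, -1):
        # x[k+1] is now known: remove its contribution from rows 0..k
        for i in range(k + 1):
            b[i] -= A[i][k + 1] * b[k + 1]
        b[k] /= A[k][k]
    return b


A = [[2.0, 1.0, 1.0], [2.0, 1.0, -1.0], [-1.0, 3.0, 4.0]]  # packed L and U
b = [7.0, -1.0, 12.0]                                      # y = L^{-1} b
print(back_substitution_inplace(A, b))                     # [1.0, 2.0, 3.0]
```

Only entries on or above the diagonal are read, so the multipliers stored below it do not interfere.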
Complexity
Stage            Operations    Data
Compute L & U    $2n^3/3$      $n^2$
Solve Ly = b     $n^2$         $n^2/2$
Solve Ux = y     $n^2$         $n^2/2$
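The operation counts can be checked by summing the work per step, counting each multiplication and each subtraction as one operation and keeping only the leading term: step $k$ of elimination updates an $(n-k) \times (n-k)$ block, and row $i$ of a triangular solve needs about $2(i-1)$ operations.

```latex
\sum_{k=1}^{n-1} 2(n-k)^2 = 2\sum_{j=1}^{n-1} j^2 = \frac{(n-1)\,n\,(2n-1)}{3} \approx \frac{2n^3}{3},
\qquad
\sum_{i=1}^{n} 2(i-1) = n(n-1) \approx n^2
```

The Data column follows from storage: $L$ and $U$ together overwrite the $n^2$ entries of $A$, and each triangular solve reads about half of them.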

Standard LU Factorization
LU Factorization
for k = 1:n-1
  for i = k+1:n
    m(i) = A(i,k)/A(k,k);
    for j = k+1:n
      A(i,j) = A(i,j) - m(i)*A(k,j);   % rank-1 update
    end
    A(i,k) = m(i);                     % store L
  end
end
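A sketch of the standard (kij) loop in Python, with two small helpers to unpack the factors and check that $LU$ reproduces $A$; all names are illustrative and no pivoting is done, so every pivot is assumed nonzero.

```python
def lu_in_place(A):
    """kij LU factorization without pivoting (0-based version of the loop above).

    Overwrites A: U on and above the diagonal, multipliers (L) below it.
    Assumes every pivot A[k][k] is nonzero.
    """
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]          # entry (i, k) of L
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]     # rank-1 update of the active part
            A[i][k] = m
    return A


def split_lu(F):
    """Unpack a packed factorization into unit lower L and upper U."""
    n = len(F)
    L = [[F[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[F[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 1.0], [-2.0, 2.0, 0.0]]
A0 = [row[:] for row in A]       # keep a copy; lu_in_place overwrites A
L, U = split_lu(lu_in_place(A))
print(matmul(L, U) == A0)        # True
```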
[Figure: (k-1) columns of L; column k of A]
Parallel LU with Row Partitioning

Block m consecutive rows together into a single task that is assigned to each processor (p = n/m). At the kth step, the kth row of U is broadcast to each processor. Processors in charge of matrix rows k+1 through n compute the kth column of L and update the active part of matrix A with a rank-1 update.
[Figure: row blocks assigned to P0, P1, P2, P3, showing the (k-1) columns of L and column k of A]
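The row-partitioned scheme can be simulated serially to check its data flow. This is a sketch under simplifying assumptions: each "processor" is a Python dict holding its rows, the broadcast of row k of U is modeled as a list copy, and there is no real message passing or pivoting. The names `block_row_owner`, `cyclic_row_owner`, and `row_partitioned_lu` are hypothetical.

```python
def block_row_owner(i, n, p):
    """Owner of row i with n // p consecutive rows per processor (p divides n)."""
    return i // (n // p)


def cyclic_row_owner(i, n, p):
    """Owner of row i when rows are dealt out to processors in turn."""
    return i % p


def row_partitioned_lu(A, p, owner=block_row_owner):
    """Serial simulation of row-partitioned LU without pivoting.

    Each simulated processor keeps {row index: row}. At step k the owner of
    row k 'broadcasts' a copy of it (row k of U); every processor then computes
    the multipliers for its own rows i > k and applies the rank-1 update.
    """
    n = len(A)
    local = [dict() for _ in range(p)]
    for i in range(n):
        local[owner(i, n, p)][i] = list(A[i])
    for k in range(n - 1):
        pivot_row = list(local[owner(k, n, p)][k])   # broadcast of row k of U
        for rows in local:                           # each processor in turn
            for i, row in rows.items():
                if i > k:
                    m = row[k] / pivot_row[k]        # entry (i, k) of L
                    for j in range(k + 1, n):
                        row[j] -= m * pivot_row[j]
                    row[k] = m
    # gather the rows back into one packed LU array
    return [local[owner(i, n, p)][i] for i in range(n)]


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 1.0], [-2.0, 2.0, 0.0]]
print(row_partitioned_lu(A, 3))
print(row_partitioned_lu(A, 2, cyclic_row_owner))
```

Both calls should return the same packed factors as the sequential loop, since only the ownership of rows changes, not the arithmetic.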
Parallel LU with Row Partitioning
Load balancing: each processor becomes idle once its last row has been processed. Work decreases as the algorithm progresses, i.e., the work at the kth step is proportional to $(n-k)^2$. Concurrency and load balance can be improved by assigning rows to tasks in a cyclic manner; in this case,
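$T_{comp} = \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p}\, t_c \approx \frac{2n^3}{3p}\, t_c$

$T_{comm} = \sum_{k=1}^{n-1} \left( t_s + t_b (n-k) \right) \approx n t_s + \frac{t_b n^2}{2}$

where $t_c$ is the time per arithmetic operation, $t_s$ is the message startup time, and $t_b$ is the transfer time per matrix element.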
Communication can be overlapped with computation.
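To see why cyclic assignment helps, one can add up, step by step, the work of the busiest processor. This is a rough sketch that assumes one row update at step k costs 2(n-k-1) operations (0-based, matching the $2(n-k)^2$ term above) and that processors synchronize after every step; the helper names are illustrative.

```python
def block_owner(i, n, p):
    """Row i goes to processor i // (n // p): blocks of consecutive rows."""
    return i // (n // p)


def cyclic_owner(i, n, p):
    """Row i goes to processor i mod p: rows dealt out in turn."""
    return i % p


def busiest_processor_work(n, p, owner):
    """Sum over steps of the largest per-processor work (0-based step k).

    Assumed work model: updating one active row at step k costs 2 * (n - k - 1)
    operations, and all processors wait for the slowest one after each step.
    """
    total = 0
    for k in range(n - 1):
        work = [0] * p
        for i in range(k + 1, n):          # active rows at step k
            work[owner(i, n, p)] += 2 * (n - k - 1)
        total += max(work)
    return total


n, p = 16, 4
sequential = sum(2 * (n - k - 1) ** 2 for k in range(n - 1))
print(sequential / p)                               # perfectly balanced ideal
print(busiest_processor_work(n, p, block_owner))
print(busiest_processor_work(n, p, cyclic_owner))
```

For n = 16 and p = 4 this model gives 940 for the block layout and 712 for the cyclic one, against a perfectly balanced 620: with blocks, the last processor holds a full set of active rows until nearly the end, while cyclic assignment keeps the active rows spread across all processors.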
Cyclic Row Partitioning
P0 P1 P2 P3 P0 P1 P2 P3 P0 P1 P2 P3 P0 P1 P2 P3
Column k of A
Parallel LU with Column Partitioning

Block m consecutive columns together into a single task that is assigned to each processor (p = n/m). At the kth step, the processor that owns column k computes the kth column of L and broadcasts it to all processors. Processors in charge of columns k+1 through n update the active part of matrix A with a rank-1 update.
[Figure: the 1st through pth blocks of n/p consecutive columns assigned to P0, P1, P2, P3, with column k of A highlighted]
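A serial simulation of the column-partitioned variant, as a sketch under the same simplifying assumptions as a real code would not make (dicts stand in for processors, a list copy stands in for the broadcast, no pivoting); `block_col_owner` and `column_partitioned_lu` are hypothetical names.

```python
def block_col_owner(j, n, p):
    """Owner of column j with n // p consecutive columns per processor (p divides n)."""
    return j // (n // p)


def column_partitioned_lu(A, p, owner=block_col_owner):
    """Serial simulation of column-partitioned LU without pivoting.

    Each simulated processor keeps {column index: column}. At step k the owner
    of column k scales it into column k of L and 'broadcasts' a copy; owners of
    columns j > k then update their own columns (their share of the rank-1 update).
    """
    n = len(A)
    local = [dict() for _ in range(p)]
    for j in range(n):
        local[owner(j, n, p)][j] = [A[i][j] for i in range(n)]
    for k in range(n - 1):
        col_k = local[owner(k, n, p)][k]
        for i in range(k + 1, n):
            col_k[i] /= col_k[k]            # column k of L (multipliers)
        l = list(col_k)                     # broadcast copy
        for cols in local:
            for j, col in cols.items():
                if j > k:
                    for i in range(k + 1, n):
                        col[i] -= l[i] * col[k]   # col[k] is U(k, j)
    # gather the columns back into a row-major packed LU array
    return [[local[owner(j, n, p)][j][i] for j in range(n)] for i in range(n)]


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 1.0], [-2.0, 2.0, 0.0]]
print(column_partitioned_lu(A, 3))
print(column_partitioned_lu(A, 2, lambda j, n, p: j % p))   # cyclic columns
```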
Parallel LU with Column Partitioning
Load balancing: each processor becomes idle once its last column has been processed. Work decreases as the algorithm progresses, i.e., the work at the kth step is proportional to $(n-k)^2$. Concurrency and load balance can be improved by assigning columns to tasks in a cyclic manner; in this case,
$T_{comp} = \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p}\, t_c \approx \frac{2n^3}{3p}\, t_c$

$T_{comm} = \sum_{k=1}^{n-1} \left( t_s + t_b (n-k) \right) \approx n t_s + \frac{t_b n^2}{2}$
Communication can be overlapped with computation.
Cyclic Column Partitioning
Column k of A
Parallel LU with Submatrix Partitioning

Partition the matrix into submatrices of size $m \times m$, where $m = n/\sqrt{p}$, and assign a submatrix to each processor. At the kth step:
Processors that own column k compute the kth column of L.
The kth column of L is broadcast along processor rows, and the kth row of U is broadcast along processor columns.
Processors in charge of matrix columns k+1 through n update the active part of matrix A with a rank-1 update.
[Figure: A partitioned into a 4 × 4 grid of submatrices owned by P0 to P15, with column k of A highlighted]
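A small sketch of which processors are busy at step k under block submatrix partitioning. It assumes processors are numbered down processor columns (the convention visible in the cyclic figure later in this lecture) and that $\sqrt{p}$ divides n; `block_2d_owner` and `step_roles` are hypothetical names.

```python
import math


def block_2d_owner(i, j, n, p):
    """Processor owning entry (i, j) when A is split into m x m blocks, m = n / sqrt(p).

    Assumption: processors are numbered down processor columns, and sqrt(p)
    divides n.
    """
    q = math.isqrt(p)      # processor grid is q x q
    m = n // q             # block size
    return (i // m) + q * (j // m)


def step_roles(k, n, p):
    """0-based step k: who computes column k of L, and who does the rank-1 update."""
    compute_l = sorted({block_2d_owner(i, k, n, p) for i in range(k + 1, n)})
    update = sorted({block_2d_owner(i, j, n, p)
                     for i in range(k + 1, n) for j in range(k + 1, n)})
    return compute_l, update


# n = 8 on p = 16 processors, so each processor owns a 2 x 2 block
print(step_roles(0, 8, 16))   # all 16 processors take part in the update
print(step_roles(4, 8, 16))   # ([10, 11], [10, 11, 14, 15])
```

Halfway through, only 4 of the 16 processors still hold active entries, which is the idle-processor effect addressed by the cyclic layout.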
Load balancing: each processor becomes idle once its last row and column have been processed. Work decreases as the algorithm progresses, i.e., the work at the kth step is proportional to $(n-k)^2$. Concurrency and load balance can be improved by assigning smaller submatrices to tasks in a cyclic manner along rows as well as columns; in this case,
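$T_{comp} = \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p}\, t_c \approx \frac{2n^3}{3p}\, t_c$

$T_{comm} = \sum_{k=1}^{n-1} 2\left( t_s + t_b \frac{n-k}{\sqrt{p}} \right) \approx 2n t_s + \frac{t_b n^2}{\sqrt{p}}$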
Communication can be overlapped with computation
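The leading-order formulas for the three partitionings can be compared numerically. The sketch below only evaluates them; the machine parameters are made-up values for illustration, not measurements, and `model_times` is a hypothetical name.

```python
import math


def model_times(n, p, tc, ts, tb):
    """Leading-order T_comp and T_comm from the cost formulas (sketch only).

    tc: time per arithmetic operation, ts: message startup time,
    tb: transfer time per matrix element (all hypothetical inputs).
    """
    t_comp = 2 * n**3 / (3 * p) * tc                   # same for all layouts
    t_comm_1d = n * ts + tb * n**2 / 2                 # row or column partitioning
    t_comm_2d = 2 * n * ts + tb * n**2 / math.sqrt(p)  # submatrix partitioning
    return t_comp, t_comm_1d, t_comm_2d


# Illustrative parameters only
t_comp, c1, c2 = model_times(n=1000, p=16, tc=1e-9, ts=1e-6, tb=1e-8)
print(t_comp, c1, c2)
```

Since the bandwidth term is $t_b n^2/\sqrt{p}$ for submatrix partitioning versus $t_b n^2/2$ for row or column partitioning, the submatrix scheme has the smaller bandwidth term whenever $p > 4$, at the price of twice as many message startups.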
Cyclic Submatrix Partitioning
P0 P4 P8 P12 P0 P4 P8 P12 P0 P4 P8 P12
P1 P5 P9 P13 P1 P5 P9 P13 P1 P5 P9 P13
P2 P6 P10 P14 P2 P6 P10 P14 P2 P6 P10 P14
P3 P7 P11 P15 P3 P7 P11 P15 P3 P7 P11 P15
P0 P4 P8 P12 P0 P4 P8 P12 P0 P4 P8 P12
P1 P5 P9 P13 P1 P5 P9 P13 P1 P5 P9 P13
P2 P6 P10 P14 P2 P6 P10 P14 P2 P6 P10 P14
P3 P7 P11 P15 P3 P7 P11 P15 P3 P7 P11 P15
P0 P4 P8 P12 P0 P4 P8 P12 P0 P4 P8 P12
P1 P5 P9 P13 P1 P5 P9 P13 P1 P5 P9 P13
P2 P6 P10 P14 P2 P6 P10 P14 P2 P6 P10 P14
P3 P7 P11 P15 P3 P7 P11 P15 P3 P7 P11 P15
Column k of A
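The cyclic layout in this figure follows a simple formula. As a sketch (names illustrative), with the numbering read off the figure, where P0 to P3 run down a processor column, block $(I, J)$ belongs to $P_{(I \bmod 4) + 4 (J \bmod 4)}$:

```python
def cyclic_2d_owner(I, J, q=4):
    """Owner of block (I, J) in a q x q cyclic layout.

    Numbering read off the figure: P0..P3 run down a processor column,
    so the owner is P(I mod q + q * (J mod q)).
    """
    return (I % q) + q * (J % q)


def layout(nblocks, q=4):
    """One text row per block row, e.g. 'P0 P4 P8 P12 P0 ...'."""
    return [" ".join(f"P{cyclic_2d_owner(I, J, q)}" for J in range(nblocks))
            for I in range(nblocks)]


for row in layout(12):
    print(row)
```

Because every processor owns blocks spread over the whole matrix, each one keeps some active blocks until close to the end of the factorization.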
Pivoting

The standard LU factorization algorithm may not produce very accurate LU factors. Reordering the rows and columns of $A$ before computing the LU factorization improves the stability of the algorithm; such reordering does not change the solution of the linear system. If $P_1$ and $P_2$ are the row and column permutation matrices, we solve

$P_1 A P_2 z = P_1 b, \quad x = P_2 z$

The LU factorization of the reordered matrix, $P_1 A P_2 = LU$, is used to solve the system as follows:

$Ly = P_1 b, \quad Uz = y, \quad x = P_2 z$

Partial pivoting: only rows are reordered; produces stable LU factors in most cases.
Complete pivoting: both rows and columns are reordered; guaranteed to produce stable LU factors.
Pivoting is costly on a parallel computer, as it requires communication and disrupts the overlap of communication with computation.
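A sketch of partial pivoting in Python (row interchanges only, so $P_2 = I$), followed by the solve $Ly = P_1 b$, $Ux = y$. The function names are illustrative; `perm` records $P_1$ as a list of original row indices.

```python
def lu_partial_pivoting(A):
    """LU with partial pivoting, overwriting A.

    Returns (A, perm) where row r of the factored A corresponds to row perm[r]
    of the original matrix, i.e. P1 A = L U with P1 given by perm.
    """
    n = len(A)
    perm = list(range(n))
    for k in range(n - 1):
        # pivot: row with the largest |A[i][k]| among rows k..n-1
        r = max(range(k, n), key=lambda i: abs(A[i][k]))
        if r != k:
            A[k], A[r] = A[r], A[k]              # swap whole rows (L part too)
            perm[k], perm[r] = perm[r], perm[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]
            A[i][k] = m
    return A, perm


def solve_with_pivoting(A, b):
    """Solve A x = b via P1 A = L U: L y = P1 b, then U x = y."""
    n = len(b)
    F, perm = lu_partial_pivoting([row[:] for row in A])
    y = [0.0] * n
    for i in range(n):                            # forward substitution on P1 b
        y[i] = b[perm[i]] - sum(F[i][k] * y[k] for k in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                # back substitution
        x[i] = (y[i] - sum(F[i][k] * x[k] for k in range(i + 1, n))) / F[i][i]
    return x


# [[0, 1], [1, 1]] has no LU factorization without a row interchange
print(solve_with_pivoting([[0.0, 1.0], [1.0, 1.0]], [2.0, 3.0]))   # [1.0, 2.0]
```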
Partial Pivoting
At the kth step, the row containing the element of largest magnitude in column k (among rows k through n) is exchanged with the kth row before computing the column of L and performing the rank-1 update. The search for the largest element, i.e., the pivot, can be costly on a parallel computer:
Column partitioned approach: the pivot search is local to the processor owning column k.
Row partitioned approach: the pivot search is a reduction operation among the active processors.
Submatrix partitioned approach: the pivot search is a reduction operation among the active processors in the processor column that owns column k of the matrix.
Alternative approaches to partial pivoting trade off stability for parallelism.
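For the row partitioned approach, the pivot search can be sketched as a local search followed by a reduction. This simulation assumes rows are assigned cyclically and uses `functools.reduce` to stand in for a parallel reduction; the helper names are hypothetical.

```python
from functools import reduce


def local_candidate(A, rows, k):
    """Best pivot candidate among one processor's rows i >= k, as (|a_ik|, -i).

    Using -i makes ties go to the smallest row index, like a sequential search.
    Returns None if the processor has no active rows.
    """
    cands = [(abs(A[i][k]), -i) for i in rows if i >= k]
    return max(cands) if cands else None


def combine(c1, c2):
    """Reduction operator: keep the better of two candidates."""
    if c1 is None:
        return c2
    if c2 is None:
        return c1
    return max(c1, c2)


def parallel_pivot_search(A, k, p):
    """Row index of the pivot in column k, rows assigned cyclically to p processors."""
    n = len(A)
    owned = [[i for i in range(n) if i % p == r] for r in range(p)]
    local = [local_candidate(A, rows, k) for rows in owned]   # local searches
    best = reduce(combine, local)                            # the reduction
    return -best[1]


A = [[1.0, 2.0, 0.0, 0.0],
     [-5.0, 1.0, 0.0, 0.0],
     [3.0, -7.0, 0.0, 0.0],
     [4.0, 7.0, 0.0, 0.0]]
print(parallel_pivot_search(A, 0, 2))   # 1
print(parallel_pivot_search(A, 1, 2))   # 2 (tie between rows 2 and 3)
```

In a real code, only the winning (value, row) pair travels through the reduction, after which the pivot row must still be exchanged and broadcast.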
Alternate Forms of LU Factorization

Row-oriented delayed update
for i = 2:n
  for k = 1:i-1
    m(k) = A(i,k)/A(k,k);
    for j = k+1:n
      A(i,j) = A(i,j) - m(k)*A(k,j);
    end
    A(i,k) = m(k);
  end
end
[Figure legend: (i-1) rows of L and (i-1) rows of U (read); row i of L and the pivot (computed); row i of A, the active part of matrix A, which is modified by a matrix-vector product; remaining rows unchanged]
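The row-oriented variant in 0-based Python, as a sketch without pivoting (the name `lu_row_oriented` is illustrative). Row i is brought fully up to date in one pass, using rows 0..i-1 of U that were finished earlier:

```python
def lu_row_oriented(A):
    """Row-oriented (delayed update) LU without pivoting, overwriting A.

    Row i is updated with the already computed rows 0..i-1 of U; the
    multipliers become row i of L. Assumes nonzero pivots.
    """
    n = len(A)
    for i in range(1, n):
        for k in range(i):
            m = A[i][k] / A[k][k]          # entry (i, k) of L
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]     # row i minus m times row k of U
            A[i][k] = m
    return A


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 1.0], [-2.0, 2.0, 0.0]]
print(lu_row_oriented(A))   # [[2.0, 1.0, 1.0], [2.0, 1.0, -1.0], [-1.0, 3.0, 4.0]]
```

It performs the same operations as the kij loop in a different order, so it produces the same packed factors; the reordering means rows below i are not touched until their turn.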
Alternate Forms of LU Factorization

Column-oriented delayed update
for j = 1:n
  for k = 1:j-1
    for i = k+1:n
      A(i,j) = A(i,j) - A(i,k)*A(k,j);
    end
    b(j) = b(j) - A(j,k)*b(k);
  end
  for l = j+1:n
    A(l,j) = A(l,j)/A(j,j);
  end
end
[Figure legend: (j-1) columns of U and (j-1) columns of L (read); the pivot and column j (computed); the active part of matrix A, which is modified by a matrix-vector product; remaining columns unchanged]
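The column-oriented variant in 0-based Python, as a sketch without pivoting (the name `lu_column_oriented` is illustrative). Column j is updated with the finished columns of L, then scaled by the pivot, and $y = L^{-1}b$ is accumulated along the way:

```python
def lu_column_oriented(A, b):
    """Column-oriented (delayed update) LU without pivoting, 0-based.

    Column j is updated with the already computed columns 0..j-1 of L and then
    scaled by the pivot A[j][j]. b is overwritten with y = L^{-1} b.
    """
    n = len(A)
    for j in range(n):
        for k in range(j):
            for i in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
            b[j] -= A[j][k] * b[k]         # forward substitution, one term
        for i in range(j + 1, n):
            A[i][j] /= A[j][j]             # column j of L
    return A, b


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 1.0], [-2.0, 2.0, 0.0]]
b = [7.0, 13.0, 2.0]
lu_column_oriented(A, b)
print(A)   # [[2.0, 1.0, 1.0], [2.0, 1.0, -1.0], [-1.0, 3.0, 4.0]]
print(b)   # [7.0, -1.0, 12.0]
```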
Symmetric Positive Definite Matrices
Symmetric positive definite (SPD) matrices are an important class of matrices that have real, positive eigenvalues and satisfy the following conditions:
$A = A^T$
$x^T A x > 0$ for any nonzero vector $x$
A symmetric matrix whose leading principal submatrices are nonsingular can be written as $A = LDL^T$, where
$D$ is a diagonal matrix
$L$ is a lower triangular matrix with ones on the diagonal
The LU factorization of $A$ gives $U = DL^T$; therefore, $D = \mathrm{diag}(U)$.
An SPD matrix has $D$ with positive elements and can be written as $A = R^T R$, where $R$ is an upper triangular matrix such that $R^T = LD^{1/2}$.
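The relations $D = \mathrm{diag}(U)$ and $R = D^{1/2}L^T$ can be checked on a small SPD example. This is a sketch with illustrative names; it reuses an unpivoted LU and assumes the input is SPD so that every $d_i > 0$.

```python
import math


def ldlt_from_lu(A):
    """Return (L, d) with A = L diag(d) L^T, read off an unpivoted LU of symmetric A."""
    n = len(A)
    F = [row[:] for row in A]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = F[i][k] / F[k][k]
            for j in range(k + 1, n):
                F[i][j] -= m * F[k][j]
            F[i][k] = m
    L = [[F[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    d = [F[i][i] for i in range(n)]        # D = diag(U)
    return L, d


def cholesky_from_ldlt(L, d):
    """R = D^{1/2} L^T (upper triangular); requires every d[i] > 0."""
    n = len(d)
    return [[math.sqrt(d[i]) * L[j][i] for j in range(n)] for i in range(n)]


A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]   # small SPD example
L, d = ldlt_from_lu(A)
R = cholesky_from_ldlt(L, d)
print(d)   # [4.0, 4.0, 4.0]
print(R)   # [[2.0, 1.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 2.0]]
```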
Cholesky Factorization of SPD Matrices

Active part of matrix A is always SPD.
Square roots are computed for positive numbers.
No pivoting is needed for numerical stability.
Only the upper triangle of A is accessed; it is overwritten by R.
Complexity = $n^3/3$ operations, half as many as Gaussian elimination.
Cholesky Factorization
for k = 1:n
  for i = k+1:n
    m(i) = A(k,i)/A(k,k);              % A(i,k) = A(k,i) by symmetry
    for j = i:n
      A(i,j) = A(i,j) - m(i)*A(k,j);   % update upper triangle only
    end
    A(k,i) = A(k,i)/sqrt(A(k,k));      % entry (k,i) of R
  end
  A(k,k) = sqrt(A(k,k));               % entry (k,k) of R
end
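A 0-based Python version of this loop, as a sketch assuming an SPD input (the name `cholesky_upper` is illustrative). Only entries with $j \ge i$ are read or written, and row k is scaled only after it has been used to update the rows below it:

```python
import math


def cholesky_upper(A):
    """Cholesky A = R^T R for SPD A, following the loop above (0-based).

    Only the upper triangle (j >= i) of A is read and written; on return it
    holds R. The returned copy has the strict lower triangle set to zero.
    """
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            m = A[k][i] / A[k][k]           # uses A[i][k] = A[k][i] (symmetry)
            for j in range(i, n):
                A[i][j] -= m * A[k][j]      # update upper triangle of active part
            A[k][i] /= math.sqrt(A[k][k])   # off-diagonal entry (k, i) of R
        A[k][k] = math.sqrt(A[k][k])        # diagonal entry (k, k) of R
    return [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
R = cholesky_upper([row[:] for row in A])
print(R)   # [[2.0, 1.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 2.0]]
```

Because the inner update runs only over $j \ge i$, the work is about half that of the LU loop, consistent with the $n^3/3$ count.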
Alternate Forms of Cholesky Factorization
[Figure: column j of R]
Parallel Cholesky Factorization
[Figure: cyclic submatrix partitioning of the triangular part of A over P0 to P15; the (k-1) rows of R already computed, the pivot and row k of A, and the updated active part of matrix A, which is modified by a rank-1 update]