Triangular Factorization: 1
Consider the linear system
Ax = b
where A is an n×n matrix, b is a vector of length n, and x is an unknown vector of length n; solving the system involves determining the value of each element of x.
Any nonsingular matrix A can be expressed, possibly after its rows are reordered (see Pivoting), as a product of a lower triangular matrix L and an upper triangular matrix U, such that
A = LU
Using this form, the linear system can be solved as follows:
Solve Ly = b
Solve Ux = y
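The two triangular solves can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the function names and the 3×3 factors L and U are hypothetical, chosen so that the solution is x = (1, 1, 1).

```python
def forward_substitution(L, b):
    """Solve L y = b for a lower triangular matrix L (list of rows)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y


def back_substitution(U, y):
    """Solve U x = y for an upper triangular matrix U (list of rows)."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


# Hypothetical factors of A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L = [[1.0, 0.0, 0.0], [2.0, 1.0, 0.0], [4.0, 3.0, 1.0]]
U = [[2.0, 1.0, 1.0], [0.0, 1.0, 1.0], [0.0, 0.0, 2.0]]
b = [4.0, 10.0, 24.0]

y = forward_substitution(L, b)   # first solve L y = b
x = back_substitution(U, y)      # then solve U x = y
print(x)
```

Once L and U are known, each additional right-hand side costs only two triangular solves.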
Triangular Factorization: 2
Gaussian Elimination
Gaussian elimination can be used to solve the linear system with the n×n matrix A and the right-hand-side vector b.
The first stage converts A to upper triangular form; identical operations are applied to the right-hand side.
This stage overwrites A with $L^{-1}A$ and b with $L^{-1}b$.
Gaussian elimination:
    for k = 1:n-1,
        for i = k+1:n,
            m(i) = A(i,k)/A(k,k);            % multiplier for row i
        end;
        for i = k+1:n,
            for j = k+1:n,
                A(i,j) = A(i,j) - m(i)*A(k,j);
            end;
            b(i) = b(i) - m(i)*b(k);         % same operation on the right-hand side
        end;
        A(k+1:n,k) = m(k+1:n);               % store column k of L below the diagonal
    end;
CPSC 659 Spring 2011 2011 Vivek Sarin
Back Substitution
Triangular Factorization: 3
At the end of Gaussian elimination, the upper triangular part of A, including the diagonal, stores U; the lower triangular part of A, excluding the diagonal, stores L (L is unit lower triangular, i.e., the diagonal entries of L are 1); and b contains $y = L^{-1}b$.
Next, use back substitution to solve Ux = y; b gets overwritten with x.
Back substitution:
    b(n) = b(n)/A(n,n);
    for k = n-1:-1:1,
        for i = 1:k,
            b(i) = b(i) - A(i,k+1)*b(k+1);   % remove the contribution of x(k+1)
        end;
        b(k) = b(k)/A(k,k);
    end;
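For readers who want to run the two stages end to end, here is a hedged Python transliteration of the elimination and back-substitution loops. It uses 0-based indexing and nested lists, performs no pivoting, and the name `gauss_solve` and the test matrix are illustrative assumptions.

```python
def gauss_solve(A, b):
    """Solve A x = b in place without pivoting; A and b are overwritten."""
    n = len(b)
    # Stage 1: elimination. A ends up holding U (upper part) and the
    # multipliers of L (strictly lower part); b ends up holding y.
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
            A[i][k] = m
    # Stage 2: column-oriented back substitution; b is overwritten with x.
    b[n - 1] /= A[n - 1][n - 1]
    for k in range(n - 2, -1, -1):
        for i in range(k + 1):
            b[i] -= A[i][k + 1] * b[k + 1]
        b[k] /= A[k][k]
    return b


A = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
b = [4.0, 10.0, 24.0]
x = gauss_solve(A, b)
print(x)
```

The sketch fails with a division by zero when a pivot A[k][k] is zero, which is the situation that pivoting addresses.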
Standard LU Factorization
LU Factorization
    for k = 1:n-1,
        for i = k+1:n,
            m(i) = A(i,k)/A(k,k);            % multiplier for row i
            for j = k+1:n,
                A(i,j) = A(i,j) - m(i)*A(k,j);
            end;
            A(i,k) = m(i);                   % store entry (i,k) of L
        end;
    end;
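To make the packed storage concrete, the following Python sketch factors a small matrix in place, unpacks L and U from the overwritten array, and checks that their product reproduces the input. The helper names `lu_in_place`, `unpack_lu`, and `matmul` are hypothetical, and no pivoting is performed.

```python
def lu_in_place(A):
    """Overwrite A with U (upper part) and the multipliers of L (below diagonal)."""
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]
            A[i][k] = m
    return A


def unpack_lu(A):
    """Split the packed result into a unit lower triangular L and an upper triangular U."""
    n = len(A)
    L = [[A[i][j] if j < i else (1.0 if j == i else 0.0) for j in range(n)]
         for i in range(n)]
    U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(X, Y):
    """Dense product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A0 = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
L, U = unpack_lu(lu_in_place([row[:] for row in A0]))
print(L)
print(U)
print(matmul(L, U) == A0)
```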
Triangular Factorization: 4
[Figure labels: (k-1) columns of L; column k of A.]
Triangular Factorization: 5
Block m consecutive rows together into a single task that is assigned to each processor (p = n/m).
At the kth step, the kth row of U is broadcast to each processor.
Processors in charge of matrix rows k through n compute the kth column of L and update the active part of matrix A with a rank-1 update.
[Figure: blocks of rows assigned to processors P0, P1, P2, P3; labels: (k-1) columns of L, column k of A.]
Triangular Factorization: 6
Load balancing
Each processor becomes idle when its last row has been processed.
Work reduces as the algorithm progresses, i.e., work at the kth step is proportional to $(n-k)^2$.
Concurrency and load balance can be improved by assigning rows to tasks in a cyclic manner; in this case,
$$T_{comp} = t_c \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p} \approx \frac{2n^3}{3p}\, t_c$$
$$T_{comm} = \sum_{k=1}^{n-1} \left[ t_s + t_b (n-k) \right] \approx n t_s + \frac{t_b n^2}{2}$$
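The effect of cyclic assignment on load balance can be illustrated with a small Python count of update flops per processor. This is a sketch under stated assumptions: each updated matrix entry costs one multiply and one add, p divides n, and only total work is counted (idle time caused by step-by-step synchronization is not modeled). The function names are hypothetical.

```python
def row_work(n):
    """Flops spent updating each row over the whole factorization (0-based rows)."""
    work = [0] * n
    for k in range(n - 1):
        for i in range(k + 1, n):
            work[i] += 2 * (n - 1 - k)   # one multiply-add per updated entry
    return work


def processor_loads(n, p, cyclic):
    """Total update flops per processor for block or cyclic row assignment."""
    m = n // p                           # rows per processor, assumes p divides n
    loads = [0] * p
    for i, w in enumerate(row_work(n)):
        owner = i % p if cyclic else i // m
        loads[owner] += w
    return loads


block = processor_loads(16, 4, cyclic=False)
cyclic = processor_loads(16, 4, cyclic=True)
print("block :", block, "max/mean =", max(block) / (sum(block) / 4))
print("cyclic:", cyclic, "max/mean =", max(cyclic) / (sum(cyclic) / 4))
```

With block assignment the processor holding the last rows does most of the work, while cyclic assignment brings the maximum load closer to the mean.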
Triangular Factorization: 7
[Figure: rows assigned cyclically to processors in the order P0, P1, P2, P3, P0, P1, P2, P3, ...; labels: (k-1) columns of L, column k of A.]
Triangular Factorization: 8
Block consecutive columns together into a single task that is assigned to each processor (p = n/m).
At the kth step, the processor that owns column k computes the kth column of L and broadcasts it to all processors.
Processors in charge of columns k+1 through n update the active part of matrix A with a rank-1 update.
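The column-partitioned step can be mimicked sequentially in Python to show which processor touches which data. This is only a simulation sketch: the broadcast is modeled by sharing a list, the name `column_block_lu` is hypothetical, p is assumed to divide n, and no pivoting is done.

```python
def column_block_lu(A, p):
    """Simulate LU with blocks of n/p consecutive columns per processor."""
    n = len(A)
    m = n // p                          # columns per processor, assumes p divides n
    owner = [j // m for j in range(n)]  # owner[j] = processor holding column j
    for k in range(n - 1):
        # The processor owner[k] computes column k of L ...
        col = [A[i][k] / A[k][k] for i in range(k + 1, n)]
        for idx, i in enumerate(range(k + 1, n)):
            A[i][k] = col[idx]
        # ... and "broadcasts" it; every processor then updates only the
        # columns j > k that it owns (its share of the rank-1 update).
        for proc in range(p):
            for j in range(k + 1, n):
                if owner[j] != proc:
                    continue
                for idx, i in enumerate(range(k + 1, n)):
                    A[i][j] -= col[idx] * A[k][j]
    return A


A = [[2.0, 1.0, 1.0, 1.0],
     [4.0, 3.0, 3.0, 4.0],
     [8.0, 7.0, 9.0, 11.0],
     [2.0, 2.0, 4.0, 7.0]]
packed = column_block_lu([row[:] for row in A], p=2)
for row in packed:
    print(row)
```

The result is the same packed L and U that the sequential algorithm produces; only the assignment of the update work changes.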
[Figure: matrix divided into blocks assigned to processors P0, P1, P2, P3 (labeled 1st, 2nd, ..., pth block of n/p rows); labels: (k-1) columns of L, column k of A.]
Triangular Factorization: 9
Load balancing
Each processor becomes idle when its last column has been processed.
Work reduces as the algorithm progresses, i.e., work at the kth step is proportional to $(n-k)^2$.
Concurrency and load balance can be improved by assigning columns to tasks in a cyclic manner; in this case,
$$T_{comp} = t_c \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p} \approx \frac{2n^3}{3p}\, t_c$$
$$T_{comm} = \sum_{k=1}^{n-1} \left[ t_s + t_b (n-k) \right] \approx n t_s + \frac{t_b n^2}{2}$$
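The closed forms for the cyclic row and column partitionings follow from two standard sums, keeping only the leading term in n. The derivation below is a sketch that assumes the per-step costs are $2(n-k)^2 t_c / p$ for computation and $t_s + t_b(n-k)$ for communication.

```latex
\begin{align*}
\sum_{k=1}^{n-1} (n-k)^2 &= \sum_{j=1}^{n-1} j^2 = \frac{(n-1)\,n\,(2n-1)}{6} \approx \frac{n^3}{3}
  &&\Rightarrow\quad T_{comp} \approx \frac{2 t_c}{p} \cdot \frac{n^3}{3} = \frac{2n^3}{3p}\, t_c \\
\sum_{k=1}^{n-1} (n-k) &= \sum_{j=1}^{n-1} j = \frac{n(n-1)}{2} \approx \frac{n^2}{2}
  &&\Rightarrow\quad T_{comm} \approx (n-1)\, t_s + t_b \frac{n^2}{2} \approx n t_s + \frac{t_b n^2}{2}
\end{align*}
```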
Triangular Factorization: 10
[Figure labels: (k-1) columns of L; column k of A.]
Triangular Factorization: 11
Partition the matrix into submatrices of size m×m, where $m = n/\sqrt{p}$, and assign a submatrix to each processor.
At the kth step:
Processors that own column k compute the kth column of L.
The kth column of L is broadcast along the processor row, and the kth row of U is broadcast along the processor column.
Processors in charge of matrix columns k+1 through n update the active part of matrix A with a rank-1 update.
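A small Python sketch can show how matrix entries map to processors under this partitioning and which processors still have work at step k. The row-major numbering of processors on the grid is an assumption made for illustration, as are the function names; p is assumed to be a perfect square whose root divides n.

```python
import math


def owner_2d(i, j, n, p):
    """Processor owning entry (i, j), 0-based, on a sqrt(p) x sqrt(p) grid."""
    q = math.isqrt(p)          # processors per grid row and per grid column
    m = n // q                 # submatrix size, m = n / sqrt(p)
    return (i // m) * q + (j // m)   # assumed row-major processor numbering


def active_processors(k, n, p):
    """Processors owning part of the active submatrix (rows and columns > k)."""
    return sorted({owner_2d(i, j, n, p)
                   for i in range(k + 1, n)
                   for j in range(k + 1, n)})


n, p = 8, 16
print(owner_2d(2, 5, n, p))
print(active_processors(0, n, p))
print(active_processors(3, n, p))
```

As k grows, fewer processors appear in the active set, which is the idling behavior that cyclic assignment is meant to reduce.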
Triangular Factorization: 12
[Figure: submatrices assigned to processors P0 through P15; labels: (k-1) columns of L, column k of A.]
Triangular Factorization: 13
Load balancing
Each processor becomes idle when its last row and column have been processed.
Work reduces as the algorithm progresses, i.e., work at the kth step is proportional to $(n-k)^2$.
Concurrency and load balance can be improved by assigning smaller submatrices to tasks in a cyclic manner along rows as well as columns; in this case,
$$T_{comp} = t_c \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p} \approx \frac{2n^3}{3p}\, t_c$$
$$T_{comm} = \sum_{k=1}^{n-1} 2\left[ t_s + t_b \frac{n-k}{\sqrt{p}} \right] \approx 2 n t_s + \frac{t_b n^2}{\sqrt{p}}$$
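The cost estimates for the one-dimensional (row or column) and submatrix partitionings can be compared numerically. The Python sketch below assumes that t_c is the time per floating-point operation, t_s the message startup time, and t_b the transfer time per matrix entry; the parameter values and function names are hypothetical.

```python
import math


def cost_1d(n, p, tc, ts, tb):
    """Closed-form estimates for cyclic row or column partitioning."""
    t_comp = 2 * n**3 / (3 * p) * tc
    t_comm = n * ts + tb * n**2 / 2
    return t_comp, t_comm


def cost_2d(n, p, tc, ts, tb):
    """Closed-form estimates for cyclic submatrix partitioning."""
    t_comp = 2 * n**3 / (3 * p) * tc
    t_comm = 2 * n * ts + tb * n**2 / math.sqrt(p)
    return t_comp, t_comm


def cost_2d_exact(n, p, tc, ts, tb):
    """The same submatrix model, summed step by step instead of approximated."""
    t_comp = sum(2 * (n - k)**2 / p * tc for k in range(1, n))
    t_comm = sum(2 * (ts + tb * (n - k) / math.sqrt(p)) for k in range(1, n))
    return t_comp, t_comm


# Hypothetical machine parameters, in arbitrary time units
n, p, tc, ts, tb = 1000, 16, 1.0, 100.0, 2.0
print("1-D:", cost_1d(n, p, tc, ts, tb))
print("2-D:", cost_2d(n, p, tc, ts, tb))
print("2-D, summed:", cost_2d_exact(n, p, tc, ts, tb))
```

Under this model the computation time is the same for both schemes, and the bandwidth term of the submatrix scheme is smaller whenever the square root of p exceeds 2, at the price of twice the startup cost.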
Triangular Factorization: 14
[Figure: submatrices assigned cyclically to processors (P0, P4, P8, P12, P0, ...); labels: (k-1) columns of L, column k of A.]
Pivoting
Triangular Factorization: 15
The standard LU factorization algorithm may not produce very accurate LU factors.
Reordering the rows and columns of A before computing the LU factorization improves the stability of the algorithm; of course, such reordering does not change the solution of the linear system.
If P1 and P2 are the row and column permutation matrices, we solve
$$P_1 A P_2 z = P_1 b, \qquad x = P_2 z$$
The LU factorization of the reordered matrix, P1AP2=LU, is used to solve the system as follows:
$$L y = P_1 b, \qquad U z = y, \qquad x = P_2 z$$
Partial pivoting: only rows are reordered; produces stable LU factors in most cases.
Complete pivoting: both rows and columns are reordered; guaranteed to produce stable LU factors.
Pivoting is costly on a parallel computer, as it requires communication and disrupts the overlap of communication with computation.
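A hedged Python sketch of the partial pivoting case (row reordering only, so P2 is the identity and x = z) is given below. The row permutation P1 is kept as a list of original row indices; the function names and the 2×2 example are illustrative assumptions. The example matrix has a zero in the (1,1) position, so factorization without pivoting would divide by zero.

```python
def lu_partial_pivot(A):
    """Factor P1 A = L U in place; return the packed factors and the row order."""
    n = len(A)
    perm = list(range(n))            # perm[i] = original index of row i of P1 A
    for k in range(n - 1):
        # Pivot: row with the largest magnitude entry in column k
        piv = max(range(k, n), key=lambda r: abs(A[r][k]))
        if A[piv][k] == 0.0:
            raise ValueError("matrix is singular")
        if piv != k:
            A[k], A[piv] = A[piv], A[k]
            perm[k], perm[piv] = perm[piv], perm[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]
            A[i][k] = m
    return A, perm


def solve_with_partial_pivoting(A, b):
    """Solve A x = b via L y = P1 b followed by U x = y."""
    LU, perm = lu_partial_pivot([row[:] for row in A])
    n = len(b)
    y = [b[perm[i]] for i in range(n)]           # apply P1 to b
    for i in range(n):                           # unit lower triangular solve
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):               # upper triangular solve
        s = sum(LU[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / LU[i][i]
    return x


x = solve_with_partial_pivoting([[0.0, 1.0], [1.0, 1.0]], [1.0, 2.0])
print(x)
```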
Partial Pivoting
Triangular Factorization: 16
At the kth step, the row having the element with the largest magnitude in column k is exchanged with the kth row before computing the column of L and performing the rank-1 update.
The search for the largest element, i.e., the pivot, can be costly on a parallel computer.
Column-partitioned approach: pivot search is within the processor owning column k.
Row-partitioned approach: pivot search is a reduction operation among active processors.
Submatrix-partitioned approach: pivot search is a reduction operation among the active processors along a processor column that own column k of the matrix.
Alternative approaches to partial pivoting trade off stability for parallelism.
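For the row-partitioned case, the pivot search as a reduction can be imitated in Python: each processor finds the best candidate among its own rows, and the candidates are then combined with a maximum. The sketch assumes cyclic row assignment and simulates the processors with a loop; `pivot_by_reduction` is a hypothetical name, and a real implementation would use a collective operation of the message-passing library instead.

```python
def pivot_by_reduction(column, k, p):
    """Return the row index (>= k) of the largest-magnitude entry of `column`."""
    n = len(column)
    candidates = []
    for proc in range(p):
        # Local search: rows at or below k that this processor owns (cyclic)
        rows = [i for i in range(k, n) if i % p == proc]
        if rows:                         # processors without active rows stay idle
            best = max(rows, key=lambda i: abs(column[i]))
            candidates.append((abs(column[best]), best))
    # Reduction step: combine the local candidates into the global pivot
    return max(candidates, key=lambda c: c[0])[1]


column = [3.0, -7.0, 2.0, 5.0, -9.0, 1.0]
print(pivot_by_reduction(column, k=1, p=2))
```

Only one (magnitude, row) pair per processor takes part in the reduction, so the message size is constant, but every elimination step pays for this extra communication.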
Triangular Factorization: 17
[Figure labels: (i-1) rows of U; regions marked Read, Computed, and Unchanged.]
Triangular Factorization: 18
[Figure labels: (j-1) columns of U; (j-1) rows of L; pivot; regions marked Computed, Read, and Unchanged; active part of matrix A, which is modified by matrix-vector product.]
Triangular Factorization: 19
Symmetric Positive Definite (SPD) matrices are an important class of matrices that have real, positive eigenvalues and satisfy the following conditions:
$A = A^T$
$x^T A x > 0$ for any non-zero vector x
A symmetric matrix that has an LU factorization can be written as $A = LDL^T$, where
D is a diagonal matrix
L is a lower triangular matrix with ones on the diagonal
LU factorization of A gives $U = DL^T$; therefore, D = diag(U).
An SPD matrix has D with positive elements, and can be written as $A = R^T R$, where R is an upper triangular matrix such that $R^T = LD^{1/2}$.
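A worked 2×2 example may help connect the three factorizations. The matrix below is a hypothetical SPD matrix chosen for easy arithmetic.

```latex
\begin{align*}
A &= \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix}
   = \underbrace{\begin{pmatrix} 1 & 0 \\ \tfrac{1}{2} & 1 \end{pmatrix}}_{L}
     \underbrace{\begin{pmatrix} 4 & 2 \\ 0 & 2 \end{pmatrix}}_{U},
\qquad
D = \operatorname{diag}(U) = \begin{pmatrix} 4 & 0 \\ 0 & 2 \end{pmatrix},
\qquad
D L^T = \begin{pmatrix} 4 & 2 \\ 0 & 2 \end{pmatrix} = U \\
R &= D^{1/2} L^T = \begin{pmatrix} 2 & 1 \\ 0 & \sqrt{2} \end{pmatrix},
\qquad
R^T R = \begin{pmatrix} 2 & 0 \\ 1 & \sqrt{2} \end{pmatrix}
        \begin{pmatrix} 2 & 1 \\ 0 & \sqrt{2} \end{pmatrix}
      = \begin{pmatrix} 4 & 2 \\ 2 & 3 \end{pmatrix} = A
\end{align*}
```

Both diagonal entries of D are positive, as expected for an SPD matrix.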
Triangular Factorization: 20
Cholesky Factorization
    for k = 1:n,
        for i = k+1:n,
            m(i) = A(k,i)/A(k,k);
            for j = i:n,
                A(i,j) = A(i,j) - m(i)*A(k,j);   % update upper triangle only
            end;
            A(k,i) = A(k,i)/sqrt(A(k,k));        % entry i of row k of R
        end;
        A(k,k) = sqrt(A(k,k));
    end;
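A hedged Python version of the same row-oriented Cholesky loop is sketched below. It reads and writes only the upper triangle, performs all updates for step k before scaling row k (which yields the same result as scaling entry by entry), and assumes the input is SPD. The name `cholesky_upper` and the 3×3 example are illustrative.

```python
import math


def cholesky_upper(A):
    """Return upper triangular R with A = R^T R; uses only the upper triangle of A."""
    n = len(A)
    A = [row[:] for row in A]            # work on a copy
    for k in range(n):
        # Rank-1 update of the active part (rows and columns after k)
        for i in range(k + 1, n):
            m = A[k][i] / A[k][k]
            for j in range(i, n):
                A[i][j] -= m * A[k][j]
        # Scale row k by the square root of the pivot to obtain row k of R
        d = math.sqrt(A[k][k])           # raises ValueError for a negative pivot
        for j in range(k, n):
            A[k][j] /= d
    return [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]


A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
R = cholesky_upper(A)
for row in R:
    print(row)
```

Because only the upper triangle is updated, the work is roughly half that of the LU factorization of the same matrix.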
Triangular Factorization: 21
[Figure label: column j of R.]
Triangular Factorization: 22
[Figure: processor assignment for the triangular matrix, with submatrices mapped to processors P0, P1, ..., P15; labels: (k-1) rows of R, pivot, row k of A, Updated, active part of matrix A, which is modified by rank-1 update.]