
LU Factorization

A system of linear equations is given as

Ax = b

Triangular Factorization: 1

where A is an n×n matrix, b is a vector of length n, and x is an unknown vector of length n. Solving the system means determining the value of each element of x.

Any nonsingular matrix A can be expressed as a product of a lower triangular matrix L and an upper triangular matrix U, possibly after reordering its rows (see Pivoting), such that

A = LU

and the linear system can be written as

LUx = b

Using this form, the linear system can be solved in two steps:
1. Solve Ly = b
2. Solve Ux = y
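The two triangular solves can be sketched as follows. This is a minimal illustration in plain Python, not code from the slides; the function names and the 3×3 factors are hypothetical, chosen so that A = LU and the exact solution is x = (1, 1, 1).

```python
def forward_substitution(L, b):
    """Solve Ly = b for a lower triangular L (row-oriented)."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y


def back_substitution(U, y):
    """Solve Ux = y for an upper triangular U (row-oriented)."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x


# Hypothetical factors of A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]].
L = [[1.0, 0.0, 0.0],
     [2.0, 1.0, 0.0],
     [4.0, 3.0, 1.0]]
U = [[2.0, 1.0, 1.0],
     [0.0, 1.0, 1.0],
     [0.0, 0.0, 2.0]]
b = [4.0, 10.0, 24.0]   # equals A times (1, 1, 1)

y = forward_substitution(L, b)   # step 1: Ly = b
x = back_substitution(U, y)      # step 2: Ux = y
print(y, x)
```

Once L and U are known, each additional right-hand side costs only the two triangular solves.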

CPSC 659 Spring 2011

2011 Vivek Sarin


Gaussian Elimination
Gaussian elimination can be used to solve the linear system with the n×n matrix A and the right-hand side vector b. The first stage converts A to upper triangular form; identical operations are applied to the right-hand side. This stage overwrites A with L^{-1}A and b with L^{-1}b.

    for k = 1:n-1
        % multipliers for column k
        for i = k+1:n
            m(i) = A(i,k)/A(k,k);
        end
        % update the active submatrix and the right-hand side
        for i = k+1:n
            for j = k+1:n
                A(i,j) = A(i,j) - m(i)*A(k,j);
            end
            b(i) = b(i) - m(i)*b(k);
        end
        % store column k of L below the diagonal
        A(k+1:n,k) = m(k+1:n);
    end

Back Substitution


At the end of Gaussian elimination, the upper triangular part of A, including the diagonal, stores U; the lower triangular part of A, excluding the diagonal, stores L (L is unit lower triangular, i.e., the diagonal entries of L are 1); and b contains y = L^{-1}b.

Next, use back substitution to solve Ux = y; b gets overwritten with x.

    b(n) = b(n)/A(n,n);
    for k = n-1:-1:1
        % eliminate x(k+1) from the first k equations
        for i = 1:k
            b(i) = b(i) - A(i,k+1)*b(k+1);
        end
        b(k) = b(k)/A(k,k);
    end

Complexity

Stage            Operations   Data
Compute L & U    2n^3/3       n^2
Solve Ly = b     n^2          n^2/2
Solve Ux = y     n^2          n^2/2
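The leading-order count 2n^3/3 for computing L and U can be checked by counting operations in the elimination loop nest. The sketch below is an illustration only; the function name is hypothetical, and it counts one division per multiplier plus one multiply and one subtract per updated entry.

```python
def lu_operation_count(n):
    """Count floating-point operations in the k-i-j elimination loops."""
    ops = 0
    for k in range(1, n):                  # k = 1 .. n-1
        for i in range(k + 1, n + 1):      # rows below the pivot
            ops += 1                       # division for the multiplier m(i)
            for j in range(k + 1, n + 1):  # columns right of the pivot
                ops += 2                   # one multiply and one subtract
    return ops


for n in (10, 50, 100):
    exact = lu_operation_count(n)
    # the ratio approaches 1 as n grows, since lower-order terms vanish
    print(n, exact, exact / (2 * n**3 / 3))
```

The exact count is n(n-1)/2 + (n-1)n(2n-1)/3, which differs from 2n^3/3 only in lower-order terms.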

Standard LU Factorization
    for k = 1:n-1
        for i = k+1:n
            m(i) = A(i,k)/A(k,k);          % multiplier, entry (i,k) of L
            for j = k+1:n
                A(i,j) = A(i,j) - m(i)*A(k,j);
            end
            A(i,k) = m(i);                 % overwrite A(i,k) with L(i,k)
        end
    end

[Figure: at step k, the first (k-1) columns of L are complete and column k of A is active.]
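As a runnable counterpart to the standard LU loop nest, here is a minimal Python sketch that overwrites a matrix with its packed LU factors and then verifies that L times U reproduces the original. The helper names (`lu_in_place`, `unpack`, `matmul`) and the test matrix are assumptions made for illustration; no pivoting is performed, so a zero pivot would cause a division by zero.

```python
def lu_in_place(A):
    """Overwrite A with U (upper part) and L (strict lower part); no pivoting."""
    n = len(A)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]          # multiplier, entry (i,k) of L
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]     # rank-1 update of the active part
            A[i][k] = m
    return A


def unpack(A):
    """Split the packed factors into a unit lower L and an upper U."""
    n = len(A)
    L = [[A[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U


def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


original = [[2.0, 1.0, 1.0],
            [4.0, 3.0, 3.0],
            [8.0, 7.0, 9.0]]
packed = lu_in_place([row[:] for row in original])
L, U = unpack(packed)
print(packed)
print(matmul(L, U) == original)
```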

Parallel LU with Row Partitioning



- Block m consecutive rows together into a single task that is assigned to each processor (p = n/m).
- At the kth step, the kth row of U is broadcast to each processor.
- Processors in charge of matrix rows k+1 through n compute the kth column of L and update the active part of matrix A with a rank-1 update.

[Figure: block row partitioning among processors P0 through P3; the first (k-1) columns of L are complete and column k of A is active.]

Parallel LU with Row Partitioning


Load balancing
- Each processor becomes idle when its last row has been processed.
- Work reduces as the algorithm progresses, i.e., work at the kth step is proportional to (n-k)^2.
- Concurrency and load balance can be improved by assigning rows to tasks in a cyclic manner; in this case,

$$T_{comp} = t_c \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p} \approx \frac{2 n^3 t_c}{3p}$$

$$T_{comm} = \sum_{k=1}^{n-1} \left[ t_s + t_b (n-k) \right] \approx n t_s + \frac{t_b n^2}{2}$$

- Communication can be overlapped with computation.
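The effect of cyclic assignment on load balance can be illustrated with a small counting experiment. The sketch below is a simplified model, not the author's analysis: it assumes each active row costs 2(n-k) operations at step k, that a step takes as long as its busiest processor, and that p divides n. All names are hypothetical.

```python
def owner_block(i, n, p):
    """Block mapping: rows 0..n/p-1 go to P0, the next n/p rows to P1, etc."""
    return i // (n // p)


def owner_cyclic(i, n, p):
    """Cyclic mapping: row i goes to processor i mod p."""
    return i % p


def parallel_flops(n, p, owner):
    """Sum over steps of the update cost on the most heavily loaded processor."""
    total = 0
    for k in range(n - 1):                 # 0-based elimination step
        rows = [0] * p
        for i in range(k + 1, n):          # active rows at this step
            rows[owner(i, n, p)] += 1
        total += max(rows) * 2 * (n - 1 - k)
    return total


n, p = 64, 4
block = parallel_flops(n, p, owner_block)
cyclic = parallel_flops(n, p, owner_cyclic)
ideal = sum(2 * (n - 1 - k) ** 2 for k in range(n - 1)) / p
print(block, cyclic, ideal)
```

Under these assumptions the cyclic mapping stays close to the ideal 1/p share of the work, while the block mapping leaves the last processor with a full block of rows until the final steps.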


Cyclic Row Partitioning

[Figure: rows assigned cyclically to processors in the order P0, P1, P2, P3, P0, P1, ...; the first (k-1) columns of L are complete and column k of A is active.]

Parallel LU with Column Partitioning



- Block consecutive columns together into a single task that is assigned to each processor (p = n/m).
- At the kth step, the processor that owns column k computes the kth column of L and broadcasts it to all processors.
- Processors in charge of columns k+1 through n update the active part of matrix A with a rank-1 update.
[Figure: block partitioning among processors P0 through P3 (labels: 1st, 2nd, ..., pth block of n/p rows); the first (k-1) columns of L are complete and column k of A is active.]

Parallel LU with Column Partitioning


Load balancing
- Each processor becomes idle when its last column has been processed.
- Work reduces as the algorithm progresses, i.e., work at the kth step is proportional to (n-k)^2.
- Concurrency and load balance can be improved by assigning columns to tasks in a cyclic manner; in this case,

$$T_{comp} = t_c \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p} \approx \frac{2 n^3 t_c}{3p}$$

$$T_{comm} = \sum_{k=1}^{n-1} \left[ t_s + t_b (n-k) \right] \approx n t_s + \frac{t_b n^2}{2}$$

- Communication can be overlapped with computation.


Cyclic Column Partitioning

[Figure: cyclic column partitioning; the first (k-1) columns of L are complete and column k of A is active.]

Parallel LU with Submatrix Partitioning



- Partition the matrix into submatrices of size m×m, where m = n/√p, and assign a submatrix to each processor.
- At the kth step:
  - Processors that own column k compute the kth column of L.
  - The kth column of L is broadcast along each processor row, and the kth row of U is broadcast along each processor column.
  - Processors in charge of matrix columns k+1 through n update the active part of matrix A with a rank-1 update.


Parallel LU with Submatrix Partitioning

[Figure: submatrix partitioning among processors P0 through P15; the first (k-1) columns of L are complete and column k of A is active.]

Parallel LU with Submatrix Partitioning


Load balancing
- Each processor becomes idle when its last row and column have been processed.
- Work reduces as the algorithm progresses, i.e., work at the kth step is proportional to (n-k)^2.
- Concurrency and load balance can be improved by assigning smaller submatrices to tasks in a cyclic manner along rows as well as columns; in this case,

$$T_{comp} = \sum_{k=1}^{n-1} \frac{2(n-k)^2}{p}\, t_c \approx \frac{2 n^3 t_c}{3p}$$

$$T_{comm} = \sum_{k=1}^{n-1} 2\left[ t_s + t_b \frac{n-k}{\sqrt{p}} \right] \approx 2 n t_s + \frac{t_b n^2}{\sqrt{p}}$$

- Communication can be overlapped with computation.
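To compare the communication cost of the one-dimensional (row or column) and submatrix partitionings, the sums can be evaluated directly. The sketch below assumes that t_c is the time per arithmetic operation, t_s the message startup time, and t_b the per-word transfer time; the slides do not define these symbols here, and the numeric values are hypothetical.

```python
def cost_1d(n, p, tc, ts, tb):
    """Cyclic row or column partitioning: (T_comp, T_comm) from the sums."""
    comp = sum(tc * 2 * (n - k) ** 2 / p for k in range(1, n))
    comm = sum(ts + tb * (n - k) for k in range(1, n))
    return comp, comm


def cost_2d(n, p, tc, ts, tb):
    """Cyclic submatrix partitioning on a sqrt(p) x sqrt(p) processor grid."""
    q = p ** 0.5
    comp = sum(tc * 2 * (n - k) ** 2 / p for k in range(1, n))
    comm = sum(2 * (ts + tb * (n - k) / q) for k in range(1, n))
    return comp, comm


# Hypothetical machine parameters, in seconds.
n, p = 4096, 64
tc, ts, tb = 1e-9, 1e-5, 1e-8

comp1, comm1 = cost_1d(n, p, tc, ts, tb)
comp2, comm2 = cost_2d(n, p, tc, ts, tb)

# Closed-form approximations for comparison.
approx1 = n * ts + tb * n * n / 2
approx2 = 2 * n * ts + tb * n * n / p ** 0.5
print(comm1, approx1)
print(comm2, approx2)
```

In this model the submatrix scheme doubles the startup term but shrinks the volume term by a factor of √p/2, so which scheme communicates less depends on the ratio of t_s to t_b and on n and p.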


Cyclic Submatrix Partitioning

[Figure: cyclic submatrix partitioning among processors P0 through P15, with the pattern P0 P4 P8 P12 repeating along each block row and P0 P1 P2 P3 repeating down each block column; the first (k-1) columns of L are complete and column k of A is active.]

Pivoting


- The standard LU factorization algorithm may not produce very accurate LU factors.
- Reordering the rows and columns of A before computing the LU factorization improves the stability of the algorithm; such reordering does not change the solution of the linear system.
- If P1 and P2 are the row and column permutation matrices, we solve

P1 A P2 z = P1 b,   x = P2 z

- The LU factorization of the reordered matrix, P1 A P2 = LU, is used to solve the system as follows:

Ly = P1 b,   Uz = y,   x = P2 z

- Partial pivoting: only rows are reordered; produces stable LU factors in most cases.
- Complete pivoting: both rows and columns are reordered; guaranteed to produce stable LU factors.
- Pivoting is costly on a parallel computer because it requires communication and disrupts the overlap of communication with computation.
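A standard textbook illustration of why reordering matters is a 2×2 system with a tiny leading entry. The sketch below is a hypothetical example, not taken from the slides: it solves the same system twice with elimination that never pivots, once in the given row order and once with the two rows exchanged by hand. The exact solution is very close to (1, 1).

```python
def solve_no_pivot(A, b):
    """Gaussian elimination without pivoting, then back substitution."""
    A = [row[:] for row in A]
    b = b[:]
    n = len(b)
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * b[j] for j in range(i + 1, n))
        b[i] = (b[i] - s) / A[i][i]
    return b


A = [[1e-20, 1.0],
     [1.0,   1.0]]
b = [1.0, 2.0]

x_bad = solve_no_pivot(A, b)                        # tiny pivot 1e-20
x_good = solve_no_pivot([A[1], A[0]], [b[1], b[0]])  # rows exchanged first
print(x_bad)
print(x_good)
```

With the tiny pivot, the multiplier 1e20 swamps the other entries in double precision and the first component of the computed solution is lost; exchanging the rows keeps the multiplier small and recovers (1, 1).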

Partial Pivoting


- At the kth step, the row having the element with the largest magnitude in column k is exchanged with the kth row before computing the kth column of L and performing the rank-1 update.
- The search for the largest element, i.e., the pivot, can be costly on a parallel computer:
  - Column-partitioned approach: the pivot search is local to the processor owning column k.
  - Row-partitioned approach: the pivot search is a reduction operation among the active processors.
  - Submatrix-partitioned approach: the pivot search is a reduction operation among the active processors along the processor column that owns column k of the matrix.
- Alternative approaches to partial pivoting trade off stability for parallelism.
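A sequential sketch of LU factorization with partial pivoting is given below. It is an illustration under stated assumptions rather than the slides' algorithm: the function name is hypothetical, the row permutation is returned as an index list `perm` (row i of the reordered matrix is row `perm[i]` of A), and the factors are returned in packed form.

```python
def lu_partial_pivoting(A):
    """Return (perm, LU) such that the rows of A taken in order perm equal L U."""
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for k in range(n - 1):
        # pivot search: largest magnitude in column k among rows k..n-1
        piv = max(range(k, n), key=lambda r: abs(LU[r][k]))
        if piv != k:
            LU[k], LU[piv] = LU[piv], LU[k]
            perm[k], perm[piv] = perm[piv], perm[k]
        for i in range(k + 1, n):
            m = LU[i][k] / LU[k][k]        # |m| <= 1 because of the pivot choice
            for j in range(k + 1, n):
                LU[i][j] -= m * LU[k][j]   # rank-1 update of the active part
            LU[i][k] = m
    return perm, LU


A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
perm, LU = lu_partial_pivoting(A)
print(perm)
print(LU)
```

The `max` over column k is the step that becomes a reduction among processors in the row-partitioned and submatrix-partitioned schemes.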


Alternate Forms of LU Factorization


Row-oriented delayed update

    for i = 2:n
        for k = 1:i-1
            m(k) = A(i,k)/A(k,k);          % entry (i,k) of L
            for j = k+1:n
                A(i,j) = A(i,j) - m(k)*A(k,j);
            end
            A(i,k) = m(k);
        end
    end
[Figure: row-oriented delayed update at step i, showing the (i-1) rows of L, the (i-1) rows of U, row i of L, the pivot, and row i of A (the active part, modified by a matrix-vector product); regions are marked Read, Computed, and Unchanged.]

Alternate Forms of LU Factorization


Column-oriented delayed update

    for j = 1:n
        for k = 1:j-1
            for i = k+1:n
                A(i,j) = A(i,j) - A(i,k)*A(k,j);
            end
            b(j) = b(j) - A(j,k)*b(k);     % forward substitution on b
        end
        for l = j+1:n
            A(l,j) = A(l,j)/A(j,j);        % scale column j of L by the pivot
        end
    end

[Figure: column-oriented delayed update at step j, showing the (j-1) columns of U, the (j-1) rows of L, the pivot, and the active part of matrix A (modified by a matrix-vector product); regions are marked Read, Computed, and Unchanged.]

Symmetric Positive Definite Matrices


- Symmetric positive definite (SPD) matrices are an important class of matrices that have real, positive eigenvalues and satisfy the following conditions:
  - A = A^T
  - x^T A x > 0 for any nonzero vector x
- A symmetric matrix that admits an LU factorization without pivoting can be written as A = L D L^T, where
  - D is a diagonal matrix
  - L is a lower triangular matrix with ones on the diagonal
- LU factorization of A gives U = D L^T; therefore, D = diag(U).
- An SPD matrix has D with positive elements and can be written as A = R^T R, where R is an upper triangular matrix such that R^T = L D^{1/2}.
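The relations U = D L^T and R^T = L D^{1/2} can be checked numerically on a small SPD matrix. The sketch below is illustrative; the helper name and the 3×3 matrix are hypothetical, chosen so that every intermediate value is exactly representable in floating point.

```python
import math


def lu_no_pivot(A):
    """Return unit lower triangular L and upper triangular U with A = L U."""
    n = len(A)
    U = [row[:] for row in A]
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k + 1, n):
                U[i][j] -= L[i][k] * U[k][j]
            U[i][k] = 0.0                  # eliminated entry
    return L, U


A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
n = len(A)
L, U = lu_no_pivot(A)

D = [U[i][i] for i in range(n)]            # D = diag(U)
# entry (i, j) of D L^T is D[i] * L[j][i]
DLt = [[D[i] * L[j][i] for j in range(n)] for i in range(n)]
# R = D^{1/2} L^T, so that R^T = L D^{1/2}
R = [[math.sqrt(D[i]) * L[j][i] for j in range(n)] for i in range(n)]

print(D)
print(DLt == U)
print(R)
```

All entries of D are positive here, as expected for an SPD matrix, which is what makes the square roots in R well defined.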


Cholesky Factorization of SPD Matrices


- The active part of matrix A is always SPD.
- Square roots are computed only for positive numbers.
- No pivoting is needed for numerical stability.
- Only the upper triangle of A is accessed; it is overwritten by R.
- Complexity = n^3/3 operations, half as many as Gaussian elimination.

    for k = 1:n
        for i = k+1:n
            m(i) = A(k,i)/A(k,k);
            for j = i:n
                A(i,j) = A(i,j) - m(i)*A(k,j);
            end
            A(k,i) = A(k,i)/sqrt(A(k,k));  % entry (k,i) of R
        end
        A(k,k) = sqrt(A(k,k));
    end
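A runnable Python variant of the Cholesky factorization is sketched below. It is not a line-by-line translation of the loop above: for clarity it scales row k first and then applies the rank-1 update with the scaled row, which yields the same R. The function name and the test matrix are hypothetical.

```python
import math


def cholesky_upper(A):
    """Return upper triangular R with A = R^T R; reads only the upper triangle."""
    n = len(A)
    R = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n):
        if R[k][k] <= 0.0:
            raise ValueError("matrix is not positive definite")
        R[k][k] = math.sqrt(R[k][k])
        for j in range(k + 1, n):
            R[k][j] /= R[k][k]             # finish row k of R
        for i in range(k + 1, n):
            for j in range(i, n):          # upper triangle of the active part
                R[i][j] -= R[k][i] * R[k][j]
    return R


A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
R = cholesky_upper(A)
print(R)
```

The inner update touches only entries with j >= i, which is where the factor-of-two saving over Gaussian elimination comes from.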


Alternate Forms of Cholesky Factorization

[Figure: column j of R.]


Parallel Cholesky Factorization

[Figure: cyclic submatrix assignment of the triangular part of A to processors P0 through P15; the first (k-1) rows of R are complete, row k of A contains the pivot, and the active part of matrix A is modified by a rank-1 update.]
