(Jean-Yves.L.Excellent@ens-lyon.fr, Bora.Ucar@ens-lyon.fr)
Fridays 10h15-12h15
1/ 94
Motivations
I Applications with an ever-growing need for computing power:
I modeling,
I simulation (rather than experimentation),
I numerical optimization
I Typically:
Continuous problem ⇒ Discretization (mesh)
⇒ Numerical solution algorithm (based on the physical laws)
⇒ Matrix problem (Ax = b, . . .)
I Needs:
I Increasingly accurate models
I Increasingly complex problems
I Applications with critical response-time requirements
I Minimization of computation costs
[Figure: sparsity patterns of a 1400 × 1400 matrix, before and after preprocessing; nz = 18427 in both]
3/ 94
A few examples in the field of scientific computing
4/ 94
A few examples in the field of scientific computing
I Cost constraints: wind tunnels, crash simulation, . . .
5/ 94
Scale Constraints
6/ 94
Contents of the course
[Figure: sparse matrix ↔ graph correspondence]
I Iterative methods
7/ 94
Tentative outline
A. INTRODUCTION
I. Sparse matrices
II. Graph theory and algorithms
III. Linear algebra basics
------------------------------------------------------
B. SPARSE GAUSSIAN ELIMINATION
IV. Elimination tree and structure prediction
V. Fill-reducing ordering methods
VI. Matching in bipartite graphs
VII. Factorization: Methods
VIII. Factorization: Parallelization aspects
------------------------------------------------------
C. SOME OTHER ESSENTIAL SPARSE MATRIX ALGORITHMS
IX. Graph and hypergraph partitioning
X. Iterative methods
------------------------------------------------------
D. CLOSING
XI. Current research activities
XII. Presentations
8/ 94
Tentative organization of the course
9/ 94
Outline
10/ 94
A selection of references
I Books
I Duff, Erisman and Reid, Direct methods for Sparse Matrices,
Clarendon Press, Oxford 1986.
I Dongarra, Duff, Sorensen and van der Vorst, Solving
Linear Systems on Vector and Shared Memory Computers,
SIAM, 1991.
I Davis, Direct methods for sparse linear systems, SIAM, 2006.
I Saad, Iterative methods for sparse linear systems, 2nd edition,
SIAM, 2004.
I Articles
I Gilbert and Liu, Elimination structures for unsymmetric sparse
LU factors, SIMAX, 1993.
I Liu, The role of elimination trees in sparse factorization,
SIMAX, 1990.
I Heath, Ng and Peyton, Parallel Algorithms for
Sparse Linear Systems, SIAM Review, 1991.
11/ 94
Introduction to Sparse Matrix Computations
Motivation and main issues
Sparse matrices
Gaussian elimination
Parallel and high performance computing
Numerical simulation and sparse matrices
Direct vs iterative methods
Conclusion
12/ 94
Motivations
I solution of linear systems of equations → key algorithmic
kernel
Continuous problem
↓
Discretization
↓
Solution of a linear system Ax = b
I Main parameters:
I Numerical properties of the linear system (symmetry, pos.
definite, conditioning, . . . )
I Size and structure:
I Large (> 1000000 × 1000000 ?), square/rectangular
I Dense or sparse (structured / unstructured)
I Target computer (sequential/parallel/multicore)
→ Algorithmic choices are critical
13/ 94
Motivations for designing efficient algorithms
I Time-critical applications
I Solve larger problems
I Decrease elapsed time (parallelism ?)
I Minimize cost of computations (time, memory)
14/ 94
Difficulties
I Access to data:
I Computer: complex memory hierarchy (registers, multilevel
cache, main memory (shared or distributed), disk)
I Sparse matrix: large irregular dynamic data structures.
→ Exploit the locality of references to data on the computer
(design algorithms providing such locality)
I Efficiency (time and memory)
I Number of operations and memory depend very much on the
algorithm used and on the numerical and structural properties
of the problem.
I The algorithm depends on the target computer (vector, scalar,
shared, distributed, clusters of Symmetric Multi-Processors
(SMP), multicore).
→ Algorithmic choices are critical
15/ 94
Introduction to Sparse Matrix Computations
Motivation and main issues
Sparse matrices
Gaussian elimination
Parallel and high performance computing
Numerical simulation and sparse matrices
Direct vs iterative methods
Conclusion
16/ 94
Sparse matrices
Example:
3 x1 + 2 x2        = 5
       2 x2 − 5 x3 = 1
2 x1        + 3 x3 = 0
can be represented as
Ax = b,

          ( 3  2  0 )        ( x1 )            ( 5 )
where A = ( 0  2 −5 ),   x = ( x2 ),   and b = ( 1 )
          ( 2  0  3 )        ( x3 )            ( 0 )
17/ 94
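The small system above can be solved by the Gaussian elimination discussed later in the course. A minimal pure-Python sketch (the helper name `solve` is ours, not from the course material):

```python
# Dense Gaussian elimination with partial pivoting, applied to the
# 3x3 example above. Illustrative only; a real code would call LAPACK.

def solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on a copy of A augmented with b.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        # Partial pivoting: bring the largest |entry| of column k to row k.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

A = [[3.0, 2.0, 0.0], [0.0, 2.0, -5.0], [2.0, 0.0, 3.0]]
b = [5.0, 1.0, 0.0]
print(solve(A, b))  # x ≈ (-12, 20.5, 8)
```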
Sparse matrix ?
[Figure: spy plot of a 500 × 500 sparse matrix]
18/ 94
Sparse matrix ?
[Figure: spy plot of a 7000 × 7000 sparse matrix]
“Saddle-point” problem
19/ 94
Preprocessing sparse matrices
[Figure: sparsity patterns of a 1400 × 1400 matrix, before and after preprocessing; nz = 18427 in both]
20/ 94
Factorization process
Solution of Ax = b
I A is unsymmetric:
I A is factorized as A = LU, where
L is a lower triangular matrix, and
U is an upper triangular matrix.
I Forward-backward substitution: Ly = b, then Ux = y
I A is symmetric:
I A = LDL^T or LL^T
I A is rectangular m × n with m ≥ n and min_x ||Ax − b||_2:
I A = QR, where Q is orthogonal (Q^(−1) = Q^T) and R is
triangular.
I Solve: y = Q^T b, then Rx = y
21/ 94
Difficulties
22/ 94
Key numbers:
1- Small sizes: 500 MB matrix;
Factors = 5 GB; Flops = 100 Gflops;
2- Example of a 2D problem: Lab. Géosciences Azur, Valbonne
I Complex 2D finite-difference matrix, n = 16 × 10^6, 150 × 10^6
nonzeros
I Storage (single precision): 2 GB (12 GB with the factors)
I Flops: 10 TeraFlops
3- Example of 3D problem: EDF (Code Aster, structural
engineering)
I real matrix finite elements n = 106 , nz = 71 × 106 nonzeros
I Storage: 3.5 × 109 entries (28 GB) for factors, 35 GB total
I Flops: 2.1 × 1013
4- Typical performance (MUMPS):
I PC LINUX 1 core (P4, 2GHz) : 1.0 GFlops/s
I Cray T3E (512 procs) : Speed-up ≈ 170, Perf. 71 GFlops/s
I AMD Opteron 8431, 24 cores@2.4 GHz: 50 GFlops/s (1 core:
7 GFlop/s)
23/ 94
Typical test problems:
24/ 94
Typical test problems:
BMW crankshaft (MSC.Software),
148,770 unknowns,
5,396,386 nonzeros,
Size of factors: 97.2 million entries,
Number of operations: 127.9 × 10^9
25/ 94
Sources of parallelism
26/ 94
Data structure for sparse matrices
27/ 94
Data formats for a general sparse matrix A
28/ 94
Classical Data Formats for Assembled Matrices
I Example of a 3×3 matrix with NNZ = 5 nonzeros

      1    2    3
 1  ( a11           )
 2  (      a22  a23 )
 3  ( a31       a33 )

I Coordinate format
IRN [1 : NNZ] = 1 3 2 2 3
JCN [1 : NNZ] = 1 1 2 3 3
VAL [1 : NNZ] = a11 a31 a22 a23 a33
I Compressed Sparse Column (CSC) format
IRN [1 : NNZ] = 1 3 2 2 3
VAL [1 : NNZ] = a11 a31 a22 a23 a33
COLPTR [1 : N + 1] = 1 3 4 6
Column J is stored in IRN/VAL locations COLPTR(J) . . . COLPTR(J+1)−1
I Compressed Sparse Row (CSR) format:
Similar to CSC, but row by row
I Diagonal format (M = N):
NDIAG = 3
29/ 94
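The CSC arrays above can be built from the coordinate format with a counting pass followed by a scatter pass. A pure-Python sketch keeping the slide's 1-based arrays (the function name `coo_to_csc` is ours):

```python
# Illustrative COO -> CSC conversion for the 3x3 example matrix above,
# using 1-based indices as in the slide's IRN/JCN/VAL/COLPTR arrays.

def coo_to_csc(n, irn, jcn, val):
    """Convert 1-based coordinate format to 1-based CSC."""
    # Count entries per column, then prefix-sum into COLPTR.
    counts = [0] * (n + 1)
    for j in jcn:
        counts[j] += 1
    colptr = [1] * (n + 1)
    for j in range(1, n + 1):
        colptr[j] = colptr[j - 1] + counts[j]
    # Scatter each entry into its column segment.
    nxt = colptr[:-1]          # next free slot in each column
    irn_csc = [0] * len(val)
    val_csc = [0] * len(val)
    for i, j, v in zip(irn, jcn, val):
        k = nxt[j - 1]
        irn_csc[k - 1] = i
        val_csc[k - 1] = v
        nxt[j - 1] = k + 1
    return irn_csc, val_csc, colptr

IRN = [1, 3, 2, 2, 3]
JCN = [1, 1, 2, 3, 3]
VAL = ["a11", "a31", "a22", "a23", "a33"]
irn, val, colptr = coo_to_csc(3, IRN, JCN, VAL)
print(colptr)  # [1, 3, 4, 6], as on the slide
```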
Sparse Matrix-vector products Y ← AX
Algorithm depends on sparse matrix format:
I Coordinate format:
  Y(1:M) = 0
  DO k = 1, NNZ
     Y(IRN(k)) = Y(IRN(k)) + VAL(k) * X(JCN(k))
  ENDDO
I CSC format:
  Y(1:M) = 0
  DO J = 1, N
     Xj = X(J)
     DO k = COLPTR(J), COLPTR(J+1)-1
        Y(IRN(k)) = Y(IRN(k)) + VAL(k) * Xj
     ENDDO
  ENDDO
I CSR format:
  DO I = 1, M
     Yi = 0
     DO k = ROWPTR(I), ROWPTR(I+1)-1
        Yi = Yi + VAL(k) * X(JCN(k))
     ENDDO
     Y(I) = Yi
  ENDDO
30/ 94
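The three Fortran kernels above translate directly to Python (0-based indices here, unlike the slide's 1-based arrays). A sketch for illustration, with our own function names and made-up numerical values:

```python
# Sparse matrix-vector product y <- Ax in the three assembled formats.

def spmv_coo(m, irn, jcn, val, x):
    y = [0.0] * m
    for i, j, v in zip(irn, jcn, val):
        y[i] += v * x[j]
    return y

def spmv_csc(m, n, irn, val, colptr, x):
    y = [0.0] * m
    for j in range(n):
        xj = x[j]
        for k in range(colptr[j], colptr[j + 1]):
            y[irn[k]] += val[k] * xj
    return y

def spmv_csr(m, rowptr, jcn, val, x):
    y = [0.0] * m
    for i in range(m):
        yi = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            yi += val[k] * x[jcn[k]]
        y[i] = yi
    return y

# The 3x3 example pattern with made-up values a11=1, a31=2, a22=3, a23=4, a33=5.
x = [1.0, 1.0, 1.0]
y = spmv_coo(3, [0, 2, 1, 1, 2], [0, 0, 1, 2, 2], [1.0, 2.0, 3.0, 4.0, 5.0], x)
print(y)  # row sums: [1.0, 7.0, 7.0]
```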
Jagged diagonal storage (JDS)

      1    2    3
 1  ( a11           )
 2  (      a22  a23 )
 3  ( a31       a33 )

31/ 94
Example of elemental matrix format

    ( −1  2  3  0  0 )
    (  2  1  1  0  0 )
A = (  1  1  3 −1  3 ) = A1 + A2
    (  0  0  1  2 −1 )
    (  0  0  3  2  1 )

     1 ( −1  2  3 )        3 ( 2 −1  3 )
A1 = 2 (  2  1  1 ),  A2 = 4 ( 1  2 −1 )
     3 (  1  1  1 )        5 ( 3  2  1 )

I N = 5,  NELT = 2,  NVAR = 6,  A = Σ_{i=1}^{NELT} Ai
I ELTPTR [1:NELT+1] = 1 4 7
ELTVAR [1:NVAR] = 1 2 3 3 4 5
ELTVAL [1:NVAL] = -1 2 1 2 1 1 3 1 1 2 1 3 -1 2 2 3 -1 1
I Remarks:
I NVAR = ELTPTR(NELT+1) − 1
I NVAL = Σ Si² (unsym) or Σ Si(Si+1)/2 (sym), where
Si = ELTPTR(i+1) − ELTPTR(i)
I storage of elements in ELTVAL: by columns
32/ 94
File storage: Rutherford-Boeing
33/ 94
File storage: Rutherford-Boeing
34/ 94
File storage: Matrix-market
I Example
35/ 94
Examples of sparse matrix collections
36/ 94
Introduction to Sparse Matrix Computations
Motivation and main issues
Sparse matrices
Gaussian elimination
Parallel and high performance computing
Numerical simulation and sparse matrices
Direct vs iterative methods
Conclusion
37/ 94
Gaussian elimination
A^(2) x = b^(2)

( a11  a12      a13     ) ( x1 )   ( b1      )
(  0   a22^(2)  a23^(2) ) ( x2 ) = ( b2^(2)  )
(  0   a32^(2)  a33^(2) ) ( x3 )   ( b3^(2)  )

with b2^(2) = b2 − a21 b1 / a11, . . .
     a32^(2) = a32 − a31 a12 / a11, . . .
38/ 94
Relation with A = LU factorization
Eliminating variable k updates a_ij ← a_ij − (a_ik a_kj)/a_kk for i, j > k:
if a_ik ≠ 0 and a_kj ≠ 0 but a_ij = 0, the update creates a new nonzero
entry a_ij, called fill-in.
40/ 94
I Idem for Cholesky:
I For i > k, compute l_ik = a_ik / √a_kk (= a'_ik),
I For i > k, j > k, j ≤ i (lower triangular part):

    a'_ij = a_ij − (a_ik × a_jk) / a_kk

or

    a'_ij = a_ij − l_ik × l_jk

41/ 94
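The update rule above is the core of a right-looking dense Cholesky factorization. A pure-Python sketch (our own names, illustrative only; a real code would call LAPACK's potrf):

```python
import math

# Right-looking Cholesky: at step k, scale column k by 1/sqrt(a_kk),
# then apply a_ij <- a_ij - l_ik * l_jk to the trailing lower triangle.

def cholesky(A):
    """Overwrite the lower triangle of SPD matrix A with L (A = L L^T)."""
    n = len(A)
    for k in range(n):
        A[k][k] = math.sqrt(A[k][k])
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                 # l_ik = a_ik / sqrt(a_kk)
        for j in range(k + 1, n):
            for i in range(j, n):              # lower triangle only (j <= i)
                A[i][j] -= A[i][k] * A[j][k]   # a_ij - l_ik * l_jk
    return A

A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
L = cholesky(A)
# Lower triangle of L: [2], [1, 2], [1, 1, 2]
```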
Example
I Original matrix
x x x x x
x x
x x
x x
x x
I Matrix is full after the first step of elimination
I After reordering the matrix (1st row and column ↔ last row
and column)
42/ 94
x x
x x
x x
x x
x x x x x
I No fill-in
I Ordering the variables has a strong impact on
I the fill-in
I the number of operations
NP-hard problem in general (Yannakakis, 1981)
43/ 94
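The impact of ordering on fill-in can be measured by a symbolic elimination that only tracks nonzero positions. A tiny sketch (our own helper, assuming a symmetric pattern) reproducing the arrow-matrix example above:

```python
# Symbolic elimination counting fill-in for a symmetric sparsity pattern.

def fill_in(rows):
    """rows: list of adjacency sets (including the diagonal).
    Returns the number of fill entries created by eliminating 0, 1, ..., n-1."""
    n = len(rows)
    adj = [set(r) for r in rows]
    fill = 0
    for k in range(n):
        later = [i for i in adj[k] if i > k]
        for i in later:
            for j in later:
                if i != j and j not in adj[i]:  # new off-diagonal nonzero
                    adj[i].add(j)
                    fill += 1
    return fill

# Arrow matrix, n = 5: first row/column full, plus the diagonal.
arrow_first = [set(range(5))] + [{0, i} for i in range(1, 5)]
# Reordered arrow: last row/column full instead.
arrow_last = [{i, 4} for i in range(4)] + [set(range(5))]
print(fill_in(arrow_first), fill_in(arrow_last))  # 12 0
```

Eliminating the full row first turns the whole matrix dense (12 fill entries), while the reversed ordering produces no fill at all, as in the slides.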
Illustration: Reverse Cuthill-McKee on matrix dwt_592.rua
Harwell-Boeing matrix dwt_592.rua, structural computing on a
submarine. NZ(LU factors) = 58202
[Figure: pattern of the matrix (nz = 5104) and of its LU factors (nz = 58202)]
44/ 94
Illustration: Reverse Cuthill-McKee on matrix dwt_592.rua
NZ(LU factors) = 16924
[Figure: pattern of the reordered matrix (nz = 5104) and of its LU factors (nz = 16924)]
44/ 94
Table: Benefits of sparsity on a matrix of order 2021 × 2021 with 7353
nonzeros (Dongarra et al., 1991).

Procedure                      Total storage   Flops         Time (sec.) on CRAY J90
Full system                    4084 Kwords     5503 × 10^6   34.5
Sparse system                    71 Kwords     1073 × 10^6    3.4
Sparse system and reordering     14 Kwords       42 × 10^3    0.9
45/ 94
Control of numerical stability: numerical pivoting
47/ 94
Threshold pivoting and numerical accuracy
47/ 94
Three-phase scheme to solve Ax = b
1. Analysis step
I Preprocessing of A (symmetric/unsymmetric orderings,
scalings)
I Build the dependency graph (elimination tree, eDAG . . . )
2. Factorization (A = LU, LDLT , LLT , QR)
Numerical pivoting
3. Solution based on factored matrices
I triangular solves: Ly = b, then Ux = y
I improvement of solution (iterative refinement), error analysis
48/ 94
Efficient implementation of sparse algorithms
49/ 94
Effect of switch to dense calculations
50/ 94
Introduction to Sparse Matrix Computations
Motivation and main issues
Sparse matrices
Gaussian elimination
Parallel and high performance computing
Numerical simulation and sparse matrices
Direct vs iterative methods
Conclusion
51/ 94
Main processor (r)evolutions
52/ 94
Why parallel processing?
53/ 94
Some units for high-performance computing
Speed
1 MFlop/s   1 Megaflop/s   10^6 operations per second
1 GFlop/s   1 Gigaflop/s   10^9 operations per second
1 TFlop/s   1 Teraflop/s   10^12 operations per second
1 PFlop/s   1 Petaflop/s   10^15 operations per second
1 EFlop/s   1 Exaflop/s    10^18 operations per second
Memory
1 kB   1 kilobyte   10^3 bytes
1 MB   1 Megabyte   10^6 bytes
1 GB   1 Gigabyte   10^9 bytes
1 TB   1 Terabyte   10^12 bytes
1 PB   1 Petabyte   10^15 bytes
54/ 94
Performance measures
55/ 94
The ratio (actual performance / peak performance) is often low!!
Let P be a program:
1. Sequential processor:
I 1 scalar unit (1 GFlop/s)
I Execution time of P: 100 s
2. Parallel machine with 100 processors:
I Each processor: 1 GFlop/s
I Peak performance: 100 GFlop/s
3. If P is 10% sequential code + 90% parallelized code:
I Execution time of P: 10 + 0.9 = 10.9 s
I Actual performance: 9.2 GFlop/s
4. Actual performance / peak performance ≈ 0.1
56/ 94
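The numbers above follow from Amdahl's law. A short check (variable names are ours):

```python
# Amdahl's law for the slide's scenario: 100 s of work on one 1 GFlop/s
# processor, 10% sequential, 90% parallelized over p = 100 processors.

def elapsed(total, seq_frac, p):
    """Elapsed time when the parallel fraction is spread over p processors."""
    return total * seq_frac + total * (1.0 - seq_frac) / p

t = elapsed(100.0, 0.1, 100)       # 10 + 0.9 = 10.9 s
real_perf = 100.0 / t              # 100 GFlop of work done in t seconds
print(t, real_perf, real_perf / 100.0)  # 10.9 s, ~9.2 GFlop/s, ~0.1 of peak
```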
Moore’s law
58/ 94
The only solution: parallelism
59/ 94
The problem of data access
60/ 94
Memory bandwidth issues
I Data access is a crucial issue in modern computers
I Processor performance: +60% per year
I DRAM memory: +9% per year
→ The ratio (processor performance / memory access time) grows by
about 50% per year!
MFlop/s are easier to obtain than MB/s of memory bandwidth
I Increasingly complex memory hierarchies (but latency increases)
I The way data is accessed becomes more and more critical:
I Minimize cache misses
I Minimize memory paging
I Locality: improve the ratio of local memory references to remote
memory references
I Reuse, blocking: increase the flops-to-memory-access ratio
I Manage data transfers "by hand"? (Cell, GPU)
61/ 94
[Table: memory hierarchy levels with size and average access time in
cycles on hit/miss; registers: < 1 cycle]
62/ 94
Memory design for a large number of processors?
How can 500 processors access data stored in a shared memory
(technology, interconnect, price)?
→ Solution at reasonable cost: physically distributed memory
(each processor or group of processors has its own local memory)
I 2 solutions:
I globally addressable local memories: virtual shared memory
computers
I explicit data transfers between processors via message passing
I Scalability requires:
I memory bandwidth increasing linearly with processor speed
I communication bandwidth increasing with the number of
processors
I Cost/performance ratio → distributed memory and a good
cost/performance ratio for the processors 63/ 94
Multiprocessor architectures
64/ 94
Terminology
SMP (Symmetric Multi-Processor) architecture
I Shared memory (physically and logically)
I Uniform memory access time
I Similar, from the application's point of view, to multicore
architectures (1 core = 1 logical processor)
I But communications are much faster within a multicore (latency
< 3 ns, bandwidth > 20 GB/s) than within an SMP (latency ≈ 60 ns,
bandwidth ≈ 2 GB/s)
Programming standards
Shared logical organization: POSIX threads, OpenMP directives
Distributed logical organization: PVM, MPI, sockets (message passing)
66/ 94
Evolution of high-performance computing
68/ 94
Numerical simulation and sparse matrices
69/ 94
Partial differential equations
70/ 94
Examples of partial differential equations
71/ 94
Discretization (the step following physical modeling)
The numerical analyst's work:
I Construction of a mesh (regular, irregular)
I Choice of solution methods and study of their behavior
I Study of the loss of information due to the passage to finite
dimension
72/ 94
Discretization with finite differences (1D)
I Basic approximation (ok if h is small enough):

    du/dx (x) ≈ (u(x + h) − u(x)) / h

I Results from Taylor's formula:

    u(x + h) = u(x) + h du/dx + (h²/2) d²u/dx² + (h³/6) d³u/dx³ + O(h⁴)

I Replacing h by −h:

    u(x − h) = u(x) − h du/dx + (h²/2) d²u/dx² − (h³/6) d³u/dx³ + O(h⁴)

I Thus:

    d²u/dx² = (u(x + h) − 2u(x) + u(x − h)) / h² + O(h²)

73/ 94
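The O(h²) accuracy of the central second difference can be checked numerically: halving h should divide the error by about 4. A small sketch on u(x) = sin(x), for which u″ is known exactly:

```python
import math

# Central second difference; for u = sin we have u''(x) = -sin(x).
def d2(u, x, h):
    return (u(x + h) - 2.0 * u(x) + u(x - h)) / (h * h)

x = 1.0
exact = -math.sin(x)
e1 = abs(d2(math.sin, x, 1e-2) - exact)   # error at h
e2 = abs(d2(math.sin, x, 5e-3) - exact)   # error at h/2
print(e1 / e2)  # close to 4 => second-order accuracy
```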
Discretization with finite differences (1D)
3-point stencil for d²u/dx²:  ( 1  −2  1 )
74/ 94
Finite Differences for the Laplacian Operator (2D)
Assuming the same mesh refinement h in the x and y directions:

∆u(x, y) ≈ (u(x−h, y) − 2u(x, y) + u(x+h, y)) / h²
         + (u(x, y−h) − 2u(x, y) + u(x, y+h)) / h²

∆u(x, y) ≈ (1/h²) (u(x−h, y) + u(x+h, y) + u(x, y−h) + u(x, y+h) − 4u(x, y))

5-point stencils (standard and skewed):

    .  1  .        1  .  1
    1 −4  1        .  −4 .
    .  1  .        1  .  1

The 1D analogue on 6 interior points gives:

       (  2 −1  0  0  0  0 ) ( u1 )   ( f1 )
       ( −1  2 −1  0  0  0 ) ( u2 )   ( f2 )
(1/h²) (  0 −1  2 −1  0  0 ) ( u3 ) = ( f3 )
       (  0  0 −1  2 −1  0 ) ( u4 )   ( f4 )
       (  0  0  0 −1  2 −1 ) ( u5 )   ( f5 )
       (  0  0  0  0 −1  2 ) ( u6 )   ( f6 )

77/ 94
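The system above is tridiagonal, so it can be solved in O(n) with the Thomas algorithm. A sketch (function name is ours), checked against the exact solution u(x) = x(1−x)/2 of −u″ = 1 with u(0) = u(1) = 0, for which the 3-point scheme is exact:

```python
# Build and solve (1/h^2) tridiag(-1, 2, -1) u = f with the Thomas algorithm.

def solve_poisson_1d(n, f):
    """Solve the 1D Poisson system on n interior points, h = 1/(n+1)."""
    h2 = (1.0 / (n + 1)) ** 2
    a = [-1.0 / h2] * n      # sub-diagonal
    b = [2.0 / h2] * n       # diagonal
    c = [-1.0 / h2] * n      # super-diagonal
    f = f[:]                 # keep the caller's right-hand side intact
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        f[i] -= w * f[i - 1]
    # Back substitution.
    u = [0.0] * n
    u[-1] = f[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        u[i] = (f[i] - c[i] * u[i + 1]) / b[i]
    return u

n = 9
u = solve_poisson_1d(n, [1.0] * n)   # -u'' = 1
x5 = 5 / (n + 1)                      # interior point x = 0.5
print(u[4], x5 * (1 - x5) / 2)        # both ≈ 0.125
```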
Slightly more complicated (2D)

−∂(a(x, y) ∂u/∂x)/∂x − ∂(b(x, y) ∂u/∂y)/∂y + c(x, y) u = g(x, y) on Ω
u(x, y) = 0 on ∂Ω
0 ≤ x, y ≤ 1
a(x, y) > 0
b(x, y) > 0
c(x, y) ≥ 0
78/ 94
I Case of a regular 2D mesh:
[Figure: n × n grid of unknowns on the unit square, numbered 1, 2, 3, 4, 5, . . .
row by row]
discretization step: h = 1/(n+1), n = 4
I 5-point finite difference scheme:
Ax = b,
I where
x1 ↔ u_{1,1} = u(1/(n+1), 1/(n+1))
x2 ↔ u_{2,1} = u(2/(n+1), 1/(n+1))
x3 ↔ u_{3,1}
x4 ↔ u_{4,1}
x5 ↔ u_{1,2}, . . .
I and A is n² by n², b is of size n², with the following structure:
80/ 94
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
|x x x | 1 |g11|
|x x x x | 2 |g21|
| x x x x | 3 |g31|
| x x 0 x | 4 |g41|
|x 0 x x x | 5 |g12|
| x x x x x | 6 |g22|
| x x x x x | 7 |g32|
A=| x x x 0 x | 8 b=|g42|
| x 0 x x x | 9 |g13|
| x x x x x |10 |g23|
| x x x x x |11 |g33|
| x x x 0 x |12 |g43|
| x 0 x x |13 |g14|
| x x x x |14 |g24|
| x x x x |15 |g34|
| x x x |16 |g44|
81/ 94
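The 16 × 16 pattern above can be generated programmatically: each unknown couples with itself and its grid neighbors, and there is no coupling across the end of a grid row (the explicit 0 entries). A sketch for the n = 4 grid in row-major ordering (helper name is ours):

```python
# Sparsity pattern of the 5-point stencil on an n x n grid, natural ordering.

def five_point_pattern(n):
    """Return the set of (row, col) nonzero positions, 0-based."""
    nz = set()
    for i in range(n):
        for j in range(n):
            k = i * n + j                 # unknown index, row-major
            nz.add((k, k))                # diagonal
            if j > 0:     nz.add((k, k - 1))   # left neighbor
            if j < n - 1: nz.add((k, k + 1))   # right neighbor
            if i > 0:     nz.add((k, k - n))   # neighbor below
            if i < n - 1: nz.add((k, k + n))   # neighbor above
    return nz

nz = five_point_pattern(4)
print(len(nz))  # 64 nonzeros for n = 4
```

Note that (3, 4) is absent from the pattern: unknowns 4 and 5 (1-based) are the end of one grid row and the start of the next, matching the zeros shown in the picture.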
Solution of the linear system
I Direct methods:
I L U factorization followed by triangular substitutions
I parallelism depends highly on the structure of the matrix
I Iterative methods:
I usually rely on sparse matrix-vector products
I algebraic preconditioner useful
82/ 94
Evolution in time of a complex phenomenon
I Examples:
I climate modeling, evolution of radioactive waste, . . .
I heat equation:

    ∆u(x, y, z, t) = ∂u(x, y, z, t)/∂t
    u(x, y, z, t0) = u0(x, y, z)

I Discretization in both space and time (1D case):
I Explicit approaches:

    (u_j^{n+1} − u_j^n) / (t_{n+1} − t_n) = (u_{j+1}^n − 2 u_j^n + u_{j−1}^n) / h²

I Implicit approaches:

    (u_j^{n+1} − u_j^n) / (t_{n+1} − t_n) = (u_{j+1}^{n+1} − 2 u_j^{n+1} + u_{j−1}^{n+1}) / h²

I Implicit approaches are preferred (more stable, larger timestep
possible) but are more numerically intensive: a sparse linear
system must be solved at each iteration.
83/ 94
Discretization with finite elements
Consider ∆u = f on Ω, with u = 0 on ∂Ω.
I we can show (using Green's formula) that the previous
problem is equivalent to:

    a(u, v) = − ∫_Ω f v dx dy    ∀v such that v = 0 on ∂Ω

    where a(u, v) = ∫_Ω (∂u/∂x ∂v/∂x + ∂u/∂y ∂v/∂y) dx dy
84/ 94
Finite element scheme: 1D Poisson Equation
I ∆u = ∂²u/∂x² = f,  u = 0 on ∂Ω
I Equivalent to:

    a(u, v) = g(v) for all v (v|∂Ω = 0)

    where a(u, v) = ∫_Ω ∂u/∂x ∂v/∂x  and  g(v) = − ∫_Ω f(x) v(x) dx

(1D: similar to integration by parts)
I Idea: we search for u of the form u = Σ_k α_k Φ_k(x),
with (Φ_k)_{k=1,n} a basis of functions such that Φ_k is linear on
each E_i, and Φ_k(x_i) = δ_ik (= 1 if k = i, 0 otherwise).
[Figure: hat functions Φ_{k−1}, Φ_k, Φ_{k+1} on Ω; Φ_k is supported on the
elements E_k and E_{k+1} surrounding node x_k]
85/ 94
Finite Element Scheme: 1D Poisson Equation
[Figure: hat functions Φ_{k−1}, Φ_k, Φ_{k+1} around node x_k, elements E_k, E_{k+1}]
I We rewrite a(u, v) = g(v) for all Φ_k:

    a(u, Φ_k) = g(Φ_k) for all k  ⇔  Σ_i α_i a(Φ_i, Φ_k) = g(Φ_k)

    a(Φ_i, Φ_k) = ∫_Ω ∂Φ_i/∂x ∂Φ_k/∂x = 0 when |i − k| ≥ 2

I k-th equation, associated with Φ_k:

    α_{k−1} a(Φ_{k−1}, Φ_k) + α_k a(Φ_k, Φ_k) + α_{k+1} a(Φ_{k+1}, Φ_k) = g(Φ_k)

I   a(Φ_{k−1}, Φ_k) = ∫_{E_k} ∂Φ_{k−1}/∂x ∂Φ_k/∂x
    a(Φ_{k+1}, Φ_k) = ∫_{E_{k+1}} ∂Φ_{k+1}/∂x ∂Φ_k/∂x
    a(Φ_k, Φ_k) = ∫_{E_k} ∂Φ_k/∂x ∂Φ_k/∂x + ∫_{E_{k+1}} ∂Φ_k/∂x ∂Φ_k/∂x
86/ 94
Finite Element Scheme: 1D Poisson Equation
From the point of view of E_k, we have a 2×2 contribution matrix
(writing Φ' = ∂Φ/∂x):

( ∫_{E_k} Φ'_{k−1} Φ'_{k−1}   ∫_{E_k} Φ'_{k−1} Φ'_k )   ( I_{E_k}(Φ_{k−1}, Φ_{k−1})   I_{E_k}(Φ_{k−1}, Φ_k) )
( ∫_{E_k} Φ'_k Φ'_{k−1}       ∫_{E_k} Φ'_k Φ'_k     ) = ( I_{E_k}(Φ_k, Φ_{k−1})       I_{E_k}(Φ_k, Φ_k)     )

[Figure: mesh of Ω with nodes 0, 1, 2, 3, 4, elements E_1, E_2, E_3, E_4, and
hat functions Φ_1, Φ_2, Φ_3]

The assembled system is:

( I_{E_1}(Φ_1,Φ_1) + I_{E_2}(Φ_1,Φ_1)   I_{E_2}(Φ_1,Φ_2)                        0                                    )
( I_{E_2}(Φ_2,Φ_1)                      I_{E_2}(Φ_2,Φ_2) + I_{E_3}(Φ_2,Φ_2)     I_{E_3}(Φ_2,Φ_3)                     )
( 0                                     I_{E_3}(Φ_3,Φ_2)                        I_{E_3}(Φ_3,Φ_3) + I_{E_4}(Φ_3,Φ_3)  )

    × (α_1, α_2, α_3)^T = (g(Φ_1), g(Φ_2), g(Φ_3))^T

87/ 94
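The assembly process above can be sketched for a uniform 1D mesh: with hat functions, each element contributes (1/h) × [[1, −1], [−1, 1]] to the two nodes it touches, and the assembled matrix is tridiag(−1, 2, −1)/h. A pure-Python sketch (names are ours):

```python
# Assemble the 1D stiffness matrix from per-element 2x2 contributions,
# on a uniform mesh of n+1 elements (h = 1/(n+1)), Dirichlet boundaries.

def assemble_1d_stiffness(n):
    h = 1.0 / (n + 1)
    K = [[0.0] * n for _ in range(n)]
    # Elements E_1 .. E_{n+1}; element E_e spans global nodes e-1 and e
    # (nodes 0 and n+1 are the boundary and are eliminated).
    for e in range(1, n + 2):
        left, right = e - 1, e
        for a, i in ((0, left), (1, right)):
            for b, j in ((0, left), (1, right)):
                if 1 <= i <= n and 1 <= j <= n:       # skip boundary nodes
                    K[i - 1][j - 1] += (1.0 if a == b else -1.0) / h
    return K

K = assemble_1d_stiffness(3)   # h = 1/4
print(K)  # [[8, -4, 0], [-4, 8, -4], [0, -4, 8]]
```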
Finite Element Scheme in Higher Dimension
88/ 94
Finite Element Scheme in Higher Dimension
For an element T with nodes i, j, k, the elemental contribution matrix is:

         ( a^T_{i,i}  a^T_{i,j}  a^T_{i,k} )
  C(T) = ( a^T_{j,i}  a^T_{j,j}  a^T_{j,k} )
         ( a^T_{k,i}  a^T_{k,j}  a^T_{k,k} )

88/ 94
Other example: linear least squares
89/ 94
Introduction to Sparse Matrix Computations
Motivation and main issues
Sparse matrices
Gaussian elimination
Parallel and high performance computing
Numerical simulation and sparse matrices
Direct vs iterative methods
Conclusion
90/ 94
Solution of sparse linear systems Ax = b (direct or iterative approaches?)
92/ 94
Summary – sparse matrices
93/ 94
Suggested home reading
94/ 94