Types of Simulation
Adapted from an initial set of slides for course CS267, UC Berkeley.
Thanks to Jim Demmel and Kathy Yelick.
Feb-15-08
Outline
Simulation models
Discrete event systems
Particle systems
Systems of Ordinary Differential Equations (ODEs)
Partial Differential Equations (PDEs)
Case studies in the lab course
Simulation Models
Aspects of communication patterns
Local vs. Global
Structured vs. Unstructured
Static vs. Dynamic
Synchronous vs. Asynchronous
Simulation of Discrete Event Systems
State after the next event is determined completely by the present state
Explicit solution method
Impact of a single event on the system? Local/global
Parallelization strategy:
Allocate objects of the system to processes
Graph partitioning
Any live cell with fewer than two live neighbors dies, as if by loneliness.
Any live cell with more than three live neighbors dies, as if by overcrowding.
Any live cell with two or three live neighbors lives, unchanged, to the next generation.
Any dead cell with exactly three live neighbors comes to life.
[wiki game of life]
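The four rules above can be sketched as a small update function; this is a minimal illustrative version (dead boundary cells, names of my choosing, not from the slides):

```python
# A minimal sketch of one Game of Life generation on a small 2D grid.
# Grid is a list of lists of 0/1; cells beyond the edge are treated as dead.

def step(grid):
    """Apply the four rules to every cell, returning the next generation."""
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live neighbors among the up to 8 surrounding cells.
            live = sum(grid[rr][cc]
                       for rr in range(max(0, r - 1), min(rows, r + 2))
                       for cc in range(max(0, c - 1), min(cols, c + 2))
                       if (rr, cc) != (r, c))
            if grid[r][c] == 1:
                nxt[r][c] = 1 if live in (2, 3) else 0  # survival vs. death
            else:
                nxt[r][c] = 1 if live == 3 else 0       # birth
    return nxt

# A "blinker": three live cells in a row oscillate between horizontal and vertical.
blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
```

Each cell's next state depends only on its 8 neighbors, which is exactly the local communication pattern exploited in the parallelization.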
Repeat
    compute locally to update local system
    exchange state info with neighbors
until done
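The repeat/compute/exchange loop can be sketched serially. Here two hypothetical subdomains of a 1D smoothing problem each keep one ghost value and swap boundary state every sweep (all names are illustrative, not from the slides):

```python
# A serial sketch of the repeat/compute/exchange pattern: two "processes"
# each own half of a 1D array; each sweep they exchange boundary values,
# then update locally by averaging with neighbors.

def smooth_step(own, left_ghost, right_ghost):
    """Local update: replace each cell by the average of itself and its neighbors."""
    padded = [left_ghost] + own + [right_ghost]
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3.0
            for i in range(1, len(padded) - 1)]

# Two subdomains of a 6-cell system; outer domain boundaries held at 0.
a = [0.0, 3.0, 0.0]
b = [0.0, 3.0, 0.0]
for _ in range(2):                  # "repeat ... until done" (two sweeps here)
    ga, gb = b[0], a[-1]            # exchange state info with the neighbor
    a = smooth_step(a, 0.0, ga)     # compute locally to update local system
    b = smooth_step(b, gb, 0.0)
```

In a real parallel code the ghost exchange would be a message between neighboring processes; the loop structure is the same.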
Parallelism in Synchronous Circuit Simulation
(figure: partitioned circuit graph; edge crossings = 10)
Parallelism in Asynchronous Simulation
Particle Systems
Classification
Different types of forces require different parallelization strategies:
External forces: no communication, easy
Short-range forces: local communication, medium
Long-range forces: global communication, difficult
Example: collisions
The simplest algorithm is O(n²): look at all pairs to see if they collide.
Need to check for collisions between regions.
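The all-pairs check can be sketched in a few lines; this is a minimal illustrative version for circular particles of equal radius (names are my own, not from the slides):

```python
# A minimal sketch of the O(n^2) all-pairs collision check: two particles
# collide when their centers are closer than the sum of their radii.

import math

def colliding_pairs(particles, radius):
    """Return index pairs (i, j), i < j, whose distance is below 2*radius."""
    hits = []
    n = len(particles)
    for i in range(n):              # every pair examined once: n*(n-1)/2 tests
        for j in range(i + 1, n):
            (x1, y1), (x2, y2) = particles[i], particles[j]
            if math.hypot(x1 - x2, y1 - y2) < 2 * radius:
                hits.append((i, j))
    return hits

pts = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
```

The quadratic cost of this double loop is what motivates the spatial decompositions and tree-based approximations on the following slides.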
Based on approximation
O(n log n) or O(n) instead of O(n²)
Forces from a group of far-away particles are simplified
Barnes-Hut
Fast Multipole Method (FMM) of Greengard/Rokhlin
Anderson
Irregular domain decomposition
Dynamic load balancing problem
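The core approximation behind Barnes-Hut and FMM can be illustrated in one dimension: the force from a far-away group is replaced by a single term using the group's total mass at its center of mass. A minimal sketch (1D gravity-like m/d² force, G = 1, all values illustrative):

```python
# Sketch of the far-field approximation idea: a distant particle group
# acts approximately like one point mass at its center of mass.

def exact_force(x, group):
    """Sum of m / d^2 contributions from each (position, mass) in the group."""
    return sum(m / (x - p) ** 2 for p, m in group)

def approx_force(x, group):
    """One m_total / d^2 term using the group's center of mass."""
    m_total = sum(m for _, m in group)
    com = sum(p * m for p, m in group) / m_total
    return m_total / (x - com) ** 2

# A tight cluster evaluated from far away (x = 100): the two results
# agree closely, but the approximation costs O(1) instead of O(group size).
cluster = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
```

Barnes-Hut applies this recursively via a spatial tree, switching to the exact sum when a group is too close for the approximation to be accurate.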
ODEs
Many systems can be modeled as coupled functions of continuous variables and their derivatives with respect to one independent continuous variable (usually time).
Circuit Example
State of the system is represented by node voltages and branch currents, all at time t.
Equations include Kirchhoff's laws relating the node voltages Vi and branch currents I.
[wiki thevenin]
[wiki kirchhoff]
m · d²y/dt² = −k · y
Solving ODEs
Usually ODE system matrices are sparse, i.e., most array elements are 0.
Notation:
x(t): continuous-time representation
x[i]: discrete computer representation
Explicit methods, e.g., (Forward) Euler's method.
Approximate dx/dt = f(x) = a · x by
x[i+1] = x[i] + Δt · slope
using the slope at x[i].
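Forward Euler can be sketched directly for dx/dt = a·x; this minimal version (names illustrative) compares two step sizes against the exact solution x(t) = x0·exp(a·t) to show the error shrinking with Δt:

```python
# A sketch of Forward Euler for dx/dt = a*x: step forward using the slope at x[i].

import math

def forward_euler(a, x0, dt, steps):
    x = x0
    for _ in range(steps):
        x = x + dt * (a * x)      # x[i+1] = x[i] + dt * slope at x[i]
    return x

# Integrate dx/dt = -x from x(0) = 1 to t = 1 with a shrinking step size.
coarse = forward_euler(-1.0, 1.0, 0.1, 10)
fine = forward_euler(-1.0, 1.0, 0.01, 100)
exact = math.exp(-1.0)            # exact solution used only to gauge the error
```

Halving Δt roughly halves the error: Forward Euler is first-order accurate.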
d²x/dt² = f(dx/dt, x) = b · dx/dt + a · x
Setting x1 = x and x2 = dx/dt gives
dx2/dt = b · x2 + a · x1
dx1/dt = x2
which are two coupled first-order equations.
In matrix form, dx/dt = A · x, where in the example
A = [ 0  1 ]
    [ a  b ]
Solving ODEs
Partitioning Questions
(figure: sparse matrix rows distributed over processes P0–P3; row i stored as index/value pairs [j1,v1], [j2,v2], …; the partitioning is the most problematic part)
Example
(figure: sparse matrix–vector product y = A · x; "x" marks the nonzero entries of A)
Matrix Reordering
(figure: block-partitioned matrix distributed over processes P0–P4)
(figure: example of reordering a 4×4 sparse matrix; permuting rows and columns changes the nonzero pattern)
Goals of Reordering
Performance goals
balance load
balance storage
minimize communication
(figure: graph of the matrix with its nodes grouped into partitions)
Can reorder the rows/columns of the matrix by putting all the nodes in one partition together.
Iterative solvers
Will discuss several of these in the future:
Jacobi, Successive Over-Relaxation (SOR), Conjugate Gradient (CG), Multigrid, ...
Most have sparse matrix-vector multiplication in the kernel.
Eigenproblems
Also depend on sparse matrix-vector multiplication and direct methods.
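Jacobi, the simplest of these iterative solvers, can be sketched in a few lines. Dense lists are used here only for clarity (a real code would use a sparse format so each sweep is a sparse matrix-vector operation); all names are illustrative:

```python
# A minimal Jacobi sketch: each iteration is one matrix-vector-style sweep,
# updating every unknown from the previous iterate.

def jacobi(A, b, iters=50):
    """Solve A x = b iteratively: x_i <- (b_i - sum_{j != i} A_ij x_j) / A_ii."""
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

# A diagonally dominant system, for which Jacobi is guaranteed to converge.
A = [[4.0, -1.0, 0.0],
     [-1.0, 4.0, -1.0],
     [0.0, -1.0, 4.0]]
b = [2.0, 4.0, 10.0]
```

Note the new iterate is built entirely from the old one, which is what makes Jacobi so easy to parallelize: every row update is independent.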
Communication
Pattern may be arbitrary, depending on the graph.
No spatial discretization, so no natural grid exists.
Terminology
The terms hyperbolic, parabolic, and elliptic come from special cases of the general form of a second-order linear PDE
a · ∂²u/∂x² + b · ∂²u/∂x∂t + c · ∂²u/∂t² + d · ∂u/∂x + e · ∂u/∂t + f = 0
where t is time.
Notation: ∇ = (∂/∂x1, ∂/∂x2, ∂/∂x3)
"v
1 2
=
# v $ v % #v + f x $ #p
"t RE
#%v =0
Diffusion
Zero Divergence
Pressure
External Forces
!
Feb-15-08
49
(figure: 1D grid with points x−h, x, x+h used to approximate ∂²u/∂x²)
Discretize time and space and use the explicit approach (as described for ODEs) to approximate the derivative:
(u(x,t+1) − u(x,t)) / Δt = (u(x−h,t) − 2·u(x,t) + u(x+h,t)) / h²
u(x,t+1) − u(x,t) = (Δt/h²) · (u(x−h,t) − 2·u(x,t) + u(x+h,t))
u(x,t+1) = u(x,t) + (Δt/h²) · (u(x−h,t) − 2·u(x,t) + u(x+h,t))
Let z = Δt/h²:
u(x,t+1) = z·u(x−h,t) + (1−2z)·u(x,t) + z·u(x+h,t)
By changing variables (x to j and t to i):
u[j,i+1] = z·u[j−1,i] + (1−2z)·u[j,i] + z·u[j+1,i]
with z = Δt/h²
This corresponds to a matrix-vector multiply with nearest neighbors on the grid.
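The explicit update u[j,i+1] = z·u[j−1,i] + (1−2z)·u[j,i] + z·u[j+1,i] can be sketched directly; this minimal version (illustrative names, fixed Dirichlet boundary values) steps a heat spike forward in time:

```python
# Sketch of the explicit heat-equation update derived above.
# Stable only for z = dt/h^2 <= 1/2.

def heat_step(u, z):
    """One explicit time step on a 1D array; endpoints are fixed boundaries."""
    return ([u[0]] +
            [z * u[j - 1] + (1 - 2 * z) * u[j] + z * u[j + 1]
             for j in range(1, len(u) - 1)] +
            [u[-1]])

u = [0.0, 0.0, 4.0, 0.0, 0.0]      # initial heat spike in the middle
for _ in range(3):
    u = heat_step(u, 0.25)          # z = 0.25, within the stable range
```

Each step reads only nearest neighbors, which is exactly the tridiagonal matrix-vector multiply described on the next slide.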
(figure: space-time grid of values u[j,i] for t = 0 … 5, starting from u[0,0] … u[5,0])
Stencil Template
Multiplying by a tridiagonal matrix at each step:
L = [ 1-2z    z                  ]
    [   z   1-2z    z            ]
    [         z   1-2z    z      ]
    [                z   1-2z    ]
Generalizes to
multiple dimensions
arbitrary graphs (= sparse matrices)
Implicit methods: Backward Euler solves a system (I + z·L) · u[:, i+1] = u[:, i] at each step, where
L = [  2  -1          ]
    [ -1   2  -1      ]
    [     -1   2  -1  ]
    [         -1   2  ]
Implicit Solution
The previous slide used Backward Euler, but using the trapezoidal rule gives better numerical properties.
This turns into solving the following equation:
(I + (z/2)·L) · u[:, i+1] = (I − (z/2)·L) · u[:, i]
with the same tridiagonal L (2 on the diagonal, −1 off it).
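One trapezoidal-rule (Crank-Nicolson) step can be sketched end-to-end. Dense Gaussian elimination is used here purely for brevity (a real code would exploit the tridiagonal structure); all names are illustrative:

```python
# Sketch of one trapezoidal-rule step: solve
# (I + (z/2) L) u_new = (I - (z/2) L) u_old, with L = tridiag(-1, 2, -1).

def solve(A, b):
    """Gaussian elimination without pivoting (fine for these diagonally dominant systems)."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        for i in range(k + 1, n):
            f = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= f * A[k][j]
            b[i] -= f * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

def trapezoidal_step(u, z):
    """Advance the 1D heat equation one implicit step on interior points."""
    n = len(u)
    L = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0
          for j in range(n)] for i in range(n)]
    A = [[(1.0 if i == j else 0.0) + 0.5 * z * L[i][j] for j in range(n)]
         for i in range(n)]
    rhs = [u[i] - 0.5 * z * sum(L[i][j] * u[j] for j in range(n)) for i in range(n)]
    return solve(A, rhs)

u = trapezoidal_step([0.0, 4.0, 0.0], 0.5)
```

Unlike the explicit scheme, this remains stable for any z, at the cost of a linear solve per step.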
2D Implicit Method
Similar to the 1D case, but the matrix L is now the 2D (5-point) Laplacian: 4 on the diagonal, −1 for each of the up to four grid neighbors.
(figure: block-tridiagonal nonzero structure of the 2D L)
PDE case study:
irregular grid
no time dependence; various algorithms (Red-Black Gauss-Seidel, CG)
static irregular partitioning of the grid
grid generation and adjustment; load imbalance
Particle case study:
no grid
time-stepper (Verlet algorithm, explicit)
particle-in-cell method for efficient parallelization
dynamic repartitioning of particles over processes
Implemented using ghost regions.
Adds memory overhead.
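A ghost region can be sketched in one dimension: each subdomain allocates extra cells mirroring its neighbor's boundary, trading memory for simpler local loops. A minimal illustrative version (names of my choosing):

```python
# Sketch of ghost regions: pad each local block with a copy of the
# neighbor's edge cell, so local stencil loops need no special cases.

def with_ghosts(own, left_neighbor_edge, right_neighbor_edge):
    """Pad a local 1D block with one ghost cell per side."""
    return [left_neighbor_edge] + own + [right_neighbor_edge]

# Two subdomains of a 6-cell array; each sees the neighbor's edge as a ghost.
left, right = [1, 2, 3], [4, 5, 6]
padded_left = with_ghosts(left, 0, right[0])     # outer domain edge held at 0
padded_right = with_ghosts(right, left[-1], 0)
```

The memory overhead is the ghost layer itself: one extra cell per side here, and a full extra face per neighbor in 2D or 3D.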
Adaptive Mesh
Summary PDEs
Computation
Sparse matrix-vector operations
Generation/calculation and storage of matrix elements
Communication
Partitioning: mesh or matrix
This leads to computing eigenvalues and eigenvectors:
Seek a solution of d²x/dt² = A·x of the form x(t) = sin(f·t)·x0, where x0 is a constant vector.
Plug in to get −f²·x0 = A·x0, so that −f² is an eigenvalue and x0 is an eigenvector of A.
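The derivation can be checked on a concrete system. For a hypothetical chain of two unit masses coupled by unit springs (illustrative values, not from the slides), the eigenpairs of A give the vibration frequencies directly:

```python
# Check the derivation: x(t) = sin(f*t) * x0 solves d^2x/dt^2 = A x
# exactly when A x0 = -f^2 x0.

import math

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

A = [[-2.0, 1.0],
     [1.0, -2.0]]          # stiffness matrix of two coupled unit masses

# Mode 1: both masses swing together; eigenvalue -1, so frequency f = 1.
x0 = [1.0, 1.0]
f = 1.0
assert matvec(A, x0) == [-f**2 * v for v in x0]

# Mode 2: masses swing in opposition; eigenvalue -3, so f = sqrt(3).
x1 = [1.0, -1.0]
g = math.sqrt(3.0)
```

Each negative eigenvalue −f² of A corresponds to one vibration mode with frequency f.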
Solution schemes reduce either to sparse-matrix multiplication, or to solving sparse linear systems.
Scheduling Asynchronous Circuit Simulation
Optimization strategy when a process is ready to go to the next event, but is not sure whether another event that must be processed earlier may yet arrive:
Conservative: wait until it is certain that no earlier event can arrive.
Speculative: assume no new inputs will arrive and keep simulating, instead of waiting; may need to back up if the assumption is wrong.
Level                         Primitives                     Examples
Instruction level             Instructions                   SimOS, SPIM
Cycle level                   Functional units               VIRAM-p
Register Transfer Level (RTL) Register, counter, MUX         VHDL
Gate level                    Gate, flip-flop, memory cell   Thor
Switch level                  Ideal transistor               Cosmos
Circuit level                 Resistors, capacitors, etc.    Spice
Device level                  Electrons, silicon
Algorithm           Serial      PRAM             Memory      #Procs
Dense LU            N^3         N                N^2         N^2
Band LU             N^2         N                N^(3/2)     N
Jacobi              N^2         N                N           N
Explicit Inverse    N^2         log N            N^2         N^2
Conjugate Gradient  N^(3/2)     N^(1/2)·log N    N           N
Red-Black SOR       N^(3/2)     N^(1/2)          N           N
Sparse LU           N^(3/2)     N^(1/2)          N·log N     N
FFT                 N·log N     log N            N           N
Multigrid           N           log² N           N           N
Lower Bound         N           log N            N           N
Overview of Algorithms
Sorted in two orders (roughly):
from slowest to fastest on sequential machines.
from most general (works on any matrix) to most specialized (works on matrices like L).
Dense LU: Gaussian elimination; works on any N-by-N matrix.
Band LU: Exploits the fact that L is nonzero only on sqrt(N) diagonals nearest main diagonal.
Jacobi: Essentially does matrix-vector multiply by L in inner loop of iterative algorithm.
Explicit Inverse: Assume we want to solve many systems with L, so we can precompute and
store inv(L) for free, and just multiply by it (but still expensive).
Conjugate Gradient: Uses matrix-vector multiplication, like Jacobi, but exploits mathematical
properties of L that Jacobi does not.
Red-Black SOR (successive over-relaxation): Variation of Jacobi that exploits yet different
mathematical properties of L. Used in multigrid schemes.
Sparse LU: Gaussian elimination exploiting the particular zero structure of L.
FFT (fast Fourier transform): Works only on matrices very like L.
Multigrid: Also works on matrices like L, that come from elliptic PDEs.
Lower Bound: Serial (time to print answer); parallel (time to combine N inputs).
Details in class notes and www.cs.berkeley.edu/~demmel/ma221.