
Overview of Different Types of Simulation

Adapted from an initial set of slides for the course CS267, UC Berkeley.
Thanks to Jim Demmel and Kathy Yelick.

Outline

- Simulation models
- Discrete event systems
- Particle systems
- Systems of Ordinary Differential Equations (ODEs)
- Partial Differential Equations (PDEs)
- Case studies in the lab course

Simulation Models


Sources of Parallelism and Locality in Simulation

- Real-world problems have parallelism and locality:
  - many objects do not depend on other objects
  - objects often depend more on nearby than on distant objects
  - dependence on distant objects can often be simplified
- Scientific models may introduce more parallelism:
  - when a continuous problem is discretized, only nearest neighbors may affect each other
  - far-field effects may be ignored or approximated if they have little effect

Various Kinds of Simulation

- Discrete event systems
  - e.g., Monte Carlo simulations, timing-level simulation for circuits
- Particle systems
  - e.g., billiard balls, semiconductor device simulation, galaxies
- Differential systems
  - functions depending on a single independent variable:
    ODEs, e.g., circuit simulation, structural mechanics, chemical kinetics
  - functions depending on multiple independent variables:
    PDEs, e.g., heat, elasticity, electrostatics

Aspects of Communication Patterns

Local        vs.  Global
Structured   vs.  Unstructured
Static       vs.  Dynamic
Synchronous  vs.  Asynchronous

Discrete Event Systems


Discrete Event Systems

- Discrete event systems are represented as:
  - a number of objects that have internal state
  - the state of an object changes upon the arrival of an event, according to a transition function
- The system may be:
  - synchronous: at each discrete time step, evaluate the transition functions of all objects
  - asynchronous: a transition function is evaluated only when the inputs of an object change, based on an event from another object in the system

[see also wiki discrete event systems]

Simulation of a Discrete Event System

- The state after the next event is determined completely by the present state
- Explicit solution method
- Impact of a single event on the system? Local/Global
- Parallelization strategy:
  - allocate the objects of the system to processes
  - graph partitioning

Example: Conway's Game of Life

The universe of the Game of Life is an infinite two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, live or dead. Every cell interacts with its eight neighbors, which are the cells that are directly horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

- Any live cell with fewer than two live neighbors dies, as if by loneliness.
- Any live cell with more than three live neighbors dies, as if by overcrowding.
- Any live cell with two or three live neighbors lives, unchanged, to the next generation.
- Any dead cell with exactly three live neighbors comes to life.

[wiki game of life]
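These rules translate directly into a synchronous transition function. A minimal sketch in C, assuming a fixed N x N grid with cells beyond the border treated as dead (the infinite universe is idealized away; N and the types are assumptions of the sketch):

#define N 512

/* Count live neighbors of cell (i,j); cells outside the grid count as dead. */
static int live_neighbors(const unsigned char g[N][N], int i, int j) {
    int n = 0;
    for (int di = -1; di <= 1; di++)
        for (int dj = -1; dj <= 1; dj++) {
            if (di == 0 && dj == 0) continue;
            int ni = i + di, nj = j + dj;
            if (ni >= 0 && ni < N && nj >= 0 && nj < N)
                n += g[ni][nj];
        }
    return n;
}

/* One synchronous step: every cell reads the old grid and writes the new
 * one, so all cells update from the same generation. */
void life_step(const unsigned char old_g[N][N], unsigned char new_g[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int n = live_neighbors(old_g, i, j);
            /* survive with 2 or 3 neighbors; birth with exactly 3 */
            new_g[i][j] = old_g[i][j] ? (n == 2 || n == 3) : (n == 3);
        }
}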

Parallelism in GoL [1]

- First choice: a list of live cells
  - difficult to parallelize when cells are assigned to processes:
    - create a process when a cell comes alive
    - kill a process when a cell dies
  - the computation for a single cell in a process is far too small
- Alternative: groups of cells assigned to a process
  - locality problem: groups move, grow, diffuse
  - load balancing problem

Parallelism in GoL [2]

- Second choice: store the cells in a grid
- The simulation is synchronous:
  - use two copies of the grid (old and new)
  - the value of a new grid cell depends on 9 cells (itself plus 8 neighbors) in the old grid
  - the simulation proceeds in steps, where each cell is updated at every step
- Easy to parallelize using domain decomposition, e.g., a 3x3 process layout:

    P1 P2 P3
    P4 P5 P6
    P7 P8 P9

    repeat
      compute locally to update the local part of the system
      exchange state info with neighbors
    until done

- Locality is achieved by using large patches of the grid
  - boundary values from neighboring patches are needed (see the MPI sketch below)
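Since the lab programs use C with MPI (see the lab slides later), a hedged sketch of this repeat/exchange loop for a 1D strip decomposition might look as follows; the flat row-major layout and the update_interior callback are assumptions of the sketch, not the course's reference code:

#include <mpi.h>

/* Each process owns `rows` rows of the grid, stored in a (rows+2) x cols
 * array: row 0 and row rows+1 are ghost copies of the neighbors' boundary
 * rows. update_interior() stands in for the local transition function. */
void simulate(int *grid, int rows, int cols, int steps,
              void (*update_interior)(int *grid, int rows, int cols)) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int s = 0; s < steps; s++) {
        /* exchange state info with neighbors: fill the ghost rows */
        MPI_Sendrecv(grid + 1 * cols,          cols, MPI_INT, up,   0,
                     grid + (rows + 1) * cols, cols, MPI_INT, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(grid + rows * cols,       cols, MPI_INT, down, 1,
                     grid + 0 * cols,          cols, MPI_INT, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* compute locally using the freshly filled ghost rows */
        update_interior(grid, rows, cols);
    }
}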

Example: Synchronous Circuit Simulation

- Example: functional-level circuit simulation
  - state is represented by a set of Boolean variables
  - a set of logical rules defines the state transitions (AND, OR, NOT, etc.)
  - synchronous: we are only interested in the state at clock ticks

Parallelism in Synchronous Circuit Simulation

- A circuit is a graph made up of subcircuits (objects) connected by wires
  - generally a circuit is irregular (a graph)
  - the parallel algorithm is synchronous:
    - compute subcircuit outputs
    - propagate outputs to connected subcircuits
- Graph partitioning assigns subgraphs to processes
  - goal: even distribution (load balance) with minimum edge crossings (minimize communication)

(Figure: two example partitions of the same circuit graph, with edge crossings = 6 and edge crossings = 10.)

Parallelism in Asynchronous Simulation

- Synchronous simulations may waste a lot of time:
  - many inputs may not change for a long period
- Asynchronous simulations update only when an event arrives:
  - simulate only the parts that really change
  - no global steps are made, but events carry a time stamp
- Examples:
  - logic circuit simulation with delays
    - event: a change in the input of, e.g., an AND gate
    - transition function: update the output of that AND gate
  - elevator systems
- Parallelization: again, partitioning of the circuit (i.e., the graph); see the sketch below
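A minimal sketch of such an event-driven loop in C; the Event fields and apply_event() are hypothetical placeholders, and a real simulator would use a priority queue rather than this linear scan:

/* Each event carries a time stamp; the loop always processes the
 * earliest pending event and lets it schedule follow-up events. */
typedef struct { double time; int object; int value; } Event;

/* Hypothetical callback: evaluates the transition function of the
 * affected object and may append new time-stamped events to the queue. */
void apply_event(Event e, Event *queue, int *count);

void run_events(Event *queue, int *count, double t_end) {
    while (*count > 0) {
        int min = 0;                        /* find earliest time stamp */
        for (int k = 1; k < *count; k++)
            if (queue[k].time < queue[min].time) min = k;
        Event e = queue[min];
        queue[min] = queue[--(*count)];     /* remove it from the queue */
        if (e.time > t_end) break;
        apply_event(e, queue, count);
    }
}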

Particle Systems


Particle Systems

- A particle system has:
  - a finite number of particles
  - moving in space, e.g., according to Newton's laws (i.e., F = ma)
  - continuous time
- Examples:
  - stars in space with the laws of gravity
  - electron beams in semiconductor manufacturing
  - atoms in a molecule with electrostatic forces
  - protein folding
  - neutrons in a fission reactor

Example: Protein folding


Folding@home


Forces in Particle Systems

force = external_force + nearby_force + far_field_force

The force on a particle can be subdivided into:
- External force
  - e.g., an externally imposed electric field in an electron beam
- Nearby force (short range)
  - balls on a billiard table bounce off each other
  - Van der Waals forces in a fluid (potential decays as 1/r^6)
- Far-field force (long range)
  - gravity, electrostatics (potential decays as 1/r)

Classification

Different types of forces require different parallelization strategies:
- External forces: no communication, easy
- Short-range forces: local communication, medium
- Long-range forces: global communication, difficult

Parallelism in External Forces

- These are the simplest:
  - the force on each particle is independent of other particles
  - called "embarrassingly parallel"
- Evenly distribute particles over processors
  - any distribution works
  - locality is not an issue: no communication
- For each particle on a processor, apply the external force, as in the sketch below
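A hedged sketch of this loop in C; the Particle type, the time step dt, and external_force() are illustrative assumptions rather than anything fixed by the slides:

typedef struct { double x, y, vx, vy, m; } Particle;

/* Apply an externally imposed force to each locally owned particle.
 * No particle depends on any other, so no communication is needed. */
void apply_external(Particle *p, int n_local, double dt,
                    void (*external_force)(const Particle *pt,
                                           double *fx, double *fy)) {
    for (int i = 0; i < n_local; i++) {
        double fx, fy;
        external_force(&p[i], &fx, &fy);  /* e.g., an imposed E-field */
        p[i].vx += dt * fx / p[i].m;      /* F = m*a, one explicit step */
        p[i].vy += dt * fy / p[i].m;
    }
}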

Parallelism in Nearby Forces

- Nearby forces require interaction, and hence communication:
  - the force may depend on other nearby particles
- The usual parallel model is domain decomposition of the physical domain:
  - O(n/p) particles per processor, if evenly distributed
- Challenge 1: interactions of particles near processor boundaries
  - example: collisions; the simplest algorithm is O(n^2): look at all pairs to see if they collide
  - need to communicate particles near the boundary to neighboring processors, to check for collisions between regions
  - the surface-to-volume effect keeps communication low
- Challenge 2: load imbalance if particles cluster
  - galaxies, electrons hitting a device wall

Parallelism in Far-Field Forces

- Far-field forces involve all-to-all interaction, and hence communication:
  - the force depends on all other particles
  - example: gravity in galaxies
  - the simplest algorithm is O(n^2)
  - just decomposing space does not help, since every particle apparently needs to visit every other particle
- Use more clever algorithms to beat O(n^2)
  - the price paid for higher performance is accuracy; one has to check carefully whether this is acceptable

Far-field forces: Particle-Mesh Methods

- Superimpose a regular mesh:
  - move particles to the nearest grid point
  - exploit the fact that the far field satisfies a PDE that is easy to solve on a regular mesh:
    - FFT, multigrid
- Accuracy depends on how fine the grid is and on the uniformity of the particles

Far-field forces: Tree Decomposition

- Based on approximation: O(n log n) or O(n) instead of O(n^2)
- The force from a group of far-away particles simplifies:
  - they resemble a single larger particle
- Use a tree; each node contains an approximation of its descendants
- Several algorithms can be applied:
  - Barnes-Hut
  - Fast Multipole Method (FMM) of Greengard/Rokhlin
  - Anderson
- This leads to an irregular domain decomposition and a dynamic load balancing problem

Ordinary Differential Equations


ODEs

Many systems can be modeled as coupled functions of continuous variables and their derivatives with respect to one independent continuous variable (usually time).

- Example: electronic circuit
  - wires are links
  - nodes are connections between 2 or more wires
  - each link has a resistor, capacitor, inductor, or voltage source
  - variables are related by Ohm's law, Kirchhoff's laws, etc.

[wiki ordinary differential equations]

Circuit Example

The state of the system is represented by:
- node voltages
- branch currents
all at time t.

The equations include:
- Kirchhoff's circuit laws
- Thevenin's theorem
- Ohm's law
- capacitance
- inductance

Reconstructing the slide's matrix equation for a voltage source V_i driving a resistor R in series with a capacitor C (capacitor voltage V, current I): the first row is the capacitor relation C·dV/dt = I, the second is Kirchhoff's voltage law V + R·I = V_i:

  ( C·d/dt  -1 ) ( V )   (  0  )
  (   1      R ) ( I ) = ( V_i )

Write the whole circuit as a single large system of ODEs.

[wiki thevenin]
[wiki kirchhoff]

Structural Analysis Example

Another example is structural analysis in civil engineering:
- Variables are the displacements of points in a building.
- Newton's and Hooke's (spring) laws apply.
- Static modeling: exert a force and determine the displacement.
- Dynamic modeling: apply a continuous force (earthquake).
- Eigenvalue problem: do the resonant modes of the building match an earthquake?

The system in these cases (and many others) will be sparse.

  m · d²y/dt² = -k · y

Solving ODEs

Usually ODE system matrices are sparse, i.e., most array elements are 0.

Given a set of ODEs, two kinds of questions are:
- Compute the values of the variables at some time t
  - explicit methods
  - implicit methods
- Compute modes of vibration
  - eigenvalue problem

Notation:
- x(t): continuous time representation
- x[i]: discrete computer representation

Solving ODEs: Explicit Methods

- Assume the ODE is first order:

    dx/dt = f(x) = a·x

- Compute x(i·Δt) = x[i] at i = 0, 1, 2, ...
- Approximate dx(i·Δt)/dt by

    x[i+1] = x[i] + Δt·slope

  using the slope at x[i].
- Explicit methods, e.g., (forward) Euler's method, approximate dx/dt = a·x by

    (x[i+1] - x[i]) / Δt = a·x[i]

  (a runnable sketch follows below).

[wiki numerical ordinary differential equations]
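A complete forward Euler sketch in C for this scalar case; the values of a, Δt, and the step count are illustrative choices, and the exact solution x0·exp(a·t) lets the error be checked directly:

#include <math.h>
#include <stdio.h>

int main(void) {
    double a = -1.0, dt = 0.01, x = 1.0;  /* x(0) = 1 */
    int steps = 100;
    for (int i = 0; i < steps; i++)
        x = x + dt * a * x;               /* slope taken at x[i] */
    printf("euler: %f   exact: %f\n", x, exp(a * dt * steps));
    return 0;
}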

Second-Order ODE

- Assume the ODE is second order:

    d²x/dt² = f(dx/dt, x) = b·dx/dt + a·x

- Now take x1 = x and x2 = dx/dt. We get:

    dx2/dt = b·x2 + a·x1
    dx1/dt = x2

  which are two coupled first-order equations.

Any-Order ODE

We can generalize this and write it as dx/dt = f(x) = A·x, where x, dx/dt, and f are now vectors and A is a matrix. In the example above, with x = (x1, x2):

  A = ( 0  1 )
      ( a  b )

Solving ODEs: Explicit Methods

- Assume the ODE is dx/dt = f(x) = A·x, where A is a sparse matrix.
- Compute x(i·Δt) = x[i] at i = 0, 1, 2, ...
- Approximate dx(i·Δt)/dt by

    x[i+1] = x[i] + Δt·slope

  where x and dx/dt are now vectors.
- Explicit methods, e.g., (forward) Euler's method, approximate dx/dt = A·x by

    (x[i+1] - x[i]) / Δt = A·x[i]

  i.e.,

    x[i+1] = x[i] + Δt·A·x[i]

  which is a sparse matrix-vector multiplication.
- Trade-offs:
  - simple algorithm: sparse matrix-vector multiply
  - stability problems: may need to take very small time steps, especially if the system is stiff (i.e., can change rapidly)

Solving ODEs: Implicit Methods

- Assume the ODE is dx/dt = f(x) = A·x, where A is a sparse matrix.
- Compute x(i·Δt) = x[i] at i = 0, 1, 2, ...
- Approximate dx(i·Δt)/dt by

    x[i+1] = x[i] + Δt·slope

  using the slope at x[i+1].
- Implicit methods, e.g., the backward Euler solver, approximate dx/dt = A·x by

    (x[i+1] - x[i]) / Δt = A·x[i+1]

  i.e., we need to solve a sparse linear system:

    (I - Δt·A)·x[i+1] = x[i]

- Trade-offs:
  - a larger time step is possible, especially for stiff problems
  - more difficult algorithm: we need to do a sparse solve at each step (see the scalar sketch below)
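A backward Euler sketch in C for the same scalar ODE dx/dt = a·x, where the "sparse solve" collapses to a single division, (1 - Δt·a)·x[i+1] = x[i]; the stiff values of a and dt are illustrative, chosen so that forward Euler would blow up:

#include <math.h>
#include <stdio.h>

int main(void) {
    /* stiff scalar problem: |a * dt| >> 1 */
    double a = -1000.0, dt = 0.1, x = 1.0;
    int steps = 10;
    for (int i = 0; i < steps; i++)
        x = x / (1.0 - dt * a);   /* solve (1 - dt*a)*x[i+1] = x[i] */
    printf("backward euler: %g   exact: %g\n", x, exp(a * dt * steps));
    return 0;
}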

Solving ODEs

- Explicit methods to compute solution(t):
  - e.g., Euler's method
  - simple algorithm: sparse matrix-vector multiply
  - may need to take very small time steps, especially if the system is stiff (i.e., can change rapidly)
- Implicit methods to compute solution(t):
  - e.g., backward Euler's method
  - larger time steps, especially for stiff problems
  - more difficult algorithm: solve a sparse linear system
- All of these reduce to sparse matrix problems:
  - explicit: sparse matrix-vector multiplication
  - implicit: solve a sparse linear system
    - iterative solvers use sparse matrix-vector multiplication

Parallel Sparse Matrix-Vector Multiplication

Compute y = A*x, where A is a sparse n x n matrix.

(Figure: rows of A distributed over processors P0..P3, with row i stored as index/value pairs "i: [j1,v1], [j2,v2], ..."; the partitioning of x is the most problematic part.)

Questions:
- which processors store y[i], x[i], and A[i,j]?
- which processors compute
    y[i] = sum (from j=1 to n) of A[i,j] * x[j]
         = (row i of A) * x, a sparse dot product?

Partitioning:
- Partition the index set {1,...,n} = N1 + N2 + ... + Np.
- For all i in Nk, processor k stores y[i], x[i], and row i of A.
- For all i in Nk, processor k computes y[i] = (row i of A) * x
  ("owner computes" rule: processor k computes the y[i]'s it owns).
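A sketch of the local computation in C, using the compressed sparse row (CSR) format that matches the "i: [j1,v1], [j2,v2], ..." picture; in the parallel version, the x[j] values for nonzero columns owned by other processors must be communicated before this loop runs:

/* y = A*x for the locally owned rows, with A in CSR format: row i holds
 * entries val[k], col_idx[k] for k in [row_ptr[i], row_ptr[i+1]). */
void spmv_csr(int n_local, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y) {
    for (int i = 0; i < n_local; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];   /* sparse dot product */
        y[i] = sum;                          /* owner computes rule */
    }
}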

Example

(Figure: a sparse 9 x 9 matrix-vector product y = A*x, with the nonzeros of A marked "x"; each y[i] combines only the x[j]'s in the nonzero columns of row i.)

Matrix Reordering

- The ideal matrix structure for parallelism is block diagonal:
  - p (number of processors) blocks, which can all be computed locally
  - few nonzeros outside these blocks, which require communication
- Can we reorder the rows/columns to achieve this?

(Figure: a permuted matrix with diagonal blocks owned by P0..P4.)

Effects of Reordering [1]

(Figure: the nonzero pattern of a small 4 x 4 matrix before and after reordering its rows and columns.)

Effects of Reordering [2]

(Figure: the nonzero pattern of a second small 4 x 4 example before and after a different reordering.)

Goals of Reordering

- Performance goals:
  - balance load
  - balance storage
  - minimize communication
- Some algorithms reorder for other reasons:
  - reduce the number of nonzeros in the answer (fill)
  - improve numerical properties

Graph Partitioning and Sparse Matrices

There is a direct relationship between a sparse matrix and a graph: each row/column is a node, and each off-diagonal nonzero A[i,j] is an edge between nodes i and j.

(Figure: a 6 x 6 symmetric sparse matrix and the corresponding 6-node graph.)

A good partition of the graph has:
- an equal number of (weighted) nodes in each part (load balance)
- a minimum number of edges crossing between parts (minimize communication)

We can reorder the rows/columns of the matrix by putting all the nodes of one partition together.

Implicit Methods and Eigenproblems

- Direct methods (Gaussian elimination):
  - called LU decomposition, because we factor A = L*U
  - future lectures will consider both dense and sparse cases
  - more complicated than sparse matrix-vector multiplication
- Iterative solvers:
  - will discuss several of these later
  - Jacobi, Successive Over-Relaxation (SOR), Conjugate Gradient (CG), Multigrid, ...
  - most have sparse matrix-vector multiplication as their kernel
- Eigenproblems:
  - also depend on sparse matrix-vector multiplication and on direct methods

Summary for Systems of ODEs

- Computation:
  - sparse matrix-vector operations
- Communication:
  - the pattern may be arbitrary, depending on the graph
  - there is no spatial discretization, so no natural grid exists
- Important considerations for parallelization:
  - these are problems without meshes that are solved with techniques from linear algebra
  - it is not trivial to assign different parts of the graph optimally to different processes

Partial Differential Equations


Continuous Variables, Continuous Parameters

Examples of such systems include:
- Parabolic (time-dependent) problems:
  - heat flow: Temperature(position, time)
  - diffusion: Concentration(position, time)
- Elliptic (steady-state) problems:
  - electrostatic or gravitational potential: Potential(position)
- Hyperbolic problems (waves):
  - quantum mechanics: Wave-function(position, time)

Many problems combine features of the above:
- fluid flow: Velocity, Pressure, Density(position, time)
- elasticity: Stress, Strain(position, time)

Terminology

The terms hyperbolic, parabolic, and elliptic come from special cases of the general form of a second-order linear PDE:

  a·∂²u/∂x² + b·∂²u/∂x∂t + c·∂²u/∂t² + d·∂u/∂x + e·∂u/∂t + f = 0

where t is time, in analogy to the solutions of the general quadratic equation

  a·x² + b·xy + c·y² + d·x + e·y + f = 0

Notation:

  ∇ ≡ (∂/∂x1, ∂/∂x2, ∂/∂x3)

Example: Flow Problems

The Navier-Stokes equations, with v = velocity vector and Re = Reynolds number:

  ∂v/∂t = (1/Re)·∇²v - (v·∇)v - ∇p + f
  ∇·v = 0

The terms are, in order: diffusion, advection, pressure, and external forces; the second equation expresses zero divergence (incompressibility).

Courtesy: R. Westermann, TU Munich

Example: Deriving the Heat Equation

(Figure: a one-dimensional bar, with points x-h, x, and x+h marked.)

Consider a simple problem:
- a bar of uniform material, insulated except at its ends
- let u(x,t) be the temperature at position x at time t
- heat flows between neighboring points at a rate proportional to the temperature differences:

    du(x,t)/dt = C · [ (u(x-h,t) - u(x,t))/h - (u(x,t) - u(x+h,t))/h ] / h

As h -> 0, we get the heat equation:

  ∂u(x,t)/∂t = C · ∂²u(x,t)/∂x²

Details of the Explicit Method for Heat

From experimentation (physical observation) we have:

  ∂u(x,t)/∂t = ∂²u(x,t)/∂x²    (assume C = 1 for simplicity)

Discretize time and space and use the explicit approach (as described for ODEs) to approximate the derivative:

  (u(x,t+1) - u(x,t)) / Δt = (u(x-h,t) - 2·u(x,t) + u(x+h,t)) / h²

  u(x,t+1) = u(x,t) + (Δt/h²)·(u(x-h,t) - 2·u(x,t) + u(x+h,t))

Let z = Δt/h²:

  u(x,t+1) = z·u(x-h,t) + (1-2z)·u(x,t) + z·u(x+h,t)

By changing variables (x to j and t to i):

  u[j,i+1] = z·u[j-1,i] + (1-2z)·u[j,i] + z·u[j+1,i]

Explicit Solution of the Heat Equation

- Use finite differences, with u[j,i] as the heat at
  - time t = i*Δt (i = 0, 1, 2, ...) and position x = j*h (j = 0, 1, ..., N = 1/h)
  - initial conditions on u[j,0]
  - boundary conditions on u[0,i] and u[N,i]
- At each time step i = 0, 1, 2, ...
  - for j = 1 to N-1:
      u[j,i+1] = z*u[j-1,i] + (1-2*z)*u[j,i] + z*u[j+1,i]
  - with z = Δt/h²
- This corresponds to:
  - a matrix-vector multiply
  - nearest neighbors on the grid (see the sketch below)

(Figure: the space-time grid of u[j,i] for t = 0..5, with u[0,0]..u[5,0] as the initial row; each point is computed from its three lower neighbors.)
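A sketch of this time-stepping loop in C; N, the fixed boundary handling, and the copy-back are assumptions of the sketch, and z must satisfy z <= 0.5 for stability (see the instability discussion below):

#define N 100   /* grid points j = 0..N; h = 1/N */

/* March the 1D heat equation forward `steps` time steps with z = dt/h^2.
 * u[0] and u[N] are boundary values and stay fixed. */
void heat_explicit(double u[N + 1], int steps, double z) {
    double unew[N + 1];
    for (int i = 0; i < steps; i++) {
        unew[0] = u[0];
        unew[N] = u[N];
        for (int j = 1; j < N; j++)  /* 3-point stencil = tridiagonal matvec */
            unew[j] = z * u[j - 1] + (1 - 2 * z) * u[j] + z * u[j + 1];
        for (int j = 0; j <= N; j++)
            u[j] = unew[j];          /* advance one time step */
    }
}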

Stencil Template

Each step multiplies by a tridiagonal matrix:

  L = ( 1-2z   z                  )
      (  z    1-2z   z            )
      (        z    1-2z   z      )
      (              z    1-2z    )

  u[:,i+1] = L · u[:,i]

(Figure: the corresponding graph and 3-point stencil.)

For a 2D mesh (5-point stencil), the matrix is pentadiagonal (more on the matrix/grid views later).

Parallelism in the Explicit Method for PDEs

- Partitioning the space (x) into p equal chunks gives:
  - good load balance (assuming a large number of points relative to p)
  - minimized communication (only the boundaries of the p chunks)
- This generalizes to:
  - multiple dimensions
  - arbitrary graphs (= sparse matrices)
- Problem with the explicit approach: numerical instability
  - the solution blows up eventually if z = Δt/h² > 0.5
  - we need to make the time steps very small when h is small: Δt < 0.5·h²

Instability in Solving the Heat Equation Explicitly

(Figure: an example of an unstable explicit solution.)

Implicit Solution of the Heat Equation

Discretize time and space and use the implicit approach (backward Euler) to approximate the derivative:

  (u(x,t+1) - u(x,t)) / Δt = (u(x-h,t+1) - 2·u(x,t+1) + u(x+h,t+1)) / h²

  u(x,t) = u(x,t+1) - (Δt/h²)·(u(x-h,t+1) - 2·u(x,t+1) + u(x+h,t+1))

Let z = Δt/h² and change variables (x to j and t to i):

  (I + z·L)·u[:,i+1] = u[:,i]

where I is the identity matrix and L is the tridiagonal Laplacian, with 2 on the diagonal and -1 on the sub- and superdiagonals:

  L = (  2  -1          )
      ( -1   2  -1      )
      (     -1   2  -1  )
      (         -1   2  )

Implicit Solution

The previous slide used backward Euler, but using the trapezoidal rule (Crank-Nicolson) gives better numerical properties. This turns into solving the following equation:

  (I + (z/2)·L)·u[:,i+1] = (I - (z/2)·L)·u[:,i]

Here I is the identity matrix and L is the same tridiagonal Laplacian as before (2 on the diagonal, -1 on the sub- and superdiagonals; graph and 3-point stencil as shown earlier), i.e., we are essentially solving Poisson's equation in 1D.
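Because (I + (z/2)·L) is tridiagonal, each implicit step in 1D can be solved directly in O(N) with the Thomas algorithm; a hedged, sequential C99 sketch, assuming the system is diagonally dominant (as it is here) so no pivoting is needed:

/* Solve a tridiagonal system: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
 * a[0] and c[n-1] are unused. On return, d holds the solution x. */
void thomas(int n, const double *a, const double *b, const double *c,
            double *d) {
    double cp[n];                       /* scratch: modified superdiagonal */
    cp[0] = c[0] / b[0];
    d[0]  = d[0] / b[0];
    for (int i = 1; i < n; i++) {       /* forward elimination */
        double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        d[i]  = (d[i] - a[i] * d[i - 1]) / m;
    }
    for (int i = n - 2; i >= 0; i--)    /* back substitution */
        d[i] -= cp[i] * d[i + 1];
}

Note the sequential recurrence along the grid: this is part of why implicit steps are harder to parallelize than the explicit stencil, motivating the solver discussion that follows.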

2D Implicit Method

Similar to the 1D case, but the matrix L is now the 2D Laplacian: each row has 4 on the diagonal and -1 in the columns of its (up to) four grid neighbors, giving the familiar 5-point stencil.

(Figure: the banded nonzero structure of L for a small 2D grid, with the corresponding graph and 5-point stencil.)

Multiplying by this matrix (as in the explicit case) is simply a nearest-neighbor computation on the 2D grid. To solve this system, there are several techniques.

Solvers for Poisson 2D

- There are many types of numerical solvers. Each has different properties regarding:
  - parallel time complexity
  - number of processors required
  - storage requirements
- Which one to pick depends heavily on the matrix structure and on the dynamic range of the element values (more details in coming lectures).

Summary of Approaches to Solving PDEs

- As with ODEs, either explicit or implicit approaches are possible:
  - explicit: sparse matrix-vector multiplication
  - implicit: sparse matrix solve at each step
    - direct solvers are hard (more on this later)
    - iterative solvers turn into sparse matrix-vector multiplication
- Grid and sparse matrix correspondence:
  - sparse matrix-vector multiplication is nearest-neighbor averaging on the underlying mesh
- Not all nearest-neighbor computations have the same efficiency:
  - factors are the mesh structure (nonzero structure) and the number of calculations per mesh point

Lab Assignments [1]

- Poisson equation in 2D:
  - regular grid
  - no time dependence; various algorithms (Red-Black Gauss-Seidel, CG)
  - static regular partitioning of the grid
- Finite elements in 2D:
  - irregular grid
  - no time dependence; various algorithms (Red-Black Gauss-Seidel, CG)
  - static irregular partitioning of the grid
  - grid generation and adjustment, load imbalance
- Interacting particles in 2D:
  - no grid
  - time stepper (Verlet algorithm, explicit)
  - particle-in-cell method for efficient parallelization
  - dynamic repartitioning of particles over processes

Lab Assignments [2]

- All software is written in C:
  - no emphasis on language constructs
- All parallel programs use the MPI library for communication.
- All programs run on the DAS-3.
- Goals:
  - understand the structure of a message-passing program
  - build a parallel program
  - investigation and optimization of performance are important:
    - What is the program doing all the time?
    - Why do various phases take so long?
    - How does the execution time change as the problem size goes up?
    - How do you compare the efficiency of one algorithm with that of another?

Comments on Practical Meshes

- Regular 1D, 2D, and 3D meshes:
  - important as building blocks for more complicated meshes
- Practical meshes are often irregular:
  - composite meshes, consisting of multiple bent regular meshes joined at the edges
  - unstructured meshes, with arbitrary mesh points and connectivities
  - adaptive meshes, which change resolution during the solution process to put computational effort where it is needed

Parallelism in Regular Meshes

- Computing a stencil on a regular mesh:
  - we need to communicate mesh points near the boundary to neighboring processors
  - this is often done with ghost regions, which adds memory overhead
  - the surface-to-volume ratio keeps communication down, but it may still be problematic in practice

Adaptive Mesh Refinement (AMR)

(Figure: an adaptive mesh around an explosion.)

- Refinement is done by calculating error estimates
- Parallelism:
  - mostly between patches, dealt to processors for load balance
  - some parallelism may be exploited within a patch (SMP)

Adaptive Mesh

(Figure: shock waves in gas dynamics, computed using AMR (Adaptive Mesh Refinement).)

See: http://www.llnl.gov/CASC/SAMRAI/

Composite Mesh from a Mechanical Structure

(Figure: a composite mesh of a mechanical structure.)

Converting the Mesh to a Matrix

(Figure: an irregular mesh and the sparsity pattern of the corresponding matrix.)

Effects of Reordering on Gaussian Elimination

(Figure: the effect of reordering on the fill-in produced by Gaussian elimination.)

Irregular Mesh: NASA Airfoil in 2D

(Figure: a 2D unstructured triangular mesh around a NASA airfoil.)

Challenges of Irregular Meshes (and a Few Solutions)

- How to generate them in the first place:
  - Triangle, a 2D mesh generator by Jonathan Shewchuk
  - 3D is harder!
- How to partition them:
  - ParMetis, a parallel graph partitioner
- How to design iterative solvers:
  - PETSc, a Portable Extensible Toolkit for Scientific Computing
  - Prometheus, a multigrid solver for finite element problems on irregular meshes
- How to design direct solvers:
  - SuperLU, parallel sparse Gaussian elimination
- These are challenges even sequentially, and more so in parallel.

Summary of PDEs

- Computation:
  - sparse matrix-vector operations
  - generation/calculation and storage of the matrix elements
- Communication:
  - partitioning: mesh or matrix
- Problems may be regular/irregular and dynamic/static
- Important for this class:
  - problems with meshes that need to be solved with clever techniques
  - how to assign different parts of the mesh to different processes

Solving ODEs: Eigensolvers

- Computing modes of vibration means finding eigenvalues and eigenvectors.
- Seek a solution of d²x/dt² = A·x of the form x(t) = sin(f·t)·x0, where x0 is a constant vector.
- Plugging this in gives -f²·x0 = A·x0, so that -f² is an eigenvalue and x0 is an eigenvector of A.
- Solution schemes reduce either to sparse matrix-vector multiplication or to solving sparse linear systems (a sketch of the simplest scheme follows below).
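A sketch of power iteration, the simplest eigensolver built from the same matrix-vector kernel as everything else here; matvec() is a placeholder (e.g., the CSR multiply sketched earlier), and the fixed iteration count is an assumption of the sketch:

#include <math.h>

/* Power iteration: repeatedly apply A and renormalize. x must start
 * nonzero; on return it approximates the dominant eigenvector and the
 * returned value approximates the magnitude of the largest eigenvalue. */
double power_iteration(int n,
                       void (*matvec)(int n, const double *x, double *y),
                       double *x, int iters) {
    double lambda = 0.0;
    for (int it = 0; it < iters; it++) {
        double y[n];                       /* C99 VLA, fine for a sketch */
        matvec(n, x, y);                   /* y = A*x: the sparse kernel */
        double norm = 0.0;
        for (int i = 0; i < n; i++) norm += y[i] * y[i];
        norm = sqrt(norm);
        for (int i = 0; i < n; i++) x[i] = y[i] / norm;
        lambda = norm;                     /* ||A*x|| -> |lambda_max| */
    }
    return lambda;
}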

Scheduling Asynchronous Circuit Simulation

Optimization strategies when a process is ready to go to the next event, but is not sure whether another event that must be processed earlier will still arrive:

- Conservative:
  - only simulate up to the minimum time stamp of the inputs
- Speculative:
  - assume no new inputs will arrive and keep simulating, instead of waiting
  - may need to back up if the assumption proves wrong

Speculation is more relevant for compiler optimization than for the design of parallel algorithms.

Example: Circuit Simulation

Circuits are simulated at many different levels:

  Level                          Primitives                     Examples
  Instruction level              Instructions                   SimOS, SPIM
  Cycle level                    Functional units               VIRAM-p
  Register Transfer Level (RTL)  Register, counter, MUX         VHDL
  Gate level                     Gate, flip-flop, memory cell   Thor
  Switch level                   Ideal transistor               Cosmos
  Circuit level                  Resistors, capacitors, etc.    Spice
  Device level                   Electrons, silicon

Relation of Poisson to Gravity, Electrostatics

The Poisson equation arises in many problems:
- E.g., the force on a particle at (x,y,z) due to a particle at the origin is -(x,y,z)/r³, where r = sqrt(x² + y² + z²).
- The force is also the gradient of the potential V = -1/r:
    force = -(d/dx V, d/dy V, d/dz V) = -grad V
- V satisfies Poisson's equation (try working this out; a check follows below).
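As a quick check of the "try working this out" exercise, a short derivation (in LaTeX) showing that V = -1/r is harmonic away from the origin, i.e., satisfies Poisson's equation with zero right-hand side there:

% V = -1/r with r = (x^2 + y^2 + z^2)^{1/2}
\frac{\partial V}{\partial x} = \frac{x}{r^3},
\qquad
\frac{\partial^2 V}{\partial x^2} = \frac{1}{r^3} - \frac{3x^2}{r^5}
% and symmetrically in y and z, so summing the three second derivatives:
\nabla^2 V = \frac{3}{r^3} - \frac{3(x^2 + y^2 + z^2)}{r^5} = 0 \quad (r > 0).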

Algorithms for the 2D Poisson Equation (size N)

  Algorithm      Serial     PRAM           Memory    #Procs
  Dense LU       N^3        N              N^2       N^2
  Band LU        N^2        N              N^(3/2)   N
  Jacobi         N^2        N              N         N
  Explicit Inv.  N^2        log N          N^2       N^2
  Conj. Grad.    N^(3/2)    N^(1/2) log N  N         N
  RB SOR         N^(3/2)    N^(1/2)        N         N
  Sparse LU      N^(3/2)    N^(1/2)        N log N   N
  FFT            N log N    log N          N         N
  Multigrid      N          log^2 N        N         N
  Lower bound    N          log N          N         N

PRAM is an idealized parallel model with zero-cost communication.
Reference: James Demmel, Applied Numerical Linear Algebra, SIAM, 1997.

Overview of Algorithms

Sorted in two orders (roughly):
- from slowest to fastest on sequential machines
- from most general (works on any matrix) to most specialized (works on matrices like L)

- Dense LU: Gaussian elimination; works on any N-by-N matrix.
- Band LU: exploits the fact that L is nonzero only on the sqrt(N) diagonals nearest the main diagonal.
- Jacobi: essentially does a matrix-vector multiply by L in the inner loop of an iterative algorithm.
- Explicit Inverse: assume we want to solve many systems with L, so we can precompute and store inv(L) "for free" and just multiply by it (but this is still expensive).
- Conjugate Gradient: uses matrix-vector multiplication, like Jacobi, but exploits mathematical properties of L that Jacobi does not.
- Red-Black SOR (successive over-relaxation): a variation of Jacobi that exploits yet different mathematical properties of L. Used in multigrid schemes.
- Sparse LU: Gaussian elimination exploiting the particular zero structure of L.
- FFT (fast Fourier transform): works only on matrices very like L.
- Multigrid: also works on matrices like L that come from elliptic PDEs.
- Lower bound: serial (time to print the answer); parallel (time to combine N inputs).

Details in the class notes and at www.cs.berkeley.edu/~demmel/ma221.
