
Overview of Different Types of Simulation

Adapted from an initial set of slides for the course CS267, UC Berkeley.
Thanks to Jim Demmel and Kathy Yelick.

Outline

- Simulation models
- Discrete event systems
- Particle systems
- Systems of Ordinary Differential Equations (ODEs)
- Partial Differential Equations (PDEs)
- Case studies in the lab course

Simulation Models


Sources of Parallelism and Locality in Simulation

- Real-world problems have parallelism and locality:
  - many objects do not depend on other objects
  - objects often depend more on nearby than on distant objects
  - dependence on distant objects can often be simplified
- Scientific models may introduce more parallelism:
  - when a continuous problem is discretized, only nearest neighbors may affect each other
  - far-field effects may be ignored or approximated if they have little effect

Various Kinds of Simulation

- Discrete event systems
  - e.g., Monte Carlo simulations, timing-level simulation for circuits
- Particle systems
  - e.g., billiard balls, semiconductor device simulation, galaxies
- Differential systems
  - functions depending on a single independent variable:
    ODEs, e.g., circuit simulation, structural mechanics, chemical kinetics
  - functions depending on multiple independent variables:
    PDEs, e.g., heat, elasticity, electrostatics

Aspects of Communication Patterns

Local        vs.  Global
Structured   vs.  Unstructured
Static       vs.  Dynamic
Synchronous  vs.  Asynchronous

Discrete Event Systems


Discrete Event Systems

- Discrete event systems are represented as:
  - a number of objects that have internal state
  - the state of an object changes upon the arrival of an event, according to a transition function
- The system may be:
  - synchronous: at each discrete time step, evaluate the transition functions of all objects
  - asynchronous: a transition function is evaluated only when the inputs of an object change, based on an event from another object in the system

[see also wiki discrete event systems]

Simulation of a Discrete Event System

- The state after the next event is determined completely by the present state
- Explicit solution method
- Impact of a single event on the system? Local/Global
- Parallelization strategy:
  - allocate the objects of the system to processes
  - graph partitioning

Example: Conway's Game of Life

The universe of the Game of Life is an infinite two-dimensional orthogonal grid of square cells, each of which is in one of two possible states, live or dead. Every cell interacts with its eight neighbors, which are the cells that are directly horizontally, vertically, or diagonally adjacent. At each step in time, the following transitions occur:

- Any live cell with fewer than two live neighbors dies, as if by loneliness.
- Any live cell with more than three live neighbors dies, as if by overcrowding.
- Any live cell with two or three live neighbors lives, unchanged, to the next generation.
- Any dead cell with exactly three live neighbors comes to life.

[wiki game of life]
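These rules translate directly into a synchronous transition function. A minimal sketch in C, assuming a fixed N x N grid with cells beyond the border treated as dead (the infinite universe is idealized away; N and the types are assumptions of the sketch):

#define N 512

/* Count live neighbors of cell (i,j); cells outside the grid count as dead. */
static int live_neighbors(const unsigned char g[N][N], int i, int j) {
    int n = 0;
    for (int di = -1; di <= 1; di++)
        for (int dj = -1; dj <= 1; dj++) {
            if (di == 0 && dj == 0) continue;
            int ni = i + di, nj = j + dj;
            if (ni >= 0 && ni < N && nj >= 0 && nj < N)
                n += g[ni][nj];
        }
    return n;
}

/* One synchronous step: every cell reads the old grid and writes the new
 * one, so all cells update from the same generation. */
void life_step(const unsigned char old_g[N][N], unsigned char new_g[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int n = live_neighbors(old_g, i, j);
            /* survive with 2 or 3 neighbors; birth with exactly 3 */
            new_g[i][j] = old_g[i][j] ? (n == 2 || n == 3) : (n == 3);
        }
}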

Parallelism in GoL [1]

- First choice: a list of live cells
  - difficult to parallelize when cells are assigned to processes:
    - create a process when a cell comes alive
    - kill a process when a cell dies
  - the computation for a single cell in a process is far too small
- Alternative: groups of cells assigned to a process
  - locality problem: groups move, grow, diffuse
  - load balancing problem

Parallelism in GoL [2]

- Second choice: store the cells in a grid
- The simulation is synchronous:
  - use two copies of the grid (old and new)
  - the value of a new grid cell depends on 9 cells (itself plus 8 neighbors) in the old grid
  - the simulation proceeds in steps, where each cell is updated at every step
- Easy to parallelize using domain decomposition, e.g., a 3x3 process layout:

    P1 P2 P3
    P4 P5 P6
    P7 P8 P9

    repeat
      compute locally to update the local part of the system
      exchange state info with neighbors
    until done

- Locality is achieved by using large patches of the grid
  - boundary values from neighboring patches are needed (see the MPI sketch below)
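Since the lab programs use C with MPI (see the lab slides later), a hedged sketch of this repeat/exchange loop for a 1D strip decomposition might look as follows; the flat row-major layout and the update_interior callback are assumptions of the sketch, not the course's reference code:

#include <mpi.h>

/* Each process owns `rows` rows of the grid, stored in a (rows+2) x cols
 * array: row 0 and row rows+1 are ghost copies of the neighbors' boundary
 * rows. update_interior() stands in for the local transition function. */
void simulate(int *grid, int rows, int cols, int steps,
              void (*update_interior)(int *grid, int rows, int cols)) {
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    for (int s = 0; s < steps; s++) {
        /* exchange state info with neighbors: fill the ghost rows */
        MPI_Sendrecv(grid + 1 * cols,          cols, MPI_INT, up,   0,
                     grid + (rows + 1) * cols, cols, MPI_INT, down, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(grid + rows * cols,       cols, MPI_INT, down, 1,
                     grid + 0 * cols,          cols, MPI_INT, up,   1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* compute locally using the freshly filled ghost rows */
        update_interior(grid, rows, cols);
    }
}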

Example: Synchronous Circuit Simulation

- Example: functional-level circuit simulation
  - state is represented by a set of Boolean variables
  - a set of logical rules defines the state transitions (AND, OR, NOT, etc.)
  - synchronous: we are only interested in the state at clock ticks

Parallelism in Synchronous Circuit Simulation

- A circuit is a graph made up of subcircuits (objects) connected by wires
  - generally a circuit is irregular (a graph)
  - the parallel algorithm is synchronous:
    - compute subcircuit outputs
    - propagate outputs to connected subcircuits
- Graph partitioning assigns subgraphs to processes
  - goal: even distribution (load balance) with minimum edge crossings (minimize communication)

(Figure: two example partitions of the same circuit graph, with edge crossings = 6 and edge crossings = 10.)

Parallelism in Asynchronous Simulation

- Synchronous simulations may waste a lot of time:
  - many inputs may not change for a long period
- Asynchronous simulations update only when an event arrives:
  - simulate only the parts that really change
  - no global steps are made, but events carry a time stamp
- Examples:
  - logic circuit simulation with delays
    - event: a change in the input of, e.g., an AND gate
    - transition function: update the output of that AND gate
  - elevator systems
- Parallelization: again, partitioning of the circuit (i.e., the graph); see the sketch below
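A minimal sketch of such an event-driven loop in C; the Event fields and apply_event() are hypothetical placeholders, and a real simulator would use a priority queue rather than this linear scan:

/* Each event carries a time stamp; the loop always processes the
 * earliest pending event and lets it schedule follow-up events. */
typedef struct { double time; int object; int value; } Event;

/* Hypothetical callback: evaluates the transition function of the
 * affected object and may append new time-stamped events to the queue. */
void apply_event(Event e, Event *queue, int *count);

void run_events(Event *queue, int *count, double t_end) {
    while (*count > 0) {
        int min = 0;                        /* find earliest time stamp */
        for (int k = 1; k < *count; k++)
            if (queue[k].time < queue[min].time) min = k;
        Event e = queue[min];
        queue[min] = queue[--(*count)];     /* remove it from the queue */
        if (e.time > t_end) break;
        apply_event(e, queue, count);
    }
}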

Particle Systems


Particle Systems

- A particle system has:
  - a finite number of particles
  - moving in space, e.g., according to Newton's laws (i.e., F = ma)
  - continuous time
- Examples:
  - stars in space with the laws of gravity
  - electron beams in semiconductor manufacturing
  - atoms in a molecule with electrostatic forces
  - protein folding
  - neutrons in a fission reactor

Example: Protein folding


Folding@home


Forces in Particle Systems

force = external_force + nearby_force + far_field_force

The force on a particle can be subdivided into:
- External force
  - e.g., an externally imposed electric field in an electron beam
- Nearby force (short range)
  - balls on a billiard table bounce off each other
  - Van der Waals forces in a fluid (potential decays as 1/r^6)
- Far-field force (long range)
  - gravity, electrostatics (potential decays as 1/r)

Classification

Different types of forces require different parallelization strategies:
- External forces: no communication, easy
- Short-range forces: local communication, medium
- Long-range forces: global communication, difficult

Parallelism in External Forces

- These are the simplest:
  - the force on each particle is independent of other particles
  - called "embarrassingly parallel"
- Evenly distribute particles over processors
  - any distribution works
  - locality is not an issue: no communication
- For each particle on a processor, apply the external force, as in the sketch below
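A hedged sketch of this loop in C; the Particle type, the time step dt, and external_force() are illustrative assumptions rather than anything fixed by the slides:

typedef struct { double x, y, vx, vy, m; } Particle;

/* Apply an externally imposed force to each locally owned particle.
 * No particle depends on any other, so no communication is needed. */
void apply_external(Particle *p, int n_local, double dt,
                    void (*external_force)(const Particle *pt,
                                           double *fx, double *fy)) {
    for (int i = 0; i < n_local; i++) {
        double fx, fy;
        external_force(&p[i], &fx, &fy);  /* e.g., an imposed E-field */
        p[i].vx += dt * fx / p[i].m;      /* F = m*a, one explicit step */
        p[i].vy += dt * fy / p[i].m;
    }
}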

Parallelism in Nearby Forces

- Nearby forces require interaction, and hence communication:
  - the force may depend on other nearby particles
- The usual parallel model is domain decomposition of the physical domain:
  - O(n/p) particles per processor, if evenly distributed
- Challenge 1: interactions of particles near processor boundaries
  - example: collisions; the simplest algorithm is O(n^2): look at all pairs to see if they collide
  - need to communicate particles near the boundary to neighboring processors, to check for collisions between regions
  - the surface-to-volume effect keeps communication low
- Challenge 2: load imbalance if particles cluster
  - galaxies, electrons hitting a device wall

Parallelism in Far-Field Forces

- Far-field forces involve all-to-all interaction, and hence communication:
  - the force depends on all other particles
  - example: gravity in galaxies
  - the simplest algorithm is O(n^2)
  - just decomposing space does not help, since every particle apparently needs to visit every other particle
- Use more clever algorithms to beat O(n^2)
  - the price paid for higher performance is accuracy; one has to check carefully whether this is acceptable

Far-field forces: Particle-Mesh Methods

- Superimpose a regular mesh:
  - move particles to the nearest grid point
  - exploit the fact that the far field satisfies a PDE that is easy to solve on a regular mesh:
    - FFT, multigrid
- Accuracy depends on how fine the grid is and on the uniformity of the particles

Far-field forces: Tree Decomposition

- Based on approximation: O(n log n) or O(n) instead of O(n^2)
- The force from a group of far-away particles simplifies:
  - they resemble a single larger particle
- Use a tree; each node contains an approximation of its descendants
- Several algorithms can be applied:
  - Barnes-Hut
  - Fast Multipole Method (FMM) of Greengard/Rokhlin
  - Anderson
- This leads to an irregular domain decomposition and a dynamic load balancing problem

Ordinary Differential Equations


ODEs

Many systems can be modeled as coupled functions of continuous variables and their derivatives with respect to one independent continuous variable (usually time).

- Example: electronic circuit
  - wires are links
  - nodes are connections between 2 or more wires
  - each link has a resistor, capacitor, inductor, or voltage source
  - variables are related by Ohm's law, Kirchhoff's laws, etc.

[wiki ordinary differential equations]

Circuit Example

The state of the system is represented by:
- node voltages
- branch currents
all at time t.

The equations include:
- Kirchhoff's circuit laws
- Thevenin's theorem
- Ohm's law
- capacitance
- inductance

Reconstructing the slide's matrix equation for a voltage source V_i driving a resistor R in series with a capacitor C (capacitor voltage V, current I): the first row is the capacitor relation C·dV/dt = I, the second is Kirchhoff's voltage law V + R·I = V_i:

  ( C·d/dt  -1 ) ( V )   (  0  )
  (   1      R ) ( I ) = ( V_i )

Write the whole circuit as a single large system of ODEs.

[wiki thevenin]
[wiki kirchhoff]

Structural Analysis Example

Another example is structural analysis in civil engineering:
- Variables are the displacements of points in a building.
- Newton's and Hooke's (spring) laws apply.
- Static modeling: exert a force and determine the displacement.
- Dynamic modeling: apply a continuous force (earthquake).
- Eigenvalue problem: do the resonant modes of the building match an earthquake?

The system in these cases (and many others) will be sparse.

  m · d²y/dt² = -k · y

Solving ODEs

Usually ODE system matrices are sparse, i.e., most array elements are 0.

Given a set of ODEs, two kinds of questions are:
- Compute the values of the variables at some time t
  - explicit methods
  - implicit methods
- Compute modes of vibration
  - eigenvalue problem

Notation:
- x(t): continuous time representation
- x[i]: discrete computer representation

Solving ODEs: Explicit Methods

- Assume the ODE is first order:

    dx/dt = f(x) = a·x

- Compute x(i·Δt) = x[i] at i = 0, 1, 2, ...
- Approximate dx(i·Δt)/dt by

    x[i+1] = x[i] + Δt·slope

  using the slope at x[i].
- Explicit methods, e.g., (forward) Euler's method, approximate dx/dt = a·x by

    (x[i+1] - x[i]) / Δt = a·x[i]

  (a runnable sketch follows below).

[wiki numerical ordinary differential equations]
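A complete forward Euler sketch in C for this scalar case; the values of a, Δt, and the step count are illustrative choices, and the exact solution x0·exp(a·t) lets the error be checked directly:

#include <math.h>
#include <stdio.h>

int main(void) {
    double a = -1.0, dt = 0.01, x = 1.0;  /* x(0) = 1 */
    int steps = 100;
    for (int i = 0; i < steps; i++)
        x = x + dt * a * x;               /* slope taken at x[i] */
    printf("euler: %f   exact: %f\n", x, exp(a * dt * steps));
    return 0;
}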

Second-Order ODE

- Assume the ODE is second order:

    d²x/dt² = f(dx/dt, x) = b·dx/dt + a·x

- Now take x1 = x and x2 = dx/dt. We get:

    dx2/dt = b·x2 + a·x1
    dx1/dt = x2

  which are two coupled first-order equations.

Any-Order ODE

We can generalize this and write it as dx/dt = f(x) = A·x, where x, dx/dt, and f are now vectors and A is a matrix. In the example above, with x = (x1, x2):

  A = ( 0  1 )
      ( a  b )

Solving ODEs: Explicit Methods

- Assume the ODE is dx/dt = f(x) = A·x, where A is a sparse matrix.
- Compute x(i·Δt) = x[i] at i = 0, 1, 2, ...
- Approximate dx(i·Δt)/dt by

    x[i+1] = x[i] + Δt·slope

  where x and dx/dt are now vectors.
- Explicit methods, e.g., (forward) Euler's method, approximate dx/dt = A·x by

    (x[i+1] - x[i]) / Δt = A·x[i]

  i.e.,

    x[i+1] = x[i] + Δt·A·x[i]

  which is a sparse matrix-vector multiplication.
- Trade-offs:
  - simple algorithm: sparse matrix-vector multiply
  - stability problems: may need to take very small time steps, especially if the system is stiff (i.e., can change rapidly)

Solving ODEs: Implicit Methods

- Assume the ODE is dx/dt = f(x) = A·x, where A is a sparse matrix.
- Compute x(i·Δt) = x[i] at i = 0, 1, 2, ...
- Approximate dx(i·Δt)/dt by

    x[i+1] = x[i] + Δt·slope

  using the slope at x[i+1].
- Implicit methods, e.g., the backward Euler solver, approximate dx/dt = A·x by

    (x[i+1] - x[i]) / Δt = A·x[i+1]

  i.e., we need to solve a sparse linear system:

    (I - Δt·A)·x[i+1] = x[i]

- Trade-offs:
  - a larger time step is possible, especially for stiff problems
  - more difficult algorithm: we need to do a sparse solve at each step (see the scalar sketch below)
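A backward Euler sketch in C for the same scalar ODE dx/dt = a·x, where the "sparse solve" collapses to a single division, (1 - Δt·a)·x[i+1] = x[i]; the stiff values of a and dt are illustrative, chosen so that forward Euler would blow up:

#include <math.h>
#include <stdio.h>

int main(void) {
    /* stiff scalar problem: |a * dt| >> 1 */
    double a = -1000.0, dt = 0.1, x = 1.0;
    int steps = 10;
    for (int i = 0; i < steps; i++)
        x = x / (1.0 - dt * a);   /* solve (1 - dt*a)*x[i+1] = x[i] */
    printf("backward euler: %g   exact: %g\n", x, exp(a * dt * steps));
    return 0;
}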

Solving ODEs

- Explicit methods to compute solution(t):
  - e.g., Euler's method
  - simple algorithm: sparse matrix-vector multiply
  - may need to take very small time steps, especially if the system is stiff (i.e., can change rapidly)
- Implicit methods to compute solution(t):
  - e.g., backward Euler's method
  - larger time steps, especially for stiff problems
  - more difficult algorithm: solve a sparse linear system
- All of these reduce to sparse matrix problems:
  - explicit: sparse matrix-vector multiplication
  - implicit: solve a sparse linear system
    - iterative solvers use sparse matrix-vector multiplication

Parallel Sparse Matrix-Vector Multiplication

Compute y = A*x, where A is a sparse n x n matrix.

(Figure: rows of A distributed over processors P0..P3, with row i stored as index/value pairs "i: [j1,v1], [j2,v2], ..."; the partitioning of x is the most problematic part.)

Questions:
- which processors store y[i], x[i], and A[i,j]?
- which processors compute
    y[i] = sum (from j=1 to n) of A[i,j] * x[j]
         = (row i of A) * x, a sparse dot product?

Partitioning:
- Partition the index set {1,...,n} = N1 + N2 + ... + Np.
- For all i in Nk, processor k stores y[i], x[i], and row i of A.
- For all i in Nk, processor k computes y[i] = (row i of A) * x
  ("owner computes" rule: processor k computes the y[i]'s it owns).
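A sketch of the local computation in C, using the compressed sparse row (CSR) format that matches the "i: [j1,v1], [j2,v2], ..." picture; in the parallel version, the x[j] values for nonzero columns owned by other processors must be communicated before this loop runs:

/* y = A*x for the locally owned rows, with A in CSR format: row i holds
 * entries val[k], col_idx[k] for k in [row_ptr[i], row_ptr[i+1]). */
void spmv_csr(int n_local, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y) {
    for (int i = 0; i < n_local; i++) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; k++)
            sum += val[k] * x[col_idx[k]];   /* sparse dot product */
        y[i] = sum;                          /* owner computes rule */
    }
}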

Example

(Figure: a sparse 9 x 9 matrix-vector product y = A*x, with the nonzeros of A marked "x"; each y[i] combines only the x[j]'s in the nonzero columns of row i.)

Matrix Reordering

- The ideal matrix structure for parallelism is block diagonal:
  - p (number of processors) blocks, which can all be computed locally
  - few nonzeros outside these blocks, which require communication
- Can we reorder the rows/columns to achieve this?

(Figure: a permuted matrix with diagonal blocks owned by P0..P4.)

Effects of Reordering [1]

(Figure: the nonzero pattern of a small 4 x 4 matrix before and after reordering its rows and columns.)

Effects of Reordering [2]

(Figure: the nonzero pattern of a second small 4 x 4 example before and after a different reordering.)

Goals of Reordering

- Performance goals:
  - balance load
  - balance storage
  - minimize communication
- Some algorithms reorder for other reasons:
  - reduce the number of nonzeros in the answer (fill)
  - improve numerical properties

Graph Partitioning and Sparse Matrices

There is a direct relationship between a sparse matrix and a graph: each row/column is a node, and each off-diagonal nonzero A[i,j] is an edge between nodes i and j.

(Figure: a 6 x 6 symmetric sparse matrix and the corresponding 6-node graph.)

A good partition of the graph has:
- an equal number of (weighted) nodes in each part (load balance)
- a minimum number of edges crossing between parts (minimize communication)

We can reorder the rows/columns of the matrix by putting all the nodes of one partition together.

Implicit Methods and Eigenproblems

- Direct methods (Gaussian elimination):
  - called LU decomposition, because we factor A = L*U
  - future lectures will consider both dense and sparse cases
  - more complicated than sparse matrix-vector multiplication
- Iterative solvers:
  - will discuss several of these later
  - Jacobi, Successive Over-Relaxation (SOR), Conjugate Gradient (CG), Multigrid, ...
  - most have sparse matrix-vector multiplication as their kernel
- Eigenproblems:
  - also depend on sparse matrix-vector multiplication and on direct methods

Summary for Systems of ODEs

- Computation:
  - sparse matrix-vector operations
- Communication:
  - the pattern may be arbitrary, depending on the graph
  - there is no spatial discretization, so no natural grid exists
- Important considerations for parallelization:
  - these are problems without meshes that are solved with techniques from linear algebra
  - it is not trivial to assign different parts of the graph optimally to different processes

Partial Differential Equations


Continuous Variables, Continuous Parameters

Examples of such systems include:
- Parabolic (time-dependent) problems:
  - heat flow: Temperature(position, time)
  - diffusion: Concentration(position, time)
- Elliptic (steady-state) problems:
  - electrostatic or gravitational potential: Potential(position)
- Hyperbolic problems (waves):
  - quantum mechanics: Wave-function(position, time)

Many problems combine features of the above:
- fluid flow: Velocity, Pressure, Density(position, time)
- elasticity: Stress, Strain(position, time)

Terminology

The terms hyperbolic, parabolic, and elliptic come from special cases of the general form of a second-order linear PDE:

  a·∂²u/∂x² + b·∂²u/∂x∂t + c·∂²u/∂t² + d·∂u/∂x + e·∂u/∂t + f = 0

where t is time, in analogy to the solutions of the general quadratic equation

  a·x² + b·xy + c·y² + d·x + e·y + f = 0

Notation:

  ∇ ≡ (∂/∂x1, ∂/∂x2, ∂/∂x3)

Example: Flow Problems

The Navier-Stokes equations, with v = velocity vector and Re = Reynolds number:

  ∂v/∂t = (1/Re)·∇²v - (v·∇)v - ∇p + f
  ∇·v = 0

The terms are, in order: diffusion, advection, pressure, and external forces; the second equation expresses zero divergence (incompressibility).

Courtesy: R. Westermann, TU Munich

Example: Deriving the Heat Equation

(Figure: a one-dimensional bar, with points x-h, x, and x+h marked.)

Consider a simple problem:
- a bar of uniform material, insulated except at its ends
- let u(x,t) be the temperature at position x at time t
- heat flows between neighboring points at a rate proportional to the temperature differences:

    du(x,t)/dt = C · [ (u(x-h,t) - u(x,t))/h - (u(x,t) - u(x+h,t))/h ] / h

As h -> 0, we get the heat equation:

  ∂u(x,t)/∂t = C · ∂²u(x,t)/∂x²

Details of the Explicit Method for Heat

From experimentation (physical observation) we have:

  ∂u(x,t)/∂t = ∂²u(x,t)/∂x²    (assume C = 1 for simplicity)

Discretize time and space and use the explicit approach (as described for ODEs) to approximate the derivative:

  (u(x,t+1) - u(x,t)) / Δt = (u(x-h,t) - 2·u(x,t) + u(x+h,t)) / h²

  u(x,t+1) = u(x,t) + (Δt/h²)·(u(x-h,t) - 2·u(x,t) + u(x+h,t))

Let z = Δt/h²:

  u(x,t+1) = z·u(x-h,t) + (1-2z)·u(x,t) + z·u(x+h,t)

By changing variables (x to j and t to i):

  u[j,i+1] = z·u[j-1,i] + (1-2z)·u[j,i] + z·u[j+1,i]

Explicit Solution of the Heat Equation

- Use finite differences, with u[j,i] as the heat at
  - time t = i*Δt (i = 0, 1, 2, ...) and position x = j*h (j = 0, 1, ..., N = 1/h)
  - initial conditions on u[j,0]
  - boundary conditions on u[0,i] and u[N,i]
- At each time step i = 0, 1, 2, ...
  - for j = 1 to N-1:
      u[j,i+1] = z*u[j-1,i] + (1-2*z)*u[j,i] + z*u[j+1,i]
  - with z = Δt/h²
- This corresponds to:
  - a matrix-vector multiply
  - nearest neighbors on the grid (see the sketch below)

(Figure: the space-time grid of u[j,i] for t = 0..5, with u[0,0]..u[5,0] as the initial row; each point is computed from its three lower neighbors.)
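A sketch of this time-stepping loop in C; N, the fixed boundary handling, and the copy-back are assumptions of the sketch, and z must satisfy z <= 0.5 for stability (see the instability discussion below):

#define N 100   /* grid points j = 0..N; h = 1/N */

/* March the 1D heat equation forward `steps` time steps with z = dt/h^2.
 * u[0] and u[N] are boundary values and stay fixed. */
void heat_explicit(double u[N + 1], int steps, double z) {
    double unew[N + 1];
    for (int i = 0; i < steps; i++) {
        unew[0] = u[0];
        unew[N] = u[N];
        for (int j = 1; j < N; j++)  /* 3-point stencil = tridiagonal matvec */
            unew[j] = z * u[j - 1] + (1 - 2 * z) * u[j] + z * u[j + 1];
        for (int j = 0; j <= N; j++)
            u[j] = unew[j];          /* advance one time step */
    }
}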

Stencil Template

Each step multiplies by a tridiagonal matrix:

  L = ( 1-2z   z                  )
      (  z    1-2z   z            )
      (        z    1-2z   z      )
      (              z    1-2z    )

  u[:,i+1] = L · u[:,i]

(Figure: the corresponding graph and 3-point stencil.)

For a 2D mesh (5-point stencil), the matrix is pentadiagonal (more on the matrix/grid views later).

Parallelism in the Explicit Method for PDEs

- Partitioning the space (x) into p equal chunks gives:
  - good load balance (assuming a large number of points relative to p)
  - minimized communication (only the boundaries of the p chunks)
- This generalizes to:
  - multiple dimensions
  - arbitrary graphs (= sparse matrices)
- Problem with the explicit approach: numerical instability
  - the solution blows up eventually if z = Δt/h² > 0.5
  - we need to make the time steps very small when h is small: Δt < 0.5·h²

Instability in Solving the Heat Equation Explicitly

(Figure: an example of an unstable explicit solution.)

Implicit Solution of the Heat Equation

Discretize time and space and use the implicit approach (backward Euler) to approximate the derivative:

  (u(x,t+1) - u(x,t)) / Δt = (u(x-h,t+1) - 2·u(x,t+1) + u(x+h,t+1)) / h²

  u(x,t) = u(x,t+1) - (Δt/h²)·(u(x-h,t+1) - 2·u(x,t+1) + u(x+h,t+1))

Let z = Δt/h² and change variables (x to j and t to i):

  (I + z·L)·u[:,i+1] = u[:,i]

where I is the identity matrix and L is the tridiagonal Laplacian, with 2 on the diagonal and -1 on the sub- and superdiagonals:

  L = (  2  -1          )
      ( -1   2  -1      )
      (     -1   2  -1  )
      (         -1   2  )

Implicit Solution

The previous slide used backward Euler, but using the trapezoidal rule (Crank-Nicolson) gives better numerical properties. This turns into solving the following equation:

  (I + (z/2)·L)·u[:,i+1] = (I - (z/2)·L)·u[:,i]

Here I is the identity matrix and L is the same tridiagonal Laplacian as before (2 on the diagonal, -1 on the sub- and superdiagonals; graph and 3-point stencil as shown earlier), i.e., we are essentially solving Poisson's equation in 1D.
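Because (I + (z/2)·L) is tridiagonal, each implicit step in 1D can be solved directly in O(N) with the Thomas algorithm; a hedged, sequential C99 sketch, assuming the system is diagonally dominant (as it is here) so no pivoting is needed:

/* Solve a tridiagonal system: a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].
 * a[0] and c[n-1] are unused. On return, d holds the solution x. */
void thomas(int n, const double *a, const double *b, const double *c,
            double *d) {
    double cp[n];                       /* scratch: modified superdiagonal */
    cp[0] = c[0] / b[0];
    d[0]  = d[0] / b[0];
    for (int i = 1; i < n; i++) {       /* forward elimination */
        double m = b[i] - a[i] * cp[i - 1];
        cp[i] = c[i] / m;
        d[i]  = (d[i] - a[i] * d[i - 1]) / m;
    }
    for (int i = n - 2; i >= 0; i--)    /* back substitution */
        d[i] -= cp[i] * d[i + 1];
}

Note the sequential recurrence along the grid: this is part of why implicit steps are harder to parallelize than the explicit stencil, motivating the solver discussion that follows.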

2D Implicit Method

Similar to the 1D case, but the matrix L is now the 2D Laplacian: each row has 4 on the diagonal and -1 in the columns of its (up to) four grid neighbors, giving the familiar 5-point stencil.

(Figure: the banded nonzero structure of L for a small 2D grid, with the corresponding graph and 5-point stencil.)

Multiplying by this matrix (as in the explicit case) is simply a nearest-neighbor computation on the 2D grid. To solve this system, there are several techniques.

Solvers for Poisson 2D

- There are many types of numerical solvers. Each has different properties regarding:
  - parallel time complexity
  - number of processors required
  - storage requirements
- Which one to pick depends heavily on the matrix structure and on the dynamic range of the element values (more details in coming lectures).

Summary of Approaches to Solving PDEs

- As with ODEs, either explicit or implicit approaches are possible:
  - explicit: sparse matrix-vector multiplication
  - implicit: sparse matrix solve at each step
    - direct solvers are hard (more on this later)
    - iterative solvers turn into sparse matrix-vector multiplication
- Grid and sparse matrix correspondence:
  - sparse matrix-vector multiplication is nearest-neighbor averaging on the underlying mesh
- Not all nearest-neighbor computations have the same efficiency:
  - factors are the mesh structure (nonzero structure) and the number of calculations per mesh point

Lab Assignments [1]

- Poisson equation in 2D:
  - regular grid
  - no time dependence; various algorithms (Red-Black Gauss-Seidel, CG)
  - static regular partitioning of the grid
- Finite elements in 2D:
  - irregular grid
  - no time dependence; various algorithms (Red-Black Gauss-Seidel, CG)
  - static irregular partitioning of the grid
  - grid generation and adjustment, load imbalance
- Interacting particles in 2D:
  - no grid
  - time stepper (Verlet algorithm, explicit)
  - particle-in-cell method for efficient parallelization
  - dynamic repartitioning of particles over processes

Lab Assignments [2]

- All software is written in C:
  - no emphasis on language constructs
- All parallel programs use the MPI library for communication.
- All programs run on the DAS-3.
- Goals:
  - understand the structure of a message-passing program
  - build a parallel program
  - investigation and optimization of performance are important:
    - What is the program doing all the time?
    - Why do various phases take so long?
    - How does the execution time change as the problem size goes up?
    - How do you compare the efficiency of one algorithm with that of another?

Comments on Practical Meshes

- Regular 1D, 2D, and 3D meshes:
  - important as building blocks for more complicated meshes
- Practical meshes are often irregular:
  - composite meshes, consisting of multiple bent regular meshes joined at the edges
  - unstructured meshes, with arbitrary mesh points and connectivities
  - adaptive meshes, which change resolution during the solution process to put computational effort where it is needed

Parallelism in Regular Meshes

- Computing a stencil on a regular mesh:
  - we need to communicate mesh points near the boundary to neighboring processors
  - this is often done with ghost regions, which adds memory overhead
  - the surface-to-volume ratio keeps communication down, but it may still be problematic in practice

Adaptive Mesh Refinement (AMR)

(Figure: an adaptive mesh around an explosion.)

- Refinement is done by calculating error estimates
- Parallelism:
  - mostly between patches, dealt to processors for load balance
  - some parallelism may be exploited within a patch (SMP)

Adaptive Mesh

(Figure: shock waves in gas dynamics, computed using AMR (Adaptive Mesh Refinement).)

See: http://www.llnl.gov/CASC/SAMRAI/

Composite Mesh from a Mechanical Structure

(Figure: a composite mesh of a mechanical structure.)

Converting the Mesh to a Matrix

(Figure: an irregular mesh and the sparsity pattern of the corresponding matrix.)

Effects of Reordering on Gaussian Elimination

(Figure: the effect of reordering on the fill-in produced by Gaussian elimination.)

Irregular Mesh: NASA Airfoil in 2D

(Figure: a 2D unstructured triangular mesh around a NASA airfoil.)

Challenges of Irregular Meshes (and a Few Solutions)

- How to generate them in the first place:
  - Triangle, a 2D mesh generator by Jonathan Shewchuk
  - 3D is harder!
- How to partition them:
  - ParMetis, a parallel graph partitioner
- How to design iterative solvers:
  - PETSc, a Portable Extensible Toolkit for Scientific Computing
  - Prometheus, a multigrid solver for finite element problems on irregular meshes
- How to design direct solvers:
  - SuperLU, parallel sparse Gaussian elimination
- These are challenges even sequentially, and more so in parallel.

Summary of PDEs

- Computation:
  - sparse matrix-vector operations
  - generation/calculation and storage of the matrix elements
- Communication:
  - partitioning: mesh or matrix
- Problems may be regular/irregular and dynamic/static
- Important for this class:
  - problems with meshes that need to be solved with clever techniques
  - how to assign different parts of the mesh to different processes

Solving ODEs: Eigensolvers

- Computing modes of vibration means finding eigenvalues and eigenvectors.
- Seek a solution of d²x/dt² = A·x of the form x(t) = sin(f·t)·x0, where x0 is a constant vector.
- Plugging this in gives -f²·x0 = A·x0, so that -f² is an eigenvalue and x0 is an eigenvector of A.
- Solution schemes reduce either to sparse matrix-vector multiplication or to solving sparse linear systems (a sketch of the simplest scheme follows below).
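A sketch of power iteration, the simplest eigensolver built from the same matrix-vector kernel as everything else here; matvec() is a placeholder (e.g., the CSR multiply sketched earlier), and the fixed iteration count is an assumption of the sketch:

#include <math.h>

/* Power iteration: repeatedly apply A and renormalize. x must start
 * nonzero; on return it approximates the dominant eigenvector and the
 * returned value approximates the magnitude of the largest eigenvalue. */
double power_iteration(int n,
                       void (*matvec)(int n, const double *x, double *y),
                       double *x, int iters) {
    double lambda = 0.0;
    for (int it = 0; it < iters; it++) {
        double y[n];                       /* C99 VLA, fine for a sketch */
        matvec(n, x, y);                   /* y = A*x: the sparse kernel */
        double norm = 0.0;
        for (int i = 0; i < n; i++) norm += y[i] * y[i];
        norm = sqrt(norm);
        for (int i = 0; i < n; i++) x[i] = y[i] / norm;
        lambda = norm;                     /* ||A*x|| -> |lambda_max| */
    }
    return lambda;
}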

Scheduling Asynchronous Circuit Simulation

Optimization strategies when a process is ready to go to the next event, but is not sure whether another event that must be processed earlier will still arrive:

- Conservative:
  - only simulate up to the minimum time stamp of the inputs
- Speculative:
  - assume no new inputs will arrive and keep simulating, instead of waiting
  - may need to back up if the assumption proves wrong

Speculation is more relevant for compiler optimization than for the design of parallel algorithms.

Example: Circuit Simulation

Circuits are simulated at many different levels:

  Level                          Primitives                     Examples
  Instruction level              Instructions                   SimOS, SPIM
  Cycle level                    Functional units               VIRAM-p
  Register Transfer Level (RTL)  Register, counter, MUX         VHDL
  Gate level                     Gate, flip-flop, memory cell   Thor
  Switch level                   Ideal transistor               Cosmos
  Circuit level                  Resistors, capacitors, etc.    Spice
  Device level                   Electrons, silicon

Relation of Poisson to Gravity, Electrostatics

The Poisson equation arises in many problems:
- E.g., the force on a particle at (x,y,z) due to a particle at the origin is -(x,y,z)/r³, where r = sqrt(x² + y² + z²).
- The force is also the gradient of the potential V = -1/r:
    force = -(d/dx V, d/dy V, d/dz V) = -grad V
- V satisfies Poisson's equation (try working this out; a check follows below).
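As a quick check of the "try working this out" exercise, a short derivation (in LaTeX) showing that V = -1/r is harmonic away from the origin, i.e., satisfies Poisson's equation with zero right-hand side there:

% V = -1/r with r = (x^2 + y^2 + z^2)^{1/2}
\frac{\partial V}{\partial x} = \frac{x}{r^3},
\qquad
\frac{\partial^2 V}{\partial x^2} = \frac{1}{r^3} - \frac{3x^2}{r^5}
% and symmetrically in y and z, so summing the three second derivatives:
\nabla^2 V = \frac{3}{r^3} - \frac{3(x^2 + y^2 + z^2)}{r^5} = 0 \quad (r > 0).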

Algorithms for the 2D Poisson Equation (size N)

  Algorithm      Serial     PRAM           Memory    #Procs
  Dense LU       N^3        N              N^2       N^2
  Band LU        N^2        N              N^(3/2)   N
  Jacobi         N^2        N              N         N
  Explicit Inv.  N^2        log N          N^2       N^2
  Conj. Grad.    N^(3/2)    N^(1/2) log N  N         N
  RB SOR         N^(3/2)    N^(1/2)        N         N
  Sparse LU      N^(3/2)    N^(1/2)        N log N   N
  FFT            N log N    log N          N         N
  Multigrid      N          log^2 N        N         N
  Lower bound    N          log N          N         N

PRAM is an idealized parallel model with zero-cost communication.
Reference: James Demmel, Applied Numerical Linear Algebra, SIAM, 1997.

Overview of Algorithms

Sorted in two orders (roughly):
- from slowest to fastest on sequential machines
- from most general (works on any matrix) to most specialized (works on matrices like L)

- Dense LU: Gaussian elimination; works on any N-by-N matrix.
- Band LU: exploits the fact that L is nonzero only on the sqrt(N) diagonals nearest the main diagonal.
- Jacobi: essentially does a matrix-vector multiply by L in the inner loop of an iterative algorithm.
- Explicit Inverse: assume we want to solve many systems with L, so we can precompute and store inv(L) "for free" and just multiply by it (but this is still expensive).
- Conjugate Gradient: uses matrix-vector multiplication, like Jacobi, but exploits mathematical properties of L that Jacobi does not.
- Red-Black SOR (successive over-relaxation): a variation of Jacobi that exploits yet different mathematical properties of L. Used in multigrid schemes.
- Sparse LU: Gaussian elimination exploiting the particular zero structure of L.
- FFT (fast Fourier transform): works only on matrices very like L.
- Multigrid: also works on matrices like L that come from elliptic PDEs.
- Lower bound: serial (time to print the answer); parallel (time to combine N inputs).

Details in the class notes and at www.cs.berkeley.edu/~demmel/ma221.
