Computational Science
Boyana Norris
Argonne National Laboratory
http://www.mcs.anl.gov/~norris
Outline
Automatic differentiation
Applications in optimization
How AD works
Ice thickness for the standard (left) and tuned (right) parameter
values, with actual observations at two locations indicated.
March 15, 2005 6
Optimization Problems
Often we look for extreme, or optimum, values that a
function has on a given domain. More formally:
Example: Minimum Surface
Find the surface with the minimal area that satisfies Dirichlet boundary
conditions and is constrained to lie above a solid plate.
[Figure panels: Solution, Error]
Example: Minimum Surface (Cont.)
[Figure panels: Solution, Error]
We can compute derivatives via:
Analytic code
By hand
Automatic differentiation
Numerical approximation: finite differencing (FD).
For finite differences, recall:
$$\frac{df}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
$$\frac{d^2 f}{dx^2} \approx \frac{f(x+h) - 2 f(x) + f(x-h)}{h^2}$$
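The finite-difference formulas above can be checked numerically. The following is a minimal sketch, not part of the original slides: the helper names `forward_difference` and `central_second_difference` and the choice of sin as the test function are assumptions made for illustration.

```python
import math

def forward_difference(f, x, h):
    """First-derivative estimate (f(x+h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

def central_second_difference(f, x, h):
    """Second-derivative estimate (f(x+h) - 2 f(x) + f(x-h)) / h**2."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

x = math.pi / 4.0
for h in (1e-1, 1e-3, 1e-5):
    d1 = forward_difference(math.sin, x, h)
    d2 = central_second_difference(math.sin, x, h)
    # Compare against the exact derivatives cos(x) and -sin(x).
    print(h, abs(d1 - math.cos(x)), abs(d2 + math.sin(x)))
```

Shrinking h reduces the truncation error but eventually amplifies rounding error, so the step size has to be tuned for each problem.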
Why use AD?
$$\bigl(f(g(x))\bigr)' = g'(x)\, f'(g(x))$$
$$\bigl(f_1(f_2(\cdots(f_N(x))\cdots))\bigr)' = f_N'(x)\, f_{N-1}'(f_N(x))\, f_{N-2}'(f_{N-1}(f_N(x))) \cdots f_1'(f_2(\cdots(f_N(x))\cdots))$$
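The chain-rule product above can be accumulated stage by stage while the composition is evaluated. The sketch below assumes a hypothetical three-stage composition exp(sin(x^2)); the `stages` list and the function `value_and_derivative` are illustrative names, not taken from the slides.

```python
import math

# Hypothetical composition f1(f2(f3(x))) with f1 = exp, f2 = sin, f3 = x**2.
# Each entry pairs an elementary function with its known derivative.
stages = [
    (lambda v: v * v, lambda v: 2.0 * v),  # f3, applied first
    (math.sin, math.cos),                  # f2
    (math.exp, math.exp),                  # f1, applied last
]

def value_and_derivative(x):
    """Evaluate the composition and accumulate the chain-rule product."""
    value, deriv = x, 1.0
    for f, df in stages:
        deriv *= df(value)  # derivative of this stage at its own input
        value = f(value)
    return value, deriv

y, dy = value_and_derivative(0.5)
print(y, dy)
```

Each factor is evaluated at the input of its own stage, which matches the nesting of arguments in the product formula.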
A Simple Example (Fortran)
Original program:

    x = 3.14159265/4.0
    a = sin(x)
    b = cos(x)
    t = a/b

Differentiated program:

    x = 3.14159265/4.0
    dxdx = 1.0                      ! Initialize "seed matrix"
    a = sin(x)
    dadx = cos(x)*dxdx              ! TL/CR
    b = cos(x)
    dbdx = -sin(x)*dxdx             ! TL/CR
    t = a/b
    dtda = 1.0/b                    ! TL
    dtdb = -a/(b*b)                 ! TL
    dtdx = dtda*dadx + dtdb*dbdx    ! CR

Key
dtdx: dt/dx
CR: Chain rule
TL: Table lookup
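The differentiated program above can also be obtained by letting each variable carry its derivative alongside its value. The sketch below uses operator overloading in Python as a stand-in for the source-to-source approach shown on the slide; the `Dual` class and the helpers `dsin` and `dcos` are hypothetical names introduced for illustration.

```python
import math

class Dual:
    """A value paired with its derivative with respect to one independent variable."""
    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __truediv__(self, other):
        # Quotient rule: d(a/b) = da/b - a*db/b**2
        return Dual(self.val / other.val,
                    self.der / other.val
                    - self.val * other.der / (other.val * other.val))

def dsin(u):
    # Table lookup (d sin = cos) combined with the chain rule.
    return Dual(math.sin(u.val), math.cos(u.val) * u.der)

def dcos(u):
    # Table lookup (d cos = -sin) combined with the chain rule.
    return Dual(math.cos(u.val), -math.sin(u.val) * u.der)

x = Dual(math.pi / 4.0, 1.0)  # seed: dx/dx = 1
a = dsin(x)
b = dcos(x)
t = a / b
print(t.val, t.der)  # tan(x) and its derivative 1/cos(x)**2
```

The seed value 1.0 plays the role of `dxdx` in the Fortran version, and `t.der` corresponds to `dtdx`.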
Modes of AD
Forward mode
Mode used in simple example
Propagates derivative vectors, often denoted ∇u or g_u
Derivative vector ∇u contains derivatives of u with respect to
independent variables
Time and storage proportional to vector length (# indeps)
Reverse (or adjoint) mode
Propagates adjoints, denoted ū or u_bar
Adjoint ū contains derivatives of dependent variables with
respect to u
Propagation starts with dependent variables—must reverse
flow of computation
Time proportional to adjoint vector length (# dependents)
Storage proportional to number of operations
Because of this limitation, often applied to subprograms
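To make the reverse-mode description concrete, here is a minimal sketch that records every operation on a tape and then sweeps the tape backwards to propagate adjoints. It is an illustration under stated assumptions, not how any particular AD tool is implemented; the names `Var`, `tape`, `record`, `div`, and `backward` are hypothetical.

```python
import math

class Var:
    """Node in a recorded computation; `parents` holds (node, local partial) pairs."""
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents
        self.bar = 0.0  # adjoint: d(output)/d(this node)

tape = []  # nodes in the order they were computed

def record(val, parents=()):
    node = Var(val, parents)
    tape.append(node)
    return node

def sin(u):
    return record(math.sin(u.val), ((u, math.cos(u.val)),))

def cos(u):
    return record(math.cos(u.val), ((u, -math.sin(u.val)),))

def div(u, v):
    return record(u.val / v.val,
                  ((u, 1.0 / v.val), (v, -u.val / (v.val * v.val))))

def backward(output):
    """Seed the output adjoint, then sweep the tape in reverse order."""
    output.bar = 1.0
    for node in reversed(tape):
        for parent, partial in node.parents:
            parent.bar += partial * node.bar

x = record(math.pi / 4.0)
t = div(sin(x), cos(x))
backward(t)
print(t.val, x.bar)  # tan(x) and dt/dx
```

The tape grows by one entry per operation, which is the storage cost noted above; a single backward sweep yields the derivative of the one dependent variable with respect to every recorded node.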
[Figure: AD tool workflow. Labels: Control Files, AD Tool, AD Support Libraries, User's Derivative Driver, Compile & Link, Derivative Program]
[Figure labels: var_2, var_3; mappings for $f(g(x))$ and $g(x)$ involving dimensions $n$, $\tilde{n}$, $m$, $\tilde{m}$]
[Figure: parallel derivative computation with PETSc and AD]
Global-to-local scatter of ghost values
Seed matrix initialization
ADIFOR or ADIC
Parallel Hessian assembly
Callout: Coded manually; can be automated
Legend: User code, PETSc code, AD-generated code
Outline
Automatic differentiation
Components for scientific computing
Introduction
Example applications
Performance evaluation and modeling
Summary
CCA
Common Component Architecture
[Figure labels: Architectures, Components, Data Redistribution, Parallel I/O, Aerodynamics, Fusion, Science, Industry]
CCA Delivers Performance
Local
No CCA overhead within components
Small overhead between components
Small overhead for language interoperability
Be aware of costs & design with them in mind
Small costs, easily amortized
Parallel
No CCA overhead on parallel computing
Use your favorite parallel programming model
Supports SPMD and MPMD approaches
Distributed (remote)
No CCA overhead; performance depends on networks, protocols
CCA frameworks support OGSA/Grid Services/Web Services and other approaches
[Figure: Maximum 0.2% overhead for CCA vs native C++ code for parallel molecular dynamics up to 170 CPUs]
[Figure: Aggregate time for linear solver component in unconstrained minimization problem w/ PETSc]
Overhead from Component Invocation
Invoke a component with different arguments
Array
Complex
Double Complex
Compare with f77 method invocation
Environment
500 MHz Pentium III
Linux 2.4.18
GCC 2.95.4-15
Components took about 3X longer
Ensure granularity is appropriate!
Paper by Bernholdt, Elwasif, Kohl and Epperly

Function arg type    f77      Component
Array                80 ns    224 ns
Complex              75 ns    209 ns
Double complex       86 ns    241 ns
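The granularity advice can be made quantitative with a little arithmetic. The sketch below takes the array-argument timings reported above (80 ns for an f77 call, 224 ns for a component call) and estimates how much the extra call cost matters for different amounts of work per call. The work sizes and the function name `relative_overhead` are hypothetical, chosen only for illustration.

```python
# Per-call costs for an array argument, taken from the measurements above (ns).
F77_CALL_NS = 80.0
COMPONENT_CALL_NS = 224.0

def relative_overhead(work_ns):
    """Extra component-call cost as a fraction of the baseline (work plus f77 call)."""
    extra = COMPONENT_CALL_NS - F77_CALL_NS
    return extra / (work_ns + F77_CALL_NS)

# Hypothetical amounts of work done per call, in nanoseconds.
for work_ns in (100.0, 10_000.0, 1_000_000.0):
    percent = 100.0 * relative_overhead(work_ns)
    print(f"{work_ns:>12.0f} ns of work per call: {percent:.3f}% overhead")
```

Under these assumptions the extra cost dominates when a call does only about 100 ns of work, and becomes negligible once each call does on the order of a millisecond of work.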
Language interoperability: what is so hard?
[Figure: pairwise bindings among f77, C, f90, C++, Python, and Java. Binding labels: Native, cfortran.h, SWIG, JNI, Siloon, Chasm, Platform Dependent]
SIDL/Babel makes all supported languages peers
[Figure: f77, C, f90, C++, Python, and Java connected as peers]
This is not a Lowest Common Denominator Solution!
CCA Concepts: Components and Ports
Components provide or use one or more ports
Components include some code which interacts with a CCA framework
Frameworks provide services, such as component instantiation and
port connection
[Figure: component labels Optimization Algorithm, Objective Function, Function Gradient; port labels OptimizerPort, FunctionPort, GradientPort, HessianPort]
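The provides/uses relationship between components and ports can be sketched in a few lines. This is a toy model, not the CCA API: the port names come from the slide, while the `Framework` class, its `instantiate` and `connect` methods, the `provides`/`uses` dictionaries, and the quadratic objective are all assumptions made for illustration.

```python
class Framework:
    """Toy stand-in for a framework: instantiates components and connects ports."""
    def __init__(self):
        self.components = {}

    def instantiate(self, name, component_class):
        component = component_class()
        self.components[name] = component
        return component

    def connect(self, user_name, port_name, provider_name):
        provider = self.components[provider_name]
        self.components[user_name].uses[port_name] = provider.provides[port_name]

class ObjectiveFunction:
    """Provides a FunctionPort and a GradientPort for f(x) = (x - 3)**2."""
    def __init__(self):
        self.uses = {}
        self.provides = {
            "FunctionPort": lambda x: (x - 3.0) ** 2,
            "GradientPort": lambda x: 2.0 * (x - 3.0),
        }

class OptimizationAlgorithm:
    """Uses a GradientPort; provides an OptimizerPort (plain gradient descent)."""
    def __init__(self):
        self.uses = {}
        self.provides = {"OptimizerPort": self.minimize}

    def minimize(self, x0, step=0.1, iterations=200):
        gradient = self.uses["GradientPort"]
        x = x0
        for _ in range(iterations):
            x -= step * gradient(x)
        return x

framework = Framework()
framework.instantiate("objective", ObjectiveFunction)
framework.instantiate("optimizer", OptimizationAlgorithm)
framework.connect("optimizer", "GradientPort", "objective")
x_min = framework.components["optimizer"].provides["OptimizerPort"](0.0)
print(x_min)
```

The optimizer never refers to the objective component directly; it only sees whatever was connected to its GradientPort, so a different provider can be swapped in without changing the optimizer.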
[Figure legend: Reused, TAO Solver, Driver/Physics]
Long term
Is a more organized (but not too restrictive)
environment for scientific software lifecycle
development possible/desirable?
Multimethod linear solver components
[Figure: two component assemblies. Labels: Physics, Mesh, Nonlinear Solver, Linear Solver, Linear Solver B, Linear Solver C, Performance Monitor, Checkpointing]
AD as Component Factory
Both NEOS and PETSc rely on a well-defined function interface in order to
provide derivatives via AD
Extend this idea to components
[Figure labels: Function, AD Tool, Jacobian]
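One way to picture the component-factory idea is a function that takes a component with a well-defined function interface and returns a new component that computes its Jacobian. The sketch below is only an illustration: it uses forward-mode operator overloading in place of a source-transformation AD tool, and the names `Dual`, `FunctionComponent`, `evaluate`, and `make_jacobian_component` are hypothetical.

```python
class Dual:
    """A value paired with its derivative with respect to one seeded input."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

class FunctionComponent:
    """Hypothetical component exposing a well-defined function interface."""
    def evaluate(self, x):
        # f(x0, x1) = (x0*x1 + x0, x1*x1)
        return [x[0] * x[1] + x[0], x[1] * x[1]]

def make_jacobian_component(function_component):
    """Factory: derive a Jacobian component from a function component."""
    class JacobianComponent:
        def evaluate(self, x):
            columns = []
            for j in range(len(x)):
                # Seed the j-th independent variable with derivative 1.
                seeded = [Dual(v, 1.0 if i == j else 0.0)
                          for i, v in enumerate(x)]
                outputs = function_component.evaluate(seeded)
                columns.append([out.der for out in outputs])
            # Transpose so rows index outputs and columns index inputs.
            return [list(row) for row in zip(*columns)]
    return JacobianComponent()

jacobian = make_jacobian_component(FunctionComponent()).evaluate([2.0, 3.0])
print(jacobian)
```

The factory needs to know only the function interface, not the internals of the component, which is the property the slide attributes to the NEOS and PETSc use of AD.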