
Original Title: Automatic Differentiation


Håvard Berland

Department of Mathematical Sciences, NTNU


Abstract

Automatic differentiation is introduced to an audience with basic mathematical prerequisites. Numerical examples show the deficiency of divided differences, and dual numbers serve to introduce the algebra, being one example of how to derive automatic differentiation. An example with forward mode is given first, and source transformation and operator overloading are illustrated. Then reverse mode is briefly sketched, followed by some discussion.

(45 minute talk)


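Automatic differentiation (AD) is software to transform code for one function into code for the derivative of the function.

[Figure: automatic differentiation turns the code f(x) {...}; into the code df(x) {...};, in parallel with the mathematical step from $y = f(x)$ to $y' = f'(x)$. The label "human programmer" marks the step between the formulas and the code.]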

Scientific code often uses both functions and their derivatives, for example Newton's method for solving (nonlinear) equations: find $x$ such that $f(x) = 0$. The Newton iteration is

$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}$

But how to compute $f'(x_n)$ when we only know $f(x)$? Symbolic differentiation? Divided difference? Automatic differentiation? Yes.
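The Newton iteration can be sketched in a few lines of C++. This is only an illustration: the test problem $f(x) = x^2 - 2$, the names `example_f`, `example_df` and `newton`, and the stopping rule are assumptions made here, not part of the talk. The hand-coded derivative `example_df` is exactly the piece one would like a tool to produce.

```cpp
#include <cmath>

// Assumed example problem: f(x) = x^2 - 2, whose positive root is sqrt(2).
double example_f(double x) { return x * x - 2.0; }

// Hand-coded derivative f'(x) = 2x; this is what AD would generate for us.
double example_df(double x) { return 2.0 * x; }

// Newton iteration x_{n+1} = x_n - f(x_n) / f'(x_n).
// Stops when the update is smaller than tol or after max_iter steps.
double newton(double (*f)(double), double (*df)(double),
              double x0, double tol = 1e-12, int max_iter = 50) {
    double x = x0;
    for (int n = 0; n < max_iter; ++n) {
        double step = f(x) / df(x);  // assumes f'(x) != 0 along the way
        x -= step;
        if (std::fabs(step) < tol) break;
    }
    return x;
}
```

Calling `newton(example_f, example_df, 1.0)` starts the iteration at 1 and should approach the square root of 2.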


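Divided differences

[Figure: graph of $f$ with the points $x$ and $x + h$ and the values $f(x)$ and $f(x + h)$ marked; the labels "approximate" and "exact" mark the two slopes being compared.]

[Figure: plot of the divided-difference error, $\text{error} = \left| \frac{f(x + h) - f(x)}{h} - 3x^2 \right|$, where $3x^2$ is the exact derivative. Plot labels: "useless accuracy", "finite precision", "formula error", "centered difference".]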
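The two error sources named in the plot can be reproduced with a small experiment. The sketch below assumes the test function $f(x) = x^3$ (suggested by the exact derivative $3x^2$ in the error formula); the function names and the choice of step sizes are illustrative assumptions.

```cpp
#include <cmath>

// Assumed test function f(x) = x^3 and its exact derivative 3x^2.
double cube(double x) { return x * x * x; }
double cube_derivative(double x) { return 3.0 * x * x; }

// One-sided divided difference (f(x + h) - f(x)) / h.
double forward_difference(double (*f)(double), double x, double h) {
    return (f(x + h) - f(x)) / h;
}

// Centered difference (f(x + h) - f(x - h)) / (2h).
double centered_difference(double (*f)(double), double x, double h) {
    return (f(x + h) - f(x - h)) / (2.0 * h);
}

// Absolute errors against the exact derivative.
double forward_error(double x, double h) {
    return std::fabs(forward_difference(cube, x, h) - cube_derivative(x));
}
double centered_error(double x, double h) {
    return std::fabs(centered_difference(cube, x, h) - cube_derivative(x));
}
```

For a moderate step such as `h = 1e-4` the formula error dominates and the centered difference is the more accurate of the two. For a very small step such as `h = 1e-15` finite precision takes over and the forward difference loses most of its accuracy.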

Dual numbers

Extend all numbers by adding a second component, $x \mapsto x + \dot{x} d$. Here $d$ is just a symbol distinguishing the second component, analogous to the imaginary unit $i = \sqrt{-1}$. But let $d^2 = 0$, as opposed to $i^2 = -1$.

Arithmetic on dual numbers:

$(x + \dot{x} d) + (y + \dot{y} d) = x + y + (\dot{x} + \dot{y}) d$

$(x + \dot{x} d)(y + \dot{y} d) = xy + x\dot{y} d + \dot{x} y d + \dot{x}\dot{y} d^2 = xy + (x\dot{y} + \dot{x} y) d$, since $d^2 = 0$

$-(x + \dot{x} d) = -x - \dot{x} d, \qquad \frac{1}{x + \dot{x} d} = \frac{1}{x} - \frac{\dot{x}}{x^2} d \quad (x \neq 0)$
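These rules translate almost literally into code. The following is a minimal sketch, assuming a plain struct named `Dual` with fields `val` and `dot` (both names are chosen here for illustration).

```cpp
// A dual number val + dot * d, with d * d = 0.
struct Dual {
    double val;  // ordinary component x
    double dot;  // dual component, the coefficient of d
};

// (x + x'd) + (y + y'd) = (x + y) + (x' + y')d
Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }

// (x + x'd)(y + y'd) = xy + (x y' + x' y)d, the d^2 term vanishes
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.val * b.dot + a.dot * b.val};
}

// -(x + x'd) = -x - x'd
Dual operator-(Dual a) { return {-a.val, -a.dot}; }

// 1 / (x + x'd) = 1/x - (x' / x^2) d, valid only for x != 0
Dual reciprocal(Dual a) { return {1.0 / a.val, -a.dot / (a.val * a.val)}; }
```

Multiplying `Dual{3.0, 1.0}` by itself gives the value 9 in `val` and 6 in `dot`, which is the derivative of $x^2$ at $x = 3$.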


Let $P(x) = p_0 + p_1 x + p_2 x^2 + \cdots + p_n x^n$ and extend $x$ to a dual number $x + \dot{x} d$. Then,

$P(x + \dot{x} d) = p_0 + p_1 (x + \dot{x} d) + \cdots + p_n (x + \dot{x} d)^n$
$\quad = p_0 + p_1 x + p_2 x^2 + \cdots + p_n x^n + p_1 \dot{x} d + 2 p_2 x \dot{x} d + \cdots + n p_n x^{n-1} \dot{x} d$
$\quad = P(x) + P'(x) \dot{x} d$

$\dot{x}$ may be chosen arbitrarily, so choose $\dot{x} = 1$ (currently). The second component is then the derivative of $P(x)$ at $x$.
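As a quick check of this identity, take the hypothetical polynomial $P(x) = 1 + 2x + 3x^2$ (chosen here only for illustration) at $x = 2$ with $\dot{x} = 1$:

```latex
\begin{align*}
P(2 + d) &= 1 + 2(2 + d) + 3(2 + d)^2 \\
         &= 1 + 4 + 2d + 3(4 + 4d + d^2) \\
         &= 17 + 14d \qquad (d^2 = 0), \\
P'(2)    &= 2 + 6 \cdot 2 = 14 .
\end{align*}
```

The first component is $P(2) = 17$ and the dual component agrees with $P'(2) = 14$.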


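Similarly, one may derive

$\sin(x + \dot{x} d) = \sin(x) + \cos(x) \dot{x} d$

$\cos(x + \dot{x} d) = \cos(x) - \sin(x) \dot{x} d$

$e^{x + \dot{x} d} = e^x + e^x \dot{x} d$

$\log(x + \dot{x} d) = \log(x) + \frac{\dot{x}}{x} d, \quad x \neq 0$

$\sqrt{x + \dot{x} d} = \sqrt{x} + \frac{\dot{x}}{2\sqrt{x}} d, \quad x \neq 0$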
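Each of these formulas becomes a one-line function on dual numbers. The sketch below is an assumption-laden illustration: the namespace `ad` and the struct `Dual` are names invented here, and no domain checks are performed.

```cpp
#include <cmath>

namespace ad {

// A dual number val + dot * d, with d * d = 0.
struct Dual {
    double val;
    double dot;
};

// sin(x + x'd) = sin(x) + cos(x) x' d
Dual sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }

// cos(x + x'd) = cos(x) - sin(x) x' d
Dual cos(Dual a) { return {std::cos(a.val), -std::sin(a.val) * a.dot}; }

// exp(x + x'd) = exp(x) + exp(x) x' d
Dual exp(Dual a) {
    double e = std::exp(a.val);
    return {e, e * a.dot};
}

// log(x + x'd) = log(x) + (x' / x) d, requires x != 0
Dual log(Dual a) { return {std::log(a.val), a.dot / a.val}; }

// sqrt(x + x'd) = sqrt(x) + x' / (2 sqrt(x)) d, requires x != 0
Dual sqrt(Dual a) {
    double r = std::sqrt(a.val);
    return {r, a.dot / (2.0 * r)};
}

}  // namespace ad
```

With the seed `dot = 1`, the `dot` field of each result is the derivative of the corresponding elementary function at `val`.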


Derived from dual numbers: a function applied to a dual number returns its derivative in the second (dual) component. We can extend this to functions of many variables by introducing more dual components:

$f(x_1, x_2) = x_1 x_2 + \sin(x_1)$

extends to

$f(x_1 + \dot{x}_1 d_1, x_2 + \dot{x}_2 d_2) = (x_1 + \dot{x}_1 d_1)(x_2 + \dot{x}_2 d_2) + \sin(x_1 + \dot{x}_1 d_1)$
$\quad = x_1 x_2 + \sin(x_1) + (x_2 + \cos(x_1)) \dot{x}_1 d_1 + x_1 \dot{x}_2 d_2$

where $d_i d_j = 0$.
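A number with two dual components carries both partial derivatives through a single evaluation. The sketch below is one possible realization, with the hypothetical names `Dual2`, `dual_sin` and `gradient_f`; it hard-codes two components rather than a general $n$.

```cpp
#include <cmath>

// val + g1 * d1 + g2 * d2, with d_i * d_j = 0 for all i, j.
struct Dual2 {
    double val;  // function value
    double g1;   // coefficient of d1 (partial derivative w.r.t. x1)
    double g2;   // coefficient of d2 (partial derivative w.r.t. x2)
};

Dual2 operator+(Dual2 a, Dual2 b) {
    return {a.val + b.val, a.g1 + b.g1, a.g2 + b.g2};
}

// Product rule applied to each dual component separately.
Dual2 operator*(Dual2 a, Dual2 b) {
    return {a.val * b.val,
            a.val * b.g1 + a.g1 * b.val,
            a.val * b.g2 + a.g2 * b.val};
}

Dual2 dual_sin(Dual2 a) {
    double c = std::cos(a.val);
    return {std::sin(a.val), c * a.g1, c * a.g2};
}

// f(x1, x2) = x1 * x2 + sin(x1), evaluated on extended numbers.
Dual2 f(Dual2 x1, Dual2 x2) { return x1 * x2 + dual_sin(x1); }

// Seed x1 with d1 and x2 with d2, so g1 and g2 hold the gradient.
Dual2 gradient_f(double x1, double x2) {
    return f({x1, 1.0, 0.0}, {x2, 0.0, 1.0});
}
```

`gradient_f(x1, x2)` returns the value in `val`, $x_2 + \cos(x_1)$ in `g1`, and $x_1$ in `g2`, matching the expansion above.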


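Computer code for $f(x_1, x_2) = x_1 x_2 + \sin(x_1)$ might read

Original program        Dual program
$w_1 = x_1$             $\dot{w}_1 = 0$
$w_2 = x_2$             $\dot{w}_2 = 1$
$w_3 = w_1 w_2$         $\dot{w}_3 = \dot{w}_1 w_2 + w_1 \dot{w}_2 = 0 \cdot x_2 + x_1 \cdot 1 = x_1$
$w_4 = \sin(w_1)$       $\dot{w}_4 = \cos(w_1) \dot{w}_1 = \cos(x_1) \cdot 0 = 0$
$w_5 = w_3 + w_4$       $\dot{w}_5 = \dot{w}_3 + \dot{w}_4 = x_1 + 0 = x_1$

and $\frac{\partial f}{\partial x_2} = x_1$.

The chain rule

$\frac{\partial f}{\partial x_2} = \frac{\partial f}{\partial w_5} \frac{\partial w_5}{\partial w_3} \frac{\partial w_3}{\partial w_2} \frac{\partial w_2}{\partial x_2}$

ensures that we can propagate the dual components throughout the computation.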

Our current procedure:
1. Decompose original code into intrinsic functions
2. Differentiate the intrinsic functions, effectively symbolically
3. Multiply together according to the chain rule

How to automatically transform the original program into the dual program? Two approaches:
- Source code transformation (C, Fortran 77)
- Operator overloading (C++, Fortran 90)

function.c

    #include <math.h>

    double f(double x1, double x2) {
        double w3, w4, w5;
        w3 = x1 * x2;
        w4 = sin(x1);
        w5 = w3 + w4;
        return w5;
    }

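diff_function.c

    #include <math.h>

    /* The caller supplies df, because returning a pointer to a local
       array is undefined behaviour in C.
       df[0] receives the value, df[1] the derivative along (dx1, dx2). */
    void f(double x1, double x2, double dx1, double dx2, double df[2]) {
        double w3, w4, w5, dw3, dw4, dw5;
        w3 = x1 * x2;   dw3 = dx1 * x2 + x1 * dx2;
        w4 = sin(x1);   dw4 = cos(x1) * dx1;
        w5 = w3 + w4;   dw5 = dw3 + dw4;
        df[0] = w5;
        df[1] = dw5;
    }

[Figure: function.c is transformed into diff_function.c, which the C compiler turns into diff_function.o.]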

Operator overloading

function.c++

    Number f(Number x1, Number x2) {
        Number w3 = x1 * x2;
        Number w4 = sin(x1);
        Number w5 = w3 + w4;
        return w5;
    }
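For this function to compile, `Number` has to supply the overloaded operators and `sin`. The talk does not show that class, so the version below is only a guess at a minimal one: the accessor names `value()` and `derivative()` and the two-argument constructor are assumptions.

```cpp
#include <cmath>

// Minimal dual-number class: a value plus one derivative component.
class Number {
public:
    Number(double value, double derivative = 0.0)
        : val_(value), dot_(derivative) {}

    double value() const { return val_; }
    double derivative() const { return dot_; }

    friend Number operator+(const Number& a, const Number& b) {
        return Number(a.val_ + b.val_, a.dot_ + b.dot_);
    }
    // Product rule in the derivative component.
    friend Number operator*(const Number& a, const Number& b) {
        return Number(a.val_ * b.val_, a.val_ * b.dot_ + a.dot_ * b.val_);
    }
    // Chain rule for sin; found by argument-dependent lookup.
    friend Number sin(const Number& a) {
        return Number(std::sin(a.val_), std::cos(a.val_) * a.dot_);
    }

private:
    double val_;
    double dot_;
};

// The function from the slide, unchanged apart from the declarations.
Number f(Number x1, Number x2) {
    Number w3 = x1 * x2;
    Number w4 = sin(x1);
    Number w5 = w3 + w4;
    return w5;
}
```

Seeding the second argument, as in `f(Number(2.0, 0.0), Number(3.0, 1.0))`, makes `derivative()` return the partial derivative with respect to $x_2$, which is $x_1 = 2$. Seeding the first argument instead gives $x_2 + \cos(x_1)$.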


Source code transformation:
- Possible in all computer languages
- Can be applied to your old legacy Fortran/C code
- Allows easier compile time optimizations
- Source code swell
- More difficult to code the AD tool

Operator overloading:
- No changes in your original code
- Flexible when you change your code or tool
- Easy to code the AD tool
- Only possible in selected languages
- Current compilers lag behind, code runs slower

Forward mode AD

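We have until now only described forward mode AD. Repetition of the procedure using the computational graph:

[Figure: computational graph of $f(x_1, x_2)$ with forward propagation of derivative values, from the inputs up to the result.]

- Inputs $x_1$, $x_2$ with seeds $\dot{w}_1, \dot{w}_2 \in \{0, 1\}$
- Node $w_3$ (product): $\dot{w}_3 = \dot{w}_1 w_2 + w_1 \dot{w}_2$
- Node $w_4$ (sin): $\dot{w}_4 = \cos(w_1) \dot{w}_1$
- Node $w_5$ (sum), the value of $f(x_1, x_2)$: $\dot{w}_5 = \dot{w}_3 + \dot{w}_4$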

Reverse mode AD

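The chain rule works in both directions. The computational graph is now traversed from the top.

[Figure: computational graph of $f(x_1, x_2)$ with backward propagation of derivative values, from the result down to the inputs.]

- Node $w_5$ (sum): $\bar{f} = \bar{w}_5 = 1$ (seed)
- Node $w_4$ (sin): $\bar{w}_4 = \bar{w}_5 \frac{\partial w_5}{\partial w_4} = \bar{w}_5 \cdot 1$
- Node $w_3$ (product): $\bar{w}_3 = \bar{w}_5 \frac{\partial w_5}{\partial w_3} = \bar{w}_5 \cdot 1$
- Edge from sin to $w_1$: $\bar{w}_1^a = \bar{w}_4 \cos(w_1)$
- Edge from product to $w_1$: $\bar{w}_1^b = \bar{w}_3 w_2$
- Edge from product to $w_2$: $\bar{w}_2 = \bar{w}_3 \frac{\partial w_3}{\partial w_2} = \bar{w}_3 w_1$
- Input $x_1$: $\bar{x}_1 = \bar{w}_1^a + \bar{w}_1^b = \cos(x_1) + x_2$
- Input $x_2$: $\bar{x}_2 = \bar{w}_2 = x_1$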
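Written out by hand for this one function, the reverse sweep looks as follows. It is a sketch only: the struct `Gradient` and the function name `reverse_mode_f` are invented here, and a real tool would record the intermediate values on a tape instead of using named local variables.

```cpp
#include <cmath>

struct Gradient {
    double value;  // f(x1, x2)
    double dx1;    // partial derivative w.r.t. x1
    double dx2;    // partial derivative w.r.t. x2
};

Gradient reverse_mode_f(double x1, double x2) {
    // Forward sweep: evaluate f and keep the intermediate values.
    double w1 = x1;
    double w2 = x2;
    double w3 = w1 * w2;
    double w4 = std::sin(w1);
    double w5 = w3 + w4;

    // Reverse sweep: propagate adjoints from the result to the inputs.
    double w5_bar = 1.0;                      // seed
    double w4_bar = w5_bar * 1.0;             // dw5/dw4 = 1
    double w3_bar = w5_bar * 1.0;             // dw5/dw3 = 1
    double w2_bar = w3_bar * w1;              // dw3/dw2 = w1
    double w1_bar_a = w4_bar * std::cos(w1);  // contribution through sin
    double w1_bar_b = w3_bar * w2;            // contribution through product

    return {w5, w1_bar_a + w1_bar_b, w2_bar};
}
```

A single call yields both partial derivatives, $\cos(x_1) + x_2$ and $x_1$, whereas forward mode with one dual component needs one sweep per input.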

Jacobian computation

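Given $F : \mathbb{R}^n \to \mathbb{R}^m$ and the Jacobian $J = DF(x) \in \mathbb{R}^{m \times n}$,

$J = DF(x) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} & \cdots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1} & \cdots & \frac{\partial f_m}{\partial x_n} \end{pmatrix}$

One sweep of forward mode can calculate one column vector of the Jacobian, $J \dot{x}$, where $\dot{x}$ is a column vector of seeds.

One sweep of reverse mode can calculate one row vector of the Jacobian, $\bar{y} J$, where $\bar{y}$ is a row vector of seeds.

The computational cost of one sweep, forward or reverse, is roughly equivalent, but reverse mode requires access to intermediate variables, requiring more memory.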
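The column-by-column construction with forward mode can be sketched as below. The map $F(x_1, x_2) = (x_1 x_2, \sin(x_1) + x_2)$ is a hypothetical example chosen for this illustration, as are the names `Dual`, `dual_sin`, `F` and `jacobian`; the seed vector is a unit vector, so each sweep returns one column of $J$.

```cpp
#include <array>
#include <cmath>

// A dual number val + dot * d, with d * d = 0.
struct Dual {
    double val;
    double dot;
};

Dual operator+(Dual a, Dual b) { return {a.val + b.val, a.dot + b.dot}; }
Dual operator*(Dual a, Dual b) {
    return {a.val * b.val, a.val * b.dot + a.dot * b.val};
}
Dual dual_sin(Dual a) { return {std::sin(a.val), std::cos(a.val) * a.dot}; }

// Hypothetical example map F(x1, x2) = (x1 * x2, sin(x1) + x2).
std::array<Dual, 2> F(Dual x1, Dual x2) {
    std::array<Dual, 2> y;
    y[0] = x1 * x2;
    y[1] = dual_sin(x1) + x2;
    return y;
}

// J[i][j] = dF_i / dx_j, built with one forward sweep per column j.
std::array<std::array<double, 2>, 2> jacobian(double x1, double x2) {
    std::array<std::array<double, 2>, 2> J{};
    for (int j = 0; j < 2; ++j) {
        Dual a{x1, j == 0 ? 1.0 : 0.0};  // seed is the j-th unit vector
        Dual b{x2, j == 1 ? 1.0 : 0.0};
        std::array<Dual, 2> y = F(a, b);
        for (int i = 0; i < 2; ++i) J[i][j] = y[i].dot;
    }
    return J;
}
```

Here $n$ sweeps are needed for $n$ inputs, regardless of the number of outputs $m$. Reverse mode would instead need one sweep per output row.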


Reverse mode AD is best suited for $F : \mathbb{R}^n \to \mathbb{R}$.

Forward mode AD is best suited for $G : \mathbb{R} \to \mathbb{R}^m$.

Forward and reverse mode represent just two possible (extreme) ways of recursing through the chain rule. For $n > 1$ and $m > 1$ there is a golden mean, but finding the optimal way is probably an NP-hard problem.

Discussion

- Accuracy is guaranteed: derivatives are exact up to floating-point rounding. Complexity is not worse than that of the original function, up to a constant factor.
- AD works on iterative solvers and on functions consisting of thousands of lines of code.
- AD is trivially generalized to higher derivatives. Hessians are used in some optimization algorithms. Complexity is quadratic in the highest derivative degree.
- The alternative to AD is usually symbolic differentiation, or rather using algorithms that do not rely on derivatives.
- Divided differences may be just as good as AD in cases where the underlying function is based on discrete or measured quantities, or is the result of stochastic simulations.

Applications of AD

- Newton's method for solving nonlinear equations
- Optimization (utilizing gradients/Hessians)
- Inverse problems/data assimilation
- Neural networks
- Solving stiff ODEs

For software and publication lists, visit www.autodiff.org

Recommended literature: Andreas Griewank: Evaluating Derivatives. SIAM 2000.
