Outline
Results Remarks
Motivation
Global carbon cycle, CO2 uptake of the world's oceans
Parameter estimation in biogeochemical models
[Figure: map of the simulated surface phosphate concentration; axes Longitude [degrees] and Latitude [degrees]]
Simulated concentration of nutrients (phosphate, PO4) at the surface layer in mmol P/m³. The longitudinal and latitudinal resolution is 1.0°.
Motivation contd
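Ocean Biogeochemical Dynamics [Sarmiento and Gruber, 2006]
System of transport equations for biogeochemical tracers:
$$\frac{\partial y_i}{\partial t} = \underbrace{\nabla \cdot (\kappa \nabla y_i)}_{\text{diffusion}} - \underbrace{\nabla \cdot (v\, y_i)}_{\text{advection}} + \underbrace{q_i(y, u)}_{\text{reaction}}, \qquad i = 1, \dots$$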
Climatological (annual periodic) forcing
Solution is a steady annual periodic state (equilibrium)
Involves integration over thousands of model years
Motivation contd
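Transport Matrix Method [Khatiwala et al., 2005]
Discretized and approximated problem:
$$y_{j+1} = A_{\mathrm{imp},j}\,\bigl(A_{\mathrm{exp},j}\, y_j + \Delta t\, q_j(y_j, u)\bigr), \qquad j = 0, \dots$$
with transport matrices $A_{\mathrm{imp},j}$ (implicit) and $A_{\mathrm{exp},j}$ (explicit)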
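Monthly averaged matrices provided
Interpolation needed:
$$A_{\mathrm{imp},j} = \alpha_j A_{\mathrm{i}}[i_{\alpha,j}] + \beta_j A_{\mathrm{i}}[i_{\beta,j}], \qquad A_{\mathrm{exp},j} = \alpha_j A_{\mathrm{e}}[i_{\alpha,j}] + \beta_j A_{\mathrm{e}}[i_{\beta,j}]$$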
Algorithm
Assuming: 1 year = 360 days and $\Delta t = 3$ h, i.e. $360 \cdot 24 / 3 = 2880$ time steps per model year
$y = y_0$
repeat
    for $j = 1, \dots, 2880$ do
        evaluate biogeochemical model: $y_q = q_j(y, u)$
        interpolate matrices to time step $j$
        perform explicit step: $y = A_{\mathrm{exp},j}\, y$
        perform implicit step: $y = A_{\mathrm{imp},j}\,(y + \Delta t\, y_q)$
    end for
until steady annual cycle is reached
Operations
Interpolate matrices:
MatCopy(); MatScale(); MatAXPY();
Porting to GPU
Software:
Metos3D [Piwonski and Slawig, 2013], github.com/metos3d
PETSc-based implementation in C [Balay et al., 1997]
Using PETSc matrix and vector operations
Coupling biogeochemical models implemented in Fortran
Objectives:
Do not change the Fortran implementation at all
Do not change the C implementation, if possible
Biogeochemical model
Fortran implementation:
Obtained a PGI CUDA Fortran compiler license
Using a wrapper file model.CUF with Fortran kernels
Including the original source through: #include "model.F"
Using a macro to change subroutine to attributes(device) subroutine
Model Driver
Objective: Do not change the software implementation
Translates to: Change the library beneath
PETSc-dev:
GPU-enabled PETSc version
MatMult() is already implemented [Minden et al., 2010]
MatCopy(), MatScale(), MatAXPY() have to be added
PETSc contd
PETSc classes
Basic PETSc principles: object-oriented programming using the C language:
Data encapsulation
Polymorphism
Inheritance
PETSc contd
Class Mat
Operations:
struct MatOps {
    ...
    PetscErrorCode (*mult)(Mat,Vec,Vec);
    PetscErrorCode (*copy)(Mat,Mat,...);
    PetscErrorCode (*scale)(Mat,PetscScalar);
    PetscErrorCode (*axpy)(Mat,PetscScalar,Mat,...);
    ...
};
Results
GPU: GeForce GTX 480
CPU: Intel Xeon E5520, running at 2.27 GHz
Biogeochemical model: N-DOP [Dutkiewicz et al., 2005]
             Min        Max        Avg        StdDev
CPU          621.43 s   626.79 s   622.14 s   0.540 s
GPU          28.17 s    28.20 s    28.18 s    0.003 s
CPU : GPU    22.06
Overall performance gain simulating one model year using the N-DOP model at a longitudinal and latitudinal resolution of 2.8125°.
Results contd
Performance gain per operation:
Routine    CPU        GPU       CPU : GPU
BGCStep    469.76 s   13.05 s   36.00
MatCopy    34.04 s    3.91 s    8.70
MatScale   23.33 s    1.99 s    11.70
MatAXPY    37.49 s    2.89 s    12.96
MatMult    58.19 s    5.87 s    9.92
Results contd
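[Figure: pie charts of the computational time per operation (BGCStep, MatMult, MatCopy, MatScale, MatAXPY, Other); panel "N-DOP model, CPU (626.07 s/a)" and the corresponding GPU run; BGCStep takes 75.0% of the CPU time and 46.0% of the GPU time]
Distribution of computational time per operation during the simulation of one model year.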
E. Siewertsen, J. Piwonski, T. Slawig (CAU), GPU accelerated PETSc, 19 March 2013
Results contd
[Figure: simulation of one cycle (N-DOP, 2.8125°, 2880 time steps) over processor count, for 2.10 GHz AMD Barcelona (rzcluster), 2.67 GHz Intel Westmere (rzcluster) and 2.93 GHz Intel Gainestown (HLRN) processors, compared with a GeForce GTX 480]
Remarks
Only sequential PETSc operations were implemented
Ongoing work on a multi-GPU implementation
Thanks