
Prediction and Microscopic Understanding of Materials for Renewable Energy Conversion and Sustainable Manufacturing


Code Performance and Scaling Document
1. Introduction

In our research, we use commercial or open-source scientific packages as is. The parallelization of the codes that we use is well studied, and general information about their scaling performance can be found in the following references for VASP, QE, BerkeleyGW, and LAMMPS. While determining our computational requirements, we made sure that the parameters and resources we use per job comply with the general guidelines in those references. Additionally, we performed individual tests for the systems that we studied. In general, the VASP and QE codes are scalable up to 256 cores, while the BerkeleyGW and LAMMPS codes have been shown to scale up to 1000 cores on various systems containing a comparable number of atoms.
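As an illustration of how we evaluate such scaling tests, the short Python sketch below computes speedup and parallel efficiency relative to a baseline core count; the function name and the wall times in it are placeholders for illustration only, not measurements from the benchmarks reported below.

# Minimal strong-scaling helper (illustrative; the wall times below are
# placeholders, not measurements from the benchmarks in this document).

def scaling_metrics(timings, base_cores=None):
    """Return {cores: (speedup, parallel efficiency)} relative to the
    smallest (or given) core count, from wall times per step."""
    base = base_cores if base_cores is not None else min(timings)
    base_time = timings[base]
    results = {}
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        efficiency = speedup / (cores / base)
        results[cores] = (speedup, efficiency)
    return results

if __name__ == "__main__":
    # Hypothetical wall times (seconds per step) at several core counts.
    example = {16: 1.00, 32: 0.55, 64: 0.35, 128: 0.28}
    for cores, (s, e) in scaling_metrics(example).items():
        print(f"{cores:4d} cores: speedup {s:5.2f}x, efficiency {e:6.1%}")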

2. Scaling performance of individual projects


2.1. Nanoporous Graphene for Ethanol/Water Separation
Figure 1 shows a detailed performance test on Stampede as a function of the number of CPU cores; we found that the performance of the ReaxFF simulations scales poorly as the number of cores increases. In particular, a larger number of cores (i.e., > 32 CPU cores) does not significantly speed up the rGO formation simulations. As a result, 16-32 CPU cores will be used for these calculations to maximize the utilization of the XSEDE (Stampede) resource while still obtaining a decent computation speed.
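For reference, the two quantities plotted in Figure 1 (computing time per MD step and CPU time per MD step) can be derived from a fixed-length benchmark run as in the minimal Python sketch below; the helper function and the timing values are illustrative assumptions, not the measured Stampede data.

# Sketch of how the two quantities plotted in Figure 1 can be obtained from
# a fixed-length benchmark run (numbers below are placeholders, not the
# measured Stampede data).

def per_step_cost(wall_time_s, n_steps, n_cores):
    """Return (computing time per MD step in s, CPU time per MD step in core*s)."""
    time_per_step = wall_time_s / n_steps
    cpu_time_per_step = time_per_step * n_cores
    return time_per_step, cpu_time_per_step

if __name__ == "__main__":
    # Hypothetical wall-clock times (s) for a 1000-step ReaxFF benchmark.
    runs = {4: 70.0, 8: 40.0, 16: 25.0, 32: 18.0, 64: 16.0}
    for cores in sorted(runs):
        t, cpu = per_step_cost(runs[cores], 1000, cores)
        print(f"{cores:3d} cores: {t:.4f} s/step, {cpu:6.2f} core*s/step")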

[Figure 1 plot: computing time (s) per MD step (left y-axis, blue) and CPU time (core·s) per MD step (right y-axis, green) versus the number of CPU cores on Stampede, for runs on 4-512 cores.]

Figure 1. Performance scaling test of the rGO formation simulation on Stampede. Blue spheres (left y-axis) indicate the computational time per MD step as a function of the number of CPU cores. The result clearly shows non-linear scaling behavior and, in particular, that the simulation speed scales poorly when more than 32 cores are used. Green spheres (right y-axis) show the CPU time per MD step as a function of the number of CPU cores, suggesting that the use of 32 CPU cores is the most effective way to utilize the XSEDE resource while still obtaining a decent calculation speed.
2.2. Functionalized Graphene and COF for Thermoelectric Applications
To obtain the electronic structure of a system containing about 300 atoms, we need to relax the structure first and then perform band-structure calculations on a dense k-grid, which will be done with the VASP package. In previous years, we finished simulations for the first prototype system: a graphene sheet passivated by two alternating types of ligands (donor: H-aryne, acceptor: F-aryne). As shown in Figure 2, parallelization beyond 5 nodes is inefficient, so no more than 5 nodes will be used for most calculations. The number of nodes will only be increased slightly when a large amount of memory is required, such as in simulations with particularly dense k-grids. With 5 nodes, each SCF loop for our prototype system took 0.51 hours, and the geometry optimization finished after 323 SCF loops with a total of 137 hours. The subsequent calculations of the density of states, band structure, and charge density with a hybrid functional (each of these being a multi-step calculation) cost 92, 141, and 46 hours, respectively. Furthermore, we calculated the carrier mobility by deformation theory, which requires a very dense k-point sampling to correctly capture the energy variation with respect to lattice stretching. Each single-point energy calculation cost about 18 hours, and 10 samples with different lattice constants were considered.
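As a sketch of how such a lattice-stretching scan can be set up, the short Python script below generates strained copies of a VASP POSCAR file; it assumes fractional (Direct) coordinates so that the atoms follow the lattice, and the in-plane strain pattern, file names, and strain range are illustrative assumptions rather than the settings actually used in this project.

# Hedged sketch of generating the 10 lattice-stretched samples for the
# deformation-theory mobility calculation.  Assumes a VASP POSCAR written
# with fractional ("Direct") coordinates, so atoms follow the lattice; the
# in-plane strain pattern and the strain range are illustrative assumptions,
# not the settings actually used in this project.

def stretch_lattice(poscar_lines, strain):
    """Return POSCAR lines with the in-plane lattice vectors a1, a2 scaled
    by (1 + strain).  Lines 2 and 3 (0-based) hold a1 and a2."""
    out = list(poscar_lines)
    for i in (2, 3):
        vec = [float(v) * (1.0 + strain) for v in poscar_lines[i].split()[:3]]
        out[i] = "  {:.10f}  {:.10f}  {:.10f}\n".format(*vec)
    return out

if __name__ == "__main__":
    with open("POSCAR") as f:          # unstrained reference structure
        template = f.readlines()
    strains = [-0.02 + 0.004 * k for k in range(10)]   # 10 samples, -2% to +1.6%
    for k, s in enumerate(strains):
        with open(f"POSCAR_strain_{k:02d}", "w") as f:
            f.writelines(stretch_lattice(template, s))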

Figure 2. Scaling information for 1 SCF loop in the DFT calculation of the prototype system using the VASP package.
2.3. First Principles Calculations of Electronic Structure of PbS Quantum Dots
From the scaling test, we observed that VASP simulations of PbS quantum dot systems on Stampede offer highly efficient parallelism. They scale well up to 768 processors, with a speedup of 3.86x relative to 48 processors. All of the relevant scaling details are summarized and plotted in Figure 3 below. Note that this scaling test was done for the smallest sample in terms of the number of atoms (or electrons), since we used bare PbS quantum dots (~134 atoms). In most cases, however, we plan to use ligand-passivated PbS quantum dots that consist of up to 700 atoms. Also note that the wall time reported here is the CPU time for only one SCF loop (24 electronic steps). Hence, the actual wall time will be significantly larger because thousands of ionic relaxation steps are expected.
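To illustrate how such an extrapolation from a one-SCF-loop benchmark can be made, the minimal Python sketch below converts a per-SCF-loop wall time into total wall-clock and CPU-hour estimates; the function name and all numbers in it are hypothetical placeholders, not the measured Stampede values.

# Rough CPU-hour estimate from a one-SCF-loop benchmark (the numbers below
# are illustrative placeholders, not the measured Stampede values).

def estimate_resources(wall_per_scf_s, n_cores, n_ionic_steps, scf_loops_per_ionic_step=1):
    """Return (total wall-clock hours, total CPU hours) for a relaxation run."""
    total_wall_h = wall_per_scf_s * scf_loops_per_ionic_step * n_ionic_steps / 3600.0
    total_cpu_h = total_wall_h * n_cores
    return total_wall_h, total_cpu_h

if __name__ == "__main__":
    # Hypothetical: 300 s per SCF loop on 768 cores, ~2000 ionic steps.
    wall_h, cpu_h = estimate_resources(300.0, 768, 2000)
    print(f"~{wall_h:.0f} wall-clock hours, ~{cpu_h:.0f} CPU hours")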
Figure 3. Scaling test result on Stampede. The upper, middle, and lower plots show the wall time (in sec), total CPU hours (in hr), and speedup. The core counts tested on Stampede are 48, 96, 192, 384, 768, 1532, 3064, and 6144. The wall time here represents the CPU time used for only one SCF loop (24 electronic steps). The geometry is a 132-atom PbS system (isolated quantum dot with a vacuum region of >15 Å in all x, y, and z directions). Other notable VASP settings in the INCAR are (1) ENCUT = 400 eV, (2) LREAL = A, and (3) NPAR = # of nodes used.
