In our research, we use commercial or open-source scientific packages as is. The parallelization of the codes we use is well studied, and general information about their scaling performance can be found in the references below for VASP, QE, BerkeleyGW, and LAMMPS. When determining our computational requirements, we verified that the parameters and resources we use per job comply with the general guidance in those references. Additionally, we performed individual tests for the systems we studied. In general, the VASP and QE codes are scalable up to 256 cores, while BerkeleyGW and LAMMPS have been shown to scale up to 1000 cores on different systems containing a comparable number of atoms.
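As a rough illustration of how such per-job scaling checks can be carried out, the speedup and parallel efficiency relative to the smallest run can be computed directly from measured timings. The core counts and wall times below are invented placeholders, not our measurements:

```python
# Hypothetical sketch: strong-scaling speedup and efficiency from timing runs.
# All numbers here are illustrative, not measured values from our tests.
def parallel_efficiency(cores, wall_times):
    """Speedup and efficiency of each run relative to the smallest core count."""
    base_cores, base_time = cores[0], wall_times[0]
    results = []
    for c, t in zip(cores, wall_times):
        speedup = base_time / t
        efficiency = speedup / (c / base_cores)  # 1.0 means ideal scaling
        results.append((c, speedup, efficiency))
    return results

# Efficiency typically degrades as the core count grows:
cores = [64, 128, 256, 512]
times = [100.0, 55.0, 32.0, 25.0]  # seconds per step (made up)
for c, s, e in parallel_efficiency(cores, times):
    print(f"{c:4d} cores: speedup {s:5.2f}, efficiency {e:5.2f}")
```

A cutoff such as "scalable up to 256 cores" then corresponds to the core count beyond which this efficiency drops below an acceptable threshold.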
[Figure 1 plot: time per MD step (left y-axis) and CPU time per MD step (right y-axis) versus the number of CPU cores on Stampede, with data points at 4, 8, 16, 32, 64, and 512 cores.]
Figure 1. Performance scaling test of the rGO formation simulation on Stampede. Blue spheres (left y-axis) indicate the computational time per MD step as a function of the number of CPU cores. The result clearly shows non-linear scaling behavior; in particular, the simulation speed scales poorly when more than 32 cores are used. Green spheres (right y-axis) show the CPU time per MD step as a function of the number of CPU cores, suggesting that 32 CPU cores is the most effective way to utilize the XSEDE resource while still obtaining a decent calculation speed.
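The selection rule described in the caption (minimize CPU time per MD step while keeping the wall time per step acceptable) can be sketched as follows; the timing numbers are illustrative stand-ins, not values read off Figure 1:

```python
# Hypothetical sketch of the core-count selection rule: pick the core count
# that minimizes CPU cost per MD step (cores x wall time per step), subject
# to a cap on the wall time per step so the simulation stays fast enough.
def best_core_count(cores, time_per_step, max_time_per_step):
    candidates = [(c * t, c) for c, t in zip(cores, time_per_step)
                  if t <= max_time_per_step]
    cpu_cost, best = min(candidates)  # smallest CPU time per step wins
    return best

cores = [4, 8, 16, 32, 64, 512]
t_step = [0.08, 0.045, 0.025, 0.015, 0.012, 0.010]  # s/step, made up
print(best_core_count(cores, t_step, max_time_per_step=0.02))  # -> 32
```

With these made-up timings, 32 cores is the cheapest option that still meets the speed requirement, mirroring the trade-off shown in the figure.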
2.2. Functionalized Graphene and COF for Thermoelectric Applications
To obtain the electronic structure of a system containing about 300 atoms, we first need to relax the structure and then perform band-structure calculations on a dense k-grid; both steps will be carried out with the VASP package. In previous years, we finished simulations for the first prototype system: a graphene sheet passivated by two alternating types of ligands (donor: H-aryne, acceptor: F-aryne). As shown in Figure 2, parallelization beyond 5 nodes is inefficient, so fewer than 5 nodes will be used for most calculations. The number of nodes will be increased only slightly when large memory is required, such as in simulations with particularly dense k-grids. With 5 nodes, each SCF loop for our prototype system took 0.51 hours, and the geometry optimization finished after 323 SCF loops for a total of 137 hours. The calculations for density of states, band structure, and charge density with a hybrid functional (each of these is a multiple-step calculation) then cost 92, 141, and 46 hours, respectively. Furthermore, we calculated the carrier mobility by deformation potential theory, which requires very dense k-point sampling to correctly capture the energy variation with respect to lattice stretching. Each single-point energy calculation cost about 18 hours, and 10 samples with different lattice constants were considered.
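Summing the wall-hour figures quoted above gives a rough total for this workflow. Converting to core-hours or SUs would additionally require the cores-per-node count, which we leave out here as it depends on the machine:

```python
# Back-of-the-envelope workflow total, using the wall-hour figures quoted
# in the text for the 5-node prototype-system runs.
stages = {
    "geometry optimization": 137,
    "density of states": 92,
    "band structure": 141,
    "hybrid charge density": 46,
    "mobility (10 lattice samples)": 10 * 18,
}
total_wall_hours = sum(stages.values())
print(total_wall_hours)  # -> 596
```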
2.3. First Principles Calculations of Electronic Structure of PbS Quantum Dots
From our scaling tests, we observed that VASP simulations of PbS quantum dot systems on Stampede parallelize very efficiently: they scale well up to 768 processors, with a speedup of 3.86× relative to 48 processors. All relevant scaling details are summarized and plotted in Figure 3 below. Note that this scaling test was done for the smallest sample in terms of the number of atoms (or electrons), since we used bare PbS quantum dots (~134 atoms). In most cases, however, we plan to use ligand-passivated PbS quantum dots consisting of up to 700 atoms. Also note that the wall time reported here is the CPU time for only one SCF loop (24 electronic steps); the actual wall time will be significantly larger because thousands of ionic relaxation steps are expected.
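The quoted 3.86× speedup going from 48 to 768 processors corresponds to the following parallel efficiency:

```python
# Parallel efficiency implied by the figures quoted above:
# a 3.86x speedup at 768 processors relative to 48 processors.
base_cores = 48
cores, speedup = 768, 3.86
ideal = cores / base_cores          # 16x if scaling were perfect
efficiency = speedup / ideal
print(f"ideal {ideal:.0f}x, efficiency {efficiency:.3f}")  # efficiency 0.241
```

An efficiency near 24% at a 16× core increase is consistent with "scales well up to 768 processors" in the sense that adding cores still shortens the wall time, even though per-core efficiency drops.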
Figure 3. Scaling test results on Stampede. The upper, middle, and lower plots show wall time (in sec), total CPU hours (in hr), and speedup, respectively. The core counts tested on Stampede are 48, 96, 192, 384, 768, 1536, 3072, and 6144. The wall time here represents the CPU time used for only one SCF loop (24 electronic steps). The geometry is a 132-atom PbS system (isolated quantum dots with a vacuum region of >15 Å in all x, y, and z directions). Other notable VASP settings in the INCAR are (1) ENCUT = 400 eV, (2) LREAL = A, and (3) NPAR = number of nodes used.
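For reference, the three INCAR settings listed in the caption can be collected into a minimal fragment. Only the three quoted tags come from the text; the NPAR value shown is an example, since the caption ties it to the (unspecified) number of nodes used:

```
ENCUT = 400        ! plane-wave cutoff in eV (from the caption)
LREAL = Auto       ! real-space projection ("LREAL = A" in the caption)
NPAR  = 4          ! set to the number of nodes used (example value)
```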