
Supplemental Material for “Machine Learning Predictions of Molecular Properties: Accurate Many-Body Potentials and Non-Locality in Chemical Space”

Katja Hansen¹, Franziska Biegler², Raghunathan Ramakrishnan³, Wiktor Pronobis², O. Anatole von Lilienfeld³,⁴, Klaus-Robert Müller²,⁵, and Alexandre Tkatchenko¹,∗

¹ Fritz-Haber-Institut der Max-Planck-Gesellschaft, Faradayweg 4-6, 14195 Berlin, Germany
² Machine Learning Group, Technical University of Berlin, Marchstr. 23, 10587 Berlin, Germany
³ Institute of Physical Chemistry and National Center for Computational Design and Discovery of Novel Materials, Department of Chemistry, University of Basel, Klingelbergstrasse 80, CH-4056 Basel, Switzerland
⁴ Argonne Leadership Computing Facility, Argonne National Laboratory, Argonne, Illinois 60439, USA
⁵ Department of Brain and Cognitive Engineering, Korea University, Korea
(Dated: May 20, 2015)

THE BAG-OF-BONDS (BOB) APPROACH

In the BoB model, each molecule is represented as a vector composed of bags, where each bag represents a particular pair of elements (e.g., C–C, C–N, and so on), irrespective of electronic hybridization state. This approach builds on the so-called Coulomb matrix, a representation introduced by Rupp et al. [1], which is calculated as follows:
\[
C_{ij} =
\begin{cases}
0.5\, Z_i^{2.4} & \forall\, i = j \\
\dfrac{Z_i Z_j}{|R_i - R_j|} & \forall\, i \neq j,
\end{cases}
\]

where $Z_i$ and $Z_j$ are the nuclear charges, while $R_i$ and $R_j$ are the positions of the two atoms $i$ and $j$ participating in a given “bond”. The exponent in the diagonal elements of the Coulomb matrix corresponds to a fit to the total potential energies of the free atoms [1]. The off-diagonal elements are identical to the Coulomb repulsion for all pairs of nuclei in the molecule. To vectorize the molecule, we sort all off-diagonal Coulomb matrix entries corresponding to a given combination of nuclear charges and fill them into one “bond” bag. We found that including the diagonal elements of the Coulomb matrix does not have any significant influence on the predictions obtained from the BoB approach. We then concatenate the resulting bags (there is one bag for every atom type and for every pair) in a pre-defined order (which is irrelevant to the learning process). Zero-padding is used to obtain bags of equal size for all molecules in the GDB-7 database. This representation is naturally invariant under molecular rotations and translations, whereas permutational invariance is enforced by the sorting step. Note that, unlike the Coulomb matrix sorted by the norm of its rows, the BoB descriptor does not distinguish between homometric molecules. However, our database is devoid of such cases. Figure 1 shows schematically how the bag of bonds vector for a given molecule is constructed.
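The construction described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`coulomb_matrix`, `bags`, `bob_vector`) and the `bag_sizes` argument are hypothetical, the descending sort order within each bag is an assumption (the text only says the entries are sorted), the single-atom (diagonal) bags are omitted, and the water-like geometry is purely illustrative.

```python
from collections import defaultdict
from itertools import combinations

import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix: 0.5*Z_i**2.4 on the diagonal, Z_i*Z_j/|R_i - R_j| off it."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(dist, 1.0)  # placeholder to avoid 0/0; diagonal is overwritten below
    C = np.outer(Z, Z) / dist
    np.fill_diagonal(C, 0.5 * Z ** 2.4)
    return C


def bags(Z, R):
    """Group off-diagonal entries by element pair and sort each bag (descending, assumed)."""
    C = coulomb_matrix(Z, R)
    out = defaultdict(list)
    for i, j in combinations(range(len(Z)), 2):
        key = tuple(sorted((int(Z[i]), int(Z[j]))))  # e.g. (1, 8) for an O-H pair
        out[key].append(C[i, j])
    return {k: sorted(v, reverse=True) for k, v in out.items()}


def bob_vector(Z, R, bag_sizes):
    """Concatenate zero-padded bags in the fixed order given by bag_sizes."""
    b = bags(Z, R)
    parts = []
    for key, size in bag_sizes.items():
        vals = b.get(key, [])
        padded = np.zeros(size)
        padded[:len(vals)] = vals  # fails if a bag is larger than its allotted size
        parts.append(padded)
    return np.concatenate(parts)


# Hypothetical water-like geometry; coordinates are illustrative only.
Z = np.array([8, 1, 1])
R = np.array([[0.0, 0.0, 0.0], [1.8, 0.0, 0.0], [-0.45, 1.75, 0.0]])

# Bag sizes would normally be the largest bag of each type over the whole dataset.
bag_sizes = {(1, 1): 3, (1, 8): 2, (6, 8): 1}
v = bob_vector(Z, R, bag_sizes)

# Reordering the atoms leaves the descriptor unchanged.
perm = [2, 0, 1]
v_perm = bob_vector(Z[perm], R[perm], bag_sizes)

# A rigid rotation (90 degrees about z) leaves it unchanged as well.
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
v_rot = bob_vector(Z, R @ Rz.T, bag_sizes)
```

Because the entries depend only on interatomic distances and each bag is sorted, `v`, `v_perm`, and `v_rot` agree up to floating-point rounding; bags that are absent or shorter than their allotted size are filled with zeros.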
The values of the BoB kernel parameters employed in our paper for the models trained on N = 5732 molecules are as follows (p = 1; σ and λ in atomic units): σ = 3486, λ = 0.0 for atomization energy; σ = 4861, λ = 0.00006 for polarizability; σ = 1350, λ = 0.00007 for HOMO; σ = 1163, λ = 0.00002 for LUMO.
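To make the role of these parameters concrete, the following sketch shows one way they could enter a kernel ridge regression model. It rests on assumptions not stated in this supplement: that p is the norm order in a kernel of the form exp(−‖x − y‖_p / σ) (a Laplacian kernel for p = 1) and that λ is the ridge regularization strength. The function names and the random toy data are hypothetical, and the toy σ is chosen for the toy data rather than taken from the values above.

```python
import numpy as np


def laplacian_kernel(A, B, sigma):
    """k(x, y) = exp(-||x - y||_1 / sigma) for every row of A against every row of B."""
    d1 = np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)
    return np.exp(-d1 / sigma)


def krr_fit(X, y, sigma, lam):
    """Solve (K + lam * I) alpha = y for the regression coefficients alpha."""
    K = laplacian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, sigma):
    """Prediction is a kernel-weighted sum over the training points."""
    return laplacian_kernel(X_new, X_train, sigma) @ alpha


# Toy data standing in for descriptor vectors and target properties.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.normal(size=20)

sigma, lam = 10.0, 0.0  # toy values; lam = 0 mirrors the atomization-energy setting
alpha = krr_fit(X, y, sigma, lam)
y_fit = krr_predict(X, alpha, X, sigma)
```

With λ = 0 and distinct training points, the model reproduces the training targets exactly (up to rounding), so any error appears only out of sample; a nonzero λ trades some of that fit for smoothness.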

FIG. 1. Schematic view of the bag of bonds representation. (a) shows the three-dimensional structure of ethanol (CH3CH2OH) and (b) specifies the nuclear charges involved in each Coulomb matrix element. In (c), the Coulomb matrix entries present for ethanol are sorted into bags, and the Bag of Bonds vector (d) is obtained by concatenating these bags and padding them with zeros to accommodate other molecules with larger bags.
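As a small worked example of the bagging step for ethanol, the number of entries in each pair bag follows from the atom counts alone. The sketch below only counts pairs; it uses no geometry, and the bag labels are an illustrative naming choice.

```python
from collections import Counter
from itertools import combinations

# Ethanol, CH3CH2OH: two C, one O, six H (element symbols only, no geometry needed).
atoms = ["C", "C", "O"] + ["H"] * 6

# One bag per unordered element pair; each atom pair contributes one entry.
pair_bags = Counter("-".join(sorted(p)) for p in combinations(atoms, 2))

total_pairs = sum(pair_bags.values())
```

The nine atoms give 36 off-diagonal pairs in total: 1 C–C, 2 C–O, 12 C–H, 6 O–H, and 15 H–H entries. A molecule with more atoms of a given type has larger bags, which is why the shorter bags are zero-padded.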

C–O AND C–N PAIRWISE POTENTIALS

Figure 2 shows the polynomial pairwise potentials corresponding to Eq. (1) in the paper, obtained for C–O and C–N interactions.

COMPARISON OF BOB WITH COULOMB MATRIX ON THE 134K DATASET

To demonstrate the significant improvement in predictions of the newly developed BoB model over the Coulomb matrix model previously proposed in the literature [1], we show the performance of both models on the 134k dataset from Ref. [2] in Figure 3. For the largest training dataset of N = 40000 molecules, the BoB model yields an out-of-sample accuracy of 2 kcal/mol, compared to 4 kcal/mol for the Coulomb matrix model.

∗ tkatchenko@fhi-berlin.mpg.de
[1] M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012).
[2] R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld, Sci. Data 1, 140022 (2014).

[Figure 2: two panels, C–O (top) and C–N (bottom). x-axis: Distance [Å], 1.0 to 3.5; left y-axis: Estimated energy [kcal/mol]; right y-axis: density, based on 27872 pairs (C–O) and 30720 pairs (C–N). Curves: polynomials of degree 6, 10, and 18, and Lennard-Jones. Marked points: CO1, CO2 (top); CN1, CN2, CN3 (bottom). Each panel has an inset covering distances 2.3 to 2.7 Å.]

FIG. 2. Polynomial potentials for the C–O (top) and C–N (bottom) interactions. The normalized gray histogram shows the distribution of the respective C–O or C–N distances within the GDB-7 dataset and is associated with the right-hand axis. The red dots (CO1, CO2; CN1, CN2, CN3) represent the energies of the corresponding single, double, and triple bonds as given by fits to experimental bond energies. The polynomial two-body potentials (as trained in cross-validation) are shown in blue.

[Figure 3: two panels, CM and BOB. x-axis: N (ticks at 1k and 10k); y-axis: MAE [kcal/mol]; curves for U0, U, H, and G.]

FIG. 3. Comparison of the performance of the BoB and Coulomb matrix (CM) models on the 134k dataset of equilibrium
molecular geometries from Ref. [2].
