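$$
B(x, x_{ideal}, x_{max}) =
\begin{cases}
1 & \text{if } x \le x_{ideal} \\
1.0 - \dfrac{x - x_{ideal}}{x_{max} - x_{ideal}} & \text{if } x_{ideal} \le x \le x_{max} \\
0 & \text{if } x > x_{max}
\end{cases}
$$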
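As an illustration only, the block function can be written as a short Python function. This is a minimal sketch rather than GOLD's own code; the function name block and the sample values (0.25 and 0.65, the default H-bond distance tolerances) are chosen here for the example.

```python
def block(x: float, x_ideal: float, x_max: float) -> float:
    """Block function B(x, x_ideal, x_max).

    Returns 1 up to x_ideal, falls linearly to 0 at x_max,
    and is 0 beyond x_max.
    """
    if x_max <= x_ideal:
        raise ValueError("x_max must be greater than x_ideal")
    if x <= x_ideal:
        return 1.0
    if x <= x_max:
        return 1.0 - (x - x_ideal) / (x_max - x_ideal)
    return 0.0


if __name__ == "__main__":
    # Sample deviations evaluated with x_ideal = 0.25 and x_max = 0.65.
    for x in (0.10, 0.25, 0.45, 0.65, 0.90):
        print(f"B({x:.2f}, 0.25, 0.65) = {block(x, 0.25, 0.65):.3f}")
```

The value is 1 inside the tolerance window, 0.5 halfway between x_ideal and x_max, and 0 beyond x_max.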
GOLD User Guide 51
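In the GOLD implementation of ChemScore, the block function is sometimes convolved with a
Gaussian function:

$$
B'(x, x_{ideal}, x_{max}, \sigma) =
\frac{\int_{-\infty}^{+\infty} B(x - u, x_{ideal}, x_{max})\, g(u, \sigma)\, du}
     {\int_{-\infty}^{+\infty} g(u, \sigma)\, du},
\qquad
g(u, \sigma) = e^{-u^{2}/2\sigma^{2}}
$$

The effect is to smooth the function, e.g.:
[Figure: the smoothed block function plotted against x, falling from 1.0 to 0.0 between x_ideal and x_max]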
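The smoothing can be approximated numerically. The sketch below is illustrative, not GOLD's implementation: it evaluates the Gaussian-weighted average of the block function with a trapezoidal rule over a finite range. The function names, the integration range of six sigma and the grid size are all assumptions made for this example.

```python
import math


def block(x, x_ideal, x_max):
    """Unsmoothed block function: 1 up to x_ideal, linear fall-off to 0 at x_max."""
    if x <= x_ideal:
        return 1.0
    if x <= x_max:
        return 1.0 - (x - x_ideal) / (x_max - x_ideal)
    return 0.0


def smoothed_block(x, x_ideal, x_max, sigma, n_points=2001, n_sigma=6.0):
    """Gaussian-weighted average of block() around x.

    The integration range (+/- n_sigma * sigma) and the grid size are
    illustrative choices, not values taken from GOLD.
    """
    if sigma <= 0.0:
        return block(x, x_ideal, x_max)
    lo = -n_sigma * sigma
    du = 2.0 * n_sigma * sigma / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for i in range(n_points):
        u = lo + i * du
        weight = 0.5 if i in (0, n_points - 1) else 1.0  # trapezoid end points
        g = weight * math.exp(-u * u / (2.0 * sigma * sigma))
        # The Gaussian is symmetric, so x + u and x - u give the same average.
        numerator += block(x + u, x_ideal, x_max) * g
        denominator += g
    # The step du is common to both sums and cancels in the ratio.
    return numerator / denominator


if __name__ == "__main__":
    for x in (0.0, 0.25, 0.45, 0.65, 1.0):
        print(f"x = {x:.2f}  B = {block(x, 0.25, 0.65):.3f}  "
              f"B' = {smoothed_block(x, 0.25, 0.65, 0.1):.3f}")
```

With sigma = 0.1 the sharp corners at x_ideal and x_max are rounded off, while the value halfway between them stays at 0.5 because the block function is symmetric about that point.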
11.1 The Ligand Log File (gold_ligand_m1.log)
Ten docking runs have been set up for this ligand and, for each of these docking runs, the
progress of the genetic algorithm is displayed in the GOLD Output window. This information is
also recorded in the ligand log file gold_ligand_m1.log (where m1 is the index to the number of
the ligand in the input file).
Open and inspect gold_ligand_m1.log using a text editor (a section of an example ligand log file
is shown):
Following the completion of all docking runs on the ligand, the results from the different runs
are compared. The end of the gold_ligand_m1.log file will include a matrix of root mean square
deviations (rmsd) between the various docked ligand positions (see Section 14.10.2, page 120).
A clustering report is also given which can be used to identify different binding modes (see
Section 14.10.3, page 122). It is possible that fewer than the specified ten dockings were
completed due to the Allow early termination option being selected (see Section 5., page 167). In
the example output shown below, the solution found for docking attempt number 2 has the best
Any error or warning messages produced will be displayed in a separate GA Program Error
Message window (this might normally contain a number of warning messages relating to the
GOLD atom type assigner). These messages can be safely ignored.
Once the job is complete the message GA Done will appear in the GOLD Output window. The
output displayed is also written to the ligand.log file but can be saved under a different filename
by selecting the Save Output button.
Dismiss the GOLD Output window by clicking on the Dismiss button.
11. Analysis of Output
11.1 The Ligand Log File (gold_ligand_m1.log) (see page 174)
11.2 Fitness Function Rankings Files (ligand_m1.rnk and bestranking.lst) (see page 175)
11.3 Files Containing The Docked Ligand (gold_soln_ligand_m#_n.mol2) (see page 176)
The specified output directory (see Section 9., page 170) will contain a number of files
including:
Files containing the initialised protein and ligand (gold_protein.mol2 and gold_ligand.mol2)
Files containing the docked ligand (gold_soln_ligand_m1_n.mol2)
Files containing fitness function rankings (ligand_m1.rnk and bestranking.lst)
Protein and ligand log files (gold_protein.log and gold_ligand_m1.log)
Files containing error messages (gold.err); this file will be empty if no errors are found.
Some of these output files will be dealt with in detail below. Further information on the content
of all these output files is available (see Section 14., page 109).
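A quick way to check an output directory against this list is sketched below. It is an illustrative helper, not part of GOLD: the function name classify_outputs and the wildcard patterns (which generalise the ligand index m1 and the solution number n) are assumptions made for this example.

```python
import os
from fnmatch import fnmatchcase

# File name patterns taken from the list of output files above.
EXPECTED_OUTPUTS = {
    "initialised protein and ligand": ["gold_protein.mol2", "gold_ligand.mol2"],
    "docked ligand": ["gold_soln_ligand_m*_*.mol2"],
    "fitness function rankings": ["ligand_m*.rnk", "bestranking.lst"],
    "log files": ["gold_protein.log", "gold_ligand_m*.log"],
    "error messages": ["gold.err"],
}


def classify_outputs(filenames):
    """Group file names by the kind of GOLD output they appear to be."""
    found = {kind: [] for kind in EXPECTED_OUTPUTS}
    for name in sorted(filenames):
        for kind, patterns in EXPECTED_OUTPUTS.items():
            if any(fnmatchcase(name, pattern) for pattern in patterns):
                found[kind].append(name)
                break
    return found


if __name__ == "__main__":
    # Inspect the current directory; replace "." with the output directory.
    for kind, names in classify_outputs(os.listdir(".")).items():
        print(f"{kind}: {names if names else 'none found'}")
```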
6.4.3 Hydrogen-Bond Terms
The hydrogen-bond term is computed as a sum over all possible donor-acceptor pairs, such that
one atom belongs to the protein and the other to the ligand:

$$
\Delta G_{hbond} = \sum_{\text{all donor-acceptor pairs}}
B'(\Delta r, \Delta r_{ideal}, \Delta r_{max}, \sigma_{r}) \cdot
B'(\Delta\alpha, \Delta\alpha_{ideal}, \Delta\alpha_{max}, \sigma_{\alpha}) \cdot
B'^{*}(\Delta\beta, \Delta\beta_{ideal}, \Delta\beta_{max}, \sigma_{\beta})
$$

Each term in the summation is the product of three Gaussian-smoothed block functions (see
Section 6.4.2, page 50). The purpose of the block functions is to reduce the contribution of a
hydrogen bond according to how much its geometry deviates from (a) ideal H...A distance, (b)
ideal D-H...A angle and (c) ideal directionality with respect to the acceptor atom. The maximum
contribution of a given donor-acceptor pair to the summation is 1; this will occur if the pair form
a hydrogen bond of ideal geometry.
[Figure: Gaussian-smoothed block function, falling from 1.0 to 0.0 between x_ideal and x_max]
The tables below describe the various parameters in this equation, their meanings, and what they
are called in the ChemScore parameter file (see Section 6.5, page 58).

D-H...A distance parameters (D = Donor, A = Acceptor)
Term | Meaning | Name in ChemScore file | Default value
r | The ideal hydrogen...acceptor (H...A) distance (in Å) | R_IDEAL | 1.85
Δr | The absolute deviation of the actual H...A separation from r | Calculated for each H-bond | -
Δr_ideal | The tolerance window around the H...A distance, r, within which the H-bond is regarded as ideal | DELTA_R_IDEAL | 0.25
Δr_max | The maximum possible deviation from the ideal distance; above this, the interaction is not regarded as an H-bond | DELTA_R_MAX | 0.65
σ_r | The Gaussian smearing sigma associated with this term | HBOND_R_SIGMA | 0.1
D-H...A angle parameters (D = Donor, A = Acceptor)
Term | Meaning | Name in ChemScore file | Default value
α | The ideal D-H...A angle (in degrees) | ALPHA_IDEAL | 180.0
Δα | The absolute deviation of the actual D-H...A angle from α | Calculated for each H-bond | -
Δα_ideal | The tolerance window around the D-H...A angle, α, within which the H-bond is regarded as ideal | DELTA_ALPHA_IDEAL | 30.0
Δα_max | The maximum possible deviation from the ideal D-H...A angle; above this, the interaction is not regarded as an H-bond | DELTA_ALPHA_MAX | 80.0
σ_α | The Gaussian smearing sigma associated with this term | HBOND_ALPHA_SIGMA | 10.0
D-H...A-X acceptor-centred angle parameters (D = Donor, A = Acceptor, X = Heavy atom attached to A)
Term | Meaning | Name in ChemScore file | Default value
β | The ideal H...A-X angle (in degrees) | BETA_IDEAL | 180.0
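To show how the distance and angle parameters combine, the sketch below multiplies three block-function values for a single donor-acceptor pair. It is a simplified illustration and makes several assumptions: the Gaussian smoothing is omitted, only one X atom is considered on the acceptor, and the function and dictionary names are invented for the example. The default values are those listed in the H-bond parameter tables of this section.

```python
# Default values from the ChemScore H-bond parameter tables.
HBOND_DEFAULTS = {
    "R_IDEAL": 1.85, "DELTA_R_IDEAL": 0.25, "DELTA_R_MAX": 0.65,
    "ALPHA_IDEAL": 180.0, "DELTA_ALPHA_IDEAL": 30.0, "DELTA_ALPHA_MAX": 80.0,
    "BETA_IDEAL": 180.0, "DELTA_BETA_IDEAL": 70.0, "DELTA_BETA_MAX": 80.0,
}


def block(x, x_ideal, x_max):
    """Unsmoothed block function: 1 up to x_ideal, linear fall-off to 0 at x_max."""
    if x <= x_ideal:
        return 1.0
    if x <= x_max:
        return 1.0 - (x - x_ideal) / (x_max - x_ideal)
    return 0.0


def hbond_pair_term(r, alpha, beta, params=HBOND_DEFAULTS):
    """Contribution of one donor-acceptor pair (no smoothing, single X atom).

    r is the H...A distance in Angstrom; alpha (D-H...A) and beta (H...A-X)
    are angles in degrees.
    """
    d_r = abs(r - params["R_IDEAL"])
    d_alpha = abs(alpha - params["ALPHA_IDEAL"])
    d_beta = abs(beta - params["BETA_IDEAL"])
    return (
        block(d_r, params["DELTA_R_IDEAL"], params["DELTA_R_MAX"])
        * block(d_alpha, params["DELTA_ALPHA_IDEAL"], params["DELTA_ALPHA_MAX"])
        * block(d_beta, params["DELTA_BETA_IDEAL"], params["DELTA_BETA_MAX"])
    )


if __name__ == "__main__":
    print(hbond_pair_term(1.85, 180.0, 180.0))  # ideal geometry
    print(hbond_pair_term(2.30, 180.0, 180.0))  # stretched H...A distance
    print(hbond_pair_term(1.85, 125.0, 180.0))  # bent D-H...A angle
```

An ideal geometry scores 1; a pair fails to count as a hydrogen bond as soon as any one of the three deviations exceeds its maximum.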
Filter out all solutions with fitness scores lower than a specified value
By default the Keep all solutions option from the Selecting Docked Solutions panel in the Output
preferences window should be selected:
Select Done to close the Output preferences window.
10. Running GOLD
The main Control panel of the GOLD front end contains a number of options, including:
The Run button, which will start a GOLD job, and display the output to the screen until
completion of the job.
Save&Exit which will save all the settings defined in the GOLD front end in a configuration
file (gold.conf) and then close the front end. The configuration file includes details of the
ligand, the protein binding site, the fitness-function parameter file to be used, the torsion
distribution file to be used, and the genetic algorithm parameters (see Section 15., page 126).
Submit&Exit which will start a GOLD run in the background (and also save a configuration
file), then close the front end.
The Configuration File button which enables the settings from a previously saved
configuration file to be opened. This will automatically load the saved parameter values into
the front end (see Section 15., page 126).
Click on the Run button in the GOLD front end.
As the job progresses output will be displayed in a GOLD Output window:
Ensure that the Save rnk files and Save solution log files check boxes are switched on; this will
instruct GOLD to retain output files listing fitness-function rankings and ligand log files. The
content of these files is discussed later (see Section 11., page 173).
By default, docking solutions will be written out in the same format as was used for input (i.e.
MOL2 format); ensure that the Same as input output file format option is selected.
Click on the Output Directory... button and specify a directory to which you have write
permission; this is where the GOLD output files will be written.
It is possible to write additional information to docked solution files. This information is written
to SD file tags; for MOL2 files, these tags are written to comment blocks. This information is
particularly important for post-processing docking results with SILVER. For the purpose of this
tutorial the Information in File settings can be left at their default settings.
GOLD can produce a large amount of output. However, it is possible to cut this down by
applying output filter options. These options can be used to:
Specify that all docking solutions are saved
Retain only the n best docking solutions
Save the top-ranked solution for the best m ligands only
D-H...A-X acceptor-centred angle parameters (continued)
Term | Meaning | Name in ChemScore file | Default value
Δβ | The absolute deviation of the actual H...A-X angle from β | Calculated for each H-bond | -
Δβ_ideal | The tolerance window around the H...A-X angle, β, within which the H-bond is regarded as ideal | DELTA_BETA_IDEAL | 70.0
Δβ_max | The maximum possible deviation from the ideal H...A-X angle; above this, the interaction is not regarded as an H-bond | DELTA_BETA_MAX | 80.0
σ_β | The Gaussian smearing sigma associated with this term | HBOND_BETA_SIGMA | 10.0

The third block function in the H-bond equation, B*, is the sum of all possible values for a given
hydrogen bond. For example, a tertiary amine acceptor has three covalently-bound atoms that
could be deemed as the X atom: in this case, the term added for an H-bond to the amine is the
product of the block-function values for all three possible H...A-X angles.
Hydrogen bonds have a regression coefficient associated with them, v1 (see Section 6.4.1, page
49). By default, this is set to 3.34. The name of this coefficient in the ChemScore parameter file
(see Section 6.5, page 58) is HBOND_COEFFICIENT.
6.4.4 Metal-Binding and Lipophilic Terms
The metal-binding term in ChemScore is computed as a sum over all possible metal-ion...acceptor
pairs, where the acceptor is an atom in the ligand that is capable of binding to a metal:

$$
P_{metal} = \sum_{\text{all protein metals}} \; \sum_{\text{all ligand acceptors}}
B'(r_{aM}, R_{ideal}, R_{max}, \sigma_{metal})
$$

Each term in the summation is a Gaussian-smoothed block function (see Section 6.4.2, page 50)
whose purpose is to reduce the contribution of the metal-acceptor interaction if the geometry is
not ideal.
The table below describes the various parameters in this equation, their meanings, and what they
are called in the ChemScore parameter file (see Section 6.5, page 58).
Metal-binding parameters in ChemScore
Term | Meaning | Name in ChemScore file | Default value
r_aM | The actual acceptor-metal distance (in Å) | Calculated for each acceptor-metal pair | -
R_ideal | The ideal acceptor-metal distance | METAL_R1 | 2.6
R_max | The maximum acceptor-metal distance to be considered a binding interaction | METAL_R2 | 3.0
σ_metal | The Gaussian smearing sigma associated with this term | METAL_R_SIGMA | 0.1

The metal-binding term has a regression coefficient associated with it, v2 (see Section 6.4.1,
page 49). By default, this is set to 6.03. The name of this coefficient in the ChemScore
parameter file (see Section 6.5, page 58) is METAL_COEFFICIENT.
The lipophilic term is defined in a similar way:

$$
P_{lipo} = \sum_{\text{all protein lipophilic atoms}} \; \sum_{\text{all ligand lipophilic atoms}}
B'(r_{ll}, R_{ideal}, R_{max}, \sigma_{lipo})
$$

The table below describes the various parameters in this equation, their meanings, and what they
are called in the ChemScore parameter file (see Section 6.5, page 58).

Lipophilic parameters in ChemScore
Term | Meaning | Name in ChemScore file | Default value
r_ll | The actual distance between the pair of lipophilic atoms (in Å) | Calculated for each atom-atom pair | -
R_ideal | The ideal atom...atom distance separation | LIPO_R1 | 4.1
R_max | The maximum separation, beyond which no interaction is deemed to occur | LIPO_R2 | 7.1
σ_lipo | The Gaussian smearing sigma associated with this term | LIPO_R_SIGMA | 0.1
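The double sums for the metal-binding and lipophilic terms can be sketched as follows. This is an illustration under stated assumptions, not GOLD's code: the Gaussian smoothing is left out, atoms are represented as plain (x, y, z) tuples, and the function name pair_sum is invented for the example. The distance parameters are the defaults listed in the two tables.

```python
import math

# (R_ideal, R_max) defaults from the ChemScore parameter tables, in Angstrom.
METAL_R1_R2 = (2.6, 3.0)
LIPO_R1_R2 = (4.1, 7.1)


def block(x, x_ideal, x_max):
    """Unsmoothed block function: 1 up to x_ideal, linear fall-off to 0 at x_max."""
    if x <= x_ideal:
        return 1.0
    if x <= x_max:
        return 1.0 - (x - x_ideal) / (x_max - x_ideal)
    return 0.0


def pair_sum(protein_xyz, ligand_xyz, r_ideal, r_max):
    """Sum the block function over every protein-ligand atom pair."""
    total = 0.0
    for p in protein_xyz:
        for q in ligand_xyz:
            total += block(math.dist(p, q), r_ideal, r_max)
    return total


if __name__ == "__main__":
    protein = [(0.0, 0.0, 0.0)]
    ligand = [(5.6, 0.0, 0.0)]
    # The same 5.6 Angstrom separation scores in the lipophilic term
    # but is far outside the range of the metal-binding term.
    print("lipophilic:", pair_sum(protein, ligand, *LIPO_R1_R2))
    print("metal:", pair_sum(protein, ligand, *METAL_R1_R2))
```

The example illustrates the much longer range of the lipophilic term compared with the metal-binding term.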
are shown):
Care should be taken when altering these parameter settings and you are recommended to use
one of the pre-defined parameter sets offered. Alternatively, GOLD can decide on the optimal
settings to use for a given ligand (see Section 11.3, page 94).
To enable automatic GA settings, click on the Select GA Presets and Automatic Settings button in
the Genetic Algorithm Parameters panel (or hit Settings in the Control panel) then, in the
Settings selector window, click on Use automatic settings. Ensure the Search efficiency is set to
100%, then hit Done.
The criteria used by GOLD to determine the optimal GA parameter settings for a given ligand
include: the number of rotatable bonds in the ligand, ligand flexibility, i.e. number of flexible
ring corners, flippable nitrogens, etc., the volume of the protein binding site, and the number of
water molecules considered during docking. Details of the exact settings used will be given in
the ligand log file gold_ligand_m1.log (see Section 14.10, page 118).
9. Setting Output Preferences
Select the Output... button in the GOLD front end to open the Output preferences window:
GoldScore is the original GOLD scoring function and is made up of four components:
protein-ligand hydrogen bond energy (external H-bond)
protein-ligand van der Waals (vdw) energy (external vdw)
ligand internal vdw energy (internal vdw)
ligand torsional strain energy (internal torsion)
It is possible to alter the empirical parameters used in the fitness function (hydrogen bond
energies, atom radii and polarisabilities, torsion potentials, hydrogen bond directionalities, etc.)
within the GOLD parameters file. The default GOLD parameters file (gold.params) can be
found in:
UNIX: $GOLD_DIR/gold.params
Windows: <InstallDir>/GOLD/gold.params
where <InstallDir> is usually C:/Program Files/CCDC
For the purpose of this tutorial ensure that the Parameter File entry box in the Input Parameters
and Files section of the GOLD front end is set to gold.params, or DEFAULT when used for the
first time.
Torsion angle distributions, extracted from the Cambridge Structural Database (CSD), can be
used to restrict the ligand conformational space sampled by the genetic algorithm. Using torsion
angle distributions in this way may improve the chances of GOLD finding the correct answer by
biasing the search towards ligand torsion-angle values that are commonly observed in crystal
structures. It may also improve convergence and so make GOLD usable with faster settings (see
Section 9.1, page 83).
By default the use of torsion angle distributions should be enabled. Click on the Fitness & Search
options button in the GOLD front end. In the resulting window ensure the check box labelled
Use torsion angle distributions from the CSD is switched on.
8. Genetic Algorithm Parameter Settings
GOLD optimises the fitness score using a genetic algorithm (GA) (see Section 10., page 89).
A number of parameters control the precise operation of the genetic algorithm. Genetic
algorithm parameter settings can be specified in the GOLD front end (standard default settings
The difference between the metal and lipophilic parameterisation is that the lipophilic term is
scored over a much longer range.
Lipophilic atoms are defined as non-accepting sulphurs, non-polar carbon atoms (polar carbon
atoms are carbon atoms attached to two or more polar atoms), and non-ionic chlorine, bromine
and iodine atoms.
The lipophilic term has a regression coefficient associated with it, v3 (see Section 6.4.1, page
49). By default, this is set to 0.117. The name of this coefficient in the ChemScore parameter
file (see Section 6.5, page 58) is LIPO_COEFFICIENT.
6.4.5 Rotatable-Bond Freezing Term
The following formula is used to estimate the entropic loss that occurs when single, acyclic
bonds in the ligand become non-rotatable upon binding:

$$
P_{rot} = 1 + \left(1 - \frac{1}{N_{rot}}\right) \sum_{r} \frac{P_{nl}(r) + P'_{nl}(r)}{2}
$$

N_rot is the number of frozen rotatable bonds in the ligand (a bond is considered frozen if one or
more atoms on both sides of the rotatable bond are in contact with the protein). The expression is
deemed to have a value of zero if there are no rotatable bonds in the ligand.
P_nl(r) and P'_nl(r) are the percentages of non-hydrogen atoms on either side of the rotatable bond
that are not lipophilic. For example, if there are 10 non-hydrogen atoms on one side of the bond,
of which 3 are not lipophilic, and there are 20 non-hydrogen atoms on the other side, of which 2
are not lipophilic, then P_nl(r) and P'_nl(r) are 30% and 10%, respectively.
The regression coefficient associated with this term, v4 (see Section 6.4.1, page 49), has the
default value 2.56. The name of this coefficient in the ChemScore parameter file (see Section
6.5, page 58) is ROT_COEFFICIENT.
6.4.6 Clash Penalty and Internal Torsion Terms
Clashes between protein and ligand atoms and ligand internal torsional strain are accommodated
by penalty terms.
These terms are included to prevent poor geometries in docking.
The clash penalty terms in ChemScore differ according to the nature of the contact, i.e. whether
it is a hydrogen-bonding contact, a metal-binding contact or neither of these.
Any hydrogen bond with an H...A distance shorter than r_hbond contributes a clash term of:
$$
P_{hbond\,clash} = \frac{20.0}{\Delta G_{hbond}} \cdot \frac{r_{hbond} - r}{r_{hbond}}
$$

The value of r_hbond (default = 1.6) can be changed by altering the parameter
CLASH_RADIUS_HBOND in the ChemScore file (see Section 6.5, page 58).
Any metal coordination contact shorter than r_metal contributes a clash term of:

$$
P_{metal\,clash} = \frac{20.0}{\Delta G_{metal}} \cdot \frac{r_{metal} - r}{r_{metal}}
$$

The value of r_metal (default = 1.3*) can be changed by altering the parameter
CLASH_RADIUS_METAL in the ChemScore file (see Section 6.5, page 58).
All other ligand-protein interatomic contacts contribute clash terms of the following form:

$$
P_{other\,clash} = 1.0 + 4.0 \cdot \frac{r_{clash} - r}{r_{clash}}
$$

r_clash varies with contact type: for contacts to protein sulphur atoms, it is set to 3.35; for all
other contacts, it is set to 3.10. These settings correspond to the parameters
CLASH_RADIUS_SULPHUR and CLASH_RADIUS_GENERAL in the ChemScore file (see
Section 6.5, page 58).
Internal ligand strain is accommodated by clash terms in combination with torsional strain terms
of the form:

$$
P_{internal} = \sum_{\text{all rotatable bonds}} A \left( 1 - \cos(n\theta_{i} - \theta_{0}) \right)
$$
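The clash term for contacts that are neither hydrogen-bonding nor metal-binding can be sketched as below. This is a hedged illustration: it takes the general clash expression to be 1.0 + 4.0 (r_clash - r) / r_clash, and it assumes that a contact at or beyond r_clash contributes nothing. The function name other_clash and its arguments are invented for this example; the two radii are the defaults quoted above.

```python
# Default clash radii quoted in the text (CLASH_RADIUS_SULPHUR, CLASH_RADIUS_GENERAL).
CLASH_RADIUS_SULPHUR = 3.35
CLASH_RADIUS_GENERAL = 3.10


def other_clash(r, protein_atom_is_sulphur=False):
    """Clash penalty for a contact that is neither H-bonding nor metal-binding.

    Assumption: no penalty is applied when the contact distance r is at or
    beyond the clash radius.
    """
    r_clash = CLASH_RADIUS_SULPHUR if protein_atom_is_sulphur else CLASH_RADIUS_GENERAL
    if r >= r_clash:
        return 0.0
    return 1.0 + 4.0 * (r_clash - r) / r_clash


if __name__ == "__main__":
    for r in (1.55, 2.50, 3.20):
        print(f"r = {r:.2f}  general: {other_clash(r):.3f}  "
              f"sulphur: {other_clash(r, protein_atom_is_sulphur=True):.3f}")
```

Under these assumptions the penalty jumps to at least 1.0 as soon as a contact is shorter than the clash radius and grows linearly as the atoms approach each other.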
The orthogonal x, y, z coordinates of a solvent accessible point approximately at the centre of the
active site should be entered. The centre of the binding site in 1acm has already been centred
over the origin, so in this case the coordinates can be left as 0.0, 0.0, 0.0.
The approximate radius of the binding site must also be specified. By default the binding site
radius is set to 10.0 Å; ensure that this is the case. This radius should be large enough to contain
any possible binding mode of the N-phosphonacetyl-L-aspartate ligand.
A cavity detection algorithm, LIGSITE, is used to restrict the region of interest to concave,
solvent-accessible surfaces. Ensure that cavity detection is enabled by switching on the button
labelled Detect Cavity:
7. Fitness Function and Search Settings
During a docking run the solutions found by GOLD are scored according to a fitness function
(see Section 6., page 46).
GOLD offers a choice of three fitness functions, GoldScore (see Section 6.2, page 46),
ChemScore (see Section 6.4, page 49) and User Defined Score (see Section 6.10, page 62).
The User Defined Score allows you to modify existing scoring functions, or to implement a
completely new scoring function using an Applications Programming Interface (API). A good
knowledge of the C programming language is required together with some experience in using
GOLD. Full documentation for the GOLD Scoring Function API is provided:
UNIX: $GOLD_DIR/gold/api_doc/index.html
Windows: <InstallDir>/GOLD/gold/api_doc/index.html
where <InstallDir> is usually C:/Program Files/CCDC
Ensure that the default GoldScore scoring function is selected within the Fitness Function and
Search Settings panel of the GOLD front end (see Section 6., page 46):
Add single ligands
Select a complete directory of ligand files.
Specify a single file containing several ligands (i.e. a multi-MOL2 or SD file).
Click on the Filename button and select ligand.mol2 from <GOLD_DIR>/examples/
tutorial1.
The number of dockings to be performed on each ligand is specified by entering a value for the
No. of GA runs. By default this should be set to ten; if it is not, set the number of docking runs to ten.
Click on Add file or Update selected file, the filename of the selected ligand and the number of
dockings are now displayed in the Current Ligand File Selection list. Hit Done to close the
Ligand selection for docking run window.
5. Input Parameters and Files Settings
The specified protein input file should be displayed within the Input Parameters and Files panel
of the GOLD front end, and the Ligands Count should be displayed as 1.
By default the Set atom types check button for the Ligand only should be switched on in the
Input Parameters and Files panel; further information on atom type assignment is provided (see
Section 5.1, page 36). If this is not the case, then enable the Set atom types option for the Ligand.
By default the Allow early termination check box should be switched on and contain the
following early termination criteria:
This will instruct GOLD to terminate the docking if, at any point, the best three solutions found
are all within 1.5 Å rmsd of each other. In this case, it is probable that the answer is correct and
further docking runs will not be required.
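The early termination test can be sketched as follows. This is an illustrative reading of the criterion, not GOLD's implementation. It assumes that a higher fitness score is better, that poses list the same atoms in the same order, and that the rmsd is computed directly on coordinates without any symmetry correction. The function names are invented for the example.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two poses with identical atom order."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("poses must be non-empty and have the same number of atoms")
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))


def can_terminate_early(solutions, n_best=3, threshold=1.5):
    """True if the n_best top-scoring poses are all within threshold rmsd of each other.

    solutions is a list of (fitness, coords) pairs; higher fitness is
    assumed to be better.
    """
    if len(solutions) < n_best:
        return False
    best = sorted(solutions, key=lambda s: s[0], reverse=True)[:n_best]
    return all(rmsd(a[1], b[1]) <= threshold for a, b in combinations(best, 2))


if __name__ == "__main__":
    base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    shifted = [(x + 0.5, y, z) for x, y, z in base]
    far = [(x + 10.0, y, z) for x, y, z in base]
    print(can_terminate_early([(60.0, base), (59.0, shifted), (58.0, base), (40.0, far)]))
```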
6. Defining the Ligand Binding Site
It is necessary to specify the approximate centre and extent of the protein binding site. This can
be done in a number of ways, including:
from a point (see Section 3.8.1, page 25);
from a protein atom (see Section 3.8.2, page 25);
from a file containing a list of atoms (see Section 3.8.3, page 26);
from a protein residue (see Section 3.8.4, page 26);
from a file containing a list of residues (see Section 3.8.5, page 27);
from a reference ligand (see Section 3.8.6, page 28).
For this example, switch on the button labelled Point in the GOLD front end:
Bonds are deemed to be rotatable if they are single and acyclic and involve pairs of atoms with
hybridisation states sp3-sp3, sp3-sp2 or sp2-sp2.
The parameters A, n and θ0 in the above equation are set in the ChemScore file (see Section 6.5,
page 58). The relevant lines are SP3_SP3_BOND, SP3_SP2_BOND, SP2_SP2_BOND and
UNKNOWN_BOND. The syntax is of the form:
SP3_SP3_BOND A n θ0
For example:
SP3_SP3_BOND 0.18750 3.0 3.1515926
The overall contribution of intramolecular strain to the scoring function is scaled by the
coefficient called INTRA_COEFFICIENT in the ChemScore file (see Section 6.5, page 58).
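A torsion line of this form can be read and evaluated as sketched below. The sketch is illustrative only: it takes the torsional strain for one bond to be A (1 - cos(n θ - θ0)), and it assumes that the angles are in radians (the example value of θ0 is close to pi). The function names are invented for the example.

```python
import math


def parse_bond_line(line):
    """Split a line such as 'SP3_SP3_BOND A n theta0' into its four fields."""
    name, a, n, theta0 = line.split()
    return name, float(a), float(n), float(theta0)


def torsion_term(theta, a, n, theta0):
    """Torsional strain A * (1 - cos(n * theta - theta0)) for one rotatable bond.

    Assumption: theta and theta0 are both in radians.
    """
    return a * (1.0 - math.cos(n * theta - theta0))


if __name__ == "__main__":
    name, a, n, theta0 = parse_bond_line("SP3_SP3_BOND 0.18750 3.0 3.1515926")
    for degrees in (0, 60, 120, 180):
        strain = torsion_term(math.radians(degrees), a, n, theta0)
        print(f"{name}  torsion {degrees:3d} deg  strain {strain:.4f}")
```

With these parameters the term varies between 0 and 2A (0.375) and repeats every 120 degrees, as expected for a three-fold sp3-sp3 torsion.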
6.4.7 Covalent Term
When covalent bonding is switched on (see Section 4.6, page 33) the ChemScore function is
modified in the following ways:
The clash term (see Section 6.4.6, page 56) is reduced so that no clash is registered for 1-2 or
1-3 contacts around the link atoms in the protein and ligand.
Torsion terms (see Section 6.4.6, page 56) are added for the rotatable parts of the linkage.
A valence-angle bending term is added to the overall energy to penalize poor link geometries.
The weight of the covalent link energy in the ChemScore function is controlled by the parameter
called LINK_BEND_COEFFICIENT in the ChemScore parameter file (see Section 6.5, page
58).
6.4.8 Constraint Terms
Constraints (see Section 8., page 68) are implemented in ChemScore in the same way as they are
in GoldScore.
6.5 Altering ChemScore Fitness-Function Parameters; the ChemScore File
The ChemScore parameter file is stored in the GOLD distribution directory. It contains all the
parameters used by the GOLD implementation of ChemScore. A full description of the meaning
of the various parameters is given elsewhere (see Section 6.4, page 49).
The ChemScore file can be customised by copying it, editing the copy, and instructing GOLD to
use the edited file.
A copy of the default file will be placed in your current directory (where it will be called
chemscore.params) if you click on the ChemScore File button in the GOLD front end.
The entry box next to the ChemScore File button in the GOLD front end should say DEFAULT if
you want to use the default ChemScore parameter file. If you want to use a customised version
of the file, click on the ChemScore File button to select the required file or directly type the file
name into the entry box.
The format of the ChemScore file is quite strict: incorrect editing may cause GOLD to behave in
unexpected ways or even to crash. Because of the large number of parameters, no guarantee can
be given that the program will behave reliably with anything other than the default
parameterisation.
6.6 Altering GOLD Parameters: the gold.params File
The parameter file gold.params is stored in the GOLD distribution directory. It contains all of
the parameters used by GOLD (e.g. hydrogen bond energies, atom radii and polarisabilities,
torsion potentials, hydrogen bond directionalities, etc.) other than those which are specified in
the configuration file (i.e. can be set via the GOLD front end).
It also contains parameters that control the general behaviour of GOLD, e.g. whether the final
solution from a genetic algorithm run is to be minimised via a Simplex procedure before being
saved.
The parameter file can be customised by copying it, editing the copy, and instructing GOLD to
use the edited file.
Click on the Edit Parameters button to edit the parameter file. If the parameter file is set to
DEFAULT then the standard GOLD distribution parameter file is copied to the current directory.
GOLD gets the location of the parameter file from the configuration file line param_file =
<parameter file location>. This is most easily defined using the Parameter File button in the
front end.
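As a small illustration, the param_file setting could be read from the text of a configuration file as sketched below. Only the "param_file = ..." line is taken from this guide; the helper name read_setting and the assumption that settings appear one per line in "name = value" form are made for the example.

```python
def read_setting(conf_text, key):
    """Return the value of the first 'key = value' line, or None if absent."""
    for line in conf_text.splitlines():
        name, separator, value = line.partition("=")
        if separator and name.strip() == key:
            return value.strip()
    return None


if __name__ == "__main__":
    example = "param_file = DEFAULT\n"
    print(read_setting(example, "param_file"))
```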
The Parameter File entry box in the GOLD front end should say DEFAULT if you want to use
the default GOLD parameter file. You can click on the button to pick an alternative parameter
file, or directly type a file name into the entry box.
The format of the parameter file is quite strict: incorrect editing may cause GOLD to behave in
unexpected ways or even to crash. Because of the large number of parameters, no guarantee can
be given that the program will behave reliably with anything other than the default
parameterisation.
For more information see the comments in the parameter file, gold.params.
6.7 Kinase Scoring Function
Weak CH...O interactions can be accounted for by inclusion of a ChemScore term that calculates
a contribution for weak hydrogen bonds. This term can be useful when dealing with particular
proteins, e.g. most kinases contain weak N-heterocycle CH...O hydrogen bonds.
This term can be enabled by editing the chemscore.params file (see Section 6.5, page 58).
The ligand has been minimised into a low-energy starting conformation and the atom types have
been checked for accuracy (see Section 4.3, page 31).
2.3 Atom Type Assignment
Each protein and ligand atom must be assigned an atom type which is used to determine whether
the atom is capable of forming hydrogen bonds. GOLD atom typing is based on SYBYL
(http://www.tripos.com/) atom types. SYBYL bond types are also used.
GOLD will automatically assign atom types provided the Set atom types check buttons are
switched on in the Input Parameters and Files panel of the GOLD front end.
GOLD deduces atom types from the information about element types and bond orders in the
input structure file. It is therefore crucial that both the protein and ligand input files are prepared
according to the guidelines provided (see Section 5.1, page 36).
3. Specifying the Protein Input File
Open GOLD and click on the Protein button in the Input Parameters and Files section of the
front end to bring up the file selection window.
Select protein.mol2 from <GOLD_DIR>/examples/tutorial1, then click on Open.
4. Specifying the Ligand Input File
Click on the Edit Ligand File List button in the GOLD front-end. The Ligand selection for
docking run window will appear:
From here it is possible to:
All other parts of the protein will be kept rigid, so the only way of dealing with a truly flexible
binding site is to perform separate GOLD runs on different binding-site conformations.
2.2 Preparing the Ligand Input File
The N-phosphonacetyl-L-aspartate ligand has already been prepared in accordance with the
requirements for setting up the ligand (see Section 4., page 30).
Within SILVER read in the file ligand.mol2 from <GOLD_DIR>/examples/tutorial1
and inspect the structure:
Acceptable ligand input file formats are MOL2 (i.e. Tripos format) or MOL (i.e. MDL SD
format). PDB files can also be used, although we do not recommend the use of PDB format for
ligands (see Section 4.4, page 31).
All hydrogen atoms must be present in the ligand input file (see Section 4.2, page 30). In this
example, all hydrogen atoms have been added thus ensuring that the ionisation and tautomeric
states are defined unambiguously.
Certain groups can be represented in more than one way (i.e. have more than one canonical
form), such as nitro, carboxylate and amidinium. In such cases, there is usually a right and a
wrong representation for use in GOLD. The conventions used for some common difficult groups
and further help on setting up the ligand are provided (see Section 4., page 30).
The following parameters are used:
# CH...O PARAMETERS
# ================
CHO_COEFFICIENT -2.00
# OFF no CHO term
# SPECIAL only CH adjacent to heteroatoms
# ARO all aromatic CH
CHO_TYPE OFF
#CHO_TYPE SPECIAL
CHO_R_IDEAL 2.35
CHO_DELTA_R_IDEAL 0.25
CHO_DELTA_R_MAX 0.65
CHO_ALPHA_IDEAL 180.0
CHO_DELTA_ALPHA_IDEAL 50.0
CHO_DELTA_ALPHA_MAX 100.0
CHO_BETA_IDEAL 180.0
CHO_DELTA_BETA_IDEAL 70.0
CHO_DELTA_BETA_MAX 80.0
To enable calculation of a weak CH...O hydrogen bonding term S(cho) the term CHO_TYPE
should be set to SPECIAL. This will enable the recognition of activated CH groups for
hydrogen bonding. Active CH groups are those in aromatic rings next to nitrogens (e.g. the CH's
in an imidazole ring). These groups are recognised both in the ligand and protein active site.
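Switching CHO_TYPE in a copy of the parameter file can also be scripted. The sketch below is a hypothetical helper, not a GOLD utility: it assumes the file consists of whitespace-separated "NAME value" lines with "#" starting a comment, as in the excerpt above, and the function name set_param is invented for the example.

```python
def set_param(text, name, value):
    """Return text with the active 'name ...' line set to 'name value'.

    Commented-out lines (starting with '#') are left untouched. If no
    active line for name exists, one is appended.
    """
    out = []
    found = False
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == name:
            out.append(f"{name} {value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{name} {value}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    excerpt = "CHO_COEFFICIENT -2.00\nCHO_TYPE OFF\n#CHO_TYPE SPECIAL\n"
    print(set_param(excerpt, "CHO_TYPE", "SPECIAL"), end="")
```

Working on a copy of chemscore.params, as recommended for any customisation of the ChemScore file, keeps the distributed default intact.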
For further details please refer to Virtual Screening Using Protein-Ligand Docking: Avoiding
Artificial Enrichment (see Section 19., page 147).
6.8 Heme Scoring Function
The heme scoring function is available for both GoldScore (see Section 6.2, page 46) and
ChemScore (see Section 6.4, page 49).
By default GOLD makes no distinction between different H-bond acceptors in terms of their
strength of interaction with the metal. A recent publication by Kirton et al (S. B. Kirton, C. W.
Murray, M. L. Verdonk and R. D. Taylor, Proteins: Structure, Function, and Bioinformatics, 58,
836-844, 2005) demonstrated how metal parameters can be set up in GOLD for both GoldScore
and ChemScore, to take account of different H-bond acceptor types. Kirton et al described the
use of ligand specific iron parameters in the context of docking to heme containing proteins and
demonstrated improved performance. It is now possible in GOLD to optionally use these
parameters.
The parameters are derived from contact statistics obtained from the CSD and PDB databases.
Parameters were derived for both GoldScore and ChemScore.
These parameters can be used by choosing the appropriate .params file from those that have
been supplied with the GOLD installation. The .params files that are available are:
goldscore.p450_csd.params
goldscore.p450_pdb.params
chemscore.p450_csd.params
chemscore.p450_pdb.params
The files are located within the $GOLD_DIR/gold directory. The graphic below shows the
iron parameters for GoldScore, derived from the CSD, as displayed in the
goldscore.p450_csd.params file.
To employ one of the files, click on either the GoldScore Parameter File: button (if using
GoldScore) or the ChemScore Parameter File: button (if using ChemScore), navigate to the
$GOLD_DIR/gold directory, select the file required, then click on Open.
It was found necessary by Kirton et al to assign the planar nitrogens in the heme molecules as
lipophilic when using the ChemScore scoring function. In order to bring this about the
chemscore.p450 parameter files therefore contain the additional keyword:
MAKE_PLANAR_N_LIPO 1
NOTE: Use of this keyword has only been validated for nitrogen atoms within heme containing
proteins. Improvements in docking performance when used with non-heme containing proteins
are not guaranteed.
Acceptable protein input file formats for GOLD are PDB and MOL2.
The protein input file may be the entire protein structure, or consist of just those residues that are
in the region of the ligand binding site. GOLD searches for contacts out to a distance of 20.0 Å.
In this example, parts of the protein remote from the binding site have been deleted, in order to
speed up the calculation. The protein has been cut down to a radius of 20.0 Å around the ligand
binding site thus ensuring that enough of the protein has been retained so that all of the residues
that might reasonably interact with the ligand are present.
All hydrogen atoms must be present in the protein input file (see Section 3.2, page 10). In this
example, hydrogen atoms have been placed on the protein (using a molecular modelling
program (see Section 1., page 1)) in order to ensure that ionisation and tautomeric states are
defined unambiguously. Obviously, this involved making hypotheses about the protonation
states of residues such as His, Glu and Asp.
GOLD allows for partial protein flexibility. Specifically, the torsion angles of Ser, Thr and Tyr
hydroxyl groups will be allowed to rotate during docking in order to optimise their
hydrogen-bonding to the ligand. Lysine NH3+ groups are similarly optimised.
Note: the optimised positions of polar protein hydrogen atoms that are generated during docking
(these will usually be different for each docked ligand pose) can be saved to the docked solution
file (see Section 14.2, page 111).
Tutorial 1: A Step-By-Step Guide to Using GOLD
1. Introduction (see page 163)
2. Preparation of Input Structures for Use in GOLD (see page 163)
3. Specifying the Protein Input File (see page 166)
4. Specifying the Ligand Input File (see page 166)
5. Input Parameters and Files Settings (see page 167)
6. Defining the Ligand Binding Site (see page 167)
7. Fitness Function and Search Settings (see page 168)
8. Genetic Algorithm Parameter Settings (see page 169)
9. Setting Output Preferences (see page 170)
10. Running GOLD (see page 172)
11. Analysis of Output (see page 173)
1. Introduction
This tutorial aims to provide a step-by-step guide to using GOLD. To illustrate this, the
procedure for setting-up and running an example docking will be explained and additional
information will be provided on related issues.
In this example GOLD will be used to determine the binding mode of N-phosphonacetyl-L-
aspartate with the aspartate carbamoyltransferase, PDB entry code 1acm.
2. Preparation of Input Structures for Use in GOLD
2.1 Preparing the Protein Input File (see page 163)
2.2 Preparing the Ligand Input File (see page 165)
2.3 Atom Type Assignment (see page 166)
GOLD will only produce reliable results if the protein and ligand input files are set up correctly.
It is therefore essential that a number of key steps are followed when preparing any input
structure for use in GOLD ((see Section 3.1, page 9) and (see Section 4.1, page 30)).
2.1 Preparing the Protein Input File
The aspartate carbamoyltransferase, 1acm, has already been prepared in accordance with the
requirements for setting up the protein (see Section 3.1, page 9).
Open SILVER and read in the file protein.mol2 from <GOLD_DIR>/examples/tutorial1
and inspect the structure:
6.9 Internal Energy Offset
Click on the Fitness & Search Options button and switch on the Offset internal ligand energy by
best energy that is encountered during run check-box.
Enabling this option will result in the internal energy terms (internal torsion, internal vdw, and
internal Hbond) being corrected according to the best energy encountered for these terms during
the run.
By applying this correction the internal energy will be calculated with respect to that of a close
to optimal non-bound structure, thereby taking into account any irreducible internal energy.
The internal energy offset can be used with both Goldscore and Chemscore.
For Chemscore the ligand energy correction value is written to the docked solution files in the
tag <Gold.Chemscore.Internal.Correction>. This is the best (i.e. minimum energy) value
encountered.
For GoldScore the correction value is written to the docked solution files in the tag
<Gold.Goldscore.Internal.Correction>. This is the best score (i.e. the maximum value)
encountered.
In both cases, the best value encountered is subtracted from the ligand score (or energy) value
before being passed to the final GoldScore or ChemScore energy term. Note: The final ChemScore
energy is converted to a ChemScore score by taking the negative.
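The arithmetic of the offset can be illustrated with a short Python sketch. This is not GOLD's implementation: the function name apply_internal_offset and the sample values are hypothetical, and the sketch assumes only what is stated above, namely that the best value encountered during the run (the maximum for a GoldScore term, the minimum for a ChemScore energy) is subtracted from each internal term.

```python
def apply_internal_offset(internal_terms, higher_is_better):
    """Return (best, corrected) for the internal ligand terms seen in a run.

    higher_is_better=True mimics a GoldScore-style score (best = maximum);
    higher_is_better=False mimics a ChemScore-style energy (best = minimum).
    """
    if not internal_terms:
        raise ValueError("at least one value is required")
    best = max(internal_terms) if higher_is_better else min(internal_terms)
    # Subtract the best value so that the best pose carries no internal penalty.
    corrected = [value - best for value in internal_terms]
    return best, corrected


# Hypothetical internal terms from three docking attempts.
gold_best, gold_corrected = apply_internal_offset([-5.0, -2.0, -3.5], higher_is_better=True)
chem_best, chem_corrected = apply_internal_offset([4.0, 1.5, 2.5], higher_is_better=False)
```

In this sketch the pose with the best internal term is corrected to zero, and every other pose is expressed relative to it.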
Note: The .rnk file is corrected at the end of a run with the best energy encountered after all
docking attempts on a particular ligand (individual solution files are not). Therefore you may
observe small deviations for the best energy found between the solutions and rank file.
Increasing the number of dockings or the number of GA operations in each docking will result in
the discrepancy being less pronounced.
6.10 User Defined Fitness Function
In addition to the choice of scoring functions currently provided, i.e., GoldScore and
ChemScore, users can now implement their own scoring function, which can be accessed from
the GOLD front end by selecting User Defined Score:
The GOLD scoring function Application Programming Interface (API) allows users to modify
the GOLD scoring-function mechanism in order to:
Calculate and write out additional data after each docking
Add extra terms to the scoring function
Implement a completely new scoring function
Full documentation for the GOLD Scoring Function Application Programming Interface (API)
is provided with the GOLD distribution:
UNIX: $GOLD_DIR/gold/api_doc/index.html
Windows: <InstallDir>/GOLD/gold/api_doc/index.html
where <InstallDir> is usually C:/Program Files/CCDC
see: GOLD Scoring Function Application Programming Interface (API) documentation.
A good knowledge of the C programming language is required together with some experience in
using GOLD.
Selecting Scoring Function Shared Object Name (UNIX) or Scoring Function DLL Name
(Windows) from the Fitness Function Settings panel enables you to specify a path to a
dynamically loadable shared object library.
GOLD uses shared objects (or dynamically loadable libraries) to allow new or modified scoring
functions to be plugged in. Two shared object files are relevant:
The main GOLD shared object, which is called libgold.so (UNIX) or gold.dll
(Windows)
The scoring-function shared objects which, by default, are called libfitfunc_dll.so
(UNIX), goldscore.dll or chemscore.dll (Windows)
On UNIX the file libgold.so is included in the GOLD distribution, together with two
versions of libfitfunc_dll.so, one implementing the normal GOLD scoring function and
the other implementing the ChemScore function.
On Windows the file gold.dll is included in the GOLD distribution, together with two files
called goldscore.dll, for implementing the normal GOLD scoring function, and
chemscore.dll, for implementing the ChemScore function.
The API effectively provides a mechanism by which data may be intercepted and modified during
docking. Users may therefore post-process the results of a docking, modify the GOLD scoring
function, or implement their own scoring function, by building their own versions of
libfitfunc_dll.so (UNIX) or, e.g., goldscore.dll (Windows).
Appendix E: GOLD Tutorials
In order to familiarise yourself with GOLD it is recommended that you work through the tutorial
examples provided. Tutorial 1 goes through the process of setting up and running an example
docking in some detail; subsequent tutorials are more concise but introduce other, more
advanced, aspects of the program.
For the purpose of these tutorials it is assumed that the user has access to either SILVER
(supplied with GOLD) or another visualisation program (for instructions on how to use SILVER
refer to the SILVER User Guide). In addition, if you wish to set up your own protein and ligand
input files ((see Section 3.1, page 9) and (see Section 4.1, page 30)) then you will need access to
a molecular modelling program. Full details of the software requirements needed in order to use
GOLD are given elsewhere (see Section 1., page 1).
Please note: due to the non-deterministic nature of GOLD, results may vary from those described
in the tutorials.
Tutorial 1: A Step-By-Step Guide to Using GOLD (see page 163)
Tutorial 2: Handling of Metals in GOLD (see page 178)
Tutorial 3: Use of Hydrogen Bonding Constraints (see page 185)
Tutorial 4: Use of Substructure Based Distance Constraints (see page 194)
Tutorial 5: Docking with Water in the Binding Site (see page 202)
Tutorial 6: Docking with a Flexible Side Chain (see page 208)
Tutorial 7: Docking using Localised Soft Potentials (see page 215)
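Correlation of prediction quality with percentage of heavy atoms in ligand that can form hydrogen
bonds (continued)
Prediction Result   Max   Avg   Min
Errors or Wrong   53.9   38.5   3.6
Correlation of prediction quality with number of flexible torsions in ligand
Prediction Result   Max   Avg   Min
Good or Close   24   9.0   0
Errors or Wrong   14   8.4   3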
7. Ligand Flexibility
7.1 Flipping Ring Corners (see page 64)
7.2 Flipping Amide Bonds (see page 64)
7.3 Flipping Planar Nitrogens (see page 65)
7.4 Flipping Pyramidal Nitrogens (see page 66)
7.5 Intramolecular Hydrogen Bonds (see page 66)
7.6 Protonated Carboxylic Acids (see page 66)
7.7 Fixing Rotatable Bonds at Their Input Conformation (see page 66)
7.1 Flipping Ring Corners
Click on the Fitness & Search Options button and switch on the Flip ring corners check-box to
allow free corners of ligand rings to flip. This will result in GOLD performing a limited
conformational search of cyclic systems by allowing free corners of rings to flip above or below
the plane of their neighbouring atoms.
If the Flip ring corners check box is not switched on then rings will be held rigid at the input
conformation during docking.
The rules governing flipping of ring corners in GOLD are given in:
A. W. R. Payne and R. C. Glen, J. Mol. Graphics, 1993, 10, 74-91
7.2 Flipping Amide Bonds
During initialisation of the ligand, amides (including thioamides, ureas, and thioureas) will be
set to the trans conformation.
Click on the Fitness & Search Options button and switch on the Flip amide bonds check box to
allow amides, thioamides, ureas, and thioureas in the ligand to flip between cis and trans.
In order to flip between cis and trans conformations the CO-NRR' torsion is first made planar (at
the initialised trans conformation).
Note: N,N disubstituted amides are not made planar; CO-NH2 will be set so that the NH2 group
is in plane with the CO (care must be taken that the input RNH2 group itself is planar since
GOLD will not change this).
On occasion this flattening of the CO-NRR' torsion may result in clashes in the initialised
structure. If this occurs, it is advisable to turn off normalisation of amide bonds using the
FLATTEN_BONDS keyword in the gold.params file. In this case it is recommended to fix
the bond by switching off Flip amide bonds, or by explicitly specifying that the appropriate
rotatable bonds are held at their input conformation (see Section 7.7, page 66).
If the use of torsion angle distribution has been enabled (see Section 9., page 83) GOLD will
attempt to match amide torsions against the torsion angles distributions file. If an amide torsion
matches, this will override the Flip amide bonds flag setting.
Note: Data in the CSD show that both cis and trans conformations occur in ureas; it is therefore
recommended that amide flipping be turned on in order to sample R-N-C(O)-N torsions of 0
degrees when docking ureas.
7.3 Flipping Planar Nitrogens
Click on the Fitness & Search Options button and switch on the Flip all planar R-NR1R2 check
box to allow planar trigonal nitrogens in the ligand (bound to sp2 carbons) to flip between cis
and trans conformations during docking (otherwise, they will be held fixed at the input
geometry).
It is possible to further specify whether or not ring-NHR and ring-NRR' groups are also allowed
to flip (i.e. rotate 180 deg.).
When running GOLD from the command line a number of keyword modifiers can be specified
after the flip_planar_n command in the gold.conf file:
flip_planar_n = <1|0> <keyword>
These keywords allow further control over the behaviour of this flag. The following keywords
can be used:
flip_ring_NRR
flip_ring_NHR
This allows flipping of ring-NHR or ring-NRR groups and is equivalent to using the including
ring-NHR and including ring-NRR settings in the interface.
fix_ring_NRR
fix_ring_NHR
This fixes these bonds at their input conformation and is equivalent to using the do not flip ring-
NHR and do not flip ring-NRR settings in the interface.
rot_ring_NRR
rot_ring_NHR
Use these keywords to allow free rotation of ring-NHR or ring-NRR groups.
For example, setting flip_planar_n = 1 fix_ring_NRR will allow all planar R3N
groups to flip, but will fix ring-NRR groups.
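When gold.conf files are generated by a script, a small helper can guard against mistyped modifiers. The following Python sketch is illustrative only: the function name flip_planar_n_line is hypothetical, the keyword list is copied from this section, and the assumption that several modifiers may be combined on one line has not been verified against GOLD.

```python
# Modifier keywords listed in this section for the flip_planar_n option.
ALLOWED_MODIFIERS = (
    "flip_ring_NRR", "flip_ring_NHR",
    "fix_ring_NRR", "fix_ring_NHR",
    "rot_ring_NRR", "rot_ring_NHR",
)


def flip_planar_n_line(enabled, modifiers=()):
    """Build a 'flip_planar_n = <1|0> <keyword>' line for gold.conf."""
    unknown = [m for m in modifiers if m not in ALLOWED_MODIFIERS]
    if unknown:
        raise ValueError("unknown flip_planar_n modifier(s): " + ", ".join(unknown))
    fields = ["1" if enabled else "0"] + list(modifiers)
    return "flip_planar_n = " + " ".join(fields)


# Reproduces the example given in the text.
line = flip_planar_n_line(True, ["fix_ring_NRR"])
```

The helper only formats and validates text; how GOLD interprets each keyword is as described in the section above.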
Appendix D: GOLD Predictions in Second Series of Validation Tests
3D plots of individual predictions are available on the CCDC web page.
The tables in this section list:
Subjective classification of GOLD predictions (see page 153)
Correlation of prediction quality with number of heavy atoms in ligand (see page 158)
Correlation of prediction quality with percentage of heavy atoms in ligand that can form
hydrogen bonds (see page 158)
Correlation of prediction quality with number of flexible torsions in ligand (see page 158)
Subjective classification of GOLD predictions
Subjective Result   No.   PDB Codes
Good   12   1BMA 1CIL 1FRP 2GBP 1GLP 1LAH 1LPM 1MMQ 1MRG 1TRK 1TNL 1WAP
Close   13   1ATL 1BBP 1BYB 1CBS 1COM 1FEN 1HFC 1IMB 1LCP 1NCO 1TNG 1TNI 1TPH
Some significant errors   6   2CMD 1CTR 2LGS 1LNA 1SNC 1UKZ
Wrong   3   1CDG 1LMO 1TYL
Correlation of prediction quality with number of heavy atoms in ligand
Prediction Result   Max   Avg   Min
Good or Close   48   21.2   8
Errors or Wrong   29   19.9   10
Correlation of prediction quality with percentage of heavy atoms in ligand that can form hydrogen
bonds
Prediction Result   Max   Avg   Min
Good or Close   60.0   29.5   0.0
Correlation of prediction quality with protein resolution
Resolution (Å)   Total   No. Good + No. Close   No. Errors + No. Wrong
> 1.0, <= 1.5   2   2   0
> 1.5, <= 2.0   44   34   10
> 2.0, <= 2.5   32   24   8
> 2.5, <= 3.0   20   11   9
> 3.0   1   0   1
7.4 Flipping Pyramidal Nitrogens
Click on the Fitness & Search Options button and switch on the Flip pyramidal N check box to
allow pyramidal (i.e. non-planar sp3) nitrogens to invert during docking (otherwise, they will be
held fixed at the input geometry).
Given a non-planar group RRRN or tetrahedrally surrounded RRRNH, the Flip pyramidal N
switch enables flipping of the local stereochemistry around the nitrogen (the energy barrier for
this umbrella-like change of geometry around the nitrogen is low).
Flipping only changes the stereochemistry around RRRN and RRRNH nitrogens. It does not
affect other chiral centers.
7.5 Intramolecular Hydrogen Bonds
Click on the Fitness & Search Options button and switch on the Internal H-bonds check box to
allow intramolecular hydrogen bonds in the ligand to be formed during docking.
Use this with care as it can make ligands like methotrexate curl up.
7.6 Protonated Carboxylic Acids
Click on the Fitness & Search Options button and switch on the Protonated carboxylic acids
check box. Protonated carboxylic acids can then either be allowed to flip (i.e. rotate 180 deg.) or
rotate freely during docking.
If the Protonated carboxylic acids check box is not switched on then these groups will be held
rigid at their input conformation.
7.7 Fixing Rotatable Bonds at Their Input Conformation
GOLD was designed to dock flexible ligands into protein binding sites. However, sometimes it
can be useful to fix the geometry of part or all of the ligand e.g. in order to study the possible
binding of a pre-determined ligand geometry.
The ability to fix rotatable bonds at their input conformation is not available from the GOLD
front end. To do this, you need to edit the gold.conf file (see Section 15.1, page 126). The
following options are available:
To fix the rotatable bond between two specified atoms, add the following line to the
gold.conf file:
fix_rotatable_bond = <atom number 1> <atom number 2>
(numbering as in the input file).
Note: The ability to fix rotatable bonds at their input conformation is also available using the
rotatable_bond_override.mol2 file (see Section 5.4, page 38). This is particularly useful when
docking a library of ligands that share a common substructure; the method above is more
suitable when docking an individual ligand.
To fix all rotatable bonds in the ligand at their input conformation, add the following line to
the gold.conf file:
fix_rotatable_bond = all
To fix all non-terminal rotatable bonds (i.e. not -CH3, -OH, etc.), add the following line to the
gold.conf file:
fix_rotatable_bond = all_but_terminal
Note: When fixing all rotatable bonds at their input conformation (i.e. performing a rigid ligand
docking) GOLD will try to find the best orientation of the ligand in the binding site by mapping
donor-acceptor (as well as hydrophobic-hydrophobic) fitting points. However, GOLD will not
perform a local optimisation (simplex) on the final solution. This may lead to penalisation of
near-optimal conformations. Performing a few cycles of molecular-mechanics minimisation
before docking may help to take the ligand close to its local potential-energy minimum.
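The three forms of the fix_rotatable_bond option described in this section can be generated with a small Python helper when gold.conf files are written by a script. This is a sketch under stated assumptions: the function name fix_rotatable_bond_lines and the atom numbers in the usage lines are hypothetical, and it assumes that atom numbers are positive integers and that the line is repeated once per bond when several individual bonds are to be fixed.

```python
def fix_rotatable_bond_lines(bonds=(), mode=None):
    """Return gold.conf lines that fix rotatable bonds at their input conformation.

    bonds: iterable of (atom number 1, atom number 2), numbered as in the input file.
    mode:  None, "all" or "all_but_terminal" (the two keyword forms in this section).
    """
    if mode is not None:
        if mode not in ("all", "all_but_terminal"):
            raise ValueError("mode must be 'all' or 'all_but_terminal'")
        return ["fix_rotatable_bond = " + mode]
    lines = []
    for atom1, atom2 in bonds:
        # A bond needs two distinct atoms with positive numbers.
        if atom1 < 1 or atom2 < 1 or atom1 == atom2:
            raise ValueError("invalid bond: %r %r" % (atom1, atom2))
        lines.append("fix_rotatable_bond = %d %d" % (atom1, atom2))
    return lines


# Hypothetical atom numbers, for illustration only.
single_bonds = fix_rotatable_bond_lines(bonds=[(4, 7), (12, 13)])
rigid_ligand = fix_rotatable_bond_lines(mode="all")
```

The returned lines are intended to be appended to an existing gold.conf file (see Section 15.1, page 126).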
Correlation of subjective classification with rms deviation
Rms Devn. (Å)   Total No.   No. Good   No. Close   No. Errors   No. Wrong
<= 0.5   8   8   0   0   0
> 0.5, <= 1.0   27   24   3   0   0
> 1.0, <= 1.5   20   7   13   0   0
> 1.5, <= 2.0   11   2   9   0   0
> 2.0, <= 2.5   2   0   2   0   0
> 2.5, <= 3.0   3   0   2   1   0
> 3.0   28   0   1   8   19
Correlation of prediction quality with number of heavy atoms in ligand
Prediction Result   Max   Avg   Min
Good or Close   52   20.4   6
Errors or Wrong   55   24.3   9
Correlation of prediction quality with percentage of heavy atoms in ligand that can form hydrogen
bonds
Prediction Result   Max   Avg   Min
Good or Close   66.7   31.9   8.7
Errors or Wrong   53.9   25.1   4.8
Correlation of prediction quality with number of flexible torsions in ligand
Prediction Result   Max   Avg   Min
Good or Close   28   7.9   0
Errors or Wrong   40   11.4   0
PDB Code   Rms Devn. of Top-Ranked Solution   Rms Devn. of Closest Solution   Average Rms Devn. of All Solutions   Rms Devn. of Worst Solution   Subjective Rating
1ETR 4.23 1.55 5.65 12.81 Errors
1NIS 4.29 3.49 3.99 4.31 Wrong
2MCP 4.37 2.45 4.43 8.26 Wrong
6RSA 4.42 4.29 4.50 5.24 Errors
1RDS 4.78 1.49 6.00 11.00 Errors
1ACK 4.99 3.82 4.95 10.10 Errors
2AK3 5.08 2.41 5.43 10.20 Wrong
3CLA 5.45 2.22 5.59 6.88 Wrong
4FAB 5.69 1.24 3.60 6.69 Wrong
1BAF 6.12 4.96 5.76 6.17 Errors
1MCR 6.23 3.40 5.32 6.73 Wrong
2R07 8.23 8.23 11.32 17.12 Wrong
1ICN 8.63 4.14 9.92 16.98 Wrong
1IGJ 9.42 9.08 10.43 13.21 Wrong
2MTH 10.12 0.90 4.65 10.12 Wrong
1TDB 10.48 4.47 8.57 12.06 Wrong
1HDC 10.49 1.65 10.64 13.50 Errors
1LIC 10.78 6.32 12.88 15.65 Errors
1ETA 11.21 7.19 9.69 12.84 Wrong
1IDA 12.12 1.41 6.84 14.43 Close
1EED 12.43 2.87 10.06 13.78 Wrong
1AAQ 12.85 1.52 7.04 15.35 Wrong
2PLV 13.92 9.11 12.65 16.21 Wrong
1HRI 14.01 11.70 14.40 16.97 Wrong
8. Setting and Releasing Constraints
8.1 Using the Constraint Editor (see page 68)
8.2 Distance Constraints (see page 69)
8.3 Hydrogen Bond Constraints (see page 73)
8.4 Region (Hydrophobic) Constraints (see page 77)
8.5 Template Similarity Constraints (see page 79)
8.6 Scaffold Match Constraint (see page 80)
8.1 Using the Constraint Editor
Click on the Edit Constraints button within the Fitness Function and Search Settings panel of
the GOLD front end. This will open the Constraints Editor:
To define a constraint, select a constraint type from those listed and specify the required settings.
The following constraint types are available:
Distance constraint, for use with individual ligands (see Section 8.2, page 69).
Substructure based distance constraint, for use with multiple ligands that have a common
substructure or functional group (see Section 8.2, page 69).
Hydrogen bond constraint, for specifying a hydrogen bond between a particular ligand atom
and a particular atom in the protein (see Section 8.3, page 73).
Protein hydrogen bond constraint, for specifying that a particular protein atom should be
hydrogen-bonded to the ligand, but without specifying to which ligand atom (see Section 8.3,
page 73).
Region (hydrophobic) constraint, for biasing the docking towards solutions in which
particular regions of the binding site are occupied by specific ligand atoms or types of ligand
atom (see Section 8.4, page 77).
Template similarity constraint, for biasing the conformation of docked ligands towards a
given solution, or template (see Section 8.5, page 79).
Once the settings for a constraint have been specified click on the Add constraint or Update
selected constraint button to add the constraint definition to the Current Constraints.
Repeat the above procedure if you want to specify additional constraints.
To edit a constraint highlight the corresponding entry in the Current Constraints list, make the
required change and then hit the Add constraint or Update selected constraint button.
To remove a constraint from the Current Constraints list highlight the entry and hit the Delete
Selection button, or to remove all entries hit the Clear List button.
It is possible to instruct GOLD not to dock ligands when the specified constraint is physically
impossible to satisfy (e.g. if no suitable group is present in the ligand to form the required H-
bond constraint). This is done by selecting the Never dock a ligand when a constraints is
physically impossible check box in the Constraint Editor.
Click on Done in the Constraints Setup window when you are satisfied with the constraints
specified. The count of Constraints will be updated in the GOLD front end.
Note: When using constraints GOLD will be biased towards finding solutions in which the
specified constraint is satisfied. However, it is important to remember that such a solution is not
guaranteed (i.e. it is not possible to force a constraint to be satisfied in the final solution).
8.2 Distance Constraints
Any distance between a ligand and protein atom (or between two ligand atoms) can be
constrained to lie between minimum and maximum distance bounds. GOLD features two types
of distance constraint:
A standard distance constraint for use with individual ligands (see Section 8.2.1, page 70).
A substructure-based distance constraint for use with multiple ligands which have a common
functional group (see Section 8.2.3, page 72).
PDB Code   Rms Devn. of Top-Ranked Solution   Rms Devn. of Closest Solution   Average Rms Devn. of All Solutions   Rms Devn. of Worst Solution   Subjective Rating
1GLQ 1.35 0.97 3.77 9.47 Close
1PHG 1.35 1.35 3.57 4.59 Close
4EST 1.38 1.04 2.76 4.96 Close
1DR1 1.41 1.04 1.35 1.43 Close
4DFR 1.44 0.80 3.98 10.85 Good
1GHB 1.45 1.22 2.59 4.80 Close
5P2P 1.55 1.24 6.15 11.69 Close
4CTS 1.57 1.56 1.57 1.61 Close
3CPA 1.58 0.90 1.47 1.89 Close
1APT 1.62 1.62 6.50 9.97 Close
1TMN 1.68 1.46 5.25 10.61 Close
1DWD 1.71 1.71 6.50 9.56 Close
1FKG 1.81 1.67 6.26 11.32 Good
1HEF 1.87 1.87 10.01 14.04 Good
1TKA 1.88 0.86 2.54 5.09 Close
1BLH 1.95 0.53 1.60 2.31 Close
1RNE 2.00 1.79 6.70 10.90 Close
1EPB 2.08 2.03 6.50 12.91 Close
1IVE 2.16 1.23 2.05 2.17 Close
1AZM 2.52 2.25 2.46 2.56 Close
3GCH 2.64 1.67 1.99 2.64 Close
1EAP 3.00 1.33 3.78 10.48 Errors
1DID 3.72 0.51 3.59 5.88 Wrong
1ROB 3.75 0.80 3.83 7.43 Errors
1MUP 3.96 3.41 4.10 4.58 Wrong
1ACJ 4.00 0.23 3.73 5.52 Wrong
PDB Code   Rms Devn. of Top-Ranked Solution   Rms Devn. of Closest Solution   Average Rms Devn. of All Solutions   Rms Devn. of Worst Solution   Subjective Rating
1ABE 0.86 0.73 1.12 3.06 Good
1ACO 0.86 0.80 1.49 3.43 Good
1COY 0.86 0.54 3.15 6.63 Good
8GCH 0.86 0.86 5.84 8.54 Good
1LST 0.87 0.47 0.84 1.07 Good
1XID 0.92 0.92 1.95 2.38 Close
2SIM 0.92 0.73 1.20 1.56 Good
1HDY 0.94 0.79 1.30 2.08 Good
3PTB 0.96 0.64 0.91 1.78 Good
1HSL 0.97 0.63 0.81 0.97 Good
2CGR 0.99 0.82 0.98 1.05 Good
1LDM 1.00 1.00 1.00 1.00 Close
1MRK 1.01 0.74 1.45 5.86 Good
1DIE 1.03 0.86 1.94 3.82 Close
6ABP 1.08 0.27 0.99 3.05 Close
1HYT 1.10 1.01 1.11 1.15 Good
1AEC 1.11 0.35 1.42 6.07 Good
4PHV 1.11 1.02 5.74 12.87 Good
3HVT 1.12 1.12 4.25 4.81 Close
1DBB 1.17 0.43 4.86 11.48 Good
2YHX 1.19 1.12 2.99 8.58 Close
6RNT 1.20 0.72 4.16 8.17 Close
1PHA 1.24 0.86 2.88 6.14 Close
1POC 1.27 1.20 2.73 12.37 Good
2DBL 1.31 1.29 8.65 16.31 Close
2PK4 1.34 1.11 1.83 7.01 Close
8.2.1 Setting Up a Distance Constraint (see page 70)
8.2.2 Method Used for Substructure-Based Distance Constraints (see page 71)
8.2.3 Setting Up Substructure-Based Distance Constraints (see page 72)
8.2.1 Setting Up a Distance Constraint
A distance between a specified ligand and protein atom (or between two ligand atoms) can be
constrained to lie between minimum and maximum distance bounds.
During a GOLD run, if a constrained distance is found to lie outside its bounds, a spring energy
term is used to reduce the fitness score, i.e.
E = kx^2
where:
x is the difference between the distance and the closest constraint bound;
k is a user-defined spring constant.
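The spring term can be written out as a short Python sketch. The function name distance_constraint_penalty and the numerical values are hypothetical; the sketch assumes only the definition above, i.e. that x is zero when the distance lies within the bounds and is otherwise the distance to the nearest bound. How GOLD combines this term with the rest of the fitness score is not shown.

```python
def distance_constraint_penalty(distance, min_separation, max_separation, k):
    """Spring energy E = k * x**2 for a constrained distance (all distances in Å)."""
    if min_separation > max_separation:
        raise ValueError("min_separation must not exceed max_separation")
    if distance < min_separation:
        x = min_separation - distance   # too close: measure from the lower bound
    elif distance > max_separation:
        x = distance - max_separation   # too far: measure from the upper bound
    else:
        x = 0.0                         # within bounds: no penalty
    return k * x * x


# Hypothetical constraint: 2.0 to 3.0 Å with a spring constant of 5.0.
inside = distance_constraint_penalty(2.5, 2.0, 3.0, 5.0)
too_far = distance_constraint_penalty(3.5, 2.0, 3.0, 5.0)
too_close = distance_constraint_penalty(1.0, 2.0, 3.0, 5.0)
```

Because the term is quadratic in x, small violations of the bounds are penalised lightly and large violations heavily.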
To constrain a distance, click on the Edit Constraints button to bring up the Constraint Editor.
Then, select Distance Constraint from the list of constraint types.
Specify the required settings using the protein and ligand atom numbers as defined in the MOL2
input files (if PDB input is used, use the sequence number). The maximum and minimum
separation of the constrained atoms must be entered (distances are in Å), and the spring constant
must also be specified. For example:
If the specified ligand atom is topologically equivalent to other atoms in the ligand (e.g. it is one
of the oxygen atoms of an ionised carboxylate group), then GOLD will automatically compute the
constraint term using whichever of the equivalent atoms gives the best value.
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68).
8.2.2 Method Used for Substructure-Based Distance Constraints
It is possible to apply a distance constraint to multiple ligands which have a common functional
group.
The constraint restrains the distance between a protein atom and one atom of this functional
group.
During docking the constraint will be applied to any ligands which contain the specified
substructure (matching is performed on the basis of the atom types and 2D connectivity), and the
resulting solutions will be biased towards the specified distance range. GOLD always accounts
for topology in the substructure.
Rms deviations between GOLD predictions and observed ligand positions
PDB Code   Rms Devn. of Top-Ranked Solution   Rms Devn. of Closest Solution   Average Rms Devn. of All Solutions   Rms Devn. of Worst Solution   Subjective Rating
1ULB 0.32 0.32 0.38 0.53 Good
2CTC 0.32 0.24 0.38 1.94 Good
1MDR 0.36 0.36 0.50 0.65 Good
2ADA 0.40 0.40 0.47 6.20 Good
1SRJ 0.42 0.42 4.86 1.11 Good
3AAH 0.42 0.36 0.66 0.49 Good
1TPP 0.43 0.37 0.43 0.61 Good
1ASE 0.49 0.36 0.60 1.31 Good
1AHA 0.51 0.51 0.51 0.51 Good
1CBX 0.54 0.49 0.53 0.58 Good
1PBD 0.57 0.18 0.45 0.70 Good
2CHT 0.59 0.57 0.62 0.85 Good
1STP 0.69 0.56 0.67 0.98 Good
1XIE 0.69 0.69 2.20 4.93 Good
1FKI 0.71 0.71 1.81 6.22 Good
1DBJ 0.72 0.39 4.16 6.13 Good
2PHH 0.72 0.63 0.68 0.73 Good
1SLT 0.78 0.78 6.64 8.43 Good
7TIM 0.78 0.64 0.81 1.71 Good
3TPI 0.80 0.36 0.91 1.98 Good
1ACM 0.81 0.79 1.01 1.23 Good
1CPS 0.84 0.60 1.91 6.56 Good
1PHD 0.85 0.32 0.85 2.15 Good
Appendix C: GOLD Predictions in First Series of Validation Tests
3D plots of individual predictions are available on the CCDC web page.
The tables in this section list:
Subjective classification of GOLD predictions (see page 153)
Rms deviations between GOLD predictions and observed ligand positions (see page 154)
Correlation of subjective classification with rms deviation (see page 158)
Correlation of prediction quality with number of heavy atoms in ligand (see page 158)
Correlation of prediction quality with percentage of heavy atoms in ligand that can form
hydrogen bonds (see page 158)
Correlation of prediction quality with number of flexible torsions in ligand (see page 158)
Correlation of prediction quality with protein resolution (see page 159)
Subjective classification of GOLD predictions
Subjective Result No. PDB Codes
Good 41 1ABE 1ACM 1ACO 1CBX 1COY 1CPS 1DBB
1DBJ 1FKG 1FKI 1HDY 1HEF 1HYT 1LST 1MDR
1MRK 1PBD 1PHD 1POC 1SRJ 1STP 1TPP 1ULB
1XIE 2ADA 2CGR 2CHT 2CTC 2PHH 2SIM 3AAH
3PTB 3TPI 4DFR 4PHV 7TIM 8GCH 1AEC 1AHA
1ASE 1HSL
Close 30 1BLH 1DIE 1DR1 1DWD 1EPB 1GHB 1GLQ 1IDA
1IVE 1LDM 1PHA 1PHG 1RNE 1SLT 1TKA 1TMN
1XID 2DBL 2PK4 2YHX 3CPA 3GCH 3HVT 4CTS
5P2P 6ABP 6RNT 1APT 1AZM 4EST
Some significant errors 9 1BAF 1EAP 1ETR 1HDC 1LIC 1RDS 1ROB 6RSA
1ACK
Wrong 19 1AAQ 1ACJ 1DID 1EED 1ETA 1HRI 1ICN 1IGJ
1MCR 1MUP 2R07 1NIS 1TDB 2AK3 2MTH 2PLV
3CLA 4FAB 2MCP
Note: the substructure must be a sub-graph rather than a complete molecule.
As with normal distance constraints (see Section 8.2.1, page 70), the score is reduced for
unfavourable ligand solutions. The amount of decrease in the score is determined by a weight
term that the user must supply.
8.2.3 Setting Up Substructure-Based Distance Constraints
To use a substructure-based distance constraint, first create a file containing the substructure in
MOL2 format (e.g. substructure.mol2). It is recommended that you set atom types manually (see
Section 5.3, page 37) since an incomplete fragment can cause problems with automatic atom-
typing. The actual conformation of the group in this file is not important, as only the atom types
and 2D connectivity will be used.
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Substructure
Constraint from the list of constraint types.
Click on the Substructure file name button, then select the substructure file and hit Open.
Enter the Protein atom number and Substructure atom number to which the distance constraint
applies (numbering as in the MOL2 files).
Specify the allowed range of separation by entering a Maximum separation and a Minimum
separation (distances are in Å).
Enter the spring constant (i.e. the weight of the term). This causes a spring-based distance
constraint to be added for the specified substructure atom and protein atom. The weight specifies
the spring energy term; usually, a weight in the range of 5 to 10 will work well.
It is possible to define a distance constraint from a centroid of a ring in the ligand. To do this
specify an atom within the ring of interest and enable the Use ring center nearest to selected atom
in ligand check-box. The closest ring center to the selected atom will be used.
Note: when defining a distance constraint involving a ring center ensure that the maximum and
minimum separations are adjusted accordingly.
If the constraint refers to a substructure atom (and therefore a ligand atom) which is
topologically equivalent to other atoms (e.g. it is one of the oxygen atoms of an ionised
carboxylate group), GOLD will automatically compute the constraint term using whichever of
the equivalent atoms gives the best value.
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68).
8.3 Hydrogen Bond Constraints
Two types of hydrogen bond constraints may be specified:
A hydrogen bond constraint: H Bond Constraint (see Section 8.3.1, page 73), which can be
used to force a hydrogen bond between a particular protein atom and a particular ligand atom.
A protein hydrogen bond constraint: Protein H Bond Constraint (see Section 8.3.3, page 75),
which can be used to specify that a particular protein atom should be hydrogen-bonded to the
ligand, but without specifying to which ligand atom.
8.3.1 Setting Up Hydrogen Bond Constraints (see page 73)
8.3.2 Method Used for Protein H Bond Constraints (see page 74)
8.3.3 Setting up Protein H Bond Constraints (see page 75)
8.3.1 Setting Up Hydrogen Bond Constraints
A ligand atom may be constrained to form a hydrogen bond to a particular protein atom. One
atom should be a donatable hydrogen atom (you must give the number of the hydrogen atom, not
the O or N atom to which it is attached) and the other should be an acceptor. The protein atom
should be available for ligand binding (i.e. solvent accessible).
Note: this constraint does not work with metals.
The constraint is incorporated into the least-squares fitting routine used by GOLD. Thus, when
least-squares fitting is used to dock the ligand (by attempting to form hydrogen bonds encoded
within the chromosome) the constraint is added to the least-squares mapping. The constraint has
Note: Certain docking-score terms are the product of a term dependent on the magnitude of a
particular physical contribution (e.g. hydrogen bonding) and a scale factor determined e.g. by a
regression coefficient.
The docking-score term descriptors included in the output file can therefore consist of weighted
terms, non-weighted terms or both (as specified in the GOLD Output Preferences).
Weighted terms will be indicated as such in the tag name, e.g.
Gold.Chemscore.Hbond.Weighted.
Name                                Explanation  See
Gold.Goldscore.Internal.Correction  Internal ligand energy offset  (see Section 6.9, page 62)
Gold.Chemscore.ZeroCoef             The Chemscore zero coefficient  (see Section 6.4.1, page 49)
Gold.Chemscore.Rot                  Rotatable-bond freezing term contribution to Chemscore value  (see Section 6.4.5, page 56)
Gold.Chemscore.Fitness              Total Chemscore fitness value of docked ligand  (see Section 6.4.1, page 49)
Gold.Chemscore.Hbond                Protein-ligand H-bond contribution to Chemscore value  (see Section 6.4.3, page 52)
Gold.Chemscore.Lipo                 Protein-ligand lipophilic contribution to the Chemscore value  (see Section 6.4.4, page 54)
Gold.Chemscore.Metal                Metal-binding contribution to Chemscore value  (see Section 6.4.4, page 54)
Gold.Chemscore.internal_Hbond       Internal ligand intramolecular H-bond contribution to Chemscore value  (see Section 6.4.3, page 52)
Gold.Chemscore.DEClash              Protein-ligand clash penalty to the Chemscore value  (see Section 6.4.6, page 56)
Gold.Chemscore.DEInternal           Internal ligand torsional strain penalty to the Chemscore value  (see Section 6.4.6, page 56)
Gold.Chemscore.DG                   Free energy change (that occurs on ligand binding) contribution to Chemscore value  (see Section 6.4.1, page 49)
Gold.Chemscore.Covalent             Covalent bonding contribution to Chemscore value  (see Section 6.4.7, page 58)
Gold.Chemscore.Constraint           Constraint contribution to Chemscore value  (see Section 6.4.8, page 58)
Gold.Chemscore.CHOScore             Contribution for weak CH...O H-bonds  (see Section 6.7, page 59)
Gold.Chemscore.Internal.Correction  Internal ligand energy offset  (see Section 6.9, page 62)
Appendix B: Additional Tags in Output Files
Solution output files for the docked ligand(s) can contain additional information such as the
scoring function terms and the rotated protein hydrogen atom positions that were generated
during the docking.
This information can be written to SD file tags; for MOL2 files, these tags are written to
comment blocks. This additional information is particularly important when post-processing
docking results with SILVER. It is possible to control the information written to solution files
from the Output Preferences window (see Section 14.2, page 111).
The table below lists the tag names that you are likely to see in GOLD solution files:
Name                                Explanation  See
Gold.Protein.ActiveResidues         List of protein residues used to define the binding site.  (see Section 3.8.5, page 27)
Gold.Protein.RotatedAtoms           Optimised positions of polar protein hydrogen atoms that are generated during docking.  (see Section 14.6, page 115)
Gold.Protein.RotatedWaterAtoms      Optimised positions of water hydrogen atoms generated during docking  (see Section 3.4, page 16)
Gold.Protein.RotatedTorsions        Optimised torsions for rotatable bonds in the ligand. Also for protein side chain torsions which have been specified as being allowed to rotate during docking  (see Section 3.6, page 18)
Gold.Id.Protein                     Enabling the association of a solution with its protein
Gold.Goldscore.Fitness              Total GoldScore fitness value of docked ligand  (see Section 6.2, page 46)
Gold.Goldscore.External.Hbond       Protein-ligand H-bond contribution to GoldScore value  (see Section 6.2, page 46)
Gold.Goldscore.External.Vdw         Protein-ligand vdw contribution to GoldScore value  (see Section 6.2, page 46)
Gold.Goldscore.Internal.Hbond       Internal ligand intramolecular H-bond contribution to GoldScore value  (see Section 6.2, page 46)
Gold.Goldscore.Internal.Vdw         Internal ligand vdw contribution to GoldScore value  (see Section 6.2, page 46)
Gold.Goldscore.Internal.Torsion     Internal ligand torsion-strain contribution to GoldScore value  (see Section 6.2, page 46)
Gold.Goldscore.Covalent.Energy      Covalent bonding contribution to GoldScore value  (see Section 6.2, page 46)
Gold.Goldscore.Constraint.Score     Constraint contribution to GoldScore value  (see Section 6.2, page 46)
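Tags of this kind can be read back from an SD solution file with a few lines of Python. The sketch below relies only on the standard SD data-item layout (a header line of the form `> <TagName>`, the value on the following lines, then a blank line). The function name and the numerical values in the example record are invented for illustration and are not taken from a real GOLD run.

```python
def read_sd_tags(record_text):
    """Return {tag_name: value} for the data items of one SD file record."""
    tags = {}
    lines = record_text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        # A data header looks like "> <Gold.Goldscore.Fitness>".
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            name = line[line.index("<") + 1 : line.rindex(">")]
            i += 1
            value_lines = []
            # The value runs until the next blank line.
            while i < len(lines) and lines[i].strip():
                value_lines.append(lines[i])
                i += 1
            tags[name] = "\n".join(value_lines)
        else:
            i += 1
    return tags


# Hypothetical record fragment; the numbers are made up for this example.
EXAMPLE = """> <Gold.Goldscore.Fitness>
57.31

> <Gold.Goldscore.External.Hbond>
4.20

$$$$
"""

tags = read_sd_tags(EXAMPLE)
fitness = float(tags["Gold.Goldscore.Fitness"])
```

Values are returned as strings so that multi-line items, such as lists of rotated atom positions, are preserved unchanged; numerical terms can be converted with float() as needed.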
a weight of 5 relative to a normal hydrogen bond taken from the chromosome.
To specify a hydrogen bond constraint, click on the Edit Constraints button to bring up the
Constraint Editor. Then, select H-Bond Constraint from the list of constraint types.
Specify the ligand and protein atom numbers as defined in the MOL2 input files (if PDB input is
used, use the sequence number):
The hydrogen bond constraint weighting can be altered within the # FITNESS FUNCTION
section of the GOLD parameters file by changing the value of the parameter CONSTRAINT_WT.
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68).
8.3.2 Method Used for Protein H Bond Constraints
A protein hydrogen bond constraint can be used to specify that a particular protein atom should
be hydrogen-bonded to the ligand, but without specifying to which ligand atom.
GOLD will be biased towards finding solutions in which the specified protein atoms form
hydrogen bonds. The fitness score of a given docking will be penalised by a user specified value
c for every protein H-bond constraint that is not satisfied (i.e. for every protein atom that you
have specified should form a hydrogen bond but does not).
GOLD assesses the geometry of each required hydrogen bond on a scale of 0 to 1, with 1
denoting perfect geometry. If this geometry weight for the constrained H-bond falls below the
Minimum H-bond geometry weight specified by the user, the interaction is not considered to be
an H-bond and a penalty is applied to the fitness score for the unfulfilled constraint. The
magnitude of this penalty is equal to the weight specified for the constraint.
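The penalty rule can be illustrated with a short sketch. It assumes that the best geometry weight found for each constrained protein atom is already known; the function name and the input layout are hypothetical and chosen only for this example.

```python
def protein_hbond_penalty(constraints, min_geometry_weight):
    """Sum the penalties for unsatisfied protein H-bond constraints.

    constraints: list of (geometry_weight, constraint_weight) pairs, one per
    protein H-bond constraint, where geometry_weight is on the 0 to 1 scale.
    """
    penalty = 0.0
    for geometry_weight, constraint_weight in constraints:
        # Below the threshold the interaction does not count as an H-bond,
        # so the full constraint weight is applied as a penalty.
        if geometry_weight < min_geometry_weight:
            penalty += constraint_weight
    return penalty
```

For example, three constraints with geometry weights of 0.8, 0.001 and 0.0 and constraint weights of 10, 10 and 5 would, with a minimum geometry weight of 0.005, leave two constraints unsatisfied and give a total penalty of 15.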
Each trial ligand docking in a genetic algorithm run is generated by a least-squares fit of
mapping points (H-bonding or hydrophobic binding points on the protein with complementary
points on the ligand). The inclusion of a protein H-bond constraint will ensure that at least one of
the specified protein atoms is included as one of the mapping points, i.e. use of the specified
points is enforced at the mapping stage of the algorithm.
If a ligand simply does not contain sufficient complementary hydrogen-bonding atom(s) to
satisfy the specified protein H-bond constraints (e.g. you require an H-bond to a protein acceptor
but the ligand contains no donors), then GOLD can be set up not to dock ligands when the
specified constraint is physically impossible to satisfy (see Section 8.1, page 68).
8.3.3 Setting up Protein H Bond Constraints
A protein hydrogen bond constraint can be used to specify that a particular protein atom should
be hydrogen-bonded to the ligand, but without specifying to which ligand atom.
To do this, click on the Edit Constraints button to bring up the Constraint Editor. Then, select
Protein H-Bond Constraint from the list of constraint types.
Specify which protein atoms are to form hydrogen bonds by typing their atom numbers, as
defined in the MOL2 input file, into the Protein atom required to form H-bond entry box.
Note: Either a donatable hydrogen atom (you must give the number of the hydrogen atom, not
the O or N atom to which it is attached) or an acceptor can be specified. The protein atom should
be available for ligand binding (e.g. solvent accessible). This constraint does not work with
metals.
Bond types:
single 1
double 2
triple 3
aromatic ar
amide am
delocalised, e.g. in carboxylate, guanidinium ar
Appendix A: List of Atom and Bond Types
GOLD uses SYBYL atom and bond types as follows:
Atom types:
Hydrogen                                          H
Carbon sp3                                        C.3
Carbon sp2                                        C.2
Carbon sp                                         C.1
Carbon aromatic                                   C.ar
Carbocation (guanidinium)                         C.cat
Nitrogen sp3                                      N.3
Nitrogen sp2                                      N.2
Nitrogen sp                                       N.1
Nitrogen aromatic, e.g. in pyridine               N.ar
Nitrogen amide                                    N.am
Nitrogen trigonal planar, e.g. in nitro, pyrrole  N.pl3
Nitrogen sp3 positively charged, e.g. in lysine   N.4
Oxygen sp3                                        O.3
Oxygen sp2                                        O.2
Oxygen in carboxylates and phosphates             O.co2
Sulphur sp3                                       S.3
Sulphur sp2                                       S.2
Sulphoxide sulphur                                S.o
Sulphone sulphur                                  S.o2
Phosphorus sp3                                    P.3
Halogens, metals                                  normal element symbols, e.g. F, Cl, Ca, Zn
The Constraint weight is the strength of bias applied to the formation of a specified hydrogen
bond in the least squares mapping algorithm within GOLD. The Constraint weight is also the
value of the penalty applied to the fitness score for each constrained H bond that is not formed.
The Minimum H bond geometry weight is a user defined score that determines how good a
hydrogen bonding interaction has to be in order for it to be considered a hydrogen bond by
GOLD. The Minimum H bond geometry weight takes a value from 0 to 1; by default
this value is set to 0.005.
For a given protein H bond constraint more than one protein atom number can be entered in the
Protein atom entry box. This will instruct GOLD to use an either-or type of constraint during
docking. For example, specifying two protein atoms, acceptor m and acceptor n, separated by a
space, will result in the constraint being satisfied if an H bond is formed to either m or n during
docking. This is of use when defining constraints involving, for example, carboxylates where it
is not important which oxygen atom forms an H bond, provided that one of them does.
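The either-or behaviour amounts to a logical OR over the listed protein atoms. A minimal sketch follows, assuming the best geometry weight found for each listed atom is available; the function name and the atom numbers are hypothetical.

```python
def either_or_satisfied(geometry_weight_by_atom, min_geometry_weight):
    """True if at least one of the listed protein atoms forms an H-bond whose
    geometry weight reaches the minimum H-bond geometry weight."""
    return any(
        weight >= min_geometry_weight
        for weight in geometry_weight_by_atom.values()
    )


# Hypothetical carboxylate oxygens, atom numbers 1021 and 1022.
carboxylate = {1021: 0.0, 1022: 0.62}
satisfied = either_or_satisfied(carboxylate, min_geometry_weight=0.005)
```

Here the constraint is satisfied because one of the two oxygen atoms forms an acceptable hydrogen bond, even though the other forms none.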
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68). Using the Constraints Editor it is possible
to specify several different protein H bond constraints, with different weights for each
constraint.
8.4 Region (Hydrophobic) Constraints
This constraint can be used to bias the docking towards solutions in which particular regions of
the binding site are occupied by specific ligand atoms (or types of ligand atom, e.g. hydrophobic
atoms).
8.4.1 Method Used for Region (Hydrophobic) Constraints (see page 77)
8.4.2 Setting Up Region (Hydrophobic) Constraints (see page 77)
8.4.1 Method Used for Region (Hydrophobic) Constraints
This constraint can be used to bias the docking towards solutions in which particular regions of
the binding site are occupied by specific ligand atoms (or types of ligand atom).
For each region (hydrophobic) constraint specified a sphere is placed at an explicitly-defined
position (using x,y,z coordinates) within the binding site. Each sphere is assigned a user-defined
radius, so a sphere can be adjusted if required, e.g. to fill an entire pocket in the binding site.
The minimum settable radius is 0.5 Å.
A contribution (determined according to a user-specified weighting) is then added to the score
for each specified non-hydrogen ligand atom that lies within the designated sphere.
Note: A contribution is added to the score for each atom located within the sphere (i.e. the total
contribution will depend on the number of atoms found in the region of interest and ultimately
the ligand-accessible volume of the region).
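The way the total contribution scales with the number of atoms in the sphere can be sketched as below. The function name and the input layout are hypothetical, and treating an atom that lies exactly on the sphere surface as inside is an assumption of this example.

```python
import math


def region_contribution(atoms, centre, radius, score_per_atom):
    """Total region-constraint contribution for the selected ligand atoms.

    atoms: iterable of (element, (x, y, z)) for the ligand atoms selected
    for the constraint. Hydrogen atoms never contribute.
    """
    inside = sum(
        1
        for element, xyz in atoms
        if element != "H" and math.dist(xyz, centre) <= radius
    )
    return inside * score_per_atom


# Hypothetical coordinates: two carbons inside the sphere, one hydrogen
# inside (ignored) and one carbon outside.
atoms = [
    ("C", (0.0, 0.0, 0.0)),
    ("C", (1.0, 0.0, 0.0)),
    ("H", (0.5, 0.0, 0.0)),
    ("C", (5.0, 0.0, 0.0)),
]
total = region_contribution(atoms, centre=(0.0, 0.0, 0.0), radius=2.0, score_per_atom=1.5)
```

Because the contribution is counted per atom, a larger sphere or a pocket that accommodates more ligand atoms gives a proportionally larger total.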
The ligand atoms used in the constraint can be specified explicitly from a list of atom numbers
(as defined in the MOL2 input file). Alternatively, it is possible to use all hydrophobic ligand
atoms, or to use only those hydrophobic atoms in aromatic rings. Atoms considered to be
hydrophobic include:
Carbon atoms bound to at least two H or C atoms.
Atoms typed C.cat.
Atoms typed S.3 and bound to two carbons.
H atoms bound to an sp2, sp3 or aromatic carbon (Note: only heavy atoms found within the
sphere will contribute to the score).
Details of the region (hydrophobic) constraint calculation, including the final contribution to the
fitness score, are given in the ligand log file (see Section 14.10, page 118).
8.4.2 Setting Up Region (Hydrophobic) Constraints
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Region
(Hydrophobic) Constraint from the list of constraint types.
20. Acknowledgments
GOLD was written by Gareth Jones (University of Sheffield, UK) in a DTI LINK collaboration
with GlaxoWellcome and the Cambridge Crystallographic Data Centre (CCDC).
Funding was provided by the Biotechnology and Biological Sciences Research Council, the
Department of Trade and Industry, the Medical Research Council, GlaxoWellcome Ltd and
CCDC.
Peter Willett (University of Sheffield), Robert Glen (Wellcome), Andrew Leach
(GlaxoWellcome) and Jacques Barbanton (Lipha Pharmaceuticals) are also thanked for
significant contributions to the development of GOLD.
ChemScore in GOLD was implemented by Astex Technology, Cambridge, UK.
CCDC staff involved in GOLD are Jason Cole, Simon Bowden and Robin Taylor.
One of the torsion libraries supplied with GOLD was developed by Gerhard Klebe and Thomas
Mietzner (BASF).
19. References
Molecular Recognition of Receptor Sites Using a Genetic Algorithm with a Description of
Desolvation
G. Jones, P. Willett and R. C. Glen
J. Mol. Biol., 245, 43-53, 1995
Development and Validation of a Genetic Algorithm for Flexible Docking
G. Jones, P. Willett, R. C. Glen, A. R. Leach and R. Taylor,
J. Mol. Biol., 267, 727-748, 1997
A New Test Set for Validating Predictions of Protein-Ligand Interactions
J. W. M. Nissink, C. Murray, M. Hartshorn, M. L. Verdonk, J. C. Cole and R. Taylor
Proteins, 49(4), 457-471, 2002
Life-science Applications of the Cambridge Structural Database
R. Taylor
Acta Cryst., D58, 879-888, 2002
Improved Protein-Ligand Docking using GOLD
M. L. Verdonk, J. C. Cole, M. J. Hartshorn, C. W. Murray, R. D. Taylor
Proteins, 52, 609-623, 2003
Virtual Screening Using Protein-Ligand Docking: Avoiding Artificial Enrichment
Marcel L. Verdonk, Valerio Berdini, Michael J. Hartshorn, Wijnand T. M. Mooij, Christopher W.
Murray, Richard D. Taylor, and Paul Watson,
J. Chem. Inf. Comput. Sci., 44, 793-806, 2004
Protein-Ligand Docking and Virtual Screening with GOLD
J. C. Cole, J. W. M. Nissink, R. Taylor in Virtual Screening in Drug Discovery (Eds. B.
Shoichet, J. Alvarez), Taylor & Francis CRC Press, Boca Raton, Florida, USA (2005).
Modeling Water Molecules in Protein-Ligand Docking Using GOLD
Marcel L. Verdonk, Gianni Chessari, Jason C. Cole, Michael J. Hartshorn, Christopher W.
Murray, J. Willem M. Nissink, Richard D. Taylor, and Robin Taylor,
J. Med. Chem., 48, 6504-6515, 2005
Comparing protein-ligand docking programs is difficult
Jason C. Cole, Christopher W. Murray, J. Willem M. Nissink, Richard D. Taylor, Robin Taylor
Proteins, 60, 325-332, 2005
Specify the ligand atoms to be used in the constraint by selecting either All hydrophobic ligand
atoms, Hydrophobic ligand atoms in aromatic rings, or User-specified list. If User-specified list is
selected then enter the ligand atom numbers (as defined in the MOL2 input file) into the Ligand
atoms entry box. Atom numbers should be separated by spaces.
Specify the position of the centre of the sphere (defined using x,y,z coordinates), and the radius
of the sphere (distances are in Å).
A score contribution must also be specified. This is the value that will be added to the fitness
score for each specified non-hydrogen ligand atom found within the sphere region.
Note: the total contribution added will therefore depend on the number of atoms located within
the sphere.
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68). Using the Constraints Editor it is possible
to define multiple region (hydrophobic) constraints.
8.5 Template Similarity Constraints
This constraint can be used to bias the conformation of docked ligands towards a given solution,
or template.
8.5.1 Method Used for Template Similarity Constraints (see page 79)
8.5.2 Setting Up a Template Similarity Constraint (see page 79)
8.5.1 Method Used for Template Similarity Constraints
This constraint will bias the conformation of docked ligands towards a given solution. This
solution, or template, can, for example, be another ligand in a known conformation, a common
core (useful when docking ligands of a combinatorial set), or it may just be a large substructure
that is expected, or known, to bind in a certain way.
The template must be supplied as a MOL2 file or PDB file.
Unlike the distance-based constraints, which reduce the score for ligands that adopt
unfavourable orientations, this constraint will add an energy term to the score based on the
similarity between the ligand being docked and the template provided. The similarity between
the two is evaluated as a Gaussian overlap term.
The similarity constraint can be applied in three ways that differ in the way that the overlap
between ligand and template is calculated. The similarity can be evaluated:
by using the overlap between all donor atoms in the template and the ligand being docked.
by using the overlap between all acceptor atoms in the template and the ligand being docked.
by using the overlap of all atoms of the template (this can be regarded as a ligand-shape
constraint).
The energy term to be added is calculated as similarity times weight (the similarity value is
between 0 and 1, where 1 indicates identity of template and ligand).
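The relationship between similarity, weight and the added energy term can be sketched as follows. The exact overlap function used by GOLD is not given here, so the normalised Gaussian overlap below, the width parameter sigma and all function names are assumptions chosen only so that the similarity lies between 0 and 1 and equals 1 when ligand and template coincide.

```python
import math


def gaussian_overlap(points_a, points_b, sigma=1.0):
    """Sum of pairwise Gaussian overlaps between two sets of atom positions."""
    return sum(
        math.exp(-math.dist(a, b) ** 2 / (2.0 * sigma ** 2))
        for a in points_a
        for b in points_b
    )


def similarity(ligand_points, template_points, sigma=1.0):
    """Assumed normalisation: cross overlap divided by the geometric mean of
    the two self overlaps, giving a value between 0 and 1."""
    cross = gaussian_overlap(ligand_points, template_points, sigma)
    norm = math.sqrt(
        gaussian_overlap(ligand_points, ligand_points, sigma)
        * gaussian_overlap(template_points, template_points, sigma)
    )
    return cross / norm


def similarity_term(ligand_points, template_points, weight, sigma=1.0):
    """Energy term added to the score: similarity times weight."""
    return weight * similarity(ligand_points, template_points, sigma)
```

For the donor or acceptor variants, the point sets would contain only the donor or acceptor atoms of the ligand and template; for the shape variant they would contain all atoms.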
Note: If you wish to place a fragment at an exact specified position in the binding site, as
opposed to biasing the docking, use the scaffold match constraint (see Section 8.6, page 80).
8.5.2 Setting Up a Template Similarity Constraint
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Template
Similarity Constraint from the list of constraint types.
Fill in the form to specify the similarity type to be used [H-bond donor overlap, H-bond-acceptor
overlap, or shape overlap (see Section 8.5.1, page 79)]; the similarity template file; and the
weight of the constraint.
identify_ligand.py can be invoked from the command line. The structure of the command is:
identify_ligand.py <ligand data file> <ligand number>
Note: identify_ligand.py is a Python script and as such requires a working installation of Python
(http://www.python.org).
For example, the table of rms deviations below for nine dockings of a molecule produces the
following clustering with the complete linkage method:
18.4 identify_ligand.py
identify_ligand.py can be used to extract a specific ligand description from PDB, SD file or
MOL2 format input files.
It requires a filename and a ligand number (n) as arguments and then locates the nth ligand in the
file. If any descriptive information, such as the ligand name, is available for that ligand, it is then
displayed.
       2     3     4     5     6     7     8     9
1    0.8   1.1   1.0   1.0   1.4   2.3   5.0   4.6
2          0.9   1.1   1.1   1.2   2.3   5.2   4.6
3                0.4   0.8   0.9   2.3   5.0   4.5
4                      0.6   1.1   2.3   4.9   4.5
5                            1.3   2.0   4.9   4.5
6                                  1.8   5.1   4.4
7                                        5.3   4.5
8                                              2.4

Step  Distance between clusters being merged  Clusters
1     0.40                                    1 | 2 | 3, 4 | 9 | 5 | 6 | 7 | 8 |
2     0.84                                    1 | 2 | 7 | 3, 4, 5 | 9 | 8 | 6 |
3     0.84                                    1, 2 | 7 | 3, 4, 5 | 9 | 8 | 6 |
4     1.13                                    1, 2, 3, 4, 5 | 7 | 6 | 9 | 8 |
5     1.42                                    1, 2, 3, 4, 5, 6 | 7 | 8 | 9 |
6     2.35                                    1, 2, 3, 4, 5, 6, 7 | 9 | 8 |
7     2.38                                    1, 2, 3, 4, 5, 6, 7 | 8, 9 |
8     5.28                                    1, 2, 3, 4, 5, 6, 7, 8, 9 |
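The complete linkage procedure can be reproduced from the rms matrix with a short script. This is an illustrative sketch and not the rms_analysis source: the function names are hypothetical, and because the matrix is printed to one decimal place the merge distances agree with the clustering report only after rounding.

```python
from itertools import combinations

# Upper triangle of the rms matrix: row i holds the distances from docking i
# to dockings i+1 ... 9.
UPPER = [
    [0.8, 1.1, 1.0, 1.0, 1.4, 2.3, 5.0, 4.6],
    [0.9, 1.1, 1.1, 1.2, 2.3, 5.2, 4.6],
    [0.4, 0.8, 0.9, 2.3, 5.0, 4.5],
    [0.6, 1.1, 2.3, 4.9, 4.5],
    [1.3, 2.0, 4.9, 4.5],
    [1.8, 5.1, 4.4],
    [5.3, 4.5],
    [2.4],
]


def pair_distances(upper):
    """Map each unordered pair of docking numbers to its rms deviation."""
    dist = {}
    for i, row in enumerate(upper, start=1):
        for offset, value in enumerate(row, start=1):
            dist[frozenset((i, i + offset))] = value
    return dist


def complete_linkage(upper):
    """Return a list of (merge_distance, clusters) tuples, one per step."""
    dist = pair_distances(upper)
    clusters = [(i,) for i in range(1, len(upper) + 2)]

    def linkage(a, b):
        # Complete linkage: distance between the two most dissimilar members.
        return max(dist[frozenset((x, y))] for x in a for y in b)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merge_distance = linkage(a, b)
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((merge_distance, sorted(clusters)))
    return merges


merges = complete_linkage(UPPER)
```

Steps 2 and 3 are tied at the same distance, so the order in which those two merges are carried out depends on how ties are broken; the clusters after step 3 are the same either way.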
The similarity template file should contain the template molecule or fragment in its docked
position (i.e. expressed with respect to the same coordinate frame as the protein and with the
coordinates required to place it in the correct pose).
The weight term determines the maximum energy term that would be added to the score in the
case of perfect overlap between ligand and template. As an initial value for this term, we suggest
a value between 5 and 30.
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68). Using the Constraints Editor it is possible
to define multiple constraints, e.g. one for donors and one for acceptors.
8.6 Scaffold Match Constraint
The scaffold match constraint can be used to place a fragment at an exact specified position in
the binding site; the geometry of the fragment will not be altered during docking.
8.6.1 Method Used for Scaffold Match Constraint (see page 81)
8.6.2 Setting Up Scaffold Match Constraints (see page 81)
8.6.1 Method Used for Scaffold Match Constraint
This constraint will attempt to place a ligand onto a given scaffold location. The scaffold can,
for example, be a common core, or fragment (useful when docking ligands of a combinatorial
set), or it may just be a substructure known to adopt a certain binding position.
The scaffold must be supplied as a MOL2 file. The file should contain the scaffold fragment in its
docked position (i.e. expressed in the same coordinate frame as the protein and with the
coordinates required to place it in the correct pose).
Note: It is important that the SYBYL atom and bond types in the scaffold MOL2 file match those in
the scaffold portion of the ligand. The scaffold matching algorithm matches heavy atoms only.
However it is recommended that the scaffold have hydrogens correctly placed on all appropriate
atoms other than the unfulfilled valency at the substitution point, which must not be blocked by
hydrogen.
Unlike the template similarity constraint, which will bias the docking by adding an energy term
to the score based on the similarity between the ligand being docked and the template provided,
this constraint is enforced at the mapping stage in GOLD. Ligand placements are generated
using a best least-squares fit with the scaffold heavy atom positions, i.e. this constraint forces all
atoms on the matching portion of the ligand to lie very close, or coincident, with the
corresponding scaffold. There is no S(con) contribution to the fitness score to bias dockings.
How closely ligand atoms fit onto the scaffold is governed by a user specified weight. Setting a
higher weight will force the ligand to be placed onto the scaffold locations more strictly. A
default weight of 5.0 is used.
Note: setting high weightings can have a detrimental effect on the fitness score if the placement
results in e.g. bad protein-ligand clashes. If desired, values below 1 can be used to achieve a
more lenient overlay.
Symmetry effects (such as the flipping of a phenyl ring by 180 degrees) are not taken into
account during matching of the ligand onto the scaffold. Therefore, a scaffold that will give a
unique match should ideally be provided.
For a given ligand, it is not possible to match multiple scaffolds at the same time. Scaffolds are
evaluated in the order supplied by the user and the scaffold that matches the ligand first will be
used. This means that it is possible to specify two or more different scaffolds, and GOLD will
use the scaffold that matches the ligand first. This can be useful when docking multiple different
series of compounds.
8.6.2 Setting Up Scaffold Match Constraints
Click on the Edit Constraints button to bring up the Constraint Editor. Then, select Scaffold
Match Constraint from the list of constraint types.
"C:\Program Files\CCDC\GOLD\gold\d_win32\bin\smartrms_win32.exe" [-hv] conformation_1 conformation_2
The flags are:
h use heavy atoms only (the calculation easily becomes intractable if Hs are included).
v verbose output.
conformation_1 and conformation_2 are MOL2 files containing the two conformations.
18.3 rms_analysis
rms_analysis calculates an rms difference matrix for a set of structures (as MOL2 files) and
performs hierarchical cluster analysis. A graph isomorphism algorithm is used to determine
optimal rms values.
rms_analysis can be invoked from the command line.
The structure of the command is dependent on the platform being used:
UNIX:
$GOLD_DIR/utilities/rms_analysis -method [simple|complete|group_average] <file1>.mol2
<file2>.mol2 <file3>.mol2 <file4>.mol2...
Note: this command will only work if the GOLD_DIR environment variable is
correctly set. For example, to carry out a simple cluster analysis for the files file1.mol2 and
file2.mol2, the following command would be used:
$GOLD_DIR/utilities/rms_analysis -method simple file1.mol2 file2.mol2
Windows (via the command prompt):
<install_dir>\gold\d_win32\bin\rms_analysis_win32.exe -method
[simple|complete|group_average] <file1>.mol2 <file2>.mol2 <file3>.mol2 <file4>.mol2...
where <install_dir> is the GOLD installation directory. If specifying the full path, the
command will need to be in inverted commas, e.g.:
"C:\Program Files\CCDC\GOLD\gold\d_win32\bin\rms_analysis_win32.exe" -method
[simple|complete|group_average] <file1>.mol2 <file2>.mol2 <file3>.mol2 <file4>.mol2...
Choose simple for single linkage cluster analysis, complete for complete linkage, group_average
for group average.
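When many solution files need to be clustered, the UNIX form of the command can be assembled from a script. The sketch below only builds and runs the command line documented above; the helper names are hypothetical, and the assumption that the report is written to standard output has not been checked against the program.

```python
import os
import subprocess

METHODS = ("simple", "complete", "group_average")


def rms_analysis_command(mol2_files, method="complete", gold_dir=None):
    """Build the UNIX rms_analysis command line as a list of arguments."""
    if method not in METHODS:
        raise ValueError(f"method must be one of {METHODS}")
    if gold_dir is None:
        # GOLD_DIR must be set, as noted above.
        gold_dir = os.environ["GOLD_DIR"]
    return [f"{gold_dir}/utilities/rms_analysis", "-method", method, *mol2_files]


def run_rms_analysis(mol2_files, method="complete"):
    """Run rms_analysis and return whatever it writes to standard output
    (assumption: the clustering report goes to standard output)."""
    result = subprocess.run(
        rms_analysis_command(mol2_files, method),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems when file names or the installation path contain spaces.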
18.2 smart_rms
smart_rms calculates the rms difference between two conformations of the same structure, while
taking account of symmetry effects (such as the flipping of a phenyl ring by 180 degrees). Using
a graph isomorphism algorithm, an rms score is calculated for each way of mapping the
molecule onto itself.
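The idea of scoring every symmetry-equivalent mapping can be sketched as follows. This is not the smart_rms source: the atom mappings are assumed to be supplied by the caller (finding them with a graph isomorphism algorithm is not shown), the conformations are assumed to share a coordinate frame so that no superposition is performed, and reporting the smallest value as the final result is also an assumption.

```python
import math


def rms(coords_a, coords_b, mapping):
    """rms deviation when atom i of coords_a is paired with atom mapping[i]
    of coords_b."""
    total = sum(
        math.dist(coords_a[i], coords_b[j]) ** 2 for i, j in enumerate(mapping)
    )
    return math.sqrt(total / len(mapping))


def symmetry_aware_rms(coords_a, coords_b, mappings):
    """Score every supplied mapping and return (best_rms, all_rms_values)."""
    scores = [rms(coords_a, coords_b, mapping) for mapping in mappings]
    return min(scores), scores


# Two equivalent atoms that have swapped places between the conformations.
conformation_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conformation_2 = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
best, scores = symmetry_aware_rms(
    conformation_1, conformation_2, mappings=[(0, 1), (1, 0)]
)
```

In this example the identity mapping gives an rms of 1.0, whereas the swapped mapping gives 0.0, showing why a naive atom-by-atom comparison can overestimate the difference between two symmetry-related poses.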
smart_rms can be invoked from the command line. The following platform-dependent
commands should be used.
UNIX platforms:
$GOLD_DIR/utilities/smart_rms [-hv] conformation_1 conformation_2
Windows platforms (at the Windows command prompt):
<install_dir>\gold\d_win32\bin\smartrms_win32.exe [-hv] conformation_1 conformation_2
where <install_dir> is the GOLD installation directory. If specifying the full path, the
command will need to be in inverted commas, e.g.:
The scaffold structure file should contain the scaffold molecule or fragment in its docked
position (i.e. within the same coordinate frame as the protein).
The Scaffold Match Constraint Weight determines how closely ligand atoms fit onto the
scaffold. Setting a higher weight will force the ligand to be placed onto the scaffold locations
more strictly.
By default, all heavy atoms in the supplied scaffold structure file will be used for matching.
However, it is possible to specify only a subset of those atoms in the scaffold structure (these
may include non-heavy atoms). Atoms should be specified using the atom indices as defined in
the scaffold structure file (indices should be separated by a single space). Limiting the number of
atoms to be matched can be useful for large, rigid scaffolds. In such a case, specifying only a few
atoms distributed throughout the scaffold can be sufficient to obtain a good 3D superimposition.
Click on the Add constraint or Update selected constraint button to add the constraint definition
to the Current Constraints (see Section 8.1, page 68).
9. Torsion Angle Distributions
9.1 Basic Use of Torsion Angle Distributions (see page 83)
9.2 Choice of Torsion Angle Distribution Files (see page 83)
9.3 Editing Torsion Angle Distribution Files (see page 84)
9.4 Matching Torsion Angle Distributions at Run Time (see page 88)
9.1 Basic Use of Torsion Angle Distributions
Torsion angle distributions extracted from the Cambridge Structural Database (CSD) can be
input to GOLD. These distributions are used to restrict the ligand conformational space sampled
by the genetic algorithm.
Using torsion angle distributions in this way will not make GOLD go any faster. However, it
may improve the chances of GOLD finding the correct answer by biasing the search towards
ligand torsion-angle values that are commonly observed in crystal structures. It may also
improve convergence and so make GOLD usable with faster settings (see Section 11.3, page 94).
To enable the use of torsion angle distributions, click on the Fitness & Search Options button in
the Fitness Function and Search Settings panel in the GOLD front end, then, in the resulting
window, switch on the check box labelled Use torsion angle distributions from the CSD.
9.2 Choice of Torsion Angle Distribution Files
Three torsion angle distribution files are provided:
gold.tordist - this is the default file.
gold.tordist.new - this contains all the torsions in gold.tordist and many more new
distributions. However, many of these newer torsions have very few hits in the CSD and no
significant improvement was found when using this new file in GOLD.
mimumba.tordist - this contains all the torsional distributions used in the MIMUMBA
program (Klebe and Mietzner, J.Comput.-Aided Mol.Des., 8, 583-606, 1994).
Click on the Distributions File button in the GOLD front end to pick a torsion angle distribution
file. Alternatively, type the required file into the entry box.
It is possible to customise torsion angle distribution information by editing one of the standard
torsion angle distribution files (see Section 9.3, page 84).
18. Utility Programs
A number of utility programs are supplied to assist in the analysis of GOLD docking results.
The following utility is available in the sgi_utils directory of the GOLD distribution:
18.1 grommitt (see page 142) - used for simple visualisation of dockings, available for SGI
users running IRIX only.
The following utilities are available in the utilities directory of the GOLD distribution:
18.2 smart_rms (see page 143) - computes rms deviations between two conformations of
the same structure.
18.3 rms_analysis (see page 144) - performs cluster analysis on a set of docking solutions.
18.4 identify_ligand.py (see page 145) - extracts descriptive information such as ligand
name for a specified structure record in a file.
18.1 grommitt
grommitt is a simple molecular viewer for examining binding modes. It is available for SGI users
running IRIX only.
When GOLD is being run interactively, grommitt can be used to display the current top solution
from a genetic algorithm run. To do this, click on the Display/Output Options button in the
GOLD front end (see Section 2.2, page 4).
grommitt can also be opened from the command line, e.g. to display overlays of SYBYL MOL2
files. The structure of the command is:
grommitt [-chp] <files>
The flags are:
c each molecule is coloured differently. Normally, molecules are coloured by atom type.
h only display heavy atoms.
p pretty (but slow) display.
<files> is a list of SYBYL MOL2 and/or PDB files.
grommitt is useful for visualising a set of GOLD solutions, e.g. to see at a glance if all solutions
are identical or whether there are several different binding modes. For example:
%grommitt -h gold_soln*
displays the window:
Non-parametric tests indicate that GOLD score and activity are not significantly correlated
(Spearman r_s = -0.564, p = 0.056; Kendall τ = -0.382, p = 0.086).
There is not a statistically significant relationship between the GOLD score and activity. It is
worth noting that the compounds are all structurally similar and all are active.
17. Context-Dependent Help
Context-dependent help is available in the front end, by clicking the middle mouse button on the
item for which information is required. For example, clicking on:
brings up this help window:
9.3 Editing Torsion Angle Distribution Files
To edit the torsion angle distribution file click on the Edit Distributions button in the Fitness
Function and Search options window (accessible by clicking on the Fitness & Search Options
button in the Fitness Function and Search Settings panel in the GOLD front end).
If you are using the default torsion angle distribution file, it will be copied to the current
directory.
The format of entries in the torsion angle distribution file is quite strict: incorrect editing of the
file may cause GOLD to behave in unexpected ways or even to crash.
9.3.1 Format of Torsion Angle Distribution File Header (see page 84)
9.3.2 Format of Torsion Angle Distributions (see page 85).
9.3.3 Example Torsion Angle Distributions (see page 87).
9.3.4 Extracting Torsion Angle Distributions from the Cambridge Structural Database (see
page 88)
9.3.1 Format of Torsion Angle Distribution File Header
The first section of the torsion angle distribution file sets parameters and tells GOLD what to do
with the distributions.
N_BINS is the number of bins used in the torsion histogram.
REMOVE_HIGH_ENERGY and DELTA_E are parameters that can be used to control the
filtering out of high-energy torsion angles.
If torsion angle distributions are used, GOLD will no longer sample over 360 degrees but will
constrain the torsion to values contained in the histogram. However, if a histogram contains a
large number of entries, there may be some high-energy torsions within the histogram. GOLD
therefore provides a method for filtering out such high-energy torsions: set
REMOVE_HIGH_ENERGY = 1 and DELTA_E = E to remove those bars in the histogram that
correspond to torsions that are E kcal/mol higher in energy than the most populated state. The
ground state of the torsion is assumed to correspond to the maximum peak in the torsional
histogram. The energy difference between this ground state and any other peak in the torsion
angle histogram is then assumed to be approximately given by the partition function.
The following table indicates the relationship between the value of DELTA_E and the ratio
high/low, where high is the height of the biggest bar in the histogram and low is the height below
which bars will be removed from the histogram:
For example, if REMOVE_HIGH_ENERGY=1 and DELTA_E = 2.5, those bars which are 1/69th
or less of the height of the largest bar will be removed from the histogram and torsion angles
corresponding to these bars will never be sampled by the genetic algorithm.
The relationship between DELTA_E and ratio, based on the partition function, is:
ratio = exp (DELTA_E/0.5898)
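The filtering rule can be illustrated with a short Python sketch. It uses only the formula above and the statement that bars 1/ratio or less of the tallest bar are removed; the function names are illustrative and are not part of GOLD, and GOLD's internal implementation may differ in detail.

```python
import math

RT_KCAL_PER_MOL = 0.5898  # constant taken from the formula above


def height_ratio(delta_e):
    """Ratio high/low implied by a DELTA_E value (kcal/mol)."""
    return math.exp(delta_e / RT_KCAL_PER_MOL)


def filter_histogram(counts, delta_e):
    """Zero every bar whose height is 1/ratio or less of the tallest bar."""
    cutoff = max(counts) / height_ratio(delta_e)
    return [c if c > cutoff else 0 for c in counts]


# Hypothetical five-bin histogram, filtered with DELTA_E = 2.5.
example = [156, 82, 2, 1, 50]
filtered = filter_histogram(example, 2.5)
print(filtered)
```

With DELTA_E = 2.5 the ratio is a little over 69, so in this made-up histogram the bars of height 2 and 1 are removed while the others are kept.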
9.3.2 Format of Torsion Angle Distributions
Each torsion angle distribution entry comprises three lines: the first line is the name of the
torsion angle; the second line is the definition of the torsion angle; the third line is the histogram.
The histogram should be a list of space-separated integers. The ith integer should be the number
of observations in the torsion-angle range of the ith bin. There should be N_BINS integers in all.
The first bin starts at -180 degrees and the last bin ends at +180.
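The bin layout can be made concrete with a short Python sketch. It assumes, since the guide does not say so explicitly, that the N_BINS bins are of equal width and that an angle of exactly +180 degrees is treated as -180 degrees. The function names are illustrative only.

```python
def bin_range(index, n_bins):
    """Return the (start, end) angles in degrees covered by a 0-based bin."""
    width = 360.0 / n_bins  # assumption: all bins have equal width
    start = -180.0 + index * width
    return (start, start + width)


def bin_index(angle_deg, n_bins):
    """Return the 0-based bin containing angle_deg."""
    width = 360.0 / n_bins
    # Assumption: wrap angles into [-180, 180), so +180 lands in the first bin.
    wrapped = (angle_deg + 180.0) % 360.0
    return int(wrapped // width)


print(bin_range(0, 36), bin_range(35, 36), bin_index(0.0, 36))
```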
Torsion angle distributions are defined using Backus-Naur Form (BNF) grammar, as follows (all
the symbols in the table are part of the grammar except for ||, which is used to indicate
alternative fields):
DELTA_E   ratio
3.0       161
2.5       69
2.0       30
TORSION         NODE | NODE | NODE | NODE | ||
                NODE | NODE | NODE | NODE | DIRECTIVE ||
                NODE | NODE | NODE | NODE | DIRECTIVE | DIRECTIVE
DIRECTIVE       expand <min> <max> || period <min> <max>
NODE            ATOM || ATOM (NEIGHBOURS)
NEIGHBOURS      NEIGHBOUR_NODE || NEIGHBOUR_NODE NEIGHBOURS
NEIGHBOUR_NODE  NODE || HYDROGENS
HYDROGENS       0H || 1H || 2H || 3H
ATOM            ATOM_DEF || ATOM_DEF [FRAGMENT]
FRAGMENT        ribose || adenine || uracil || benzene
ATOM_DEF        TYPE_DEF || LINKAGE<no space>TYPE_DEF
Non-parametric tests indicate that GOLD score and activity are significantly correlated
according to the Kendall test but not according to the Spearman test (Spearman r_s = -0.191,
p = 0.065; Kendall τ = -0.150, p = 0.033).
These inhibitors are all extremely hydrophobic, representing a difficult case for GOLD.
Note: For this dataset and target GOLD is not predicting active molecules as inactive. This is
advantageous in virtual screening applications (inactives that are predicted as actives are
acceptable in this context, the converse is not applicable).
16.2.3 Prediction of Binding Affinity to FKBP12
GOLD was used to dock a set of 13 FK506BP inhibitors (data from Holt et al., J. Am. Chem. Soc.,
1993, 115, 9925). 20 docking runs were performed on each complex and the best fitness score
recorded.
A plot of fitness score against measured Ki is shown below:
The GOLD scores are a good indicator of activity for this series. It is most unlikely that this level
of prediction could have arisen through chance (χ² = 15.27, p < 0.001, 1 degree of freedom).
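The quoted statistic can be reproduced from the 2 x 2 table of observed against predicted activity for this series (14 and 1 in the observed-active row, 5 and 14 in the observed-inactive row). The Python sketch below assumes the standard Pearson chi-squared formula for a 2 x 2 table without continuity correction; the function name is illustrative.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Rows: observed active, observed inactive; columns: predicted active, predicted inactive.
chi2 = chi_squared_2x2(14, 1, 5, 14)
print(f"{chi2:.2f}")  # prints 15.27
```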
16.2.2 Prediction of Binding Affinity to Alpha Chymotrypsin
GOLD was used to dock a set of 94 alpha-chymotrypsin inhibitors (data from Stewart et al., T.
C. Methods, 1990, 3, 713).
A plot of fitness score against measured Ki is shown below:
The graph below omits the two outliers:
                    Predicted active   Predicted inactive
Observed active     14                 1
Observed inactive   5                  14
This grammar allows torsions to be specified as four fragment nodes. Each node defines an atom
type and, optionally, a set of neighbours to which the atom is connected. Each of the neighbours
is a node or an exact count of the number of hydrogen atoms to which the atom is bonded. Atom
types are defined using SYBYL atom types or elemental atom types. The atom can also be
required to be part of a pre-defined fragment.
Bonding environments can also be specified, using the symbols ~, = and -, which indicate,
respectively, that an atom forms an aromatic, double or single bond to its parent node.
Note: ~, = and - should therefore not be used on the first atoms specified; these bond types are
specified for substituents only.
A node is a parent of all its neighbours and a top level node in the torsion definition is a parent of
subsequent nodes in the torsion.
There are currently four fragments available, one of which (the uracil fragment) matches both
thymine and uracil. More fragments can easily be added. The Ullman algorithm is used to
determine if an atom belongs to a fragment. Fragments are defined through SYBYL atom types
and connectivity (exact bond types are not used). Only heavy atoms are considered. Currently,
fragments are precompiled, but they could be read in at run-time if required.
Directives are allowed to take account of special circumstances. There are two directives:
expand and period.
The expand directive has the form expand <min> <max> where <max> - <min> = 180.0 or
<min> = 0. This directive is used for torsions where the CSD query has symmetry and torsions
are only measured over <min> to <max> degrees. However, although the CSD query may have
two-fold symmetry, often the matched structure does not. The expand directive fills out the rest
of the histogram with the correct values.
The period directive takes account of those torsional distributions for which the matched
structure has symmetry. This directive has the form period <pmin> <pmax>. The distribution
will only be expanded between angles <pmin> and <pmax>.
TYPE_DEF        SYB_TYPE || EL_TYPE
LINKAGE         ~ || = || -
SYB_TYPE        C.3 || C.2 || C.1 || C.ar || C.cat || N.3 || N.2 || N.1 || N.ar || N.am || N.pl3 ||
                N.4 || O.3 || O.2 || O.co2 || S.3 || S.2 || S.o || S.o2 || P.3 || H || F || Cl || Br || I
EL_TYPE         C || N || O || S || P
9.3.3 Example Torsion Angle Distributions
Here are some examples of torsion angle distributions extracted from the Cambridge Structural
Database and in the correct format:
DIAGRAM
acid T1
C.2 (O.co2 O.co2) | C.3 (2H) | C.3 (2H) | C
41 8 0 0 0 0 0 0 0 1 8 7 2 0 0 0 0 1 1 0 0 0 1 0 4 1 0 1 0 0 0 0
0 2 2 41
DIAGRAM
acid T2
O.co2 | C.2 (O.co2) | C.3 (2H) | C.3 (2H C)
8 5 1 3 2 1 3 2 3 2 3 3 4 0 3 2 7 11 15 9 1 4 1 0 2 1 4 4 1 3 3 6
0 3 5 7
DIAGRAM
amide nh T2
C.2 (=O.2 N.am (1H)) | C.3 (1H C.3) | N.am (1H) | C.2 (=O.2)
1 1 14 16 29 25 23 38 35 50 82 156 53 6 1 0 0 0 0 0 0 1 1 14 17 15 4 4 2
1 2 5 2 2 0 0
DIAGRAM
uracil
O.3 [ribose] | C.3 [ribose] | N.am [uracil] (C.2 (1H))| C.2 [uracil] (=O.2)
24 73 85 44 59 60 40 14 8 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 7 5 3 0 0 1 4
3 3 5 10 6
DIAGRAM
benzyl sub
C | C.3 (2H) | C.ar (~C.ar (0H)) | ~C.ar (0H) | expand 0.0 180.0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 9 27 76 64 15 7 4 2
0 0 0 0
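Each histogram in the examples above contains 36 counts, wrapped over two lines by the page layout. As a rough illustration of the three-line entry format, the Python sketch below splits an entry into its name, its four nodes, any directives and its counts. It is not GOLD's own parser: the function name and the returned dictionary are hypothetical, and the value N_BINS = 36 is inferred from the examples rather than stated in the file header description.

```python
def parse_entry(name_line, definition_line, histogram_line, n_bins):
    """Split one torsion angle distribution entry into its parts."""
    fields = [field.strip() for field in definition_line.split("|")]
    nodes = fields[:4]
    # Anything after the fourth node is a directive; ignore empty trailing fields.
    directives = [field for field in fields[4:] if field]
    counts = [int(token) for token in histogram_line.split()]
    if len(nodes) != 4 or not all(nodes):
        raise ValueError("a torsion definition needs exactly four nodes")
    if len(counts) != n_bins:
        raise ValueError(f"expected {n_bins} counts, found {len(counts)}")
    return {"name": name_line.strip(), "nodes": nodes,
            "directives": directives, "counts": counts}


# The "benzyl sub" example, with its wrapped histogram joined into one line.
histogram = "0 " * 23 + "1 9 27 76 64 15 7 4 2" + " 0" * 4
entry = parse_entry(
    "benzyl sub",
    "C | C.3 (2H) | C.ar (~C.ar (0H)) | ~C.ar (0H) | expand 0.0 180.0",
    histogram,
    n_bins=36,
)
print(entry["nodes"], entry["directives"], sum(entry["counts"]))
```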
activity. This has varied from a clear relationship for a test set of neuraminidase inhibitors, through a
discernible relationship for alpha-chymotrypsin inhibitors, to no statistically significant
relationship for FKBP12 inhibitors.
16.2.1 Prediction of Binding Affinity to Influenza A Neuraminidase (see page 138)
16.2.2 Prediction of Binding Affinity to Alpha Chymotrypsin (see page 139)
16.2.3 Prediction of Binding Affinity to FKBP12 (see page 140)
16.2.1 Prediction of Binding Affinity to Influenza A Neuraminidase
GOLD was used to dock a set of 34 neuraminidase inhibitors. 25 docking runs were performed
on each complex and the best fitness score recorded.
A plot of fitness score against measured IC50 (data supplied by GlaxoWellcome) is shown
below:
There are no compounds with low fitness and high activity and there is evidence of a correlation
(Spearman r_s = -0.649, p < 0.001; Kendall τ = -0.483, p < 0.001).
Considering 10 µM to be a cutoff for activity, there are 15 actives and 19 inactives. Using a
GOLD score of 74 or above as a predictor of activity gives:
Classified in the validation experiments as a prediction that was wrong (1ICN - oleate docked
into a fatty-acid binding protein):
16.2 Correlation between Fitness Function and Biological Activity
The GOLD fitness function was designed to discriminate between different binding modes of
the same molecule. Extra terms are probably required to compare different molecules. For
example, a term is probably required to account for the entropic loss associated with freezing
rotatable bonds when the ligand binds.
Nevertheless, some correlation has been observed between GOLD fitness scores and biological
9.3.4 Extracting Torsion Angle Distributions from the Cambridge Structural Database
The command process_tab (only available on SGI machines) will extract the torsion angle
histogram from the .tab file produced by a search of the Cambridge Structural Database, and
reformat it so that it can be added into the GOLD torsional distribution file.
9.4 Matching Torsion Angle Distributions at Run Time
GOLD identifies each rotatable bond in the ligand and attempts to match it to a torsion angle
distribution in the torsion angle distribution file. This includes bonds that are identified by
GOLD as flippable (e.g., if torsions are switched on then ligand carboxylic acids (O)C-OH will
also use a torsion distribution).
In some cases, a rotatable bond may match more than one torsion angle distribution. If this
happens, a score is calculated for each torsion angle distribution and the distribution with the
highest score is selected.
Note: a weighting scheme is used when matching rotatable bonds in the ligand to a torsion angle
distribution such that more specific torsion definitions are taken in preference to more generic
ones.
Each portion of the torsion angle distribution contributes to the score as follows:
Element atom type   1.5
SYBYL atom type     2.0
Fragment            3.0
Hydrogen count      2.0
Bond linkage        0.5
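The effect of the weighting scheme can be illustrated with a short Python sketch. It assumes that the score of a distribution is simply the sum of the contributions listed above, one per feature present in the torsion definition; the guide does not spell out the exact arithmetic, so this is an assumption. The function names and the generic definition used for comparison are hypothetical.

```python
# Contributions taken from the table above.
WEIGHTS = {
    "element_atom_type": 1.5,
    "sybyl_atom_type": 2.0,
    "fragment": 3.0,
    "hydrogen_count": 2.0,
    "bond_linkage": 0.5,
}


def specificity_score(feature_counts):
    """Sum the contributions of every feature in a torsion definition (assumed additive)."""
    return sum(WEIGHTS[feature] * count for feature, count in feature_counts.items())


def best_match(candidates):
    """Return the name of the candidate distribution with the highest score."""
    return max(candidates, key=lambda name: specificity_score(candidates[name]))


candidates = {
    # "C.2 (O.co2 O.co2) | C.3 (2H) | C.3 (2H) | C": five SYBYL types,
    # one elemental type and two hydrogen counts.
    "acid T1": {"sybyl_atom_type": 5, "element_atom_type": 1, "hydrogen_count": 2},
    # Hypothetical generic definition "C | C.3 | C.3 | C".
    "generic": {"sybyl_atom_type": 2, "element_atom_type": 2},
}
print(best_match(candidates))
```

Under this additive assumption the more detailed definition scores 15.5 against 7.0 for the generic one, so it would be preferred.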
10. Genetic Algorithm Parameter Definitions
10.1 Genetic Algorithm Overview (see page 89)
10.2 Population Size (see page 89)
10.3 Selection Pressure (see page 90)
10.4 Number of Operations (see page 90)
10.5 Number of Islands (see page 90)
10.6 Niche Size (see page 91)
10.7 Operator Weights: Migrate, Mutate, Crossover (see page 91)
10.8 Van der Waals and Hydrogen Bonding Annealing Parameters (see page 91)
10.9 Hydrophobic Fitting Points (see page 92)
10.1 Genetic Algorithm Overview
GOLD optimises the fitness score by using a genetic algorithm.
A population of potential solutions (i.e. possible docked orientations of the ligand) is set up at
random. Each member of the population is encoded as a chromosome, which contains
information about the mapping of ligand H-bond atoms onto (complementary) protein H-bond
atoms, mapping of hydrophobic points on the ligand onto protein hydrophobic points, and the
conformation around flexible ligand bonds and protein OH groups.
Each chromosome is assigned a fitness score based on its predicted binding affinity and the
chromosomes within the population are ranked according to fitness.
The population of chromosomes is iteratively optimised. At each step, a point mutation may
occur in a chromosome, or two chromosomes may mate to give a child. The selection of parent
chromosomes is biased towards fitter members of the population, i.e. chromosomes
corresponding to ligand dockings with good fitness scores.
A number of parameters control the precise operation of the genetic algorithm, viz.
Population Size (see page 89)
Selection Pressure (see page 90)
Number of Operations (see page 90)
Number of Islands (see page 90)
Niche Size (see page 91)
Operator Weights: Migrate, Mutate, Crossover (see page 91)
Van der Waals and Hydrogen Bonding Annealing Parameters (see page 91)
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.2 Population Size
The genetic algorithm maintains a set of possible solutions to the problem. Each possible
solution is known as a chromosome and the set of solutions is termed a population.
16.1.4 Examples of GOLD Dockings
The plots below show examples of GOLD dockings:
Classified in the validation experiments as a good prediction (4PHV - a peptide-like ligand
docked into HIV protease):
Classified in the validation experiments as a close prediction (1GLQ - a nitrophenyl-substituted
peptide ligand docked into glutathione-S-transferase):
Classified in the validation experiments as a prediction with significant errors (1EAP - a
succinylaminophosphonate ligand docked into an antibody):
The aspartic protease set contains a high proportion of large ligands with several rotatable
bonds; these complexes are difficult cases for docking. The lyases are difficult to dock as the
set features relatively shallow binding sites and polar ligands that are partly solvent-exposed
(examples are 1aco and 2h4n); crystal waters sometimes mediate binding (examples are 1pdz,
1okm).
However, it is extremely difficult to draw conclusions from data obtained using such small sets.
When GOLD solutions are classified as good or wrong using an RMS threshold of 2.0 Å, a
simple chi-squared based test can be used to decide whether or not the observed result really is
different from the success rate obtained for the clean list.
It does show that the set of aspartic proteases can be regarded as different at a confidence level
of P<0.025. The lyase and lectin sets have significantly different results when P=0.10 is allowed,
and for the isomerases P<0.25 applies. The results for all other sets may just differ by chance,
and are not significantly different from the results obtained for the clean list.
Alternatively, F statistics can be used to decide whether a subset is really different from the clean
list in terms of RMS value. In this case, the F ratio is calculated using the null hypothesis that the
average RMS for the clean list of 224 entries and each sublist is equal.
Results for F indicate that only the subsets containing aspartic proteases and isomerases (and
possibly the lectin set) are significantly different from the clean list, showing clearly that it is
very difficult to draw any meaningful conclusions from the results for such small sets.
Influence of Mediating Water Molecules on GOLD Results
Waters have been removed from all complexes prior to docking. This probably lowers
performance of the docking algorithm, as waters can mediate interactions that are essential for
ligand-binding. To estimate this effect, a subset of structures was identified with at least one
strongly-bound water molecule within a 2.9 Å distance of both protein and ligand moieties.
GOLD success rates for this subset (40 entries) and structures lacking mediating water
molecules (55 entries) are reported below. All entries are subsets of the clean list. There seems to
be a trend towards lower success rates for structures that contain water-mediated contacts
between ligand and protein, although the impact of leaving water molecules out is not so high as
might be expected.
GOLD results for complexes with and without waters that mediate protein-ligand binding:
The variable Population Size (or popsize) is the number of chromosomes in the population. If
n_islands is greater than one (i.e. the genetic algorithm is split over two or more islands),
popsize is the population on each island.
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.3 Selection Pressure
Each of the genetic operations (crossover, migration, mutation) (see Section 10.7, page 91) takes
information from parent chromosomes and assembles this information in child chromosomes.
The child chromosomes then replace the worst members of the population.
The selection of parent chromosomes is biased towards those of high fitness, i.e. a fit
chromosome is more likely to be a parent than an unfit one.
The selection pressure is defined as the ratio of the probability that the most fit member of
the population is selected as a parent to the probability that an average member is selected as a
parent. Too high a selection pressure will result in the population converging too early.
For the GOLD docking algorithm, a selection pressure of 1.1 seems appropriate, although 1.125
may be better for library screening where the aim is faster convergence.
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
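The guide does not state how GOLD converts fitness ranks into selection probabilities. Linear ranking is one textbook scheme that realises a given selection pressure exactly, and it is used in the Python sketch below purely as an illustration of the definition in Section 10.3; the function name is hypothetical.

```python
def linear_rank_probabilities(n, selection_pressure):
    """Parent-selection probability for each rank (rank 0 = fittest) under linear ranking."""
    if not 1.0 <= selection_pressure <= 2.0:
        raise ValueError("linear ranking needs a selection pressure between 1 and 2")
    if n == 1:
        return [1.0]
    sp = selection_pressure
    # Probabilities fall linearly from sp/n for the best member to (2 - sp)/n for the worst.
    return [(sp - 2.0 * (sp - 1.0) * rank / (n - 1)) / n for rank in range(n)]


probs = linear_rank_probabilities(100, 1.1)
average = sum(probs) / len(probs)
print(probs[0] / average)  # ratio of best to average, approximately 1.1
```

With a selection pressure of 1.1 the fittest chromosome is only 10% more likely to be chosen than an average one, which is consistent with the warning that higher values drive the population to converge early.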
10.4 Number of Operations
The genetic algorithm starts off with a random population (each value in every chromosome is
set to a random number). Genetic operations (crossover, migration, mutation) (see Section 10.7,
page 91) are then applied iteratively to the population. The parameter Number of Operations (or
maxops) is the number of operators that are applied over the course of a GA run.
It is the key parameter in determining how long a GOLD run will take.
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.5 Number of Islands
Rather than maintaining a single population, the genetic algorithm can maintain a number of
populations that are arranged as a ring of islands. Specifically, the algorithm maintains n_islands
populations, each of size popsize.
Individuals can migrate between adjacent islands using the migration operator.
The effect of n_islands on the efficiency of the genetic algorithm is uncertain.
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
10.6 Niche Size
Niching is a common technique used in genetic algorithms to preserve diversity within the
population.
In GOLD, two individuals share the same niche if the rmsd between the coordinates of their
donor and acceptor atoms is less than 1.0 Å.
When adding a new individual to the population, a count is made of the number of individuals in
the population that inhabit the same niche as the new chromosome. If there are more than
NicheSize individuals in the niche, then the new individual replaces the worst member of the
niche rather than the worst member of the total population.
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
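The niche replacement rule of Section 10.6 can be sketched in Python as follows. The data layout (a list of fitness and coordinate pairs), the plain unsuperimposed rmsd and the function names are assumptions made for illustration; higher fitness is taken to be better.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points."""
    total = sum((a - b) ** 2
                for point_a, point_b in zip(coords_a, coords_b)
                for a, b in zip(point_a, point_b))
    return math.sqrt(total / len(coords_a))


def insert_with_niching(population, newcomer, niche_size, niche_radius=1.0):
    """Replace one member of population (list of (fitness, coords)) with newcomer.

    Returns the index of the member that was replaced.
    """
    niche = [i for i, (_, coords) in enumerate(population)
             if rmsd(coords, newcomer[1]) < niche_radius]
    # A crowded niche loses its own worst member; otherwise the global worst is replaced.
    candidates = niche if len(niche) > niche_size else range(len(population))
    worst = min(candidates, key=lambda i: population[i][0])
    population[worst] = newcomer
    return worst


population = [
    (10.0, [(0.0, 0.0, 0.0)]),
    (5.0, [(0.1, 0.0, 0.0)]),
    (7.0, [(0.2, 0.0, 0.0)]),
    (1.0, [(9.0, 9.0, 9.0)]),
]
newcomer = (8.0, [(0.05, 0.0, 0.0)])
print(insert_with_niching(list(population), newcomer, niche_size=2))
```

In this toy population three members share the newcomer's niche, so with a niche size of 2 the weakest of those three is replaced rather than the weakest member overall.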
10.7 Operator Weights: Migrate, Mutate, Crossover
The operator weights are the parameters Mutate, Migrate and Crossover (or pt_cross).
They govern the relative frequencies of the three types of operations that can occur during a
genetic optimisation: point mutation of the chromosome, migration of a population member
from one island to another, and crossover (sexual mating) of two chromosomes.
Each time the genetic algorithm selects an operator, it does so at random. Any bias in this choice
is determined by the operator weights. For example, if Mutate is 40 and Crossover is 10 then, on
average, four mutations will be applied for every crossover.
The migrate weight should be zero if there is only one island, otherwise migration should occur
about 5% of the time.
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
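Weighted random choice of an operator, as described in Section 10.7, can be sketched in Python as below. The weight values are chosen only to illustrate a migration frequency of about 5% and are not taken from this guide; the function names are hypothetical.

```python
import random


def expected_fractions(weights):
    """Long-run fraction of operations of each type implied by the operator weights."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


def choose_operator(weights, rng=random):
    """Pick one operator at random, biased by its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


# Illustrative weights: migration is chosen 10 times in every 200 operations.
weights = {"mutate": 95, "crossover": 95, "migrate": 10}
fractions = expected_fractions(weights)
print(fractions["migrate"], choose_operator(weights, random.Random(0)))
```

The same function reproduces the example in the text: weights of 40 for Mutate and 10 for Crossover give expected fractions of 0.8 and 0.2, that is, four mutations for every crossover on average.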
10.8 Van der Waals and Hydrogen Bonding Annealing Parameters
When GoldScore is being used, the annealing parameters, van der Waals and Hydrogen
Bonding, allow poor hydrogen bonds to occur at the beginning of a genetic algorithm run, in the
expectation that they will evolve to better solutions.
At the start of a GOLD run, external van der Waals (vdw) energies are cut off when
E_ij > van der Waals * k_ij, where k_ij is the depth of the vdw well between atoms i and j.
At the end of the run,
the cut-off value is FINISH_VDW_LINEAR_CUTOFF. This allows a few bad bumps to be
tolerated at the beginning of the run.
Similarly, the parameters Hydrogen Bonding and FINAL_VIRTUAL_PT_MATCH_MAX are
used to set starting and finishing values of max_distance (the distance between donor hydrogen
and fitting point must be less than max_distance for the bond to count towards the fitness score).
This allows poor hydrogen bonds to occur at the beginning of a GA run.
Both the vdw and H-bond annealing must be gradual and the population allowed plenty of time
to adapt to changes in the fitness function.
Changes to genetic algorithm parameters should be made with care (see Section 11.3, page 94).
GOLD Performance as a Function of Protein Type
Success rates for GOLD as a function of protein types are given below. Statistical analysis was
performed to assess whether the results are really different, or may have arisen by coincidence.
This check is essential, as the size of the sets being considered here is very small.
Performance appears to be above average for the metalloprotease, kinase, isomerase and lectin
sets. However, performance seems to be lower than expected for the aspartic protease and lyase
sets.
GOLD Performance
A brief overview of the results obtained for GOLD with the CCDC/Astex test set is given
below. Figure 1 shows GOLD success rates as a function of the number of torsion angles in the
ligand. Results were obtained using the default settings; the values shown are the average values
derived from a set of 50 validation runs. Standard deviations are given. RMS (root-mean-square
deviation of atomic coordinates) values of 2.0 Å or less were considered to be good
results.
The following table shows the GOLD results for the clean set; results were calculated using both
the default settings and the threefold speed-up settings. As can be seen, there is a tradeoff
between speed and reliability. All success rates are average values over 50 validation runs.
Standard deviation is given in parentheses.
10.9 Hydrophobic Fitting Points
GOLD automatically calculates a list of hydrophobic fitting points in the binding site. These are
used during the generation of trial docking solutions to map hydrophobic ligand atoms into
favourable regions of the binding site.
GOLD generates its hydrophobic fitting points by placing a fine grid over the binding site. At
each grid position, the van der Waals interaction energy between a bare carbon atom and the
protein is evaluated. By default, positions at which the interaction energy is below -2.5 kcal/mol
are added to the list of fitting points.
Note: the potential and threshold for selecting fitting points can be changed by editing the
gold.params file and changing the values of INTERNAL_POTENTIAL_FITPTS and
E_FITPT_THRESHOLD.
In this way, a map is constructed that contains positions onto which the placement of a
hydrophobic ligand atom should be favourable.
The ligand fitting points are used for the matching of hydrophobic regions.
By default only carbon atoms in the ligand are considered when identifying fitting points. The
selection of suitable ligand atoms can be extended to include carbon, halogen and non-polar
sulfur atoms by uncommenting the following line in the gold.params file:
#LIGAND_FITPTS_SELECTION EXTENDED_HAL_S
During docking, GOLD selects a list of lipophilic ligand atoms and matches them onto a subset
of the hydrophobic fitting points.
It is possible to use customised hydrophobic fitting points. This might be appropriate if GOLD is
not giving good results on a particular protein and you suspect that the fault may lie in the
placement of hydrophobic ligand groups.
Customised fitting points must be supplied in a MOL2 format file that contains a list of dummy
atoms at the desired fitting-point locations. The supplied fitting points should sample all regions
of interest in the cavity, so that the docking algorithm has sufficient alternatives for placement of
hydrophobic ligand atoms within the cavity. GOLD uses gridded points that are spaced by 0.25
; for a speed-up in calculation, higher values could be used.
To make GOLD use a customised fitting-point file, click on the Fitness & Search options button
in the GOLD front end, then switch on the Read hydrophobic fitting points check box in the
Fitness Function and Search Options window. Finally, hit the Fit point file... button to open a file
selection window from which your customised file can be located.
Customised fitting points can, for example, be generated by the CCDC program SuperStar,
which offers the possibility of writing out a file of GOLD fitting points in the appropriate format
(see SuperStar manual sections on SAVE_GOLD_FITTING_POINTS and
GOLD_MIN_PROPENSITY).
11. Balancing Reliability and Speed
11.1 Number of Dockings (see page 93)
11.2 Early Termination (see page 93)
11.3 Controlling Reliability and Speed with GA Parameters (see page 94)
11.1 Number of Dockings
GOLD will dock each ligand several times starting each time from a different random
population of ligand orientations. The results of the different docking runs are ranked by fitness
score.
The number of dockings to be performed on each ligand is set when the ligand file is defined
(see Section 4.5, page 32).
By default the number of dockings to be performed on each ligand is 10.
The total time spent docking a ligand obviously depends on the number of docking runs, so you
can make GOLD go faster by reducing this number. However, it is useful to perform at least a
few docking runs on each ligand. This increases the chances of getting the right answer. Also, if
the same answer is found in several different docking runs, it is usually a strong indicator that
the answer is correct.
The early termination option (see Section 11.2, page 93) can be used to prevent GOLD wasting
time performing multiple docking runs on easy ligands.
11.2 Early Termination
The early termination option instructs GOLD to terminate docking runs on a given ligand as
soon as a specified number of runs have given essentially the same answer. In this situation, it is
probable that the answer is correct, and GOLD will just be wasting time if it performs more
docking runs on that ligand.
To switch early termination on, click on the Allow early termination check box in the GOLD
front end (i.e. so that the box is coloured red). Then specify the early termination criterion. In the
example below, GOLD has been instructed to stop docking a ligand if it reaches a state in which
the best three solutions found so far are all within 1.5 Å rmsd of each other:
The rms deviation takes account of any ligand symmetry.
Early termination does not always save as much time as you might think, because it tends to be
invoked for easy (i.e. relatively rigid) ligands, which are quick to dock anyway.
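The early termination criterion can be sketched in Python as follows. The rmsd function is passed in by the caller because GOLD's symmetry-aware rms calculation is not described here; the function name, its parameters and the use of "less than or equal to" for "within" are assumptions.

```python
from itertools import combinations


def should_terminate(ranked_solutions, rmsd, n_best=3, tolerance=1.5):
    """True if the n_best top-ranked solutions are all within tolerance rmsd of each other.

    ranked_solutions must be ordered from best to worst fitness; rmsd is a
    caller-supplied function returning the deviation between two solutions.
    """
    if len(ranked_solutions) < n_best:
        return False  # not enough docking runs completed yet
    top = ranked_solutions[:n_best]
    return all(rmsd(a, b) <= tolerance for a, b in combinations(top, 2))


# Toy example: each "solution" is a single number and rmsd is the absolute difference.
toy_rmsd = lambda a, b: abs(a - b)
print(should_terminate([0.0, 1.0, 1.4, 9.0], toy_rmsd))
```

Only the top-ranked solutions are compared, so a poor fourth solution, as in the toy list, does not prevent termination.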
TABLE I. Optimal sets (clean lists) with different resolution thresholds of
none, 2.5 Å, and 2.0 Å
Full set (305 entries)
1a07 1a0q 1a1b 1a1e 1a28 1a42 1a4g 1a4k 1a4q 1a6w
1a9u 1aaq 1abe 1abf 1acj 1acl 1acm 1aco 1aec 1aha
1ai5 1aj7 1ake 1aoe 1apt 1apu 1aqw 1ase 1atl 1azm
1b58 1b59 1b6n 1b9v 1baf 1bbp 1bgo 1bl7 1blh 1bma
1bmq 1byb 1byg 1c12 1c1e 1c2t 1c5c 1c5x 1c83 1cbs
1cbx 1cdg 1cf8 1cil 1cin 1ckp 1cle 1com 1coy 1cps
1cqp 1ctr 1ctt 1cvu 1cx2 1d0l 1d3h 1d4p 1dbb 1dbj
1dbm 1dd7 1dg5 1dhf 1did 1die 1dmp 1dog 1dr1 1dwb
1dwc 1dwd 1dy9 1eap 1ebg 1eed 1ei1 1ejn 1ela 1elb
1elc 1eld 1ele 1eoc 1epb 1epo 1eta 1etr 1ets 1ett
1etz 1f0r 1f0s 1f3d 1fax 1fbl 1fen 1fgi 1fig 1fkg
1fki 1fl3 1flr 1frp 1ghb 1glp 1glq 1gpy 1hak 1hdc
1hdy 1hef 1hfc 1hiv 1hos 1hpv 1hri 1hsb 1hsl 1htf
1hti 1hvr 1hyt 1ibg 1icn 1ida 1igj 1imb 1ivb 1ivc
1ivd 1ive 1ivq 1jao 1jap 1kel 1kno 1lah 1lcp 1ldm
1lic 1lkk 1lmo 1lna 1lpm 1lst 1lyb 1lyl 1mbi 1mcq
1mcr 1mdr 1ml1 1mld 1mmb 1mmq 1mnc 1mrg 1mrk 1mts
1mtw 1mup 1nco 1ngp 1nis 1nsd 1okl 1okm 1pbd 1pdz
1pgp 1pha 1phd 1phf 1phg 1poc 1ppc 1pph 1ppi 1ppl
1pso 1ptv 1qbr 1qbt 1qbu 1qcf 1qh7 1ql7 1qpe 1qpq
1rbp 1rds 1rne 1rnt 1rob 1rt2 1sln 1slt 1snc 1srf
1srg 1srh 1srj 1stp 1tdb 1tka 1tlp 1tmn 1tng 1tnh
1tni 1tnl 1tph 1tpp 1trk 1tyl 1ukz 1ulb 1uvs 1uvt
1vgc 1vrh 1wap 1xid 1xie 1xkb 1ydr 1yds 1ydt 1yee
25c8 2aad 2ack 2ada 2ak3 2cgr 2cht 2cmd 2cpp 2ctc
2dbl 2er7 2fox 2gbp 2h4n 2ifb 2lgs 2mcp 2mip 2pcp
2phh 2pk4 2plv 2qwk 2r04 2r07 2sim 2tmn 2tsc 2yhx
2ypi 3cla 3cpa 3erd 3ert 3gch 3gpb 3hvt 3mth 3nos
3pgh 3ptb 3tpi 4aah 4cox 4cts 4dfr 4er2 4est 4fab
4fbp 4lbd 4phv 4tpi 5abp 5cpp 5er1 5p2p 6abp 6cpa
6rnt 6rsa 7cpa 7tim 8gch
Clean list (224 entries)
1a28 1a42 1a4g 1a4q 1a6w 1a9u 1aaq 1abe 1abf 1acj
1acl 1acm 1aco 1aec 1ai5 1aoe 1apt 1apu 1aqw 1ase
1atl 1azm 1b58 1b59 1b9v 1baf 1bbp 1bgo 1bl7 1blh
1bma 1bmq 1byb 1byg 1c12 1c1e 1c5c 1c5x 1c83 1cbs
1cbx 1cdg 1cil 1ckp 1cle 1com 1coy 1cps 1cqp 1cvu
1cx2 1d0l 1d3h 1d4p 1dbb 1dbj 1dd7 1dg5 1dhf 1did
1dmp 1dog 1dr1 1dwb 1dwc 1dwd 1dy9 1eap 1ebg 1eed
1ei1 1ejn 1eoc 1epb 1epo 1eta 1etr 1ets 1ett 1f0r
1f0s 1f3d 1fax 1fen 1fgi 1fkg 1fki 1fl3 1flr 1frp
1glp 1glq 1hak 1hdc 1hfc 1hiv 1hos 1hpv 1hri 1hsb
1hsl 1htf 1hvr 1hyt 1ibg 1ida 1imb 1ivb 1ivq 1jap
1kel 1lah 1lcp 1ldm 1lic 1lna 1lpm 1lst 1lyb 1lyl
1mbi 1mcq 1mdr 1mld 1mmq 1mrg 1mrk 1mts 1mup 1nco
1ngp 1nis 1okl 1okm 1pbd 1pdz 1phd 1phg 1poc 1ppc
1pph 1ppi 1pso 1ptv 1qbr 1qbu 1qcf 1qpe 1qpq 1rds
1rne 1rnt 1rob 1rt2 1slt 1snc 1srj 1tdb 1tlp 1tmn
1tng 1tnh 1tni 1tnl 1tpp 1trk 1tyl 1ukz 1ulb 1uvs
1uvt 1vgc 1wap 1xid 1xie 1ydr 1ydt 1yee 25c8 2aad
2ack 2ada 2ak3 2cht 2cmd 2cpp 2ctc 2dbl 2fox 2gbp
2h4n 2ifb 2lgs 2mcp 2pcp 2phh 2pk4 2qwk 2r07 2tmn
2tsc 2yhx 2ypi 3cla 3cpa 3erd 3ert 3gpb 3hvt 3tpi
4aah 4cox 4cts 4dfr 4est 4fbp 4lbd 4phv 5abp 5cpp
5er1 6rnt 6rsa 7tim
Clean list, resolution threshold 2.0 Å (92 entries)
1a28 1a4q 1a6w 1abe 1abf 1aec 1aoe 1apt 1apu 1aqw
1atl 1b58 1b59 1bma 1byb 1c1e 1c5c 1c5x 1c83 1cbs
1cil 1coy 1d0l 1d3h 1ejn 1eta 1f3d 1fen 1flr 1glp
1glq 1hfc 1hpv 1hsb 1hsl 1hvr 1hyt 1ida 1jap 1kel
1lcp 1lic 1lna 1lst 1mld 1mmq 1mrg 1mrk 1mts 1nco
1phd 1phg 1ppc 1pph 1qbr 1qbu 1rds 1rnt 1rob 1slt
1snc 1srj 1tmn 1tng 1tnh 1tni 1tnl 1tpp 1tyl 1ukz
1vgc 1wap 1xid 1xie 2ak3 2cmd 2cpp 2ctc 2fox 2gbp
2h4n 2qwk 2tmn 2tsc 3cla 3ert 3tpi 4dfr 4est 5abp
6rnt 7tim
16.1.3 Validation using the CCDC/Astex Test Set
CCDC/Astex Validation Overview (see page 131)
GOLD Performance (see page 133)
GOLD Performance as a Function of Protein Type (see page 134)
Influence of Mediating Water Molecules on GOLD Results (see page 135)
CCDC/Astex Validation Overview
The CCDC/Astex test set of protein-ligand complexes was used to determine the GOLD success
rates (see http://www.ccdc.cam.ac.uk/products/life_sciences/validate/). The set consists of 305
protein-ligand complexes. All complexes have had their protonation states set manually, and
have been checked extensively. It is a considerably extended version of the original GOLD
validation test set.
From this set, a clean set of 224 reliable complexes was selected by excluding all complexes
that might be unreliable. A complex was excluded if any of the following was found:
Involvement of crystallographically-related protein units in ligand binding.
Bad clashes between protein side chains and the ligand.
Structural errors, and/or inconsistency of ligand placement with the crystal structure
electron density.
Limiting the clean list to resolutions better than 2.0 Å left 92 entries, for which results will
also be shown.
In addition, the set has been pruned to ensure diversity in terms of protein-ligand structures.
The full list of 305, the clean list of 224, and the limited clean list of 92 entries are shown in
Table I.
11.3 Controlling Reliability and Speed with GA Parameters
11.3.1 Relationship between GA Parameters and Speed (see page 94)
11.3.2 Using Automatic GA Parameter Settings (see page 94)
11.3.3 Using Pre-Defined GA Parameter Settings (see page 96)
11.3.4 Benchmarking of Reliability/Speed for Pre-defined GA Parameter Settings (see page
97)
11.3.5 GA Parameter Settings for Virtual Screening (see page 98)
11.3.1 Relationship between GA Parameters and Speed
The time taken by GOLD to dock ligands can be controlled by altering the values of the genetic
algorithm (GA) parameters (see Section 10., page 89).
GOLD runs for a fixed number of genetic operations (crossover, migration, mutation). The
easiest way to make GOLD go faster is to reduce the number of GA operations performed in the
course of a run. This is done through the Number of Operations variable (this parameter is called
maxops in the configuration file).
A reduction in Number of Operations is likely to change the optimum values of several other GA
parameters, particularly popsize, van der Waals and Hydrogen Bonding.
GOLD manipulates a pool of chromosomes of size popsize * Number of Islands. The size of this
pool should be such that the optimisation converges within the specified maximum number of
operations, Number of Operations. If the pool size is too small for a given value of Number of
Operations, the algorithm will converge prematurely. Conversely, if the pool size is too large the
algorithm will terminate before it has converged.
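The relationship between pool size and the operation budget can be sketched in a few lines of Python. Only the product popsize * Number of Islands comes from the text above. The "operations per chromosome" ratio is an illustrative quantity introduced here, not a criterion GOLD is known to use, and the function names are hypothetical.

```python
def pool_size(popsize: int, n_islands: int) -> int:
    """Total number of chromosomes in the pool: popsize * Number of Islands."""
    return popsize * n_islands


def operations_per_chromosome(maxops: int, popsize: int, n_islands: int) -> float:
    """Illustrative ratio only (an assumption, not a GOLD parameter):
    the average number of genetic operations available per pool member."""
    return maxops / pool_size(popsize, n_islands)


if __name__ == "__main__":
    # Example values: 100000 operations, population size 100, 5 islands.
    print(pool_size(100, 5))
    print(operations_per_chromosome(100000, 100, 5))
```

A very low ratio suggests the run may stop before the pool has converged, while a very high ratio suggests a small pool that may converge prematurely. Treat this as a rough sanity check rather than a substitute for the automatic or pre-defined settings.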
The annealing parameters van der Waals and Hydrogen Bonding allow poor hydrogen bonds to
occur at the beginning of a genetic algorithm run, in the expectation that they will evolve to
better solutions. Both the vdw and H-bond annealing must be gradual and the population
allowed plenty of time to adapt to changes in the fitness function.
Because of these factors, it is difficult to set GA parameters by hand and you are recommended
to use automatic (ligand dependent) GA parameter settings (see Section 11.3.2, page 94), or one
of the default parameter sets offered in the GOLD front end (see Section 11.3.3, page 96).
11.3.2 Using Automatic GA Parameter Settings
The number of genetic operations performed (crossover, migration, mutation) is the key
parameter in determining how long a GOLD run will take (i.e. this parameter controls the
coverage of the search space).
GOLD can automatically calculate an optimal number of operations for a given ligand, thereby
making the most efficient use of search time, e.g. small ligands containing only one or two
rotatable bonds will generally require fewer genetic operations than larger highly flexible
ligands.
The criteria used by GOLD to determine the optimal GA parameter settings for a given ligand
include: the number of rotatable bonds in the ligand, ligand flexibility, i.e. number of flexible
ring corners, flippable nitrogens, etc. (see Section 7., page 64), the volume of the protein binding
site, and the number of water molecules considered during docking (see Section 3.4, page 16).
The exact number of GA operations contributed, e.g. for each rotatable bond in the ligand, is
defined in the gold.params file (see Section 6.3, page 48).
To enable automatic GA settings, click on the Select GA Presets and Automatic Settings button in
the Genetic Algorithm Parameters panel (or hit Settings in the Control panel) then, in the
Settings selector window, click on Use automatic settings:
GOLD runs for a fixed number of genetic operations. Limiting this number will increase docking
speed, but the search space will be less well explored (see Section 11.3, page 94). The Search
efficiency setting can be used to control the speed of docking and the predictive accuracy (i.e.
the reliability) of the results. With the Search efficiency set at 100%, GOLD will attempt to
apply optimal settings for each ligand. For a ligand with five rotatable bonds this will be
around 30,000 GA operations. If the Search efficiency is set to 50%, GOLD will perform around
15,000 operations, thereby speeding up the docking by a factor of two. Similarly, by setting a
Search efficiency greater than 100%, it is possible to make the search more exhaustive (but
slower).
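The arithmetic in the example above can be written out as a short Python sketch. It assumes that the number of operations scales linearly with the Search efficiency, which is consistent with the 100% and 50% figures quoted but is not confirmed for other values. The function name is hypothetical.

```python
def scaled_operations(optimal_ops: int, search_efficiency_pct: float) -> int:
    """Scale the automatically determined number of GA operations.

    Assumption: linear scaling with the Search efficiency percentage.
    """
    if search_efficiency_pct <= 0:
        raise ValueError("Search efficiency must be greater than zero")
    return round(optimal_ops * search_efficiency_pct / 100.0)


if __name__ == "__main__":
    # Roughly 30,000 operations at 100% for a ligand with five rotatable bonds.
    for pct in (50, 100, 200):
        print(pct, scaled_operations(30000, pct))
```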
The Minimum number of operations in run will be updated automatically according to the
Search efficiency that is set. The automatic preset can be overridden to ensure that every ligand is
subjected to at least a user-specified number of operations. Similarly, the Maximum number of
genetic algorithm parameters.
Results:
GOLD failed to produce an answer for 1ACL because the ligand contains no hydrogen-bonding
atoms (this problem has since been fixed). The subsequent analysis was therefore based on results for
99 complexes.
In summarising the results, the GOLD prediction is defined as the best of the 20 dockings
according to the GOLD fitness score and not the docking that is closest to the experimental
result.
Each GOLD prediction was assigned to one of 4 subjective categories: good, close, errors or
wrong. Each prediction was also ranked by its rms with respect to the observed ligand position.
GOLD achieved a 71% rate of successful predictions (good or close).
3D plots of individual predictions are available on the CCDC web page.
Detailed tabulations of the predictions are in Appendix C: GOLD Predictions in First Series of
Validation Tests (see page 153).
16.1.2 Follow-Up Validation of Docking Results
The GOLD algorithm was improved in various ways following the first set of validation tests. A
second set of tests was then performed on 34 additional complexes in order to ensure that GOLD
had not been over-trained on the original set. The method used was the same as in the first set of
validation tests.
Results:
GOLD achieved a 74% rate of successful predictions (good or close).
3D plots of individual predictions are available on the CCDC web page.
Detailed tabulations of the predictions are in Appendix D: GOLD Predictions in Second Series
of Validation Tests (see page 160).
16. Accuracy of Predictions
16.1 Correlation between Predicted and Observed Ligand Positions (see page 129)
16.2 Correlation between Fitness Function and Biological Activity (see page 137)
16.1 Correlation between Predicted and Observed Ligand Positions
NOTE: This section and Appendix B summarise validation tests done when GOLD was first
developed using the GoldScore fitness function. Recently (2001-2), we have significantly
expanded the size of the test set and done comparisons between GoldScore and ChemScore. The
new validations do not change the basic conclusions outlined below in any major way and give
preliminary indications that GoldScore and ChemScore have comparable overall success rates.
A simple test of the effectiveness of a docking program is to take a protein-ligand complex from
the Protein Data Bank and extract the ligand. The docking program can then be used to predict
the binding mode of the ligand and a comparison made with the crystallographically observed
position. This methodology has been used to validate GOLD. Tests were done in two phases:
first, on a test set of 100 complexes; later, on an additional 34 complexes as a check against over-
training.
16.1.1 Initial Validation of Docking Results (see page 129)
16.1.2 Follow-Up Validation of Docking Results (see page 130)
16.1.3 Validation using the CCDC/Astex Test Set (see page 131)
16.1.4 Examples of GOLD Dockings (see page 136)
16.1.1 Initial Validation of Docking Results
The method used for each test calculation was as follows:
100 protein-ligand complexes were selected from the Protein Data Bank.
Parts of the protein remote from the binding site were deleted. Enough of the protein was
retained to ensure that all residues were present that might reasonably interact with the ligand.
The ligand was extracted from the protein binding site.
Hydrogen atoms were placed on both the protein and the ligand in order to ensure that ionisation
and tautomeric states were defined unambiguously. This involved making hypotheses about the
protonation states of residues such as His, Glu and Asp.
The ligand was minimised into a low-energy conformation.
The atom types of both the protein and ligand were checked for accuracy.
In almost all test runs, all water molecules were deleted from the protein structure. This is not
strictly defensible since water molecules often mediate protein-ligand binding. However, if more
careful judgements were made on which waters to remove, the effect would be to improve the
accuracy of the GOLD predictions. Hence, the deletion of all waters is a conservative strategy
which will make GOLD look less reliable than it really is, rather than more reliable.
20 docking runs were performed on each test complex, using the slowest default setting of the
operations in run can be set manually.
When using automatic GA parameter settings, the parameters controlling the precise operation
of the genetic algorithm (population size, selection pressure, Niche size, etc.) will be set to auto
in the Genetic Algorithm Parameters panel. The actual GA settings used will be reported in the
ligand log file (see Section 14.10, page 118).
11.3.3 Using Pre-Defined GA Parameter Settings
To use one of the pre-defined GA parameter settings click on the Select GA Presets and
Automatic Settings button in the Genetic Algorithm Parameters panel, or hit Settings in the
Control panel, to open the Settings selector window:
Select Choose presets and choose from one of the pre-defined GA parameter settings listed.
The Default settings deliver high predictive accuracy but are relatively slow. Default settings are
recommended for use with large highly-flexible ligands, or for research applications where
speed of docking is not an issue and optimal accuracy is required.
The 2 times speed-up or 3 times speed-up settings are progressively quicker (predictive
reliability will fall off, but quite slowly). These settings are recommended for use with
compounds containing up to six flexible bonds and/or ring corners (see Section 7.1, page 64).
The 7-8 times speed-up settings will give comparable predictive accuracy to the slow, Default
settings when docking small ligands. These settings are recommended for use with ligands
containing one or two rotatable torsions and for virtual screening work (see Section 11.3.5, page
98).
It is possible to create your own default GA settings. To do this, you must edit the file
.gold_preferences (see Section 15.4, page 127).
Individual GA parameter settings can be specified in the GOLD front end by typing directly into
the input boxes in the Genetic Algorithm Parameters panel (see Section 2.4, page 7). However,
it is recommended that you use one of the pre-defined GA parameter settings as opposed to
altering individual GA parameters, because the optimum values of the parameters are highly
correlated.
11.3.4 Benchmarking of Reliability/Speed for Pre-defined GA Parameter Settings
We have performed a great many experiments with different genetic algorithm (GA) settings.
Three such settings are summarised below:
We used GOLD with each of these settings to dock 100 ligands into their binding sites, using a
test set of 100 protein-ligand complexes selected from the PDB. 20 docking runs were done on
each ligand with each GA set. The rms deviations were computed between the experimental
result and the GOLD solution ranked top by fitness function. Root mean square deviations
(rmsd) were also calculated between the experimental result and the closest of the 20 dockings
(i.e. not necessarily the top-ranked solution). Results were:
GA Parameter           Set A    Set B    Set C
Number of Operations   100000   10000    1000
Population Size        100      100      50
Selection Pressure     1.1      1.1      1.125
Number of Islands      5        1        1
Crossover              95       100      100
Mutate                 95       100      100
Migrate                10       0        0
Niche Size             2        2        2
Hydrogen Bonding       2.5      2.0      5.0
van der Waals          4.0      10.0     10.0
Edit into the file a line such as:
default_ga_setting /home/golduser/configfiles/myconfig.conf my protein
and create a configuration file (called /home/golduser/configfiles/myconfig.conf in the
above case) containing the desired GA settings.
The settings will appear in the Settings Selector window next time GOLD is opened:
15.2 Customising Fitness Function Parameters
GOLD parameters are stored in the gold.params file in the GOLD distribution directory. It
can be customised by copying it, editing the copy, and instructing GOLD to use the edited file.
Parameters specific to GoldScore are stored in files of the type
goldscore.p450_<csd|pdb>.params (see Section 6.3, page 48).
The ChemScore fitness-function parameters are stored in the ChemScore file, which can
also be customised (see Section 6.5, page 58).
15.3 Customising the Torsion Angle Distribution File
It is possible to customise torsion distribution information by copying one of the standard torsion
distribution files, editing it, and instructing GOLD to use the edited file (see Section 9.3, page
84).
15.4 Creating Customised Default Genetic Algorithm Parameter Settings
A number of pre-defined genetic algorithm (GA) settings are offered when GOLD is opened:
It is possible to add your own default GA settings to this window.
To do this, you must edit the file .gold_preferences in your home directory. This file will
be created the first time you run GOLD, and will look something like this:
rmsd < n = number of predictions out of the 100 within n rmsd of observed result.
In the GOLD front-end, the GA parameter set called Default settings corresponds to Set A
above; 7-8 times speed-up corresponds to Set B; and library screening settings corresponds to
Set C.
For careful work, we recommend the slow standard setting A, which typically finds correct
solutions in 70-80% of cases. Set C, which is fast enough for virtual library-screening, is
inevitably less accurate, but still finds the correct solution 60-70% of the time.
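The two kinds of count reported in the benchmark (top-ranked and closest) can be reproduced with a short Python sketch. The fitness and rmsd values below are invented purely for illustration and are not GOLD results. The function names and the data layout (one list of (fitness, rmsd) pairs per ligand) are also assumptions.

```python
from typing import List, Sequence, Tuple

Docking = Tuple[float, float]  # (fitness score, rmsd to the experimental pose)


def count_within(rmsds: Sequence[float], threshold: float) -> int:
    """Number of rmsd values strictly below the threshold."""
    return sum(1 for r in rmsds if r < threshold)


def summarise(ligands: List[List[Docking]], threshold: float) -> Tuple[int, int]:
    """Return (top-ranked count, closest count) for one rmsd threshold."""
    # Top-ranked: the docking with the highest fitness score for each ligand.
    top_ranked = [max(runs, key=lambda d: d[0])[1] for runs in ligands]
    # Closest: the docking with the smallest rmsd, whatever its fitness.
    closest = [min(rmsd for _, rmsd in runs) for runs in ligands]
    return count_within(top_ranked, threshold), count_within(closest, threshold)


# Hypothetical data: three ligands with two dockings each.
EXAMPLE = [
    [(60.1, 0.8), (55.0, 3.2)],
    [(48.3, 4.5), (47.9, 1.2)],
    [(52.0, 2.4), (50.5, 2.9)],
]

if __name__ == "__main__":
    for limit in (2.0, 3.0):
        print(limit, summarise(EXAMPLE, limit))
```

The closest count can never be lower than the top-ranked count, because the top-ranked docking is itself one of the candidates for closest. The gap between the two indicates how often a good pose was found but not ranked first by the fitness function.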
11.3.5 GA Parameter Settings for Virtual Screening
Existing GOLD users may have library screening settings available as one of the default
preferences. However, due to general advances in processor speed we would now recommend
using 7-8 times speed-up for virtual screening work in order to take advantage of the associated
improvement in accuracy (see Section 11.3.1, page 94):
Note: If library screening settings are not available as a default preference, you can re-enable
them by editing the .gold_preferences file (see Section 15.4, page 127).
         top-ranked;   top-ranked;   closest;    closest;
         rmsd < 2      rmsd < 3      rmsd < 2    rmsd < 3
Set A    70            79            83          88
Set B    64            77            79          89
Set C    62            68            72          86
12. Running GOLD
12.1 Required Input Files (see page 99)
12.2 Starting GOLD (see page 99)
12.3 Running Interactively; Interactive Diagnostics (see page 100)
12.4 Submitting a GOLD job to the Background from the Front End (see page 100)
12.5 Running GOLD from the Command Line (see page 100)
12.6 Running in Parallel (see page 101)
12.1 Required Input Files
The following files must be available before a GOLD job can be run:
One or more files containing the ligand(s) to be docked, in MOL2, MOL, SD or PDB format (but
PDB format is not recommended for ligand files) (see Section 4., page 30).
A file containing the protein (or the part of a protein) into which the ligand is to be docked. This
may be in PDB or MOL2 format (see Section 3., page 9).
GOLD also needs a configuration file, which contains the names of the protein and ligand files,
and all the user-defined parameters such as genetic algorithm parameter settings, fitness flags,
etc. The configuration file can be created manually, but it is usually easier and preferable to
create it with the GOLD graphical front end (the file is written automatically when the Run, Save
& Exit or Submit & Exit buttons are hit) (see Section 2.1, page 3).
In addition, GOLD uses a parameter file (see Section 6.3, page 48) and (optionally) a torsion
distribution file (see Section 9., page 83). If the ChemScore fitness function is selected, it will
also use a ChemScore file (see Section 6.5, page 58). All these files are supplied in the GOLD
distribution and, by default, will be found automatically by the program. If required, any of the
files can be copied to a user's directory and edited, and GOLD can then be directed to use the
edited file.
12.2 Starting GOLD
GOLD opens output log files so each GOLD run should be performed in a separate directory.
Create a directory in which to run GOLD and copy the protein and ligand files into it.
You can also write each set of ligand output files to its own sub-directory.
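Setting up a separate directory for each run is easy to script. The following Python sketch is one possible approach and is not part of the GOLD distribution. The function name and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_run_directory(run_dir, input_files):
    """Create a new run directory and copy the input files into it."""
    run_dir = Path(run_dir)
    # exist_ok=False: refuse to reuse a directory that may hold old log files.
    run_dir.mkdir(parents=True, exist_ok=False)
    return [Path(shutil.copy2(src, run_dir)) for src in input_files]


if __name__ == "__main__":
    # Demonstration with empty placeholder files in a temporary location.
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        protein = tmp / "protein.pdb"   # hypothetical file names
        ligand = tmp / "ligand.mol2"
        protein.touch()
        ligand.touch()
        copied = prepare_run_directory(tmp / "run_001", [protein, ligand])
        print([p.name for p in copied])
```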
GOLD can be run from the command line or via the graphical front end. The easiest way to get
started is to use the front end (see Section 2., page 3).
From the front end, you can run a GOLD job interactively (see Section 12.3, page 100), submit it
to the background (see Section 12.4, page 100), or save the configuration file so that GOLD may
be started from the command line (see Section 12.5, page 100).
15. Saving and Reusing Program Settings
15.1 Saving and Re-using Program Settings in Configuration Files (see page 126)
15.2 Customising Fitness Function Parameters (see page 127)
15.3 Customising the Torsion Angle Distribution File (see page 127)
15.4 Creating Customised Default Genetic Algorithm Parameter Settings (see page 127)
15.1 Saving and Re-using Program Settings in Configuration Files
The configuration file is a text file which specifies the GOLD calculation that is to be run,
including details of the ligand, the protein binding site, the fitness-function parameter file to be
used, the torsion distribution file to be used, and the genetic algorithm parameters. Although the
file can be generated with a standard text editor, the easiest way to create it is to use the GOLD
front end (see Section 2.1, page 3).
Any settings that have been defined in the GOLD front end can be saved as a configuration file
by selecting the button Save & Exit. Alternatively, the file will be saved automatically if you
start a GOLD job from the front end with the Submit & Exit or Run buttons.
By default, the configuration file will be saved in the directory from which GOLD was opened
and will be called gold.conf. Use the entry box next to the Configuration File button to
change the file name and/or directory (any file name can be used).
Once a configuration file has been created, it can be re-used, either as a quick way of reading
program settings into the GOLD front end or to run GOLD from the command line (see Section
12., page 99).
To load a previously created configuration file into the front end, enter the file name into the box
next to the Configuration File button and hit return. The parameters read in from the
configuration file will overwrite any parameters that have already been set in the GOLD front
end.
If you have a valid configuration file (i.e. one that completely specifies a GOLD job), you can
run GOLD from the command line by using a simple command available in $GOLD_DIR/bin.
For example, if the configuration file is gold.conf, the command is:
% gold_auto gold.conf &
If you find yourself using a configuration file over and over again, you may want to add it to the
options listed in the GOLD start-up window (the Settings Selector window). This is done by
editing the file .gold_preferences in your home directory (see Section 15.4, page 127).
Note: SGI users running IRIX will also be given the option to use grommitt for simple
visualisation of docking results (see Section 18.1, page 142).
14.14 Exporting Fitness-Function Data to SILVER
It is possible to write additional information to docked solution files.
This information includes the values of the individual fitness-function components and is written
to SD file tags; for MOL2 files, these tags are written to comment blocks (see Section 14.2, page
111).
This information can be utilised by SILVER (supplied with GOLD). SILVER allows you to
define and calculate a wide variety of descriptors (parameters that describe dockings) which may
be used to analyse the results of a docking run. For further information, refer to the SILVER
User Guide.
12.3 Running Interactively; Interactive Diagnostics
GOLD can be run interactively by hitting the Run button in the front end. However, since
docking often takes several minutes or even hours, it is usually better to run the job in the
background.
If GOLD is run interactively, output that is written to the log files is also displayed in a window:
The parallel version only gives a summary as it is not possible to track multiple files.
You can use the Interrupt GA button to interrupt GOLD and terminate the docking run.
If any error conditions are encountered, they will be displayed in another window. Note that only
fatal errors are reported for the parallel version.
When GOLD is being run interactively, SILVER can be used to display the current top solution
from a genetic algorithm run (see Section 18.1, page 142). To do this, click on the Display
options button in the GOLD front end.
12.4 Submitting a GOLD job to the Background from the Front End
You can submit a GOLD job to the background by using the Submit & Exit button in the front end,
having first specified all the required information, such as protein and ligand file names,
parameter settings, etc.
12.5 Running GOLD from the Command Line
Unix platforms:
GOLD can be run directly in the background by using a simple command available in
$GOLD_DIR/bin:
% gold_auto gold.conf &
where gold.conf is the name of a configuration file.
Windows:
GOLD can be run on Windows by starting a command prompt, navigating to the directory
containing the gold.conf file and running the following command:
"C:\Program Files\CCDC\gold_v3.1\gold\d_win32\bin\gold_win32.exe"
The above command assumes that GOLD is installed in the default installation directory and
that the configuration file is called gold.conf. If another name has been used for the
configuration file (e.g. new_conf_filename.conf), this will have to be specified:
"C:\Program Files\CCDC\gold_v3.1\gold\d_win32\bin\gold_win32.exe" new_conf_filename.conf
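When many jobs need to be launched, the Unix command can be wrapped in a script. The Python sketch below only assembles and starts the gold_auto command described above. The function names and the example installation path are hypothetical, and the Windows executable is not covered.

```python
import os
import subprocess
from pathlib import PurePosixPath


def gold_auto_command(conf_file="gold.conf", gold_dir=None):
    """Build the argument list for $GOLD_DIR/bin/gold_auto <conf_file>."""
    gold_dir = gold_dir or os.environ.get("GOLD_DIR")
    if not gold_dir:
        raise RuntimeError("GOLD_DIR is not set")
    return [str(PurePosixPath(gold_dir) / "bin" / "gold_auto"), conf_file]


def launch_in_background(run_dir, conf_file="gold.conf", gold_dir=None):
    """Start GOLD without waiting for it, like a trailing '&' in the shell."""
    return subprocess.Popen(gold_auto_command(conf_file, gold_dir), cwd=run_dir)


if __name__ == "__main__":
    # "/opt/gold" is a made-up installation path used only for display.
    print(gold_auto_command("gold.conf", "/opt/gold"))
```

launch_in_background returns immediately with a Popen handle, so the calling script can carry on and later call wait() on the handle if it needs to know when the docking has finished.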
12.6 Running in Parallel
12.6.1 Parallel Virtual Machine (PVM) (see page 101)
12.6.2 Using the PVM Console (see page 102)
12.6.3 Diagnosis of PVM Problems (see page 103)
12.6.4 Selecting and Deselecting Machines (see page 104)
12.6.5 Setting the Maximum Number of Processes (see page 105)
12.6.6 Using GOLD with your own PVM Installation (see page 105)
12.6.1 Parallel Virtual Machine (PVM)
The parallel version of GOLD uses PVM (Parallel Virtual Machine) in its operation. PVM is a
3rd party public-domain library of routines that allows a program to schedule and harvest results
across a network of machines and/or processors.
PVM is supplied with GOLD for UNIX-based platforms only (parallel versions can only be run
on Windows with third party applications) and allows users to distribute jobs over their network,
across a virtual cluster of machines in order to harness the processing power of multiple
machines concurrently.
If PVM is not installed, GOLD disables the parallel version. There is also an option, -np, which
allows you to disable the parallel version, if required:
UNIX: $GOLD_DIR/bin/gold -np
Windows: <InstallDir>/bin/gold -np
Cluster 1: bestranking structure is gold_soln_ligand_m1_8.mol
Cluster 2 : bestranking structure is gold_soln_ligand_m1_10.mol2
Cluster 3 : bestranking structure is gold_soln_ligand_m1_4.mol2
Cluster 4 : bestranking structure is gold_soln_ligand_m1_9.mol2
14.11 File Containing Error Messages
The file gold.err lists any errors found by the program. These are generally fatal and cause
the program to stop. It is a good idea to check gold.err if something goes wrong.
Errors found by the atom-type checker are written to gold.err. If you are unsure about your
atom typing you should therefore check this file. For example:
In the parallel version, warning messages are logged in individual error files - one for each
process. They are not sent back to the central parallel scheduling process.
gold.err is line buffered so errors are logged immediately. If you are running GOLD
interactively, the contents of gold.err will appear in a separate window.
14.12 Process File
The file gold.pid records the user, host and process number of the GOLD job. It is deleted
when GOLD exits. Its purpose is to stop the user running two GOLD jobs in the same directory.
If the machine goes down, or GOLD crashes or is killed with signal 9, you will need to remove
gold.pid before you can run another GOLD job in the same directory.
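A launch script can check for a leftover gold.pid before starting a new job. The Python sketch below is a minimal illustration with a hypothetical function name. It does not inspect the contents of gold.pid and cannot tell whether the recorded job is still running, so remove_stale should only be set once you have confirmed that no GOLD job is active in the directory.

```python
import tempfile
from pathlib import Path


def can_start_job(run_dir, remove_stale=False):
    """Return True if no gold.pid blocks a new GOLD job in run_dir."""
    pid_file = Path(run_dir) / "gold.pid"
    if not pid_file.exists():
        return True
    if remove_stale:
        # Caller has confirmed the earlier job crashed or was killed.
        pid_file.unlink()
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(can_start_job(tmp))                      # no gold.pid present
        (Path(tmp) / "gold.pid").write_text("placeholder")
        print(can_start_job(tmp))                      # blocked by gold.pid
        print(can_start_job(tmp, remove_stale=True))   # removed on request
```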
14.13 Viewing Docked Solutions in SILVER
To visualise docked solutions in SILVER click on Display options, then select either Show in
SILVER to view all results after a docking run has completed, or Show in SILVER now in order
to visualise current results immediately.
In the above example, at a clustering distance of 0.75 Å, there are four different clusters of
solutions:
0.90 | 1 2 3 5 | 4 7 | 6 9 10 | 8 | files (d= 0.75 )
Note: Clusters are separated by the | symbol and rankings are used rather than run numbers (see
Section 14.5, page 112).
The first cluster contains four solutions ranked numbers 1, 2, 3 and 5. The best-ranking
structure in this cluster is ranked_structure_m#_1.mol2, which corresponds to the docked
solution gold_soln_ligand_m1_8.mol2. Likewise, the second cluster contains two
solutions ranked numbers 4 and 7. The best-ranking structure in this cluster is
ranked_structure_m#_4.mol2, which corresponds to the docked solution
gold_soln_ligand_m1_10.mol2, and so on for the third and fourth clusters.
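A line of the clustering report can be split into its clusters with a few lines of Python. This sketch infers the layout from the single example line shown above (a distance, then groups of rankings separated by the | symbol, then a trailing annotation). Other log lines may differ, and the function name is hypothetical.

```python
def parse_cluster_line(line):
    """Split one clustering-report line into (distance, list of clusters).

    Each cluster is a list of rankings (not run numbers).
    """
    fields = [field.strip() for field in line.split("|")]
    distance = float(fields[0])
    clusters = []
    for field in fields[1:]:
        tokens = field.split()
        # Keep only fields made up entirely of integers; this skips the
        # trailing annotation such as "files (d= 0.75 )".
        if tokens and all(token.isdigit() for token in tokens):
            clusters.append([int(token) for token in tokens])
    return distance, clusters


if __name__ == "__main__":
    example = "0.90 | 1 2 3 5 | 4 7 | 6 9 10 | 8 | files (d= 0.75 )"
    distance, clusters = parse_cluster_line(example)
    print(distance, clusters)
    # The first entry of each cluster is its best-ranking member.
    print([cluster[0] for cluster in clusters])
```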
Symbolic links will be generated in the output directory which will link to the top-ranked
solution in each cluster:
Parallel GOLD dockings are distributed over a PVM at the ligand level such that each ligand is
assigned to a particular node within the PVM and then docked. Results are returned to the PVM
Master machine whilst new ligands are distributed amongst idle machines within the PVM until
the GOLD job is completed.
PVM works by using daemons. When you start PVM, a daemon will be created on the machine
you are using (we will call this machine the master). You can then add further computers (which
we will call slaves) to the virtual machine (see Section 12.6.4, page 104). Adding each new
machine will start a slave daemon on that machine.
You can only use each host as a member of one virtual machine. This is because a user can only
have one daemon running on a given machine.
When using GOLD with PVM, it is strongly recommended that you pick one machine as master
and always use that machine for setting up and starting GOLD jobs.
To run parallel GOLD using PVM, passwordless shell access (either RSH or SSH) must be set
up between all of the machines that you wish to use in your PVM cluster. Your systems
administrator should be able to set this up for you. To get PVM to work with SSH you need to
set a global environment variable $PVM_RSH to ssh on all systems that you intend to use in the
PVM cluster.
PVM user manual pages can be found in $PVM_ROOT/man. For more information, see the
PVM home page at http://www.netlib.org/pvm3.
12.6.2 Using the PVM Console
The PVM software provides a command line console. Once you have set the environment
variable $PVM_ROOT you can start it by typing:
$PVM_ROOT/lib/pvm
at the command line.
The setenv command in the console will generate a listing of the local environment set in
PVM.
The conf command will tell you which hosts are currently present within your virtual
machine.
The PVM console allows you to add machines to and delete machines from your virtual machine
using add and delete, as well as view details about PVM. If there are problems with a specific
node, or machine, try the command:
add <node-name>
and see if it generates any useful information as to why there may be a problem.
GOLD provides a simple interface to PVM that allows you to add machines (see Section 12.6.4,
page 104); however you should use the console to remove them. If you delete them in the GOLD
interface, they are just flagged as do not use. The reason for this is that we cannot guarantee that
a user is not using PVM for other purposes.
Adding a machine will not affect any other software, but deleting a machine might.
12.6.3 Diagnosis of PVM Problems
If you are having difficulty getting PVM running correctly on your system, in the first instance
please check the following:
1. Check that the environment variable $PVM_ROOT is set correctly and globally on all
machines within the PVM cluster.
2. Check that your system temporary area is not full. We have occasionally heard of cases where
PVM could not start correctly because /tmp on the user's machine was full.
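The two checks above can be automated on the local machine with a short Python sketch. It is not a replacement for the test_pvm.sh script supplied with GOLD, and it would have to be run on each machine in the cluster separately. The function name and the free-space threshold are arbitrary assumptions.

```python
import os
import shutil
import tempfile


def pvm_preflight(env=None, tmp_dir=None, min_free_bytes=10 * 1024 * 1024):
    """Return a list of problems found on the local machine (empty if none)."""
    env = os.environ if env is None else env
    tmp_dir = tempfile.gettempdir() if tmp_dir is None else tmp_dir
    problems = []
    # Check 1: PVM_ROOT must be set.
    if not env.get("PVM_ROOT"):
        problems.append("PVM_ROOT is not set")
    # Check 2: the temporary area must not be full.
    free = shutil.disk_usage(tmp_dir).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free in {tmp_dir}")
    return problems


if __name__ == "__main__":
    for problem in pvm_preflight():
        print("WARNING:", problem)
```

The sketch only confirms that PVM_ROOT has a value. Whether it points at a working PVM installation still has to be verified separately.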
Once you have performed these checks, you can begin diagnosing the root cause of the problem.
The UNIX GOLD distribution includes a PVM diagnostics script called test_pvm.sh. To run this
script, please execute the command:
$GOLD_DIR/bin/test_pvm.sh
and follow the on-screen instructions. If you are unable to interpret the information generated by
this script, please send the entire output by email to support@ccdc.cam.ac.uk and we will
diagnose any PVM problems you may have.
Additional diagnostic information can be obtained from various files that can be found on the
machines within your PVM cluster. In particular, the PVM log files are often very useful. Each
daemon generates its own log file. They take the form:
/tmp/pvml.<user id>
and are generated on both the PVM master machine and the PVM slave machines. They can
contain relevant information (or sometimes lack expected lines) that indicates the source of the
problem.
For example, if PVM is configured correctly you should expect to see the text line
Running on <platform type>
14.10.3 Identification of Different Binding Modes (Clustering of Ligand Poses)
GOLD clusters docked solutions according to how similar the poses are in terms of their RMSd
(see Section 14.10.2, page 120). A link can be generated to the top ranked solution from each
distinct cluster. This can be useful in identifying different ligand binding modes. Considering
solutions from different clusters is often more relevant than taking the top n ranked poses since
these will often be very similar (i.e. all from the same cluster of solutions).
Open the Output Preferences window by hitting the Output button in the GOLD front end. Then,
switch on the Create links for different binding modes check-box, and specify an RMSd
clustering distance (this determines how similar the poses are in each cluster of solutions). By
default the clustering distance is 0.75 Å:
A clustering report is given at the end of the ligand log file. The clusters themselves and the
individual solutions within each cluster are in ranked order (i.e. the first member of the first
cluster is always the top-ranked solution). For example, output from a run of 10 GA dockings
may look like:
In this case, solution number 4 had the largest fitness score (this solution will be in
gold_soln_ligand_m#_4.mol2, which will be symbolically linked to
ranked_ligand_m#_1.mol2), while solution number 3 had the worst fitness.
The numbers in the matrix of rms deviations refer to the rankings, not the run numbers (e.g. row
1 of the above matrix refers to the solution with the best fitness score, contained in
ranked_ligand_m#_1.mol2).
Finally, the rms deviations are used as input to a hierarchical cluster analysis, using the complete
linkage algorithm. Each line shows one iteration of the clustering algorithm, the distance
between the clusters that were merged at that step, and the contents of the current set of clusters.
Clusters are separated by the | symbol and rankings are used rather than run numbers. For
example, the solutions ranked_ligand_m#_2.mol2 and
ranked_ligand_m#_4.mol2 were merged in the first step of the following cluster
analysis:
Final Ranking 4 2 5 1 3
_______________________________
RMSD Matrix of RANKED solutions
        2     3     4     5
1:    4.8   4.7   5.1  10.1
2:          4.0   3.1  10.9
3:                4.1  10.4
4:                     11.0
Clustering using complete linkage.
Structure ids are RANKING
Dist Clusters...
3.14 | 4 2 | 3 | 5 | 1 |
4.06 | 4 2 3 | 1 | 5 |
5.07 | 4 2 3 1 | 5 |
10.95 | 4 2 3 1 5 |
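To make the complete linkage procedure concrete, the following Python sketch re-runs the clustering on the RMSD matrix printed above. It is an illustration only, not GOLD's own implementation; the function names are hypothetical, and the matrix values are the one-decimal figures from the example output.

```python
from itertools import combinations

# Upper triangle of the example RMSD matrix, keyed by (rank_i, rank_j), rank_i < rank_j.
RMSD = {
    (1, 2): 4.8, (1, 3): 4.7, (1, 4): 5.1, (1, 5): 10.1,
    (2, 3): 4.0, (2, 4): 3.1, (2, 5): 10.9,
    (3, 4): 4.1, (3, 5): 10.4,
    (4, 5): 11.0,
}

def rmsd(i, j):
    """Look up the symmetric RMSD between two ranked solutions."""
    return RMSD[(min(i, j), max(i, j))]

def cluster_distance(a, b):
    """Complete linkage: the largest pairwise RMSD between two clusters."""
    return max(rmsd(i, j) for i in a for j in b)

def complete_linkage(ids):
    """Merge the two closest clusters until one remains.

    Returns one (merge distance, clusters after the merge) tuple per step.
    """
    clusters = [frozenset([i]) for i in ids]
    history = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(*pair))
        dist = cluster_distance(a, b)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        history.append((dist, [sorted(c) for c in clusters]))
    return history

for dist, clusters in complete_linkage([1, 2, 3, 4, 5]):
    print(f"{dist:5.2f}", clusters)
```

With the rounded matrix values, the merge distances are 3.1, 4.1, 5.1 and 11.0, which agree with the 3.14, 4.06, 5.07 and 10.95 reported in the log to within rounding. Stopping the merging once the distance exceeds the chosen clustering distance gives the set of distinct binding modes.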
somewhere in the PVM log file on the PVM master machine.
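As a hedged illustration of this check, the Python sketch below scans a PVM daemon log for the expected line. The helper name log_reports_platform is hypothetical, and the path of the log file must be supplied by the user (it is the /tmp/pvml.<user id> file described above).

```python
def log_reports_platform(log_path):
    """Return True if the PVM log contains a 'Running on' line."""
    # errors="replace" avoids failing on any non-text bytes in the log.
    with open(log_path, "r", errors="replace") as handle:
        return any("Running on" in line for line in handle)
```

A return value of False on the PVM master machine suggests that the daemon did not start as expected and that the rest of the log should be inspected.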
For further information, please consult the PVM troubleshooting guide:
http://www.netlib.org/pvm3/book/node1.html
12.6.4 Selecting and Deselecting Machines
Click on Choose machines in the GOLD front end to launch the parallel process scheduling
window:
The scheduling window allows you to select a set of machines, across which a parallel GOLD
job will be distributed. A GOLD job may be distributed across multiple processors on a single
machine, or across several single-processor machines, or across several multiple-processor
machines.
The process scheduler allows you to add suitable hosts into the schedule for use in docking a
ligand.
By clicking on the Add button you can add new machines to your schedule:
Type in a host name (your administrator must install GOLD so that it knows the names of
available host machines).
For each host chosen, you need to specify a value for Number of Processes. This tells GOLD
how many separate docking runs to start on that machine. For single processor machines,
Number of Processes should usually be set to 1; on machines with more than one processor, it
should usually be greater than one, depending on how many of the machine's processors you
wish to use.
The Host file name button allows you to read a file that contains a host configuration previously
created when using parallel GOLD. If you click on this button, GOLD then prompts you for a
file to read. It will read hosts and numbers of processes from this file, and attempt to add these
hosts to your configuration.
12.6.5 Setting the Maximum Number of Processes
The entry box labelled Maximum number of distributed processes allows specification of the
maximum number of GOLD processes that can run simultaneously. This should normally be set
equal to the number of processors available for the GOLD job to run on.
Note: If the maximum number of distributed processes is set to a number greater than the total
number of processes listed for the individual hosts in the PVM configuration, GOLD will spawn
more jobs than specified on each machine until the total set in the maximum number of
distributed processes is being run, i.e. a discrepancy between the number of processes listed
for the hosts and the maximum number of processes can lead to more or fewer processes than
intended being run on each machine.
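The consistency condition in the note above can be checked before a job is started. The Python sketch below is an illustration only; the function name check_schedule and the host names are hypothetical, and the messages simply restate the note.

```python
def check_schedule(processes_per_host, max_processes):
    """Compare the per-host process total with the global maximum."""
    total = sum(processes_per_host.values())
    if max_processes > total:
        return (f"maximum ({max_processes}) exceeds host total ({total}): "
                "extra processes may be spawned on some machines")
    if max_processes < total:
        return (f"maximum ({max_processes}) is below host total ({total}): "
                "some machines may run fewer processes than listed")
    return "consistent"

# Hypothetical schedule: two dual-processor machines.
print(check_schedule({"nodeA": 2, "nodeB": 2}, max_processes=4))
```

Setting the maximum equal to the sum of the per-host values, as recommended above, is the only case reported as consistent.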
12.6.6 Using GOLD with your own PVM Installation
In some circumstances, users may prefer to run parallel GOLD using a pre-existing installation
of PVM rather than the version packaged within the UNIX GOLD installer. However, this can
cause difficulties since the parallel components of GOLD are compiled against the version of
PVM packaged with GOLD using specific compiler flags.
If the user's version of PVM is significantly different, parallel GOLD may not function correctly
in its default configuration. The solution is for the user to recompile the PVM parts of GOLD
on their system. For this reason, the UNIX GOLD distribution is packaged with a tar-gzip patch
file that rebuilds the PVM part of GOLD on the user's system. It also recompiles the front end
and the PVM shared object used in the main GOLD process.
If you would like to try recompiling the parallel components of GOLD on your own system, you
will find the required patch file here:
$GOLD_DIR/gold_pvm_patch.tar.gz
Please unpack this file and consult the ReadMe file for further details.
14.10.2 Comparison of Docking Solutions
Following the completion of all docking runs on a ligand, the results from the different runs are
compared in the ligand log file.
The file will include a matrix of rms deviations between the various docked ligand positions.
The rms deviation algorithm takes account of symmetry effects, using a graph isomorphism
algorithm. For example:
The progress of each docking run (see Section 14.10.1, page 119).
A comparison of the various docking solutions found (see Section 14.10.2, page 120).
Clustering of ligand poses, for identification of solutions with different binding modes (see
Section 14.10.3, page 122).
You can choose not to save ligand log files if you prefer (see Section 14.1, page 109).
14.10.1 Information on the Progress of Docking Runs
As each docking run is performed on a ligand, the progress of the genetic algorithm is recorded
in the ligand log file.
The best (most fit) individual at any time is listed. The total fitness and its component terms are
also displayed.
For GoldScore, the internal vdw energy includes the ligand torsional energy. The external vdw
energy is normally scaled by a factor of 1.375 and summed with the other components to give
the total fitness (this is to encourage hydrophobic contact between the protein and ligand).
During a docking run, the fitness score may appear to get worse as the docking proceeds. This is
due to the fact that the effects of poor H-bond geometry and close nonbonded contacts are
artificially down-weighted at early stages of the docking (annealing). Only the final fitness score
(i.e. from the completed docking) has any meaning.
The message Reordering... refers to a re-ranking of the GA populations caused by the annealing
process.
At the end of the GA run, the solution is output and summarised.
Here is an example output:
13. Rescoring
Different scoring functions may perform better for selected cases. You may find, for example,
that ChemScore outperforms GoldScore in ranking actives for one protein class, whereas the
reverse will apply for other classes.
Therefore, when screening large numbers of compounds, rescoring docking poses with
alternative scoring functions and considering the best results from each (consensus scoring) can
have a favourable impact on the overall rank ordering of ligands.
13.1 Rescoring Overview (see page 106)
13.2 Setting Up a Rescoring Run (see page 106)
13.1 Rescoring Overview
It is possible to rescore a single ligand or a set of ligands in one or more files.
Typically, a user will rescore GOLD solution files with an alternative scoring function.
However, it is also possible to score a known ligand pose from an alternative source (for
example, from a known crystal structure or a solution from another docking program).
Note: when rescoring a pose from a source other than a GOLD solution file it will not be possible to use
the optimised positions of polar protein hydrogen atoms (see Section 13.2, page 106).
Rescoring, like docking, requires a prepared protein input file and a fully defined binding site
(preferably the same definition that was used for the original docking). The ligand file, scoring
function and output preferences must also all be specified (see Section 13.2, page 106).
GOLD can perform a local optimisation of the ligand conformation that is to be rescored. This is
important because even a slight adjustment of the pose (via a simple minimisation in an
appropriate force field) can greatly increase the fitness score.
When rescoring a GOLD solution file it is possible to use the positions of the rotatable protein
hydrogens that were generated during the original docking as a starting point for the
minimisation. If these are not available then the default hydrogen atom positions specified in
the protein input file will be used.
Rescored solution files can be written out that will contain the new scoring function terms and
can be used with SILVER (see Section 13.2, page 106).
It is not possible to use the rescore feature if GOLD is being run in parallel (see Section 12.6,
page 101).
13.2 Setting Up a Rescoring Run
Rescoring requires essentially the same information as a normal docking run. You will therefore
need to:
Provide a prepared protein input file (see Section 3.10, page 29).
Define the binding site (preferably the same definition that was used for the original docking),
i.e. you must specify the approximate centre and extent of the binding site (see Section 13.1,
page 106).
Use the ligand selection dialog to specify the ligand file you wish to rescore.
Note: When the Rescore check-box is switched on, the ligand selection dialog will contain an
additional option. Hit the Add all solutions in directory button to automatically add all GOLD
solution files (i.e. all files named gold_soln_*) in the specified directory to the Current
Ligand File Selection.
Specify the fitness function to be used for the rescoring (see Section 6.1, page 46).
Switch on the Rescore check-box in the Fitness Function Settings section of the GOLD front-
end. To specify the settings to be used for the rescoring run hit the Options button. This will open
the Rescoring Settings dialog:
The following Calculation Options are available:
Perform local optimisation (simplexing)
Enable this check-box to minimise the docked ligand pose before rescoring. Simplexing is
important if you are to obtain meaningful scores: because of the nature of scoring functions,
small changes in the location or conformation of the pose can have large effects on the
calculated score. Note: simplexing can also affect rotatable protein hydrogen atoms (see
Section 14.6, page 115).
Retrieve rotatable H positions from file if available
When rescoring a GOLD solution file it is possible to use the optimised positions of the polar
protein hydrogen atoms that were generated during the original docking (see Section 14.6,
page 115). If this option is not switched on (or no rotatable H positions are available) then the
default hydrogen atom positions specified in the protein input file will be used.
For example, if concatenated_output = Myfile.mol2 the log file will be named
Myfile.rescore.log.
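The naming rule can be summarised in a short Python sketch. It assumes that the single example above generalises, i.e. that the extension of the concatenated_output value is replaced by .rescore.log, and that rescore.log is used when no alternative filename is given; the function name rescore_log_name is hypothetical.

```python
import os

def rescore_log_name(concatenated_output=None):
    """Derive the rescore log filename from the concatenated_output setting."""
    if concatenated_output is None:
        # No alternative filename specified: default log name.
        return "rescore.log"
    stem, _extension = os.path.splitext(concatenated_output)
    return stem + ".rescore.log"

print(rescore_log_name("Myfile.mol2"))
```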
For each rescored ligand a total fitness score and the component scoring terms are listed.
Status gives an indication of whether or not there were any errors during the rescoring run.
Simplex indicates whether or not a locally optimised ligand pose was used for the rescoring. 1
indicates that the minimised pose was used, 0 indicates that the minimised pose was not used
and - indicates that simplexing was not switched on (see Section 13.2, page 106).
Note: When Perform local optimisation (simplexing) is switched on the minimised conformation
will only be used for the rescoring if this results in an improvement to the fitness score.
When a minimised ligand pose is used for the rescoring an RMSd measure is given of the final
minimised orientation with respect to the input ligand conformation.
The example file below was generated by rescoring the best solution found (m2) for the second
ligand in the solution file results.mol2:
14.9 Protein Log File
The protein log file gold_protein.log details the parameterisation of the protein and the
determination of the binding site.
The cavity volume, as determined by the cavity detection algorithm, can also be output to the
gold_protein.log file (see Section 3.8, page 24).
The file is line buffered, so you can see how the algorithm is progressing even when GOLD is
run in the background.
14.10 Ligand Log File
The progress of each genetic algorithm run is listed in the ligand log file
gold_<ligand_file_name>_m#.log. Here, m# is an index to the number of the ligand
in the input file, e.g. m3 indicates that the log file refers to the third ligand in the input ligand file
(remember that an input file may contain more than one ligand).
The log files are line buffered, so you can see how the algorithm is progressing even when
GOLD is run in the background.
The parallel version of GOLD creates several temporary log files for each ligand, named
gold_soln_<ligand_file_name>_m#_<N>.log where <N> is a docking-run number.
Once all the docking runs for the ligand have been completed, these files are concatenated
together into the single log file gold_soln_<ligand_file_name>_m#.log.
The ligand log file contains information on:
14.8 Files Containing the Results of Rescoring
GOLD writes two types of file which contain the results of a rescoring run:
A structure file containing the docked ligand pose after rescoring (see Section 14.8.1, page 117)
A log file containing the scoring function terms obtained for the rescoring run (see Section
14.8.2, page 117)
14.8.1 Rescore Solution File
A file containing the docked ligand solution(s) after rescoring can be written. You can control
whether or not this file is written from within the Rescoring Settings window (see Section 13.2,
page 106).
If specified, solutions will be written with the default filename rescore.mol2 (MOL2 or SD
output can be selected (see Section 14.5, page 112)). To specify an alternative filename (for both
the rescore solution and log files), add the following line to the gold.conf file:
concatenated_output = <filename.mol2>
For example, if concatenated_output = Myfile.mol2 the rescore mol2 file will be
named Myfile.mol2.
Solution files will contain the new scoring function terms and the positions of rotatable protein
hydrogen atoms generated during rescoring (see Section 13.2, page 106).
A full description of the additional tags written to solution output files is available in Appendix
B: Additional Tags in Output Files (see page 151).
14.8.2 Rescore Log File
The rescore log file rescore.log summarises the outcome of the rescoring run. To specify an
alternative filename (for both the rescore solution and log files), add the following line to the
gold.conf file:
concatenated_output = <filename.mol2>
The following Output options are available:
Write structures to file for SILVER
Enable this check-box to write out docked ligand solutions after rescoring. By default, solutions
are written to the file rescore.mol2; an alternative filename can be specified (see Section 14.8.1,
page 117), and either MOL2 or SD output can be selected (see Section 14.5, page 112). Solution files
will contain the new scoring function terms and can be used with SILVER.
Note: If writing of this file is switched off, only the rescore.log file will be written (see
Section 14.8, page 117).
Replace relevant tags in file
When rescoring a GOLD solution file enable this check-box to overwrite the list of active
residues and the rotated protein hydrogen atom positions generated during the original
docking with those resulting from the rescoring run. If you select not to replace relevant tags
then rescore.mol2 will contain both the binding site definition of the original docking
and that of the subsequent rescoring run.
Hit Done to close the Rescoring Settings dialog and start the GOLD job in the usual way (see
Section 12., page 99).
Output that is written to the rescore.log file is also displayed in the GOLD Output window.
Note: To specify an alternative rescore log filename, see Section 14.8.2, page 117.
14. Output Options
14.1 Controlling the Amount of Output (see page 109)
14.2 Controlling the Information Written to Output Files (see page 111)
14.3 Specifying Directories for Output Files (see page 112)
14.4 Files Containing the Initialised Protein and Ligand (see page 112)
14.5 Files Containing the Docked Ligand(s) (see page 112)
14.6 Files Containing Protein Binding-Site Geometry (see page 115)
14.7 Files Containing Fitness Function Rankings (see page 115)
14.8 Files Containing the Results of Rescoring (see page 117)
14.9 Protein Log File (see page 118)
14.10 Ligand Log File (see page 118)
14.11 File Containing Error Messages (see page 124)
14.12 Process File (see page 124)
14.13 Viewing Docked Solutions in SILVER (see page 124)
14.14 Exporting Fitness-Function Data to SILVER (see page 125)
14.1 Controlling the Amount of Output
GOLD can produce a lot of output and you may wish to cut it down.
To do this, hit the Output... button in the GOLD front end to open the Output Preferences
window.
Use the File and Format Options to specify whether you want files listing fitness-function
rankings (see Section 14.7, page 115), ligand log files (see Section 14.10, page 118), and/or links
for different binding modes (see Section 14.10.3, page 122). For example, the settings below
will produce log files but not ranking files or links for different binding modes:
Use the Selecting Docked Solutions options to specify whether you want to save:
All docking solutions:
gold_soln_ligand_file_m5_8.mol2, which is symbolically linked to
ranked_ligand_file_m5_2.mol2, since it is the second best of the docking attempts for
this molecule:
You can choose not to save ligand rnk files if you prefer (see Section 14.1, page 109).
14.7.2 File Containing Ranked Fitness Scores for a Set of Ligands
A file called bestranking.lst is written for batch jobs on multiple ligands. This gives a
continuous summary of the best solution that has been obtained for each completed ligand.
To specify an alternative filename, add the following line to the gold.conf file:
bestranking_list_name = <filename.lst>
The file gives total fitness scores and a breakdown of the fitness into its constituent energy
terms. For GoldScore, these are the two vdw energy terms (protein-ligand and internal ligand),
an internal ligand torsion term, and two hydrogen-bonding terms (protein-ligand and ligand
intramolecular). The external vdw term is scaled by a factor of 1.375 in constructing the total
fitness score (this is an empirical correction to encourage protein-ligand hydrophobic contact).
Note: by default the file will contain a single internal energy term S(int) which is the sum of the
internal torsion and internal vdw terms (see Section 6.2, page 46).
The example file below was generated from a ligand input file containing 5 ligands. The listed
file names correspond to the names of the files containing the best solution found for each
ligand, e.g. gold_soln_ligs_m1_3.mol2 contains the best answer found for the first ligand in the
input file.
N-phosphonacetyl-L-aspartate
the line SET_UNIQUE_SOLN_TITLES = 0 in the gold.params file should be changed to read
SET_UNIQUE_SOLN_TITLES = 1.
A description of the various other tags available can be found in Appendix B: Additional Tags
in Output Files (see page 151).
14.6 Files Containing Protein Binding-Site Geometry
During docking, GOLD will keep the protein geometry fixed except that it will optimise
hydrogen-bond geometries by rotating groups such as serine OH and lysine NH3. This means
that the coordinates of polar hydrogen atoms such as these will change.
Files can be written out that contain the conformation of the cavity residues around the docked
ligand (and, specifically, the optimised positions of the protein H-bonding hydrogen atoms) for
each docking. To do this, you need to edit the gold.params file and add the command
SAVE_CAVITY = 1.
The optimised positions of polar protein hydrogen atoms that are generated during docking can
also be written to the docked solution file. This information can be written to SD file tags; for
MOL2 files, these tags are written to comment blocks (see Section 14.2, page 111).
14.7 Files Containing Fitness Function Rankings
GOLD writes two types of file which summarise the fitness-function scores of docked ligands:
One pertains to an individual ligand (see Section 14.7.1, page 115).
The other pertains to a set of ligands (see Section 14.7.2, page 116).
14.7.1 File Containing Ranked Fitness Scores for an Individual Ligand
A file called <ligand_file_name>_m#.rnk is written for each ligand (m# refers to the
position of the ligand in the input file - remember that a given ligand input file may contain more
than one ligand). This file contains a summary of the fitness scores for all the docking attempts
on that ligand. The docking attempts are listed in decreasing order of fitness score, so the best
solution is placed first.
The file gives total fitness scores and a breakdown of the fitness into its constituent energy
terms. For GoldScore, these are the two vdw energy terms (protein-ligand and internal ligand),
an internal ligand torsion term, and two hydrogen-bonding terms (protein-ligand and ligand
intramolecular). The external vdw term is scaled by a factor of 1.375 in constructing the total
fitness score (this is an empirical correction to encourage protein-ligand hydrophobic contact).
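The weighting described above can be illustrated with a short Python sketch. It assumes that each component is already expressed as a fitness contribution, so that the terms are simply added; the function name goldscore_total and the numerical values are hypothetical, and only the 1.375 scale factor comes from the text.

```python
EXTERNAL_VDW_SCALE = 1.375  # empirical factor quoted in the text

def goldscore_total(hb_ext, vdw_ext, hb_int, vdw_int, tors_int):
    """Sum the GoldScore components, scaling only the external vdw term."""
    return hb_ext + EXTERNAL_VDW_SCALE * vdw_ext + hb_int + vdw_int + tors_int

# Hypothetical component values, for illustration only.
print(goldscore_total(hb_ext=2.0, vdw_ext=40.0, hb_int=0.0,
                      vdw_int=-5.0, tors_int=-1.5))
```

Because only the protein-ligand vdw term is scaled, two poses with the same unscaled sum of components can still receive different total fitness scores if their hydrophobic contact differs.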
The example file below corresponds to the fifth ligand in the input file ligand_file.mol2
and is therefore called ligand_file_m5.rnk. The solution Mol No 8 corresponds to the file
or just the n best solutions for each ligand, where n is a user-specified number (e.g. n = 5 in
the screenshot below):
or just the top solution, and for only those m ligands with the best fitness scores, where m is
user specified (e.g. m = 100 in the example below):
In addition, you can filter out all solutions with fitness scores lower than a specified value by
switching on the button labelled Reject solutions with fitness lower than and typing in the
required value. For example, the settings below will save a maximum of 3 solutions for each
ligand and will not keep any solution with a fitness lower than 50:
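The combined effect of these two settings can be sketched in Python as follows. This illustrates the selection logic only and is not GOLD code; the function name select_solutions and the fitness values are hypothetical.

```python
def select_solutions(fitness_by_run, n_best=None, min_fitness=None):
    """Return docking-run numbers to keep, best fitness first."""
    # Rank all docking attempts by decreasing fitness score.
    ranked = sorted(fitness_by_run.items(), key=lambda item: item[1], reverse=True)
    # Reject solutions with fitness lower than the threshold, if one is set.
    if min_fitness is not None:
        ranked = [(run, score) for run, score in ranked if score >= min_fitness]
    # Keep at most the n best of the remaining solutions, if n is set.
    if n_best is not None:
        ranked = ranked[:n_best]
    return [run for run, _score in ranked]

# Hypothetical fitness scores for five docking runs on one ligand.
scores = {1: 62.1, 2: 48.0, 3: 55.3, 4: 70.2, 5: 51.0}
print(select_solutions(scores, n_best=3, min_fitness=50))
```

With these values, runs 4, 1 and 3 are kept: run 2 falls below the threshold and run 5, although above it, is not among the three best.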
14.2 Controlling the Information Written to Output Files
It is possible to write additional information to docked solution files. This information is written
to SD file tags; for MOL2 files, these tags are written to comment blocks.
For post-processing docking results with SILVER it is particularly important that the scoring
function terms and the rotated protein hydrogen atom positions are saved.
Hit the Output... button in the GOLD front end to open the Output Preferences window. Use the
Information in File options to control what information is written to docked ligand files (see
Section 14.5, page 112).
The following options are available:
Save lone pairs in files
Some 3rd-party programs have difficulty reading files which contain lone pairs. You can stop
GOLD including lone pairs when it writes docked solution files by switching off this check-
box.
Save rotated hydrogens in file
SILVER uses the optimised positions of polar protein hydrogen atoms that are generated
during docking (these will usually be different for each docked ligand pose). Enable this
check-box to save the positions of rotated protein hydrogen atoms to docked solution files.
Save score in output file
Enable this check-box if you want the docked solution files to include the docking-score
terms, i.e. the total GoldScore or ChemScore value for each docking, and its components such
as protein-ligand H-bond energy, internal ligand strain energy, etc.
Output weighted SF terms
Certain docking scoring function terms are the product of a term dependent on the magnitude
Output files for the docked ligand(s) may also contain additional information such as the scoring
function terms and the rotated protein hydrogen atom positions specific to that solution.
This information can be written to SD file tags; for MOL2 files, these tags are written to
comment blocks. It is possible to control the information written to solution files from the
Output Preferences window (see Section 14.2, page 111).
Solution file title strings take the form
<file_basename>|<p>|[cov<r>|]dock<q>
where
<file_basename> is the base name of the ligand input file
<p> is the molecule number in the file
<q> is the number of the docking
<r> is the covalent attachment atom. This part is only printed for covalent dockings.
For example (mol2 file):
ligand|mol2|1|dock4
where the ligand filename is ligand.mol2, the structure is number 1 in the molecule input
file, and the solution is from the fourth docking (dock4). The format for the output of the
equivalent sd input file would be the following:
ligand|sd|1|dock4
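For post-processing, a title string of this form can be split back into its parts. The Python sketch below is based on the two examples above, in which the file type appears as a separate field between the base name and the molecule number; that field layout, the function name parse_solution_title and the dictionary keys are assumptions, and the covalent attachment atom is kept as text because its exact format is not specified here.

```python
def parse_solution_title(title):
    """Split a GOLD solution title string into its component fields."""
    fields = title.split("|")
    if len(fields) < 3 or not fields[-1].startswith("dock"):
        raise ValueError(f"unrecognised title string: {title!r}")

    parsed = {
        "file_basename": fields[0],
        "docking": int(fields[-1][len("dock"):]),
    }

    middle = fields[1:-1]
    # Optional covalent attachment field, printed only for covalent dockings.
    if middle and middle[-1].startswith("cov"):
        parsed["covalent_atom"] = middle.pop()[len("cov"):]
    if not middle:
        raise ValueError(f"no molecule number in title string: {title!r}")

    parsed["molecule"] = int(middle[-1])
    # Assumed: an extra leading field, when present, is the input file type.
    if len(middle) == 2:
        parsed["file_type"] = middle[0]
    return parsed

print(parse_solution_title("ligand|mol2|1|dock4"))
```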
To revert to the historic output i.e. to output only the structure name e.g.
Each ligand will normally be docked several times, so a given input ligand will produce a set of
files, each containing the results of a separate docking attempt.
Suppose that the original ligand file is structure.mol2 (this can contain more than one
ligand, in which case each will be docked). As the GOLD job progresses, the result of each
docking attempt is written out as gold_soln_structure_m#_n.mol2, where n is the
solution number 1,2,3 ... and m# is the number of the ligand, i.e. m1 for the first ligand, m2 for
the second, etc.
Note that the file gold_soln_structure_m1_1.mol2 is not the best GOLD prediction, it
is just the solution found in the first docking attempt. However, as GOLD proceeds, symbolic
links are created: ranked_structure_m#_1.mol2 will always point to the current top-
ranked solution, ranked_structure_m#_2.mol2 will point to the second-best solution,
and so on.
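When processing results with a script, the ligand index and docking-run number can be recovered from a solution filename. The Python sketch below assumes MOL2 output and the naming pattern described above; the function name parse_solution_name is hypothetical.

```python
import re

# Pattern for gold_soln_<ligand_file_name>_m<ligand index>_<run number>.mol2
SOLUTION_NAME = re.compile(
    r"^gold_soln_(?P<base>.+)_m(?P<ligand>\d+)_(?P<run>\d+)\.mol2$"
)

def parse_solution_name(filename):
    """Return (base name, ligand index, docking run), or None if no match."""
    match = SOLUTION_NAME.match(filename)
    if match is None:
        return None
    return match.group("base"), int(match.group("ligand")), int(match.group("run"))

print(parse_solution_name("gold_soln_structure_m1_1.mol2"))
```

Because the ranked_* files are symbolic links, resolving one with os.path.realpath and passing the base name of the result to this function should identify which docking attempt currently holds that rank.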
Alternatively, you can specify that all saved docking solutions for all ligands are to be
concatenated and written to a single file. To do this, open the Output Preferences dialogue by
hitting the Output... button in the GOLD front end. Then, switch on the Save solutions to one file
check-box, hit the Solutions file name button, and specify the required file name in the resulting
pop-up, e.g.
of a particular physical contribution (e.g. hydrogen bonding) and a scale factor determined
e.g. by a regression coefficient. The docking scoring function terms included in the output file
can therefore consist of weighted terms, non-weighted terms or both. To include weighted
terms enable this check-box.
Output non-weighted SF terms
Enable this check-box to include non-weighted scoring function terms in the output file.
No SD-style tags in mol2 files
Enable this check-box to prevent SD-style tags being written to comment blocks in MOL2
solution files.
14.3 Specifying Directories for Output Files
Hit the Output... button in the GOLD front end to open the Output Preferences window.
Use the Output directory... entry box to specify the directory to which output files will be
written.
When more than one ligand is being docked, switch on the Create output sub-directories check
box if you want results for each ligand to be written to a separate sub-directory.
14.4 Files Containing the Initialised Protein and Ligand
GOLD produces the following output files:
gold_ligand.mol2 is the original ligand datafile with lone pairs added and the sets
DONOR_HYDROGENS and LONE_PAIRS defined.
gold_protein.mol2 is the original protein datafile with lone pairs added to binding site
atoms and the sets DONOR_HYDROGENS and LONE_PAIRS defined. The binding site is
defined in the set CAVITY_ATOMS.
Note: these set-definitions in the gold_protein.mol2 file are only accessible (i.e. visible)
through SYBYL.
14.5 Files Containing the Docked Ligand(s)
By default, docked ligands will be written out in the same format as was used for input. To
change this, hit the Output... button in the GOLD front end to open the Output Preferences
window. Then use the File and Format Options to specify the required output format. For
example: