
PROTEINS: Structure, Function, and Genetics 37:404–416 (1999)
Validation of Nuclear Magnetic Resonance Structures of Proteins and Nucleic Acids: Hydrogen Geometry and Nomenclature
Jurgen F. Doreleijers,1 Gerrit Vriend,2 Mia L. Raves,1 and Robert Kaptein1*
1Bijvoet Center for Biomolecular Research, Utrecht University, The Netherlands
2European Molecular Biology Laboratory, Heidelberg, Germany
ABSTRACT
A statistical analysis is reported of
1,200 of the 1,404 nuclear magnetic resonance (NMR)-derived protein and nucleic acid structures deposited in the Protein Data Bank (PDB) before 1999.
Excluded from this analysis were the entries not yet
fully validated by the PDB and the more than 100
entries that contained <95% of the expected hydrogens. The aim was to assess the geometry of the
hydrogens in the remaining structures and to provide a check on their nomenclature. Deviations in
bond lengths, bond angles, improper dihedral angles,
and planarity with respect to estimated values were
checked. More than 100 entries showed anomalous
protonation states for some of their amino acids.
Approximately 250,000 (1.7%) atom names differed
from the consensus PDB nomenclature. Most of the
inconsistencies are due to swapped prochiral labeling. Large deviations from the expected geometry
exist for a considerable number of entries, many of
which are average structures. The most common
causes for these deviations seem to be poor minimization of average structures and an improper balance between force-field constraints for experimental and holonomic data. Some specific geometric
outliers are related to the refinement programs
used. A number of recommendations for biomolecular databases, modeling programs, and authors submitting biomolecular structures are given. Proteins
1999;37:404–416. © 1999 Wiley-Liss, Inc.
Key words: proteins; hydrogens; NMR; nucleic acids; PDB; stereochemistry; validation
INTRODUCTION
Biomolecular structures are at the basis of many studies
in a number of research fields such as drug design and
functional genomics. This research critically depends on
the quality of the coordinates. In the course of our work on
the validation of macromolecular structures1–3 we endeavored to assess the geometrical aspects of hydrogens of all
protein and nucleic acid coordinates that were solved with
nuclear magnetic resonance (NMR) and deposited in the
Protein Data Bank (PDB) before December 24, 1998.4,5
Protons provide the most important information for solution structure determination by NMR spectroscopy. Hence
the correct naming of hydrogen atoms and their local
geometry are of paramount importance for solving, refining, and comparing NMR-derived structures.
We recently analyzed the experimental data and the
coordinates of 97 NMR-solved proteins,1 and the software
used, called AQUA, is available from the world wide web
(WWW).6,7 Unfortunately, experimental data are only available for about one third of all NMR-related PDB files,
which precludes a rigorous validation of all NMR structures against their experimental data. Here, we concentrate on a series of nomenclature and geometrical checks
that are independent of the availability of experimental data.
The calculations were performed by using new routines
implemented in the WHAT IF program,8 which can be used
as WWW servers at http://swift.embl-heidelberg.de/servers2. The data underlying this study can be found at
http://swift.embl-heidelberg.de/service/counting/nmr.
In this study the nomenclature recommended by the
IUPAC by Markley et al.9,10 is used. Many software
packages use proprietary nomenclature rules, and also the
PDB does not adhere to the IUPAC rules. The new
nomenclature replaces the older IUPAC-IUB recommendations11 for peptides and proteins that were never widely
adopted by the protein structure community. This is
probably because no software that uses this nomenclature
was available. The 1998 recommendations detail the names
for hydrogens, whereas the nomenclature for the heavier
atoms has been described earlier.12,13 The nomenclature of
heavy atoms in proteins is routinely checked by programs
such as PROCHECK14 and PROCHECK_NMR.7 WHAT IF
can also check the recently recommended nomenclature of
hydrogens in proteins and nucleic acids.
Thorough studies of the geometrical aspects of heavy
(nonhydrogen) atoms in proteins and nucleic acids are
available in the literature. Engh and Huber15 and Parkinson et al.16 derived ideal bond lengths and bond angles
Abbreviations: CSD, Cambridge Structural Database; IUPAC, International Union of Pure and Applied Chemistry; IUB, International
Union of Biochemistry; NMR, nuclear magnetic resonance; NOE, nuclear Overhauser enhancement; PDB, Protein Data Bank; rms, root mean
square; WWW, world wide web.
Grant sponsor: BIOTECH program of DGXII of the Commission of
the European Union; Grant number: BIO4-CT960189.
*Correspondence to: Robert Kaptein, Bijvoet Center for Biomolecular Research, Utrecht University, Padualaan 8, 3584 CH Utrecht, The
Netherlands. E-mail: kaptein@nmr.chem.uu.nl
Received 22 February 1999; Accepted 28 June 1999
from a study of small-molecule crystallographic data in the
Cambridge Structural Database (CSD). The protein values
were recently confirmed by using data from eight protein
structures that were solved at atomic resolution.3,14,17
Hooft et al.18 performed a similar study to determine
standard deviations from planarity. All these studies were
based on small-molecule structures solved at very high
resolution, and the parameters obtained provide little
room for discussion. The major problem in obtaining ideal
hydrogen bond lengths, bond angles, etc., is that even in
high-resolution X-ray structures, hydrogens are not yet
observed with sufficient accuracy. Only neutron diffraction
studies can provide this information, but not many of these
studies have been performed on peptides or amino acids.
Force fields used for NMR structure determination show
large differences for some parameters. For example, differences up to 30% in bond length of the sulfhydryl group can
be observed between the AMBER force field19 and the
parallhdg.pro force field of X-PLOR.20
For our analyses, a set of reference values and standard
deviations for hydrogen bond lengths, bond angles, and improper dihedral angles was estimated. The observed differences from these values will be expressed by a root mean
square (rms) Z score, with the Z score being equal to the
number of standard deviations from the reference values.
Our interest lies not so much in the absolute Z scores but
more in the outliers and their possible relation to the refinement protocols used. This study shows that improvements
can be made in NMR structure determination protocols,
and our data allow us to make several recommendations.
MATERIALS AND METHODS
The discrepancies between PDB and IUPAC atom nomenclature could in most cases easily be taken into account by
a simple one-to-one mapping table (see Table I). For
instance, for a ribose, the PDB uses an asterisk, whereas
the IUPAC uses a prime (H2* versus H2′). It was equally
straightforward to convert atom designators, such as HB1
and HB2, into Hβ2 and Hβ3. Special software was written
within the WHAT IF program8 to ensure that all prochiral
groups are renamed according to IUPAC rules. Sometimes,
the hydrogen nomenclature was found to be mixed up, i.e.,
a valid atom designator was used, but the name did not
correspond to its location in the molecule (e.g., H2 bound
to C). In these cases, the orientation of the hydrogen
relative to the heavy atoms was used to determine the
proper atom designator.
Sporadically, atoms were observed with identical names
but different coordinates, or with different names but
identical coordinates. For some of these cases, dedicated
software was written to solve the nomenclature problems,
but because these problems occurred infrequently, not
much effort was made in this direction. Occasionally,
groups are attached to residues, e.g., N- or C-terminal
blocking groups, sugars, hydrogens on Asp, Glu, or the
N-terminus, etc. In those cases in which the atom names
did not permit automatic detection of the atom types of
these attached groups, the program recorded that something was bound and did not attempt to resolve the nature
of the attached group. In these cases the missing atoms
and the residues to which these groups are bound were not
included in the analyses. The 5′- and 3′-terminal phosphate groups in deoxyribonucleic acid (DNA) and ribonucleic acid (RNA) were also excluded from the analyses.
The present study does not consider the pseudo atoms that
were found in 50 PDB entries, although the WHAT IF
software is capable of detecting and handling them.
At present, insufficiently reliable experimental data are
available to derive a set of standard values for the hydrogen geometry in macromolecular structures. Thus, to
make meaningful comparisons, ideal values were estimated. Of course these parameters are somewhat arbitrary, but because the interest lay mainly in the relationship between refinement protocols and geometry, the
arbitrary nature of the parameters does not pose a serious
problem.
All hydrogen bond lengths were assumed to be 1.00 ±
0.05 Å, except for the sulfur-bound hydrogens that were
assumed to have a bond length of 1.30 ± 0.05 Å. For
hydrogens connected to sp2- and sp3-hybridized atoms the
expected bond angles were assumed to be 120.0 ± 5.0° and
109.5 ± 5.0°, respectively. The bond angles in five-membered aromatic rings were assumed to be 125.0 ±
5.0°. The deviation from ideal planarity for hydrogens is
expressed here as the rms distance to the plane defined by
the heavy atoms. The tetrahedral geometry of heavy atoms
with attached hydrogens in amino acids, as measured with
improper dihedral angles, was derived from amino acids
that were energy-minimized by using WHAT IF. The
estimated standard deviation was set to 5° for all improper
dihedral angles. Improper dihedral angles were calculated
as the torsion angle from the central atom over the first
two bound atoms to the third bound atom. To define first,
second, and third bound atoms, the atoms are sorted by
using the IUPAC priority rules. With use of this scheme,
the value of the improper dihedral angles falls between
−36° and −28° or 28° and 36° for ideal geometry. Only one
improper dihedral angle was used for each hydrogen-containing nonplanar group. Aliphatic groups with only
one hydrogen (Val-Cβ, Leu-Cγ, non-Gly-Cα, Ile-Cβ, and
Thr-Cβ) have not been checked because their impropers
are always defined without using the hydrogens. Atom
names were checked and corrected if necessary to ensure
correct prochiral labeling before the improper dihedral
angle was calculated.
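The improper dihedral described above can be illustrated with a minimal sketch. It assumes plain Cartesian coordinates and a conventional torsion formula; the function names (`torsion_deg`, `improper_deg`, `within_ideal_range`) are hypothetical and are not taken from WHAT IF, and the sign convention used by the actual program may differ.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(_cross(n1, n2), b2) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def improper_deg(central, first, second, third):
    """Torsion from the central atom over the first two bound atoms to the third."""
    return torsion_deg(central, first, second, third)

def within_ideal_range(angle, low=28.0, high=36.0):
    """True if the magnitude falls in the 28-36 degree window quoted in the text."""
    return low <= abs(angle) <= high

# Idealized tetrahedral center with three of its four substituents.
center = (0.0, 0.0, 0.0)
a, b, c = (1.0, 1.0, 1.0), (1.0, -1.0, -1.0), (-1.0, 1.0, -1.0)
ideal = improper_deg(center, a, b, c)
swapped = improper_deg(center, b, a, c)  # swapping two substituents flips the sign
```

Because swapping two bound atoms inverts the sign, the same quantity can flag a mislabeled prochiral pair as well as a distorted geometry, which is why the atom names have to be checked before the angle is evaluated.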
The parameters for hydrogen geometry that are used
here are only approximate and neglect features known to
exist. The effect of hydrogen bonding and differential
electronegativity of the heavy atom are just two effects
that modify the positions of hydrogen atoms but are not yet
incorporated in our validation software.
Normally outliers were defined by being more than four
standard deviations (4σ) from the mean, but for planarity
deviations, the cutoff was simply set at 0.25 Å. For many of
the normality analyses presented in this report, the rms Z
score is used. The Z score is the number of standard
deviations that an observation deviates from the mean.
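As an illustration of the outlier criterion, the following sketch computes a Z score against the reference values quoted in this section and applies the 4σ cutoff. The dictionary keys and function names are hypothetical labels chosen for this example; they do not correspond to identifiers in WHAT IF.

```python
# Reference values (mean, standard deviation) as quoted in the text.
# The keys are illustrative labels only.
REFERENCE = {
    "X-H bond": (1.00, 0.05),    # angstrom, hydrogens not bound to sulfur
    "S-H bond": (1.30, 0.05),    # angstrom, sulfur-bound hydrogens
    "sp2 angle": (120.0, 5.0),   # degrees
    "sp3 angle": (109.5, 5.0),   # degrees
}

PLANARITY_CUTOFF = 0.25  # angstrom, rms distance to the heavy-atom plane

def z_score(value, kind):
    """Number of standard deviations between value and the reference mean."""
    mean, sd = REFERENCE[kind]
    return (value - mean) / sd

def is_outlier(value, kind, cutoff=4.0):
    """Apply the 4-sigma criterion used for bond lengths and angles."""
    return abs(z_score(value, kind)) > cutoff

def is_planarity_outlier(rms_distance):
    """Planarity uses a fixed distance cutoff instead of a Z score."""
    return rms_distance > PLANARITY_CUTOFF

# A sulfur-bound hydrogen modeled at 1.00 angstrom lies about six standard
# deviations below the 1.30 angstrom reference value.
z_sh = z_score(1.00, "S-H bond")
```

With these settings a 0.30 Å shortfall on an S-H bond is flagged, whereas a C-H bond of 1.08 Å (Z of about 1.6) is not.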
TABLE I. Comparison of IUPAC and Consensus PDB Atom Nomenclatures
IUPAC
PDB

PDB Atom Nomenclatures (Continued)
Sta
Amino acid
terminal amine
H1
H2
H3
H1
H2
H3
1H
2H
3H
HN
HA
HB1
HB2
HB3
HB2
HB3
HG2
HG3
HD2
HD3
HE
HH11
HH12
HH21
HH22
HB2
HB3
HD21
HD22
HB2
HB3
HD2
HB2
HB3
HG
HB2
HB3
HG2
HG3
HE21
HE22
HB2
HB3
HG2
O
O2
HXT
H
HA
1HB
2HB
3HB
1HB
2HB
1HG
2HG
1HD
2HD
HE
1HH1
2HH1
1HH2
2HH2
1HB
2HB
2HD2
1HD2
1HB
2HB
HD2
1HB
2HB
HG
1HB
2HB
1HG
2HG
2HE2
1HE2
1HB
2HB
1HG
Glycine
Histidine
Terminal carboxyl
Alaninec
Arginine
Asparagine
d
d
Aspartate
Cysteine
Glutamine
d
d
Glutamate
O′
O″
H″
HN
H
H1
H2
H3
H2
H3
H2
H3
H2
H3
H
H11
H12
H21
H22
H2
H3
H21
H22
H2
H3
H2
H2
H3
H
H2
H3
H2
H3
H21
H22
H2
H3
H2
Isoleucine
pro-R
pro-S
pro-S
pro-R
pro-S
pro-R
ZZ
ZE
EZ
EE
pro-S
pro-R
E
Z
pro-R
pro-S
Lysine
pro-R
pro-S
pro-S
pro-R
Methionine
Leucine
pro-S
pro-R
pro-R
pro-S
pro-S
The Z score for a single observation, e.g., a hydrogen bond angle, with the individual value $x_{ij}$ for type i and model j is defined as:
$$Z_{ij} = \frac{x_{ij} - \mu_i}{\sigma_i}$$
where $\mu_i$ and $\sigma_i$ are the idealized average and standard deviation, respectively. The Z score can be positive or
negative, depending on whether the observed value is
larger or smaller than the average value, respectively.
Assuming a Gaussian distribution, an absolute Z score of
more than one and more than four should be found for 32
PDB
Sta
2HG
HE2
1HA
2HA
1HB
2HB
HD1
HD2
HE1
HE2
HB
1HG1
2HG1
1HG2
2HG2
3HG2
1HD1
2HD1
3HD1
1HB
2HB
HG
1HD1
2HD1
3HD1
1HD2
2HD2
3HD2
1HB
2HB
1HG
2HG
1HD
2HD
1HE
2HE
1HZ
2HZ
3HZ
1HB
2HB
1HG
2HG
1HE
pro-R
IUPAC
H3
H2
H2
H3
H2
H3
H1
H2
H1
H2
H
H12
H13
H21
H22
H23
H11
H12
H13
H2
H3
H
H11
H12
H13
H21
H22
H23
H2
H3
H2
H3
H2
H3
H2
H3
H1
H2
H3
H2
H3
H2
H3
H1
HG3
HE2
HA2
HA3
HB2
HB3
HD1
HD2
HE1
HE2
HB
HG12
HG13
HG21
HG22
HG23
HD11
HD12
HD13
HB2
HB3
HG
HD11
HD12
HD13
HD21
HD22
HD23
HB2
HB3
HG2
HG3
HD2
HD3
HE2
HE3
HZ1
HZ2
HZ3
HB2
HB3
HG2
HG3
HE1
pro-R
pro-S
pro-S
pro-R
pro-R
pro-S
pro-R
pro-S
pro-R
pro-R
pro-R
pro-S
pro-S
pro-S
pro-R
pro-S
pro-R
pro-S
pro-S
pro-R
pro-S
pro-R
pro-S
pro-R
pro-S
pro-R
and 0.01% of all observations, respectively. The rms Z score is defined as:
$$\mathrm{rms}\,Z = \sqrt{\frac{1}{n\,m}\sum_{i,j=1}^{n,m} Z_{ij}^{2}}$$
and for a distribution with the expected average and standard deviation the rms Z score is 1 by definition.
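The two statements above (the rms Z score and the expected Gaussian tail fractions) can be checked numerically with a short sketch. The function names are hypothetical; only the Python standard library is used.

```python
import math

def rms_z(z_scores):
    """Root mean square of a collection of Z scores."""
    values = list(z_scores)
    return math.sqrt(sum(z * z for z in values) / len(values))

def gaussian_tail_fraction(k):
    """Fraction of a Gaussian population with |Z| greater than k."""
    return math.erfc(k / math.sqrt(2.0))

# Tail fractions for |Z| > 1 and |Z| > 4, as fractions of all observations.
frac_1 = gaussian_tail_fraction(1.0)
frac_4 = gaussian_tail_fraction(4.0)

# Observations that all sit exactly one standard deviation from the mean
# give an rms Z score of 1.
unit_rms = rms_z([1.0, -1.0, 1.0, -1.0])
```

The two-sided tail for |Z| > 1 is close to 32%, and the tail for |Z| > 4 is well below 0.01%, so at the 4σ cutoff almost no outliers would be expected by chance even in a data set of this size.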
For all entries, the refinement programs were extracted
from the PDBFINDER database.21 If more than one keyword was given, the last one was used. Entries with the
last keyword CHARMM were, nevertheless, classified as
X-PLOR because they were almost all refined with the


IUPAC
Phenylalanine
Prolined
d
Serine
Threonine
Tryptophan
Tyrosine
Valine
All
PDB
H2
H3
H2
H3
H1
H2
H1
H2
H
H2
H3
H2
H3
H2
H3
H2
H3
HE2
HE3
HB2
HB3
HD1
HD2
HE1
HE2
HZ
H2
H3
HB2
HB3
HG2
HG3
HD2
HD3
2HE
3HE
1HB
2HB
HD1
HD2
HE1
HE2
HZ
H2
H1
1HB
2HB
1HG
2HG
1HD
2HD
H2
H3
H
H
H1
H21
H22
H23
H2
H3
H1
H1
H3
H3
H2
H2
H2
H3
H1
H2
H1
H2
H
H
H11
H12
H13
H21
H22
H23
HB2
HB3
HG
HB
HG1
HG21
HG22
HG23
HB2
HB3
HD1
HE1
HE3
HZ3
HH2
HZ2
HB2
HB3
HD1
HD2
HE1
HE2
HH
HB
HG11
HG12
HG13
HG21
HG22
HG23
1HB
2HB
HG
HB
HG1
1HG2
2HG2
3HG2
1HB
2HB
HD1
HE1
HE3
HZ3
HH2
HZ2
1HB
2HB
HD1
HD2
HE1
HE2
HH
HB
1HG1
2HG1
3HG1
1HG2
2HG2
3HG2
Sta
IUPAC
Sta
PDB
Nucleic acid
pro-R
pro-S
HO5′
HO3′
H5T
H3T
OP1
OP2
O1P
O2P
C1′
H1′
C2′
H2′
H2″
O2′
HO2′
C3′
H3′
O3′
C4′
O4′
H4′
C5′
O5′
H5′
H5″
C1*
H1*
C2*
1H2*
2H2*
O2*
2HO*
C3*
H3*
O3*
C4*
O4*
H4*
C5*
O5*
1H5*
2H5*
H2
H61
H62
H8
H41
H42
H5
H6
H1
H21
H22
H8
H3
H6
C7
H71
H72
H73
H3
H5
H6
H2
1H6
2H6
H8
1H4
2H4
H5
H6
H1
1H2
2H2
H8
H3
H6
C5M
1H5M
2H5M
3H5M
H3
H5
H6
Phosphate
e
e
pro-R
pro-S
β-D-2-(Deoxy)ribose
pro-R
pro-S
pro-R
pro-S
pro-S
pro-R
pro-S
pro-R
f
f
pro-S
pro-R
pro-R
pro-S
Purines and
adenine
e
e
Cytosinee
e
pro-R
pro-S
Guanosine
e
e
Thymine
Uracil
pro-S
pro-R
pro-S
pro-R
Z
E
Z
E
Z
E
All hydrogen and deviating heavy atom names are given. The backbone hydrogens of amino acids are only given for alanine. In cases in which
IUPAC names contain Greek or superscripted characters the name was repeated with Roman characters without superscripting.
aThe stereochemistry is indicated as a reference: the pro-R/pro-S nomenclature27,28 for prochiral tetrahedral groups and the Z/E nomenclature for
planar groups.29
bNot present in any of the studied structures.
cAccording to the IUPAC nomenclature, H is a valid atom designator, but HN is generally preferred by the NMR community for clarity.
dThe consensus PDB atom nomenclature deviates from the IUPAC nomenclature by interchange of the designators for the side-chain amide hydrogens of
Asn and Gln and the sec-amino hydrogens of N-terminal Pro.
eThe amino hydrogens and the phosphorous oxygen atoms in nucleic acids are labeled in the PDB naming scheme without clear stereochemical
preference.
fFor brevity, only RNA is included. In DNA sugars the 2′-hydroxyl group (O2′ and HO2′) should be replaced by a hydrogen (H2″).
TABLE II. Reasons for Excluding NMR Entries

Reason                                         No. of entries
All entries                                    1,404
Layer 1ᵃ                                       79
Lacking hydrogensᵇ                             109
<5 residuesᶜ                                   12
Poorly or not minimized average structuresᵈ    4
Remaining entries                              1,200

The list was compiled by using the 3DB software from the PDB on December 24, 1998.
ᵃLayer 1 entries had not been fully validated by the PDB to yield Layer 2 (normal) entries.
ᵇEntries with <95% of the expected hydrogens were excluded.
ᶜIf a model contained fewer than five of the common residues (20 amino acids and 5 nucleic acid residues), the entry was excluded.
ᵈFour entries (1BBA, 1COD, 1HDP, and 1NIL) were found to be inadequately minimized average structures as evidenced by an rms Z score of the heavy-atom bond angles >7.
X-PLOR program (present as another keyword) and presented no significant differences with the other X-PLOR
entries. The five most commonly used refinement programs for NMR structures are X-PLOR,20 DISCOVER
(Molecular Simulations, San Diego, CA), AMBER,19
DIANA,22 and DGII.23 For some entries, the refinement
program was unavailable in the PDBFINDER database
because the PDB headers do not always contain this information, or the program was not scored as one of the keywords,
as is the case for the entries refined with DYANA. These
entries, along with those refined with less common refinement programs, were classified in a separate set.
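The classification rule described above (use the last keyword, treat CHARMM as X-PLOR, and group everything else separately) might be sketched as follows. The function name, the `"OTHER"` label, and the representation of keywords as a list of strings are assumptions made for illustration; the actual PDBFINDER record layout is not reproduced here.

```python
# The five most commonly used refinement programs named in the text.
COMMON_PROGRAMS = {"X-PLOR", "DISCOVER", "AMBER", "DIANA", "DGII"}

def classify_refinement(keywords):
    """Return the program class for one entry from its ordered keyword list."""
    if not keywords:
        return "OTHER"           # program not available for this entry
    last = keywords[-1].strip().upper()
    if last == "CHARMM":
        return "X-PLOR"          # CHARMM entries were classified as X-PLOR
    if last in COMMON_PROGRAMS:
        return last
    return "OTHER"               # less common or unscored programs, e.g. DYANA
```

For example, an entry whose keywords are `["X-PLOR", "CHARMM"]` ends up in the X-PLOR set, while an entry without any scored keyword falls into the separate set.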
RESULTS AND DISCUSSION
Test Set of Protein and Nucleic Acid Structures
We tried to be as inclusive as possible in our test set of
protein and nucleic acid structures but had to reject 204
of the 1,404 entries from further analysis (see Table II).
Excluded were entries that were only partly validated by
the PDB (layer 1 release), entries with too many missing
hydrogens, entries containing inadequately minimized
structures, and those with fewer than 5 normal residues.
Of the remaining 1,200 entries, the 820 multimodel entries
consisted on average of 19 ± 10 models with a maximum of
80 models in one ensemble. Each model contains on
average 84 ± 55 amino acids (1,010 entries containing
amino acids) or 21 ± 8 nucleotides (234 nucleic acid-containing entries) and 2 ± 3 residues that differed from
the 20 common amino acids or 5 nucleotides (179 entries).
These nonstandard residues are not discussed here. Each
protein or nucleic acid model contains on average 1,230 ±
847 atoms. The total number of atoms in the data set was
just over 20 million, of which approximately half were
hydrogens. In the following paragraphs the results of the
individual checks are discussed.
Missing hydrogens
Missing hydrogens present a serious problem, especially
when the local geometry deviates significantly from expected values, so that it is difficult to recalculate their
positions. The IUPAC NMR task group recommended that
the coordinates of all atoms, expected to be present in the
structure, should be submitted to the archives of the data
banks.9,10 For 109 entries, >5% of the expected hydrogens
were absent. From this set of 109 entries, 64 had no
hydrogens at all (3 entries consisted of only Cα coordinates), 21 entries contained only polar hydrogens (e.g.,
structures refined with GROMOS), and 24 entries had
between 0 and 95% of the expected hydrogens present
(e.g., structures that contained only one hydrogen on each
methylene carbon). These entries have been excluded from
the analyses as shown in Table II.
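The 95% completeness criterion can be sketched as below. The expected hydrogen counts cover only three in-chain residue types and are included purely for illustration; a real check would need counts for every residue type, for terminal residues, and for nucleotides. All names are hypothetical.

```python
# Expected hydrogens for a few in-chain (non-terminal) residues; an
# illustrative subset only.
EXPECTED_HYDROGENS = {
    "GLY": 3,  # H, HA2, HA3
    "ALA": 5,  # H, HA, HB1, HB2, HB3
    "SER": 5,  # H, HA, HB2, HB3, HG
}

def hydrogen_completeness(residues):
    """Fraction of expected hydrogens present.

    residues is a list of (residue_name, hydrogens_present) pairs.
    """
    expected = sum(EXPECTED_HYDROGENS[name] for name, _ in residues)
    present = sum(min(count, EXPECTED_HYDROGENS[name]) for name, count in residues)
    return present / expected

def is_excluded(residues, threshold=0.95):
    """Entries with fewer than 95% of the expected hydrogens are excluded."""
    return hydrogen_completeness(residues) < threshold

complete_model = [("GLY", 3), ("ALA", 5), ("SER", 5)]
polar_only_model = [("GLY", 1), ("ALA", 1), ("SER", 2)]  # amide H and Ser HG only
```

A polar-hydrogen-only model such as the second example retains 4 of 13 expected hydrogens and is therefore rejected.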
Average structures
The PDB contains 380 single-model NMR entries, 342 of
which are energy-minimized average structures. Entries
containing average structures were detected by searching
the PDB records COMPND, TITLE, and EXPDTA for the
keywords "average" or "mean." Additional average-structure entries were identified among the single-model entries by manual inspection of the PDB headers. The
ensemble that is available for many average structures is
included as separate entries in our analyses. Four of the
average structures had an overall rms Z score for the bond
angles involving heavy atoms larger than 7, and interatomic distances as small as 0.1 Å can be found. These
anomalies most likely are the result of inadequate energy
minimization. These four entries were omitted from our
analyses because it does not make sense to evaluate the
quality of hydrogen-related geometry if the heavy-atom
geometry has such serious defects (see Table II).
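The keyword search for average structures could look like the following sketch. It assumes the standard PDB flat-file layout, in which the record name occupies the first six columns, and uses the record spelling EXPDTA found in PDB files. Matching word forms such as "averaged" is an added assumption, and the function name is hypothetical.

```python
import re

# Header records searched for the keywords.
HEADER_RECORDS = ("COMPND", "TITLE", "EXPDTA")

# Whole-word match, case-insensitive; "average" also matches "averaged".
KEYWORDS = re.compile(r"\b(average\w*|mean)\b", re.IGNORECASE)

def looks_like_average_structure(pdb_lines):
    """True if a searched header record mentions an average or mean structure."""
    for line in pdb_lines:
        record = line[:6].strip()
        if record in HEADER_RECORDS and KEYWORDS.search(line[6:]):
            return True
    return False

example_hit = [
    "TITLE     MINIMIZED AVERAGE NMR STRUCTURE OF A PEPTIDE",
    "EXPDTA    NMR",
]
example_miss = [
    "TITLE     SOLUTION STRUCTURE OF A PEPTIDE, 20 MODELS",
    "REMARK    THE AVERAGE RMSD IS 0.5",
]
```

As noted in the text, such a search is not exhaustive, which is why the remaining single-model entries were inspected manually.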
Protonation state of amino acids
In addition to the analysis of missing atoms, the presence of hydrogens on polar atoms in the side chains of
amino acids was investigated. The observed protonation
states of the amino acids as found in the PDB files were
compared with those under common NMR conditions
(neutral or slightly acidic pH). The overall frequencies of
protonation states for each amino acid are listed in Table
III. In total, 112 entries contain amino acids that have an
anomalous protonation state. For some of the entries it
seems that a nonphysiological state was introduced to
maintain neutral amino acids. This was performed because charges can distort the structure in some molecular
dynamics protocols if no solvent is present. In many other
cases, justification for the nonstandard protonation state
is not clear from the information available in the PDB
entry. In all cases of anomalous protonation states fewer
hydrogens are present than expected. Therefore, it is not
surprising that there is a clear correlation between the
percentage of amino acids with deviating protonation state
and the percentage of hydrogens that are present in the
structure (see Fig. 1). The labeled outlier, 1MNB, is the
structure of a DNA-bound Arg-rich peptide that lacks
the H of each arginine.
TABLE III. Protonation States of Individual Amino Acids

Amino acid    Labile       Hydrogens    Normal residues      Total       Anomalous
              hydrogens    lacking                           residues    entriesᵃ
Arginine      5            1,644ᵇ       41,929               43,573      38
Asparagine    2                         38,997               38,997
Glutamine     2                         35,998               35,998
Aspartate     0/1                       46,237/1,333         47,570
Glutamate     0/1                       63,870/1,925         65,795
Cysteineᶜ     1            742          37,241               37,983      22
Histidine     0/1/2                     281/16,493/1,530     18,304
Lysine        3            3,224        69,385               72,609      46
Serine        1            79           51,013               51,092      6
Threonine     1                         48,174               48,174
Tryptophan    1                         11,109               11,109
Tyrosine      1                         28,711               28,711
Total                      5,689        494,226              499,915     112

Only amino acids that contain a polar side-chain atom that can be protonated under physiological conditions are listed. The N- and C-terminal groups have not been checked here.
ᵃThe number of entries containing one or more anomalous protonation states is listed.
ᵇOf these 1,644 arginines, 62, 17, and 1,565 had only 1, 3, and 4 of the expected hydrogens, respectively.
ᶜCysteines for which the Sγ is close to a cation or other sulfur atom were not considered to have a missing hydrogen.
Fig. 1. Correlation between the percentage of residues with unexpected protonation states and the percentage of expected hydrogens
present. All deviating states have fewer hydrogens than expected, which
causes the two quantities to be correlated. The protonation state of
nucleotides has not been studied, and entries containing only nucleotides
are not shown in this figure. Polar atoms, such as Cys Sγ, that are
coordinated by a cation other than H+ are counted as if a hydrogen is
present. Open circles and crosses represent entries that contain
proteins and protein-nucleic acid complexes, respectively. The labeled entry,
1MNB, is described in the text. There is a clustering of a large number of
entries with all hydrogens present and no deviating protonation states.
Nomenclature
Nomenclature differences between IUPAC and PDB consensus
The IUPAC nomenclature was described extensively
elsewhere.9,10 Some hydrogen nomenclature rules have
already been described in the older IUPAC study12 to
which the 1998 recommendations refer (e.g., the numbering of hydrogens within a methyl group). Sample coordinate files containing IUPAC atom designators for the
common amino acids and nucleotides are available from the
WWW address (http://swift.embl-heidelberg.de/service/names).
There are two main differences between the IUPAC
nomenclature and the consensus naming scheme that is in
use at the PDB (see Table I). The first difference concerns
the numbering of the methylene hydrogens (in a CH2
group), which starts with 2 in the IUPAC nomenclature
and with 1 in the PDB nomenclature. The second difference is that the IUPAC always uses a suffix number, but
the PDB sometimes uses a prefix. For example, the first
methyl hydrogen on Cγ2 of threonine is called Hγ21 and
1HG2 in the IUPAC and PDB nomenclature, respectively.
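The two main differences just described can be captured in a small conversion sketch. It handles only the prefix-to-suffix move and the methylene renumbering, with IUPAC names written in Roman characters (e.g., HG21 for Hγ21). The function name and the `methylene` flag are hypothetical, and the sketch deliberately ignores the other differences mentioned in this section (swapped Asn/Gln amide hydrogens, N-terminal Pro, nucleic acid names).

```python
def pdb_to_iupac_hydrogen(pdb_name, methylene=False):
    """Convert a consensus PDB hydrogen name to its IUPAC (Roman) form.

    A leading digit is moved to the end of the name. For methylene groups
    the digit is also incremented, because IUPAC numbering starts at 2
    where the PDB numbering starts at 1.
    """
    name = pdb_name.strip()
    if not name or not name[0].isdigit():
        return name              # no prefix digit, name is unchanged
    index, stem = int(name[0]), name[1:]
    if methylene:
        index += 1
    return f"{stem}{index}"

# Threonine methyl hydrogen: only the prefix moves.
thr_methyl = pdb_to_iupac_hydrogen("1HG2")
# Methylene hydrogens: the prefix moves and the numbering shifts by one.
beta_pair = [pdb_to_iupac_hydrogen(n, methylene=True) for n in ("1HB", "2HB")]
```

Whether a given carbon is a methylene group has to be supplied by the caller, since it cannot be inferred from the atom name alone.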
There are a few other differences between the IUPAC
and the PDB nomenclature. In the IUPAC nomenclature
the stereochemical numbering of the carboxylic oxygen
atoms depends on the presence of a hydrogen, the oxygen
atom bearing the hydrogen being numbered 2. When there
is no hydrogen, the original convention that the orientation of the group determines the numbering is applied. The
PDB nomenclature uses only this latter rule even if a
hydrogen is present. The IUPAC name for the methyl
carbon in thymine is C7, but this atom is named C5M in
PDB files, and the names of the methyl-bound hydrogens
differ accordingly. The oxygen atoms on the phosphorus
atom of the nucleic acid backbone are named OP1 and OP2
in the IUPAC nomenclature instead of the older names
O1P and O2P that are in use by the PDB. The PDB
nomenclature of the side-chain amide hydrogens of asparagine and glutamine and the sec-amino group of N-terminal
proline is inverted with respect to the IUPAC nomenclature. The amino hydrogens and the phosphorus oxygen
atoms (OP1 and OP2) in nucleic acids display no clear
consensus for prochiral labeling in the PDB files as was
previously noted by Feigon and Schultze.24
TABLE IV. Deviations From the PDB Consensus Nomenclature for NMR-Derived Structures
Deviations from consensus PDB nomenclature
Our data set contains approximately 20 million atoms of
which almost 15 million have been checked for nomenclature and geometry. In the implementation used for this
study, WHAT IF can read up to 500 molecules, 4,000
residues (amino acids or nucleotides), and 32,000 atoms.
Large ensembles of models that exceeded one or more of
these numbers have been truncated, resulting in a total of
5 million atoms that were omitted from our analyses. The
nomenclature in the PDB entries showed a clear consensus for most atom types and is given in Table I. Overall,
247,000 inconsistencies were found, which represents
1.7% of the total number of checked atoms, and most of
these arise from inconsistencies between the coordinates
and the labeling of prochiral hydrogens (240,000). The
stereochemical inconsistencies can be divided over 14
stereochemical groups as listed in Table IV. Inconsistencies were observed in the nomenclature for all but one of
these, but some differences are more common than others.
The disparities are not evenly distributed over the
entries, but clusters can be observed that share common
nomenclature problems. For example, in 172 entries the
prochiral labeling of one or more methylene groups is
inverted with respect to the consensus. It seems that some
of these entries entered the database without the usual
check of nomenclature by the PDB. For some of these
entries a naming scheme that is reminiscent of X-PLOR
was used in which the 1/2 labeling was consistently
swapped (e.g., entry 1BBO). The DIANA naming scheme
that uses the 2/3 numbering as in the IUPAC nomenclature is found for one entry: 1ATX.
Except for the stereochemical inconsistencies, some
other, less frequent, aberrations were observed. A digit
added or left out was one of the more common nonstereochemical discrepancies. The PDB atom name 2HO* found
in entry 2STW for the H2″ in a DNA is rather strange
because DNA does not have an O2′. The same atom name
is sometimes used twice in one residue, e.g., 2HB for Cys57
in entry 2BUS.
Type
Description
1
2
3
Methylene
Methyl
Side-chain amide
(Asn and Gln)
Heavy atoms (Arg)a
Hydrogens (Arg)b
Amino (nucleic acids)
Amino (Lys)
Amino (N-terminal)
Sec-amino N-terminal
(Pro)
Iso-propyl (Leu and
Val)
Aromatic δ- and
ε-atoms (Phe and
Tyr)
Oxygen atoms on
phosphate
Oxygen atoms
(Asp and Glu)
Oxygen atoms
(C-terminal)
Stereochemicalc
Nonstereochemicalc
Total inconsistencies
Total checkedd
4
5
6
7
8
9
10
11
12
13
14
Affected
atoms
Inconsistently No. of
labeled
entries
atoms
involved
2
3
2
66,994
32,563
7,064
172
214
405
6
2
2
3
3
2
41,086
3,227
26,508
6,675
1,207
425
155
174
78
63
2,242
6,201
35
29,936
144
6,908
72
9,671
637
240,282
6,855
247,137
14,761,635
1,059
264
1,109
1,200
Fourteen types of stereochemical groups in proteins and nucleic acids
are given in the first column. The number of atoms in the group
affected by the stereochemistry is listed in column 3. The total number
of inconsistently labeled atoms and entries containing at least one
inconsistency are listed in the last two columns, respectively. Inconsistencies are counted only once for each atom, e.g., if the PDB labels Hδ11
of Leu as 3HD2, then this is categorized as an inconsistency of the
iso-propyl group only.
aIn the guanidinium group of arginine a swap in the Nη1 and Nη2 labels
also affects the four hydrogens.
bHere only the 1/2 labeling is considered, e.g., Hη11/Hη12.
cThe isotope type deuterium instead of proton was changed to proton
for computational reasons. Only the leucine methyl group of one entry
(1PG1) in our current set actually contains deuterium, so this effect is
not significant for the outcome of the analysis of the current set. As
deuterium-labeled samples become more common, future WHAT IF
versions will support this isotope as a hydrogen alternative.
dThe total number of atoms in the data set was 20,596,428, but some
ensembles were truncated because the remaining models exceeded the
capacity of the version of WHAT IF that was used.
Geometry
Bond lengths
Figure 2 shows a correlation plot of the rms Z score of the
hydrogen versus heavy-atom bond lengths. The rms Z
score is 1 if the observed values have the same distribution
as the Gaussian distribution defined by the reference
value and standard deviation (σ). For heavy-atom bond
lengths, the average rms Z score was 1.1, with a range of
0.2–6.3, and for hydrogen bond lengths this value was 1.5
with a range of 1.2–3.0 (σ = 0.05 Å). Averaged NMR
structures are relatively overrepresented in the outliers in
this figure. Figure 3 shows an expansion of the same
correlation, with clusters of different symbols representing
different types of molecules (a) and refinement programs
(b–d). Nucleic acid structures are, with only a few exceptions, solved with heavy atom bond lengths that do not
comply with the reference values for small molecules.16
Most structures (66%) were solved by using the X-PLOR
program and are shown in Figure 3b. The bulk of the
X-PLOR entries have a heavy-atom rms Z score of 1.0 and
a hydrogen rms Z score of just over 1.4 (corresponding to
0.07 Å). Higher rms Z scores for hydrogen bond lengths are
observed for all other common refinement programs except
for a few entries using DIANA or the AMBER force field
(shown in Figure 3c and d). The largest deviations from
our reference parameters for hydrogen bond lengths are
observed for entries solved by DISCOVER and DGII.
Fig. 2. Correlation between the rms Z score of bond lengths for
hydrogens and for heavy atoms. The boxed section is also shown
enlarged in Figure 3. The 342 average NMR structures (described in the
text) are labeled by filled squares; other entries are labeled by open
circles.
The rms Z scores over all bond lengths reveal interesting differences between types of molecules and refinement programs. Because precise reference values are known only for heavy atoms, the hydrogen values should be interpreted with care; the focus here is only on identifying outliers. Figure 4 shows the percentage of highly distorted (>4σ) bond lengths and bond angles involving hydrogens. The bond angles are discussed below. The labeled entries, many of which are average structures, contain various errors. In the worst case, the four average structures 1SJL, 1SJK, 1AGU, and 1A9I contain thymine methyl groups that have carbon-hydrogen bond lengths as small as 0.6 Å. This bad local geometry is likely the result of inadequate energy minimization after averaging coordinates. Entry 1D83 has the highest percentage of both bond length and bond angle outliers of all nonaveraged structures. The outliers in this entry are the bond lengths of C2′-H2′ (sugar) and C8-H8 (Gua). In entry 1D69, Thy2 and 10 in both chains have three methyl hydrogens that have exactly the same coordinates as the pseudoatom M7. Entry 1RCS has a C3′-H3′ (sugar) bond length of 1.2 Å in all occurrences, and entry 2NR1 contains Met hydrogens with 1.2 Å bond lengths as well. Similarly, entry 1D3X has C2′-H2′ (sugar) bond lengths of 1.2 Å for Cyt20 in all models but not in any of the other residues. The entry 1NCV has few hydrogen bond length outliers, but the bond lengths of Phe15 C1-H1 and Tyr28 C2-H2 are 1.4 Å in the last model, in which both residues are highly distorted. The most common outlier is found in many X-PLOR entries. X-PLOR uses a bond length of 1.0 Å for the cysteine S-H, whereas 1.3 Å would be more correct. This difference causes the only outliers for
many entries. There are 189 entries with outliers and 15 entries in which >1% of the bond lengths are outliers. More than 80% of all observed outliers arise from the underestimated bond length in the sulfhydryl group by X-PLOR.
Fig. 3. Correlation between the rms Z score of bond lengths for hydrogens and heavy atoms. a: A clear clustering of values observed for nucleic acids. b–d: The observed values are shown for the five most commonly used refinement programs: X-PLOR, AMBER, DIANA, DISCOVER, and DGII. The remaining 283 entries have either not been categorised (183 entries) or were solved with less common refinement programs (100 entries). The numbers of entries for the five refinement programs are listed in parentheses in the legends.
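The outlier criterion used for the bond lengths can be sketched as follows: a value counts as an outlier when its Z score exceeds 4 in absolute value, and the percentage of such values is reported per entry. The function name and the sample list are hypothetical, and the standard deviation of 0.05 Å for the S-H bond is an assumption borrowed from the value quoted for hydrogen bond lengths in general; the 1.0 Å and 1.3 Å bond lengths are taken from the text.

```python
def percent_outliers(observed, reference, sigma, cutoff=4.0):
    """Percentage of observations whose absolute Z score exceeds the cutoff."""
    flagged = [x for x in observed if abs(x - reference) / sigma > cutoff]
    return 100.0 * len(flagged) / len(observed)

# Cys S-H case from the text: 1.0 angstrom in X-PLOR versus 1.3 as reference.
# sigma = 0.05 angstrom is an assumption, not a value given for this bond.
z_cys = (1.0 - 1.3) / 0.05  # about -6, well beyond the cutoff of 4

# Hypothetical entry with four S-H bonds, one at the X-PLOR value.
sh_bonds = [1.3, 1.3, 1.0, 1.3]
pct = percent_outliers(sh_bonds, reference=1.3, sigma=0.05)
print(round(z_cys, 1), pct)
```

Under these assumptions a single sulfhydryl group set to 1.0 Å is enough to register as an outlier, which is consistent with this one parameter accounting for most of the observed bond length outliers.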
Bond angles
Figure 5 shows the correlation between the rms Z score for bond angles of heavy atoms and hydrogens. The average structures dominate the outlier regions, as was also observed for bond lengths. In contrast to the values for bond lengths, the deviations in bond angles for hydrogens are much smaller. This indicates that the reference values used here for the bond angles compare well with those used in the community, and the variation in most entries is smaller than the estimated σ of 5°. Again, the nucleic acid structures have a systematically poorer score than the protein structures, especially for heavy atoms (Fig. 5b).
Fig. 4. Correlation plot of the percentage outliers of hydrogen bond lengths and bond angles. The labeled entries are discussed in the text. Symbols represent averaged and nonaveraged structures as in Figure 2.
The percentage of hydrogen bond angle outliers can be read from Figure 4. There are 185 entries with outliers, and in 24 of these >1% of the bond angles are outliers, which is about the same frequency as for bond lengths. In general, outliers for bond angles and for bond lengths are uncorrelated, as can be seen by the clustering along the two axes. In entry 1D83, the bond angle outliers of many groups vary between 80° (H5′-C5′-H5″ of Cyt16B) and 139° (H2′-C2′-H2″ of Ade12B). The outliers present in entries 1NCV and 1GN7 seem to be randomly distributed over the different angles, whereas those in entries 156D and 1P3X are predominantly found in sugar rings of the nucleic acid structures. Many outliers (mainly in entries solved with the program DSPACE) originate from threonine Cβ-Oγ1-Hγ1 angles that are smaller than 90°. Another substantial group of outliers consists of average structures in which the methyl or amino groups are distorted, as was discussed in the previous section. In general, average structures have a slightly higher percentage of outliers for bond angles than for bond lengths. Bond angles are probably more difficult to correct than bond lengths by simple energy minimization.
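A bond angle check of this kind can be sketched from Cartesian coordinates as shown below. The helper names are hypothetical, and the reference angle of 109.5° for an H-C-H group is an assumption made for the example; the standard deviation of 5° and the two extreme angles of 80° and 139° are taken from the text.

```python
import math

def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with atom b at the vertex."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

def angle_z(angle, reference=109.5, sigma=5.0):
    """Z score of a bond angle; the reference of 109.5 degrees is assumed."""
    return (angle - reference) / sigma

# A right angle built from unit vectors, as a sanity check of the geometry.
right = bond_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))

# The two extreme H-C-H angles quoted for entry 1D83.
z_low, z_high = angle_z(80.0), angle_z(139.0)
print(round(right, 1), round(z_low, 1), round(z_high, 1))
```

With these assumed parameters both quoted angles lie almost six standard deviations from the reference, on opposite sides.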
Tetrahedral geometry
The rms Z score from tetrahedral geometry is shown in Figure 6 for methylene and methyl (and Lys amino) groups (σ = 5°). For comparison, the standard deviation in the improper dihedral angle measuring the Cα tetrahedral geometry, Cα-N-C-Cβ, was 4.17° in a set of high-resolution X-ray structures.25 Good agreement with the standard values assumed here is observed for most amino acid-containing entries. The rms Z score is on average 0.3 for all entries. Entry 1NCV, which was solved with reduced energy terms for the improper dihedral angles, exhibits severely distorted geometry. (The authors have indicated that they will replace entry 1NCV for this reason [Dr. S. Meunier, personal communication].) The group of average structures, highlighted in Figure 6, suffers from averaging of the methyl group as previously described. Other outliers for the methylenes are also mostly in average structures.
The nonaveraged entries with an rms Z score for the methylenes >1.0 were refined with different programs and showed no trends in the residues that were reported. Entries 2HIR and 4HIR have distorted tetrahedral geometry for many Gly Cα atoms, and entry 1DEF has the same problem for His132 Cβ and His136 Cβ.
Peptide planarity
Generally, the deviation from planarity of hydrogen atoms in planar groups was measured as the distance to the plane defined by the heavy atoms expected to be planar. In the case of peptide planarity, these heavy atoms are Cα(i), C(i), O(i), N(i+1), and Cα(i+1) of residues i and i+1. The observed distribution of the dihedral angle ω for proteins solved at atomic resolution is 179.0 ± 5.6°.3 If the amide hydrogen has the same deviation as that measured by the ω angle, and if both out-of-plane distributions are the same, then 0.1 Å out-of-plane corresponds to a 1σ deviation. The rms deviation from this plane for heavy atoms and hydrogens is clearly correlated, as can be observed in Figure 7a. The deviation of heavy atoms can intrinsically be lower than the deviation of hydrogens because the reference plane only includes heavy atoms. For peptide planarity there is little difference between average structures and the other structures. Figure 8 shows the rms deviations from planarity for the peptide bond versus the side-chain planarity discussed below. The X-PLOR entries and half of the entries solved with DIANA are too restricted in the peptide planarity with respect to the amide hydrogen. More than 90% of the X-PLOR entries have an rms deviation from planarity for the hydrogens of <0.05 Å. Entries solved with AMBER, DISCOVER, or DGII show higher rms deviations.
Fig. 5. Correlation between the rms Z score of bond angles for hydrogens and for heavy atoms. Symbols represent entries as in Figures 2 and 3a. The entries are categorized on the basis of being an average structure or not in (a), whereas in (b) the entries are categorized as protein, nucleic acid, or protein-nucleic acid complex.
Fig. 6. Rms deviation from tetrahedral geometry of methylene and methyl and/or protonated amino groups. Entries enclosed in the dashed box are average structures that have large deviations of methyl and/or amino groups. Three nonaverage structures are labeled and are discussed in the text. Entries with hydroxy-proline have been excluded (17 entries) for practical reasons. Entries without amino acids (190 entries) are not included here.
In Figure 9, the percentage of peptide hydrogen outliers (>0.25 Å) is shown. A deviation for a hydrogen of 0.25 Å, at perfect planarity of the five heavy atoms, corresponds to a rotation out of the plane by 14°. For the dihedral angle ω this rotation is 2.5σ away from the mean, which is expected for 1% of the residues. There are 188 and 95 entries that have a percentage of outliers >1 and >5%, respectively. Relatively many of these entries were solved with AMBER, DISCOVER, or DGII. Thus, for a considerable number of NMR-solved proteins the distribution of the amide hydrogen deviation from planarity differs substantially from the distribution of the ω angle of the atomic resolution protein structures.
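The numbers quoted for peptide planarity can be checked with a short calculation. The sketch below converts between an out-of-plane distance of the amide hydrogen and a rotation angle, and estimates the Gaussian fraction expected beyond that rotation. The N-H bond length of 1.0 Å is an assumption (the text does not state the value used), and the function names are hypothetical; the standard deviation of 5.6° and the 0.25 Å cutoff come from the text.

```python
import math

NH_BOND = 1.0      # assumed N-H bond length in angstroms
SIGMA_OMEGA = 5.6  # degrees, from the atomic-resolution omega distribution

def out_of_plane_angle(distance, bond_length=NH_BOND):
    """Rotation (degrees) that lifts a bonded atom by `distance` out of a plane."""
    return math.degrees(math.asin(distance / bond_length))

def out_of_plane_distance(angle_deg, bond_length=NH_BOND):
    """Out-of-plane distance produced by rotating a bond by `angle_deg`."""
    return bond_length * math.sin(math.radians(angle_deg))

def two_sided_tail(n_sigma):
    """Fraction of a Gaussian lying more than n_sigma from the mean."""
    return math.erfc(n_sigma / math.sqrt(2.0))

angle = out_of_plane_angle(0.25)                # about 14 degrees
one_sigma = out_of_plane_distance(SIGMA_OMEGA)  # about 0.1 angstrom
tail = two_sided_tail(angle / SIGMA_OMEGA)      # on the order of 1%
print(round(angle, 1), round(one_sigma, 3), round(100 * tail, 2))
```

Under the assumed bond length, the three results agree with the rounded figures in the text: a rotation of about 14°, a 1σ displacement of about 0.1 Å, and an expected outlier frequency of roughly 1%.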
Planarity of amino acid side chains
The combined planarity distortions of side chains of the amino acids Arg, Asn, Asp, Gln, Glu, His, Phe, Trp, and Tyr are shown in Figure 7b. Hooft et al.18 derived the heavy-atom planarity standard deviations from the CSD, with values ranging from 0.0046 Å (His) to 0.037 Å (Arg). For aromatic groups, these numbers were derived without the atoms that are directly bound to the aromatic group: Cβ of Phe, His, Trp, and Tyr, and Tyr-Oη. For these atoms, called group 2 atoms by Hooft et al., the values range from 0.040 Å (Tyr-Oη) to 0.074 Å (Trp-Cβ); thus, these atoms are much more out of plane than the ring atoms. Although no data are available for hydrogen deviations, we have used the same cutoff value as for the peptide bond planarity (0.25 Å) for all planar groups to denote outliers. This value is more than a factor of three larger than the largest value found for the group 2 atoms (0.074 Å for Trp-Cβ).
In this study, the observed range of rms deviations per entry spans from 0.0001 to 0.30 Å and from 0.0003 to 0.87 Å for heavy atoms and hydrogens, respectively. Entries highlighted in Figure 7b have arginines in which the H (and C) is almost perpendicular to the rest of the guanidinium group. In fact, this is the most common outlier; 549 entries, solved by different programs, have one or more cases in which the Arg hydrogen rms deviation is >0.25 Å. Outliers for the other amino acid types are present in significantly fewer entries, with the maximum of 81 entries for Asn outliers.
Other outliers in Figure 7b are 1MHU and 2MHU, with only a single Gln or Asn, respectively, and 2HIR and 4HIR, which both have several Asn and Gln amide groups in which the N-H bonds are nearly perpendicular to the heavy-atom plane. Figure 8 shows the rms deviations from planarity for the peptide bond versus the side-chain planarity, and Figure 9 shows the percentage of planarity outliers. Six entries in which each planar group is an outlier are highlighted in Figure 9. Except for entries 1MHU and 2MHU, discussed above, the other entries contain nucleic acid-bound arginine (1AJU, 1AKX, and 1ARJ) or a nucleic acid-bound peptide that is rich in arginine (1GIB). Overall, a much higher percentage of outliers is observed for side-chain group planarity than for peptide bond planarity. The X-PLOR refined structures showed severe distortions of planar groups other than the peptide bond discussed above. Of the X-PLOR entries, 65% had >2.5% planarity outliers. On the other hand, many entries exhibited variations much smaller than expected. There are 216 and 127 entries that had hydrogen rms deviations from planarity of <0.01 and <0.005 Å, respectively. This is considerably smaller than the 0.040 Å nonplanar deviation of Tyr-Oη, which was the minimum value for group 2 atoms in the CSD. Figure 10a shows Arg1 in model 1 of entry 1GIB, in which all the heavy atoms are planar except N and the hydrogens have a high rms deviation of 0.8 Å to the heavy-atom plane because of H.
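The planarity measure described above, the distance of each hydrogen to the plane of the heavy atoms, can be sketched as follows. As a simplification, the plane here is defined by only three heavy atoms rather than by all heavy atoms of the group, which is an assumption of this example and not the procedure of the study. All coordinates and function names are hypothetical; the 0.25 Å cutoff is taken from the text.

```python
import math

def subtract(p, q):
    return [p[i] - q[i] for i in range(3)]

def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def distance_to_plane(point, a, b, c):
    """Unsigned distance from `point` to the plane through atoms a, b, and c."""
    normal = cross(subtract(b, a), subtract(c, a))
    return abs(dot(subtract(point, a), normal)) / math.sqrt(dot(normal, normal))

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

# Three hypothetical heavy atoms lying in the z = 0 plane (angstroms).
plane = [(0.0, 0.0, 0.0), (1.3, 0.0, 0.0), (0.0, 1.3, 0.0)]
# Three hypothetical hydrogens; the last one points far out of the plane.
hydrogens = [(2.0, 0.5, 0.02), (0.5, 2.0, -0.05), (-0.8, -0.3, 0.90)]

distances = [distance_to_plane(h, *plane) for h in hydrogens]
outliers = [d for d in distances if d > 0.25]  # cutoff from the text
rms_dev = rms(distances)
print([round(d, 2) for d in distances], len(outliers), round(rms_dev, 2))
```

One strongly displaced hydrogen dominates the rms deviation of the whole group, which mirrors how a single out-of-plane hydrogen can produce a large rms deviation for an otherwise planar group.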
Distortions of geometry by nuclear Overhauser enhancement (NOE) restraints
A detail of the HU protein is shown in Figure 10b as an example of how NOE distance restraints can distort the local geometry. A sequential NOE distance restraint between Hδ21 of Asn2 and HN of Lys3 has been observed experimentally.26 The NOE, drawn as a dashed line in Figure 10b, was stereospecifically assigned to Hδ21. However, during the high-temperature stage of the molecular dynamics simulation of some models, the NH2 group of Asn2 flipped. After this flip, the nomenclature of the hydrogens failed to switch, and the NOE restraint then referred to the wrong proton. Thus, the planarity of both amide groups became distorted because of this flip. The small violation of 0.08 Å in the case of Figure 10b would have been 0.88 Å if proper geometry had been maintained.
Fig. 7. Correlation between the rms deviation of planarity for hydrogens and heavy atoms. A distinction was made between the peptide bond planarity and planarity of groups in amino acid side chains. Some entries containing amino acids do not have a single peptide bond (6 entries) or planar group (8 entries) that could be checked. The labeled entries are discussed in the text. Symbols represent averaged and nonaveraged structures as in Figure 2.
CONCLUSIONS
We have studied aspects of nomenclature and geometry of hydrogens in 1,200 of the available 1,404 PDB entries that were solved by NMR. In addition to a large number of nomenclature problems, we detected severe deviations from standard geometry.
A set of slightly more than 100 entries contained <95% of the expected hydrogens and was the largest set that had to be rejected from our analyses. About 100 entries that were included in the analysis contained at least one amino acid per entry that had a protonation state that is unlikely under common NMR conditions. Differences between the IUPAC nomenclature and the consensus of the naming schemes in use at the PDB have been described. Apart from these differences in atom names, 247,000 atoms (1.7%) in the set were found to have geometrically incorrect names, with most of them having inverted prochiral labeling (240,000). Average structures have been identified within the set, and some of them were found to yield the predominant outliers in bond lengths, bond angles, and impropers. Outliers of backbone and side-chain planar groups are distributed over averaged and nonaveraged structures alike.
More than 80% of the hydrogen bond length outliers correspond to Cys S-H bond lengths that are 0.3 Å too short in entries refined with X-PLOR. Another consistent outlier is the Thr Cβ-Oγ1-Hγ1 angle that is more than 20° smaller than the 109.5° assumed here; this occurs in many entries solved with the DSPACE program. Other, less frequent, outliers for hydrogen bond lengths and bond angles have been described for a few entries. With the exception of a few nonaveraged entries, all entries display a reasonable variation in the improper dihedral angle values.
Many of the X-PLOR-solved entries have smaller than expected variations in amide-hydrogen planarity and too high variations in the other planar groups present in amino acids. Relatively many entries solved by AMBER, DISCOVER, and DGII have a considerable percentage of peptide bonds in which the amide hydrogen lies >0.25 Å out of the peptide plane.
Fig. 8. Planarity deviations of different refinement programs. Entry 1MHU, which was solved with X-PLOR, is located at (0.005, 0.72) and is off-graph to allow a better separation of the other entries. The classification is as in Figure 3. Entries whose refinement method was not classified or was less common and entries without a peptide bond or amino acid planar group were omitted.
Fig. 9. Distribution of the outliers of backbone peptide bond and side-chain planarity. The first bin, from 0 to 2.5%, contains 853 and 460 entries for the peptide and combined other planar groups, respectively, but is off-graph as indicated by the arrow. The indicated six entries have planar groups that are all outliers (rms deviation >0.25 Å).
In conclusion, we arrive at a series of recommendations for biomolecular databases, authors of modeling programs, and authors submitting biomolecular structures.
1. We suggest that preferably the IUPAC nomenclature9,10 should be used, but if that is not feasible, the nomenclature, especially that of the prochiral groups, should be defined and applied consistently.
2. Coordinates for all hydrogens expected to be present in
the molecule should be submitted to the data banks.
Hydrogens, for which the presence depends on the pH,
should be present or absent in agreement with the
observations or expected pK under the experimental
conditions and should not depend on what is computationally convenient.
3. During structure calculations it often happens that
forces associated with experimental NOE restraints
cause large distortions in the geometry for which there
is no explicit experimental evidence. Therefore, it is
important to find the proper balance between forces for
experimental and holonomic constraints.
4. Average structures should be optimized so that a reasonable local geometry is maintained. Alternatively, the
model from the ensemble that is closest to the average
could be taken to represent the structure.
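The alternative mentioned in the last recommendation, choosing the ensemble member closest to the average, can be sketched as follows. The sketch assumes that all models are already superimposed and list their atoms in the same order; the function names and the toy coordinates are hypothetical, and the rms distance to the mean coordinates is used here as one plausible measure of closeness.

```python
import math

def mean_structure(models):
    """Coordinate-wise mean over models that are already superimposed."""
    n_models = len(models)
    n_atoms = len(models[0])
    return [tuple(sum(m[a][k] for m in models) / n_models for k in range(3))
            for a in range(n_atoms)]

def rmsd(model, reference):
    """Rms coordinate difference between two structures, without refitting."""
    squared = sum((p[k] - q[k]) ** 2
                  for p, q in zip(model, reference) for k in range(3))
    return math.sqrt(squared / len(model))

def closest_to_average(models):
    """Index of the model with the lowest rmsd to the mean structure."""
    average = mean_structure(models)
    return min(range(len(models)), key=lambda i: rmsd(models[i], average))

# Toy ensemble: three models of a two-atom fragment (angstroms).
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.2, 0.0), (1.0, 0.2, 0.0)],
    [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)],
]
best = closest_to_average(ensemble)
print(best)
```

Because the selected model is an actual member of the ensemble, its bond lengths and angles are those produced by the refinement, so it avoids the distortions that coordinate averaging can introduce.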
ACKNOWLEDGMENTS
Discussions with Ton Rullmann have been helpful and stimulating throughout the project. Furthermore, we thank the members of the Validation Project for a fruitful interaction on many facets of the project.
Fig. 10. a: The geometric distortion of the guanidinium group of Arg1 in model 1 of entry 1GIB. The H (and N) is oriented almost perpendicular to the plane defined by the five heavy atoms, including N represented by a yellow disk, which results in an rms deviation of 0.8 Å for all hydrogens. Carbons, nitrogens, oxygens, and hydrogens are colored gray, blue, red, and light-gray, respectively. b: Large deviations of planarity caused by a violating NOE restraint in the protein HU (entry 1HUE)26 are shown. The sequential NOE distance restraint between Hδ21 of Asn2 and HN of Lys3 in chain B of model 6 is shown by a yellow dashed line. The green bonds are drawn for the hydrogens that were recalculated using an ideal geometry.
REFERENCES
1. Doreleijers JF, Rullmann JAC, Kaptein R. Quality assessment of NMR structures: a statistical survey. J Mol Biol 1998;281:149–164.
2. Hooft RWW, Vriend G, Sander C, Abola EE. Errors in protein structures. Nature 1996;381:272.
3. Wilson KS, Dauter Z, Lamzin VS, et al. Who checks the checkers? Four validation tools applied to eight atomic resolution structures. J Mol Biol 1998;276:417–436.
4. Sussman JL, Lin D, Jiang J, et al. Protein Data Bank (PDB): database of three-dimensional structural information of biological macromolecules. Acta Crystallogr 1998;D54:1078–1084.
5. Bernstein FC, Koetzle TF, Williams GJ, et al. The Protein Data Bank: a computer-based archival file for macromolecular structures. J Mol Biol 1977;112:535–542.
6. Rullmann JAC. AQUA computer program. ftp://ftp-nmr.chem.uu.nl/pub/aqua, 1996.
7. Laskowski RA, Rullmann JAC, MacArthur MW, Kaptein R, Thornton JM. AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR. J Biomol NMR 1996;8:477–486.
8. Vriend G. WHAT IF: a molecular modeling and drug design program. J Mol Graph 1990;8:52–56.
9. Markley JL, Bax A, Arata Y, et al. Recommendations for the presentation of NMR structures of proteins and nucleic acids. J Biomol NMR 1998;12:1–23.
10. Markley JL, Bax A, Arata Y, et al. Recommendations for the presentation of NMR structures of proteins and nucleic acids. Pure Appl Chem 1998;70:117–142.
11. IUPAC-IUB Nomenclature and symbolism for amino acids and peptides. Recommendations 1983. IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN). Biochem J 219:345–373.
12. IUPAC-IUB Abbreviations and symbols for the description of the conformation of polypeptide chains. Tentative rules (1969). IUPAC-IUB Commission on Biochemical Nomenclature. Biochemistry 1970;9:3471–3479.
13. IUPAC-IUB Abbreviations and symbols for the description of conformations of polynucleotide chains. Recommendations 1982. IUPAC-IUB Joint Commission on Biochemical Nomenclature (JCBN). Eur J Biochem 1983;131:9–15.
14. Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 1993;26:283–291.
15. Engh RA, Huber R. Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr 1991;A47:392–400.
16. Parkinson G, Voitechovsky J, Clowney L, Brünger AT, Berman H. Bond lengths and angles, DNA/RNA. Acta Crystallogr 1996;D52:57–64.
17. MacArthur MW, Laskowski RA, Thornton JM. Knowledge-based validation of protein structure coordinates derived by X-ray crystallography and NMR spectroscopy. Curr Opin Struct Biol 1994;4:731–737.
18. Hooft RWW, Sander C, Vriend G. Verification of protein structures: side-chain planarity. J Appl Crystallogr 1996;29:714–716.
19. Weiner SJ, Kollman PA, Case DA, et al. A new force field for molecular mechanical simulation of nucleic acids and proteins. J Am Chem Soc 1984;106:765–784.
20. Brünger AT. X-PLOR, version 3.1: a system for X-ray crystallography and NMR. New Haven, CT: Yale University Press; 1992.
21. Hooft RWW, Sander C, Scharf M, Vriend G. The PDBFINDER database: a summary of PDB, DSSP, and HSSP information with added value. CABIOS 1996;12:525–529.
22. Güntert P, Braun W, Wüthrich K. Efficient computation of three-dimensional protein structures in solution from nuclear magnetic resonance data using the program DIANA and the supporting programs CALIBA, HABAS and GLOMSA. J Mol Biol 1991;217:517–530.
23. Havel TF. An evaluation of computational strategies for use in the determination of protein structure from distance constraints obtained by nuclear magnetic resonance. Prog Biophys Mol Biol 1991;56:43–78.
24. Feigon J, Schultze P. Chirality errors in nucleic acid structures. Nature 1997;387:668.
25. Morris AL, MacArthur MW, Hutchinson EG, Thornton JM. Stereochemical quality of protein structure coordinates. Proteins 1992;12:345–364.
26. Vis H, Mariani M, Vorgias CE, Wilson KS, Kaptein R, Boelens R. Solution structure of the HU protein from Bacillus stearothermophilus. J Mol Biol 1995;254:692–703.
27. Cahn RS, Ingold CK, Prelog V. Specification of molecular chirality. Angew Chem Int Ed Engl 1966;5:385–415.
28. Prelog V, Helmchen G. Basic principles of the CIP-system and proposals for a revision. Angew Chem Int Ed Engl 1982;21:567–583.
29. Blackwood JE, Gladys CL, Loening KL, Petrarca AE, Rush JE. Unambiguous specification of stereoisomerism about a double bond. J Am Chem Soc 1968;90:509–510.
