Proteins are the basic unit of life, a fundamental component of all living cells, from our own, to the bacteria that infect us, to the plants and animals that we eat. The more we understand of the structure and function of a protein, the more we understand about how life works, or in some cases how it can go wrong. Protein molecules are also the target of most drug therapies.

To make proteins, ribosomes string together amino acids into long, linear chains. Like skipping ropes, these chains loop and fold about each other in a variety of ways, but, also like skipping ropes, only one of these many ways actually allows the protein to function properly. Sometimes this folding goes wrong and, in the worst case, a misfolded protein within a cell can also prevent the cells around it from functioning.

The amazing thing about proteins is not only that they fold, but that they do so to a unique three-dimensional shape which governs their function.

Introducing molecular modeling

Molecular modeling combines theoretical research methods with computational techniques to reproduce the behavior of molecules at the atomistic level. Molecular modeling is a subset of bioinformatics and, as we pass into the “post-genomic sequence era”, it is thought that this field will play an ever more important role in scientific research and drug development.

Molecular mechanics is one aspect of molecular modeling, the benefit of which is that it reduces the complexity of the system, allowing many more particles (atoms) to be considered during simulations. This is in contrast to quantum chemistry, where each electron is considered.

The University of Bristol's Biochemistry Department is one of the top two biochemical research organizations in the UK. The research of the Protein Folding Group uses molecular modeling techniques to provide vital input to rational drug design.

Research into peptide-based protease inhibitors

One of the many projects at Bristol concerns the research of protease inhibitors. Recent studies indicate that between 1 and 5% of an organism’s genome codes for proteases, an observation which reinforces the key role that proteases play in many biological processes and diseases including cell signaling, pro-enzyme maturation, viral infection, blood clotting, hypertension and Alzheimer's disease, to name but a few. A number of protease inhibitors are already available as drugs on the market which target pathogens and form the key component of anti-HIV and blood pressure medication.

Peptides are simply short lengths of natural polypeptide and are typical substrates and products of proteases.
www.clearspeed.com
Many scientists believe that knowing more about peptide/protease combinations is key to treating a wide variety of medical conditions. At Bristol University, scientists are researching protease inhibitors using a specific type of peptide against human elastase, a protease which causes extensive scarring of lung tissue in emphysema.

A combination of initial molecular modeling and inspection of the crystal structures of the peptide identified five amino acid residue positions on the peptide that could be used to affect the interaction of the peptide with proteins. Since each of the five positions could be occupied by one of twenty amino acids, the total number of possible compounds that could be synthesized is 20^5 = 3.2 × 10^6. However, it is clearly impractical to synthesize and test the inhibitory properties of each possible peptide sequence, as this number is similar to the total number of compounds available to the world's pharmaceutical companies for testing. Therefore the team at Bristol devised a new approach using an empirical-free-energy based docking program.

A new approach – Bristol University Docking Engine (BUDE)

For technical reasons related to solubility and concentration, it transpires that a library containing about 100 different peptide sequences in one pot is the maximum convenient size for testing and identification of a single inhibitor (or a series of inhibitors). Such a library is easily prepared by mixed synthesis.

An experienced molecular modeler can use molecular graphics methods to generate an initial docking position (the initial shape and structure, referred to as a pose) of a generic cyclic peptide (e.g. alanine at each of the 5 variable positions). Since evolution has selected the 20 natural amino acids to cover a wide range of chemical diversity, the individual amino acids can be grouped into a variety of types that include large, small, hydrophilic, hydrophobic, positively charged and negatively charged. Hence the molecular modeler can also make predictions of what type of amino acid would be best matched to the particular environment surrounding the five variable positions in the initial pose. The modeler strives to choose an average of 4 or fewer candidate amino acids for each position, yielding a virtual library of peptide sequences of 4^5 = 1024 members or fewer.

The BUDE computer algorithm described here was designed to bridge the gap between the whole of this virtual library and a refined 10% identified as the best choice for actual synthesis and testing.

The overall algorithm is composed of the following elements:
1. The user defines a discrete search space (a 6-D grid) around the initial ligand (peptide) pose.
2. The fitness of a pose is evaluated by a novel atom-atom based empirical free energy force field.
3. The grid positions may be evaluated exhaustively or by using a genetic-algorithm-like Monte Carlo search method (EMC; N. Gibbs, A.R. Clarke & R.B. Sessions, Proteins 43:186-202 (2001)).
4. Ligand flexibility is treated by docking different conformations of the peptide.
5. Many ligands: in this case study a virtual library of 576 different peptide sequences was generated and each was docked as a separate BUDE job. Shell scripting is used to address this problem of trivial parallelization. Each sequence generates between 80 and 30,420 conformations to be docked, depending on the number of rotamers associated with each amino acid in the peptide. In total, 1,966,272 docking operations are performed. Since each docking operation searches some 5% of the grid (4,225 poses), the energy of over 8 billion poses must be calculated to evaluate the whole virtual peptide library. Each peptide ligand is represented by about 100 atoms and the protein elastase has 1636 atoms.

Issue – how to accelerate the pace of discovery?

Using the docking engine requires significant computational power. Dr. Richard Sessions of the University of Bristol's Biochemistry Protein Folding Group plays a lead role in enabling ever more sophisticated modeling techniques to be used for research. Dr. Sessions explained:

“In order to predict the binding affinities more accurately we needed a more detailed model to measure interaction between molecules more carefully. Unfortunately we were significantly hampered, not by methodology, but by having enough compute power to carry out our research… we simply didn't have enough floating point operations available to us.”

Initial investigation showed that it would take weeks to run the BUDE system and gain a result using the local department cluster. Using the University's HPC was an option, but an expensive one as the majority of the compute power would be consumed with consequences
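The library sizes and the docking workload quoted in this section can be checked with a few lines of arithmetic. The following is only a sketch that reuses the figures given in the text; the variable names are illustrative and are not taken from BUDE.

```python
# Library sizes and docking workload, using only figures quoted in the text.
AMINO_ACIDS = 20
VARIABLE_POSITIONS = 5

# Every possible sequence over the five variable positions.
full_library = AMINO_ACIDS ** VARIABLE_POSITIONS
# About four candidate amino acids per position, as chosen by the modeler.
focused_library = 4 ** VARIABLE_POSITIONS

sequences_docked = 576            # virtual library used in the case study
docking_operations = 1_966_272    # conformations summed over all sequences
poses_per_docking = 4_225         # some 5% of the search grid

total_poses = docking_operations * poses_per_docking
mean_conformations = docking_operations / sequences_docked

print(f"full library:      {full_library:,}")
print(f"focused library:   {focused_library:,}")
print(f"total poses:       {total_poses:,}")
print(f"mean conformations per sequence: {mean_conformations:.0f}")
```

The total of roughly 8.3 billion pose energies is what makes the energy routine the natural target for acceleration.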
ACCELERATED SCIENCE APPLICATIONS
Methodology
The molecular modeler provides the receptor and ligand
start positions and defines a six-dimensional search grid.
This grid is searched via a GA-like EMC (Evolutionary
Monte Carlo) procedure. The “currency” of the EMC is the
pose descriptor; it is the principal task of the docking
engine to translate that pose descriptor into a pose energy.
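To make the idea of searching a discrete 6-D grid of pose descriptors more concrete, here is a minimal sketch of a simple evolutionary random search. It is a stand-in for illustration only: it is not the EMC procedure used by BUDE, the grid size and population settings are invented, and `toy_energy` is a placeholder rather than the empirical free energy force field.

```python
import random

GRID_POINTS = 9   # hypothetical number of grid points per dimension
DIMENSIONS = 6    # three translations plus three rotations


def toy_energy(pose):
    """Placeholder energy, lowest at the grid centre. Not BUDE's force field."""
    centre = GRID_POINTS // 2
    return float(sum((index - centre) ** 2 for index in pose))


def mutate(pose, rng):
    """Move one randomly chosen coordinate by one grid step, staying on the grid."""
    axis = rng.randrange(DIMENSIONS)
    step = rng.choice((-1, 1))
    moved = list(pose)
    moved[axis] = min(GRID_POINTS - 1, max(0, moved[axis] + step))
    return tuple(moved)


def search(generations=200, population_size=20, survivors=5, seed=0):
    """Evolve a population of pose descriptors toward low energy."""
    rng = random.Random(seed)
    population = [
        tuple(rng.randrange(GRID_POINTS) for _ in range(DIMENSIONS))
        for _ in range(population_size)
    ]
    for _ in range(generations):
        # Keep the lowest-energy poses, then refill with mutated copies of them.
        parents = sorted(population, key=toy_energy)[:survivors]
        children = [
            mutate(rng.choice(parents), rng)
            for _ in range(population_size - survivors)
        ]
        population = parents + children
    best = min(population, key=toy_energy)
    return best, toy_energy(best)


best_pose, best_energy = search()
print(best_pose, best_energy)
```

The point of the sketch is the division of labour: the search only handles pose descriptors, and everything expensive is hidden behind the energy function, which is the part a docking engine has to evaluate quickly.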
The BUDE source code is about 5,000 lines of FORTRAN.
Profiling the code shows that, as expected, more than 99%
of the execution time is spent in the energy calculation
routine which is about 500 lines of code. Accelerating the
algorithm required porting the energy calculation and
geometry routines to the ClearSpeed Advance accelerator.
Before the search begins, the initial coordinates of the protein
(elastase) and the ligand (cyclic peptide) are copied to the
Advance accelerator's on-board DRAM. When the search
requires the energies of a set of pose descriptors, the program
translates these into a set of transformation matrices.
This set is copied to the Advance accelerator, where the
transformations are applied and the energies calculated;
the results are then copied back to the host process.
[Figure: The ClearSpeed Accelerated Terascale System (CATS)]
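The data flow described in the Methodology (pose descriptor, then transformation matrix, then pose energy) can be sketched on the host alone. This is an illustration under stated assumptions, not the BUDE or ClearSpeed code: the rotation convention (x, then y, then z, in radians), the function names, and the contact/clash scoring in `toy_pose_energy` are all hypothetical, and the loop in `evaluate_poses` merely stands in for the step that runs on the accelerator.

```python
import math


def pose_to_matrix(tx, ty, tz, rx, ry, rz):
    """Build a 3x4 rigid-body transform [R | t] from a 6-D pose descriptor.

    Assumed convention: R = Rz * Ry * Rx, angles in radians.
    """
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx, tx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx, ty],
        [-sy, cy * sx, cy * cx, tz],
    ]


def apply_transform(matrix, atoms):
    """Rotate and translate every (x, y, z) atom position."""
    return [
        tuple(row[0] * x + row[1] * y + row[2] * z + row[3] for row in matrix)
        for (x, y, z) in atoms
    ]


def toy_pose_energy(ligand_atoms, receptor_atoms):
    """Placeholder score: +10 per clash (< 2 A), -1 per contact (< 4 A)."""
    energy = 0.0
    for ligand_atom in ligand_atoms:
        for receptor_atom in receptor_atoms:
            distance = math.dist(ligand_atom, receptor_atom)
            if distance < 2.0:
                energy += 10.0
            elif distance < 4.0:
                energy -= 1.0
    return energy


def evaluate_poses(pose_descriptors, ligand_atoms, receptor_atoms):
    """Host-side stand-in for the batch evaluated on the accelerator."""
    matrices = [pose_to_matrix(*pose) for pose in pose_descriptors]
    return [
        toy_pose_energy(apply_transform(matrix, ligand_atoms), receptor_atoms)
        for matrix in matrices
    ]


receptor = [(0.0, 0.0, 0.0)]
ligand = [(3.0, 0.0, 0.0)]
poses = [
    (0.0, 0.0, 0.0, 0.0, 0.0, 0.0),    # unchanged: one contact
    (-2.0, 0.0, 0.0, 0.0, 0.0, 0.0),   # pushed into the receptor: one clash
    (5.0, 0.0, 0.0, 0.0, 0.0, 0.0),    # pulled away: no interaction
]
energies = evaluate_poses(poses, receptor_atoms=receptor, ligand_atoms=ligand)
print(energies)
```

Because only the small transformation matrices travel to the device and only the energies travel back, the fixed protein and ligand coordinates need to be copied once, which matches the transfer pattern described in the text.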
CLEARSPEED TECHNOLOGY
Results
In the base case on the host, each BUDE run requires 100% CPU loading. However, when
ClearSpeed accelerators are employed, the bulk of the processing is moved to the
accelerator leaving only 2.5% CPU loading per job and therefore freeing the CPU for other
applications.
BUDE scales across both multiple cores in the host Xeon and multiple cards in
the CATS node.
The initial measured performance of a single card in the CATS node is a
3.41x speedup over one host core (3.0 GHz Xeon Woodcrest). Scaled to the 12 cards of a
CATS node and compared with the four cores of the 3 GHz Xeon host, this gives
(12 × 3.41)/4 = 10.2x speedup over the quad-core host node.
However, performance is only part of the outcome. The CATS system delivers performance
without compromising power consumption. The measured power consumption figures are
as follows:
CATS = 550 W
HOST = 300 W
Consequently, the outcome is a 10.2x speedup for 1.8x the power, or 5.6x greater
performance per watt.
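The headline ratios follow directly from the measured figures. The short sketch below simply reproduces that arithmetic; the variable names are illustrative, and linear scaling across cards and host cores is the assumption already built into the quoted formula.

```python
# Reproduce the speedup and performance-per-watt arithmetic from the Results.
single_card_speedup = 3.41   # one card versus one 3.0 GHz Xeon core
cards_per_node = 12          # factor of 12 in the quoted formula
host_cores = 4               # quad-core host used for comparison

cats_power_watts = 550
host_power_watts = 300

# Assumes linear scaling over cards and over host cores.
node_speedup = cards_per_node * single_card_speedup / host_cores
power_ratio = cats_power_watts / host_power_watts
perf_per_watt_gain = node_speedup / power_ratio

print(f"node speedup:       {node_speedup:.1f}x")
print(f"power ratio:        {power_ratio:.1f}x")
print(f"performance / watt: {perf_per_watt_gain:.1f}x")
```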