ABSTRACT
Bioinformatics is a growing field of study that supplies a steady stream of computing problems
to solve. Although the exact definition of the term is debated, the generally accepted idea
is that bioinformatics is the use of computers to solve biological problems or answer biological
questions. One such problem is to develop an algorithm for comparing lengths of proteins in order
to search for protein keys. A protein key is a protein which sends signals to other cells by means
of a chemical reaction where the binding occurs. Our team has chosen to develop an algorithm for
analysis of amino acid bond lengths of proteins because this analysis will assist in identifying
protein keys.
Algorithm for Analysis of Amino Acid Bond Lengths of Proteins
Table of Contents
1 Problem Description
1.1 Introduction
1.2 Background
2 Solution Design
2.2 Classes
2.3 Functions
3 Results
3.2 Output
5 Acknowledgements
6 Appendices
A References
C Runtime Instructions
List of Figures
Figure 9. Protein1_clean
List of Tables
1 PROBLEM DESCRIPTION
Bioinformatics is a growing field of study which supplies us with endless
computing problems to solve. Although the definition of the term itself is somewhat
arguable, the generally accepted idea is that bioinformatics is using computers to solve
biological issues, or answer questions. One such problem is to develop an algorithm for
comparing lengths of proteins in order to search for protein keys. A protein key is a
protein which sends signals to other cells by means of a chemical reaction where the
binding occurs.
For example, Dutch scientists found a protein, produced by glial cells in the central
nervous system, that transmits messages between brain cells which control the release of
chemicals that affect memory, attention, and addiction. Acetylcholine affects memory
and dopamine affects addiction, to name two. Scientists anticipate using this
protein key to develop drugs that influence certain neuronal functions but not
others. Many more protein keys remain to be identified, however.
1.1 INTRODUCTION
The purpose of this Beowulf class is to demonstrate how cluster computing can be
used to solve large problems which could not otherwise be solved. Our team has
chosen to develop an algorithm for analysis of amino acid bond lengths of proteins
because this analysis will assist in identifying protein keys.
1.2 BACKGROUND
A protein is made up of amino acids, and therefore amino acids lie at the heart of
bioinformatics. There are approximately 20 amino acids found in the human body.
Each amino acid has unique properties and can be represented by a full name or a
All amino acids share the same basic structure, built from a few atoms of the same
types around a central carbon atom, the C-alpha. This carbon atom
is bonded to a hydrogen atom, an amino group, a carboxylic acid group, and a fourth group
known as the variable sidechain. Sidechains are what differentiate one
amino acid from another. Amino acids are connected by a peptide bond between the
carboxyl group of the first amino acid and the amino group of the second amino acid.
Figure 2 shows the general structure of an amino acid, and Figure 3 shows a peptide
bond.
This solution has the potential to improve our ability to
overcome diseases and other conditions. With each discovery of a protein
key, scientists are able to make progress toward curing or treating many causes
of human suffering. Using the example mentioned in the introduction, that particular
protein key could be used to help prevent Alzheimer’s disease or schizophrenia, or to help
people quit smoking or overcome other drug addictions. This solution could save millions of
lives!
2 SOLUTION DESIGN
Our team has designed a solution at Old Dominion University’s Beowulf Laboratory
where our supercomputer is housed. This solution accepts input data in two ways:
either the user gives the program an actual protein file, or the user first generates protein
files. If the latter is chosen, the user is prompted to input the number of protein nodes
and the maximum number of possible connections between nodes. The user also
inputs the sigma value, that is, the maximum deviation allowed when two distances are compared.
The code will then generate random proteins based on these criteria and output the
The intent is that each function can be run as a standalone function, so that
the program designer can implement full user input and feed it into the
functions.
2.2 CLASSES
There are two classes used in this solution: the point class and the edge class.
The point class contains the node id and the ‘x’ and ‘y’ coordinate values for the node
object. The edge class contains two points and the distance between them.
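The relationship between the two classes can be sketched as follows. This is a simplified illustration, not the project's Point.h and Edge.h: the struct layout, the member names, and the helper distanceBetween are assumptions made for the example, and the edge length is taken to be the ordinary Euclidean distance between the two points.

```cpp
#include <cmath>

// Hypothetical, simplified stand-in for the point class:
// a node id plus its 'x' and 'y' coordinates.
struct Point {
    int id;
    double x;
    double y;
};

// Euclidean distance between two points in the plane (assumed helper).
inline double distanceBetween(const Point& p, const Point& q) {
    return std::hypot(q.x - p.x, q.y - p.y);
}

// Hypothetical, simplified stand-in for the edge class:
// two points and the distance between them, computed once at construction.
struct Edge {
    Point a;
    Point b;
    double length;

    Edge(const Point& from, const Point& to)
        : a(from), b(to), length(distanceBetween(from, to)) {}
};
```

Storing the length inside the edge means later comparisons only read a number and never recompute a square root.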
2.3 FUNCTIONS
There are four major functions in our code: the generate, modify, load, and
algorithm functions. The generate function will be discussed in another section. Figure
3 shows each of these functions being called in Main.cpp and what parameters each
takes.
The modify function, shown in Figure 4, requires an input file name (s) and an
output file name (d). It then reads in data from file s and outputs it to file d in a format which
The load function requires an input file name (s) and a list of type Edge (d). It can be
seen in Figure 5 and its purpose is to load data in from file ‘s’ and put the data into the
The algorithm function, shown in Figure 6, is at the heart of our solution. This
function must have an output file name (s), a list of type Edge (d), a second list of type Edge (f), and a
delta (x). It compares the edge lengths in list ‘d’ to those in list ‘f’, and if the difference between two lengths is
within the delta ‘x’ value then the program will output the matching edges from ‘d’ and ‘f’
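The comparison described here can be sketched on plain lists of lengths. This is an illustration under assumptions rather than the project's code: the function name matchLengths is hypothetical, the edges are reduced to their lengths, and the result is returned as index pairs instead of being written to a file.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Returns every index pair (i, j) for which the lengths d[i] and f[j]
// differ by at most x. Every length in d is checked against every length in f.
std::vector<std::pair<std::size_t, std::size_t>>
matchLengths(const std::vector<double>& d,
             const std::vector<double>& f,
             double x) {
    std::vector<std::pair<std::size_t, std::size_t>> matches;
    for (std::size_t i = 0; i < d.size(); ++i) {
        for (std::size_t j = 0; j < f.size(); ++j) {
            if (std::fabs(d[i] - f[j]) <= x) {
                matches.emplace_back(i, j);
            }
        }
    }
    return matches;
}
```

Because each length in one list is compared with each length in the other, the number of comparisons is the product of the two list sizes, which is why large proteins quickly become expensive to compare.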
3 RESULTS
Our team has successfully generated two separate protein files (protein1 and
protein2) which mimic actual protein files. We have then used these protein files to
compare protein lengths. Our current solution can perform this on any protein file
The generated protein files are created by the generate function. It requires a file
output name(s), amino acid length (x), number of bonds (y) and will generate a random
protein with the amino acid length x. Each amino acid is connected to anywhere from 1
to y nodes and the function outputs the data in a specified format to a file named s.
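A minimal sketch of this kind of generator is given below. It is not the project's generate function: the names RandomProtein, Bond, and generateProtein are hypothetical, the result is returned in memory rather than written to a file, and the choices of a 1 to 10 coordinate step, forward-only links, and dropping links that would run past the last node are assumptions made for the example. It also assumes x >= 0 and y >= 1.

```cpp
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

struct Bond {
    int from;  // id of the node the link starts at
    int to;    // id of the node the link ends at
};

struct RandomProtein {
    std::vector<std::pair<int, int>> coords;  // coords[i] = (x, y) of node i
    std::vector<Bond> bonds;
};

// Generates x nodes with increasing random coordinates, then tries to link
// each node to between 1 and y later nodes. Assumes x >= 0 and y >= 1.
RandomProtein generateProtein(int x, int y, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> step(1, 10);
    std::uniform_int_distribution<int> linkCount(1, y);

    RandomProtein protein;
    int cx = 0;
    int cy = 0;
    for (int i = 0; i < x; ++i) {
        cx += step(rng);
        cy += step(rng);
        protein.coords.emplace_back(cx, cy);
    }
    for (int i = 0; i < x; ++i) {
        int wanted = linkCount(rng);
        int target = i;
        for (int j = 0; j < wanted; ++j) {
            target += step(rng);
            // Links that would point past the last node are dropped, so
            // nodes near the end may receive fewer links than requested.
            if (target < x) {
                protein.bonds.push_back({i, target});
            }
        }
    }
    return protein;
}
```

Passing the seed explicitly makes a generated protein reproducible, which is convenient when the same test data has to be compared across runs.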
When the protein files are generated by the generate function, the files list the
node, its coordinates, the connecting node number, and the connecting node’s
coordinates. For example, the beginnings of our two generated protein files, Protein1
Figure 9. Protein1_clean.
3.2 OUTPUT
The output for this project is the comparison of lengths performed on our generated
protein test data. The formatting for the file output in the raw and clean output files
shows data for each node in pairs. In the raw file, the first node is the real node data
A(1) 5, 4
A(4) 10, 11
A(1) 5, 4
A(5) 13, 15
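Lines in this form could be read back with a short parser such as the sketch below. It assumes that every raw line follows the pattern letter(id) x, y seen in the sample above; the names RawNode and parseRawNode are hypothetical and do not appear in the project code.

```cpp
#include <cstdio>
#include <string>

// One node entry from a raw protein file (assumed layout: "A(4) 10, 11").
struct RawNode {
    char tag;  // leading letter, 'A' in the sample
    int id;    // node number inside the parentheses
    double x;  // first coordinate
    double y;  // second coordinate
};

// Parses one line into 'out'. Returns true only if all four fields were read.
bool parseRawNode(const std::string& line, RawNode& out) {
    return std::sscanf(line.c_str(), " %c(%d) %lf , %lf",
                       &out.tag, &out.id, &out.x, &out.y) == 4;
}
```

Since the sample lists nodes in pairs, two consecutive successful parses would give the two end points of one edge.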
The matching file lists all edges, and is shown in Figure 13.
Our current solution finds all matching protein lengths in a given protein file, but it
does not yet actively search for protein keys. Therefore, further development would
implement an algorithm which would enable this module to identify protein keys. This
5 ACKNOWLEDGEMENTS
Our team would like to thank Professor Jay Morris for the concept of this project
and his invaluable assistance throughout the development of this module. We would
also like to thank the Computer Science Department for providing the supercomputer for
us to work with during this course. Special thank you also to Tihomir Hristov for the
6 APPENDICES
A REFERENCES
Main.cpp
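#include <iostream>
#include <fstream>
#include <ctime>    // time()
#include <cstdlib>  // srand(), rand()
#include <cctype>   // toupper()
#include <cmath>
#include <list>
using namespace std; // the listings use cout, cin, fstream, and list unqualified
#include "point.h"
#include "Edge.h"
#include "function.h"

int main()
{
    list<Edge> protein1;
    list<Edge> protein2;
    list<Edge>::iterator pitr;
    int a,b,c,d;
    double sigma;
    char str;

    // Seed the random number generator used by GenerateProteinFile.
    srand(time(NULL));

    cout<<"Enter First Protein length and max connections. <length connection>: ";
    cin>>a>>b;
    cout<<"\nEnter Second Protein length and max connections <length connection>: ";
    cin>>c>>d;
    // The sigma prompt is missing from this listing; the wording below is assumed.
    cout<<"\nEnter sigma (maximum deviation): ";
    cin>>sigma;
    cout<<endl<<endl;

    // Generate each protein, clean up the raw file, and load it as a list of edges.
    GenerateProteinFile("protein1_raw.txt",a,b);
    ModifyProteinFile("protein1_raw.txt","protein1_clean.txt");
    LoadProteinFile("protein1_clean.txt",protein1);

    GenerateProteinFile("protein2_raw.txt",c,d);
    ModifyProteinFile("protein2_raw.txt","protein2_clean.txt");
    LoadProteinFile("protein2_clean.txt",protein2);

    // Write every pair of edges whose lengths differ by at most sigma.
    algorithm("protein_matchs.txt",protein1,protein2,sigma);

    // The prompt and read for str are missing from this listing; they are
    // reconstructed here from the Runtime Instructions, and the wording is assumed.
    cout<<"Print output to screen? (Y/N): ";
    cin>>str;
    str = toupper(str);
    if(str=='Y'){
        DisplayProtein(protein1);
        DisplayProtein(protein2);
    }
    return 0;
}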
Function.h
struct node{
int label,nlabel;
int x,nx;
int y,ny;
};
//////////////////////////////
//functions
//////////////////////////////
int a=0;
int b=0;
int c=0;
int d=0;
int e=0;
int f=0;
node A[array_size];
//USER INPUT
fstream fout(str,ios::out);
array_size++;
a=a+1+((rand()+1)%10);
b=b+1+((rand()+1)%10);
A[z].label=z;
A[z].x=a;
A[z].y=b;
c=1+(rand()+1)%node_connection;
d=i;
for(int j=0;j<c;j++)
d=d+1+((rand()+1)%10);
if(d<(array_size-1)){
//to file
//head node
//linked node
if(i<(array_size-2)){fout<<endl;}
fout.close();
////////////////////////////////////////////////////////////////////////////
ifstream fin;
ofstream fout;
char b;
char * c;
int a;
//USER INPUT
fin.open(infileName);
//USER INPUT
fout.open(outfileName);
while (fin.good())
c = &b;
fin.get(*c);
a=*c;
if(fin.good())
fout<<"\n";
else
fout<<b;
fin.close();
fout.close();
/////////////////////////////////////////////////////////////////////
//variable declaration
ifstream fin;
ofstream fout;
char * c;
//USER INPUT
fin.open(infileName);
Edge *amino;
Point *aptr;
Point *bptr;
int i,index=0;
double x1,y1,x2,y2,distance;
//temp code
fin>>i;
while(fin.good()){
fin>>x1;
fin>>y1;
fin>>i;
fin>>x2;
fin>>y2;
//calculate distance
distance = sqrt(pow((x2-x1),2)+pow((y2-y1),2));
//
index++;
//amino->display();
protein.push_back(*amino);
fin>>i;
//////////////////////////////////////////////////////
list<Edge>::iterator pitr;
int i;
double x,y;
for(pitr=protein.begin(); pitr!=protein.end();pitr++)
pitr->display();
////////////////////////////////////////////////////////
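// Compares every edge of protein1 with every edge of protein2 and writes the
// pairs whose lengths differ by at most sigma to outfileName.
// NOTE: the signature line is missing from this listing. The parameter names
// come from the body and from the call in Main.cpp; the return type and the
// parameter types are assumed.
void algorithm(const char* outfileName, list<Edge>& protein1, list<Edge>& protein2, double sigma)
{
    double delta;
    fstream fout(outfileName,ios::out);
    list<Edge>::iterator protein1_itr;
    list<Edge>::iterator protein2_itr;
    for(protein1_itr=protein1.begin(); protein1_itr!=protein1.end();protein1_itr++) {
        for(protein2_itr=protein2.begin(); protein2_itr!=protein2.end();protein2_itr++) {
            // Absolute difference between the two edge lengths.
            delta=fabs(protein1_itr->getDistance()-protein2_itr->getDistance());
            if(sigma>=delta){
                fout<<"Protein 1 : ";
                protein1_itr->fdisplay(fout);
                fout<<"to"<<endl;
                fout<<"Protein 2 : ";
                protein2_itr->fdisplay(fout);
                fout<<endl;
            }
        }
    }
}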
///////////////////////////////////////////////////////
Point.h
class Point
public:
//Operator=
private:
int name;
double x;
double y;
};
Edge.h
class Edge {
public:
Edge();
index = i;
distance = dist;
a.setX(x->getX());
a.setY(x->getY());
a.setName(x->getName());
b.setX(y->getX());
b.setY(y->getY());
b.setName(y->getName());
a.display();
b.display();
a.fdisplay(fout);
b.fdisplay(fout);
virtual ~Edge(){}
private:
int index;
double distance;
Point a;
Point b;
};
C RUNTIME INSTRUCTIONS
The user will be prompted for two values for protein1, and two more values for
protein2, as well as a sigma value representing deviation. After running, the user will be
asked if they would like the output printed to the screen. If “no” is selected, the output