Bioinformatics Exercises:
Bovine Lactate Dehydrogenase (LDH)
BACKGROUND:
The primary structure (amino acid sequence) is often the first piece of experimental information a
biochemist wants about a protein of interest, since it can be used to make several
predictions about the properties and possible behavior of the protein, such as:
• Protein molecular weight, by adding up the masses of the individual amino acid residues.
• Isoelectric point. The isoelectric point (pI) is the pH at which the protein carries no net charge. Because of
ionizable functional groups on amino acids, protein charge changes as a function of pH depending on whether
or not these groups are protonated. By knowing the sequence, we know how many of each ionizable
group our protein contains. If we know the pH range where these groups become protonated or
deprotonated, we can estimate the charge of the whole protein as a function of pH. This will be
discussed in more detail below.
• Molar extinction coefficient. Tryptophan and tyrosine residues, and to a much smaller extent
disulfide-bonded cysteines (cystine), absorb ultraviolet light at 280 nm. By knowing how many of these
amino acids are found in our protein's sequence, we can calculate how much we expect a solution of our
protein to absorb 280 nm light as a function of its concentration.
I say "expect" instead of "determine" because the amount of light absorbed by these amino acids
depends on their local environment within the protein, especially on whether they are on the surface
and exposed to the solution or buried inside the protein.
• Sequence similarity to other proteins, which suggests homology to proteins of known function and/or
structure.
• Other structural predictions based on sequence:
o Disulfide bonds. If your protein is from the cytoplasm, it will likely not have disulfide bonds in its native
conformation because the intracellular environment is reducing. However, if it is an extracellular
protein, disulfide bonds may play a critical role in protein stability.
o Secondary structure. Based largely on databases of experimentally determined protein
three-dimensional structures, some sequences and particular amino acid residues are more or less likely to
form particular types of protein secondary structure hydrogen-bonding networks.
o Stability with respect to proteolytic digestion. All cells contain proteases. When cells or tissues are
disrupted to isolate proteins, these previously compartmentalized protein-cutting enzymes are now in
the solution with your protein target. Proteases have different specificities in terms of protein sequence
and some sequences are particularly yummy (likely to be cleaved by these enzymes).
o Hydrophobicity. Once the sequence is known, you can look for the location of all of the hydrophobic
amino acid side chains. If you find a long (~23 residues) linear stretch of sequence containing only
hydrophobic amino acids, this may suggest a region of the protein that spans a lipid membrane.
o Potential post-translational modification sites for glycosylation, biotinylation, binding metal
cofactors, etc.
All of this is very useful information, but much of it is a prediction and may not be true of the
biologically-relevant folded protein (the native structure). Protein sequence cannot yet predict tertiary
structure or association with other subunits (quaternary structure). Even the secondary structure prediction
tools are often inaccurate. Sequence does not tell you about the overall shape of the protein or the
characteristics of its surface such as its charge distribution or whether or not it has hydrophobic patches.
These surface characteristics are important both for the biological function of the protein and for determining
how other molecules may interact with it, and they currently still need to be determined experimentally.
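The charge-versus-pH estimate described in the isoelectric point item above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method used by any particular tool: the pKa values are approximate textbook-style numbers, every ionizable group is treated as independent of its environment, and the names net_charge and isoelectric_point are hypothetical.

```python
# Approximate pKa values (assumptions for illustration; real values shift
# with the local environment inside the folded protein).
BASIC_PKA = {"K": 10.5, "R": 12.5, "H": 6.0}              # +1 when protonated
ACIDIC_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}    # -1 when deprotonated
N_TERM_PKA = 9.0
C_TERM_PKA = 2.0


def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a protein at a given pH (Henderson-Hasselbalch)."""
    seq = sequence.upper()
    # Free amino terminus: fraction protonated carries +1.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    # Free carboxyl terminus: fraction deprotonated carries -1.
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue, pka in BASIC_PKA.items():
        charge += seq.count(residue) / (1.0 + 10 ** (ph - pka))
    for residue, pka in ACIDIC_PKA.items():
        charge -= seq.count(residue) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Find the pH where the estimated net charge is zero by bisection."""
    low, high = 0.0, 14.0
    # Net charge only decreases as pH rises, so bisection converges.
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    toy_peptide = "ACDEHKRY"  # hypothetical peptide, not a real protein
    for ph in (2.0, 7.0, 12.0):
        print(f"pH {ph:4.1f}: net charge {net_charge(toy_peptide, ph):+.2f}")
    print(f"estimated pI: {isoelectric_point(toy_peptide):.2f}")
```

Because different tools use different pKa tables, an estimate from a sketch like this should be expected to differ somewhat from the value a program such as ProtParam reports.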
Once complete, the results page lists the UniProt entries that contain similar sequences; these
results are listed under the headings Overview and Alignments.
Scroll down the Overview listings until you see L-lactate dehydrogenase A chain (Homo
sapiens). Note down its accession number: P00338.
Find the same sequence under the alignment listing and select/click this entry. Then add it
to the basket.
Go back to your basket, select bovine L-lactate dehydrogenase B and repeat xii-xv.
The accession number for L-lactate dehydrogenase B chain (Homo sapiens) is P07195.
Go back to the basket, check all four entries, and click on the Align button at the
bottom left of the window.
After a few moments, the alignment procedure is complete and a page displaying the
arrangement of sequences from all of the entries appears.
To ensure that all the necessary information is displayed, click/check on the Tree and Result
Info options on the top left side of the page, under the heading Display.
As appropriate, use the Annotation tools on the left side of the page (under the Highlight
heading) to selectively highlight amino acids with specific properties (e.g., metal binding,
aromatic, etc.).
Use the combined information on this page to answer the questions below.
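When comparing aligned sequences, one simple similarity measure is percent identity: the share of aligned positions where both sequences have the same residue. The sketch below is an illustration only. It assumes gaps are written as "-" and that only columns where both sequences have a residue are counted; alignment tools differ on the denominator, so the values may not match those shown on the results page. The function name percent_identity and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Toy aligned rows (hypothetical, not taken from the LDH alignment).
    row_1 = "ACD-FGHIK"
    row_2 = "ACE-FGH-K"
    print(f"{percent_identity(row_1, row_2):.1f}% identical")
```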
i. Go back to the basket and click on the accession number P19858 to access information about this protein. Take
some time to familiarize yourself with the kinds of information this file contains.
ii. Scroll down to the Sequence section of the file and click the blue button labeled FASTA to download the protein
sequence in the FASTA format. In this format the first line starts with the character > followed by some
descriptive text. Programs that analyze the sequence treat this line as a label only, not as sequence data.
This line is followed by the single-letter amino acid sequence of the protein.
iii. Copy/paste the sequence here:
iv. To the right of the Sequence box, there is a drop-down menu (showing BLAST as default).
v. Click on the arrow to activate the menu, select ProtParam, and click GO.
vi. Click submit at the bottom of the page that opens.
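To see what kind of arithmetic lies behind two of the values reported for a sequence, the sketch below reads a FASTA record, sums average residue masses to estimate the molecular weight, and counts tryptophan and tyrosine residues to estimate the molar extinction coefficient at 280 nm. It is a simplified illustration, not ProtParam's actual code. The function names and the toy record are hypothetical, the residue masses are standard average values, and the per-residue absorbances (5500 for Trp, 1490 for Tyr, 125 per cystine, in M^-1 cm^-1) are commonly used values assumed here.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528


def read_fasta(text: str):
    """Return (header, sequence) for the first record in FASTA-formatted text."""
    header = None
    chunks = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                break  # stop at the start of a second record
            header = line[1:]
        else:
            chunks.append(line)
    return header, "".join(chunks).upper()


def molecular_weight(sequence: str) -> float:
    """Sum residue masses and add one water for the free chain termini.

    Raises KeyError for any letter that is not one of the 20 standard residues.
    """
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


def extinction_coefficient(sequence: str, cystines: int = 0) -> int:
    """Estimate the 280 nm molar extinction coefficient in M^-1 cm^-1.

    `cystines` is the number of disulfide bonds, supplied by the caller because
    the sequence alone does not say whether cysteines are oxidized.
    """
    return 5500 * sequence.count("W") + 1490 * sequence.count("Y") + 125 * cystines


if __name__ == "__main__":
    toy_record = ">toy|EXAMPLE hypothetical peptide\nMWYG\nAC\n"
    header, sequence = read_fasta(toy_record)
    print(header, sequence)
    print(f"MW ~ {molecular_weight(sequence):.1f} Da")
    print(f"E280 ~ {extinction_coefficient(sequence)} M^-1 cm^-1")
```

Pasting the real downloaded FASTA text in place of the toy record lets you compare these estimates with the values the web tool reports; small differences are expected if the tool uses slightly different constants.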
Exercise 3:
In this exercise:
You will use a program called Jpred to predict the secondary structures in bovine LDHA. This exercise will
help you see how the primary structure of a protein can be used to PREDICT secondary structures within
a protein and will allow you to compare predictions to actual structural findings of experimental data.
i. Go to http://www.compbio.dundee.ac.uk/jpred/
ii. Paste the LDH sequence (Exercise 2, part iii) into the appropriate field and click Make a Prediction.
iii. Jpred will let you know if there's an experimental crystal structure for your protein that gives far more
accurate structural information than the prediction tool, but for demonstration purposes go ahead and
click on continue to generate the predicted secondary structures based on primary structure. It will
take a few minutes for the computer to do the computation.
iv. Using the displayed results (use the 4th line, jnetpred), fill out the table below, which lists each stretch
of beta strand (E, green arrow) or alpha helix (H, red cylinder) for the first 52 residues of bovine
LDHA.
Residue Range    Sequence    Secondary Structure
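If you copy the jnetpred line as text, the rows of the table can also be extracted programmatically. The sketch below is one possible approach, assuming the prediction is a string with one character per residue in which H marks helix, E marks strand, and any other character (such as "-") marks neither. The function name structured_segments and the toy strings are hypothetical and are not real Jpred output.

```python
from itertools import groupby


def structured_segments(sequence: str, prediction: str, limit: int = 52):
    """List (start, end, residues, state) for each H or E run within the first `limit` residues.

    Residue numbers are 1-based, matching how positions are counted in a protein.
    """
    if len(sequence) != len(prediction):
        raise ValueError("sequence and prediction must have the same length")
    segments = []
    position = 1  # 1-based number of the first residue in the current run
    for state, run in groupby(prediction[:limit]):
        length = len(list(run))
        if state in ("H", "E"):
            start, end = position, position + length - 1
            segments.append((start, end, sequence[start - 1:end], state))
        position += length
    return segments


if __name__ == "__main__":
    toy_sequence = "ACDEFGHIKL"    # hypothetical residues
    toy_prediction = "--HHHH-EE-"  # hypothetical prediction string
    for start, end, residues, state in structured_segments(toy_sequence, toy_prediction):
        print(f"{start}-{end}\t{residues}\t{state}")
```

A run that crosses the residue limit is truncated at that limit, so the last row may describe only part of a longer helix or strand.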