
Bioinformatics: The biotyping tools

Ashutosh K. Shukla
Genetic Resources and Biotechnology Division, Central Institute of Medicinal and Aromatic Plants
Monday, February 6, 2006

SDS-PAGE
It is the basic technique for the analysis of proteins. SDS dissociates proteins into their constituent polypeptide chains; at saturation, approximately 1.4 g of SDS is bound per gram of polypeptide. PAGE then separates the polypeptide chains according to their molecular weights. The size of the polypeptide chains of a given protein can be determined by comparing their electrophoretic mobility with that of marker proteins whose polypeptide chain molecular weights are well characterised. Protein samples must be prepared in a way that allows complete denaturation and reduction of any disulfide bonds. Precautions must be taken to prevent proteolysis caused by impurities in the sample or by the inherent activity of the protein itself. Because the bound SDS gives polypeptides a roughly uniform charge-to-mass ratio, separation in SDS-PAGE is largely independent of protein sequence and native charge. The size of the pores in the gel decreases as the bisacrylamide:acrylamide ratio increases, reaching a minimum when the ratio is about 1:20. Most gels are cast with a ratio of 1:29, which has been shown empirically to resolve polypeptides that differ in size by as little as 3%.
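As a rough illustration of the 1:29 ratio, the sketch below splits a total monomer concentration (%T, taken here as grams per 100 mL) between acrylamide and bisacrylamide. The function name and the w/v interpretation of %T are assumptions made for this example, not part of the original protocol.

```python
def acrylamide_bis_grams(total_percent, volume_ml, bis_parts=1, acryl_parts=29):
    """Hypothetical helper: grams of acrylamide and bisacrylamide for a gel or stock.

    total_percent is %T (w/v, acrylamide + bisacrylamide combined).
    The default 1:29 bis:acrylamide ratio follows the text.
    """
    total_g = total_percent * volume_ml / 100  # grams of monomer in volume_ml
    parts = bis_parts + acryl_parts
    acryl_g = total_g * acryl_parts / parts
    bis_g = total_g * bis_parts / parts
    return acryl_g, bis_g


# A 30% stock at 1:29: grams per 100 mL
print(acrylamide_bis_grams(30, 100))  # (29.0, 1.0)
```

With this convention a 30% (w/v) stock at 1:29 contains 29 g acrylamide and 1 g bisacrylamide per 100 mL.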

The sieving properties of the gel are determined by the size of its pores, which is a function of the absolute concentrations of acrylamide and bisacrylamide used to cast the gel. The following table shows the linear range of separation obtained with gels cast with acrylamide concentrations ranging from 5% to 15%.

Acrylamide concentration (%)   Linear range of separation (kDa)
15                             12-43
10                             16-68
7.5                            36-94
5.0                            57-212
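The table can be used as a simple lookup when choosing a gel. A minimal sketch, assuming the ranges are inclusive; `suitable_gels` is a hypothetical helper:

```python
# Linear separation ranges (kDa) from the table, keyed by % acrylamide.
LINEAR_RANGES_KDA = {
    15.0: (12, 43),
    10.0: (16, 68),
    7.5: (36, 94),
    5.0: (57, 212),
}


def suitable_gels(size_kda):
    """Return acrylamide concentrations (%) whose linear range includes size_kda."""
    return [pct for pct, (low, high) in LINEAR_RANGES_KDA.items()
            if low <= size_kda <= high]


print(suitable_gels(50))  # [10.0, 7.5]
```

A 50 kDa polypeptide, for example, falls inside the linear range of both the 10% and the 7.5% gels; an empty list means none of the listed gels is in its linear range for that size.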

Some marker proteins used are as follows:

Protein                                     Approx. mol. wt. (kDa)
Bovine albumin                              66
Egg albumin                                 45
Glyceraldehyde-3-phosphate dehydrogenase    36
Bovine carbonic anhydrase                   29
Bovine pancreas trypsinogen                 24
Soyabean trypsin inhibitor                  20
Bovine milk α-lactalbumin                   14.2
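Molecular weight is usually estimated from a standard curve in which log10(MW) of the markers is plotted against their relative mobility (Rf, the distance migrated by a band divided by the distance migrated by the dye front); within the gel's linear range this plot is approximately a straight line. The sketch below fits that line by ordinary least squares. The function names are hypothetical, and the Rf values must come from your own gel.

```python
import math

# Approximate marker molecular weights (kDa) from the table above.
MARKERS_KDA = {
    "Bovine albumin": 66.0,
    "Egg albumin": 45.0,
    "Glyceraldehyde-3-phosphate dehydrogenase": 36.0,
    "Bovine carbonic anhydrase": 29.0,
    "Bovine pancreas trypsinogen": 24.0,
    "Soyabean trypsin inhibitor": 20.0,
    "Bovine milk alpha-lactalbumin": 14.2,
}


def fit_standard_curve(rf_values, weights_kda):
    """Least-squares fit of log10(MW) = intercept + slope * Rf."""
    if len(rf_values) != len(weights_kda) or len(rf_values) < 2:
        raise ValueError("need at least two markers with matching Rf values")
    ys = [math.log10(w) for w in weights_kda]
    n = len(rf_values)
    mean_x = sum(rf_values) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in rf_values)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rf_values, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw_kda(rf, slope, intercept):
    """Read an unknown band's molecular weight (kDa) off the fitted line."""
    return 10 ** (intercept + slope * rf)
```

Pass the measured Rf of each marker band (in the same order as the weights) to `fit_standard_curve`, then call `estimate_mw_kda` with the Rf of the unknown band. Estimates are only meaningful for bands inside the range spanned by the markers.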

Polypeptides separated by SDS-PAGE can be simultaneously fixed with methanol:glacial acetic acid and stained with Coomassie Brilliant Blue R250, a triphenylmethane textile dye also known as Acid Blue 83. The gel is destained in methanol/acetic acid. After destaining, the gel may be stored indefinitely in water in a sealed plastic bag without any diminution in the intensity of the stain. However, gels stored in water tend to swell; to avoid this, 20% glycerol is added to the aqueous storage solution. Silver staining methods give a higher sensitivity of detection. Dried gels can be stored covered in Saran wrap.

In 2D protein gel electrophoresis, isoelectric focussing of proteins is combined with SDS-PAGE. In the first dimension, proteins are separated according to their isoelectric point (pI), i.e. by charge; in the second dimension, the isoelectrically focussed proteins are separated according to their mass by standard SDS-PAGE.

Agarose gel electrophoresis


Suitable for separating nucleic acid fragments ranging in size from a few hundred base pairs to about 20 kb. The larger the pore size of the gel, the larger the ball (random coil) of DNA that can pass through it, and hence the larger the molecules that can be separated. Once the globular volume of the DNA molecule exceeds the pore size, the molecule can pass through only by reptation, i.e. by snaking through the pores end-first. Mobility then becomes size-independent and separation is lost. In addition to resolving DNA fragments of different lengths, agarose gel electrophoresis can be used to separate different molecular configurations of a DNA molecule (for example, the supercoiled, nicked circular and linear forms of a plasmid). It can also be used to investigate protein-nucleic acid interactions in the so-called gel retardation or band shift assay.

Nucleic acids are detected with ethidium bromide (EtBr), a fluorescent dye that intercalates between the bases of nucleic acids. The dye can be incorporated into the gel, added to the nucleic acid samples before loading, or applied after the run by immersing the gel in an EtBr solution in a staining tank before visualization on a transilluminator. The last option is recommended, because binding of EtBr alters the mass and rigidity of the nucleic acid and hence its mobility. Because it binds to DNA, ethidium bromide is a very strong mutagen and is possibly a carcinogen or teratogen. Trace amounts of ethidium bromide in gels should not pose a hazard, but gels with higher concentrations (e.g. gels that are dark pink or red) should not be placed in laboratory trash. The following disposal rule is recommended:
Less than 0.1% ethidium bromide: place in laboratory trash.
0.1% or more: place in a biohazard box for incineration.
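The 0.1% threshold is easier to apply once it is put in the same units as working stain solutions. The slide does not say whether the percentage is w/v; the sketch below assumes it is, so 0.1% corresponds to 1 mg/mL (1000 µg/mL), far above the 0.5 µg/mL commonly used for staining gels. The function names are hypothetical.

```python
def ug_per_ml_to_percent_wv(conc_ug_per_ml):
    """Convert µg/mL to % (w/v): 1% = 1 g/100 mL = 10,000 µg/mL."""
    return conc_ug_per_ml / 10_000


def etbr_disposal(conc_ug_per_ml):
    """Apply the 0.1% rule from the text, assuming the percentage is w/v."""
    if ug_per_ml_to_percent_wv(conc_ug_per_ml) < 0.1:
        return "laboratory trash"
    return "biohazard box for incineration"


print(etbr_disposal(0.5))   # laboratory trash
print(etbr_disposal(1000))  # biohazard box for incineration
```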

If very large DNA molecules (1000-2000 kb) are to be separated by electrophoresis, pulsed-field gel electrophoresis (PFGE) is necessary. Here the separation appears to depend on the repeated perturbation of the orientation of the DNA by the electric field and on the degree of extension of the long DNA molecules. However, in orthogonal PFGE the DNA samples do not migrate in straight-line tracks. This disadvantage is removed by field-inversion gel electrophoresis (FIGE), which produces good resolution up to 2000 kb without the need for a complicated perpendicular-field gel apparatus. FIGE uses a conventional electrophoresis apparatus with an electric field that pulses forward and then in reverse, with a pause between each phase.
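The FIGE pulse program described above can be written out as a repeating schedule. This is a hypothetical sketch: it assumes equal field strength in both directions, so net forward migration requires the forward pulse to be longer than the reverse pulse. Real instruments may also ramp pulse times during a run, which this ignores.

```python
def fige_schedule(cycles, forward_s, reverse_s, pause_s):
    """Return a list of (phase, seconds) for a hypothetical FIGE pulse program.

    Each cycle is: forward pulse, pause, reverse pulse, pause.
    Assumes equal field strength in both directions, so net forward
    migration needs forward_s > reverse_s.
    """
    if forward_s <= reverse_s:
        raise ValueError("forward pulse must be longer than reverse pulse")
    schedule = []
    for _ in range(cycles):
        schedule.extend([("forward", forward_s), ("pause", pause_s),
                         ("reverse", reverse_s), ("pause", pause_s)])
    return schedule


def net_forward_seconds(schedule):
    """Forward time minus reverse time; pauses contribute nothing."""
    sign = {"forward": 1, "reverse": -1, "pause": 0}
    return sum(sign[phase] * seconds for phase, seconds in schedule)


run = fige_schedule(cycles=2, forward_s=3, reverse_s=1, pause_s=0.5)
print(net_forward_seconds(run))  # 4.0
```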

Blotting Techniques
Southern hybridization (Southern, 1975): electrophoresed DNA is transferred to a membrane and detected using nucleic acid probes.
Northern hybridization (Alwine et al., 1979; Thomas, 1980): electrophoresed RNA is transferred to a membrane and detected using nucleic acid probes.
Western blotting (Burnette, 1981): electrophoresed proteins are transferred to a membrane and detected using protein-ligand interactions (e.g. antibodies, lectins, sandwich reactions).
South-western blotting (Vinson et al., 1987; Staudt et al., 1988): used to screen for and isolate clones expressing fusion proteins in which the foreign sequence encodes a DNA-binding protein (e.g. a transcription factor) that binds specifically to a particular DNA sequence. Probing is done with a duplex DNA oligonucleotide containing the sequence for which the DNA-binding protein is specific.
Plaque hybridization (Jones & Murray, 1975; Kramer et al., 1976; Benton & Davis, 1977): recombinant phages are screened using nucleic acid probes.
Colony hybridization (Grunstein & Hogness, 1975): a screening procedure that detects DNA sequences in transformed colonies by in situ hybridization with a radioactive nucleic acid probe.

Southern Blotting: Gel Transfer


Legend: Detection of specific DNA fragments by gel-transfer hybridization (Southern blotting). (A) The mixture of double-stranded DNA fragments generated by restriction nuclease treatment of DNA is separated according to length by electrophoresis. (B) A sheet of either nitrocellulose paper or nylon paper is laid over the gel, and the separated DNA fragments are transferred to the sheet by blotting. The gel is supported on a layer of sponge in a bath of alkali solution, and the buffer is sucked through the gel and the nitrocellulose paper by paper towels stacked on top of the nitrocellulose. As the buffer is sucked through, it denatures the DNA and transfers the single-stranded fragments from the gel to the surface of the nitrocellulose sheet, where they adhere firmly. This transfer is necessary to keep the DNA firmly in place while the hybridization procedure (D) is carried out. (C) The nitrocellulose sheet is carefully peeled off the gel. (D) The sheet containing the bound single-stranded DNA fragments is placed in a sealed plastic bag together with buffer containing a radioactively labeled DNA probe specific for the required DNA sequence. The sheet is exposed for a prolonged period to the probe under conditions favoring hybridization. (E) The sheet is removed from the bag and washed thoroughly, so that only probe molecules that have hybridized to the DNA on the paper remain attached. After autoradiography, the DNA that has hybridized to the labeled probe will show up as bands on the autoradiograph. An adaptation of this technique to detect specific sequences in RNA is called Northern blotting. In this case mRNA molecules are electrophoresed through the gel and the probe is usually a single-stranded DNA molecule.

Hybrid Released Translation (HRT)


A method used to detect the proteins encoded by cloned DNA. The cloned DNA is bound to a nitrocellulose filter, and a crude preparation of mRNA is hybridized to the filter-bound DNA. Only mRNA sequences homologous to the cloned DNA are retained on the filter. These mRNA molecules can then be eluted from the filter by heating or with formamide. The purified mRNA is then placed in an in vitro translation system, and the proteins encoded by the message can be analysed by electrophoresis through a polyacrylamide gel.

Hybrid Arrested Translation (HART)


A method used to identify the proteins encoded by a cloned DNA sequence. A crude cellular mRNA preparation, composed of many individual types of mRNA, is hybridized with cloned DNA. Only mRNA molecules homologous to the cloned DNA anneal to it, and these hybridized mRNAs cannot be translated. When the mixture is put into an in vitro translation system, only the remaining mRNA molecules give protein products; these are compared with the proteins obtained from the whole mRNA preparation, and the missing products identify the proteins encoded by the cloned DNA.

Modifying enzymes
A restriction enzyme, or restriction endonuclease, is a protein produced by bacteria that cleaves DNA at specific sites along the molecule. In the bacterial cell, restriction enzymes cleave foreign DNA, thus eliminating infecting organisms. Restriction enzymes can be isolated from bacterial cells and used in the laboratory to manipulate fragments of DNA, such as those that contain genes; for this reason they are indispensable tools of recombinant DNA technology, or genetic engineering.

DNA ligase from E. coli and from bacteriophage T4 seals single-stranded nicks between adjacent nucleotides in a duplex DNA chain. Although the reactions catalyzed by the E. coli and T4 enzymes are similar, the enzymes differ in their cofactor requirements: the T4 enzyme requires ATP, whilst the E. coli enzyme requires NAD+. Secondly, T4 DNA ligase can ligate blunt ends as well as cohesive ends, whereas E. coli DNA ligase does not catalyze blunt-ended ligation except under special conditions of macromolecular crowding. The optimum temperature for ligation of nicked DNA is 37 °C, but at this temperature the hydrogen-bonded joint between the sticky ends is unstable. The optimum temperature for ligating cohesive termini is therefore a compromise between the rate of enzyme action and the association of the termini, and has been found experimentally to be in the range 4-15 °C.
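To see how a restriction enzyme produces the mixture of fragments that is later separated on a gel, the sketch below finds every occurrence of a recognition site in a linear sequence and reports the fragment lengths. EcoRI (G^AATTC, cut after the G) is used as the default; only the top strand is modelled, and `digest_linear` is a hypothetical helper.

```python
def digest_linear(seq, site="GAATTC", cut_offset=1):
    """Fragment lengths after cutting a linear DNA at every occurrence of site.

    cut_offset is where the top strand is cut within the site
    (1 for EcoRI, G^AATTC). Only the top strand is modelled.
    """
    seq, site = seq.upper(), site.upper()
    cuts = []
    start = seq.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


print(digest_linear("AAGAATTCTTTTGAATTCAA"))  # [3, 10, 7]
```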

Terminal deoxynucleotidyl transferase, isolated from calf thymus, provides a means of synthesising homopolymeric extensions (tails) on DNA. When presented with a single deoxynucleoside triphosphate, it repeatedly adds nucleotides to the 3'-OH termini of a population of DNA molecules.

T4 polynucleotide kinase is a polynucleotide 5'-hydroxyl kinase that catalyzes the transfer of the gamma-phosphate from ATP to the 5'-OH group of single- and double-stranded DNAs and RNAs, oligonucleotides or nucleoside 3'-monophosphates (forward reaction). The reaction is reversible.

The 5' -> 3' exonuclease activity of E. coli DNA polymerase I makes it unsuitable for many applications. However, this activity can readily be removed from the holoenzyme. Exposure of DNA polymerase I to the protease subtilisin cleaves the molecule into a small fragment, which retains the 5' -> 3' exonuclease activity, and a large piece called the Klenow fragment. The Klenow fragment has DNA polymerase and 3' -> 5' exonuclease activities and is widely used in molecular biology. In addition to being generated by proteolysis, it can be expressed in bacteria from a truncated form of the DNA polymerase I gene.

Linkers are short, self-complementary synthetic oligomers that form blunt-ended duplexes containing a restriction endonuclease recognition sequence. They are used to create cohesive ends on a target DNA molecule. Linkers may be obtained with either phosphorylated or unphosphorylated 5' ends for use in different cloning strategies. Blunt-end ligation of phosphorylated linkers to experimental DNA fragments produces concatemers of linkers at both ends of each fragment; digestion with the appropriate restriction endonuclease then removes the concatemers and creates unique cohesive ends. Because this digestion step is required, the method can be used only if the DNA fragment of interest contains no internal recognition sites identical to the site in the linker.

Adaptors are used to interconvert blunt or cohesive ends of DNA molecules to other blunt or cohesive ends, and thus can be used to join non-complementary ends of two DNA molecules. Because adaptors are short duplexes with either one cohesive end and one blunt end or two cohesive ends, an adaptor can complement the cohesive end produced by one restriction enzyme and present a different cohesive end, such as that produced by another restriction enzyme. Each adaptor sequence requires a complementary sequence to interconvert cohesive ends.
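Two properties from the linker discussion are easy to check computationally: whether a linker oligomer is self-complementary (identical to its own reverse complement), and whether the insert contains an internal copy of the linker's recognition site. A minimal sketch with hypothetical helper names; the 8-mer GGAATTCC is used only as an example of an EcoRI linker sequence.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_self_complementary(oligo):
    """True if two copies of the oligo can pair into a blunt duplex."""
    return oligo.upper() == reverse_complement(oligo)


def linker_usable(insert, site):
    """The text's rule: the insert must not contain an internal copy of the site."""
    return site.upper() not in insert.upper()


print(is_self_complementary("GGAATTCC"))        # True
print(linker_usable("ATGGAATTCTAG", "GAATTC"))  # False: internal EcoRI site
```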

Using PCR to quantify RNA


Semi-quantitative RT-PCR:
1. Make cDNA.
2. Set up PCR.
3. End reactions at different cycles.
4. Compare amplicon intensity on an agarose gel.

Results (panels from top to bottom):
Panel 1: cycle 14 = highest copy number
Panel 2: cycle 20 = moderate-high
Panel 3: cycle 24 = moderate-low
Panel 4: cycle 28 = lowest copy number
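The reasoning behind ending reactions at different cycles can be made explicit with the ideal PCR relation N = N0 × (1 + E)^n, where E is the amplification efficiency per cycle. The sketch below assumes constant efficiency (E = 1, i.e. doubling every cycle) and that comparisons are made in the exponential phase, before the plateau. Under those assumptions, a band that reaches the same intensity k cycles earlier reflects roughly (1 + E)^k times more starting template. The function names are hypothetical.

```python
def copies_after(initial_copies, cycles, efficiency=1.0):
    """Ideal exponential PCR: N = N0 * (1 + E) ** n."""
    return initial_copies * (1 + efficiency) ** cycles


def relative_starting_amount(cycle_a, cycle_b, efficiency=1.0):
    """Fold excess of template in sample A over sample B, assuming both reach
    the same band intensity at cycle_a and cycle_b respectively."""
    return (1 + efficiency) ** (cycle_b - cycle_a)


print(relative_starting_amount(14, 28))  # 16384.0
```

If the panels reflect the cycle at which each band first reaches a comparable intensity, the 14-cycle gap between panels 1 and 4 would correspond to about 2^14-fold more template under these ideal assumptions; real efficiencies below 100% give smaller ratios.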

PCR-based fingerprinting

RAPD

Amplification of genomic DNA sequences with arbitrary (or even specific) primers and a thermostable DNA polymerase.
Electrophoretic separation of the amplified fragments.
Detection of polymorphic banding patterns by staining with EtBr.

RAPD markers are dominant and quick to generate, but are sensitive to reaction conditions, DNA quantity and quality, and the temperature profile.
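Because RAPD markers are dominant, each band is usually scored simply as present (1) or absent (0), and samples are then compared with a similarity coefficient. The sketch below uses the Dice (Nei-Li) coefficient, 2a / (2a + b + c), where a counts bands present in both samples and b and c count bands present in only one. The choice of coefficient and the helper name are assumptions; the slide does not specify an analysis method.

```python
def dice_similarity(profile_a, profile_b):
    """Dice (Nei-Li) similarity for presence/absence band scores (1/0).

    Equals 2a / (2a + b + c): a = bands shared, b and c = bands unique to
    each sample. Returns 0.0 if neither sample has any band.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must score the same set of bands")
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    total = sum(profile_a) + sum(profile_b)
    return 2 * shared / total if total else 0.0


# Hypothetical band scores for two samples across five band positions
print(dice_similarity([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))  # 0.6666666666666666
```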

To Sum Up
Bioinformatics is the study of inherent structure in biological data and in biological systems in general.

Bioinformatics is an interdisciplinary research area at the interface between the biological and computational sciences. The ultimate goal of bioinformatics is to uncover the wealth of biological information hidden in the mass of data and to obtain a clearer insight into the fundamental biology of organisms. This new knowledge could have profound impacts on fields as varied as human health and medicine, behavioural science, agriculture, the environment, energy and biotechnology.

Targets for Bioinformatics


The development of new algorithms and statistics with which to assess relationships among members of large data sets.
The analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains and protein structures.
The development and implementation of tools that enable efficient access to and management of different types of information.

Main Business Promoters of Bioinformatics


International: NetGenics, Compugen, Oxford Molecular Group, InforMax, Accelrys, Celera Genomics, IBM, Life Sciences Solutions, Incyte Genomics, Lion Biosciences, DoubleTwist etc.
National: TCS, Infosys, Wipro, Kshema Technology, Jalaja Technologies, Mascon, Satyam, Strand Genomics, Ocimum Biosolutions, Sysarris, Scinova India, Reliance, Biocon, Cytogenomics etc.

Sequence and Structure resources


Biological information is quite diverse in nature. It includes published literature, taxonomy, images (of plants, animals, two-dimensional gels, DNA chips etc.), protein or nucleic acid sequences, structures, metabolic pathways etc. This diversity of biological data has given rise to an equally diverse range of databases and resources. These databases and resources have been catalogued under various categories at INFOBIOGEN in France (http://www.infobiogen.fr/). The information in the databases can be browsed through graphical user interfaces that vary from one database to another.

Sequence Databases
Primary: primary databases tend to be "archival", i.e. they take the data as produced by wet-lab experiments without any additional information, e.g.:
GenBank (NCBI, USA): DNA
EMBL (EMBL-EBI, Europe): DNA
DDBJ (Japan): DNA
PDB (RCSB, USA): 3D structure
PIR (USA): protein
SWISS-PROT (Switzerland): protein
Secondary: these databases are more annotated and descriptive, e.g.:
OMIM (Online Mendelian Inheritance in Man): gene and clinical data
GDB (Genome Data Base): human
PROSITE, BLOCKS: protein motifs
KEGG, EcoCyc: metabolism

Uses of Sequence Databases


1. Information retrieval and deposition
2. Analysis: "given a new DNA sequence, what's in it?"
(i) Finding homologues
(ii) Finding genes
(iii) Finding motifs, e.g. DNA-binding sites
3. The databases can also be put under different categories based on the kind of information they contain, such as nucleic acid databases or protein databases.

GenBank
Accession Number (AC): the accession number is allocated when the record is first entered into the database and is never changed. It consists of one letter followed by five digits (X12345) or, more recently, two letters followed by six digits (XY123456). This number is also reported in EMBL reports. For example, in a GenBank report: ACCESSION U49897; in a FASTA report: gb|U49897.
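The two accession formats described above can be checked with a regular expression. This sketch covers only those two formats and the simple `gb|U49897` FASTA style; real GenBank identifiers also come in other formats and may carry a version suffix (for example `.1`), which this deliberately rejects. The names are hypothetical.

```python
import re

# Only the two formats from the text: 1 letter + 5 digits, or 2 letters + 6 digits.
ACCESSION_RE = re.compile(r"(?:[A-Z]\d{5}|[A-Z]{2}\d{6})")


def parse_fasta_id(identifier):
    """Split a simple FASTA-style identifier such as 'gb|U49897'."""
    db, _, accession = identifier.partition("|")
    if not ACCESSION_RE.fullmatch(accession):
        raise ValueError(f"unexpected accession format: {accession!r}")
    return db, accession


print(parse_fasta_id("gb|U49897"))  # ('gb', 'U49897')
```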

Protein database searching is one of the most important methods and is two to five times more sensitive than DNA database searching, for several reasons:
(i) The DNA alphabet is smaller (4 letters), so each position carries less information than in a protein, where there are 20 possible amino acids at each position.
(ii) The genetic code is redundant: most amino acids are encoded by several codons. A protein product may therefore be identical to the query sequence even though no identical match is obtained at the DNA level.
(iii) Protein sequence similarity is more conserved through time than DNA sequence similarity.

The search for protein orthologues is becoming increasingly important in molecular biology. Now that the complete genomes of Saccharomyces cerevisiae (yeast; a unicellular eukaryote) and Caenorhabditis elegans (nematode; a multicellular eukaryote) have been sequenced, work is well underway to identify orthologous groups. If a novel human protein can be matched with orthologous proteins from yeast or the worm, the investigator is likely to save a lot of time (and money!), having identified a likely function for the protein.
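Reasons (i) and (ii) can be illustrated in a few lines. Each DNA position carries at most log2(4) = 2 bits of information versus log2(20) ≈ 4.32 bits for a protein position, and two DNA sequences that share only 5 of 12 nucleotides can still encode the same peptide. The sketch uses the standard genetic code; the example sequences are made up for illustration and the helper names are hypothetical.

```python
import math
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate the reading frame starting at position 0 ('*' marks a stop)."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def percent_identity(a, b):
    """Ungapped, position-by-position identity over the shorter sequence."""
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / min(len(a), len(b))


# Maximum information per position, in bits.
BITS_PER_NUCLEOTIDE = math.log2(4)    # 2.0
BITS_PER_AMINO_ACID = math.log2(20)   # about 4.32

seq1 = "CTGCGTAGCGGC"  # made-up example
seq2 = "TTACGGTCAGGG"  # different codons for the same residues
print(translate(seq1), translate(seq2))        # LRSG LRSG
print(round(percent_identity(seq1, seq2), 1))  # 41.7
```

A DNA search would see these two sequences as only about 42% identical, while a protein search would see a perfect match.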

Important web sites for genome databases and genome sequencing

The Institute for Genomic Research (TIGR): http://www.tigr.org
Celera Sequencing Centre: www.celera.com
Sanger Centre: www.sanger.ac.uk

DNA and Protein Structure Databases


NDB: Nucleic Acid Database
PDB: Protein Data Bank

There are two principal ways of submitting DNA sequences to GenBank:
(i) BankIt: a World Wide Web-based submission tool, recommended for simple submissions. With BankIt you can indicate coding regions on an mRNA along with a product and gene name.
(ii) Sequin: for more control over annotating your entry, or for segmented records or very long entries, Sequin, a stand-alone submission tool, is suggested. It works on most computer platforms (e.g. Mac, PC/Windows and UNIX) and is suitable for a wide range of sequence lengths and complexities.

WEBIN is a web tool for the submission of nucleotide sequences to the EMBL database.

Submission of ESTs, STSs and GSSs: batches of ESTs (expressed sequence tags), STSs (sequence-tagged sites) and GSSs (genome survey sequences) can be submitted via special streamlined procedures.
Submission of HTGS records: NCBI has developed a protocol for high-throughput genome sequencing centres to use when they submit large genomic records (usually cosmids or BACs).
Protein-only submissions: directly sequenced proteins without accompanying nucleotide sequences are submitted to SWISS-PROT at the EBI.
Structure submissions: all X-ray and NMR structures can be deposited at PDB using ADIT (AutoDep Input Tool).

Application of multiple sequence alignments (e.g. ClustalW)

Multiple sequence alignments of both nucleotide and protein sequences are widely used for extensive sequence analysis. They are used to build phylogenetic trees that show the evolutionary relatedness of various organisms and provide an estimate of the biological relatedness between them. Multiple sequence alignments also find applications in detecting homology between newly sequenced genes and existing sequence families, demonstrating homology in multigene families, predicting secondary structure (DNA or protein), comparative homology modelling, etc.
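As a first step towards a tree, relatedness is often summarised as pairwise distances computed from the alignment. A minimal sketch using the uncorrected p-distance (the fraction of differing sites), skipping columns where either sequence has a gap. This is one simple choice of distance; tree-building programs also offer corrected measures. The example alignment and helper names are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(alignment):
    """Pairwise p-distances for a dict of name -> aligned sequence."""
    return {(m, n): p_distance(alignment[m], alignment[n])
            for m, n in combinations(alignment, 2)}


# Hypothetical aligned sequences ('-' = gap).
example = {"seqA": "ACGT-ACGT", "seqB": "ACGTTACGA", "seqC": "TCGT-ACCA"}
for pair, d in distance_matrix(example).items():
    print(pair, d)
```

The resulting matrix is the kind of input that distance-based tree methods (for example neighbour joining or UPGMA) start from.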

Basic local alignment search tool (BLAST)

The BLAST web server at http://www.ncbi.nlm.nih.gov/ is the most widely used server for sequence database searches. BLAST is a heuristic method for finding the highest-scoring locally optimal alignments between a query sequence and a database. The important simplification made by the original BLAST algorithm is that it does not allow gaps, although it does allow multiple hits to the same sequence; later versions (Gapped BLAST, 1997) also report gapped alignments. The BLAST algorithm and family of programs rely on work on the statistics of ungapped sequence alignments by Karlin and Altschul.
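The ungapped heuristic can be sketched as "find short exact word matches, then extend each one without gaps until the score drops too far below its best value". This toy version uses simple match/mismatch scores on DNA, an assumed word length of 4 and an assumed X-drop cutoff of 3. It is not NCBI's implementation, which uses substitution matrices, neighbourhood words for proteins and Karlin-Altschul statistics to assess significance; all names here are hypothetical.

```python
def word_hits(query, subject, w):
    """All (query_pos, subject_pos) pairs where a length-w word matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    return [(i, j) for i in range(len(query) - w + 1)
            for j in index.get(query[i:i + w], [])]


def best_extension(pairs, match, mismatch, x_drop):
    """Extend along (a, b) pairs; return (best score gain, length at best)."""
    best = run = best_len = 0
    for length, (a, b) in enumerate(pairs, start=1):
        run += match if a == b else mismatch
        if run > best:
            best, best_len = run, length
        elif best - run > x_drop:
            break  # score fell too far below its best: stop (X-drop)
    return best, best_len


def ungapped_search(query, subject, w=4, match=1, mismatch=-1, x_drop=3):
    """Best ungapped HSP as (score, query_start, subject_start, length), or None."""
    best_hsp = None
    for i, j in word_hits(query, subject, w):
        seed = w * match  # the word itself is an exact match
        right, r_len = best_extension(zip(query[i + w:], subject[j + w:]),
                                      match, mismatch, x_drop)
        left, l_len = best_extension(zip(reversed(query[:i]), reversed(subject[:j])),
                                     match, mismatch, x_drop)
        hsp = (seed + left + right, i - l_len, j - l_len, w + l_len + r_len)
        if best_hsp is None or hsp[0] > best_hsp[0]:
            best_hsp = hsp
    return best_hsp


print(ungapped_search("CCACGTACGTCC", "TTTTACGTACGTAAAA"))  # (8, 2, 4, 8)
```

Several word hits fall on the same diagonal and extend to the same segment pair; real BLAST avoids this redundant work, which the sketch does not attempt.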

Questions in your mind

THANK YOU
