Presented by:
Sudhakar Tripathi
Research Scholar, Computer Engineering Department, IT-BHU
Supervisor:
Prof. R.B. Mishra
Bioinformatics Definition
An interdisciplinary field involving biology, computer science, mathematics and statistics to analyze biological sequence data, genome content, arrangement and to predict the function & structure of macromolecules.
-David C. Mount
What is Bioinformatics?
The creation and development of advanced information and computational technologies for problems in biology, most commonly molecular biology (but increasingly in other areas of biology). As such, it deals with methods for storing, retrieving and analyzing biological data, such as nucleic acid (DNA/RNA) and protein sequences, structures, functions, pathways and genetic interactions.
Need for and Use of Bioinformatics
Bioinformatics plays a key role in modern biology and is especially important in:
- Molecular biology
- Genomics
- Functional genomics
- Systems biology
- Protein design and engineering
- Pharmaceutical development
- Medicine
- Ecology / population genetics
Need for and Use of Bioinformatics
- Finding genes, locating coding regions, predicting function
- Function, Evolution, Sequence, Structure (FESS relationships)
- Metabolic genotype, phenotype, redundancy
- Genes to pathways; genes to biological knowledge
- Proteomics: the proteome of an organism
- Assigning gene sets to different species: orthologs vs paralogs
- Expression profiles and their relation to metabolic pathways / genetic networks; experimentally analysing thousands of genes simultaneously
- Gene synteny between species: gene adjacency in genomes
- Polymorphisms, haplotypes, propensity for genetic disease
- Searching databases for nucleotide or amino acid sequences that match sequences in unknown samples
- Inferring a protein's shape and function from a given sequence of amino acids
- Finding all the genes and proteins in a given genome
- Determining sites in the protein structure where drug molecules can be attached
Aim of research in Bioinformatics
Understand the functioning of living things - to improve the quality of life.
- drug design
- identification of genetic risk factors
- gene therapy
- genetic modification of food crops and animals, etc.
- application to e.g. biotechnology
How will this benefit humanity?
- Genetically modified crops - contamination escapes?
- Genetically modified [...] - whisky?
- Genes & behaviour - really?
- Testing on animals - why?
- Gene therapy - benefits outweigh dangers?
- Bio weapons?
- Genetic material
- Information transfer (mRNA)
- Protein synthesis (tRNA/mRNA)
- Some catalytic activity
Most cellular functions are performed or facilitated by proteins.
- Primary biocatalyst
- Cofactor transport/storage
- Mechanical motion/support
- Immune protection
- Control of growth/differentiation
Genome Sequence
- Finding genes in genomic DNA: introns, exons, promoters
- Characterizing repeats in genomic DNA: statistics, patterns
- Duplications in the genome
The Complexity of Biological Data
- Nucleotide sequences
- Nucleotide structures
- Gene expressions
- Protein structures
- Protein functions
- Protein-protein interaction (pathways)
- Cell
- Cell signaling
- Tissues
- Organs
- Physiology
- Organisms
Types of cell
Prokaryotes
- Single cell
- No nucleus
- No membrane-bound organelles
Eukaryotes
- Single or multi cell
- Nucleus
- Organelles
Chromosomes
Exons/Introns splicing
Proteins
Proteins are biological molecules of primary importance to the functioning of living organisms. They perform many and varied functions:
- Structural proteins: the organism's basic building blocks, e.g. collagen, and the keratin of nails and hair.
- Enzymes: biological engines which mediate a multitude of biochemical reactions. Enzymes are usually very specific and catalyze only a single type of reaction, but they can play a role in more than one pathway.
- Transmembrane proteins: the cell's housekeepers, e.g. regulating cell volume, extracting and concentrating small molecules from the extracellular environment, and generating the ionic gradients essential for muscle and nerve cell function (the sodium/potassium pump is an example).
Information flow in the cell - Central Dogma
- DNA (4 bases, {A,C,G,T}) is transcribed into RNA (4 bases, {A,C,G,U})
- RNA is translated into protein (20 amino acid residues, {A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}) by reading triplets (codons) of RNA bases
- UCA -> Serine (S)
- Start codon: AUG -> Methionine (M)
- 3 stop codons (UGA, UAA, UAG) in most species
As always in biology, there are exceptions! Some species use different stop codons. The codon table (codon -> AA) is not the same for all species; mitochondria, for example, use a different codon table.
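The flow DNA -> RNA -> protein can be sketched in a few lines of Python. This is a minimal illustration, not a production translator: it assumes the input DNA is the coding strand (so transcription reduces to replacing T with U), it uses the standard codon table only (not the mitochondrial or other variant tables mentioned above), and the names `transcribe`, `translate` and `CODON_TABLE` are chosen here for illustration.

```python
from itertools import product

# Standard genetic code, written in the conventional U, C, A, G order.
# '*' marks the three stop codons (UAA, UAG, UGA).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Assumes `dna` is the coding strand: mRNA is the same sequence with U for T."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate from the first AUG up to (not including) the first in-frame stop.

    Raises KeyError if a codon contains characters other than A, C, G, U.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: end of the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGTCATGA")))  # AUG UCA UGA -> "MS"
```

Supporting a different genetic code would only require swapping the `AMINO_ACIDS` string, which is why the table is built from data rather than hard-coded in the translation loop.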
- Protein-protein interaction
- Database development
- Modeling genetics
- History
- Ancient DNA
- cDNAs
- Population genetics
- Simulations
- Finding SNPs
- Genome-wide association studies
- Homology search
- The sequence DB search problem
- Efficient searching in large data sets
- Interfacing with data to support genomics research - software, databases, and HGT analysis
- Finding signal in the datasets - statistical and computational methods
- Need to get more efficient in how the data is processed, organized, and accessed
- How do we represent the large amount of data? Dynamically and interactively?
- Gene network reconstruction from time series data
- Gene function prediction
- Clustering of gene expression data
- Characterization of metabolic pathways between different genomes
- Organizing biological knowledge in databases
- Signal transduction and other biochemical pathways
- Phylogenetics: predicting the genetic or evolutionary relationships among a set of organisms
- Alternative splicing
- Gene-disease relationships
- Microarray data collection, calibration and analysis
- Polymorphism analysis and visualization
- Pathway analysis: sequence comparison, searches in sequence databases
- Sequence matching: tracing phylogeny, finding family relationships between species by tracking similarities between species
- Molecular networks
- Protein threading
- Sequence comparisons and sequence-based database searches
- Clinical diagnosis
- Gene expression prediction
- Genetic linkage analysis
- Protein function prediction
- Simulated annealing
- Neural Fuzzy Systems
- Fuzzy Adaptive Resonance Theory
- Quantum Computing
- Data mining
- Theory of computation
- Quantum Evolutionary/Genetic Algorithm
- Artificial Intelligence
- Identification (Decision) Trees
- Genetic Algorithms
- Genetic Programming
- Cellular Automata
- Computer Science Algorithms
- Evolutionary Computation
- Optimization Techniques
- Agent-based computing
Gene Prediction
Overview of steps & strategies
- What sequence signals can be used?
- What other types of information can be used?
Algorithms
- HMMs, discriminant functions, neural nets
Gene prediction software
- 3 major types
- many, many programs!
What other types of information can be used?
- cDNAs & ESTs (experimental data, pairwise alignment)
- homology (sequence comparison, BLAST)
3) Combined "evidence-based"
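[Figure: from gene to protein. Genomic DNA (exons and introns) -> transcription -> pre-mRNA (Cap, Poly(A)) -> splicing -> mRNA (Cap, Poly(A)) -> translation -> protein. Splice sites: GT donor site, AG acceptor site.]
[Figure: genomic DNA and mRNA structure. Labels: Cap, 5'-UTR, start codon, stop codon, 3'-UTR, Poly(A).]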
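Two of the sequence signals labelled in the gene-structure figure, the start codon and the stop codon, are enough for a very crude gene-finding heuristic: scanning for open reading frames (ORFs). The sketch below is illustrative only. It assumes an intron-free DNA sequence, scans the three forward reading frames only, and ignores splice sites, promoters and the reverse strand, so it is far simpler than the HMM-based and evidence-based programs mentioned in these slides. The function name `find_orfs` and the `min_codons` parameter are hypothetical names introduced here.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) index pairs of ORFs in the three forward frames.

    An ORF here runs from an ATG to the first in-frame stop codon;
    `end` is exclusive and includes the stop codon. `min_codons` counts
    the codons before the stop (hypothetical threshold for illustration).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


sequence = "CCATGAAATAGGG"
for begin, end in find_orfs(sequence):
    print(begin, end, sequence[begin:end])  # 2 11 ATGAAATAG
```

In eukaryotic genomic DNA the coding sequence is interrupted by introns, so a real gene finder also has to model the GT donor and AG acceptor splice sites rather than rely on uninterrupted ORFs.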
Transduction: phage-mediated; donor and recipient share receptors; closely related bacteria; DNA limited to the amount that fits in the phage head
Conjugation: plasmids/transposons; cell-to-cell contact; distant relations; long DNA
PHYLOGENY
Similarity
Genes that share common sequences but are not necessarily related
Phylogenetics
What is Phylogenetics?
The science of identifying and interpreting evolutionary relationships between biological entities (species, genes, etc.)
What is a phylogenetic tree?
A dendrogram (tree) composed of nodes and branches representing the putative genealogy of the taxonomic units
Phylogenetic Trees
A Graph Representing The Evolutionary History Of Sequences
Relationship of sequences to one another (How everything is connected) Dissect the order of appearance of insertions, deletions, and mutations
Identify Related Sequences, Predict Function, Observe Epidemiology (Analyze changes in viral strains)
Tree Characteristics
Tree Properties
Clade: all the descendants of a common ancestor represented by a node
Distance: number of changes that have taken place along a branch
Phylogram
[Figure: example phylogram with taxon A and branch lengths 0.035, 0.012 and 0.009.]
Tree Types
Cladogram: shows the branching order of nodes
Phylogram: shows branching order and distances
Methods
- Distance-based
- Parsimony
- Maximum likelihood
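As a rough illustration of the distance-based family, the sketch below computes pairwise distances between aligned sequences and joins the closest clusters step by step. The slides do not name a specific algorithm; UPGMA is used here purely as one simple, well-known example, with the p-distance (fraction of differing sites) as the distance measure. The sequences are assumed to be already aligned and of equal length, the result is a cladogram-style nested tuple without branch lengths, and the names `p_distance` and `upgma` are chosen for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    """Cluster aligned sequences with UPGMA; returns a nested-tuple tree.

    `seqs` maps taxon name -> aligned sequence.
    """
    sizes = {name: 1 for name in seqs}  # cluster -> number of leaves
    dist = {
        frozenset((a, b)): p_distance(seqs[a], seqs[b])
        for a, b in combinations(seqs, 2)
    }
    while len(sizes) > 1:
        # Pick the closest pair of clusters and merge them.
        closest = min(dist, key=dist.get)
        a, b = sorted(closest, key=str)
        merged = (a, b)
        new_dist = {
            pair: d for pair, d in dist.items() if a not in pair and b not in pair
        }
        for other in sizes:
            if other == a or other == b:
                continue
            # Size-weighted average of the distances to the merged members.
            d = (
                dist[frozenset((a, other))] * sizes[a]
                + dist[frozenset((b, other))] * sizes[b]
            ) / (sizes[a] + sizes[b])
            new_dist[frozenset((merged, other))] = d
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
        dist = new_dist
    return next(iter(sizes))


tree = upgma({"A": "AAAA", "B": "AAAT", "C": "TTTT"})
print(tree)  # (('A', 'B'), 'C')
```

UPGMA assumes a constant rate of change along all branches; parsimony and maximum-likelihood methods work on the aligned characters directly instead of on a distance matrix.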
Chou-Fasman method
- Based on the propensities of different amino acids to adopt different secondary structures
- Predictions are made using a rules-based approach to identify groups of amino acids with shared secondary structure propensities
Garnier, Osguthorpe, Robson (GOR) method
- Statistical method of secondary structure prediction based on information theory & Bayesian probability
Multiple Sequence Alignment (MSA) methods
- Perform secondary structure prediction on a multiple sequence alignment as opposed to a single protein sequence
Neural network-based methods
- Example: Profile network from Heidelberg (PHD)
DNA Microarrays
THANKS!