Final Bioinformatics Seminar

BIOINFORMATICS & COMPUTING METHODS
Presented by:
Sudhakar Tripathi
Research Scholar, Computer Engineering Department, IT-BHU
Supervisor:
Prof. R. B. Mishra
Bioinformatics Definition
An interdisciplinary field involving biology, computer science, mathematics and statistics to analyze biological sequence data, genome content, arrangement and to predict the function & structure of macromolecules.
-David C. Mount
What is Bioinformatics?
The creation and development of advanced information and computational technologies for problems in biology, most commonly molecular biology (but increasingly in other areas of biology). As such, it deals with methods for storing, retrieving and analyzing biological data, such as nucleic acid (DNA/RNA) and protein sequences, structures, functions, pathways and genetic interactions.
Need for and Use of Bioinformatics
Bioinformatics plays a key role in modern biology and is especially important in:
- Molecular biology
- Genomics
- Functional genomics
- Systems biology
- Protein design and engineering
- Pharmaceutical development
- Medicine
- Ecology / population genetics
Typical questions and tasks include:
- Finding genes, locating coding regions, predicting function
- Function, Evolution, Sequence, Structure (FESS relationships)
- Metabolic genotype, phenotype, redundancy
- Genes to pathways; genes to biological knowledge
- Proteomics: the proteome of an organism
- Assigning gene sets to different species: orthologs vs paralogs
- Expression profiles and their relation to metabolic pathways / genetic networks (experimentally analysing thousands of genes simultaneously)
- Gene synteny between species: gene adjacency in genomes
- Polymorphisms, haplotypes, propensity for genetic disease
- Searching databases for nucleotide or amino acid sequences that match sequences in unknown samples
- Inferring a protein's shape and function from a given sequence of amino acids
- Finding all the genes and proteins in a given genome
- Determining sites in the protein structure where drug molecules can be attached
Aim of research in Bioinformatics
Understand the functioning of living things, to improve the quality of life:
- Drug design
- Identification of genetic risk factors
- Gene therapy
- Genetic modification of food crops and animals, etc.
- Application to e.g. biotechnology
How will this benefit humanity?
- Genetically modified crops: contamination escapes
- Genetically modified ...: whisky?
- Genes & behaviour: really?
- Testing on animals: why?
- Gene therapy: benefits outweigh dangers?
- Bio weapons?
DNA
- Genetic material
RNA
- Information transfer (mRNA)
- Protein synthesis (tRNA/mRNA)
- Some catalytic activity
Proteins
Most cellular functions are performed or facilitated by proteins.
- Primary biocatalyst
- Cofactor transport/storage
- Mechanical motion/support
- Immune protection
- Control of growth/differentiation
Genome Sequence
- Finding genes in genomic DNA: introns, exons, promoters
- Characterizing repeats in genomic DNA: statistics, patterns
- Duplications in the genome
The Complexity of Biological Data
- Nucleotide sequences
- Nucleotide structures
- Gene expressions
- Protein structures
- Protein functions
- Protein-protein interaction (pathways)
- Cell
- Cell signaling
- Tissues
- Organs
- Physiology
- Organisms
Basic cell architecture

Cells are the smallest functional units of life
Types of cell
Prokaryotes
- Single cell
- No nucleus
- No membrane-bound organelles
- One circular DNA chromosome (often with additional small plasmids)
- No post-transcriptional modification of mRNA
Eukaryotes
- Single or multi cell
- Nucleus
- Organelles
- Chromosomes
- Exon/intron splicing
Proteins
Proteins are biological molecules of primary importance to the functioning of living organisms. They perform many and varied functions:
- Structural proteins: the organism's basic building blocks, e.g. collagen, and the keratin of nails and hair
- Enzymes: biological engines which mediate a multitude of biochemical reactions. Enzymes are usually very specific and catalyze only a single type of reaction, but they can play a role in more than one pathway.
- Transmembrane proteins: the cell's housekeepers, e.g. regulating cell volume, extracting and concentrating small molecules from the extracellular environment, and generating the ionic gradients essential for muscle and nerve cell function (the sodium/potassium pump is an example)
Understanding protein structure is key to understanding function and dysfunction

Amino Acid Sequences
- AAs polymerised into chains (residues)
- Gene sequence determines protein sequence
Protein Structure
- Chains fold into specific compact structures
- Structure formation (folding) is spontaneous
- Sequence determines structure
- Structure determines function
Information flow in the cell - Central Dogma
- DNA (4 bases, {A,C,G,T}) is transcribed into RNA (4 bases, {A,C,G,U})
- RNA is translated into protein (20 amino acid residues, {A,C,D,E,F,G,H,I,K,L,M,N,P,Q,R,S,T,V,W,Y}) by reading triplets of bases (codons)
  - UCA -> Serine (S)
  - Start codon AUG -> Methionine (M)
  - 3 stop codons (UGA, UAA, UAG) in most species
- As always in biology, there are exceptions! Some species use different stop codons. The codon table (codon -> AA) is not the same for all species, and mitochondria have their own codon tables.
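The flow from DNA to RNA to protein can be sketched in a few lines of Python. This is only a minimal illustration: the codon table below is deliberately partial (just the codons used in the example, taken from the standard genetic code), and the function names `transcribe` and `translate` are made up for this sketch.

```python
# Partial standard codon table (RNA codon -> one-letter amino acid).
# Only the codons used in this example are listed; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # start codon, methionine
    "UCA": "S",  # serine
    "GGU": "G",  # glycine
    "AAA": "K",  # lysine
    "UGA": "*", "UAA": "*", "UAG": "*",  # the three standard stop codons
}


def transcribe(dna):
    """Return the mRNA for a coding-strand DNA sequence (T replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(rna):
    """Translate from the first AUG to a stop codon; unknown codons become 'X'."""
    start = rna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(rna) - 2, 3):
        aa = CODON_TABLE.get(rna[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


rna = transcribe("ATGTCAGGTAAATGA")
print(rna)             # AUGUCAGGUAAAUGA
print(translate(rna))  # MSGK
```

Keeping the table as plain data means a different genetic code (for example a mitochondrial one) could be supported by passing in another dictionary rather than changing the logic.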
How DNA Codes for Protein
Various Problem areas in Bioinformatics

- Sequencing
- Sequence analysis
- Sequence alignment
- RNA Secondary Structure Prediction
- Identifying Gene Regulatory Networks
- Protein structure analysis
- Protein structure comparison
- Protein folding domain pattern recognition
- Sequence representation
- Genotype Analysis
- Splicing Site prediction
- Protein-protein interaction
- Database development
- Modeling genetics history
- Ancient DNA
- cDNAs
- Population Genetics Simulations
- Finding SNPs
- Genome-wide Association Studies
- Homology Search
- The Sequence DB Search problem: efficient searching in large data sets
- Interfacing with data to support genomics research: software, databases, and HGT Analysis
- Finding signal in the datasets: statistical and computational methods
- Processing, organizing and accessing the data more efficiently: how do we represent the large amount of data? Dynamically and interactively?
- Gene network reconstruction from time series data
- Gene function prediction
- Clustering of Gene Expression Data
- Characterization of Metabolic Pathways between Different Genomes
- Organizing biological knowledge in databases
- Signal transduction and other biochemical pathways
- Phylogenetics: predicting the genetic or evolutionary relationships of a set of organisms
- Alternative splicing
- Gene-disease relationships
- Microarray data collection, calibration and analysis
- Polymorphism Analysis and visualization
- Pathway Analysis: sequence comparison, searches in sequence databases
- Sequence Matching: tracing phylogeny, finding family relationships between species by tracking similarities between species
- Molecular Networks
- Protein Threading
- Sequence Comparisons and Sequence-Based Database Searches
- Clinical Diagnosis
- Gene Expression Prediction
- Genetic Linkage Analysis
- Protein Function Prediction
Various Computational methods used in Bioinformatics

- Mathematical Computing methods
- Statistical computing methods
- Intelligent Computing methods
- Neural Network Approaches
- Integrated Differential Fuzzy Clustering
- Fuzzy Computing
- Genetic and Evolutionary Computing Algorithms
- Probabilistic Computing and Belief Networks
- Hybrid Intelligent Systems
- Swarm Intelligence
- Rough Set Theory
- Granular Computing
- Artificial Immune Systems
- Chaos Theory
- The Differential Evolution Algorithm
- Soft Computing
- Dynamic Programming & various Algorithmic Computations
- Simulated annealing
- Neural Fuzzy Systems
- Fuzzy Adaptive Resonance Theory
- Quantum Computing
- Data mining
- Theory of computation
- Quantum Evolutionary/Genetic Algorithm
- Artificial Intelligence
- Identification (Decision) Trees
- Genetic Algorithms
- Genetic Programming
- Cellular Automata
- Computer Science Algorithms
- Evolutionary Computation
- Optimization Techniques
- Agent based computing
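As one concrete instance of the methods listed, dynamic programming is the classic approach to the sequence alignment problem. The sketch below computes a Needleman-Wunsch global alignment score; the scoring values (match +1, mismatch -1, gap -2) and the function name are arbitrary choices for illustration, not parameters taken from this seminar.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score of the best global alignment of a and b."""
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two characters, gap in b, gap in a.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GATCACA"))  # 5
```

Only two rows of the table are kept in memory, which is enough for the score; recovering the alignment itself would need the full table or a traceback structure.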
Gene Prediction
- Overview of steps & strategies
- What sequence signals can be used?
- What other types of information can be used?
- Algorithms: HMMs, discriminant functions, neural nets
- Gene prediction software: 3 major types, many, many programs!
Overview of gene prediction strategies

What sequence signals can be used?
- Transcription: TF binding sites, promoter, initiation site, terminator
- Processing signals: splice donor/acceptors, polyA signal
- Translation: start (AUG = Met) & stop (UGA, UAA, UAG) codons, ORFs, codon usage
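The translation signals above are enough for a very simple open reading frame (ORF) scan. The following is a minimal sketch, assuming forward-strand DNA only and a hypothetical helper name `find_orfs`; real gene finders also use codon usage, the reverse strand and the other signals listed.

```python
STOP_CODONS = {"TGA", "TAA", "TAG"}  # DNA equivalents of UGA, UAA, UAG


def find_orfs(dna, min_codons=2):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    start and end are 0-based; end is exclusive and includes the stop codon.
    min_codons counts the start codon but not the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon of a new ORF in this frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


print(find_orfs("CCATGAAATGGTAACC"))  # [(2, 14, 2)]
```

In the example the only ORF is ATG AAA TGG TAA, found in the third reading frame.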
What other types of information can be used?
- cDNAs & ESTs (experimental data, pairwise alignment)
- Homology (sequence comparison, BLAST)
Automated gene prediction strategies

1) Similarity-based or Comparative
- BLAST: do other organisms have a similar sequence? (Is the sequence similar to a known gene or protein?)
2) Ab initio = from the beginning
- Predict without explicit comparison with cDNA or proteins, via rule-based gene models
- But the rules are derived from statistical analysis of datasets
3) Combined "evidence-based"
- Combine gene models with alignment to known ESTs & protein sequences
BEST RESULTS? Combined
Examples of gene prediction software

1) Similarity-based or Comparative
- BLAST
- SGP2 (extension of GeneID)
2) Ab initio = from the beginning
- GeneID
- GENSCAN
- GeneMark.hmm
3) Combined "evidence-based"
- GeneSeqer (Brendel et al., ISU)
BEST? GENSCAN, GeneMark.hmm, GeneSeqer
but depends on organism & specific task
Signals: Pre-mRNA Splicing

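[Figure: genomic DNA, with start and stop codons marked, is transcribed into pre-mRNA (Cap ... Poly(A)); splicing removes the introns and joins the exons into mRNA, which is translated into protein. Splice sites: each intron begins with a GT donor site and ends with an AG acceptor site.]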
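The GT donor / AG acceptor rule can be turned into a naive scan for candidate introns. This is a sketch under simplifying assumptions: it checks only the two dinucleotides, the minimum length of 6 is an arbitrary value for the toy example, and `candidate_introns` is a name invented here. Real splice-site predictors also score the surrounding sequence.

```python
def candidate_introns(dna, min_len=6):
    """Return (start, end) spans that begin with GT and end with AG.

    Spans are 0-based with an exclusive end. Every donor is paired with
    every downstream acceptor that gives a span of at least min_len bases.
    """
    dna = dna.upper()
    donors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(dna) - 1) if dna[i:i + 2] == "AG"]
    return [(d, a + 2) for d in donors for a in acceptors
            if a + 2 - d >= min_len]


seq = "ATGGTAAGTCCCAGGCT"
for start, end in candidate_introns(seq):
    print(start, end, seq[start:end])
# 3 14 GTAAGTCCCAG
# 7 14 GTCCCAG
```

Even this short sequence yields two overlapping candidates, which is why the dinucleotide signals alone are far too weak to predict splice sites.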
Post Transcription Splicing

[Figure: genomic DNA with start and stop codons, and the spliced mRNA: Cap - 5'-UTR - start codon ... stop codon - 3'-UTR - Poly(A).]
Horizontal Gene Transfer

- The movement of genetic material between organisms, rather than from parent to offspring
- Common in prokaryotes
- Useful for environmental adaptation (better than point mutations)
Horizontal Gene Transfer

Also called Lateral Gene Transfer (HGT and LGT for short). 3 ways to do it:
- Transformation: naked DNA, short pieces, common in bacteria that can transform
  Clay: 28 hrs; ocean surface: 45-83 hrs; ocean sediment: 235 hrs
- Transduction: phage, donor/recipient share receptors, closely related bacteria, DNA limited to the amount that fits in the phage head
- Conjugation: plasmids/transposons, cell to cell contact, distant relations, long DNA
PHYLOGENY
Homology & Similarity

Homology
Conserved sequences arising from a common ancestor
- Orthologs: homologous genes that share a common ancestor in the absence of any gene duplication (Mouse and Human Hemoglobin)
- Paralogs: genes related through gene duplication (one gene is a copy of another, e.g. Fetal and Adult Hemoglobin)
Similarity
Genes that share common sequences but are not necessarily related
Phylogenetics
What is Phylogenetics?
Science of identifying and interpreting evolutionary relationships between biological entities (species, genes, etc.)
What is a phylogenetic tree?
Dendrogram (tree) composed of nodes and branches representing the putative genealogy of the taxonomic units
Phylogenetic Trees
A Graph Representing The Evolutionary History Of Sequences
Relationship of sequences to one another (How everything is connected) Dissect the order of appearance of insertions, deletions, and mutations
Identify Related Sequences, Predict Function, Observe Epidemiology (Analyze changes in viral strains)
Tree Characteristics
Tree Properties
Clade: all the descendants of a common ancestor represented by a node
Distance: number of changes that have taken place along a branch
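The clade definition maps directly onto a recursive walk of the tree. A minimal sketch, assuming a rooted tree stored as nested tuples (a leaf is a string, an internal node is a tuple of children); the taxon names A to E are placeholders, not data from the slides.

```python
# Rooted tree as nested tuples: leaves are strings, internal nodes are tuples.
tree = (("A", "B"), ("C", ("D", "E")))


def leaves(node):
    """Return the clade under a node: all leaves that descend from it."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


print(leaves(tree))     # ['A', 'B', 'C', 'D', 'E']
print(leaves(tree[1]))  # ['C', 'D', 'E']
```

Branch distances could be added by storing a (child, length) pair for each branch, at the cost of a slightly more verbose tree literal.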
Tree Types
- Cladogram: shows the branching order of nodes
- Phylogram: shows branching order and distances
[Figure: example phylogram with branch lengths written on the branches (.035, .012, .009, .057, .016, .044) and taxa including A and C]
Methods
- Distance-based
- Parsimony
- Maximum likelihood
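Distance-based methods start from a matrix of pairwise distances between sequences. The sketch below computes the simplest such measure, the p-distance (fraction of differing positions), for sequences that are assumed to be already aligned to equal length; the three sequences and the function names are invented for illustration.

```python
def p_distance(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(seqs):
    """Return {(name1, name2): distance} for every unordered pair of sequences."""
    names = sorted(seqs)
    return {(n1, n2): p_distance(seqs[n1], seqs[n2])
            for i, n1 in enumerate(names) for n2 in names[i + 1:]}


aligned = {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA"}
matrix = distance_matrix(aligned)
closest = min(matrix, key=matrix.get)
print(matrix)   # {('A', 'B'): 0.125, ('A', 'C'): 0.375, ('B', 'C'): 0.25}
print(closest)  # ('A', 'B')
```

A clustering method such as UPGMA or neighbor joining would then repeatedly merge the closest pair to build the tree; that step is not shown here.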
Levels of Protein Structure

- Primary (1°) structure: amino acid sequence of protein
- Secondary (2°) structure: local structure (alpha helices or beta strands)
- Tertiary (3°) structure: 3-dimensional structure of protein
- Quaternary (4°) structure: structure of a multiple protein complex
Protein structures Prediction

Protein structures can be determined experimentally (in most cases) by:
- X-ray crystallography
- Nuclear magnetic resonance (NMR)
But this is very expensive and time-consuming. Can we predict structures by computational means instead?
[Figure: PDB content growth]
Methods for Secondary Structure Prediction
- Chou-Fasman method: based on the propensities of different amino acids to adopt different secondary structures. Predictions are made using a rules-based approach to identify groups of amino acids with shared secondary structure propensities.
- Garnier, Osguthorpe, Robson (GOR) method: statistical method of secondary structure prediction based on information theory & Bayesian probability.
- Multiple Sequence Alignment (MSA) methods: perform secondary structure prediction on a multiple sequence alignment as opposed to a single protein sequence.
- Neural network-based methods. Example: Profile network from Heidelberg (PHD).
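The propensity idea behind Chou-Fasman can be illustrated with a sliding window. This is a heavily simplified sketch, not the actual Chou-Fasman rules: the propensity numbers are illustrative placeholders rather than the published parameters, and the window size, threshold and function name are assumptions made for the example.

```python
# Illustrative helix propensities only; NOT the published Chou-Fasman values.
# Values above 1.0 favour a helix, values below 1.0 disfavour it.
HELIX_PROPENSITY = {"A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4,
                    "G": 0.6, "P": 0.6, "S": 0.8, "N": 0.7}


def predict_helix(seq, window=4, threshold=1.0, default=1.0):
    """Mark residues 'H' if they lie in a window whose mean propensity exceeds threshold."""
    marks = ["-"] * len(seq)
    for i in range(len(seq) - window + 1):
        scores = [HELIX_PROPENSITY.get(aa, default) for aa in seq[i:i + window]]
        if sum(scores) / window > threshold:
            for k in range(i, i + window):
                marks[k] = "H"
    return "".join(marks)


print(predict_helix("AELMGPSN"))  # HHHHH---
```

The first glycine is still marked as helix because it falls inside a window dominated by strong helix formers, which shows how window-based rules extend a prediction beyond individual residues.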
Methods for Tertiary Structure Prediction

Tertiary (3D) Structure Prediction
- Homology Modeling
- Fold Recognition
- Protein Threading
- Ab initio structure prediction
Quaternary Structure
DRUG DISCOVERY & DESIGN

Rational Approach to Drug Discovery
- Identify target
- Clone gene encoding target
- Express target in recombinant form
DNA Microarrays
THANKS!
