
Glimpses of the Human Genome Project
by
N. AdithiSridhar,
Bioinformatics.
Before the HGP
The Human Genome

• Genome: the total genetic information in an organism


• 23 chromosomes in the haploid set (diploid number = 46)
• ~30,000 genes
• ~ 3 billion bases
• The genome sequence alone would require about 3
gigabytes of computer storage space
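The storage figure above can be checked with simple arithmetic. The sketch below assumes one byte per base for plain text; the 2-bit packed encoding is an added illustration, not something the slides describe.

```python
# Back-of-the-envelope storage estimate for a genome sequence.
BASES = 3_000_000_000  # ~3 billion bases, as quoted above


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Bytes needed to store n_bases at the given bits per base (rounded up)."""
    return (n_bases * bits_per_base + 7) // 8


text_gb = storage_bytes(BASES, 8) / 1e9    # one ASCII character per base
packed_gb = storage_bytes(BASES, 2) / 1e9  # A, C, G, T fit in 2 bits each

print(f"plain text: {text_gb:.2f} GB, 2-bit packed: {packed_gb:.2f} GB")
```

Stored as plain text, the sequence takes about 3 GB; packing four bases into each byte would bring it down to roughly a quarter of that.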
Single nucleotide polymorphisms (SNPs)
History of the Human Genome
Project (HGP)
The US Department of Energy (DOE)
and the Human Genome
➲ 1983 - National Laboratories of the DOE begin
producing libraries of human chromosomes

➲ 1988 - DOE and US National Institutes of Health (NIH)


sign a memorandum of understanding outlining their
cooperative effort in genome research
History of the HGP

➲ 1988 - HUGO (Human Genome


Organization) founded by genome
scientists
➲ 1989 - DOE and NIH establish a working
group to study the Ethical, Legal and
Social Implications (ELSI) of the HGP
➲ 1990 - DOE and NIH present a 5-year
HGP plan to the US Congress. This marks
the beginning of the 15-year project
• Management of the public consortium of the Human
Genome Project was entrusted to James Watson, who
was in charge until 1992 and was succeeded in 1993
by Francis Collins

• 20 centres in the United Kingdom, France, Germany, Japan,
China and Canada

• In 1998, the private company Celera Genomics,
a subsidiary of the PE Corporation, whose president,
molecular biologist Craig Venter, had previously worked
in the public consortium of the Human Genome Project,
publicly announced that it would attempt to sequence
the entire human genome
• The most important of them was The Institute for
Genomic Research (TIGR), founded by Craig Venter in
1992, after his resignation as a researcher for the NIH

• 2000 - Craig Venter of Celera and Francis Collins of NIH


(representing the HGP) jointly announce the completion
of a “working draft” of the human genome

• First draft published in Science and Nature in February,


2001
Human Genome Project
Project goals are to

• Identify all of the approximately 30,000 genes in human DNA,

• Determine the sequences of the 3 billion chemical base pairs that


make up human DNA,

• Store this information in databases,

• Improve tools for data analysis,

• Transfer related technologies to the private sector, and

• Address the ethical, legal, and social issues (ELSI) that may arise
from the project
During the HGP
Comparing the Human Genome
with other Genomes
➲ Gene numbers of different species
➲ Humans: 31,000
➲ Thale cress: 26,000
➲ Nematode worm: 18,000
➲ Fruit fly: 13,000
➲ Yeast: 6,000
➲ Tuberculosis microbe: 4,000
Comparing the Human Genome
with that of Mus musculus (mouse)
• The human genome has about 400 million more nucleotides than
the mouse.

• Humans and mice genetically diverged about 75 million years ago

• The human and the mouse genomes both have approximately
30,000 genes, and about 99% of them have a counterpart in the other species

• There are only three hundred genes unique to either organism


Comparing the Human Genome
with that of Pan troglodytes
(Chimpanzees)
➲ Humans and chimps diverged from a
common ancestor only about 5 million years
ago.
➲ Preliminary sequence comparisons indicate
that chimp DNA is 98.7% identical with
human DNA. If just the gene sequences
encoding proteins are considered, the
similarity increases to 99.2%.
How could two species differ so much
in body and behavior, and yet have
almost equivalent sets of genes?
• Observations reveal that chimp and human genomes
show very different patterns of gene transcription
activity, at least in brain cells.

• Humans have one fewer chromosome pair than chimpanzees,
gorillas, and orangutans.

• At some point in time, two mid-sized ape chromosomes


fused to make what is now human chromosome 2, the
second largest chromosome in our genome.
How the human genome was
sequenced
The genome is digested and the DNA cut into segments with a range of
different restriction enzymes

Since each restriction enzyme cuts the DNA at slightly different points, the
genome is broken up in such a way that there is a degree of overlap
between adjacent DNA segments – a fundamental requirement for
determining the complete sequence.

The subsequent incorporation of the DNA fragments into living cells, such as
bacteria or yeast, stores the DNA fragments and enables more copies to be
made as the cells reproduce
Organizing mapped large clone contigs
Contig: Joined overlapping collection of
clones or sequences

Shotgun clone → Sequencing → Assembly
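The assembly step relies on the overlap between adjacent fragments described above. A minimal sketch of that idea is a greedy merge of the two reads with the longest suffix/prefix overlap. The reads and the `min_len` threshold are hypothetical, and real assemblers also handle sequencing errors, repeats and both strands, which this toy version ignores.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap into one contig."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:  # nothing left to join
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)


# Three hypothetical overlapping reads from the sequence ATGGCGTACCTGAAC
print(greedy_assemble(["ATGGCGT", "CGTACCT", "CCTGAAC"]))
```

When the reads do not overlap by at least `min_len` bases, the function returns several contigs instead of one, which mirrors the gaps left in a real draft assembly.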
Gene mapping
GENETIC LINKAGE
• Law of independent assortment:
Genes are transmitted from parents to offspring independently of
one another.

• Genes that are located on the same chromosome are described
as linked genes.

• Linked genes are not always transmitted as a block because of the
phenomenon of recombination.

• Crossing over causes reshuffling of genes


• An example of genetic linkage
The three loci are closely linked i.e. they are situated very close to
one another on the same chromosome. At each locus there are
three possible genotypes:

C locus: CC  Cc  cc
D locus: DD  Dd  dd
E locus: EE  Ee  ee

• The particular combination present in a given individual is called a
haplotype
• Let us assume that a particular parent has the genotype Cc, dD, eE, carried as the two haplotypes C-d-e and c-D-E
Parent        Offspring

C   c         C        c
d   D         d   or   D
e   E         e        E

Inheritance of linked genes: unaltered haplotypes


Parent        Offspring

C   c         C        c
d   D         D   or   d
e   E         E        e

Inheritance of linked genes: recombinant haplotypes


Genetic Distance

• The frequency of recombination is directly proportional to the
distance between two genes.

One centimorgan
• The distance between two genes at which recombination occurs
with a frequency of 1%.

• Measures of genetic distance using recombination frequencies are


accurate only if the genes are closely linked i.e. if the gene distance
is small.
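The relation between recombination frequency and centimorgans can be sketched as below. The offspring counts are hypothetical. Haldane's map function is included only as an outside illustration of why the simple proportional rule breaks down for loosely linked genes; the slides themselves do not mention it.

```python
import math


def recombination_frequency(recombinant: int, total: int) -> float:
    """Fraction of offspring that carry a recombinant haplotype."""
    if total <= 0 or not 0 <= recombinant <= total:
        raise ValueError("need total > 0 and 0 <= recombinant <= total")
    return recombinant / total


def map_distance_cm(rf: float) -> float:
    """Simple rule: 1% recombination = 1 centimorgan (closely linked genes only)."""
    return 100.0 * rf


def haldane_cm(rf: float) -> float:
    """Haldane's map function (an assumption added here): corrects for multiple crossovers."""
    if not 0 <= rf < 0.5:
        raise ValueError("recombination frequency must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * rf)


rf = recombination_frequency(10, 1000)  # hypothetical cross: 10 recombinants in 1000 offspring
print(map_distance_cm(rf), haldane_cm(rf))
```

For small frequencies the two estimates nearly coincide; for a frequency such as 0.3 they diverge widely, which is the sense in which the simple measure is accurate only for closely linked genes.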
CONSTRUCTING A GENE
MAP
• Studies of genetic linkage and recombination frequencies have been
used to create gene maps

• Drosophila is very suitable for such study because it has very


prominent, easily studied characteristics, has a short life cycle and
produces hundreds of offspring

•In an experiment investigating two characteristics A and B it was


found that their recombinant frequency was 1.0%. In further
experiments it was found that the recombinant frequency for
characteristics A and C was 0.6% and the recombinant frequency
between B and C was 0.4%.
• A genetic map of the three genes responsible for these
characteristics may be constructed as follows. The values are in
centimorgans.

A ----- 0.6 ----- C --- 0.4 --- B
|------------- 1.0 -------------|

• Further recombinant studies can be performed to estimate the


gene distances between gene C and other genes D and E, and then
between B and other genes, E,F, G and so on. In this way a larger
and more detailed map is gradually constructed. This approach has
been used to construct an extensive and detailed gene map of
Drosophila
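The three-gene example above can be worked through in a few lines: the two genes with the largest pairwise distance sit at the ends, and the remaining gene lies between them. The function name is hypothetical, and the sketch assumes exactly three genes with all three pairwise distances known.

```python
import math


def linear_order(distances: dict[tuple[str, str], float]) -> list[str]:
    """Order three genes so that the two farthest apart sit at the ends."""
    (left, right), _ = max(distances.items(), key=lambda item: item[1])
    genes = {gene for pair in distances for gene in pair}
    (middle,) = genes - {left, right}
    return [left, middle, right]


# Recombination frequencies from the example, expressed in centimorgans
rf = {("A", "B"): 1.0, ("A", "C"): 0.6, ("B", "C"): 0.4}

order = linear_order(rf)
print(" - ".join(order))  # A - C - B

# Consistency check: the inner distances add up to the outer one
print(math.isclose(rf[("A", "C")] + rf[("B", "C")], rf[("A", "B")]))  # True
```

Adding further genes one at a time in the same way is how the larger map described above is built up.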
Techniques:
Restriction Fragment Length Polymorphisms
Molecular hybridization


Molecular hybridization
1. Single-stranded DNA is generated.

2. A probe is a known sequence of part of the gene to be identified, tagged with
a radioactive or fluorescent label. Specific probes are synthesised in the laboratory.

3. The probe hybridizes only to the fragment with the corresponding
(complementary) sequence. This is detected by the signal from the label.
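The hybridization step can be illustrated computationally: a probe binds wherever the single-stranded target contains the probe's reverse complement. The sequences below are hypothetical, and the sketch assumes perfect base pairing, whereas real probes tolerate some mismatch depending on conditions.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the strand that pairs with seq, read 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def hybridization_sites(target: str, probe: str) -> list[int]:
    """0-based start positions on the target where the probe pairs perfectly."""
    site = reverse_complement(probe)
    return [i for i in range(len(target) - len(site) + 1) if target.startswith(site, i)]


target = "GGATCCTTAGCA"  # hypothetical single-stranded fragment
probe = "CTAAGG"         # hypothetical labelled probe

print(hybridization_sites(target, probe))  # [4]
```

An empty result corresponds to a fragment that gives no signal.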
RESTRICTION FRAGMENT LENGTH POLYMORPHISMS

• A disease-causing gene can be mapped by linkage and
recombination studies with other known genes. However,
informative families for such studies are rare.

•Numerous markers have been identified throughout the


genome using restriction endonucleases and so it is possible to
construct maps of disease genes in relation to closely linked
markers.

• A particular restriction endonuclease recognises a specific
nucleotide sequence in DNA and cleaves it.
Individual 1:  A ---16--- B --4-- C -1- D --2-- E ---8--- F
Individual 2:  A ---16--- B ------5------ D --2-- E ---8--- F
An example of a restriction fragment length polymorphism generated by
HindIII. A, B, C, D, E and F indicate the sites where DNA is cleaved. The
second individual lacks the restriction site at C and gives a different
pattern of fragments from individual 1.

• If the sequence were missing at site C, there would be 4 fragments
of lengths 16, 5, 2 and 8 units. This variation is referred to as a
restriction fragment length polymorphism (RFLP).
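The fragment patterns in this example follow directly from the positions of the cleavage sites. In the sketch below the site coordinates are assumed values, chosen so that the distances between neighbouring sites reproduce the fragment lengths quoted above.

```python
def fragment_lengths(sites: dict[str, int]) -> list[int]:
    """Lengths of the fragments between consecutive cleavage sites."""
    positions = sorted(sites.values())
    return [b - a for a, b in zip(positions, positions[1:])]


# Assumed coordinates (arbitrary units) for sites A to F
individual_1 = {"A": 0, "B": 16, "C": 20, "D": 21, "E": 23, "F": 31}

# Individual 2 lacks the recognition sequence at site C
individual_2 = {name: pos for name, pos in individual_1.items() if name != "C"}

print(fragment_lengths(individual_1))  # [16, 4, 1, 2, 8]
print(fragment_lengths(individual_2))  # [16, 5, 2, 8]
```

Losing site C merges the 4-unit and 1-unit fragments into a single 5-unit fragment, which is the difference that shows up as a polymorphism.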
• Using a large number of restriction endonucleases, it is
likely that one finds one or more RFLPs close to the gene
of interest.

• Such RFLPs are then used as markers for linkage studies


with known genes

• Linkage studies have been one of the most important


tools for gene mapping

• If more than one marker is used the accuracy of the


procedure is further increased.
➲ Finding genes by UCSC Genome
Browser
From early maps . . . to a multi-resolution view . . . at the gene cluster level . . . the single gene level . . .
Location and display of the human gene implicated in Fragile X syndrome.
Sanger’s method
• Dideoxyribose - ribose in which the hydroxyl group is missing from both
the 2’ and the 3’ carbons
• whenever a dideoxynucleotide was incorporated into a polynucleotide,
the chain would irreversibly stop, or terminate
• four separate reactions, each incorporating a different dideoxynucleotide
along with the four deoxynucleotides, would produce a population of
fragments all ending in the same dideoxynucleotide
• a primer and a DNA polymerase are needed

Four reactions: ddA, ddT, ddG, ddC
Template: 5’ ATGCTGCATGCATGT 3’
Example terminated fragments: ddTACG, ddTACGACGTACG
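The chain-termination logic can be simulated in a few lines. This is a simplified sketch: it ignores the primer, assumes every possible terminated fragment is produced, and uses hypothetical function names. It uses the template shown in the figure, builds the four reaction "lanes", and reads the new strand back by sorting fragments by length.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def synthesized_strand(template_5to3: str) -> str:
    """Strand made by the polymerase (5'->3'): the reverse complement of the template."""
    return "".join(COMPLEMENT[base] for base in reversed(template_5to3))


def termination_lengths(new_strand: str) -> dict[str, list[int]]:
    """For each dideoxynucleotide reaction, the lengths of the terminated fragments."""
    lanes: dict[str, list[int]] = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # a ddNTP of this base can stop the chain here
    return lanes


def read_gel(lanes: dict[str, list[int]]) -> str:
    """Read the sequence by walking fragment lengths from shortest to longest."""
    by_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


template = "ATGCTGCATGCATGT"
new_strand = synthesized_strand(template)
lanes = termination_lengths(new_strand)
print(lanes["A"])       # fragment lengths seen in the ddA reaction
print(read_gel(lanes))  # recovers the synthesized strand
```

Each length appears in exactly one lane, which is why four reactions are enough to read the whole sequence.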
Collins vs. Venter
Collins (IHGSC) and Venter (Celera)
Hidden Markov model
• A Hidden Markov Model (HMM) system for segmenting
uncharacterized human genomic DNA into exons, introns, and
intergenic regions.
• A separate model was designed for each of the three types
of human DNA (exons, introns, and intergenic regions).
• These models are tied together using biological knowledge about
splice junctions.
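The segmentation idea can be sketched with a toy three-state HMM decoded by the Viterbi algorithm. Everything numeric here is a hypothetical assumption made for illustration: each region is collapsed to a single state, the emission and transition probabilities are invented, and the only "splice junction knowledge" encoded is that introns can be entered and left only through exons. The system described above is far richer.

```python
import math

STATES = ("intergenic", "exon", "intron")

# Hypothetical toy parameters, chosen only for illustration.
START = {"intergenic": 1.0, "exon": 0.0, "intron": 0.0}
TRANS = {
    "intergenic": {"intergenic": 0.9, "exon": 0.1, "intron": 0.0},
    "exon": {"intergenic": 0.1, "exon": 0.8, "intron": 0.1},
    "intron": {"intergenic": 0.0, "exon": 0.1, "intron": 0.9},
}
EMIT = {
    "intergenic": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "exon": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "intron": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}


def _log(p: float) -> float:
    """Logarithm that maps a zero probability to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq: str) -> list[str]:
    """Most probable state path for seq under the toy model above."""
    if not seq:
        return []
    score = {s: _log(START[s]) + _log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: prev[p] + _log(TRANS[p][s]))
            score[s] = prev[best] + _log(TRANS[best][s]) + _log(EMIT[s][base])
            pointers[s] = best
        back.append(pointers)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):  # trace the best path backwards
        state = pointers[state]
        path.append(state)
    return path[::-1]


labels = viterbi("AAAAA" + "GC" * 10)
print(labels.count("intergenic"), labels.count("exon"), labels.count("intron"))
```

With these invented numbers the GC-rich run is labelled as exon and the leading A's as intergenic; changing the parameters changes the segmentation, which is why real gene finders estimate them from annotated sequence.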
Expressed sequence tags
• ESTs are DNA sequences read from both ends of expressed gene
fragments
• The Merck-WashU EST Project and several other public EST projects
are being performed to rapidly discover the complement of human
genes, and make them easily accessible.
• These ESTs are widely used to discover novel members of gene
families
Genome Assembly and
Annotation Process
• The primary data produced by genome sequencing projects are
often highly fragmented and sparsely annotated

• NCBI assimilates data of various types, from numerous sources, to
provide an integrated view of a genome, making the data easier for
researchers to use

• NCBI constantly strives to improve the accuracy of its human


genome assembly and annotation

• Feedback from outside groups and individual users is used to
improve the process
• Data Freeze

The data are “frozen” at the start of the build process by making
a copy of all of the data available for use at that time

Freezing the data provides a stable set of inputs for the remainder
of the build process
• The Build Cycle

A build begins with a freeze of the input data and ends with the
public release of an annotated assembly of genomic sequences

A few months pass between builds so that the latest build can be
evaluated and improvements can be made
Processing of
Biological Sequence
Data
• The sequence database GenBank is made up of nucleotide sequences
submitted by individual scientists and sequencing centers from around
the world.

• These sequences have been submitted directly to GenBank or are


replicated from one of the collaborating databases

• Sequence data are processed by an information management system that
consists of two major components, the ID database and the IQ database

• ID handles incoming sequences and feeds other databases with subsets


to suit different needs

• IQ holds links between sequences from ID and links from these


sequences to other resources.
ASN.1
Abstract Syntax Notation One (ASN.1) is the data format used by the
ID system. ASN.1 is the data description language in which all sequence
data at NCBI are structured
Sources of Sequence Data
GenBank sequences
Reference sequences
Sequences from other databases, such as SWISS-PROT, PIR, PRF, and
PDB
Submission
• Large-volume submitters, such as those submitting HTGS data, use FTP, often after using tools
such as fa2htgs to convert their data to ASN.1.
• Small-volume submitters typically use either BankIt or Sequin to prepare
the ASN.1 for submission.
• Output of Data from the ID System
After conversion, the data are replicated to several different
servers and also transformed into several different formats.
• Replication is necessary
It separates the “incoming” data system from the “outgoing” data.
Replicating the data to different servers helps balance the load of
queries.
It protects against data loss.
• ID Database
Holds both ASN.1 objects and sequence identifier-related information.
Accession numbers are assigned to the sequences.
When the understanding of a sequence changes, the sequence can
be given a new version. A GI number is assigned to each version of a
sequence.
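The relationship between accession numbers, versions and GI numbers described above can be modelled with a small record type. This is only an illustrative sketch: the class, the helper functions, the accession string and the GI counter are all hypothetical, and real identifiers are assigned by NCBI's ID system, not by user code.

```python
from dataclasses import dataclass
from itertools import count

_next_gi = count(1)  # hypothetical stand-in for the identifier the ID system hands out


@dataclass(frozen=True)
class SequenceRecord:
    accession: str  # stays the same across revisions
    version: int    # incremented on each revision
    gi: int         # a fresh identifier for every version
    sequence: str

    @property
    def accession_version(self) -> str:
        return f"{self.accession}.{self.version}"


def new_record(accession: str, sequence: str) -> SequenceRecord:
    """First version of a newly submitted sequence."""
    return SequenceRecord(accession, 1, next(_next_gi), sequence)


def revise(record: SequenceRecord, new_sequence: str) -> SequenceRecord:
    """Same accession, next version number, new GI."""
    return SequenceRecord(record.accession, record.version + 1, next(_next_gi), new_sequence)


first = new_record("X00001", "ATGCATGCAT")  # made-up accession and sequence
second = revise(first, "ATGCAAGCAT")
print(first.accession_version, second.accession_version)  # X00001.1 X00001.2
```

Keeping the records immutable means an older version remains retrievable by its own identifier after a revision.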
DB name | Major content | Initial tech. | Current tech. | Primary data types
GenBank | DNA/RNA sequence, protein | Text files | Flat file/ASN.1 | Text, numeric, some complex types
OMIM | Disease phenotypes and genotypes, etc. | Index cards/text files | Flat file/ASN.1 | Text
GDB | Genetic map linkage data | Flat file | Relational | Text, numeric
ACEDB | Genetic map linkage data, sequence data (non-human) | - | - | Text, numeric
HGMDB | Sequence and sequence variants | Flat file, application specific | Flat file, application specific | Text
EcoCyc | Biochemical reactions and pathways | - | - | Complex types, text, numeric
Types of Data
(Databases)
➲ Taxonomy databases
➲ Genomic databases
-Genomic databases (non vertebrate)
-Human and other vertebrate genomes
• Sequence databases
-Nucleotide sequence databases
-Protein sequence databases
-RNA sequence databases
• Structure databases
• Proteomic databases
➲ Microarray databases
➲ Chemical databases
➲ Expression databases
➲ Enzyme databases
➲ Pathway databases (Metabolic and Signalling
pathways)
➲ Disease databases (Human genes and diseases)
➲ Literature databases
➲ Other molecular biology databases
Genes and
Disease
Trisomy of
chromosome 21
Down syndrome
Burkitt lymphoma

• Rare form of cancer affecting young
children in Africa

• Associated with Epstein-Barr virus

• A chromosomal translocation causes the cancer

• The translocation moves the Myc gene

• This changes the pattern of Myc’s expression, disrupting its control of
cell growth and proliferation

• We are still not sure what causes chromosomal translocation

• Model organisms give clues to how translocation
occurs
Lesch-Nyhan syndrome
•LNS is a rare inherited disease that
disrupts the metabolism of the raw
material of genes
(purines)

• The body can either make purines (de
novo synthesis) or recycle them (the
salvage pathway)

•When one of the enzymes is missing, a


wide range of problems can occur.

•Mutation in the HPRT1 gene affects the


production of the enzyme hypoxanthine-
guanine phosphoribosyltransferase
• With very low levels of the enzyme, purines from broken-down DNA and RNA
cannot be recycled efficiently

•The mutation is inherited in an X-linked fashion

•Three main problems

Accumulation of uric acid


Self-mutilation
Mental retardation and severe muscle weakness.

• In 2000 it was shown that the genetic deficiency in LNS could be corrected in vitro

•A virus was used to insert a normal copy of the HPRT1


gene into deficient human cells.

• Medications are used to decrease the levels of uric acid.


Obesity

• Obesity has more than one cause:
genetic, environmental, psychological and
other factors

• The hormone leptin, produced by
adipocytes (fat cells), was discovered in
mice in 1994

• Subsequently the human Ob gene was
mapped to chromosome 7.

•Leptin is thought to act as a lipostat


• As the amount of fat stored in adipocytes rises, leptin is released
into the blood and signals to the brain that the body has enough to
eat.

• Overweight people have high levels of leptin in their bloodstream,
indicating that other molecules also affect feelings of satiety and
contribute to the regulation of body weight.
Refsum disease

•Rare disorder of lipid metabolism


• Causes peripheral neuropathy, failure of
muscle coordination, vision disorder

•In 1997 the gene for Refsum disease


was identified and mapped to
chromosome 10.

•The protein product of the gene, PAHX,


is an enzyme that is required for the
metabolism of phytanic acid
• Refsum disease is characterized by an accumulation of phytanic acid in
the plasma and tissues. Phytanic acid is a derivative of phytol, a component of
chlorophyll.

• Our bodies cannot synthesize phytanic acid; we have to obtain all of it
from our food.

•Prolonged treatment with a diet deficient in phytanic


acid can be beneficial.
• Pancreatic cancer
• Phenylketonuria
• Prader-Willi syndrome
• Porphyria
• Tangier disease
• Tay-Sachs disease
• Wilson's disease
• Zellweger syndrome
• Adrenoleukodystrophy
• Diabetes, type 1
• Gaucher disease
• Glucose galactose malabsorption
• Hereditary hemochromatosis
• Menkes syndrome
• The Human Genome Project has already fueled the
discovery of more than 1,800 disease genes.

• There are now more than 1,000 genetic tests for human
conditions.

• These tests enable patients to learn their genetic risks for


disease and also help healthcare professionals diagnose
disease

• At least 350 biotechnology-based products resulting from


the Human Genome Project are currently in clinical trials.
• Biodiversity
Provides genetic measure of biodiversity

• The National Geographic magazine started its “Genographic” project

An ambitious attempt to use data from the human genome to trace


the pathways of human migration.
Related data have shown that early humans migrated out of Africa
along the coastline and finally into Australia around 40,000-60,000 years
ago.

• Comparative genomics
Genomics
Proteomics
Gene Therapy
Risk assessment
Agriculture, Livestock breeding and Bioprocessing
DNA forensics
•identify potential suspects at crime
scenes

•identify crime and catastrophe victims

•establish paternity and other family


relations

•match organ donors with recipients in


transplant programs
Evolution and Human Migration

•Comparison of sequences of genetically,


racially and culturally diverse peoples

•Comparison of sequences of peoples


geographically apart but apparently related

•Study of evolution of humanoid species


and modern humans
SOME ETHICAL ISSUES

• Genetic testing requires informed consent: the person undergoing the test should do so
only on a voluntary basis and with a full understanding of all the
implications.

• Limitations of the test need to be discussed prior to testing.
The tests cannot always identify the mutation, even if it is present

• Detection of the change in the gene is not necessarily
predictive of future symptoms

• The sex of a baby can be determined by checking the chromosomes. There
are sometimes requests for the use of the technology to ensure that
a couple have a baby of a certain sex, for reasons not necessarily
related to the health or well-being of the child.
• Analysis of an individual’s genetic make-up could also be used in the
future by employers or insurers wishing to know the likelihood of a
potential employee or insurance applicant developing a condition for
which they carry a predisposition; for example, alcoholism, heart
disease or cancer

• Such knowledge could lead to discrimination .


• With the new advances in genetics, as in any powerful new
scientific tool, there is a potential for abuse. The boundaries need to
be considered .
➲ Patent problem
• The policy of the U.S. Patent and Trademark Office
(PTO) was that “life forms, as products of nature, were
unpatentable. Only products and processes invented by
humans could be patented.”
• What about genetically modified life forms: are they
invented or discovered, the product of nature or of
humans?
• In 1980, the U.S. Supreme Court issued its 5–4 decision
in Diamond v Chakrabarty that a bacterial strain that had
been genetically modified to clean up oil spills could be
patented since it was “man-made” and not naturally
occurring
• Since then, the PTO has awarded thousands of patents on
biological products, including patents on genes, SNPs,
ESTs, cell lines, mice, plants, rhesus monkeys, and
human stem cells
Genomes to Life:
A DOE Systems
Biology Program
Exploring Microbial Genomes for Energy and the
Environment

Goals
• identify the protein machines that carry out critical life functions
• characterize the gene regulatory networks that control these machines
• characterize the functional repertoire of complex microbial communities in their
natural environments
• develop the computational capabilities to integrate and understand these data
and begin to model complex biological systems
GTL Applications in
Energy Security and Global Climate
Change
The International HapMap Project
• Although the DNA sequence of any two people is 99.9% identical,
the variations crucially affect an individual’s disease risk
• The points where the sequence differs at a single DNA base are
called single nucleotide polymorphisms (SNPs).
• Sets of SNPs on the same chromosome are inherited in blocks called
haplotypes.
➲ purpose is to enable the study of genetic associations with disease

ATGCATGCAT ATGCAAGCAT
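The two example sequences above differ at a single base. A minimal sketch for locating such differences is shown below. It assumes the sequences are already aligned and of equal length, and the function name is hypothetical. Note that a real SNP is defined by variation across a population, not by a single pairwise difference.

```python
def find_snps(seq_a: str, seq_b: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, base in seq_a, base in seq_b) for every mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, a, b)
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
        if a != b
    ]


print(find_snps("ATGCATGCAT", "ATGCAAGCAT"))  # [(6, 'T', 'A')]
```

At 99.9% identity over roughly 3 billion bases, two genomes would be expected to differ at a few million such positions.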
• The project was launched in 2002 with $100 million

• Nine research groups and more than 200 researchers in six
countries: Canada, China, Japan, Nigeria, the UK and the US.

• Samples from people in Nigeria, Japan, and China and from those
with northern and western European ancestry living in the US.

• They mapped the entire genome of 269 people to identify tiny


differences in key areas of DNA.

• The HapMap is publicly accessible


ENCODE
• The National Human Genome Research Institute (NHGRI) launched
ENCODE, the Encyclopedia Of DNA Elements, in September 2003 to
identify all functional elements in the human
genome sequence.

• The project began with a pilot phase and a technology development phase.

• The pilot phase tested and compared existing methods to analyze a


defined portion of the human genome sequence

• Conclusions from this pilot project were published in June 2007 in Nature

➲ All data generated by ENCODE participants released into public


databases
Role of China
➲ The Chinese Human Genome Project started in 1993
major aspects:
➲ The genome resource conservation and genetic
polymorphism studies of multiple Chinese nationalities
➲ The development of an advanced technological system
for genome research
➲ The cloning of some disease-related genes and a large
number of expressed sequence tags (ESTs).
➲ Initiation of the functional genomics studies
➲ Ethical, legal and social issues related to human
genome sciences.
Role of Russia
• In 1988, the USSR Council of Ministers adopted a resolution
on the creation of a Human Genome Project

• The project is under the scientific control of the Council on the
Human Genome

• The Engelhardt Institute of Molecular Biology is concerned with
the organism

• Goals: sequencing, mapping, investigation of model organisms,
functional studies
• The Council provides funding of 57 million rubles

• No international agreement on sequencing

• The Council funds two online databases

• GE (Gene Express) was founded in 1988 at the National Institute of
Scientific Information

• HGG (Human Genome Guide) is affiliated with the Institute of
Brain Research
• These two databases contain information on human, viral, bacterial
and mammalian DNA sequences

India
• India plays a very significant role because of its special social structure

• It offers a rich resource for studying functional genomics or the
functional aspects of the genetic map.

• With its caste-based communities intermarrying among
themselves, India provides rare genetic material in family
pedigrees
• “Good” genes, such as those for mathematical ability, and “bad” genes, such as
those for breast cancer, are concentrated in families and communities
because of selective intermarriage.

• These inbred communities can reveal the inheritance pattern of


genes and functional genomics can reveal what these genes
actually do.

• This is a powerful combination allowing the scientist to


understand how genetic disease is transmitted and how, by
understanding gene function, it can be treated
Biotechnology Mania
“At the moment there are 1,100 companies devoted to the manufacture of
medicines through recombinant techniques, to which we have to add over
700 corporations interested in the sector. On the whole, these companies
employ more than 100,000 people and represent a stock market value near
50,000 million dollars”.

➲ SmithKline Beecham has collaborated with Human Genome Sciences, Eli
Lilly with Millennium Pharmaceuticals and Pfizer with Incyte Genomics.

➲ In March 2000, President Clinton announced that the genome
sequence could not be patented, and should be made freely
available to all researchers. The statement sent Celera's stock
plummeting and dragged down the biotechnology sector, which
lost about $50 billion in market capitalization in
two days.
What We Still Don’t Know?
• Gene number, exact locations, and Functions
• Gene regulation
• DNA sequence organization
• Chromosomal structure and
organization
• Noncoding DNA types, amount, distribution, information
content, and functions
• Coordination of gene expression, protein synthesis, and
post-translational events
• Evolutionary conservation among organisms
• Disease-susceptibility prediction based on gene sequence variation
• Protein conservation (structure and function)
