

Human Genome Project Overview

The Human Genome Project (HGP) was the international, collaborative research program
whose goal was the complete mapping and understanding of all the genes of human beings.
The 13-year effort, formally begun in October 1990 and completed in 2003, set out to
discover all the estimated 20,000-25,000 human genes and make them accessible for further
biological study. Another project goal was to determine the complete sequence of the
3 billion DNA subunits (bases) in the human genome. As part of the HGP, parallel studies
were carried out on selected model organisms, such as the bacterium E. coli and the mouse,
to help develop the technology and interpret human gene function. The DOE Human
Genome Program and the NIH National Human Genome Research Institute (NHGRI) together
sponsored the U.S. Human Genome Project.
The Human Genome Project (HGP) was one of the great feats of exploration in history - an
inward voyage of discovery rather than an outward exploration of the planet or the cosmos; an
international research effort to sequence and map all of the genes - together known as the genome -
of members of our species, Homo sapiens. Completed in April 2003, the HGP gave us the ability,
for the first time, to read nature's complete genetic blueprint for building a human being. The
Human Genome Project international consortium announced in April 2000 that 2 billion of the 3
billion “letters” that constitute the genetic instruction book of humans had been deciphered and
deposited into GenBank. GenBank, the public database of DNA sequence operated by the National
Institutes of Health, is accessible freely and without restrictions to all scientists in industry and
academia. At the time of that announcement, the Human Genome Project was assembling 12,000 bases
every minute, and 15 billion raw base pairs had been sequenced to reach the two billion. Each area
of a chromosome was sequenced at least four to five times to ensure that the deposited data were
accurate; this redundancy is known as the “depth of coverage.”
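The sequencing figures quoted above imply an average redundancy that can be checked with simple arithmetic. The following is a minimal Python sketch that uses only the numbers given in the text (15 billion raw base pairs, 2 billion deposited base pairs, 12,000 bases per minute). The function and constant names are illustrative assumptions, and the daily figure assumes the quoted rate is sustained around the clock.

```python
# Figures quoted in the text for the April 2000 announcement.
RAW_BASE_PAIRS = 15_000_000_000        # raw base pairs sequenced
DEPOSITED_BASE_PAIRS = 2_000_000_000   # base pairs deposited in GenBank
BASES_PER_MINUTE = 12_000              # quoted assembly rate


def average_depth_of_coverage(raw_bases, finished_bases):
    """Average number of raw reads behind each deposited base."""
    if finished_bases <= 0:
        raise ValueError("finished_bases must be positive")
    return raw_bases / finished_bases


def bases_per_day(bases_per_minute):
    """Daily throughput, assuming the per-minute rate never pauses."""
    return bases_per_minute * 60 * 24


if __name__ == "__main__":
    depth = average_depth_of_coverage(RAW_BASE_PAIRS, DEPOSITED_BASE_PAIRS)
    print(f"Average depth of coverage: {depth:.1f}x")
    print(f"Bases assembled per day: {bases_per_day(BASES_PER_MINUTE):,}")
```

An average of 7.5 raw reads per deposited base is consistent with the stated minimum of four to five passes over each chromosome region.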
Scientists have been quick to mine this new trove of genomic data, as well as to utilize the
genomic tools and technologies developed by the Human Genome Project. For example, when the
Human Genome Project began in 1990, scientists had discovered fewer than 100 human disease
genes. Today, more than 1,400 disease genes have been identified.
Coordinated Efforts
In 1988 DOE and NIH signed a Memorandum of Understanding in which the agencies
agreed to work together, coordinate technical research and activities, and share results. The two
agencies assumed a joint systematic approach toward establishing goals to satisfy both short- and
long-term project needs.
Early guidelines projected three 5-year phases, for which the first plan was presented to
Congress in 1990. The 1990 plan emphasized the creation of chromosome maps, software, and
automated technologies to enable sequencing.
By 1993, unexpectedly rapid progress in chromosome mapping required updating the goals,
which were then projected through 1998. That plan was later revised again in anticipation of the
high-throughput sequencing phase of the project, and an early transition to this phase began
as many more genome sequencing projects were funded. The second and third phases of the project
were intended to optimize resources, refine sequencing strategies, and, finally, completely determine the
sequence of all base pairs in the genome.
The International Human Genome Sequencing Consortium included hundreds of scientists
at 20 sequencing centers in China, France, Germany, Great Britain, Japan and the United States.
The five institutions that generated the most sequence were: Baylor College of Medicine, Houston;
Washington University School of Medicine, St. Louis; Whitehead Institute/MIT Center for Genome
Research, Cambridge, Mass.; DOE’s Joint Genome Institute, Walnut Creek, Calif.; and The
Wellcome Trust Sanger Institute near Cambridge, England.
Another area of DOE and NIH cooperation is in exploring the ethical, legal, and social
issues (ELSI) arising from increased availability of genetic data and growing genetic-testing
capabilities. The two agencies established a joint working group to confront these ELSI challenges
and have cosponsored joint projects and workshops.
Since October 1990, the project has been supported jointly by DOE and the National
Institutes of Health (NIH) National Human Genome Research Institute (formerly National Center
for Human Genome Research). Together, the DOE and NIH components make up the world's
largest centrally coordinated biology research project ever undertaken. By 1985, progress in genetic
and DNA technologies led to serious discussions in the scientific community about initiating a
major project to analyze the structure of the human genome. After concluding that a DNA sequence
would offer the most useful approach for detecting inherited mutations, DOE in 1986 announced its
Human Genome Initiative. The initiative emphasized development of resources and technologies
for genome mapping, sequencing, computation, and infrastructure support that would culminate in a
complete sequence of the human genome.
The flagship effort of the Human Genome Project has been producing the reference
sequence of the human genome. The international consortium announced the first draft of the
human sequence in June 2000. Since then, researchers have worked tirelessly to convert the “draft”
sequence into a “finished” sequence. Finished sequence is a technical term meaning that the
sequence is highly accurate (with fewer than one error per 10,000 letters) and highly contiguous
(with the only remaining gaps corresponding to regions whose sequence cannot be reliably resolved
with current technology). That standard was first achieved for a human chromosome when a team
of British, Japanese and U.S. researchers produced a finished sequence for human chromosome 22
in 1999. The finished sequence produced by the Human Genome Project covers about 99 percent of
the human genome's gene-containing regions, and it has been sequenced to an accuracy of 99.99
percent. In addition, to help researchers better understand the meaning of the human genetic
instruction book, the project took on a wide range of other goals, from sequencing the genomes of
model organisms to developing new technologies to study whole genomes. As of April 14, 2003, all
of the Human Genome Project’s ambitious goals have been met or surpassed.
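The "finished" standard described above can be restated numerically. The sketch below, in Python, converts "fewer than one error per 10,000 letters" into an accuracy percentage and into an upper bound on errors across a genome of about 3 billion bases. The names are hypothetical, and the genome size is the rounded figure used elsewhere in this text.

```python
# The finished-sequence standard quoted in the text.
MAX_ERRORS = 1
PER_BASES = 10_000
GENOME_BASES = 3_000_000_000  # rounded genome size used in the text


def accuracy_percent(max_errors, per_bases):
    """Accuracy implied by an error ceiling of max_errors per per_bases."""
    return 100.0 * (1 - max_errors / per_bases)


def error_upper_bound(total_bases, max_errors, per_bases):
    """Largest error count still compatible with the standard."""
    return total_bases * max_errors // per_bases


if __name__ == "__main__":
    print(f"Accuracy: {accuracy_percent(MAX_ERRORS, PER_BASES):.2f}%")
    bound = error_upper_bound(GENOME_BASES, MAX_ERRORS, PER_BASES)
    print(f"Upper bound on errors: {bound:,}")
```

This shows that the two phrasings in the text, one error per 10,000 letters and 99.99 percent accuracy, describe the same threshold.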

Human Genome Project Goals

Begun formally in 1990, the U.S. Human Genome Project was a 13-year effort coordinated by
the U.S. Department of Energy and the National Institutes of Health. The project originally was
planned to last 15 years, but rapid technological advances accelerated the completion date to 2003.
Project goals were to

1. identify all the approximately 20,000-25,000 genes in human DNA,
2. determine the sequences of the 3 billion chemical base pairs that make up human DNA,
3. store this information in databases,
4. improve tools for data analysis,
5. transfer related technologies to the private sector, and
6. address the ethical, legal, and social issues (ELSI) that may arise from the project.

To help achieve these goals, researchers also studied the genetic makeup of several nonhuman
organisms. These include the common human gut bacterium Escherichia coli, the fruit fly, and the
laboratory mouse.

Besides delivering on the stated goals, the international network of researchers has produced an
amazing array of advances that most scientists had not expected until much later. These "bonus"
accomplishments include: an advanced draft of the mouse genome sequence, published in
December 2002; an initial draft of the rat genome sequence, produced in November 2002; the
identification of more than 3 million human genetic variations, called single nucleotide
polymorphisms (SNPs); and the generation of full-length complementary DNAs (cDNAs) for more
than 70 percent of known human and mouse genes.
Timeline & Cost
When the Human Genome Project was launched in 1990, many in the scientific community
were deeply skeptical about whether the project’s audacious goals could be achieved, particularly
given its hard-charging timeline and relatively tight spending levels. At the outset, the U.S.
Congress was told the project would cost about $3 billion in FY 1991 dollars and would be
completed by the end of 2005. In actuality, the Human Genome Project was finished two and a half
years ahead of time and, at $2.7 billion in FY 1991 dollars, significantly under original spending
projections.

The completion of the human DNA sequence in the spring of 2003 coincided with the 50th
anniversary of Watson and Crick's description of the fundamental structure of DNA. The analytical
power arising from the reference DNA sequences of entire genomes and other genomics resources
has jump-started what some call the "biology century."
Human Genome Project Completion Dates

Area: Genetic Map
HGP Goal: 2- to 5-cM resolution map (600 – 1,500 markers)
Standard Achieved: 1-cM resolution map (3,000 markers)
Date Achieved: September 1994

Area: Physical Map
HGP Goal: 30,000 STSs
Standard Achieved: 52,000 STSs
Date Achieved: October 1998

Area: DNA Sequence
HGP Goal: 95% of gene-containing part of human sequence finished to 99.99% accuracy
Standard Achieved: 99% of gene-containing part of human sequence finished to 99.99% accuracy
Date Achieved: April 2003

Area: Capacity and Cost of Finished Sequence
HGP Goal: Sequence 500 Mb/year at <$0.25 per finished base
Standard Achieved: Sequence >1,400 Mb/year at <$0.09 per finished base
Date Achieved: November 2002

Area: Human Sequence Variation
HGP Goal: 100,000 mapped human SNPs
Standard Achieved: 3.7 million mapped human SNPs
Date Achieved: February 2003

Area: Gene Identification
HGP Goal: Full-length human cDNAs
Standard Achieved: 15,000 full-length human cDNAs
Date Achieved: March 2003

Area: Model Organisms
HGP Goal: Complete genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster
Standard Achieved: Finished genome sequences of E. coli, S. cerevisiae, C. elegans, D. melanogaster,
plus whole-genome drafts of several others, including C. briggsae, D. pseudoobscura, mouse and rat
Date Achieved: April 2003

Area: Functional Analysis
HGP Goal: Develop genomic-scale technologies
Standard Achieved (Date Achieved):
High-throughput oligonucleotide synthesis (1994)
DNA microarrays (1996)
Eukaryotic, whole-genome knockouts (yeast) (1999)
Scale-up of two-hybrid system for protein-protein interaction (2002)

Genome Donors
In the IHGSC international public-sector Human Genome Project (HGP), researchers collected
blood (female) or sperm (male) samples from a large number of donors. Only a few of many
collected samples were processed as DNA resources. Thus the donor identities were protected so
neither donors nor scientists could know whose DNA was sequenced. DNA clones from many
different libraries were used in the overall project, with most of those libraries being created by Dr.
Pieter J. de Jong. It has been informally reported, and is well known in the genomics community,
that much of the DNA for the public HGP came from a single anonymous male donor from Buffalo,
New York (code name RP11).
HGP scientists used white blood cells from the blood of two male and two female donors (randomly
selected from 20 of each), with each donor yielding a separate DNA library. One of these libraries
(RP11) was used considerably more than the others because of quality considerations. One minor technical
issue is that male samples contain only half as much DNA from the X and Y chromosomes as from
the other 22 chromosomes (the autosomes); this happens because each male cell contains only one
X and one Y chromosome, rather than two copies as for each autosome.
Although the main sequencing phase of the HGP has been completed, studies of DNA variation
continue in the International HapMap Project, whose goal is to identify patterns of single nucleotide
polymorphism (SNP) groups (called haplotypes, or “haps”). The DNA samples for the HapMap
came from a total of 270 individuals: Yoruba people in Ibadan, Nigeria; Japanese people in Tokyo;
Han Chinese in Beijing; and the French Centre d’Etude du Polymorphisme Humain (CEPH)
resource, which consisted of residents of the United States having ancestry from Western and
Northern Europe.
In the Celera Genomics private-sector project, DNA from five different individuals was used for
sequencing. The lead scientist of Celera Genomics at that time, Craig Venter, later acknowledged
(in a public letter to the journal Science) that his DNA was one of 21 samples in the pool, five of
which were selected for use.
On September 4th, 2007, a team led by Craig Venter published his complete DNA sequence,
unveiling the six-billion-nucleotide genome of a single individual for the first time.

What We've Learned So Far From Human Genome Project

What Does the Draft Human Genome Sequence Tell Us?
By the Numbers
• The human genome contains 3164.7 million chemical nucleotide bases (A, C, T, and G).
• The average gene consists of 3000 bases, but sizes vary greatly, with the largest known
human gene being dystrophin at 2.4 million bases.
• The total number of genes is estimated at 30,000, much lower than previous estimates of
80,000 to 140,000 that had been based on extrapolations from gene-rich areas as opposed to
a composite of gene-rich and gene-poor areas.
• Almost all (99.9%) nucleotide bases are exactly the same in all people.
• The functions are unknown for over 50% of discovered genes.
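The figures in this list can be combined for a rough sense of scale. The Python sketch below uses only numbers quoted above (3164.7 million bases, 99.9% shared between people, an average gene of 3000 bases, about 30,000 genes). The names are illustrative assumptions. Multiplying the average gene size by the gene count is only a coarse estimate, because the text notes that gene sizes vary greatly.

```python
# Draft-sequence figures quoted in the list above.
GENOME_BASES = 3_164_700_000      # 3164.7 million nucleotide bases
AVERAGE_GENE_BASES = 3_000        # average gene size
ESTIMATED_GENES = 30_000          # estimated gene count
IDENTICAL_PER_THOUSAND = 999      # 99.9% of bases identical in all people


def differing_bases(genome_bases, identical_per_thousand):
    """Number of bases expected to differ between two people."""
    return genome_bases * (1000 - identical_per_thousand) // 1000


def gene_span_fraction(genome_bases, gene_count, average_gene_bases):
    """Rough share of the genome covered by genes of average size."""
    return gene_count * average_gene_bases / genome_bases


if __name__ == "__main__":
    diff = differing_bases(GENOME_BASES, IDENTICAL_PER_THOUSAND)
    span = gene_span_fraction(GENOME_BASES, ESTIMATED_GENES, AVERAGE_GENE_BASES)
    print(f"Bases differing between two people: about {diff:,}")
    print(f"Genome share spanned by genes: about {span:.1%}")
```

Under these assumptions, the 0.1% that varies still corresponds to roughly 3 million base positions, and genes of average size would span only a few percent of the genome.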
The Wheat from the Chaff
• Less than 2% of the genome codes for proteins.
• Repeated sequences that do not code for proteins ("junk DNA") make up at least 50% of the
human genome.
• Repetitive sequences are thought to have no direct functions, but they shed light on
chromosome structure and dynamics. Over time, these repeats reshape the genome by
rearranging it, creating entirely new genes, and modifying and reshuffling existing genes.
• During the past 50 million years, a dramatic decrease seems to have occurred in the rate of
accumulation of repeats in the human genome.
How It's Arranged
• The human genome's gene-dense "urban centers" are predominantly composed of the DNA
building blocks G and C.
• In contrast, the gene-poor "deserts" are rich in the DNA building blocks A and T. GC- and
AT-rich regions usually can be seen through a microscope as light and dark bands on
chromosomes.
• Genes appear to be concentrated in random areas along the genome, with vast expanses of
noncoding DNA between.
• Stretches of up to 30,000 C and G bases repeating over and over often occur adjacent to
gene-rich areas, forming a barrier between the genes and the "junk DNA." These CpG
islands are believed to help regulate gene activity.
• Chromosome 1 has the most genes (2968), and the Y chromosome has the fewest (231).
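The contrast between GC-rich and AT-rich regions described above is usually quantified as the fraction of G and C bases in a stretch of sequence. The Python sketch below computes that fraction for a whole sequence and for consecutive fixed-size windows. The function names and the toy sequence are invented for illustration; real analyses work on far longer sequences and typically use dedicated bioinformatics libraries.

```python
def gc_fraction(sequence):
    """Fraction of bases in the sequence that are G or C."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for base in seq if base in ("G", "C"))
    return gc_count / len(seq)


def windowed_gc(sequence, window_size):
    """GC fraction of consecutive, non-overlapping windows.

    A trailing stretch shorter than window_size is ignored.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    last_start = len(sequence) - window_size
    return [
        gc_fraction(sequence[start:start + window_size])
        for start in range(0, last_start + 1, window_size)
    ]


if __name__ == "__main__":
    toy = "GGCCGCATATTA"  # invented example: GC-rich start, AT-rich end
    print(gc_fraction(toy))
    print(windowed_gc(toy, 4))
```

Scanning windows in this way is one simple method for locating GC-rich stretches next to AT-rich ones along a chromosome.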
How the Human Compares with Other Organisms
• Unlike the human's seemingly random distribution of gene-rich areas, many other
organisms' genomes are more uniform, with genes evenly spaced throughout.
• Humans have on average three times as many kinds of proteins as the fly or worm because
of mRNA transcript "alternative splicing" and chemical modifications to the proteins. This
process can yield different protein products from the same gene.
• Humans share most of the same protein families with worms, flies, and plants, but the
number of gene family members has expanded in humans, especially in proteins involved in
development and immunity.
• The human genome has a much greater portion (50%) of repeat sequences than the mustard
weed (11%), the worm (7%), and the fly (3%).
• Although humans appear to have stopped accumulating repeated DNA over 50 million years
ago, there seems to be no such decline in rodents. This may account for some of the
fundamental differences between hominids and rodents, although gene estimates are similar
in these species. Scientists have proposed many theories to explain evolutionary contrasts
between humans and other organisms, including those of life span, litter sizes, inbreeding,
and genetic drift.
Variations and Mutations
• Scientists have identified about 1.4 million locations where single-base DNA differences
(SNPs) occur in humans. This information promises to revolutionize the processes of finding
chromosomal locations for disease-associated sequences and tracing human history.
• The ratio of germline (sperm or egg cell) mutations is 2:1 in males vs females. Researchers
point to several reasons for the higher mutation rate in the male germline, including the
greater number of cell divisions required for sperm formation than for eggs.
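A single-base difference of the kind described above can be illustrated by comparing two sequences position by position. The Python sketch below assumes the two sequences are already aligned and of equal length, which is a simplification: actual SNP discovery relies on alignment software and on data from many individuals. The function names and example sequences are invented for illustration.

```python
def single_base_differences(seq_a, seq_b):
    """List (position, base_a, base_b) wherever two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    pairs = zip(seq_a.upper(), seq_b.upper())
    return [(i, a, b) for i, (a, b) in enumerate(pairs) if a != b]


def percent_identity(seq_a, seq_b):
    """Percentage of positions at which the two sequences agree."""
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = single_base_differences(seq_a, seq_b)
    return 100.0 * (len(seq_a) - len(differences)) / len(seq_a)


if __name__ == "__main__":
    person_1 = "ACGTACGTAC"  # invented example sequences
    person_2 = "ACGTTCGTAC"
    print(single_base_differences(person_1, person_2))
    print(percent_identity(person_1, person_2))
```

Positions are zero-based, so the single difference in this toy example is reported at index 4.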
Benefits of Human Genome Project Research
Rapid progress in genome science and a glimpse into its potential applications have spurred observers to
predict that biology will be the foremost science of the 21st century. Technology and resources generated by
the Human Genome Project and other genomics research are already having a major impact on research
across the life sciences. The potential for commercial development of genomics research presents U.S.
industry with a wealth of opportunities, and sales of DNA-based products and technologies in the
biotechnology industry are projected to exceed $45 billion by 2009.
Molecular Medicine
• Improved diagnosis of disease
• Earlier detection of genetic predispositions to disease
• Rational drug design
• Gene therapy and control systems for drugs
• Pharmacogenomics "custom drugs"
Technology and resources promoted by the Human Genome Project are starting to have profound impacts on
biomedical research and promise to revolutionize the wider spectrum of biological research and clinical
medicine. Increasingly detailed genome maps have aided researchers seeking genes associated with dozens
of genetic conditions, including myotonic dystrophy, fragile X syndrome, neurofibromatosis types 1 and 2,
inherited colon cancer, Alzheimer's disease, and familial breast cancer.

Energy and Environmental Applications

• Use microbial genomics research to create new energy sources (biofuels)
• Use microbial genomics research to develop environmental monitoring techniques to detect pollutants
• Use microbial genomics research for safe, efficient environmental remediation
• Use microbial genomics research for carbon sequestration
Risk Assessment
• Assess health damage and risks caused by radiation exposure, including low-dose exposures
• Assess health damage and risks caused by exposure to mutagenic chemicals and cancer-causing toxins
• Reduce the likelihood of heritable mutations
Bioarchaeology, Anthropology, Evolution, and Human Migration
• Study evolution through germline mutations in lineages
• Study migration of different population groups based on female genetic inheritance
• Study mutations on the Y chromosome to trace lineage and migration of males
• Compare breakpoints in the evolution of mutations with ages of populations and historical events
DNA Forensics (Identification)
• Identify potential suspects whose DNA may match evidence left at crime scenes
• Exonerate persons wrongly accused of crimes
• Identify crime and catastrophe victims
• Establish paternity and other family relationships
• Identify endangered and protected species as an aid to wildlife officials
• Detect bacteria and other organisms that may pollute air, water, soil, and food
• Match organ donors with recipients in transplant programs
• Determine pedigree for seed or livestock breeds
• Authenticate consumables such as caviar and wine
Agriculture, Livestock Breeding, and Bioprocessing
• Disease-, insect-, and drought-resistant crops
• Healthier, more productive, disease-resistant farm animals
• More nutritious produce
• Biopesticides
• Edible vaccines incorporated into food products
• New environmental cleanup uses for plants like tobacco

Implications of the Human Genome Project

The effects of the Human Genome Project will be far-reaching. The best minds in commerce
and industry will take on issues related to patents and licenses. The insurance industry will be
revolutionized by the effect of genetic information on future actuarial tables.
Ultimately, our predisposition to health and disease will be known. Genetic mutations will
no longer be regarded simply as defects but will be used to understand the etiology of disease at the
most basic level. We may incorporate the new genetics into our lifestyle choices. Cloning, a current
controversy, may solve the shortage of organs for transplantation.
Finally, health professionals need to become more comfortable and conversant with the
concepts of the new genetics, especially when these concepts relate to how genetic predisposition
affects the risk for developing disease. The National Coalition for Health Professionals Education in
Genetics was formed to address these issues and to assist the education of health professionals in
this area.
The medical industry is building upon the knowledge, resources, and technologies
emanating from the HGP to further understanding of genetic contributions to human health. As a
result of this expansion of genomics into human health applications, the field of genomic medicine
was born. Genetics is playing an increasingly important role in the diagnosis, monitoring, and
treatment of diseases.
The Human Genome Project and the Future of Drug Development
The pharmaceutical industry is anticipating how information from the Human Genome
Project will affect drug development, including the potential benefits the new genetics will have
for drug therapy. For example, in the future it may be possible to readily identify patients who rapidly
metabolize a drug so that a higher dose of the drug can be used. On the other hand, a person who
metabolizes a drug slowly or not at all will not be given the drug. At present, pharmacologic
approaches block tissue receptors or inhibit specific enzymes; in the future, specific genes will be
either turned on or off.
Consider the example of hemochromatosis. Today, hemochromatosis is detected when
complications such as diabetes, heart failure, or liver damage occur. Ninety percent of those
affected have 1 or 2 mutations that can be detected by genetic screening. Early detection and
therapy can prevent the complications associated with hemochromatosis. In the future,
pharmacologic agents might be developed to prevent the accumulation of iron that causes tissue
damage and eliminate the need for the cumbersome phlebotomies that are the mainstay of current
treatment.
Gene-based therapies may be directed either at correcting gene mutations caused by
exposure to injurious substances or at regulating the expression of cancer-causing genes that are not
responsive to lifestyle modifications.