Molecular and Genetic Epidemiology


LEARNING OBJECTIVES
By the end of this chapter the reader will be able to:
• state the fundamental differences between molecular and genetic epidemiology
• describe the basic principles of inheritance and sources of genetic variation
• identify at least three reasons for the familial aggregation of a given disease
• define epidemiologic approaches for the identification of genetic components to disease
• explain the basic principles of segregation and linkage analysis
• state research applications of molecular and/or genetic epidemiology in infectious diseases, cancer, and other chronic diseases
CHAPTER OUTLINE
VII. Genome-Wide Association Studies (GWAS)
VIII. Linkage Disequilibrium Revisited: Haplotypes
IX. Application of Genes in Epidemiologic Designs
X. Genetics and Public Health
XI. Conclusion
XII. Study Questions and Exercises
Introduction
Mapping of the human genome and the subsequent advances in molecular biology forever
changed epidemiologic research on disease etiology. (Refer to Figure 14-1 for a photograph
of the deoxyribonucleic acid [DNA] helix.) Gone are the days when measurements of
exposure were limited to simple interview data, mailed questionnaires, inspection of
secondary data, and surrogate measures of the primary exposure of interest. Descriptive
epidemiology and disease monitoring and surveillance remain important
applications of epidemiology. However, modern epidemiologists find themselves armed
with several new strategies to assess precursors of disease, identify biologic markers of
exposure, and search for the biologic bases for responses. The wide differential in human
responses to the same environmental exposure is an intriguing issue for epidemiology that
may be explored by using these advanced techniques.
The traditional epidemiologic approach, characterized by examination of the
distribution of health conditions in populations and discernment of risk factors for them, has
proved useful for generating hypotheses and unraveling disease etiologies. However,
suppose that it were possible to go beyond these methods and look inside the black box of
disease processes. If this black box were to become transparent, epidemiologists would be
able to change the definition of risk factors or clarify their location in a causal model.¹

This chapter presents an overview of fundamental principles of molecular and genetic
epidemiology. For readers without a strong background in biology, we begin with a review
of basic principles of human genetics (Exhibit 14-1). Next, we will define the terms genetic
epidemiology and molecular epidemiology and distinguish between them. We will then
present several epidemiologic approaches to identify genetic components of disease,
provide an overview of some strategies to identify genes in studies of families, and present a
description and recent results from genome-wide association studies (GWAS). The chapter
includes

FIGURE 14-1 Model of the DNA helix.

Basic Principles of Human Genetics
Readers with a prior background in molecular biology and genetics (MBG) may be tempted to
skip this section. It is primarily intended to be a refresher for readers who have some familiarity
with MBG or who have been away from the topic. However, we include some discoveries about
the nature of the genetic code with which you may not be familiar. For readers with limited
grounding in MBG, this brief review may be insufficient. You are encouraged to pursue additional
details in one of several fine textbooks on human genetics. In order of simplest to most complex,
we recommend the following texts:
Mange EJ, Mange AP. Basic Human Genetics, 2nd ed. Sunderland, MA: Sinauer Associates, Inc;
1999.
Speicher M, Antonarakis SE, Motulsky AG, eds. Vogel and Motulsky's Human Genetics:
Problems and Approaches, 4th ed. Heidelberg, Germany: Springer; 2010.
Singer M, Berg P. Genes & Genomes: A Changing Perspective. Mill Valley, CA:
University Science Books; 1991.
The Genetic Code
The genetic code is the blueprint of instructions to make our body. These
instructions, coded in the form of DNA, are passed on from parents to their
children at the time of conception and are recorded using a very simple
alphabet of only four letters: A, C, G, and T. These represent four different
bases (adenine, cytosine, guanine, and thymine) that are used to spell
words, called codons. Each codon (word) contains only three letters and
specifies a single amino acid.
Although the genetic alphabet contains only four letters (bases),
they can be combined to form 64 different three-letter codons.
However, human proteins are built from only 20 amino acids, so most amino acids
can be specified in several ways. This redundancy is what is meant by saying that
our genetic code is degenerate. The only exceptions are methionine (TAC) and
the least frequent amino acid, tryptophan (ACC), each of which is encoded in one
unique way. (The triplets quoted in this exhibit are written as they appear on the
template strand of DNA.) Of the remaining amino acids, nine can be coded in two
different ways, one can be coded three different ways, five can be coded four
different ways, and three can be coded with six different codons. Three codons do
not code for an amino acid, but rather signal the end of the gene (ACT, ATT, ATC).
The amino acids (words) are strung together in long sequences to form sentences.
These sentences are the complete instructions to make a specific protein in a
part of our body (such as skin, hair, red blood cells, bone, or nerve spindles) or
the enzymes, hormones, and growth factors that regulate our body and make
us what we are. A gene is the genetic code corresponding to one sentence.
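The counts quoted above can be checked with a short sketch. It builds the standard genetic code table from its conventional compact form (coding-strand codons, bases in T, C, A, G order) and tallies how many codons specify each amino acid. The helper `template_to_amino_acid` is a name introduced here for illustration only; it assumes that the triplets quoted in this exhibit (TAC, ACC, ACT, ATT, ATC) are template-strand triplets, so each is complemented base by base before lookup.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code for coding-strand DNA codons in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def template_to_amino_acid(template_triplet):
    """Hypothetical helper: complement a template-strand triplet, then look it up."""
    coding_codon = template_triplet.translate(COMPLEMENT)
    return CODON_TABLE[coding_codon]


# Number of codons that specify each amino acid (stop codons excluded).
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
# Number of amino acids that have 1, 2, 3, 4, or 6 codons.
degeneracy = Counter(codons_per_amino_acid.values())

print(len(CODON_TABLE))                   # 64
print(dict(sorted(degeneracy.items())))   # {1: 2, 2: 9, 3: 1, 4: 5, 6: 3}
print(template_to_amino_acid("TAC"))      # M (methionine)
print(template_to_amino_acid("ACC"))      # W (tryptophan)
```

The tally reproduces the breakdown in the text: two amino acids with a single codon, nine with two, one with three, five with four, and three with six, plus three stop codons, for 64 in total.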
Physical Arrangement of DNA
Although the coded instructions for making our bodies are contained in linear
sequences of DNA (representing roughly 3 billion bases), these sequences are
not one long garbled string of genes. Rather, the units of DNA themselves are
organized onto chromosomes. Each human should
have 23 pairs of chromosomes: one pair to determine sex and 22 pairs of autosomes.
The autosomes differ in size and are numbered from 1 to 22. Women have
two X chromosomes (one from the mother and one from the father) and men have
one X from their mother and a Y chromosome from their father. Transmission of
chromosomes from parents to offspring occurs through the formation of gametes
during a process called meiosis. For men, meiosis is the production of sperm and
occurs throughout life. For women, meiosis leads to the production of oocytes, or
eggs. As opposed to men, women are born with their full complement of oocytes,
but only one or two are allowed to mature each month during their years of
reproductive potential. Normal sperm and oocytes contain only one copy of each
chromosome, so at the time of conception a full complement of 23 pairs of
chromosomes is formed.
From DNA to Protein
There are several important things to know about the human genome. The first
point is that not all DNA contained in our cells is transcribed and translated into
protein. Second, within a region of DNA on a particular chromosome that codes for a
gene, only certain segments are both transcribed (the process by which DNA is copied
into RNA, or ribonucleic acid) and translated (the process by which RNA is read and
proteins are assembled). That is, the sequence of bases that determines the order and
number of amino acids needed to build a certain protein is not necessarily a straight
run. Certain stretches of DNA (called exons or expressed sequences) are retained in
the final RNA message, and other stretches (called introns or intervening sequences)
are removed from it before translation. As an extreme example, the gene that codes for
clotting factor VIII (and is mutated in persons with hemophilia) has 26 exons that code
for about 2,000 amino acids. However, these codes represent only about 4% of the total
length of the gene! The final, and most important, point is that individuals differ from
one another in terms of their DNA.
Although all humans can be thought of as having essentially the same number of
genes, they clearly do not have identical sequences of DNA. Several recent
discoveries have revealed that the genome is far more complex.
For example, the amount of DNA individuals carry is not identical, with the
difference in genomic size between two individuals being as large as 9 million base
pairs.²

Genetic Variation
How is DNA different from one person to another? Although the complete story is
beyond the scope of this chapter, suffice it to say that changes can occur in a wide
variety of ways. A mutation is defined as a change in DNA that may adversely
affect the host. One category of mutations, known as frameshift mutations, is the
result of deletions or insertions of one or more DNA bases. These mutations not
only alter the codon in which they occur, but also may shift the reading frame of all
successive three-letter words.
Another type of mutation is one that changes one base into another.
Because most amino acids can be specified by more
than one codon, sometimes mutations can be "silent" and not
result in a change in amino acid. For example, a mutation from AAA to AAG
would still lead to the incorporation of phenylalanine. Thus, although the sequence
of DNA has changed, the amino acid sequence of the translated protein has not.
Alternatively, the alteration of a single codon can have a profound effect. For
example, a mutation of T to A in the middle base of the sixth codon for the β chain
of hemoglobin changes the amino acid from glutamic acid to valine. The result is a
change in the shape of hemoglobin from smooth and rounded to distorted, and
ultimately a disease known as sickle-cell anemia.
Mutations also can occur within introns, the noncoding regions of DNA, where they
would be expected to have little effect on the protein product of that gene. Recent
evidence suggests, however, that even mutations in introns can sometimes have a profound
effect on the protein product of a gene. The most serious, and easiest to recognize,
mutations are ones in which a change in a single base produces an inadvertent
"stop" codon that signals the end of translation before the full-length gene
product can be made. The result is a protein that is shorter than normal, or
truncated, with a corresponding effect on its function or integrity.
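The categories described above (frameshift, silent, amino acid change, and premature stop) can be illustrated with a minimal sketch. The function names `translate` and `classify`, and the short coding-strand sequence used as the reference, are chosen here for illustration and are not taken from the chapter. The sketch works on coding-strand codons, so under the assumption that the chapter quotes template-strand bases, its T-to-A change appears here as the complementary A-to-T change (GAG to GTG).

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code for coding-strand DNA codons in TCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna):
    """Translate coding-strand DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_dna) - 2, 3):
        amino_acid = CODON_TABLE[coding_dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify(reference, variant):
    """Label a variant relative to the reference sequence (simplified rules)."""
    length_change = len(variant) - len(reference)
    if length_change % 3 != 0:
        return "frameshift"
    if length_change != 0:
        return "in-frame insertion or deletion"
    ref_protein, var_protein = translate(reference), translate(variant)
    if var_protein == ref_protein:
        return "silent"
    if len(var_protein) < len(ref_protein):
        return "nonsense (premature stop)"
    return "missense (amino acid change)"


# Illustrative reference: nine codons that translate to MVHLTPEEK.
reference = "ATGGTGCATCTGACTCCTGAGGAGAAG"
silent = reference[:17] + "C" + reference[18:]      # CCT -> CCC, still proline
missense = reference[:19] + "T" + reference[20:]    # GAG -> GTG, glutamic acid -> valine
nonsense = reference[:18] + "T" + reference[19:]    # GAG -> TAG, a stop codon
frameshift = reference[:3] + reference[4:]          # one base deleted

for name, seq in [("silent", silent), ("missense", missense),
                  ("nonsense", nonsense), ("frameshift", frameshift)]:
    print(name, "->", classify(reference, seq), "| protein:", translate(seq))
```

The classification rules are deliberately simple: they ignore changes in introns, regulatory regions, and the start codon, all of which can matter in practice.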
Alterations can be much larger in scale than a single base pair. Recent studies
show that alterations may be as large as one thousand to several
thousand base pairs.³,⁴ The concept of copy number variations (CNVs) also has
been established recently. A CNV is a situation in which the
number of copies of a gene differs between individuals.
Interestingly, many of these CNVs have been identified in human genes that
relate to the senses (smell, hearing, taste, and sight) and to disease susceptibility. It has been
hypothesized that several thousand of these CNVs occur; their presence raises
important questions about their public health significance.
Review of Genetic Terminology
Having finished this brief review of DNA and genetic variation, we present a few
more definitions. The basic unit of heredity is a gene, the particular segment of a
DNA molecule on a chromosome that determines the nature of an inherited trait. An
allele is one of two or more alternative forms of a gene that occurs at the same locus.
Of course, we have not yet defined a locus, either. It is the site or location on a
chromosome occupied by a gene (i.e., a particular set of alleles). The genotype of an
individual refers to his or her genetic constitution, often stated in reference to a
specific trait or at a particular locus. The phenotype is the realized expression of the
genotype, or the observable physical appearance or functional expression of a gene.
An important situation in which genetics intersects with epidemiology happens
when a genotype is modified (or interacts with the environment) to affect a
phenotype (disease). Another important term is Mendelian inheritance (named for its
discoverer, the 19th-century Austrian monk Gregor Mendel), which denotes the
transmission of a disease or trait from parents to offspring according to simple laws
of inheritance.
illustrative published examples and concludes with an overview of how the field of
genetics has influenced, and will continue to influence, the practice of public health.
Definitions and Distinctions: Molecular
Versus Genetic Epidemiology
It is now quite commonplace to be a hyphen-epidemiologist. That is, rarely is it sufficient
to describe oneself as a simple country epidemiologist anymore. Many in the field add
some sort of modifier to their title, for example, pharmaco-epidemiologist,
behavioral-epidemiologist, or neuro-epidemiologist. To this partial list we must add
molecular-epidemiologist and genetic-epidemiologist, terms that many people use
interchangeably. We describe the respects in which these two fields overlap, how they differ, and
how technological advances in high-throughput genotyping are reuniting molecular and
genetic epidemiology. (Throughput refers to the amount of work that can be performed in a
given period of time.)
Genetic Epidemiology
The field of genetic epidemiology is devoted to the identification of inherited factors that
influence disease, and to the study of how variation in the genetic material interacts with
environmental factors to increase (or decrease) risk of disease. The first textbook on the
subject defines it as a discipline that seeks to unravel the role of genetic factors and their
interactions with environmental factors in the etiology of diseases, using family and
population study approaches.⁵ An important premise is that a better understanding of the
genetic etiology of disease can facilitate early detection in high-risk subjects and the design
of more effective intervention strategies.⁶ The unifying theme of genetic epidemiology is the
focus on genes and evidence for genetic influences.
Note that to answer questions two through four, families (or at least pairs of relatives) were
historically required. The approach of using related persons is
Genetic Epidemiology Can Be Thought of as a
Collection of Methodologies Designed to Answer Four
Questions:
1. Does the disease of interest cluster in families?
2. Is the clustering a reflection of shared lifestyle, common environment, or similar risk
factor profiles?
3. Is the pattern of disease (or risk factor for a disease) within families consistent with the
expectations under Mendelian transmission of a major gene? Described in more detail
later in the chapter, Mendelian transmission refers to the inheritance of characteristics in
accord with Mendel's laws of inheritance.
4. Where is the chromosomal location of the putative gene?

quite different from traditional epidemiology, which assumes that the subjects under study are
independent. When study subjects are biologically related, by definition they are no longer
independent. Lack of independence necessitates special rules in selection of subjects and
analytic approaches. The epidemiologic approach to identify genetic factors that influence
disease does not require prior knowledge about the pathophysiologic process that underlies
the inherited susceptibility. Rather, given that the clustering of a disease in families is not due
to shared environment, and is consistent with Mendelian transmission of a major gene, the
goal is to identify regions of DNA that cosegregate (are inherited in the same pattern) with the
disease of interest. Once the chromosomal region is narrowly defined, the torch is passed to
molecular geneticists to identify the appropriate gene using highly specialized techniques.
This approach, called positional cloning or physical mapping, contrasts with a more
traditional laboratory approach called functional cloning or functional mapping. The latter
does not require epidemiology or families and is based instead on identification of proteins
that are involved in a disease process. Once a protein has been identified, scientists can
determine its amino acid sequence. Working backward, the researcher is able to decipher the
DNA code for the sequence of amino acids. Finally, the investigator finds where this DNA
sequence occurs in the human genome. Note that physical mapping has historically only been
applied when the genetic influence on disease is great enough that there will be a Mendelian
pattern of disease in the family. These contrasts in approaches are depicted in Figure 14-2,
adapted from

FIGURE 14-2 Strategies to identify human genes. Panels: functional cloning and positional
cloning (elements shown: Disease, Function, Gene, Map). Source: Adapted from FS Collins,
Positional Cloning Moves from Perditional to Traditional, as published in Nature
Genetics, Vol 9, pp. 347-350, 1995.

a review on the subject by Dr. Francis Collins.⁷ Collins led the Human Genome Project, an
ambitious public/private venture to determine the full sequence of the roughly 3 billion
nucleotides in our DNA, and served as Director of the National Human Genome Research
Institute. GWAS have clearly demonstrated that they can identify susceptibility loci with
modest effect sizes. Drilling down to find the actual gene in a region of interest has
traditionally been done in families. However, with advances in genotyping abilities, it is
now possible to refine the causal genetic variant through the study of
unrelated individuals. Thus, the once clear lines between molecular and genetic
epidemiology are beginning to blur somewhat.
Molecular Epidemiology
Basically, greater precision in estimating exposure-disease associations can be achieved by
using molecular biology to improve the measurement of exposures and disease. The term
molecular epidemiology has been attributed to researchers Perera and Weinstein. Molecular
epidemiology has the potential to provide early warnings for disease by flagging
preclinical effects of exposure.⁸ The field is much broader than genetic epidemiology and
includes a wide variety of biologic measures of exposure and disease. As it relates to her
research on the causes of cancer, Perera noted that molecular epidemiology combines
advances in the molecular biology and molecular genetics of cancer with epidemiology
to understand the molecular dose of specific agents, their preclinical effects, and the
biologic factors that modulate susceptibility to their exposure.⁹⁽ᵖ²³³⁾ Many definitions of
molecular epidemiology include the concept of biomarkers.¹⁰ Consider the following
examples:
• Rather than rely on individual recall of a usual diet to classify individuals
according to intake of fruits and vegetables, assess serum levels of micronutrients
to obtain more precise measurements of intake of fruits and vegetables.
• Rather than conduct a clinical trial with colon cancer as the end point, use an
intermediate marker (an accepted precursor lesion: the adenomatous polyp).
• Rather than treat all cases of breast cancer as the same disease, use tumor markers
to identify potentially more homogeneous subsets.
• In trying to identify whether clusters of cases of infectious disease are from a
common source, characterize the agents according to their DNA fingerprint.
From these examples, the reader should notice that the markers (exposures) are based
on biologic specimens (e.g., blood, tissue, urine, and sputum) rather than questionnaire
or medical records data. As stated earlier, the terms molecular epidemiology and genetic
epidemiology are often used interchangeably. One reason is that molecular
epidemiology commonly measures inherited variation in DNA (as opposed to acquired
variation, that is, somatic mutations in our DNA) to classify subjects. Thus, when genes are
involved, there is an overlap between molecular and genetic epidemiology. One
distinction between the two is that molecular epidemiology does not involve studies of
biologically related individuals. Another distinction is that most molecular
epidemiologic studies are conducted to evaluate the significance of variation in genes
that would not necessarily manifest as Mendelian patterns of disease in a family. As
molecular biologists and molecular geneticists work to unravel disease processes, they
discover various proteins that are involved. Genes determine these proteins; if
individuals differ from one another in the genetic sequence of a protein that is
functionally involved in the disease process, then evaluation of this genetic variation in
epidemiologic studies could yield important insights into disease etiology.
Molecular Versus Genetic Epidemiology
Genetic epidemiology: concerned with inherited factors that influence risk of
disease
Molecular epidemiology: uses molecular markers (in addition to genes) to establish
exposure-disease associations
Epidemiologic Evidence for Genetic Factors
If a disease has a genetic component, there should be an excess occurrence of disease in
the families of cases, because close relatives of a case have a certain
probability of sharing the same gene that influences risk of disease. From an etiologic
perspective, measurement and evaluation of family history as a risk factor may shed light
on the contribution of familial factors to the pathogenesis of the outcome of interest. A
simple definition of a positive family history is the occurrence of the same disease or trait
within a family. A more precise definition would include the specific types of relatives that
will be considered, for example, first-degree relatives (parents, siblings, and offspring) or
second-degree relatives (grandparents, aunts and uncles, nieces and nephews,
grandchildren), plus specifics on the disease (such as the type of cancer) and the age at
onset (since familial disease is generally held to have an earlier age at onset than disease
unrelated to genetic factors).
Several epidemiologic designs might be employed to evaluate the association of family
history with disease. A cross-sectional survey of a representative sample of the
population could assess the frequency of respondents with a positive history. However, if
the disease of interest were rare, this method would not be very efficient.
A more common strategy is to conduct a case-control study and compare the frequency
of family history in both groups. If there were a genetic component to the disease, one
would expect the odds of a positive family history of the disease to be greater among the
cases than among the controls.
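The case-control comparison described above is usually summarized as an odds ratio. The following is a minimal sketch, with entirely hypothetical counts, of how the odds ratio for a positive family history and an approximate 95% confidence interval (the log-based Woolf method, chosen here for simplicity and not named in the chapter) might be computed. The function name and parameters are illustrative.

```python
import math


def odds_ratio_with_ci(cases_fh_pos, cases_fh_neg, controls_fh_pos, controls_fh_neg, z=1.96):
    """Odds ratio for a positive family history with a Woolf (log-based) confidence interval."""
    odds_ratio = (cases_fh_pos * controls_fh_neg) / (cases_fh_neg * controls_fh_pos)
    # Standard error of log(OR) is the square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(
        1 / cases_fh_pos + 1 / cases_fh_neg + 1 / controls_fh_pos + 1 / controls_fh_neg
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical study: 200 cases and 200 controls asked about family history.
or_estimate, ci_lower, ci_upper = odds_ratio_with_ci(
    cases_fh_pos=60, cases_fh_neg=140, controls_fh_pos=30, controls_fh_neg=170
)
print(f"OR = {or_estimate:.2f}, 95% CI {ci_lower:.2f} to {ci_upper:.2f}")
```

With these made-up counts the odds of a positive family history are about 2.4 times greater among cases than among controls. An odds ratio of this kind does not by itself distinguish genetic from nongenetic causes of familial aggregation.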
As with any case-control study, recall bias is always a potential problem with family
studies. Family recall bias is the special situation where cases are more likely to be
informed about their family history than are controls.¹¹ It is not difficult to imagine that,
as a consequence of having the disease, one learns much greater detail about affected
family members than had been known prior to disease onset. An approach to overcome
family recall bias is to perform a cohort study in which assessment of family history
occurs at baseline, prior to the onset of disease. The cohort is then followed
prospectively for the development of the outcome under investigation. A disadvantage,
of course, is that the length of the follow-up period could be extensive before sufficient
cases accrue for meaningful analysis. In addition, family history is not a static
characteristic but a dynamic risk factor that can change with time as unaffected
relatives develop the outcome under investigation. Assessment of family history at a
single point in time cannot capture such changes.
Causes of Familial Aggregation
Although demonstration that a disease or trait clusters in families is certainly
acceptable evidence that genetics may be important, several alternative explanations
must be considered. The explanations include the operation of chance and the
influence of environmental factors. Zhao and colleagues have presented the case that
the series of inquiries that characterize genetic epidemiology can be considered as part
of a sequence of studies, built upon a common epidemiologic framework.¹²

Bad Luck
The first of the explanations for the familial aggregation of disease is simply chance.
Given that there is a finite probability for the development of a particular disease or
adverse health phenomenon, even in the absence of any genetic contribution at all, the
disease may afflict several members of the same family. This occurrence is especially
likely for the common diseases of major public health importance, such as obesity,
mental illness, heart disease, and cancer. For example, an early study of the aggregation of
cancer used cancer mortality rates for the adult British population to show that half of all
families with more than five adults would have at least one case of cancer by chance
alone.¹³ This is similar to the concept of clustering, but with the grouping
based on family rather than space or time.
Two other factors, not necessarily a simple reflection of bad luck, would affect the
likelihood that someone reports a positive family history: age and family size. Although
every person has at least two first-degree biologic relatives (the parents), a person's
numbers of siblings, aunts, uncles, cousins, and children are clearly random variables. For
example, someone who has 15 close relatives at risk for a common chronic disease is more
likely to have a relative with the disease than a person with only two relatives at risk.
Similarly, because age is the single most important risk factor for many diseases of public
health importance, an older person (with older relatives) is more likely than a younger
person (with correspondingly younger relatives) to have a family history for nongenetic
reasons. Consequently, adjustment for age and family size in the analysis is encouraged
when one is trying to assess the association of family history with risk of a particular
disease.
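The effect of chance and family size described above can be made concrete with a binomial calculation. This is a sketch under strong simplifying assumptions that are not from the chapter: every relative has the same risk of disease (an arbitrary 10% is used here), and relatives develop disease independently of one another. The function name is illustrative.

```python
from math import comb


def prob_at_least(k, n_relatives, risk):
    """P(at least k of n relatives affected), assuming independent relatives with equal risk."""
    prob_fewer_than_k = sum(
        comb(n_relatives, j) * risk**j * (1 - risk) ** (n_relatives - j)
        for j in range(k)
    )
    return 1.0 - prob_fewer_than_k


ASSUMED_RISK = 0.10  # hypothetical risk per relative, for illustration only

p_two_relatives = prob_at_least(1, 2, ASSUMED_RISK)
p_fifteen_relatives = prob_at_least(1, 15, ASSUMED_RISK)

print(f"At least one affected among 2 relatives:  {p_two_relatives:.2f}")      # 0.19
print(f"At least one affected among 15 relatives: {p_fifteen_relatives:.2f}")  # 0.79
```

Even with no genetic contribution at all, the person with 15 relatives at risk is about four times as likely to report a positive family history as the person with two, which is why adjustment for family size (and for age, which drives the per-relative risk) is encouraged.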
Bad Environment
Epidemiologists historically have devoted their energies and attention to the identification
of nongenetic risk factors for disease. As illustrated throughout the other chapters of this
book, history is replete with success stories. This information on exposure-disease
relationships must be considered as a possible explanation for any observed clustering of
disease in families. For example, although roughly 95% of all lung cancer cases occur among
current or former smokers, fewer than 20% of heavy smokers ever develop the disease.¹⁴ This
observation has caused some to hypothesize that host factors might influence response to
environmental agents (tobacco). A complicating factor in the study of a disease with such
a strong environmental risk factor is the extent to which family members also smoke
cigarettes. Twin studies conducted in the 1930s provided strong evidence that smoking
habits clustered in families. Therefore, if family members of cases were more likely to
smoke than family members of controls, a greater proportion of cases would be expected
to report a positive family history of lung cancer. In this situation, however, clustering of
cases in families could be due to shared lifestyle, rather than shared genes.
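One way to see how shared smoking habits could produce an apparent family-history effect is to compare a crude odds ratio with one stratified on family smoking. The sketch below uses the Mantel-Haenszel summary odds ratio, a standard stratified estimator that the chapter does not itself name, and counts that are entirely hypothetical and constructed so that family history has no effect within either stratum.

```python
def odds_ratio(a, b, c, d):
    """a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical lung cancer study; exposure = positive family history of lung cancer.
# Each table: (cases FH+, controls FH+, cases FH-, controls FH-).
smoking_families = (40, 20, 160, 80)
nonsmoking_families = (5, 10, 95, 190)
strata = [smoking_families, nonsmoking_families]

# Collapse the strata cell by cell to obtain the crude (unstratified) table.
crude_table = tuple(sum(cells) for cells in zip(*strata))

crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata)

print(f"Crude OR:    {crude_or:.2f}")     # about 1.59
print(f"Adjusted OR: {adjusted_or:.2f}")  # 1.00
```

In these invented data the crude association between family history and lung cancer disappears once families are compared within the same smoking stratum, which is the pattern expected when clustering reflects shared lifestyle rather than shared genes.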

Studies of migrants from low-risk to high-risk countries clearly support the notion that
aspects of diet are associated with coronary heart disease and cancer. Diets low in
complex carbohydrates and fruits and vegetables are associated with diabetes and cancer.
Several studies suggest the familial nature of dietary intake patterns.¹⁶,¹⁷ To the extent
that family members share dietary habits that increase risk for a given disease, familial
clustering of disease may occur.
Similarly, health-related behaviors, such as exercise practices, alcohol intake, and use
of sunscreen, may be learned within a family and indirectly relate to clustering of disease.
Exercise levels are related to avoidance of overweight, promotion of bone strength, and
cardiovascular health. Moderate alcohol consumption, especially intake of red wine, is
thought to be protective for the heart; however, overconsumption is associated with
adverse health outcomes and with car crashes among people who drive under the
influence. Finally, avoidance of excessive sun exposure reduces the occurrence of many
forms of skin cancer. The foregoing are examples of some of the many health-related
practices that may be transmitted within the family environment.
Many risk factors shared by family members are a reflection of shared environment,
such as water supply, radon from the soil, air quality, pesticides, lead paints, and even
occupation. A case report of a family with four members with mesothelioma, a rare
cancer of the lining of the pleural or peritoneal cavity, was traced back to a common
occupational exposure to asbestos.¹⁸ Infectious diseases also may be included in this
category, and there are a number of published examples in which familial clustering of
hepatitis or tuberculosis occurs from common exposures.
Shared Family Environment and Familial
Aggregation
A difficulty in the interpretation of family history data is the inability to determine the
influence of nongenetic risk factors on any observed familial clustering. From an etiologic
perspective, measurement and evaluation of risk factors are
