Outline
- Introduction
- Basic Statistical Methods of Case-control Study
- GWAS
- A Novel Epistasis-testing Procedure
Observational studies
1. Case-control studies
2. Cohort studies
Cohort Studies
A cohort study follows a group of individuals over time and compares disease occurrence between exposed and non-exposed members. Cohort studies can be either prospective or retrospective.
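[Figure: cohort design. The population is divided into exposed and non-exposed groups, which are followed to see who develops the disease (Disease +/-).]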
Case-Control Studies
Case-control studies are used to identify factors that may contribute to a medical condition by comparing subjects who have that condition (the cases) with subjects who do not have the condition but are otherwise similar (the controls). Case-control studies are retrospective and nonrandomized.
Case-Control Studies
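[Figure: case-control design. Subjects with the disease (Disease +) and subjects without it are sampled from the population, and each group is classified as exposed or non-exposed.]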
Selection of Cases
Population-based cases: include all subjects, or a random sample of all subjects, with the disease at a single point in time or during a given period in the defined population.
Selection of Controls
Principles of Control Selection:
- Study base: controls are used to characterise the distribution of exposure in the population from which the cases arise.
- Comparable accuracy: information obtained from cases and controls should be equally reliable (to avoid systematic misclassification).
- Overcoming confounding: confounding can be eliminated through control selection (matching or stratified sampling).
Selection of Controls
General population controls: registries, households, telephone sampling
- Costly and time consuming
- Recall bias
- Possibly high non-response rate
Hospitalised controls: patients at the same hospital as the cases
- Easy to identify
- Less recall bias
- Higher response rate
Case-control study
- Not suited to rare exposures
- Incidence rates cannot be estimated unless the study is population based
- The retrospective, nonrandomized nature limits the conclusions that can be drawn
[Figure: example case-control data layout with columns for sample id, case/control status, and genotypes]
Basic Statistical Methods of Case-control Study
Coding Genotypes
For one marker with two alleles, there are three possible genotypes, coded by the number of copies of allele A:
Genotype   AA   Aa   aa
Coding      2    1    0
Dominant model: AA and Aa vs. aa
Hypothesis: the genetic effects of AA and Aa are the same (assuming A is the minor allele).

Recessive model: AA vs. Aa and aa
Hypothesis: the genetic effects of Aa and aa are the same (A is the minor allele).
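A minimal R sketch of the additive (2, 1, 0), dominant and recessive codings, assuming genotypes are stored as the strings "AA", "Aa" and "aa" with A the minor allele. The helper name `code_genotypes` is hypothetical, not something from the slides.

```r
# Hypothetical helper: encode genotype strings under the additive (2/1/0),
# dominant (AA and Aa vs. aa) and recessive (AA vs. Aa and aa) codings.
# Assumes A is the minor allele and genotypes are written "AA", "Aa", "aa".
code_genotypes <- function(g) {
  g <- as.character(g)
  stopifnot(all(g %in% c("AA", "Aa", "aa")))
  # additive coding = number of copies of allele A
  additive <- (substr(g, 1, 1) == "A") + (substr(g, 2, 2) == "A")
  data.frame(
    genotype  = g,
    additive  = additive,
    dominant  = as.integer(additive >= 1),  # 1 for AA or Aa
    recessive = as.integer(additive == 2),  # 1 for AA only
    stringsAsFactors = FALSE
  )
}

print(code_genotypes(c("AA", "Aa", "aa", "Aa")))
```

The dominant and recessive columns are derived from the additive count, so all three codings stay consistent with one another.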
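For each table below, n denotes counts in cases and m counts in controls, and the null hypothesis of independence is $H_0: \pi_{ij} = \pi_{i\cdot}\,\pi_{\cdot j}$, where $\pi_{ij}$ is the probability of row i and column j and $\pi_{i\cdot}$, $\pi_{\cdot j}$ are the marginal probabilities.

Genotypic test (df = 2):
           cases            controls
AA         nAA              mAA
Aa         nAa              mAa
aa         naa              maa

Dominant model (df = 1):
           cases            controls
AA+Aa      nAA + nAa        mAA + mAa
aa         naa              maa

Recessive model (df = 1):
           cases            controls
AA         nAA              mAA
Aa+aa      nAa + naa        mAa + maa

Allelic test (df = 1):
           cases            controls
A          2nAA + nAa       2mAA + mAa
a          nAa + 2naa       mAa + 2maa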
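The four tables can be built mechanically from the six genotype counts. A minimal R sketch; the helper `make_tables` and its argument names are hypothetical, and the usage line plugs in the marker M counts from the example that follows.

```r
# Hypothetical helper: collapse genotype counts into the four tables above.
# n = case counts and m = control counts, each a vector named AA, Aa, aa.
make_tables <- function(n, m) {
  # one 2-column table: first column = cases, second column = controls
  tab <- function(case_counts, control_counts, labels) {
    matrix(c(case_counts, control_counts), ncol = 2,
           dimnames = list(labels, c("cases", "controls")))
  }
  list(
    genotypic = tab(n[c("AA", "Aa", "aa")], m[c("AA", "Aa", "aa")],
                    c("AA", "Aa", "aa")),
    dominant  = tab(c(n[["AA"]] + n[["Aa"]], n[["aa"]]),
                    c(m[["AA"]] + m[["Aa"]], m[["aa"]]),
                    c("AA+Aa", "aa")),
    recessive = tab(c(n[["AA"]], n[["Aa"]] + n[["aa"]]),
                    c(m[["AA"]], m[["Aa"]] + m[["aa"]]),
                    c("AA", "Aa+aa")),
    # allele counts: each AA carries two A alleles, each Aa carries one
    allelic   = tab(c(2 * n[["AA"]] + n[["Aa"]], n[["Aa"]] + 2 * n[["aa"]]),
                    c(2 * m[["AA"]] + m[["Aa"]], m[["Aa"]] + 2 * m[["aa"]]),
                    c("A", "a"))
  )
}

# Usage with the marker M counts
tables <- make_tables(n = c(AA = 36, Aa = 100, aa = 64),
                      m = c(AA = 18, Aa = 84, aa = 98))
tables$allelic
```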
Test Statistic
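Chi-squared test statistic:
$$\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}$$
where O is the observed cell count and E is the expected cell count under the null hypothesis of independence, with N the total count:
$$E = \frac{\text{row total} \times \text{column total}}{N}$$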
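The formula can be checked directly in R. This sketch (the function name `pearson_chisq` is hypothetical) computes E from the row and column totals and compares the result with `chisq.test(..., correct = FALSE)`, using the dominant-model table from the example that follows (136 and 64 cases, 102 and 98 controls).

```r
# Pearson chi-squared statistic computed directly from the formula above
pearson_chisq <- function(tab) {
  O  <- as.matrix(tab)
  N  <- sum(O)
  E  <- outer(rowSums(O), colSums(O)) / N   # (row total * column total) / N
  stat <- sum((O - E)^2 / E)
  df <- (nrow(O) - 1) * (ncol(O) - 1)
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df, lower.tail = FALSE))
}

# Dominant-model table: rows = cases, controls; columns = AA+Aa, aa
tab <- matrix(c(136, 102, 64, 98), nrow = 2)
pearson_chisq(tab)
chisq.test(tab, correct = FALSE)$statistic   # same value, no continuity correction
```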
Example
The following table summarizes the genotype counts of marker M:

            AA    Aa    aa
cases       36   100    64
controls    18    84    98
Different tests can be performed:
- Allelic test
- Dominant gene action
- Recessive gene action
- Genotypic test
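R:
# Dominant model for marker M: rows = cases, controls; columns = AA+Aa, aa
dominant_table <- matrix(c(136, 64, 102, 98), nrow = 2, byrow = TRUE, dimnames = list(c("cases", "controls"), c("AA+Aa", "aa")))
print(dominant_table)
chisq.test(dominant_table, correct = FALSE)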
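R:
# Recessive model for marker M: rows = cases, controls; columns = AA, Aa+aa
recessive_table <- matrix(c(36, 164, 18, 182), nrow = 2, byrow = TRUE, dimnames = list(c("cases", "controls"), c("AA", "Aa+aa")))
print(recessive_table)
chisq.test(recessive_table, correct = FALSE)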
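R:
# Genotypic test for marker M: rows = cases, controls; columns = AA, Aa, aa
genotypic_table <- matrix(c(36, 100, 64, 18, 84, 98), nrow = 2, byrow = TRUE, dimnames = list(c("cases", "controls"), c("AA", "Aa", "aa")))
print(genotypic_table)
chisq.test(genotypic_table, correct = FALSE)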
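R:
# Allelic test for marker M: A = 2*nAA + nAa, a = nAa + 2*naa (same for controls)
allelic_table <- matrix(c(172, 228, 120, 280), nrow = 2, byrow = TRUE, dimnames = list(c("cases", "controls"), c("A", "a")))
print(allelic_table)
chisq.test(allelic_table, correct = FALSE)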
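Logistic regression:
$$\log\frac{p_{\text{disease}}}{1 - p_{\text{disease}}} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_J X_J$$
- $\beta_0$ is the intercept
- $\beta_1, \beta_2, \ldots, \beta_J$ are the effects of genetic factors
- $X_1, X_2, \ldots, X_J$ are the dummy variables of genetic factors

$H_0: \beta_i = 0 \quad (i = 1, \ldots, J)$

$\hat\beta_i$ is the estimator of $\beta_i$, and $e^{\hat\beta_i}$ is the estimated odds ratio for genetic factor i. The sign of $\beta_i$ determines whether $p_{\text{disease}}$ increases or decreases when the effect of genetic factor i is present.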
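A logistic regression on grouped counts can be fitted in R with `glm`. This sketch assumes the marker M counts from the example and codes genotype as a factor with aa as the reference level, so R creates two dummy variables (for Aa and AA). It is one possible parameterisation, not necessarily the one behind the R output shown on the next slide.

```r
# Genotype counts of marker M (from the example table)
d <- data.frame(genotype = c("AA", "Aa", "aa"),
                cases    = c(36, 100, 64),
                controls = c(18, 84, 98))
# aa is the reference level, so the dummy variables are for Aa and AA
d$genotype <- factor(d$genotype, levels = c("aa", "Aa", "AA"))

# Grouped binomial logistic regression on (cases, controls) counts
fit <- glm(cbind(cases, controls) ~ genotype, family = binomial, data = d)

summary(fit)$coefficients  # Wald z tests of H0: beta_i = 0
exp(coef(fit))[-1]         # estimated odds ratios of Aa and AA vs. aa
```

With three genotype groups and three parameters the model is saturated, so the fitted odds ratios equal the cross-product ratios of the table, e.g. AA vs. aa = (36 × 98) / (18 × 64) = 3.0625.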
An Example of R output
Other Options
Fisher's Exact Test:
When the sample size is small, the asymptotic chi-squared approximation to the null distribution is no longer valid. Fisher's exact test instead calculates the exact significance of the deviation from the null hypothesis.
a   b
c   d
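In R, `fisher.test` runs the test directly. The sketch below uses a small hypothetical 2 × 2 table laid out as a, b / c, d, and also reproduces the two-sided p-value by hand: with all margins fixed, a follows a hypergeometric distribution, $P(a) = \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{a!\,b!\,c!\,d!\,n!}$, evaluated here with `dhyper`.

```r
# Small hypothetical 2 x 2 table: row 1 = a, b; row 2 = c, d
tab <- matrix(c(3, 1,
                1, 3), nrow = 2, byrow = TRUE)

# Built-in exact test (two-sided by default)
fisher.test(tab)$p.value

# By hand: hypergeometric probabilities of every table with the same margins
a <- tab[1, 1]; b <- tab[1, 2]; c_ <- tab[2, 1]; d <- tab[2, 2]  # c_ avoids masking c()
support <- max(0, a - d):min(a + b, a + c_)
probs   <- dhyper(support, a + b, c_ + d, a + c_)
p_obs   <- dhyper(a, a + b, c_ + d, a + c_)
# Two-sided p-value: total probability of tables no more likely than the observed one
p_two_sided <- sum(probs[probs <= p_obs * (1 + 1e-7)])
p_two_sided
```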
Other Options
Cochran-Armitage Trend Test
-- An advantage of the Cochran-Armitage test is that it does not assume Hardy-Weinberg equilibrium.
-- Typically used to test a 2 × k contingency table when the effects of AA, Aa, and aa are thought to be ordered.
-- In genome-wide association studies, the additive (or codominant) version of the test is often used.
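Base R's `prop.trend.test` computes a chi-squared test for trend in proportions, which can serve as a Cochran-Armitage trend test. This sketch assumes additive scores (2, 1, 0) for AA, Aa, aa and uses the marker M counts; it also evaluates the Cochran-Armitage formula directly so the two can be compared.

```r
cases    <- c(AA = 36, Aa = 100, aa = 64)
controls <- c(AA = 18, Aa = 84, aa = 98)
totals   <- cases + controls
scores   <- c(2, 1, 0)   # additive scores: copies of allele A

# Chi-squared test for trend in proportions
prop.trend.test(cases, totals, score = scores)

# Same statistic from the Cochran-Armitage formula:
#   T = sum_i s_i (x_i - n_i * pbar)
#   Var(T) = pbar (1 - pbar) [sum_i n_i s_i^2 - (sum_i n_i s_i)^2 / N]
N      <- sum(totals)
pbar   <- sum(cases) / N
T_stat <- sum(scores * (cases - totals * pbar))
var_T  <- pbar * (1 - pbar) * (sum(totals * scores^2) - sum(totals * scores)^2 / N)
catt   <- T_stat^2 / var_T
catt
```

Because only the scores enter the statistic, changing them (e.g. to 1, 1, 0 or 1, 0, 0) gives trend tests tuned to dominant or recessive action instead of additive.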
GWAS
GWAS requires little of the sample: case-control data or case-parent trio data are enough. It is suited to detecting moderate effect sizes (odds ratio < 1.5) and is particularly useful for finding genetic variations that contribute to common, complex diseases.
What Is A SNP?
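TTCAGTCAGATCCTAGCCC      TTCAGTCAGATCCCAGCCC
AAGTCAGTCTAGGATCGGG      AAGTCAGTCTAGGGTCGGG

[Figure: the same double-stranded DNA segment on Chromosome 1 and Chromosome 2, identical except at one base pair (T/A vs. C/G): a Single Nucleotide Polymorphism.]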
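The idea of a single differing base can be made concrete with the two sequences above. This R sketch assumes the sequences are already aligned and of equal length; real SNP discovery relies on alignment to a reference genome and is far more involved.

```r
# Top strands of the two sequences in the figure
seq1 <- "TTCAGTCAGATCCTAGCCC"
seq2 <- "TTCAGTCAGATCCCAGCCC"

# The bottom strands shown are base-by-base complements of the top strands
complement <- function(s) chartr("ACGT", "TGCA", s)
complement(seq1)  # "AAGTCAGTCTAGGATCGGG"

# Positions where the two copies carry different nucleotides
b1 <- strsplit(seq1, "")[[1]]
b2 <- strsplit(seq2, "")[[1]]
snp_pos <- which(b1 != b2)
snp_pos                                      # 14
paste(b1[snp_pos], b2[snp_pos], sep = "/")   # "T/C"
```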
Handling GWAS
-- Storing and converting large amounts of genotype data
-- Quality control
-- Generating initial association analysis
-- Specialized analysis
Genetic Stratification
-- Assess population structure
-- Adjust both phenotypes and genotypes for possible stratification using:
   -- principal component analysis (Price's method)
   -- cluster analysis (PLINK)
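A minimal sketch of PCA-based adjustment on simulated data. Two hypothetical subpopulations are given different allele frequencies, principal components are computed from the standardized genotype matrix with `prcomp`, and a SNP is then tested with the top PCs as covariates. Including PCs as covariates is a common simplification; it is not an exact reproduction of Price et al.'s procedure, which adjusts genotypes and phenotypes by regressing out the PCs.

```r
set.seed(1)
n_per_pop <- 100   # individuals per simulated subpopulation
n_snps    <- 200   # simulated SNPs

# Simulated allele frequencies that differ between two subpopulations
f1 <- runif(n_snps, 0.1, 0.5)
f2 <- pmin(pmax(f1 + rnorm(n_snps, sd = 0.15), 0.05), 0.95)

# Genotypes coded 0/1/2 (copies of the allele); rows = individuals
geno <- rbind(
  matrix(rbinom(n_per_pop * n_snps, 2, rep(f1, each = n_per_pop)), nrow = n_per_pop),
  matrix(rbinom(n_per_pop * n_snps, 2, rep(f2, each = n_per_pop)), nrow = n_per_pop)
)
pop <- rep(1:2, each = n_per_pop)

# PCA on standardized genotypes (monomorphic SNPs dropped first)
keep <- apply(geno, 2, var) > 0
pcs  <- prcomp(geno[, keep], center = TRUE, scale. = TRUE)$x[, 1:2]

# Simulated phenotype whose risk depends only on subpopulation
y <- rbinom(length(pop), 1, ifelse(pop == 2, 0.7, 0.3))

# Test SNP 1 without and with the top two PCs as covariates
snp <- geno[, 1]
unadjusted <- glm(y ~ snp, family = binomial)
adjusted   <- glm(y ~ snp + pcs, family = binomial)
coef(summary(unadjusted))["snp", ]
coef(summary(adjusted))["snp", ]
```

In this simulation the first PC separates the two subpopulations, so conditioning on it removes association driven by ancestry alone.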
Genomic Control
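A sketch of the genomic control inflation factor, $\lambda = \operatorname{median}(\chi^2_{\text{observed}}) / \operatorname{median}(\chi^2_1)$, with each 1-df statistic divided by $\lambda$. The function name `genomic_control` and the convention of not adjusting when $\lambda < 1$ are assumptions made for illustration.

```r
# Genomic control (sketch):
#   lambda = median(observed 1-df chi-squared statistics) / median(chi^2_1)
genomic_control <- function(chisq_stats) {
  lambda <- median(chisq_stats, na.rm = TRUE) / qchisq(0.5, df = 1)
  lambda <- max(lambda, 1)  # assumed convention: do not inflate statistics when lambda < 1
  adjusted <- chisq_stats / lambda
  list(lambda     = lambda,
       adjusted   = adjusted,
       p.adjusted = pchisq(adjusted, df = 1, lower.tail = FALSE))
}

# Usage with hypothetical statistics inflated by a factor of 1.2
stats <- 1.2 * qchisq(ppoints(10001), df = 1)
gc <- genomic_control(stats)
gc$lambda
```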
Software Demonstration
PLINK:
-- Case/control, TDT, quantitative traits
Software Demonstration
Haploview:
-- LD and haplotype block analysis
-- Tag SNP selection algorithm
-- Visualization and plotting of GWAS results from PLINK
http://www.broadinstitute.org/haploview/haploview