
QTL Mapping

Violeta I. Bartolome
Senior Associate Scientist-Biometrics
Crop Research Informatics Laboratory
International Rice Research Institute

Quantitative Traits

• Vary continuously (e.g. yield, quality, stress tolerance)
• Usually governed by a number of genes
• Loci involved in the inheritance of quantitative traits are called QTL (quantitative trait loci)

QTL Mapping

The objective is to identify QTLs that affect the quantitative trait of interest.

Mapping Populations
Data Needed for QTL Mapping

• Assign a trait value for each mapping population member.
• Allele score for the set of marker loci distributed throughout the genome.

Methods to Detect QTLs

• Single marker analysis
• Interval mapping
• Composite interval mapping
• Multiple QTL mapping

Single Marker Analysis (SMA)

Model for SMA

$y = \mu + MG_i + e$

where y = the phenotype
µ = the overall mean
MG_i = effect of the i-th marker genotype
e = the residual error

Single Marker Analysis

• A significant difference between the phenotypic means of the groups indicates that the marker locus being used to partition the mapping population is linked to a QTL controlling the trait.
• The QTL and the marker are usually inherited together, so the mean of the group with the tightly linked marker will be significantly different from the mean of the group without the marker.
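
The comparison of group means described above can be sketched in base R. This is a minimal illustration on simulated backcross data, not the analysis shown in the lecture: the sample size, the effect of 2 units, the genotype labels `AA`/`AB`, and the seed are all assumptions made for the example.

```r
# Single marker analysis on simulated backcross data (illustrative values only)
set.seed(1)
n <- 200
marker <- sample(c("AA", "AB"), n, replace = TRUE)  # marker genotype of each individual
effect <- ifelse(marker == "AB", 2, 0)              # assumed effect of a tightly linked QTL
y <- 10 + effect + rnorm(n, sd = 2)                 # phenotype: y = mu + MG_i + e

# Phenotypic mean of each marker genotype group
group_means <- tapply(y, marker, mean)

# One-way ANOVA on the marker genotype (equivalent to a t-test for two groups)
fit <- lm(y ~ marker)
p_value <- anova(fit)[["Pr(>F)"]][1]

print(group_means)
print(p_value)
```

A small p-value here only says that the marker is associated with the trait; it does not say how far the QTL is from the marker.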

Single Marker Analysis

• Advantages
o Simple
o Easily incorporates covariates
o Does not require a complete genetic map
• Disadvantages
o Must exclude individuals with missing genotype data
o Less precise about the location of the QTL
o The farther away a QTL is from a marker, the less likely it is to be detected, thus the QTL effect may be underestimated
o Only considers one QTL at a time.

R/qtl

Data Entry, Data Quality Check, and Single Marker Analysis
Data analyzable by R/qtl
• F2
• Backcross
• RILs
– class(mydata)[1] <- "riself" # if by selfing
– class(mydata)[1] <- "risib" # if by sibling mating

Data not analyzable by R/qtl
• Outcross data
• Half-sib families
• Advanced intercross lines

Input files

• Text file (comma delimited)
• Mapmaker format
• QTL Cartographer format

Sample Data

• csv format
• Mapmaker format – genotype data
• Mapmaker format – phenotype data
• QTL Cartographer format – Rcross data
• QTL Cartographer format – Rmap data
Reading cross data

• csv file
read.cross("csv", file="csvfile.csv",
genotypes=c("A","H","B","D","C"))
• Mapmaker file
• QTL Cartographer file
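
As a rough sketch of the csv route, the example below writes a tiny made-up file in the R/qtl "csv" layout to a temporary path and reads it back with `read.cross()`. The phenotype name `yield`, the marker names, positions and genotypes are all invented for illustration; only the `read.cross()` call itself mirrors the statement above.

```r
library(qtl)

# Row 1: phenotype and marker names; row 2: chromosome of each marker;
# row 3: marker positions in cM; remaining rows: one individual per row.
csv_lines <- c(
  "yield,M1,M2,M3,M4,M5",
  ",1,1,1,2,2",
  ",0,10,20,0,15",
  "10.2,A,A,H,H,H",
  "11.5,H,H,H,A,A",
  "9.8,A,A,A,H,A",
  "12.1,H,H,A,A,A",
  "10.9,A,H,H,H,H",
  "11.0,H,A,A,A,H"
)
csv_path <- tempfile(fileext = ".csv")   # hypothetical file, created only for this sketch
writeLines(csv_lines, csv_path)

mydata <- read.cross("csv", file = csv_path,
                     genotypes = c("A", "H", "B", "D", "C"))
nind(mydata)     # number of individuals read
totmar(mydata)   # total number of markers read
```

Because only the codes A and H occur in this toy file, R/qtl should treat it as a backcross. Mapmaker and QTL Cartographer files are read with the same function using the format codes `"mm"` and `"qtlcart"`, each with a data file and a separate map file.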

Data Quality Check

• Plot a grid showing which genotypes are missing
Note: Genotypes with missing data are denoted by black pixels.
• Markers deviating from the hypothesized segregation ratio can be dropped
• Plot genetic map of marker locations for all chromosomes

plot.pheno()

• Plots a histogram or barplot of the data for a phenotype from an experimental cross
Note: pheno.col indicates the column number of the data to be plotted.
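
The checks above might look as follows on `fake.f2`, an example intercross shipped with R/qtl (used here only because the lecture's data set is not available). Current R/qtl releases name the plotting functions `plotMissing()`, `plotMap()` and `plotPheno()`; the Bonferroni-style cutoff for segregation distortion is an assumption of this sketch, not a rule from the lecture.

```r
library(qtl)
data(fake.f2)                        # example F2 intercross shipped with R/qtl

plotMissing(fake.f2)                 # grid of missing genotypes
plotMap(fake.f2)                     # genetic map of marker locations
plotPheno(fake.f2, pheno.col = 1)    # histogram of the first phenotype column

# Test each marker against the expected segregation ratio
gt <- geno.table(fake.f2)
cutoff <- 0.05 / totmar(fake.f2)     # assumed cutoff, adjusted for the number of markers
distorted <- rownames(gt)[which(gt$P.value < cutoff)]

# Drop distorted markers, if there are any
cleaned <- if (length(distorted) > 0) drop.markers(fake.f2, distorted) else fake.f2
totmar(cleaned)
```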

plot()

• Plots all graphs together

est.rf()

• Estimate the sex-averaged recombination fraction between all pairs of genetic markers
• For a backcross, one can simply count recombination events. For an intercross or 4-way cross, recombination fractions must be estimated.
Plot both rf and lod

• Plot a grid showing the recombination fractions for all pairs of markers, and/or the LOD scores for tests of linkage between pairs of markers
• If both are plotted, the recombination fractions are in the upper left triangle while the LOD scores are in the lower right triangle. Red corresponds to a large LOD or a small recombination fraction, while blue is the reverse. Missing values appear in light gray.

Plot rf and lod for Chr 1 only

Plot lod only for Chr 2 and 3
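
A possible way to produce the three plots named above, again on the `fake.f2` example data rather than the lecture's data. `plotRF()` is the current name of the plotting function in R/qtl; the red/blue colors described above may need `col.scheme = "redblue"` in recent releases.

```r
library(qtl)
data(fake.f2)

# Estimate recombination fractions and linkage LOD scores for all marker pairs
fake.f2 <- est.rf(fake.f2)

plotRF(fake.f2)                                # rf and LOD together, all chromosomes
plotRF(fake.f2, chr = 1)                       # rf and LOD for chromosome 1 only
plotRF(fake.f2, chr = c(2, 3), what = "lod")   # LOD only for chromosomes 2 and 3
```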
scanone()

• Genome scan with a single QTL model, with possible allowance for covariates, using any of several possible models for the phenotype and any of several possible numerical methods

scanone(cross, chr, pheno.col=1,
model=c("normal","binary","2part","np"),
method=c("em","imp","hk","ehk","mr","mr-imp","mr-argmax"),
addcovar=NULL, n.perm, ...)

cross – object to be analyzed
chr – optional vector indicating the chromosomes for which LOD scores should be calculated
pheno.col – column number of the phenotype data
addcovar – additive covariates, allowed only for the normal and binary models
n.perm – the number of permutations

Phenotype models (the model argument of scanone()):
• normal – the standard model for QTL mapping. The residual phenotypic variation is assumed to follow a normal distribution
• binary – for a binary phenotype, which must have values 0 and 1. Available for em and mr methods only
• 2part – when there is a spike in the phenotype distribution
• np (non-parametric) – an extension of the Kruskal-Wallis test is used
Numerical methods (the method argument of scanone()):
• mr – single marker regression
o mr – deletes individuals with missing genotype
o mr-imp – fills in missing data using single imputation
o mr-argmax – fills in missing data using the Viterbi algorithm
• em – maximum likelihood using the Expectation-Maximization (EM) algorithm
• hk – Haley-Knott regression
• imp – multiple imputation (Sen and Churchill, 2001). Uses Monte Carlo algorithm instead of EM.
• ehk – extended Haley-Knott method (Feenstra et al., 2006). An improvement of the hk especially when epistasis exists between QTLs

Single marker ANOVA
• Threshold=3
• Using permutation test
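
A minimal sketch of a single marker scan with both kinds of threshold, using the `hyper` backcross that ships with R/qtl in place of the lecture's data. The 100 permutations and the seed are arbitrary choices to keep the example quick; more permutations are normally used in practice.

```r
library(qtl)
data(hyper)                       # example backcross shipped with R/qtl
set.seed(123)                     # permutations are random; the seed is arbitrary

# Single marker regression across the genome
out.mr <- scanone(hyper, pheno.col = 1, method = "mr")
summary(out.mr, threshold = 3)    # peaks exceeding the fixed LOD threshold of 3

# Permutation test for a genome-wide threshold
operm <- scanone(hyper, pheno.col = 1, method = "mr",
                 n.perm = 100, verbose = FALSE)
summary(operm, alpha = 0.05)      # estimated 5% LOD threshold
summary(out.mr, perms = operm, alpha = 0.05, pvalues = TRUE)
```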

Estimating heritability for each marker

Interval Mapping (IM)

• Used for estimating the position of a QTL between two markers
• Statistically more powerful than single marker analysis
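Methods used in IM
• Maximum Likelihood (standard interval mapping)
• Haley-Knott Regression
• Extended Haley-Knott Regression
Note:
• All methods estimate three parameters: mean, genetic effects and residual variance.
• All methods compute the conditional probabilities for each QTL genotype at a position between markers.

Probabilities of a putative QTL for a backcross

Flanking markers    Prob(Q | m1 m2)
M1 M2               $\frac{(1 - r_1)(1 - r_2)}{1 - r_{12}}$
M1 m2               $\frac{(1 - r_1)\, r_2}{r_{12}}$
m1 M2               $\frac{r_1 (1 - r_2)}{r_{12}}$
m1 m2               $\frac{r_1 r_2}{1 - r_{12}}$

Here $r_1$ is the recombination fraction between the first marker and the QTL, $r_2$ between the QTL and the second marker, and $r_{12}$ between the two markers; M1 and M2 denote the marker alleles in coupling with Q.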
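
The four conditional probabilities can be checked numerically with a few lines of base R. The function name `qtl_prob`, the element names and the example recombination fractions are hypothetical, and the sketch assumes no interference, so that r12 = r1 + r2 − 2·r1·r2.

```r
# P(Q | flanking marker class) for a backcross.
# r1: recombination fraction marker 1 to QTL; r2: QTL to marker 2.
qtl_prob <- function(r1, r2) {
  r12 <- r1 + r2 - 2 * r1 * r2               # no-interference assumption
  c(p11 = (1 - r1) * (1 - r2) / (1 - r12),   # both markers carry the allele linked to Q
    p10 = (1 - r1) * r2 / r12,               # only marker 1 carries it
    p01 = r1 * (1 - r2) / r12,               # only marker 2 carries it
    p00 = r1 * r2 / (1 - r12))               # neither marker carries it
}

probs <- qtl_prob(r1 = 0.05, r2 = 0.10)
print(round(probs, 4))
```

By symmetry, the probability of Q given one marker class and the probability of Q given the opposite class add up to one (p11 + p00 = 1 and p10 + p01 = 1), which is a quick sanity check on the formulas.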

LOD Scores

• Logarithm of the odds – used to identify the most likely position for a QTL in relation to the linkage map
• Test of Significance
o LOD > 3 is the significance threshold: the data are 1,000 times more likely under linkage than under no linkage
o Permutation test

Odds

$$\text{Odds} = \frac{\text{prob. of success}}{\text{prob. of failure}} = \frac{p}{1 - p}$$

Odds = 1 → equal chance of success and failure
Odds < 1 → lower chance of success
Odds > 1 → higher chance of success
Maximum Likelihood

• The likelihood is computed for a given set of parameters (QTL position and QTL effect) given the observed data on phenotypes and marker genotypes
• The estimates for the parameters are those where the likelihood is highest
• The expectation-maximization (EM) method is used in the estimation procedure
• A test statistic for this method is:

$$LR = -2 \ln \frac{\text{Max\_Likelihood(reduced model)}}{\text{Max\_Likelihood(full model)}}$$

The reduced model refers to the null hypothesis of no QTL effect.
• The LOD score for a QTL at position c is:

$$LOD(c) = \frac{LR(c)}{2 \ln 10} = \frac{LR(c)}{4.61}$$
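
To make the conversion from likelihood ratio to LOD concrete, here is a small base R sketch. It simplifies the real problem by treating the QTL genotype as observed, so ordinary regression replaces the EM step; the simulated data, effect size and seed are assumptions of the example.

```r
set.seed(2)
n <- 100
g <- rbinom(n, 1, 0.5)              # simulated QTL genotype, treated as known here
y <- 5 + 1.5 * g + rnorm(n)         # simulated phenotype

full    <- lm(y ~ g)                # full model: QTL effect present
reduced <- lm(y ~ 1)                # reduced model: no QTL effect

# LR = -2 ln( L(reduced) / L(full) ), computed on the log scale
LR  <- as.numeric(-2 * (logLik(reduced) - logLik(full)))
LOD <- LR / (2 * log(10))           # 2 ln 10 is approximately 4.61

print(c(LR = LR, LOD = LOD))
```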

Haley-Knott (HK) Regression

• For two markers, the model is:

$y = \mu + \alpha x + e$

where y is the observed phenotype
x is the P(Q|mg1,mg2,r1,r12)
• For each QTL position, the residual sum of squares (SSE) is determined.
• The estimate of the QTL position is where the SSE is the minimum.
• Estimates an approximate likelihood ratio:

$$LR = n \ln\left(\frac{SSE_{reduced}}{SSE_{full}}\right)$$
Extended HK Regression

• An improvement of the HK regression
• The correct variance for each genotype is used instead of the constant variance used in the HK regression

Which IM method to use

• ML provides better estimates but analysis is complex and computationally expensive
• HK regression is computationally faster but the estimate of the residual variance is biased and the power of QTL detection may be affected (Kao et al 1999)
• Extended HK regression is not as fast as HK but provides improved approximations and is still faster than ML
• In practice, the results are hardly different
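
For Haley-Knott regression in particular, the idea of regressing the phenotype on the conditional QTL probability can be sketched in base R for a single test position. Everything here is simulated under stated assumptions: a backcross, no interference, recombination fractions of 0.05 and 0.10, and a QTL effect of 2 units.

```r
set.seed(3)
n   <- 150
r1  <- 0.05
r2  <- 0.10
r12 <- r1 + r2 - 2 * r1 * r2               # no-interference assumption

# Simulate marker 1, the unobserved QTL, and marker 2 along one chromosome
m1 <- rbinom(n, 1, 0.5)
q  <- ifelse(runif(n) < r1, 1 - m1, m1)    # recombination between marker 1 and QTL
m2 <- ifelse(runif(n) < r2, 1 - q, q)      # recombination between QTL and marker 2
y  <- 10 + 2 * q + rnorm(n)                # phenotype depends on the QTL genotype

# x = P(Q | m1, m2): conditional QTL probability given the flanking markers
x <- ifelse(m1 == 1 & m2 == 1, (1 - r1) * (1 - r2) / (1 - r12),
     ifelse(m1 == 1 & m2 == 0, (1 - r1) * r2 / r12,
     ifelse(m1 == 0 & m2 == 1, r1 * (1 - r2) / r12,
                               r1 * r2 / (1 - r12))))

sse_full    <- sum(resid(lm(y ~ x))^2)     # y = mu + alpha * x + e
sse_reduced <- sum(resid(lm(y ~ 1))^2)     # model without a QTL
LR  <- n * log(sse_reduced / sse_full)     # approximate likelihood ratio
LOD <- LR / (2 * log(10))

print(c(LR = LR, LOD = LOD))
```

A full scan would repeat this at each test position between the markers and report the position with the smallest SSE.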

Multiple Imputation Method

• Another method available for IM
• Fills in all missing genotype data then uses single marker ANOVA to identify significant QTLs
• More robust than ML but has little advantage over the extended HK for single QTL mapping
• Intensive in both computation time and memory use

Interval Mapping

• Advantages
o Takes proper account of missing data
o Allows examination of positions between markers
o Gives improved estimates of QTL effects
• Disadvantages
o Increased computation time
o Requires specialized software
o Difficult to generalize
o Only considers one QTL at a time
IM sample output (figure): red – EM, blue – EHK

R/qtl

Interval Mapping: EM, HK, and EHK

calc.genoprob()

• Calculate QTL genotype probabilities conditional on the available marker data.
• Needed in most mapping functions
o step – indicates step size in cM at which the probabilities are to be calculated
o error.prob – assumed genotyping error rate
Note: genotyping error occurs when the observed genotype of an individual does not correspond to the true genotype.

Interval mapping

• Maximum likelihood
• Extended Haley-Knott Regression

A permutation test can also be used to get the threshold value for LOD scores.

Combining IM results
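
The interval mapping workflow might be sketched as follows on the `hyper` example backcross from R/qtl. The step size of 1 cM, the 100 permutations and the seed are illustrative choices, not values taken from the lecture.

```r
library(qtl)
data(hyper)
set.seed(42)                                   # arbitrary seed for the permutations

# Conditional QTL genotype probabilities on a 1 cM grid
hyper <- calc.genoprob(hyper, step = 1, error.prob = 0.0001)

out.em  <- scanone(hyper, method = "em")       # maximum likelihood via EM
out.hk  <- scanone(hyper, method = "hk")       # Haley-Knott regression
out.ehk <- scanone(hyper, method = "ehk")      # extended Haley-Knott regression

# Permutation-based threshold for the HK scan
operm.hk <- scanone(hyper, method = "hk", n.perm = 100, verbose = FALSE)
summary(out.hk, perms = operm.hk, alpha = 0.05)

# Combine the three scans into one object and overlay two of them
out.all <- c(out.em, out.hk, out.ehk)
plot(out.em, out.ehk, col = c("red", "blue"))  # red: EM, blue: EHK
```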

Plot of combined results (figure): red – em, blue – ehk

Composite Interval Mapping

• Performs interval mapping using a subset of marker loci as covariates
• Markers serve as proxies for other QTLs to account for linked QTLs and reduce residual variation
• Gives greater power in identifying key QTLs
• More statistically complicated and requires more computational power.
Steps in CIM

• Selects a set of markers to serve as covariates.
• Performs interval mapping with these markers as covariates.
• Excludes markers within a fixed distance of the test position.
• Calculates a LOD score comparing the model with the putative QTL in the presence of covariates to the model with just the covariates.

Sample CIM output (figure): blue – EM, red – CIM

Problem with CIM

• The estimated position of the first QTL can be influenced by the second QTL and vice versa, especially for linked QTLs.
• The choice of covariates is critical: if too many or too few markers are chosen there will be a loss of power to detect QTL.

R/qtl

Composite Interval Mapping
Composite interval mapping

• cim(cross, pheno.col=1, n.marcovar=3,
method=c("em", "imp", "hk", "ehk"),
imp.method=c("imp", "argmax"),
error.prob=0.0001, n.perm, window=10)
o n.marcovar – number of marker covariates to use
o imp.method – method used to impute any missing marker genotype data
o window – window size in cM; marker covariates within the window around the test position are omitted
• add.cim.covar – adds dots at the locations of the selected marker covariates, for a plot of composite interval mapping results
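
A short sketch of `cim()` next to ordinary interval mapping, closely following the example in the R/qtl help page and using the `hyper` data. The choice of `method = "hk"`, the plotted chromosomes and the seed are assumptions made to keep the example quick.

```r
library(qtl)
data(hyper)
set.seed(7)      # cim() imputes missing genotypes at random; the seed is arbitrary

hyper   <- calc.genoprob(hyper, step = 2.5)
out.em  <- scanone(hyper, method = "em")               # standard interval mapping
out.cim <- cim(hyper, pheno.col = 1, n.marcovar = 3,   # three marker covariates
               window = 10, method = "hk")

chrs <- c(1, 4, 6, 15)
plot(out.em, out.cim, chr = chrs, col = c("blue", "red"))   # blue: em, red: cim
add.cim.covar(out.cim, chr = chrs)    # dots at the selected marker covariates
```

Passing `n.perm` to `cim()` returns permutation results instead of a scan, which can then be summarized for a threshold in the same way as for `scanone()`.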

CIM: Using permutation test

Composite interval mapping (figure): blue – em, red – cim

Multiple QTL Mapping

• Extension of interval mapping to multiple QTLs
• Infer the location of QTLs to positions between markers
• Investigate interactions between QTLs
• More powerful and precise in detecting QTL (Kao et al 1999)

Sample Multiple QTL Mapping output (figure)

Other Methods used in Interval Mapping

• Bayesian Method – uses probability theories in parameter estimations based on prior knowledge about the data (R/qtlbim)
• Mixed model regression – available in R/ASReml

R/qtl

Multiple QTL Mapping
• sim.geno() is used to impute genotypes with missing data to minimize loss of information
• makeqtl() is used to create a qtl object. It pulls out the imputed genotypes at the selected positions
• n.gen is the number of possible genotypes at each QTL
• Plotting the qtl object displays the QTL on the genetic map
• Terms that are not significant may be dropped from the model
• refineqtl() – iteratively scans the positions for QTL in the context of a multiple QTL model, to try to identify the positions with maximum likelihood, for a fixed QTL model.
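
The multiple QTL steps named above could be strung together roughly like this on the `hyper` example data. The starting model (one QTL each on chromosomes 1, 4 and 6 at the given positions) is a hypothetical choice for illustration, as are the number of imputations and the seed.

```r
library(qtl)
data(hyper)
set.seed(11)     # imputations are random; the seed is arbitrary

# Impute genotypes on a 2 cM grid
hyper <- sim.geno(hyper, step = 2, n.draws = 16, error.prob = 0.001)

# Hypothetical starting model with three QTL
qtl <- makeqtl(hyper, chr = c(1, 4, 6), pos = c(67, 30, 60))
plot(qtl)                                  # QTL locations on the genetic map

# Fit the additive model; the drop-one table flags terms that may be dropped
fit <- fitqtl(hyper, pheno.col = 1, qtl = qtl, formula = y ~ Q1 + Q2 + Q3)
summary(fit)

# Refine the QTL positions within the fixed three-QTL model
rqtl <- refineqtl(hyper, pheno.col = 1, qtl = qtl,
                  formula = y ~ Q1 + Q2 + Q3, verbose = FALSE)
rqtl
```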