
Spell-QTL User Manual

Species perscrutandis enixe locis locabuntur

Damien Leroux Sylvain Jasson

Version 0.2-alpha
Foreword
Spell-QTL is a software suite for detecting QTLs in any crossing design. Spell is an acronym
that stands for Species Perscrutandis Enixe Locis Locabuntur. This means "The characters will be
localized by zealously inspecting the loci", which is the very purpose of the software. If you are reading this
manual, you already know what QTL stands for.
We (Damien Leroux and Sylvain Jasson) are developing Spell-QTL at MIAT, a research lab
of INRA, the French National Institute for Agricultural Research. We are located near Toulouse, in
the French Occitanie region.

When the project started, we were just planning to rewrite our legacy code, MCQTL, in modern
C++. After decades of development and many temporary contributors (most of them no longer
reachable), MCQTL had reached the stage of Hydra code: every bug fix, every improvement had
a very high cost.
As the project moved forward, and while we still thought it would be a quick and easy release,
we discovered that some hypotheses made in MCQTL were no longer valid when dealing
with modern crosses (MAGIC, AIC...).
The work inadvertently switched from an engineering project to a research project, and we are
very happy about that!

Both the software and this manual are works in progress: feedback is welcome and appreciated.

Spell-QTL is available at https://mulcyber.toulouse.inra.fr/frs/?group_id=204 under the GNU
Public License.

This manual is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

1. The main Spell-QTL pipeline
1.1. General view

[Figure 1.1: The main Spell-QTL pipeline. Diagram labels: .ped, spell-pedigree, .spdat, .gen, formats, spell-marker, .smdat, 1-point POP, .phen, .map, spell-qtl, N-point POP, QTL analysis.]

Global software organization is displayed in figure 1.1.

1.2. Minimal session


A minimal session for a Spell-QTL analysis is three commands long. For example, to run example 1
from the package:

spell-pedigree -wd my_directory -n my_name -p example1.ped

spell-marker -wd my_directory -n my_name -m F2:A/B example1_F2.gen \
    -m F2C:A/C example1_F2C.gen -o F2,F2C

spell-qtl -wd my_directory -n my_name -P auto -p F2 example1_F2.phen \
    -p F2C example1_F2C.phen -gm example1.map

If you want to duplicate these commands, you must check that the input files are available to the
programs. You may want to copy them into your test directory, or use an absolute or relative path
in your command lines.
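
For scripted runs, the three command lines can also be assembled programmatically. The following Python sketch only reuses the options shown above; the function name minimal_session is hypothetical, and the script prints the commands rather than executing them.

```python
import shlex

def minimal_session(work_dir, name):
    """Return the three commands of the minimal session as argument lists."""
    common = ["-wd", work_dir, "-n", name]
    return [
        ["spell-pedigree", *common, "-p", "example1.ped"],
        ["spell-marker", *common,
         "-m", "F2:A/B", "example1_F2.gen",
         "-m", "F2C:A/C", "example1_F2C.gen",
         "-o", "F2,F2C"],
        ["spell-qtl", *common, "-P", "auto",
         "-p", "F2", "example1_F2.phen",
         "-p", "F2C", "example1_F2C.phen",
         "-gm", "example1.map"],
    ]

# Print each command as a shell-ready string, in pipeline order.
for argv in minimal_session("my_directory", "my_name"):
    print(shlex.join(argv))
```

Keeping the working directory and the configuration name in one place makes sure that all three programs share them.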

1.3. Software suite details
1.3.1. spell-pedigree
Computes the transition matrices for the Continuous Time Hidden Markov Models (CTHMM).
They are the T_d matrices in formula 4.1 on page 24.
These computations depend on one another, so spell-pedigree can only run sequentially.
Outputs a data file that can be fed to spell-marker.

1.3.2. spell-marker
Computes the 1-point Parental Origins Probabilities using Bayesian Inference at each marker.
Each marker is independent, so it can run in various ways:
- sequentially,
- multithreaded,
- scheduling jobs on Sun Grid Engine,
- sending jobs to remote machines via ssh.
Outputs a data file that can be fed to spell-qtl.
Can also output the raw (1-point) Parental Origin Probabilities.

1.3.3. spell-qtl
Performs the QTL analysis per se.
Can also output the n-point Parental Origin Probabilities along the linkage groups.
Can run most computations concurrently on a multicore computer.
Computation results are cached on disk (and/or in RAM).

1.4. Input files


1.4.1. Pedigree
File format
See spell-pedigree man page (at A.3 on page 26.)

File sample

Sample file 1.1: Pedigree (selected lines from example1.ped from three_parents_F2 example)
1 generation ; individual ; mother ; father
2 A ;1;0;0
3 B ;2;0;0
4 C ;3;0;0
5 F1 ;4;1;2
6 F1C ;5;1;3
7 F2 ;6;4;4
8 F2 ;7;4;4
9 F2 ;8;4;4
10 F2 ;9;4;4
11 F2 ;10;4;4
12 F2 ;11;4;4
13 F2C ;106;5;5
14 F2C ;107;5;5
15 F2C ;108;5;5
16 F2C ;109;5;5
17 F2C ;110;5;5
18 F2C ;111;5;5

Note that:
- the first line is expected to be a header only and will be ignored by spell-pedigree;
- only four columns are used; any additional column will be silently ignored by spell-pedigree.
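
The two rules above can be illustrated with a small reader. This is a minimal Python sketch, not the spell-pedigree parser: the function name read_pedigree is hypothetical, and treating a parent number of 0 as "no parent" is an assumption drawn from the sample file.

```python
import csv
import io

def read_pedigree(text):
    """Parse pedigree text into (generation, individual, mother, father) tuples."""
    reader = csv.reader(io.StringIO(text), delimiter=";")
    next(reader, None)  # the first line is a header and is skipped
    rows = []
    for fields in reader:
        if len(fields) < 4:
            continue  # blank or incomplete line: skipped (assumption of this sketch)
        gen, ind, mother, father = (f.strip() for f in fields[:4])
        rows.append((gen, int(ind), int(mother), int(father)))
    return rows

sample = """generation;individual;mother;father
A;1;0;0
B;2;0;0
F1;4;1;2
F2;6;4;4;this extra column is ignored
"""
pedigree = read_pedigree(sample)
# Individuals with no recorded parents (0 is assumed to mean "no parent").
founders = [ind for _, ind, mother, father in pedigree if mother == 0 and father == 0]
```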

1.4.2. Marker observations


File format
spell-marker understands a few common formats, based on the MapMaker RAW format (without traits):

- A line beginning with data type followed by ignored text (e.g. line 1 in sample 1.2)
- A line containing four integer values: number of individuals, number of markers, and two ignored
values (e.g. line 2 in sample 1.2)
- One line per marker, beginning with the starred (*) marker name followed by a space and by the allele
observed or inferred for each individual (one character per individual) (e.g. lines 3-39 in sample 1.2)
Built-in allele codes are:
02 SNP observations, where 0 and 2 are homozygous and 1 is heterozygous. This observation
type is relevant for any individual in the pedigree, including parents. spell-marker will then
perform inference of possible genotypes and inference of possible states in the CTHMM.
ABHCD MapMaker-like Parental Origin inferred observations. These are relevant for the products
of inbred line crosses. Let's consider the cross A|A × B|B:
- The child is typed A and the allele A is not dominant. The only possible genotype is A|A.
This is encoded by the character A in MapMaker.
- The child is typed A and the allele A is dominant. The possible genotypes are A|A, A|B
and B|A. This is encoded by the character D in MapMaker.
- The child is typed B and the allele B is not dominant. The only possible genotype is B|B.
This is encoded by the character B in MapMaker.
- The child is typed B and the allele B is dominant. The possible genotypes are A|B, B|A
and B|B. This is encoded by the character C in MapMaker.
- The child is typed AB (the alleles A and B are codominant). The possible genotypes are
A|B and B|A. This is encoded by the character H in MapMaker.
- The child is not typed. The possible genotypes are A|A, A|B, B|A and B|B. This is
encoded by the character - in MapMaker.
The parental origin letters can be overridden on the command line.
CP Outbred observations as defined in Carthagene. These observations are relevant for all
known-phase situations, including cases where one parent is homozygous, or where 3 or 4 different
alleles are present. Let's consider the cross A|B × C|D: the possible child genotypes are A|C, A|D,
B|C and B|D. The Carthagene format actually enables the user to express any subset of the 4
different possibilities using a single hexadecimal digit (0-f).

Code Possible genotypes


1 A|C
2 A|D
3 A|C,A|D
4 B|C
5 A|C,B|C
6 A|D,B|C
7 A|C,A|D,B|C
8 B|D
9 A|C,B|D
a A|D,B|D
b A|C,A|D,B|D
c B|C,B|D
d A|C,B|C,B|D
e A|D,B|C,B|D
0 or f or - A|C,A|D,B|C,B|D
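
The table above is a bit mask: each of the four genotypes owns one bit of the hexadecimal digit. The following Python sketch decodes a CP character under that reading; the names CP_GENOTYPES and decode_cp are hypothetical and are not part of spell-marker.

```python
# Bit 0 -> A|C, bit 1 -> A|D, bit 2 -> B|C, bit 3 -> B|D, as in the table above.
CP_GENOTYPES = ("A|C", "A|D", "B|C", "B|D")

def decode_cp(code):
    """Return the possible child genotypes for one CP observation character."""
    if code in ("0", "f", "-"):
        return list(CP_GENOTYPES)  # no information: every genotype is possible
    mask = int(code, 16)
    return [g for bit, g in enumerate(CP_GENOTYPES) if mask & (1 << bit)]
```

For instance, decode_cp("6") yields A|D and B|C, and decode_cp("d") yields A|C, B|C and B|D, as in the table.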

Note that the CP and ABHCD formats imply user-made genotype inference. Depending on the generation,
spell-marker will perform further genotype inference and HMM state inference using the pedigree.
Other allele codes can be defined via a JSON file (see appendix B.5 on page 29 for the format
and B.5.1 on page 30 for sample files).

File sample

Sample file 1.2: Marker alleles (example1_F2.gen from three_parents_F2 example)


1 data type F2
2 100 37 0 0
3 * M_1_1 HHBHHHHAHHAAHAABHAAHABHBBABAHAHHAHHHHAAHHHHHHHHB - HHAHHABABH ...
4 * M_1_2 HHBHHHHAHHHAHAABHAABABHHBABAHAHHAHAHHAAAHHHHHHABBHHHHHABAHH ...
5 * M_1_3 HAHHHHBAHBHAHAHBHAABAHHHBABAHAHHAAAHHAAAHAHHHHHBHHHBHHHBAHH ...
6 * M_1_4 HHHHHHBAHBHAAAHBHAAHHBHH - ABHHHH - AHAHHAAAHAHHHBHBHHABHHHBHHH ...
7 * M_1_5 HHH - AH - HHBHAAAHBHAAHHBHHB - BHHBHHAHAAAHHHHHBHHBHH - HABHHHBHHH ...
8 * M_1_6 HHHHAHHHHBHAAAHBHAAHHBHHBABHHBHHAHAAAHHHHHBHHB - HHHABHHHBHHH ...
9 * M_1_7 HHHHH - HHHBHHAAHBHHHHABA - BABHHBBHHBAAAHHHHHHHHBAHHBABAHHBHHH ...
10 * M_1_8 HHBHHHHHHBHHAHHHHHHHHHAHHABHHBHBHHAAHHHHHHHHHBAHBHABAHHHHAH ...
11 * M_1_9 HHBBHHHHHBHBAAHH - HHHHAAHHABAHBHBHHAAHHHHHHHHHBHHBAABHHAHHAA ...
12 * M_1_10 HHBBHAHHHBHBAAHHHHHHHAAHHAHAHBHBHHHAAHHHHHHBHBHHBAABHHHHHA ...
13 * M_1_11 HHBBHAHHHBHBAAAHHHHHHAAHHABAHHHHHHHAAHHHHHHBHBHHBAABHHHHHA ...
14 * M_1_12 HHBBHAHHHBHBAAAHHHHHHAAHHABAHHHHHHHAAHHHHHHBHBHHBAABHHHHHA ...
15 * M_1_13 HBBBHAHHHBHBAAHHHHHHHAAHHABAHHHHHHHAAHHHAHHBHBHHBAABHHHHHA ...
16 * M_1_14 HHBHHAHHHBHBAAHHHHHAHHAHH - AHBAHHHBHAHHHHAHHBHABHHAABHBHHHA ...
17 * M_1_15 HHBHHAHHHBHBAAHHHHH - HHAHHAAHBAHHH - HAHHHHAHABHABHHAABHBHHHH ...
18 * M_1_16 HHHHHAHHA - HBAAHHHHHAHHAHHAAHBAHHHBHAHAHHABABHABHHAABHBHHHH ...
19 * M_1_17 HHHHHAHHABHB - AHHHHHAHHAHHAAHBAHHHBHAHAHHABABHABHHAABH - HHHH ...
20 * M_1_18 HHHHHAHHABHBAAHHHHHAHHAH - AAHBAHHHBHAHA - HABABHABHHAABHBHHHH ...
21 * M_1_19 HBHHHHHHABBBAHHHHHHAHHHHHAABBAHHHBBHHAHAABHBHABHHAHHHBHHHH ...
22 * M_1_20 HBHHHHHAHBHBAAHHHBHAHHH - AAAB - AHHHBB - HAHAABHBHAHHHAHHHBHHBH ...
23 * M_2_1 BHHHBHHHHHBAAHHHBABH - ABBBHHBHHBAAABHHABHBHBHHBHHHHABBBAHHBH ...
24 * M_2_2 HHHHBHHHHHBAAHHHBABHAABBHHHBHHBAAABHHAHHBHHHHBHH -H - BBBAHHBA ...
25 * M_2_3 HHBHBH - HHHBAAHHHB - BHAABBHHHBHHBAAAB - HAHHBHHHHBHHHHABBBAHHBA ...
26 * M_2_4 HABHBHBHHHBAHAHHBAHHAAHBHHHBHHBAAAHHAHH - BHHHHBBABAHHBHHHHBA ...
27 * M_2_5 HABHB - BHHHBAHAHHBAHHAAHBHHHBHHBAAAHHAHHHBHHHHB - ABAHBBHHHHBA ...
28 * M_2_6 HAHABHHHHAHAHAHBBHHBHABBHHHBHHBBAAAAAHAHBHHHHHAABHHH - HBHHBH ...
29 * M_2_7 HAH - BHBHHAHAHAHBHHHBHABBHH - BHHBBAAAAAHAHBHHHHB - ABHHHBHBHHBH ...
30 * M_2_8 HAHAHABHHAAAHAHBAAHBHABBHHHBA - BBAHAAAHHHBHHHHBHHBAHAB - BHHBH ...
31 * M_2_9 AHHAHABHHAAHHHHBAAHBHA - BHBH - AHHHAH - AHHAHBHAHBBHHBA - ABHBHHBH ...
32 * M_2_10 AHAAHABHHAAHHHBBAAHBHABBHBHBAHHHAHBAHAAHBHAH - BHHBABABHBHHB ...
33 * M_2_11 AHAAHA - HHAAHHHBBAAHBHAHBHBHBHBHAAH - AHAAHB - AHBBHHHABABHBAHB ...
34 * M_2_12 AHAAHAHHHAAHHHHBA - HBAAHBHBHBHBHAAHBAHAAHBHAHBBHHHABABHBAHB ...
35 * M_2_13 AHAAHAAHHAAHHAHBAABHHAHBHBHHHBHAAHHAHAAHBBAHHBHBHABHB - BA - B ...
36 * M_2_14 ABAAHHAHHAAHHAHBAABHAHHHHBHBBBHAAHAAHAHABBAHHBHHHHBHBBHAHB ...
37 * M_2_15 ABAAHHAHHAAHHAHBAABHAHHHHBHBBBHAAHAAHAHABBAHHBHHHHBHBBHAHB ...
38 * M_2_16 ABAAHHAHHAAHHAHBAABHAHHHHBHBBBHAAHHAHHHABBAHHBHHHHBBBB - AHB ...
39 * M_2_17 ABAAHHAHHAAHHAHBAABHAHHHHBHBBBAAAHHAHHHABBAHHBHHHHBBBBHAHB ...

Note that:
- in line 1, F2 after data type is irrelevant for spell-marker;
- in line 2, 0 0 after 100 37 is irrelevant for spell-marker.
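
To make the layout concrete, here is a minimal Python sketch that reads such a file and expands ABHCD characters into the genotype sets listed in the format description. It is not the spell-marker reader: the names read_markers and ABHCD are hypothetical, and ignoring whitespace inside the observation string is an assumption of this sketch.

```python
# Possible genotypes for each ABHCD character, for the cross A|A x B|B.
ABHCD = {
    "A": {"A|A"},
    "B": {"B|B"},
    "H": {"A|B", "B|A"},
    "D": {"A|A", "A|B", "B|A"},
    "C": {"A|B", "B|A", "B|B"},
    "-": {"A|A", "A|B", "B|A", "B|B"},
}

def read_markers(text):
    """Return (number of individuals, number of markers, {marker: observations})."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Line 1 (data type ...) is skipped; line 2 holds the two counts, then two ignored values.
    n_individuals, n_markers = (int(x) for x in lines[1].split()[:2])
    markers = {}
    for ln in lines[2:]:
        name, *obs = ln.lstrip("*").split()
        markers[name] = "".join(obs)  # one character per individual
    return n_individuals, n_markers, markers

sample = """data type F2
4 2 0 0
*M1 AHB-
*M2 HHDC
"""
n_individuals, n_markers, markers = read_markers(sample)
possible = [ABHCD[c] for c in markers["M2"]]
```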

1.4.3. Genetic map


File format
One line per linkage group, with space-separated fields:
- Starred (*) name of the linkage group
- Number of markers in the linkage group
- Name of the first marker
- Series of pairs: distance in cM, then name of the next marker

File sample

Sample file 1.3: Genetic map (example1.map from three_parents_F2 example)


1 * ch1 18 M_1_1 7.69 M_1_2 11.26 M_1_3 9.18 M_1_4 10.86 M_1_5 0.91 M ...
2 * ch2 17 M_2_1 4.65 M_2_2 1.46 M_2_3 9.24 M_2_4 0.71 M_2_5 18.73 M_ ...
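
As an illustration of this layout, the following Python sketch turns each line into cumulative marker positions. The function name read_map is hypothetical, and placing the first marker at 0 cM is an assumption (it matches the locus row of the n-point sample file, where M_1_1 is at 0 and M_1_2 at 7.69).

```python
def read_map(text):
    """Return {linkage group: [(marker, position in cM), ...]}."""
    groups = {}
    for line in text.splitlines():
        tokens = line.lstrip("*").split()
        if not tokens:
            continue
        name, count, first = tokens[0], int(tokens[1]), tokens[2]
        position = 0.0  # assumption: the first marker sits at 0 cM
        markers = [(first, position)]
        rest = tokens[3:]
        # The remaining fields alternate: distance to the next marker, then its name.
        for distance, marker in zip(rest[0::2], rest[1::2]):
            position += float(distance)
            markers.append((marker, position))
        if len(markers) != count:
            raise ValueError(f"{name}: expected {count} markers, found {len(markers)}")
        groups[name] = markers
    return groups

genetic_map = read_map("*ch1 3 M_1_1 7.69 M_1_2 11.26 M_1_3\n")
```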

1.4.4. Trait observations


File format
As in MapMaker RAW format, without header : one line per trait beginning with starred(*) trait
name followed by space separated observations (one numerical observation per individual, - means
unobserved).

File sample

Sample file 1.4: Trait observations (example1_F2.phen from three_parents_F2 example)


1 * t1 6.64860236836 4.48333766263 4.63945285025 4.76820197025 3.9847...
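
A trait file in this format can be read with a few lines of Python. This is a sketch only: the function name read_traits is hypothetical, and representing an unobserved value by None is a choice made for this example.

```python
def read_traits(text):
    """Return {trait name: [value or None, ...]}, one entry per individual."""
    traits = {}
    for line in text.splitlines():
        tokens = line.lstrip("*").split()
        if not tokens:
            continue
        name, values = tokens[0], tokens[1:]
        # "-" marks an unobserved individual.
        traits[name] = [None if v == "-" else float(v) for v in values]
    return traits

traits = read_traits("*t1 6.64 - 4.48\n")
```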

2. Generated outputs
2.1. General organization
General output directory organization is :
my_directory (directory)
my_name.1-point (directory)
my_name.pedigree-and-probabilities.M_1_10.csv (text file)
...
my_name.cache (directory)
my_name.spell-marker.data (binary file)
my_name.spell-pedigree.data (binary file)
my_name.n-point (directory)
ch1 (directory)
F2 (directory)
my_name.ch1.F2.0.csv (text file)
...
F2C (directory)
my_name.ch1.F2.0.csv (text file)
...
ch2 (directory)
...
my_name.report (directory)
full_map.txt (text file)
t1 (directory)
ch1:124.34_LOD.txt (text file)
Model_Cross_ch1:114.96_ch1:124.34_ch1:150_XtX_inv.txt (text file)
Model_Cross_ch1:114.96_ch1:124.34_ch1:150_X.txt (text file)
t1_report.txt (text file)
trait_values.txt (text file)
my_name.spell-qtl.log (text file)

A common working directory must be set for all 3 executables using the commandline option
-wd my_directory or --work-directory my_directory. Every result produced during analysis
will be output into this directory.
A configuration name using commandline option -n my_name or --name my_name is used to
prefix the subdirectories and output files.
Parental Origin Probabilities are output as CSV files:
- 1-point: one file per marker (with pedigree-like structure). Use option -O1 in command spell-marker
to output these files.
- n-point: one file per linkage group per generation per individual. Use option output-nppop in
command spell-qtl to output these files.
A report containing:
- A text-mode rendering of the genetic map with the detected QTLs and their respective
confidence intervals,
- The model matrix and the variance-covariance matrix for each selected set of loci used
by the detection algorithm,
- The detailed final model including variance-covariance matrix, coefficients, contrasts, and
contrasts significance.
A cache of intermediate computation results.

2.2. Output files samples


2.2.1. 1-point POP
Sample file 2.1: 1-point output file (selected lines from file my_name.pedigree-and-probabilities.M_1_1.csv
in directory my_directory/my_name.1-point/ )
1 Gen ; Id ; P1 ; P2 ; Prob
2 A ;1;0;0;{" aa ": 1}
3 B ;2;0;0;{" bb ": 1}
4 C ;3;0;0;{" cc ": 1}
5 F1 ;4;1;2;{" ab ": 1}
6 F1C ;5;1;3;{" ac ": 1}
7 F2 ;6;4;4;{" ba ": 0.5 , " ab ": 0.5}
8 F2 ;7;4;4;{" ba ": 0.5 , " ab ": 0.5}
9 F2 ;8;4;4;{" bb ": 1}
10 F2 ;9;4;4;{" ba ": 0.5 , " ab ": 0.5}
11 F2 ;10;4;4;{" ba ": 0.5 , " ab ": 0.5}
12 F2 ;11;4;4;{" ba ": 0.5 , " ab ": 0.5}
13 F2C ;106;5;5;{" ca ": 0.5 , " ac ": 0.5}
14 F2C ;107;5;5;{" ca ": 0.5 , " ac ": 0.5}
15 F2C ;108;5;5;{" cc ": 1}
16 F2C ;109;5;5;{" cc ": 1}
17 F2C ;110;5;5;{" cc ": 1}
18 F2C ;111;5;5;{" aa ": 1}

Note that you must use the -O1 commandline option in spell-marker in order to generate these
files (see B.4.4 on page 29.)

2.2.2. n-point POP


Sample file 2.2: n-point output file (my_name.ch1.F2.0.csv in directory
my_directory/my_name.n-point/ch1/F2/ )
1 markers ; M_1_1 ;;;;;;;; M_1_2 ;;;;;;;;;;;; M_1_3 ;;;;;;;;;; M_1_4 ;;;;;;;;;;; M ...
2 locus ;0;1;2;3;4;5;6;7;7.69;8.69;9.69;10.69;11.69;12.69;13.69;14.69;15....
3 aa ;0;0.00132637;0.0022548;0.00278679;0.00292319;0.00266421;0.00200944;...
4 ab ;0.5;0.498674;0.497745;0.497213;0.497077;0.497336;0.497991;0.499042;...
5 ba ;0.5;0.498674;0.497745;0.497213;0.497077;0.497336;0.497991;0.499042;...
6 bb ;0;0.00132637;0.0022548;0.00278679;0.00292319;0.00266421;0.00200944;...

Note that you must use the output-nppop processing option in spell-qtl in order to generate
these files (see C.4.5 on page 33.)

2.2.3. full map


The special text file named full_map.txt is produced at the root of the my_name.report directory.
The full genetic map is drawn using text characters. Any detected QTL (for any trait under study)
is inserted in this map with its confidence interval (figure 2.1).

Figure 2.1.: screen capture of less -RS full_map.txt in an ad hoc resized terminal. Chromosome
names are printed in light blue. The chromosome and the marker names are in white.
The detected QTL and its confidence interval are added in green, labeled with trait
name (t1), @, QTL position (135.34) and confidence interval ([129.246:140.84])

Note that you must use the cat or less -RS command in order to see it properly. The more command,
plain less (without -RS) or your favorite text editor may fail to render the special characters.

2.2.4. Trait by trait reports


For every trait under analysis, a report directory is generated (this directory name is the trait name).
Within this directory the report file itself is named after the trait name followed by _report.txt
This file is divided into several parts. For the sample file 2.3 they are:
- General information (lines 1-7): Trait name and what was detected (QTL positions and confidence
intervals)
- R2 (lines 12-20): Part of the variance explained by each QTL
- Coefficients (lines 22-54): Cross effects and QTL allele effects are displayed with their estimated
variance-covariance matrix
- Contrasts (lines 57-79 and 81-103): Spell-QTL computes all tractable contrasts and tests their
significance. Spell-QTL displays a contrasts section for each comparable effect group.
- Final Model (lines 105-325): The final linear model for this detection is then displayed in
human-readable form. (It is available in computer-readable form in files trait_values.txt and
Model_*X.txt.)
- Final Model inversion (lines 328-351): The $(X^t X)^{-1}$ matrix from linear model solving, in
human-readable form. (It is available in computer-readable form in file Model_*_XtX_inv.txt.)

Sample file 2.3: sample report file (t1_report.txt in directory my_directory/my_name.report/t1 )


1 ======================================================================...
2 Report for single trait t1
3 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -...

4
5 QTL detected on chromosome ch1 at 135.34 cM with confidence interval {1...
6
7 ======================================================================...
8 t1
9 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -...
10
11
12 |
13 R2 |
14 ___ |
15
16 | Cross | ch1 :135.34
17 | |
18 - -+ - - - - - - - - -+ - - - - - - - - - - -
19 | 0.04109 | 0.8187
20
21
22 |
23 Coefficients |
24 _____________ |
25
26 | Cross | ch1 :135.34
27 | F2 F2C | A B A C
28 - -+ - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
29 | 4.982 2.623 | 2.566 -2.566 4.234 -4.234
30
31
32 |
33 Covariance matrix |
34 __________________ |
35
36 | Cross | ch1 :135.34
37 | F2 F2C | A B A C
38 - - - - - - -+ - - - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
39 C | |
40 r F2 | 3.834 0 | -1.903 -1.892 0 0
41 o F2C | 0 3.833 | 0 0 -1.896 -1.899
42 s | |
43 s | |
44 - - - - - - -+ - - - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
45 c | |
46 h | |
47 1 | |
48 : A | -1.903 0 | 0.9791 0.9182 0 0
49 1 B | -1.892 0 | 0.9182 0.9791 0 0
50 3 A | 0 -1.896 | 0 0 0.976 0.9213
51 5 C | 0 -1.899 | 0 0 0.9213 0.976
52 . | |
53 3 | |
54 4 | |
55
56
57 |
58 Contrasts |
59 __________ |
60
61 | ch1 :135.34
62 | A B
63 - - - - -+ - - - - - - - - - - - - - - - - - - - -
64 A | -5.13 ***
65 B | 5.13 ***
66
67 Significance codes : 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
68
69 |
70 Contrast significance |
71 ______________________ |
72
73 | ch1 :135.34
74 | A B
75 - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - -
76 A | 6.3 e -47% ***
77 B | 6.3 e -47% ***
78
79 Significance codes : 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
80
81 |
82 Contrasts |
83 __________ |
84
85 | ch1 :135.34
86 | A C
87 - - - - -+ - - - - - - - - - - - - - - - - - - - -
88 A | -8.47 ***
89 C | 8.47 ***
90
91 Significance codes : 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
92
93 |
94 Contrast significance |
95 ______________________ |
96
97 | ch1 :135.34
98 | A C
99 - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
100 A | 1.7 e -142% ***
101 C | 1.7 e -142% ***
102
103 Significance codes : 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
104
105 ======================================================================...
106 Final model
107 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -...

108
109
110 | Trait | Cross | ch1 :135.34
111 | 1 | F2 F2C | A B A C
112 - -+ - - - - - - - - - -+ - - - - - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
113 | 6.649 | 1 0 | 1 1 0 0
114 | 4.483 | 1 0 | 0.5863 1.414 0 0
115 | 4.639 | 1 0 | 1 1 0 0
116 | 4.768 | 1 0 | 1 1 0 0
117 | 3.985 | 1 0 | 1 1 0 0
118 | 7.791 | 1 0 | 1.555 0.4451 0 0
119 | 5.916 | 1 0 | 1 1 0 0
120 | 3.347 | 1 0 | 1.414 0.5863 0 0
121 | 9.834 | 1 0 | 1.555 0.4451 0 0
122 | -0.9893 | 1 0 | 0.03139 1.969 0 0
123 | 2.853 | 1 0 | 1 1 0 0
124 | -0.4524 | 1 0 | 0.03139 1.969 0 0
125 | 8.769 | 1 0 | 1.969 0.03139 0 0
126 | 8.731 | 1 0 | 1.969 0.03139 0 0
127 | 4.66 | 1 0 | 1 1 0 0
128 | 4.621 | 1 0 | 1 1 0 0
129 | 6.387 | 1 0 | 1 1 0 0
130 | 4.99 | 1 0 | 0.5863 1.414 0 0
131 | 4.289 | 1 0 | 1 1 0 0
132 | 9.582 | 1 0 | 1.969 0.03139 0 0
133 | 5.367 | 1 0 | 1 1 0 0
134 | 4.095 | 1 0 | 1 1 0 0
135 | 7.92 | 1 0 | 1.555 0.4451 0 0
136 | 6.286 | 1 0 | 1 1 0 0
137 | 3.758 | 1 0 | 1.418 0.5819 0 0
138 | 10.43 | 1 0 | 1.969 0.03139 0 0
139 | 11.6 | 1 0 | 1.969 0.03139 0 0
140 | 0.1829 | 1 0 | 0.5863 1.414 0 0
141 | -0.9951 | 1 0 | 0.1975 1.803 0 0
142 | 10.33 | 1 0 | 1.969 0.03139 0 0
143 | 5.104 | 1 0 | 1 1 0 0
144 | 5.916 | 1 0 | 1 1 0 0
145 | 4.667 | 1 0 | 1 1 0 0
146 | -0.9623 | 1 0 | 0.03139 1.969 0 0
147 | 2.2 | 1 0 | 0.5863 1.414 0 0
148 | 9.357 | 1 0 | 1.803 0.1975 0 0
149 | 7.095 | 1 0 | 1 1 0 0
150 | 10.01 | 1 0 | 1.969 0.03139 0 0
151 | 4.211 | 1 0 | 1 1 0 0
152 | 6.529 | 1 0 | 1.414 0.5863 0 0
153 | 11.12 | 1 0 | 1.969 0.03139 0 0
154 | 0.1361 | 1 0 | 0.03139 1.969 0 0
155 | 8.818 | 1 0 | 1.555 0.4451 0 0
156 | -1.573 | 1 0 | 0.03139 1.969 0 0
157 | 5.478 | 1 0 | 1 1 0 0
158 | 9.169 | 1 0 | 1.969 0.03139 0 0
159 | 1.602 | 1 0 | 0.4451 1.555 0 0
160 | 5.495 | 1 0 | 1 1 0 0
161 | 5.705 | 1 0 | 1 1 0 0
162 F | 8.979 | 1 0 | 1.969 0.03139 0 0
163 2 | 9.615 | 1 0 | 1.555 0.4451 0 0
164 | 0.6673 | 1 0 | 0.4451 1.555 0 0
165 | 6.689 | 1 0 | 1 1 0 0
166 | -1.704 | 1 0 | 0.03139 1.969 0 0
167 | 6.398 | 1 0 | 1 1 0 0
168 | 6.254 | 1 0 | 1 1 0 0
169 | 4.482 | 1 0 | 0.5863 1.414 0 0
170 | 4.357 | 1 0 | 1 1 0 0
171 | 10.95 | 1 0 | 1.969 0.03139 0 0
172 | 9.739 | 1 0 | 1.555 0.4451 0 0
173 | 4.582 | 1 0 | 1.414 0.5863 0 0
174 | 9.671 | 1 0 | 1.969 0.03139 0 0
175 | 3.116 | 1 0 | 0.5863 1.414 0 0
176 | 4.863 | 1 0 | 1 1 0 0
177 | 6.31 | 1 0 | 1 1 0 0
178 | 8.927 | 1 0 | 1.555 0.4451 0 0
179 | 0.8143 | 1 0 | 0.03139 1.969 0 0
180 | 6.742 | 1 0 | 1 1 0 0
181 | 1.307 | 1 0 | 0.03139 1.969 0 0
182 | 5.316 | 1 0 | 1 1 0 0
183 | 5.528 | 1 0 | 1 1 0 0
184 | 3.012 | 1 0 | 1 1 0 0
185 | 9.804 | 1 0 | 1.969 0.03139 0 0
186 | 3.965 | 1 0 | 0.4451 1.555 0 0
187 | 2.709 | 1 0 | 0.4451 1.555 0 0
188 | 6.506 | 1 0 | 1.969 0.03139 0 0
189 | 6.592 | 1 0 | 1 1 0 0
190 | 1.019 | 1 0 | 0.4451 1.555 0 0
191 | 10.09 | 1 0 | 1.969 0.03139 0 0
192 | 4.32 | 1 0 | 1 1 0 0
193 | 5.579 | 1 0 | 1.414 0.5863 0 0
194 | -0.2688 | 1 0 | 0.03139 1.969 0 0
195 | 5.615 | 1 0 | 1 1 0 0
196 | 3.622 | 1 0 | 0.5863 1.414 0 0
197 | 0.9091 | 1 0 | 0.4451 1.555 0 0
198 | 6.558 | 1 0 | 1 1 0 0
199 | 8.116 | 1 0 | 1.414 0.5863 0 0
200 | 6.486 | 1 0 | 1 1 0 0
201 | 6.282 | 1 0 | 1 1 0 0
202 | 5.598 | 1 0 | 1.555 0.4451 0 0
203 | 5.106 | 1 0 | 1 1 0 0
204 | 5.119 | 1 0 | 1 1 0 0
205 | 4.022 | 1 0 | 1 1 0 0
206 | 4.781 | 1 0 | 1.414 0.5863 0 0
207 | 6.796 | 1 0 | 1.141 0.8589 0 0
208 | 5.876 | 1 0 | 1 1 0 0
209 | 9.718 | 1 0 | 1.969 0.03139 0 0
210 | 8.922 | 1 0 | 1.555 0.4451 0 0
211 | 2.874 | 1 0 | 1 1 0 0
212 | 7.235 | 1 0 | 1.414 0.5863 0 0
213 - -+ - - - - - - - - - -+ - - - - - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
214 | 0.3284 | 0 1 | 0 0 1 1
215 | 5.107 | 0 1 | 0 0 1 1
216 | 4.257 | 0 1 | 0 0 0.5863 1.414
217 | 4.68 | 0 1 | 0 0 0.5863 1.414
218 | -0.914 | 0 1 | 0 0 0.5863 1.414
219 | -5.741 | 0 1 | 0 0 0.03139 1.969
220 | 2.577 | 0 1 | 0 0 1 1
221 | -5.421 | 0 1 | 0 0 0.03139 1.969
222 | 14.08 | 0 1 | 0 0 1.969 0.03139
223 | -2.639 | 0 1 | 0 0 0.2008 1.799
224 | 10.55 | 0 1 | 0 0 1.969 0.03139
225 | -2.687 | 0 1 | 0 0 0.5863 1.414
226 | 3.741 | 0 1 | 0 0 1 1
227 | 3.777 | 0 1 | 0 0 1 1
228 | 11.72 | 0 1 | 0 0 1.969 0.03139
229 | -3.799 | 0 1 | 0 0 0.5863 1.414
230 | 2.45 | 0 1 | 0 0 1 1
231 | 0.3365 | 0 1 | 0 0 1 1
232 | 0.7089 | 0 1 | 0 0 1 1
233 | -1.176 | 0 1 | 0 0 1 1
234 | 1.837 | 0 1 | 0 0 1 1
235 | -5.121 | 0 1 | 0 0 0.03139 1.969
236 | -0.8553 | 0 1 | 0 0 1 1
237 | 2.297 | 0 1 | 0 0 1.55 0.45
238 | 8.949 | 0 1 | 0 0 1 1
239 | -3.697 | 0 1 | 0 0 0.03139 1.969
240 | -5.037 | 0 1 | 0 0 0.03139 1.969
241 | -5.091 | 0 1 | 0 0 0.5863 1.414
242 | -0.8961 | 0 1 | 0 0 1 1
243 | 11.3 | 0 1 | 0 0 1.969 0.03139
244 | 10.78 | 0 1 | 0 0 1.969 0.03139
245 | 13.3 | 0 1 | 0 0 1.555 0.4451
246 | 3.935 | 0 1 | 0 0 1 1
247 | 1.769 | 0 1 | 0 0 1 1
248 | -4.48 | 0 1 | 0 0 0.03139 1.969
249 | 5.643 | 0 1 | 0 0 1 1
250 | 4.379 | 0 1 | 0 0 1 1
251 | -0.2102 | 0 1 | 0 0 0.5863 1.414
252 | -2.577 | 0 1 | 0 0 0.5863 1.414
253 | -5.633 | 0 1 | 0 0 0.03139 1.969
254 | 8.377 | 0 1 | 0 0 1.969 0.03139
255 | 2.554 | 0 1 | 0 0 1 1
256 | 0.8266 | 0 1 | 0 0 1 1
257 | 10.67 | 0 1 | 0 0 1.969 0.03139
258 | 4.602 | 0 1 | 0 0 1 1
259 | 0.5988 | 0 1 | 0 0 1 1
260 | -0.5299 | 0 1 | 0 0 1 1
261 | -2.743 | 0 1 | 0 0 0.5863 1.414
262 F | 12.24 | 0 1 | 0 0 1.969 0.03139
263 2 | 1.469 | 0 1 | 0 0 1 1
264 C | 6.326 | 0 1 | 0 0 1 1
265 | 8.279 | 0 1 | 0 0 1 1
266 | 5.631 | 0 1 | 0 0 1 1
267 | 0.918 | 0 1 | 0 0 1 1
268 | -5.382 | 0 1 | 0 0 0.03139 1.969
269 | -1.858 | 0 1 | 0 0 0.5863 1.414
270 | 4.569 | 0 1 | 0 0 1.555 0.4451
271 | -2.331 | 0 1 | 0 0 0.4451 1.555
272 | -1.063 | 0 1 | 0 0 0.4451 1.555
273 | -3.628 | 0 1 | 0 0 0.5863 1.414
274 | 7.607 | 0 1 | 0 0 1.555 0.4451
275 | 9.786 | 0 1 | 0 0 1.969 0.03139
276 | -8.351 | 0 1 | 0 0 0.03198 1.968
277 | 9.109 | 0 1 | 0 0 1.969 0.03139
278 | -0.04689 | 0 1 | 0 0 1 1
279 | 4.278 | 0 1 | 0 0 1 1
280 | 1.762 | 0 1 | 0 0 0.8589 1.141
281 | -2.486 | 0 1 | 0 0 1 1
282 | 8.925 | 0 1 | 0 0 1.555 0.4451
283 | 7.806 | 0 1 | 0 0 1.555 0.4451
284 | 8.988 | 0 1 | 0 0 1.414 0.5863
285 | 8.475 | 0 1 | 0 0 1.969 0.03139
286 | 7.536 | 0 1 | 0 0 1.414 0.5863
287 | -5.944 | 0 1 | 0 0 0.03198 1.968
288 | -4.18 | 0 1 | 0 0 0.4451 1.555
289 | 11.21 | 0 1 | 0 0 1.969 0.03139
290 | -1.041 | 0 1 | 0 0 1 1
291 | 8.283 | 0 1 | 0 0 1.418 0.5819
292 | 0.1551 | 0 1 | 0 0 1 1
293 | -2.024 | 0 1 | 0 0 0.4451 1.555
294 | 0.5139 | 0 1 | 0 0 1 1
295 | 6.886 | 0 1 | 0 0 1.414 0.5863
296 | 6.178 | 0 1 | 0 0 1.555 0.4451
297 | -5.845 | 0 1 | 0 0 0.03139 1.969
298 | 11.15 | 0 1 | 0 0 1.969 0.03139
299 | 12.44 | 0 1 | 0 0 1.969 0.03139
300 | 4.912 | 0 1 | 0 0 1 1
301 | -1.146 | 0 1 | 0 0 1 1
302 | 0.667 | 0 1 | 0 0 0.5863 1.414
303 | -3.129 | 0 1 | 0 0 0.03139 1.969
304 | -7.047 | 0 1 | 0 0 0.03198 1.968
305 | 12.43 | 0 1 | 0 0 1.969 0.03139
306 | 4.702 | 0 1 | 0 0 1 1
307 | 3.553 | 0 1 | 0 0 1 1
308 | 3.177 | 0 1 | 0 0 1 1
309 | -0.7692 | 0 1 | 0 0 0.4451 1.555
310 | -1.318 | 0 1 | 0 0 0.5863 1.414
311 | -1.074 | 0 1 | 0 0 0.4451 1.555
312 | 0.3149 | 0 1 | 0 0 1 1
313 | -2.766 | 0 1 | 0 0 0.03139 1.969
314 - -+ - - - - - - - - - -+ - - - - - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
315 C | | |
316 O | | |
317 N | | |
318 S | | |
319 T | 0 | 0 0 | 1 1 0 0
320 R | 0 | 0 0 | 0 0 1 1
321 A | | |
322 I | | |
323 N | | |
324 T | | |
325 S | | |
326
327
328 ======================================================================...
329 XtX ^ -1
330 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -...
331
332
333 | Cross | ch1 :135.34
334 | F2 F2C | A B A C
335 - - - - - - -+ - - - - - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
336 C | |
337 r F2 | 1.01 0 | -0.5014 -0.4986 0 0
338 o F2C | 0 1.01 | 0 0 -0.4995 -0.5005
339 s | |
340 s | |
341 - - - - - - -+ - - - - - - - - - - - - - - - - - -+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
342 c | |
343 h | |
344 1 | |
345 : A | -0.5014 0 | 0.258 0.242 0 0
346 1 B | -0.4986 0 | 0.242 0.258 0 0
347 3 A | 0 -0.4995 | 0 0 0.2572 0.2428
348 5 C | 0 -0.5005 | 0 0 0.2428 0.2572
349 . | |
350 3 | |
351 4 | |

3. Detection model
During detection we consider many linear models and perform many analyses of variance
(ANOVA).

3.1. Complete model


The most general form of the models is:

$$\forall i \in \mathbf{N}, \quad y_i = \mu_{c_i} + \sum_{h \in \mathbf{H}} \sum_{g \in \mathbf{G}(h)} P(i, g, h)\,\pi_{g,h,c_i} + \sum_{v \in \mathbf{V}} v_i\,\beta_v + \varepsilon_i \tag{3.1}$$

Where:

- $y_i$ is the value of the trait for individual $i$;
- $\mathbf{N}$ is the set of individuals under study; its cardinality $|\mathbf{N}|$ is the total number of individuals in
the design under study;
- $c_i$ is the cross to which individual $i$ belongs (let $\mathbf{C}$ be the set of crosses in the design:
$\forall i \in \mathbf{N}, c_i \in \mathbf{C}$; its cardinality $|\mathbf{C}|$ is the total number of crosses in the design);
- $\mu_{c_i}$ is the estimated mean value of the trait for this cross;
- $\mathbf{H}$ is the set of genetic explicative factors (mainly, $\mathbf{H}$ is a set of QTLs, or a set of sets of QTLs
when looking for epistasis);
- $\mathbf{G}(h)$ is the set of possible genotypes for factor $h$ (if $h$ is a unit set $h = \{ l \}$, that is, we address
a single QTL, then $\mathbf{G}(h) = \mathbf{G}(\{ l \})$ is the set of possible genotypes at locus $l$);
- $P(i, g, h)$ is the probability that individual $i$ has genotype $g$ on $h$;
- $\pi_{g,h,c_i}$ is the phenotypic value of genotype $g$ on $h$ in genetic background $c_i$ (see section 3.3
for more information about the hypotheses we can make about $\pi$);
- $v$ is a user-defined explicative variable;
- $\mathbf{V}$ is the set of user-defined explicative variables (this set can be empty);
- $v_i$ is the value of variable $v$ for individual $i$;
- $\beta_v$ is the effect of variable $v$;
- $\varepsilon_i$ is the modeling error for individual $i$.

Lower case letter values $y_i$, $c_i$ and $v_i$ are user provided; the upper case letter value $P(i, g, h)$ is computed
from marker scores and the genetic map (see chapter 4 on page 24); bold upper case letters $\mathbf{N}$, $\mathbf{C}$, $\mathbf{H}$, $\mathbf{G}$
and $\mathbf{V}$ are sets; Greek letter values $\mu_c$, $\pi_{g,h,c}$, $\beta_v$ and $\varepsilon_i$ are computed by the linear model engine.
The sets $\mathbf{N}$, $\mathbf{C}$ and $\mathbf{V}$ are user defined; our main goal is to find the best $\mathbf{H}$ ($\mathbf{G}$ is $\mathbf{H}$-dependent).
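
As a hedged illustration of how the terms of model 3.1 combine for one individual, the following Python sketch adds the cross mean, the probability-weighted phenotypic values and the effects of the user-defined variables. The function name, the argument layout and all numeric values are assumptions made for this example; they do not come from Spell-QTL.

```python
def predicted_trait(cross_mean, genotype_probs, phenotypic_values,
                    variables=None, variable_effects=None):
    """Right-hand side of model 3.1 for one individual, without the error term."""
    value = cross_mean
    # Sum over explicative factors h and over their possible genotypes g.
    for factor, probs in genotype_probs.items():
        for genotype, prob in probs.items():
            value += prob * phenotypic_values[factor][genotype]
    # Sum over the user-defined explicative variables v.
    for name, x in (variables or {}).items():
        value += x * variable_effects[name]
    return value

# Illustrative numbers only: one QTL, one user-defined variable.
probs = {"qtl1": {"aa": 0.25, "ab": 0.25, "ba": 0.25, "bb": 0.25}}
values = {"qtl1": {"aa": 2.0, "ab": 0.0, "ba": 0.0, "bb": -2.0}}
y_hat = predicted_trait(5.0, probs, values, {"block": 1.0}, {"block": 0.5})
```

The observed trait value differs from this prediction by the modeling error of the individual.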

3.2. Fitting and testing models
The linear model engine minimizes the residual sum of squares $RSS = \sum_{i \in \mathbf{N}} \varepsilon_i^2$ under constraints 3.1
for any tested set $\mathbf{H}$. Note that adding more elements to the tested set $\mathbf{H}$
can only reduce RSS (or leave it unchanged), since it allows minimization over a wider space.
Therefore the reduction of RSS cannot be our rule for model selection. We must switch the
question from "Does the new model display a smaller RSS?" to "Is the RSS reduction of the new
model worth the degrees of freedom it costs ?".
In order to compare two nested models we then use an F-test, which measures
this compromise between goodness of fit and degrees of freedom cost. (To be fully honest, when we
compare two models for pleiotropic effects, the residual errors $\varepsilon_i$ are not a vector but a matrix, their
squared sum over individuals is not a number but a vector, and we must therefore use some kind
of log-likelihood ratio; a $\chi^2$ test with the appropriate degrees of freedom will do the job.)
Let's now suppose that we have a current model, and we wonder whether some new extended model is
better. Let $df_{current}$ (resp. $df_{new}$) be the number of degrees of freedom of the current model (resp.
the new model). This number of degrees of freedom is the number of independent parameters in the
fitted model, that is the number of independent values the linear model engine has to choose for $\mu_c$,
$\pi_{g,h,c}$ and $\beta_v$. The test value is then:

$$F(\mathbf{H}_{new}, \mathbf{H}_{current}) = \frac{\dfrac{RSS_{current} - RSS_{new}}{df_{new} - df_{current}}}{\dfrac{RSS_{new}}{|\mathbf{N}| - df_{new}}} \tag{3.2}$$

The higher the value $F(\mathbf{H}_{new}, \mathbf{H}_{current})$, the better the new model.
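
The test value is straightforward to compute once both models have been fitted. The following Python sketch is a direct transcription of the ratio described above; the function name and the numbers in the example are hypothetical.

```python
def f_statistic(rss_current, rss_new, df_current, df_new, n_individuals):
    """F value comparing a new model with the current, nested one."""
    # RSS reduction per extra degree of freedom spent by the new model.
    numerator = (rss_current - rss_new) / (df_new - df_current)
    # Residual variance estimate of the new model.
    denominator = rss_new / (n_individuals - df_new)
    return numerator / denominator

# Illustrative numbers: one extra degree of freedom removes 50 units of RSS.
f_value = f_statistic(rss_current=200.0, rss_new=150.0,
                      df_current=2, df_new=3, n_individuals=103)
```

Here the numerator is 50 and the denominator is 1.5, so the F value is about 33.3.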
The typical use of such a score is to address the question of adding a new locus to the model. Every
locus along the genome may then be tested. (If a tested locus is already in $\mathbf{H}$, then $RSS_{current} = RSS_{new}$
and its score is 0.) Moreover, the number of degrees of freedom involved in 3.2 can change along the
genome, with lack of independence of loci or with similarity of parents. It is then wiser to use the
probability that a Fisher distribution ($\mathcal{F}$) with these degrees of freedom is greater than the actual
F value. Our score is then

$$score(\mathbf{H}_{new}, \mathbf{H}_{current}) = -\log_{10} P\left[\mathcal{F}(df_{new} - df_{current},\ |\mathbf{N}| - df_{new}) > F(\mathbf{H}_{new}, \mathbf{H}_{current})\right] \tag{3.3}$$

(We prefer positive and not too large values, so the $-\log_{10}$ provides a monotonic and aesthetic
transformation that makes this score prettier.)
Unfortunately, we have no analytic expression for the distribution of the maximum of this score along a genome under the null hypothesis, and so cannot establish a critical value for its significance at a chosen risk level. We therefore use a resampling method by permutation to obtain an empirical value of its quantile.
These permutations are performed on the fly by the software and their result is saved for potential reuse. Don't be surprised if the very same analysis takes longer on the first attempt.
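The permutation idea can be sketched in a few lines of Python. This is only an illustration under assumptions, not the procedure implemented in spell-qtl: `permutation_threshold` and its parameters are hypothetical names, and `max_score` stands for any user-supplied function returning the highest score found along the genome for a given phenotype vector.

```python
import math
import random


def permutation_threshold(phenotypes, max_score, n_permutations=1000,
                          risk=0.05, seed=0):
    """Empirical (1 - risk) quantile of the genome-wide maximum score.

    max_score(permuted_phenotypes) -> float must return the highest score
    found along the genome for the given phenotype vector.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)      # never modify the caller's data
    maxima = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)        # break the genotype/phenotype link
        maxima.append(max_score(shuffled))
    maxima.sort()
    # Smallest value that at least (1 - risk) of the maxima do not exceed.
    rank = math.ceil((1.0 - risk) * n_permutations)
    return maxima[min(max(rank, 1), n_permutations) - 1]
```

A score observed on the real data is then declared significant at the chosen risk level when it exceeds the returned threshold.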

3.3. Phenotypic values


As you can see in formula 3.2, the difference in degrees of freedom between the two models has an important effect on the test value. Therefore it is a good idea not to waste them. Some hypotheses on the effects allow the phenotypic values to be simplified and the degrees of freedom to be reduced. We'll review some possible hypotheses. Let's suppose that in our study population a set $\mathcal{P}$ of alleles is possible. Typically $\mathcal{P} = \{P1, P2\}$, $|\mathcal{P}| = 2$ for a simple cross between two parents, and $\mathcal{P} = \{A, B, C, D, E, F, G, H\}$, $|\mathcal{P}| = 8$ in MAGIC.

3.3.1. Single locus effects


Remember that H is a set of explicative factors h. Each explicative factor is a set of one or more
loci.

In the simplest models, each QTL has its own effect without any interaction. That is, every explicative factor contains exactly one locus: every $h = \{l\}$ is a unit set, that is $|h| = 1$. Possible genotypes are then allele pairs $g = (g_1, g_2) \in \mathcal{P}^2$.

Additive and connected model


The simplest model is the additive and connected one. Each allele has its own effect, without any dominance or interaction with the genetic background.
$$\theta_{g,h,c_i} = \theta_{(g_1,g_2),\{l\},c_i} = \theta_{g_1,l} + \theta_{g_2,l} \tag{3.4}$$
Most of the time, the number of degrees of freedom used by adding a new QTL to such a model is the number of distinct possible alleles minus one: $df_{new} - df_{current} = p - 1$, where $p = |\mathcal{P}|$.

Dominance and connected model


This model is much like the previous one. The only difference is that interactions between alleles are possible.
$$\theta_{g,h,c_i} = \theta_{(g_1,g_2),\{l\},c_i} = \theta_{g_1,l} + \theta_{g_2,l} + \theta_{g_1,g_2,l} \tag{3.5}$$
Note that $\forall g_k \in \mathcal{P},\ \theta_{g_k,g_k,l} = 0$: dominance exists only when distinct alleles are involved.
Most of the time, the number of degrees of freedom used by adding a new QTL to such a model is the number of distinct possible alleles minus one, plus the number of possible interactions: $df_{new} - df_{current} = p - 1 + \frac{p(p-1)}{2} = \frac{(p-1)(p+2)}{2}$. Some designs do not allow such effects to be detected (e.g. back-cross); spell-qtl is able to detect such cases and will then discard any dominance effect, even if asked.

Additive and disconnected model


A different enrichment of the additive and connected model is to allow interaction with the genetic background.
$$\theta_{g,h,c_i} = \theta_{(g_1,g_2),\{l\},c_i} = \theta_{g_1,l} + \theta_{g_2,l} + \theta_{g_1,l,c_i} + \theta_{g_2,l,c_i} \tag{3.6}$$
This is equivalent to enriching the model by saying that additive effects are genetic-background dependent, that is $\theta^{dis}_{g,l,c_i} = \theta_{g,l} + \theta_{g,l,c_i}$:
$$\theta_{g,h,c_i} = \theta_{(g_1,g_2),\{l\},c_i} = \theta^{dis}_{g_1,l,c_i} + \theta^{dis}_{g_2,l,c_i} \tag{3.7}$$
Most of the time, the number of degrees of freedom used by adding a new QTL to such a model is the sum of the degrees of freedom that would have been needed by each cross of the design addressed alone. In the worst case, $df_{new} - df_{current} = (p - 1)\,|C|$.
Of course, if there is only one cross ($|C| = 1$), this is exactly the same as the additive and connected model. More surprisingly, some complex designs such as star designs lead to identical models whether connected or disconnected.

Dominance and disconnected model


This is the richest single-locus model. Both additive and dominance effects are possible, and every effect interacts with the genetic background.
$$\theta_{g,h,c_i} = \theta_{(g_1,g_2),\{l\},c_i} = \theta_{g_1,l} + \theta_{g_2,l} + \theta_{g_1,g_2,l} + \theta_{g_1,l,c_i} + \theta_{g_2,l,c_i} + \theta_{g_1,g_2,l,c_i} \tag{3.8}$$
Once more, this is equivalent to a model with background-dependent effects $\theta^{dis}_{g,l,c_i} = \theta_{g,l} + \theta_{g,l,c_i}$ and $\theta^{dis}_{g_1,g_2,l,c_i} = \theta_{g_1,g_2,l} + \theta_{g_1,g_2,l,c_i}$:
$$\theta_{g,h,c_i} = \theta_{(g_1,g_2),\{l\},c_i} = \theta^{dis}_{g_1,l,c_i} + \theta^{dis}_{g_2,l,c_i} + \theta^{dis}_{g_1,g_2,l,c_i} \tag{3.9}$$
Both remarks about the previous model hold again. The number of degrees of freedom for each added QTL is, in the worst case: $df_{new} - df_{current} = \frac{(p-1)(p+2)}{2}\,|C|$.
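The worst-case degree-of-freedom costs of the four single-locus models can be summarized by a small Python helper. This is only a bookkeeping sketch: `qtl_df_cost` and its parameter names are hypothetical, and, as noted above, the actual cost in a given design may be lower than these worst-case figures.

```python
def qtl_df_cost(n_alleles, n_crosses=1, dominance=False, connected=True):
    """Worst-case df_new - df_current when one QTL is added to the model."""
    p = n_alleles
    cost = p - 1                      # additive effects: p alleles, one constraint
    if dominance:
        cost += p * (p - 1) // 2      # one term per unordered pair of distinct alleles
    if not connected:
        cost *= n_crosses             # effects are estimated separately in each cross
    return cost
```

For instance, with the 8 alleles of a MAGIC design, the additive connected model costs 7 degrees of freedom per QTL, while the dominance connected model costs 7 + 28 = 35.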

3.3.2. Epistasis between two loci
Addressing epistasis limited to two-locus interactions means that the explicative factors in $H$ are either unit sets or two-element sets, that is $\forall h \in H,\ |h| \in \{1, 2\}$. The phenotypic values for every single-locus explicative factor $h$ are the same as in the previous section. In this section we'll focus on phenotypic values when addressing exactly 2 QTLs: $h = \{l_1, l_2\}$. Possible genotypes are then 4-tuples of alleles: the genotype at locus $l_1$ is $(g_1^1, g_2^1)$, the genotype at locus $l_2$ is $(g_1^2, g_2^2)$, and the overall genotype is $g = (g_1^1, g_2^1, g_1^2, g_2^2)$.
Note that the correct use of formula 3.1 on page 19 requires that $P(i, g, h)$ is the joint probability of $g$ on $h$. If loci $l_1$ and $l_2$ are far enough from each other (or, even better, are on different linkage groups) this probability can be computed as a product of single-locus probabilities. But in the general case a true joint probability must be computed (especially in order to avoid ghost QTLs). Spell-qtl can compute such a probability (see chapter 4 on page 24).
The joint two-locus phenotypic effect can be expressed as the sum of the direct terms (the phenotypic effect of each locus) and an interaction term.

Additive × Additive model

In this first epistasis model, the single-QTL direct terms follow the additive connected model, and there is interaction only between additive effects.

$$\begin{aligned}
\theta_{(g_1^1,g_2^1,g_1^2,g_2^2),\{l_1,l_2\},c_i} ={}& \theta_{(g_1^1,g_2^1),\{l_1\}} + \theta_{(g_1^2,g_2^2),\{l_2\}} \\
&+ \theta_{(g_1^1,g_1^2),\{l_1,l_2\}} + \theta_{(g_1^1,g_2^2),\{l_1,l_2\}} + \theta_{(g_2^1,g_1^2),\{l_1,l_2\}} + \theta_{(g_2^1,g_2^2),\{l_1,l_2\}}
\end{aligned} \tag{3.10}$$
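To make the structure of equation 3.10 concrete, here is a minimal Python sketch that evaluates its right-hand side for one genotype. The function name and the dictionary-based storage of effects are assumptions made for the illustration; they say nothing about how spell-qtl stores its parameters.

```python
def additive_by_additive_value(genotype, additive_l1, additive_l2, interaction):
    """Right-hand side of equation 3.10 for one two-locus genotype.

    genotype is the 4-tuple (two alleles at l1, then two alleles at l2).
    additive_l1 / additive_l2 map an allele to its additive effect;
    interaction maps (allele at l1, allele at l2) to an epistatic effect.
    """
    a1, b1, a2, b2 = genotype
    # Direct terms: each locus follows the additive connected model (3.4).
    direct = additive_l1[a1] + additive_l1[b1] + additive_l2[a2] + additive_l2[b2]
    # One interaction term per (allele at l1, allele at l2) pair: four in total.
    epistatic = sum(interaction[(x, y)] for x in (a1, b1) for y in (a2, b2))
    return direct + epistatic
```

With two alleles per locus, the interaction table has four entries, one for each combination of an allele at $l_1$ with an allele at $l_2$.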

3.4. Simplest possible test


If we address only direct additive effects of a unique QTL in a unique biparental population, the phenotypic values in equation 3.1 on page 19 can be simplified:

no QTL
$H = V = \emptyset$
The linear model with only the cross mean is:

$$\forall i \in N,\quad y_i = \mu + \varepsilon_i^{H=\emptyset} \tag{3.11}$$
$$RSS_{H=\emptyset} = \sum_{i \in N} \left(\varepsilon_i^{H=\emptyset}\right)^2 \tag{3.12}$$

The only independent parameter is the mean $\mu$, therefore:

$$p_{H=\emptyset} = 1 \tag{3.13}$$

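a unique QTL
$V = \emptyset$; $H = \{\{l\}\}$, that is, the set $H$ contains only one explicative element and this element contains only one locus.

$$\begin{aligned}
\forall i \in N,\quad y_i &= \mu + 2P(AA, l)\,\theta_{A,l} + P(AB, l)\,(\theta_{A,l} + \theta_{B,l}) + 2P(BB, l)\,\theta_{B,l} + \varepsilon_i^{H=\{\{l\}\}} \\
&= \mu + [2P(AA, l) + P(AB, l)]\,\theta_{A,l} + [P(AB, l) + 2P(BB, l)]\,\theta_{B,l} + \varepsilon_i^{H=\{\{l\}\}}
\end{aligned} \tag{3.14}$$
$$RSS_{H=\{\{l\}\}} = \sum_{i \in N} \left(\varepsilon_i^{H=\{\{l\}\}}\right)^2 \tag{3.15}$$

We have two independent parameters: the mean $\mu$ and the contrast between alleles $\theta_{A,l} - \theta_{B,l}$.

$$p_{H=\{\{l\}\}} = 2 \tag{3.16}$$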

F-test
To decide whether or not to accept the QTL at locus $l$, we first compute the $F$ value:
$$F(H = \{\{l\}\}, H = \emptyset) = \frac{\dfrac{RSS_{H=\emptyset} - RSS_{H=\{\{l\}\}}}{p_{H=\{\{l\}\}} - p_{H=\emptyset}}}{\dfrac{RSS_{H=\{\{l\}\}}}{|N| - p_{H=\{\{l\}\}}}} = (|N| - 2)\left(\frac{RSS_{H=\emptyset}}{RSS_{H=\{\{l\}\}}} - 1\right) \tag{3.17}$$

And then the score:

$$score(H = \{\{l\}\}, H = \emptyset) = -\log_{10} P\left[\mathcal{F}(1, |N| - 2) > F(H = \{\{l\}\}, H = \emptyset)\right] \tag{3.18}$$
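The whole test of this section fits in a few lines of Python. The sketch below is an illustration under assumptions rather than spell-qtl's implementation: `single_qtl_f` is a hypothetical name, the genotype probabilities are passed as plain tuples, and the fit uses the closed-form solution of simple linear regression. It relies on the fact that the two regressors of equation 3.14 sum to 2, so that only the mean and the contrast between alleles A and B are estimable.

```python
def single_qtl_f(phenotypes, genotype_probs):
    """F value of equation 3.17 for one biparental population.

    genotype_probs[i] = (P(AA), P(AB), P(BB)) for individual i at locus l.
    """
    n = len(phenotypes)
    # Expected number of A alleles; the B regressor is 2 - x, hence only
    # the mean and the A/B contrast are independent parameters.
    x = [2.0 * p_aa + p_ab for p_aa, p_ab, _ in genotype_probs]
    mean_y = sum(phenotypes) / n
    mean_x = sum(x) / n
    # Model without QTL: residuals around the mean.
    rss_null = sum((y - mean_y) ** 2 for y in phenotypes)
    # Model with one QTL: least-squares slope estimates the A/B contrast.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (y - mean_y) for xi, y in zip(x, phenotypes))
    slope = sxy / sxx
    rss_qtl = sum((y - mean_y - slope * (xi - mean_x)) ** 2
                  for xi, y in zip(x, phenotypes))
    return (n - 2) * (rss_null / rss_qtl - 1.0)
```

For four individuals with phenotypes 3.0, 2.0, 1.0, 2.5 and certain genotypes AA, AB, BB, AB, the two residual sums of squares are 2.1875 and 0.1875, which gives $F = 2 \times (2.1875 / 0.1875 - 1) = 64/3$.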

4. Underlying parental origin probability model
The probabilities in equation 3.1 on page 19 are computed on the fly by spell-qtl when necessary.
A hidden Markov chain representing the origin of the genetic material is derived for any population in the pedigree by spell-pedigree. Depending on the number of different parents in the design and the number of non-independent meioses, the minimal number of states needed for the Markov property to hold can be much larger than the number of possible genotypes. Spell-pedigree ensures that this number is the smallest possible by using lumping operations as soon as possible.
At any observed marker, the probabilities of these states are inferred by spell-marker. Note that as soon as non-clone populations are involved, some information is gathered from siblings in order to infer state probabilities. (See figure 1.1 on page 3 for more about information exchange between parts of the software.)
Suppose the locus of interest $q$ is preceded by $n_l$ observed loci and followed by $n_r$ observed loci, the loci sequence being $[l_{n_l}, \ldots, l_2, l_1, q, r_1, r_2, \ldots, r_{n_r}]$. The parental origin probability vector at locus $q$ is then:

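$$p(q \mid M_{l_1}, \ldots, M_{l_{n_l}}, M_{r_1}, \ldots, M_{r_{n_r}}) =
\frac{\left(T_{q l_1}\left(\prod_{i=1}^{n_l-1} M_{l_i} T_{l_i l_{i+1}}\right) M_{l_{n_l}}\,\pi\right) \odot \left(T_{r_1 q}\left(\prod_{i=1}^{n_r-1} M_{r_i} T_{r_{i+1} r_i}\right) M_{r_{n_r}}\,\pi\right) \odot \pi^{-1}}
{\left\lVert \left(T_{q l_1}\left(\prod_{i=1}^{n_l-1} M_{l_i} T_{l_i l_{i+1}}\right) M_{l_{n_l}}\,\pi\right) \odot \left(T_{r_1 q}\left(\prod_{i=1}^{n_r-1} M_{r_i} T_{r_{i+1} r_i}\right) M_{r_{n_r}}\,\pi\right) \odot \pi^{-1} \right\rVert_1} \tag{4.1}$$

Here $\odot$ denotes the component-wise product and $\lVert \cdot \rVert_1$ the sum of the components.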

Where:

- $T_d$ is the transition matrix of the Markov chain (as a function of genetic distance $d$). It is computed by spell-pedigree;
- $\pi$ is the steady-state vector associated with transition matrix $T_d$, and $\pi^{-1}$ its component-wise inverse;
- $M_l$ is the observation matrix at locus $l$. It is inferred from observations at locus $l$ by spell-marker.
Equation 4.1 can be applied recursively in order to compute multi-loci joint probabilities. The
code in spell-qtl allows these joint probabilities computations when needed.
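The computation behind equation 4.1 can be sketched in Python for a toy chain. Everything specific in this example is an assumption made for the illustration: the chain has only two states, its transition matrix is derived from Haldane's mapping function, and observation matrices are taken to be diagonal and passed as their diagonal. The real chains built by spell-pedigree generally have more states.

```python
import math


def two_state_transition(distance_morgan):
    """Hypothetical 2-state T_d using Haldane's recombination fraction."""
    r = 0.5 * (1.0 - math.exp(-2.0 * distance_morgan))
    return [[1.0 - r, r], [r, 1.0 - r]]


def mat_vec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def one_side(steps, steady):
    """T_{q,1} M_1 T_{1,2} M_2 ... M_n pi, evaluated from the far end.

    steps = [(T between locus k-1 and locus k, diagonal of M_k), ...],
    ordered from the locus nearest to q (k = 1) to the farthest (k = n).
    """
    vec = list(steady)
    for transition, obs_diag in reversed(steps):
        vec = [o * v for o, v in zip(obs_diag, vec)]   # apply M_k
        vec = mat_vec(transition, vec)                 # then T_{k-1,k}
    return vec


def posterior_at(left_steps, right_steps, steady):
    """Normalized component-wise product (left * right / pi) at locus q."""
    left = one_side(left_steps, steady)
    right = one_side(right_steps, steady)
    raw = [l * r / s for l, r, s in zip(left, right, steady)]
    total = sum(raw)
    return [value / total for value in raw]
```

With a single fully informative marker on the left at recombination fraction $r$ and no marker on the right, the result is $[1 - r, r]$; with no marker at all, it falls back to the steady-state vector.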

Appendices

A. spell-pedigree man page
A.1. NAME
spell-pedigree -- Precompute the Markov Models for a pedigree

A.2. SYNOPSIS
spell-pedigree [-h] [-wd PATH] -n NAME [-s CHAR] -p FILE

A.3. DESCRIPTION
spell-pedigree computes Markov models representing the evolution of the genotype over all the individuals in a pedigree.
It outputs a data file that can be used with spell-marker to compute the Parental Origin Probabilities for this pedigree given allelic or genotype observations on a set of markers.

-wd, --work-directory PATH  Path to directory for cache files and outputs. Defaults to the current directory.
-n, --name NAME  User-friendly name for this configuration.
-p, --pedigree-file FILE  Path to the pedigree file.
The expected pedigree file must be a CSV file with each row in the following format:
GENERATION_NAME ; Individual number ; Parent1 number ; Parent2 number
Any additional column will be silently ignored by spell-pedigree.
Individual numbers are expected to be increasing and all greater than zero, and the parent numbers for a given individual are expected to be less than the individual number.
Breeding lines are encoded with Parent1 = Parent2 = 0.
Selfings are encoded with Parent1 = Parent2.
Doubled haploids are encoded with Parent2 = 0.
The generation names will be used when specifying genotype and phenotype observations in the later steps.
The first line is expected to be a header line and will be ignored.
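The numbering rules above are easy to check before running spell-pedigree. The following Python sketch is a hypothetical pre-flight check written for this manual, not a tool shipped with Spell-QTL; `check_pedigree` and its messages are made up, and it only verifies the constraints listed in this section.

```python
def check_pedigree(text, separator=";"):
    """Return a list of problems found in pedigree CSV text (empty if none)."""
    problems = []
    previous = 0
    lines = text.strip().splitlines()
    for line_no, line in enumerate(lines[1:], start=2):   # skip the header line
        fields = [f.strip() for f in line.split(separator)]
        if len(fields) < 4:
            problems.append(f"line {line_no}: expected at least 4 columns")
            continue
        try:
            # Columns after the fourth one are ignored, as spell-pedigree does.
            ind, p1, p2 = (int(f) for f in fields[1:4])
        except ValueError:
            problems.append(f"line {line_no}: non-numeric individual or parent")
            continue
        if ind <= previous:
            problems.append(f"line {line_no}: individual numbers must be positive and increasing")
        if p1 < 0 or p2 < 0:
            problems.append(f"line {line_no}: parent numbers cannot be negative")
        if p1 >= ind or p2 >= ind:
            problems.append(f"line {line_no}: parents must be numbered below the individual")
        previous = max(previous, ind)
    return problems
```

An empty list means that the file respects the numbering rules; it does not guarantee that the pedigree is biologically meaningful.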

A.4. OPTIONS
-h, --help Display usage.
-s,--separator CHAR Column delimiter character used in the pedigree file. Defaults to ;.

A.5. OUTPUT
spell-pedigree will create a file named NAME.spell-pedigree.data in the directory WORK_DIRECTORY/NAME.cache.

A.6. EXAMPLES
See spell-qtl-examples (1) for complete examples of the spell-qtl pipeline.

A.7. SEE ALSO


spell-marker (1), spell-qtl (1), spell-qtl-examples (1).

B. spell-marker man page
B.1. NAME
spell-marker -- Compute the 1-point Parental Origin Probabilities in a pedigree given genotype or
allelic observations

B.2. SYNOPSIS
spell-marker [options...] [-wd PATH] -n NAME -m GEN:FORMAT FILE [-m ...]

B.3. DESCRIPTION
spell-marker computes the 1-point Parental Origin Probabilities using Bayesian inference.
It outputs a data file that can be used with spell-qtl to compute the n-point Parental Origin
Probabilities and perform the actual QTL analysis.
spell-marker expects that spell-pedigree has been run beforehand with the same working
directory and configuration name.
Because each marker is supposed to be independent, spell-marker can perform the computations
in parallel in a variety of ways. See the Job control subsection of the options for details.

-wd,--work-directory PATH Path to directory for cache files and outputs. Defaults to the current
directory.
-n,--name NAME User-friendly name for this configuration
-m, --marker-obs GEN:FORMAT FILE  Path to the marker observations file of generation GEN with given format FORMAT. This file must have as many individuals as the pedigree has for that generation.
spell-marker knows three marker observation formats by default: bi-allelic SNP observations encoded as 0, 1, 2 (02), bi-parental genotype observations as in the Mapmaker format (ABHCD), and phased outbred parental observations as in carthagene (CP). You can define other formats using the -mos option.
You can direct spell-marker to use only a slice of an observation file using the following syntaxes:
FILE:single_column_index
FILE:first_column_index:last_column_index
When using genotype observations in a pedigree with more than two ancestors, you can specify the format for each generation as Parent1_letter/Parent2_letter or Parent1_generation/Parent2_generation. The format will be ABHCD with a and b replaced with the corresponding letters.

B.4. OPTIONS
B.4.1. Miscellaneous
-h, --help Display usage.
-z,--noise level Set the noise level for marker observations. Defaults to 0.

B.4.2. Job control


Select and configure the job control scheme

-mt,--dispatch-multithread n_threads Use single-machine, multi-threading.


-ssh,--dispatch-SSH HOSTS Use SSH for job dispatch. HOSTS is a comma-separated list of host-
names. spell-marker expects to find the same file system structure on all hosts.
-sge,--dispatch-SGE n_jobs qsub options Use SGE for job dispatch. Use - for qsub options if you don't wish to provide any specific option.

B.4.3. Inputs
Input files and configuration of observations. There are two essential parameters to compute the
genotype probabilities: the number of ancestors and the number of observed alleles (for SNP obser-
vations). The number of ancestors is automatically computed from the given pedigree or breeding
design specification. The number of alleles is computed from the marker observation specifications.

-mos,--marker-observation-spec path Path to a marker observation specification file.


-o,--output-generations comma-separated list Specifies the list of variables to extract after the
computation. The state probabilities for all individuals in the given generations will be ex-
tracted and made available for spell-qtl. Defaults to all generations.

B.4.4. Output modes


Set the output mode. By default, only the population data file will be written. If you specify -O1,
only the 1-point Parental Origin Probabilities will be written, unless you also specify -Op.

-Op,--output-population-data Output the population data file for use in spell-qtl. This is the default behaviour. The output will be named NAME.spell-marker.data in the directory WORK_DIRECTORY/NAME.cache.
-O1,--output-one-point-prob Output the 1-point Parental Origin Probabilities. This will disable
the output of the population data file unless -Op is also explicitly used.

B.5. MARKER OBSERVATION FORMAT SPECIFICATION


A format specification file is a JSON object (dictionary) where each key is a format name. Each
corresponding value is a JSON object containing the following keys:

domain either allele or ancestor.


alphabet_from a string containing all the characters (alleles or ancestor letters) that can be observed.
scores an object where each key is an observation and each value an array of all the possible
genotype/allelic pairs it encompasses.

B.5.1. Example: the 02, ABHCD, and CP formats
{
    "02": {
        "domain": "allele",
        "alphabet_from": "01",
        "scores": {
            "0": ["00"],
            "1": ["01", "10"],
            "2": ["11"],
            "-": ["00", "01", "10", "11"]
        }
    },
    "ABHCD": {
        "domain": "ancestor",
        "alphabet_from": "ab",
        "scores": {
            "A": ["aa"],
            "H": ["ab", "ba"],
            "B": ["bb"],
            "-": ["aa", "ab", "ba", "bb"],
            "C": ["ab", "ba", "bb"],
            "D": ["aa", "ab", "ba"]
        }
    },
    "CP": {
        "domain": "ancestor",
        "alphabet_from": "abcd",
        "scores": {
            "0": ["ac", "ad", "bc", "bd"],
            "1": ["ac"],
            "2": ["ad"],
            "3": ["ac", "ad"],
            "4": ["bc"],
            "5": ["ac", "bc"],
            "6": ["ad", "bc"],
            "7": ["ac", "ad", "bc"],
            "8": ["bd"],
            "9": ["ac", "bd"],
            "A": ["ad", "bd"],
            "B": ["ac", "ad", "bd"],
            "C": ["bc", "bd"],
            "D": ["ac", "bc", "bd"],
            "E": ["ad", "bc", "bd"],
            "F": ["ac", "ad", "bc", "bd"],
            "a": ["ad", "bd"],
            "b": ["ac", "ad", "bd"],
            "c": ["bc", "bd"],
            "d": ["ac", "bc", "bd"],
            "e": ["ad", "bc", "bd"],
            "f": ["ac", "ad", "bc", "bd"],
            "-": ["ac", "ad", "bc", "bd"]
        }
    }
}
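Since a format specification is plain JSON, it can be sanity-checked with a few lines of Python before it is passed to the -mos option. The checker below is a hypothetical helper, not part of spell-marker. It assumes the key names used in the example above (`domain`, `alphabet_from`, `scores`) and assumes that every genotype or allelic pair is a two-character string drawn from the alphabet.

```python
import json


def check_format_spec(spec_text):
    """Return problems found in a marker observation format specification."""
    problems = []
    for name, fmt in json.loads(spec_text).items():
        if fmt.get("domain") not in ("allele", "ancestor"):
            problems.append(f"{name}: domain must be 'allele' or 'ancestor'")
        alphabet = set(fmt.get("alphabet_from", ""))
        for observation, pairs in fmt.get("scores", {}).items():
            for pair in pairs:
                # Assumed rule: a pair is exactly two symbols of the alphabet.
                if len(pair) != 2 or not set(pair) <= alphabet:
                    problems.append(
                        f"{name}: observation {observation!r} has invalid pair {pair!r}")
    return problems
```

Such a check catches, for example, two pairs accidentally merged into a single string such as "ac, ad".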

B.6. EXAMPLES
See spell-qtl-examples (1) for complete examples of the spell-qtl pipeline.

B.7. SEE ALSO


spell-pedigree (1), spell-qtl (1), spell-qtl-examples (1).

C. spell-qtl man page
C.1. NAME
spell-qtl -- Compute n-point Parental Origin Probabilities and perform QTL analysis on modern genetic datasets.

C.2. SYNOPSIS
spell-qtl [options...] [-wd PATH] -n NAME -gm MAP -p GEN TRAITS [-p ...] [model and algorithms configuration...]

C.3. DESCRIPTION
spell-qtl computes the n-point Parental Origin Probabilities along the linkage groups using the data
provided by spell-marker.
spell-qtl expects that spell-pedigree and spell-marker have been run beforehand with the
same working directory and configuration name.

-wd, --work-directory PATH  Path to directory for cache files and outputs. Defaults to the current directory.
-n, --name NAME  User-friendly name for this configuration.

-gm, --genetic-map MAP  Path to the genetic map file.

-p, --population GEN TRAITS  Specify a new population (dataset) to work on.
GEN is the name of the phenotyped generation. The TRAITS path must point to a single_trait observation file with the same number of individuals as defined for the given generation in the pedigree for this population.

C.4. OPTIONS
C.4.1. Miscellaneous
-v, --version  Display version and exit.
-h, --help  Display usage and exit.
-N, --notes TEXT  Optional free text.
-P, --parallel N_CORES  Set up parallel computations (number of cores to use, or auto). Defaults to 0.
clean  Clears all cached files in the specified working directory (the -wd parameter MUST appear before clean).
-a, --ansi  Use ANSI escape sequences to display colors and realtime progress information at the top of the terminal. Enabled by default only if output is on a terminal.
-na, --no-ansi  Don't use ANSI escape sequences; don't display colors or realtime progress information.

-rj, --join-radius DISTANCE  Specify the maximum distance from a selected locus to compute joint probabilities. Default is 10.
-rs, --skip-radius DISTANCE  Specify the maximum distance from a selected locus to skip tests. Default is 1.

C.4.2. Input datasets


The following specifies the datasets you want processed. A dataset specification starts with argument -p, followed by one or more arguments -m.

-gm, --genetic-map PATH  Path to the genetic map file.

-p, --population QTL_generation_name PHEN_PATH  Specify a new population (dataset) to work on.

C.4.3. Model options


The following configures the construction of the linear model.

connected Select connected mode. Disabled by default.


In connected mode, the same ancestors in two datasets share the same column in the linear
model.

C.4.4. Working set options


The following configures the analysis domain.

lg LINKAGE GROUP NAMES Specify the list of linkage groups to study.


covar COVARIABLE NAMES Specify the list of covariables to put in the model.
traits TRAIT NAMES Specify the list of traits to analyse.
pleiotropy PLEIOTROPIC_TRAIT_NAME TOLERANCE TRAIT_NAMES Specify a
pleiotropic trait. This trait will be added to the list of traits to analyze. 1.e-3 is a
good default value for TOLERANCE.

C.4.5. Processing options


The following configures the QTL analysis. The standard pipeline is:
1. skeleton creation
2. cofactors detection
3. QTLs detection
4. effects estimation

output-nppop Compute the n-point Parental Origin Probabilities and exit. The results will be
written under the WORK_DIRECTORY/NAME.n-point directory.
qtl-threshold-permutations VALUE Set the number of permutations to compute the QTL threshold
value in automatic mode. Default is 10000.

qtl-threshold-quantile VALUE Set the quantile value in range [0:1] to select the QTL threshold
value in automatic mode. Default is 0.05.
qtl-threshold-value single_trait=value,...  Set the QTL threshold value manually for some traits. If not specified, it will be computed automatically using the above settings.
cofactor-threshold single_trait=value,...  Set the cofactor threshold value manually for some traits. Defaults to 0.9 times the QTL threshold value.

cofactor-exclusion-window DISTANCE Set the half-size (in cM) of the exclusion window around
cofactors. No detection will be performed inside this window. Defaults to 30.

step VALUE Step size in cM. Defaults to 1.


lod-support VALUE LOD support value. Defaults to 1.
skeleton MODE marker,... OR distance  Set up the cofactor detection skeleton. MODE can be either manual, auto or none.
If manual, specify a comma-separated marker list. If auto, specify the minimum interval between markers in cM.
By default, mode is auto and interval is 20.
cofactor-detection ALGORITHM Specify the cofactor detection algorithm. Available algorithms
are forward, backward, none, and all. Default is forward.

initial-selection SELECTION Specify the initial selection of QTLs for the detection algorithm.
The selection is a comma-separated list of CHROMOSOME:POSITION values. Setting an
initial selection overrides and cancels skeleton generation and cofactor detection.
QTL-detection ALGORITHM Specify the QTL detection algorithm. Available algorithms are none,
CIM, CIM-, iQTLm, and iQTLm-GW. The default algorithm is iQTLm.

C.5. EXAMPLES
See spell-qtl-examples (1) for complete examples of the spell-qtl pipeline.

C.6. SEE ALSO


spell-pedigree (1), spell-marker (1), spell-qtl-examples (1).

D. spell-qtl-examples man page
D.1. NAME
spell-qtl-examples -- Example datasets for the spell-qtl software suite.

D.2. DESCRIPTION
The datasets are located in /usr/share/spell-qtl/examples if it was installed system-wide, or
share/spell-qtl/examples in the installation directory.
The example datasets each consist of a set of files:
- the pedigree in a .ped file,
- the genetic map in a .map file,
- one or more genotypic or allelic observation files in .gen files,
- one or more sets of single_trait observations in .phen files.
Additionally, a README file in each directory describes the dataset features and the commands
to run to process them.

D.3. SEE ALSO


spell-pedigree (1), spell-marker (1), spell-qtl (1)

List of Sample files
1.1. Pedigree (.ped input file) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2. Marker alleles (.gen input file) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3. Genetic Map (.map input file) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4. Trait observations (.phen input file) . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

2.1. 1-point output file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10


2.2. n-point output file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3. report file . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

Contents
1. The main Spell-QTL pipeline 3
1.1. General view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2. Minimal session . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3. Software suite details . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.1. spell-pedigree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.2. spell-marker . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3.3. spell-qtl . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4. Input files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.1. Pedigree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4.2. Marker observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.4.3. Genetic map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.4. Trait observations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2. Generated outputs 9
2.1. General organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2. Output files samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.1. 1-point POP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.2. n-point POP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.2.3. full map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.2.4. Trait by trait reports . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

3. Detection model 19
3.1. Complete model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2. Fitting and testing models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3. Phenotypic values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3.1. Single locus effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.3.2. Epistasis between two loci . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.4. Simplest possible test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

4. Underlying parental origin probability model 24

Appendices 25

A. spell-pedigree man page 26


A.1. NAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
A.2. SYNOPSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
A.3. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
A.4. OPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
A.5. OUTPUT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
A.6. EXAMPLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
A.7. SEE ALSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

B. spell-marker man page 28


B.1. NAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
B.2. SYNOPSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
B.3. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

B.4. OPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
B.4.1. Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
B.4.2. Job control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
B.4.3. Inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
B.4.4. Output modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
B.5. MARKER OBSERVATION FORMAT SPECIFICATION . . . . . . . . . . . . . . . 29
B.5.1. Example: the 02, ABHCD, and CP formats . . . . . . . . . . . . . . . . . . . 30
B.6. EXAMPLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
B.7. SEE ALSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

C. spell-qtl man page 32


C.1. NAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
C.2. SYNOPSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
C.3. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
C.4. OPTIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
C.4.1. Miscellaneous . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
C.4.2. Input datasets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
C.4.3. Model options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
C.4.4. Working set options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
C.4.5. Processing options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
C.5. EXAMPLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
C.6. SEE ALSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

D. spell-qtl-examples man page 35


D.1. NAME . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
D.2. DESCRIPTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
D.3. SEE ALSO . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
