
Forensic Science International 119 (2001) 1–10

Sequence variation in humans and other primates at six short tandem repeat loci used in forensic identity testing
Katherine Lazaruk*, Jeanette Wallin, Cydne Holt,
Theresa Nguyen, P. Sean Walsh
Applied Biosystems, 850 Lincoln Centre Drive, M/S 404/1, Foster City, CA 94404, USA
Received 28 September 1999; received in revised form 14 August 2000; accepted 15 August 2000

Abstract

A large number of alleles from the six short tandem repeat (STR) loci FGA, D3S1358, vWA, CSF1PO, TPOX and TH01, used in human identity testing, were sequenced to provide support for the robustness of fluorescent STR DNA typing by allele size. Sequence information for some of these loci (FGA, vWA, TH01) is an extension of published work, whereas no extensive sequence information is available with respect to the D3S1358, CSF1PO, and TPOX loci. Sequencing of alleles at each locus has provided quantitative data with respect to the true nucleotide length of common alleles, and of alleles that vary in length from the common alleles. All alleles that were identified as "off-ladder" alleles through fluorescent typing at these STR loci have proven to be true length variant alleles. Sequencing at the D3S1358 and CSF1PO loci allowed for the establishment of a common nomenclature for these loci. A correlation between percent stutter and the length of the core tandem repeat is demonstrated at the FGA locus. Alleles in which the core tandem repeat is interrupted by a repeat unit of different sequence have a reduced percent stutter. DNA samples from three non-human primates (chimpanzee, orangutan, and gorilla) were compared to the human sequences, and shown to differ markedly across loci with respect to their homology. The effects of primer binding site mutations on the amplification efficiency at a particular locus, and methods used to interpret amplification imbalance of heterozygous alleles at a locus, are also addressed. © 2001 Elsevier Science Ireland Ltd. All rights reserved.

Keywords: STR; Tetranucleotide; Sequencing; Forensic; Primate

* Corresponding author. Tel.: +1-650-638-5486; fax: +1-650-638-6222.
E-mail address: klazaruk@fc.celera.com (K. Lazaruk).

0379-0738/01/$ – see front matter © 2001 Elsevier Science Ireland Ltd. All rights reserved.
PII: S0379-0738(00)00388-1

1. Introduction

Short tandem repeats (STRs) have become the preferred genetic loci for forensic identity testing for the following reasons: (1) they have a high degree of polymorphism in human populations [1]; (2) they are amenable to analysis by the polymerase chain reaction (PCR), which allows for the analysis of minute and/or degraded DNA samples [2]; and (3) the intermediate number of alleles (5–15 common alleles per locus) keeps the locus size range small enough for multi-locus PCR amplification, and also minimizes the chance for preferential amplification [3]. Multiple alleles per locus can also translate into a relatively higher power of discrimination at a single locus (versus biallelic markers), making interpretation of mixed DNA samples practicable [4].

Different electrophoretic platforms allow for discrete single base pair separation of fluorescently labelled DNA fragments [5]. The quantitative nucleotide length differences in the amplified DNA fragments are the basis for the allele designation. Genotyping is accomplished by comparing the unknown samples to an allelic ladder and the use of software that allows for accurate and efficient typing of the samples [6].

Underlying the length differences at an STR locus, some sequence variation is expected simply due to the frequency of sequence variation in human populations. For example, single nucleotide polymorphisms (SNPs) are estimated to occur in the human genome at a frequency of approximately 1 in 500–1000 nucleotides [7,8]. When other types of sequence variation, for example STRs, larger insertions or
deletions, etc. are included, the frequency of sequence variation is estimated to be 1 in 250–300 nucleotides [9]. Therefore, one would expect to find some sequence variation within the stretches of between 100 and 350 nucleotides that are being amplified at an STR locus. It is important to stress that sequence variation within a tetranucleotide repeat is not detected by the methods used for fluorescent STR genotyping and, therefore, not categorized in the allele frequency estimates used for forensic testing. Accurate and informative genotypes are obtained based on length differences, even when sequence variation exists in the core STR repeat unit (or the flanking sequence) between a sample allele and an allele in the allelic ladder [5,10,11].

Sequencing of alleles at STR loci used in forensic identity testing was undertaken for a variety of reasons: (1) to study off-ladder alleles; (2) for assistance in validation of the STR loci chosen; (3) to establish a consistent nomenclature for new loci, e.g. D3S1358; (4) to investigate if percent stutter is correlated with allele length; and (5) to begin to examine the species specificity of STR allele sequences.

2. Materials and methods

2.1. Sample preparation

Extracted human genomic DNA samples were obtained from a variety of laboratory sources. Some genomic DNA samples were extracted from bloodstains, hair roots, or buccal scrapings following the procedure as described previously using Chelex® 100 Resin [12]. Extracted primate DNAs were obtained from BIOS Laboratories (New Haven, CT).

2.2. PCR amplification

The PCR primers used to generate sequencing template at each of the loci were designed, in most cases, to flank the primer binding region that is utilized to amplify the tetranucleotide repeats of STR loci. These primers were designed with M13 universal primer sequence at the 5′ ends. The forward primer was tailed with the −21M13 universal primer sequence (in square brackets below), and the reverse primer was tailed with the M13 reverse universal sequence (in square brackets below). For example, the vWA primers used to generate sequencing template were the following:

Forward: [TGT AAA ACG ACG GCC AGT] GTT CCC ACC TTG CAG AAG
Reverse: [CAG GAA ACA GCT ATG ACC] ATA GGA TAG ATG ATA GAT ACA AGG G

Both the −21M13 and M13 reverse 5′ tails were used on each template in order to allow sequencing of both DNA strands.

Generation of sequencing template involved two rounds of PCR with band isolation prior to the second round, as described below. Extracted genomic DNA (2–20 ng) was used as template for the first round of PCR amplification in a 50 µl reaction mixture comprised of 1× AmpFlSTR®¹ PCR reaction mix (Applied Biosystems, Foster City, CA), 0.4 µM each primer, and 4 units AmpliTaq Gold®² DNA Polymerase. Samples were amplified in a GeneAmp® PCR System 9600 (Applied Biosystems). Cycling parameters, particularly annealing temperatures, varied depending on the template and primers used. Typical amplification parameters were: 11 min enzyme activation at 95°C followed by 30 cycles of denaturation at 94°C (1 min), annealing at 58°C (1 min) and extension at 72°C (1 min).

The PCR product (5 µl) was mixed with 5 µl of formamide loading solution (5:1 deionized formamide:blue dextran in 25 mM EDTA) and heat denatured. Denatured product was loaded on to a 6.5% acrylamide, 7.5 M urea gel and run at 40 W for 2 h in 1× TBE buffer. DNA bands were visualized by silver staining, excised from the gel with a razor blade, and placed into a 0.5 ml microfuge tube containing 40 µl of TE buffer (10 mM Tris, pH 8.0; 0.1 mM EDTA). The 0.5 ml tube containing the excised band was then heated to 85°C for 10–15 min. A 5 µl aliquot of the recovered PCR product was reamplified (second-round PCR), using the same primers as used for the first-round PCR, for 19–21 cycles of denaturation at 94°C (30 s), annealing at 58–63°C (30 s) and extension at 72°C (30 s).

2.2.1. Sequencing reactions

Direct link sequencing (sequencing of diluted PCR product, with no clean-up procedure) was performed on the second-round PCR product. These products were diluted 1:5 with deionized water and sequenced directly using the ABI PRISM® −21M13 and M13 Reverse Dye Primer Ready Reaction Mix with AmpliTaq® DNA Polymerase FS kits (Applied Biosystems).

The pooled, precipitated samples were loaded on to a 5% Long Ranger® (FMC Corporation, Rockland, ME) gel prepared in 1× TBE buffer, and analyzed on an ABI PRISM® 377 DNA Sequencer. Electrophoresis was carried out at 3000 V, 51°C for 3.5 h (Module 4XA). Sequences were analyzed using the ABI PRISM® Sequencing Analysis 2.1.2 or 3.0 Software, and aligned using Sequence Navigator® (Applied Biosystems).

¹ AmpFlSTR Blue, Profiler, and Profiler Plus are trademarks and ABI Prism, AmpFlSTR, Applied Biosystems and Sequence Navigator are registered trademarks of PE Corporation or its subsidiaries in the United States and certain other countries.
² AmpliTaq, AmpliTaq Gold, and GeneAmp are registered trademarks of Roche Molecular Systems, Inc. All other trademarks are owned by their respective owners.

3. Results

The alleles selected for sequence analysis at any locus were not selected at random, so the number of sequence
variants represented in the results tables are not a reflection of the proportion of sequence variants expected in a random sampling of any human population group. The method for selection of alleles for sequencing varied considerably between loci. Some early collaborators contributed length variant alleles (e.g. FGA, D3S1358, TH01) that had been typed as off-ladder alleles in their laboratories. The FGA locus also had alleles with interesting stutter percent values, so a disproportionate number of FGA alleles were sequenced to complement the stutter percent data at the vWA locus that has been published previously [11]. Many vWA alleles were sequenced due to the identification of sequence variation in the primer binding regions at this locus. Samples were arbitrarily named, depending on the source of the sample.

3.1. FGA

There are two distinct groupings of FGA alleles based on size, those in the allele 16–34.2 size range, and those in the allele 42.2–51.2 size range. We have not encountered any alleles between 34.2 and 42.2 to date, but it is likely that these alleles will exist in some individuals. All of the FGA alleles can also be divided into two groups based on sequence similarities. All of the shorter FGA alleles (16–30) have the [CTCC (TTCC)2] repeat motif in common at the 3′ end, whereas all of the alleles larger than FGA30 in Table 1 have a [(CTTC)3–4 (CTTT)3 CTCC (TTCC)4] at the 3′ end of the repeat motif. The very large FGA alleles (42.2–51.2) have a CTGT repeat unit of variable length (1–5 repeats) interrupting the long (CTTT)n repeat sequence.

9947A is DNA from a female human cell line and is the control DNA used in AmpFlSTR Blue™, AmpFlSTR Green I™, AmpFlSTR Profiler™, and AmpFlSTR Profiler Plus™ PCR amplification kits. 9947A allele 24 was found to contain a CTTT → CTCC change in the 21st repeat unit. The consensus sequence of a 24 allele has a single CTCC in the 22nd repeat unit, one repeat later. Thus, a duplication of the CTCC unit that resides after the long stretch of CTTT core repeats probably occurred, rather than a 2 bp change in the core repeat itself. This is the only example of this particular mutation that we have found to date. Since this is DNA from a transformed cell line, it is unknown if this mutation actually exists in human genomic DNA.

B1 allele 26, two of the 27 alleles, and all of the 28 and 29 alleles sequenced here have a CCTT tetranucleotide unit (therefore a T → C transition) interrupting the long (CTTT)n tandem repeat. The CCTT unit is followed by the same sequence [(CTTT)5 CTCC (TTCC)2] in all of the alleles sequenced. It is interesting to note that this transition has only been seen in alleles with greater than or equal to 26 repeat units, suggesting that the mutation arose after expansion of the core CTTT repeat unit.

The samples C49 and C100 (FGA27 alleles) have a CTTT → GTTT base change in the 23rd repeat unit. This mutation is different from the allele 27 mutation noted above. The four FGA27 alleles sequenced in this study, therefore, all differ from the consensus sequence [13]. Many 2 bp length variant alleles were sequenced and all were shown to have the same insertion, namely a TT dinucleotide following the TTTT TTCT stretch on the sense strand.

The chimpanzee, orangutan, and gorilla FGA alleles are all homologous to the human sequence before and after the repeat structure, but differ significantly from human, and from each other, in their core repeat structure. The chimpanzee FGA allele structure is the least complex and closest in structure to human.

3.2. D3S1358

The D3S1358 alleles have a compound repeat unit structure based on the tetranucleotide TCTR, where R represents purines, either A or G [14,15]. Most of the alleles differ in size by one complete tetranucleotide repeat unit.

The D3S1358 alleles in the AmpFlSTR Blue™ Allelic Ladder contain from 12 to 19 tandem TCTA and/or TCTG repeat units, and are therefore designated as alleles 12–19 (Table 2). The repeat motif was designated in our laboratory prior to the publication of the most recent ISFH recommendations, so the nomenclature will remain as described in this paper [16–19].

3.3. vWA

The vWA alleles have the compound repeat unit structure TCTR, where R represents A or G [14]. Only alleles of the form TCTA (TCTG)3–4 (TCTA)n have been reported to date [11,20–22] (Table 3). The primate vWA sequencing results are similar to those reported previously [23,24]. The chimpanzee and gorilla vWA alleles have a very similar repeat motif to the human alleles.

During vWA primer development, it was observed that a small number of samples in an African-American population database had a conspicuous peak height imbalance of heterozygous alleles, under the amplification conditions used in the AmpFlSTR® kits (Fig. 1A, 59 and 62.5°C Tanneal, respectively). A peak height balance of 70% (where the height of the lower amplitude peak is divided by the height of the higher amplitude peak) is typically observed for high quality, single-source heterozygous samples [25]. To investigate the cause of the allele imbalance in these African-American samples, the alleles were sequenced. The results revealed a C → T transition in the reverse primer-binding region at a position 4 nucleotides from the 3′ end of the reverse primer (Fig. 2). A degenerate reverse primer was then synthesized that contained both of the reverse primer sequences, i.e. 5′-----TAGAT-3′ and 5′-----TGGAT-3′. When this degenerate primer was used to amplify the previously imbalanced samples, the peak height balance between the two alleles was restored; i.e. both alleles were amplified with the same efficiency (Fig. 1B). This mutation was therefore the cause of the poor amplification efficiency of the allele containing the altered sequence when using the

Table 1
Allele nomenclature, number of alleles sequenced, sequence structure of the repeat region, and sample names of alleles at the FGA locus

Allele n(c) Repeat motif Sample name
Consensus (TTTC)3 TTTT TTCT (CTTT)n CTCC (TTCC)2 [13]


16a 1 (TTTC)3 TTTT TTCT (CTTT)8 CTCC (TTCC)2 FSS13
17 1 (TTTC)3 TTTT TTCT (CTTT)9 CTCC (TTCC)2
18 1 (TTTC)3 TTTT TTCT (CTTT)10 CTCC (TTCC)2 CEPH1340/033
18.2 1 (TTTC)3 TTTT TT (CTTT)11 CTCC (TTCC)2 BBD289
19 5 (TTTC)3 TTTT TTCT (CTTT)11 CTCC (TTCC)2
20 2 (TTTC)3 TTTT TTCT (CTTT)12 CTCC (TTCC)2 RCMP1, 2
20.2 1 (TTTC)3 TTTT TT (CTTT)13 CTCC (TTCC)2 RCMP3
21 2 (TTTC)3 TTTT TTCT (CTTT)13 CTCC (TTCC)2 B1 and B7
21.2a 1 (TTTC)3 TTTT TT (CTTT)14 CTCC (TTCC)2 BBD127
22 6 (TTTC)3 TTTT TTCT (CTTT)14 CTCC (TTCC)2
22.2 1 (TTTC)3 TTTT TT (CTTT)15 CTCC (TTCC)2 BBD229
22.3a,b 1 (TTTC)3 TTTT TTCT (CTTT)8 CTT (CTTT)6 CTCC (TTCC)2 BBD172
23 5 (TTTC)3 TTTT TTCT (CTTT)15 CTCC (TTCC)2
24 4 (TTTC)3 TTTT TTCT (CTTT)16 CTCC (TTCC)2
24 1 (TTTC)3 TTTT TTCT (CTTT)15 (CTCC)2 (TTCC)2 9947A
24.2 1 (TTTC)3 TTTT TT (CTTT)17 CTCC (TTCC)2 RCMP4
25 4 (TTTC)3 TTTT TTCT (CTTT)17 CTCC (TTCC)2
26 1 (TTTC)3 TTTT TTCT (CTTT)18 CTCC (TTCC)2 KL
26 1 (TTTC)3 TTTT TTCT (CTTT)12 CCTT (CTTT)5 CTCC (TTCC)2 B1
26.2a 2 (TTTC)3 TTTT TT (CTTT)19 CTCC (TTCC)2 RCMP4, 5
27 2 (TTTC)3 TTTT TTCT (CTTT)13 CCTT (CTTT)5 CTCC (TTCC)2 B7
27 2 (TTTC)3 TTTT TTCT (CTTT)17 GTTT CTTT CTCC (TTCC)2 C49 and C100
28 2 (TTTC)3 TTTT TTCT (CTTT)14 CCTT (CTTT)5 CTCC (TTCC)2 B25
29 2 (TTTC)3 TTTT TTCT (CTTT)15 CCTT (CTTT)5 CTCC (TTCC)2 H73
30a 1 (TTTC)3 TTTT TTCT (CTTT)22 CTCC (TTCC)2 H25
30.2 2 (TTTC)4 TTTT TT (CTTT)14 (CTTC)3 (CTTT)3 CTCC (TTCC)4 BBD365
31.2a 1 (TTTC)4 TTTT TT (CTTT)15 (CTTC)3 (CTTT)3 CTCC (TTCC)4 FSS1
32.2a 1 (TTTC)4 TTTT TT (CTTT)16 (CTTC)3 (CTTT)3 CTCC (TTCC)4 FSS2
33.2a 1 (TTTC)4 TTTT TT (CTTT)17 (CTTC)3 (CTTT)3 CTCC (TTCC)4 FSS3
34.2a 1 (TTTC)4 TTTT TT (CTTT)18 (CTTC)3 (CTTT)3 CTCC (TTCC)4 CFS1
42.2a 1 (TTTC)4 TTTT TT (CTTT)12 (CTGT)3 (CTTT)11 (CTTC)3 (CTTT)3 CTCC (TTCC)4 FSS4
43.2a 1 (TTTC)4 TTTT TT (CTTT)13 (CTGT)3 (CTTT)11 (CTTC)3 (CTTT)3 CTCC (TTCC)4 FSS5
44.2a 1 (TTTC)4 TTTT TT (CTTT)13 (CTGT)4 (CTTT)10 (CTTC)4 (CTTT)3 CTCC (TTCC)4 FSS6
45.2a 1 (TTTC)4 TTTT TT (CTTT)10 (CTGT)5 (CTTT)13 (CTTC)4 (CTTT)3 CTCC (TTCC)4 FSS7
46.2 1 (TTTC)4 TTTT TT (CTTT)13 (CTGT)2 (CTTT)14 (CTTC)4 (CTTT)3 CTCC (TTCC)4 FSS8
47.2a 1 (TTTC)4 TTTT TT (CTTT)13 CTGT (CTTT)16 (CTTC)4 (CTTT)3 CTCC (TTCC)4 FSS9
48.2a 1 (TTTC)4 TTTT TT (CTTT)14 (CTGT)3 (CTTT)14 (CTTC)4 (CTTT)3 CTCC (TTCC)4 FSS10
50.2a 1 (TTTC)4 TTTT TT (CTTT)13 (CTGT)4 (CTTT)16 (CTTC)4 (CTTT)3 CTCC (TTCC)4 B34
50.2 1 (TTTC)4 TTTT TT (CTTT)14 (CTGT)4 (CTTT)15 (CTTC)4 (CTTT)3 CTCC (TTCC)4 FSS11
51.2a 1 (TTTC)4 TTTT TT (CTTT)16 (CTGT)2 (CTTT)16 (CTTC)4 (CTTT)3 CTCC (TTCC)4 FSS12
Chimpanzee short 1 TTT (CTTT)14 CTCC (TTCC)3
Chimpanzee long 1 TTT (CTTT)16 CTCC (TTCC)3
Orangutan short 1 (TTTC)2 TCTT TCT CTTT (CTTTT)3 (CTTT)6 CTGT CCTT(CTTT)2 CTCC
Orangutan long 1 (TTTC)2 TCTT TCT CTTT (CTTTT)3 (CTTT)10 CTGT CCTT(CTTT)2 CTCC
Gorilla short 1 (TTTC)4 TC (CTTT)2 CT (CTTT)13 T CT (CTTT)4 CTCC (TTCC)2
Gorilla long 1 (TTTC)4 TC (CTTT)2 CT (CTTT)16 T CT (CTTT)4 CTCC (TTCC)2
a Previously unpublished length variants.
b Florida Department of Law Enforcement.
c n = 73.
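The allele designations in Table 1 can be recovered from the repeat-region length: each complete tetranucleotide counts as one repeat, and any leftover bases give the suffix (for example 18.2). The following is a minimal Python sketch of that bookkeeping, assuming the motif shorthand used in the table. The function names `expand_motif`, `allele_name` and `longest_run` are illustrative and do not come from the original work.

```python
import re

# One token is either "(UNIT)count", e.g. "(CTTT)11", or a bare block, e.g. "TT".
TOKEN = re.compile(r"\(([ACGT]+)\)(\d+)|([ACGT]+)")


def expand_motif(motif: str) -> str:
    """Expand table shorthand such as '(CTTT)3 CTCC' into plain bases."""
    parts = []
    for unit, count, bare in TOKEN.findall(motif):
        parts.append(unit * int(count) if unit else bare)
    return "".join(parts)


def allele_name(motif: str, unit_len: int = 4) -> str:
    """Name an allele by complete repeats plus leftover bases (e.g. '18.2')."""
    full, extra = divmod(len(expand_motif(motif)), unit_len)
    return f"{full}.{extra}" if extra else str(full)


def longest_run(motif: str, unit: str = "CTTT") -> int:
    """Longest uninterrupted run of `unit`, counted on the table tokens."""
    best = run = 0
    for rep_unit, count, bare in TOKEN.findall(motif):
        if (rep_unit or bare) == unit:
            run += int(count) if rep_unit else 1
            best = max(best, run)
        else:
            run = 0  # any other block interrupts the run
    return best


if __name__ == "__main__":
    bbd289 = "(TTTC)3 TTTT TT (CTTT)11 CTCC (TTCC)2"
    b1_26 = "(TTTC)3 TTTT TTCT (CTTT)12 CCTT (CTTT)5 CTCC (TTCC)2"
    print(allele_name(bbd289))   # 18.2
    print(allele_name(b1_26))    # 26
    print(longest_run(b1_26))    # 12
```

Counting runs on the table tokens rather than on the expanded string avoids out-of-frame matches (for example, CTTT read across two adjacent TTTC units).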

Table 2
Allele nomenclature, number of alleles sequenced and sequence structure of the repeat region at the D3S1358 locusa

Allele n(b) Repeat motif
Consensus TCTA (TCTG)1–3 (TCTA)n [15]


11 1 TCTA (TCTG)2 (TCTA)8
12 1 TCTA TCTG (TCTA)10
13 3 TCTA (TCTG)2 (TCTA)10
14 3 TCTA (TCTG)2 (TCTA)11
15 3 TCTA TCTG (TCTA)13
15 4 TCTA (TCTG)2 (TCTA)12
15.2a 1 TCTA (TCTG)3 TC (TCTA)11
16 1 TCTA TCTG (TCTA)14
16 5 TCTA (TCTG)2 (TCTA)13
17 3 TCTA (TCTG)2 (TCTA)14
17 3 TCTA (TCTG)3 (TCTA)13
18 2 TCTA (TCTG)3 (TCTA)14
19 1 TCTA (TCTG)3 (TCTA)16
Chimpanzee short 1 TCTA TCTG (TCTA)11 TATA
Chimpanzee long 1 TCTA TCTG (TCTA)12 TATA
Orangutan short 1 TCTA TTTA (CCTA)2 (TCTA)5 (TCTG)2 (TCTA)2
Orangutan long 1 TCTA TTTA (CCTA)2 (TCTA)8 (TCTG)2 (TCTA)2
Gorilla short 1 TCTA TCTG TTTA (TCTA)12
Gorilla long 1 TCTA TCTG TTTA (TCTA)6 TCTC (TCTA)6
a Centre for Forensic Science, Toronto, Canada.
b n = 37.

Table 3
Allele nomenclature, number of alleles sequenced and sequence structure of the repeat region at the vWA locus

Allele n(b) Repeat motif
Consensus TCTA (TCTG)3–6 (TCTA)n [14]


11 1 TCTA (TCTG)3 (TCTA)7 TCCA TCTA TCCA TCCA
12 1 TCTA (TCTG)4 (TCTA)7 TCCA TCTA TCCA TCCA
13 3 TCTA (TCTG)4 (TCTA)8 TCCA TCTA TCCA TCCA
13 1 TCTA (TCTG)4 (TCTA)8 TCTA TCTA TCCA TCCA
13 1 TCTA (TCTG)4 (TCTA)8 TCTA TCTA TCCA TCTA
14′ 3 TCTA TCTG TCTA (TCTG)4 (TCTA)3 TCCA (TCTA)3 (TCCA)4
15 5 TCTA (TCTG)4 (TCTA)10 TCCA TCTA TCCA TCCA
15 1 TCTA (TCTG)4 (TCTA)10 TCTA TCTA TCCA TCTA
15 1 TCTA (TCTG)3 (TCTA)11 TCCA TCTA TCCA TCCA
16 4 TCTA (TCTG)4 (TCTA)11 TCCA TCTA TCCA TCCA
17 1 TCTA (TCTG)3 (TCTA)13 TCCA TCTA TCCA TCCA
17 5 TCTA (TCTG)4 (TCTA)12 TCCA TCTA TCCA TCCA
18a 1 TCTA (TCTG)5 (TCTA)12 TCCA TCTA TCCA TCCA
18 7 TCTA (TCTG)4 (TCTA)13 TCCA TCTA TCCA TCCA
18 1 TCTA (TCTG)3 (TCTA)14 TCCA TCTA TCCA TCCA
19a 1 TCTA (TCTG)6 (TCTA)12 TCCA TCTA TCCA TCCA
19 4 TCTA (TCTG)4 (TCTA)14 TCCA TCTA TCCA TCCA
20a 1 TCTA (TCTG)5 (TCTA)14 TCCA TCTA TCCA TCCA
21 1 TCTA (TCTG)4 (TCTA)16 TCCA TCTA TCCA TCCA
Chimpanzee short 1 TCTA TCTG (TCTA)2 TCTG (TCTA)9 TCCA TCCA
Chimpanzee long 1 TCTA TCTG (TCTA)2 TCTG (TCTA)11 TCCA TCCA
Orangutan 1 TCGA (TCTA)3 TCTG (TCTA)5 TCCA
Gorilla short 1 TCTA TCTG (TCTA)3 TCTG (TCTA)2 TCA (TCTA)13 TCCA
Gorilla long 1 TCTA TCTG (TCTA)3 TCTG (TCTA)16 TCCA TCCA
a Previously unpublished length variants.
b n = 48.

Fig. 1. (A) Heterozygous peak height balance at the vWA locus using a single AmpFlSTR® kit vWA reverse primer (not degenerate). Top panel is peak height balance at a 60°C annealing temperature in the PCR reaction. Bottom panel is peak height balance at a 62.5°C annealing temperature in the PCR reaction. The x-axis is allele length in nucleotides and the y-axis is peak height in relative fluorescence units (rfu). (B) Heterozygous peak height balance at the vWA locus using a degenerate AmpFlSTR® kit vWA reverse primer (see Section 3).
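The peak height balance used in Section 3.3 is the height of the lower peak divided by the height of the higher peak. A minimal Python sketch of that ratio follows. The function names are illustrative, the example peak heights are made up, and treating the roughly 70% figure cited from [25] as a hard cut-off is an assumption made only for this sketch.

```python
def peak_height_balance(height_a: float, height_b: float) -> float:
    """Return lower peak height divided by higher peak height (0 to 1)."""
    if height_a <= 0 or height_b <= 0:
        raise ValueError("peak heights must be positive")
    low, high = sorted((height_a, height_b))
    return low / high


def is_balanced(height_a: float, height_b: float, threshold: float = 0.70) -> bool:
    """Flag a heterozygous pair as balanced; the threshold is an assumption."""
    return peak_height_balance(height_a, height_b) >= threshold


if __name__ == "__main__":
    # Hypothetical peak heights in rfu, not taken from Fig. 1.
    for pair in [(1500, 1200), (1500, 450)]:
        print(pair, round(peak_height_balance(*pair), 2), is_balanced(*pair))
```

As the Discussion notes, a low ratio at one locus should still be read together with the profiles of the other co-amplified loci, so a flag like this is a prompt for review rather than a conclusion.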

"development" primer set (to the more common primer binding site sequence). The degenerate vWA reverse primer set is included in all of the AmpFlSTR® kits that amplify the vWA locus.

During the CODIS STR Standardization Project sponsored by the FBI from 1996–1997, two population database samples were encountered that exhibited non-amplification of a vWA allele using the AmpFlSTR® Blue™ PCR amplification kit [28]. One DNA sample was provided to our laboratory by the National Institute of Standards and Technology (NIST) and the other by the Office of the Chief Medical Examiner, New York City. Sequencing revealed a T to A transversion in the AmpFlSTR® kit vWA forward primer-binding region. The point mutation is at the penultimate nucleotide position at the 3′ end of the AmpFlSTR® kit vWA forward primer-binding site (data not shown). The presence of a point mutation so close to the 3′ end of the forward primer inhibits amplification of an allele that contains the mutation. The frequency of this particular mutation has subsequently been determined to be less than 0.07% in the major US population groups, and therefore the primer sequence has not been changed [29].

Fig. 2. Sequence at the vWA locus 3′ to the tandem repeat region. The top sequence is the consensus sequence [26]. Highlighted in red are the nucleotide changes for the three sequence variants reported in Section 3. Indicated by arrows below the sequences are the AmpFlSTR® kit vWA reverse 3′ primer-binding site and the previously published vWA reverse 3′ primer-binding site [4,14,27].
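The text locates primer-binding site mutations by their distance from the 3′ end of the primer (position 4 for the vWA reverse variant, the penultimate base for the vWA forward variant). A minimal Python sketch of that position count is shown below. The function name is illustrative, and it assumes the variant site is supplied in primer orientation, i.e. as the primer sequence that would match the variant perfectly.

```python
def mismatches_from_3prime(primer: str, matching_primer: str) -> list[int]:
    """Return 1-based mismatch positions counted from the 3' end.

    `matching_primer` is the sequence that would pair perfectly with the
    variant binding site, written 5'->3' like `primer` (an assumption).
    """
    if len(primer) != len(matching_primer):
        raise ValueError("sequences must have the same length")
    pairs = zip(reversed(primer.upper()), reversed(matching_primer.upper()))
    return [pos for pos, (a, b) in enumerate(pairs, start=1) if a != b]


if __name__ == "__main__":
    # 3' ends of the two vWA reverse primer variants quoted in Section 3.3.
    print(mismatches_from_3prime("TAGAT", "TGGAT"))  # [4]
```

In the data reported here, a mismatch at position 4 produced peak imbalance and a mismatch at the penultimate position prevented amplification, while the TPOX variant eight nucleotides from the 3′ end had no adverse effect; the sketch only reports positions and does not predict the outcome.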
Table 4
Allele nomenclature, number of alleles sequenced and sequence structure of the repeat region at the CSF1PO locus

Allele             n(a)  Repeat structure
Consensus                (AGAT)n [30]
6                  2     (AGAT)6
7                  2     (AGAT)7
8                  1     (AGAT)8
9                  3     (AGAT)9
10                 3     (AGAT)10
10.3               1     (AGAT)5 AGT (AGAT)5
11                 4     (AGAT)11
12                 7     (AGAT)12
13                 2     (AGAT)13
14                 2     (AGAT)14
15                 1     (AGAT)15
Chimpanzee short   1     (AGAT)8
Chimpanzee long    1     (AGAT)10
Orangutan          1     (AGAT)3 AGT (AGAT)2
Gorilla short      1     (AGAT)13
a n = 32.

Table 5
Allele nomenclature, number of alleles sequenced and sequence structure of the repeat region at the TPOX locus

Allele             n(a)  Repeat structure
Consensus                (AATG)n [31]
6                  3     (AATG)6
7                  1     (AATG)7
8                  4     (AATG)8
9                  2     (AATG)9
10                 3     (AATG)10
11                 2     (AATG)11
12                 1     (AATG)12
Chimpanzee short   1     (AATG)6
Chimpanzee long    1     GATG (AATG)8
Orangutan          1     (AATG)5
Gorilla short      1     AATG AATA AACG (AATG)7
Gorilla long       1     AATG AATA AACG (AATG)8
a n = 21.

Table 6
Allele nomenclature, number of alleles sequenced and sequence structure of the repeat region at the TH01 locus

Allele             n(b)  Repeat structure
Consensus                (AATG)n [1,32]
5                  2     (AATG)5
6                  1     (AATG)6
7                  4     (AATG)7
8                  1     (AATG)8
8.3a               1     (AATG)5 ATG (AATG)3
9                  1     (AATG)9
9.3                4     (AATG)6 ATG (AATG)3
10                 2     (AATG)10
10.3a              1     (AATG)7 ATG (AATG)3
Chimpanzee short   1     (AATG)6
Chimpanzee long    1     (AATG)7
Orangutan          1     AATG AAGG (AATG)2
Gorilla            1     (AATG)6
a Colorado Bureau of Investigation.
b n = 21.

3.4. CSF1PO

Thirty-two alleles at the CSF1PO locus were sequenced (Table 4). The common repeat motif is (AGAT)n, and therefore, the alleles are designated by the number of tandem AGAT repeats [30]. Most of the alleles differ in size by one complete tetranucleotide repeat unit. An off-ladder allele (CSF1PO10.3) was identified during typing of a population database using the AmpFlSTR® kit primers. The 10.3 allele differed in length from the usual 4 bp repeat pattern because of a deletion of the second A in one of the AGAT repeat units (AGAT → AGT).

All of the primate CSF1PO alleles, except the single orangutan allele, contain the same single (AGAT)n repeat sequence motif. The orangutan allele had a deletion of the second A in one of the AGAT repeat units, similar to the human 10.3 allele.

3.5. TPOX

Twenty-one alleles at the TPOX locus were sequenced (Table 5). The common repeat motif is a simple (AATG)n [31]. No length variants were identified in our population database samples, nor have any been reported in the literature to date.

Five of the 21 alleles sequenced had a T → G transversion (from the consensus sequence [31]) in the reverse primer binding region, eight nucleotides upstream from the 3′ end of the primer (data not shown). This mutation in the primer binding region does not have any adverse effect on amplification using the AmpFlSTR® kit TPOX primer set, even at annealing temperatures 2°C higher than the recommended annealing temperature (data not shown).

Orangutan DNA did not yield any PCR product when amplified with the AmpFlSTR® kit TPOX primers, but did amplify when primers flanking the AmpFlSTR® kit primer-binding sites were used to generate a sequencing template. The orangutan sequence revealed an A nucleotide deletion at the penultimate base from the 3′ end of the forward AmpFlSTR® kit primer (data not shown). This mutation did not occur in the other two primate species, or in any of the human DNA samples sequenced.

3.6. TH01

Twenty-one TH01 alleles were sequenced (Table 6). The common repeat motif is a simple (AATG)n [1,32]. The
chimpanzee and gorilla alleles have the same sequence motif as human. The orangutan DNA did not amplify with the AmpFlSTR® primers, but did amplify with primers flanking the AmpFlSTR® kit primers that were used to generate sequencing template. Sequencing of the AmpFlSTR® kit forward primer-binding region in orangutan revealed three (3) nucleotide changes from the human sequence (data not shown).

3.7. FGA percent stutter

Percent stutter values for sequenced FGA alleles of various lengths are presented in Table 7. Generally, the percent stutter increases as the number of uninterrupted CTTT repeats increases; long alleles in which the long CTTT repeat is interrupted by some other four nucleotide sequence have stutter percents that match a shorter allele with an equivalent number of consecutive CTTT repeats.

Percent stutter was determined for 162 population database samples [25]. The stutter percents of two FGA27 alleles (C49 and C100) deviated from the trend towards higher percent stutter with increasing allele length (Fig. 2 in [25]). Sequencing revealed that these two alleles have a CTTT → GTTT variation in the 18th repeat (Table 1). Both of these FGA27 alleles contained 17 perfect CTTT repeats. The stutter percent for these two FGA27 sequence variant alleles was actually lower (4.8 ± 0.2%) than the average stutter percent for FGA25 alleles (7.0 ± 0.5%) that also contain 17 tandem CTTT repeats. The reason for lower-than-expected stutter percent due to this particular interruption of the CTTT repeat is not clear.

The FGA28 alleles of samples B25 and PB37 (14 uninterrupted repeats) both had an average of approximately 5% stutter, a value that is generally associated with an FGA22 allele. There was no FGA28 allele sequenced that contained a non-interrupted core repeat motif from which to calculate percent stutter. The trend, however, as demonstrated in Fig. 2 from [25], is that a 28 allele with an uninterrupted repeat motif is expected to have an average of approximately 7–9% stutter.

It is interesting to note that the largest number of tandem CTTT repeats seen so far is 22, in the FGA30 allele. In all of the longer FGA alleles (30.2–51.2) the CTTT tandem repeat is interrupted at least once by a different repeat unit, and the longest tandem CTTT repeat in the longer alleles sequenced is 18. There may be an upper limit to the number of tandem repeats that can be replicated by a DNA polymerase, and thus expansions of a repeat reach a certain size limit. Across all six STR loci sequenced in this paper, the FGA30 allele has the greatest number of tandem repeats of any tetranucleotide unit.

4. Discussion

Sequencing of STR loci yields an abundance of information about the specific loci being sequenced and about tetranucleotide repeats as a more general class of length

Table 7
Percent stutter values for select alleles at the FGA locus(a)

Sample         Allele  Percent stutter ± S.D.  n  Number of uninterrupted repeats
PB103          17      2.6 ± 0.3               4  9
CEPH1340-033   18      3.1 ± 0.2               5  10
KG             19      4.2 ± 0.3               5  11
B1             26      4.3 ± 0.5               4  12
B1             21      4.4 ± 0.6               5  13
CEPH1340-033   21      5.0 ± 0.4               5  13
KL             22      5.5 ± 0.5               3  14
CFS1           22      5.8 ± 0.9               5  14
H73            22      5.3 ± 0.8               5  14
PB37           28b     5.3 ± 0.4               5  14
B25            28b     4.8 ± 0.5               5  14
PB37           23      6.1 ± 0.7               5  15
H73            29b     6.8 ± 0.6               5  15
RCMP5          24      6.8 ± 0.8               5  16
KG             24      6.7 ± 0.2               5  16
B25            24      6.6 ± 0.5               5  16
H25            25      7.5 ± 0.7               5  17
KL             26b     7.2 ± 0.7               4  18
CFS1           34.2    8.1 ± 1.0               5  18
RCMP5          26.2    9.5 ± 1.4               5  19
H25            30      9.3 ± 0.7               5  22
a Values are mean ± S.D. for the n replicates indicated.
b The perfect (CTTT)n repeat interrupted by CCTT repeat.
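The trend in Table 7 (percent stutter rising with the number of uninterrupted CTTT repeats) can be summarised with a correlation coefficient and a least-squares slope. The sketch below uses the mean values from Table 7 as listed; the function name is illustrative, and a simple unweighted linear fit is an assumption made here, not an analysis performed in the original work.

```python
from math import sqrt

# (number of uninterrupted repeats, mean percent stutter) from Table 7.
TABLE7 = [
    (9, 2.6), (10, 3.1), (11, 4.2), (12, 4.3), (13, 4.4), (13, 5.0),
    (14, 5.5), (14, 5.8), (14, 5.3), (14, 5.3), (14, 4.8), (15, 6.1),
    (15, 6.8), (16, 6.8), (16, 6.7), (16, 6.6), (17, 7.5), (18, 7.2),
    (18, 8.1), (19, 9.5), (22, 9.3),
]


def pearson_and_slope(pairs):
    """Return (Pearson r, least-squares slope) for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    return sxy / sqrt(sxx * syy), sxy / sxx


if __name__ == "__main__":
    r, slope = pearson_and_slope(TABLE7)
    print(f"r = {r:.2f}, slope = {slope:.2f} percent stutter per repeat")
```

Using the number of uninterrupted repeats rather than the allele number as the x variable is what places interrupted alleles such as the FGA28 samples alongside shorter alleles with the same run length.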
K. Lazaruk et al. / Forensic Science International 119 (2001) 1±10 9

polymorphisms. Sequencing of STR loci has helped to validate the robustness of allele sizing and genotyping using fluorescent technology; it has been shown several times that sequence differences in alleles of the same length at a particular locus have no effect on the ability to bin those alleles into size and allele categories [5,11,25].

All of the sample alleles flagged as off-ladder alleles using the AmpFlSTR® kit loci described here were determined to be length variants upon sequencing and, therefore, had been accurately genotyped. The exercise of sequencing off-ladder alleles determines exactly where the insertion or deletion has occurred. Furthermore, it is important to note that this exercise does not add anything in terms of correctly genotyping the alleles at a locus according to length, which is the basis for the forensically relevant allele designation. It is not practical, nor is it recommended that off-ladder alleles be sequenced in forensic casework. All relevant statistical population data has been collected using the allele size data alone.

The discovery of primer-binding site mutations in a number of the STR loci (vWA reverse, vWA forward, TPOX; [29]) has led to the conclusion that it is difficult to find an absolutely conserved primer binding region that will not be prone to mutation in some individuals. In the case of the vWA forward primer-binding site, even the primate sequences are conserved and identical to the human consensus sequence (data not shown). The location of the primer binding site mutation with respect to the 3′ end of the primer can have a significant impact on how the amplification of the locus will be affected. In the case of the forward AmpFlSTR® kit vWA primer, a mutation 1 nucleotide from the 3′ end of the primer causes a total lack of amplification under the conditions used for AmpFlSTR® kits (i.e. annealing temperature, salt concentration in the PCR buffer, primer concentration). Similarly, orangutan DNA did not amplify using the AmpFlSTR® kit TPOX forward primer because of an A nucleotide deletion 1 nucleotide upstream from the 3′ end of the primer. In the case of the AmpFlSTR® kit vWA reverse primer (not including the degenerate primer), a mutation of four nucleotides from the 3′ end of the primer causes a distinct imbalance in the amplification efficiency of the mutant versus consensus sequence. In the case of the AmpFlSTR® kit TPOX reverse primer, a mutation of eight nucleotides from the 3′ end of the primer had no effect on the amplification efficiency of any mutant versus consensus sequence (data not shown).

These examples indicate that primer-binding site mutations can lead to either no effect on amplification, an imbalance in the amplification efficiency of two heterozygous alleles, or non-amplification of an allele. It is important to remember that reproducible peak height imbalance at a locus may be due to any of the following possibilities: (1) the presence of a mixture of two or more DNA samples; (2) preferential amplification, for example, due to extreme inhibition or degradation; (3) primer binding site mutations; or (4) more than two copies of the chromosomal segment containing that locus. The power of the STR typing systems is in the ability to simultaneously obtain genotype information from several independent loci. The peak height balance at a particular locus must be interpreted by the laboratory analyst in conjunction with the qualitative and quantitative peak profiles of the other loci being co-amplified, most obviously in the case of a suspected mixture.

The most certain way to assure that a particular sample will type reproducibly, for instance in the comparison of exemplar versus evidence DNA samples, is to use the same primer set under the same set of amplification conditions for each DNA sample typed. By using the same primer sequences, a primer binding site mutation will have the same effect on the amplification of an evidence sample as an exemplar sample, and therefore the effects of the mutation will be consistent.

The data from the FGA locus gives support to the general observation that the percent stutter for alleles at tetranucleotide repeat loci tends to increase with increasing allele length [11,25] as long as the allele length increase is due to the presence of additional perfect tandem repeats. The sequence and stutter percent data for several different FGA alleles that are the same length, but vary in sequence in the core repeat region, support the theory that slipped strand mispairing is responsible for the production of stutter bands during the PCR. When the core repeat sequence is interrupted by a different repeat unit sequence there will be less of a chance for DNA polymerization to continue for two possible reasons: (1) instability of the mismatched hybrids, and/or (2) inability of the polymerase to extend a 3′ mismatch. Either of these possibilities can lead to lack of amplification when and if slipped strand mispairing occurs. Therefore, a corresponding decrease in the percent stutter for that allele would be observed.

Acknowledgements

The authors wish to thank Roche Molecular Systems (Alameda, CA); Laboratory Corporation of America; Forensic Science Service (Birmingham, UK); Royal Canadian Mounted Police (Ottawa, Canada); Colorado Bureau of Investigation; Florida Department of Law Enforcement; California Department of Justice DNA Laboratory; SERI (Richmond, CA) for the contribution of DNA samples for sequencing. We also wish to thank Rhonda Roby for helpful comments and Mary Anderson for administrative assistance during the preparation of the manuscript.

References

[1] A. Edwards, A. Civitello, DNA typing and genetic mapping with trimeric and tetrameric tandem repeats, Am. J. Hum. Genet. 49 (1991) 746–756.
[2] H.A. Erlich, D. Gelfand, J.J. Sninsky, Recent advances in the polymerase chain reaction, Science 252 (1991) 1643–1651.

[3] C.J. Fregeau, R.M. Fourney, DNA typing with fluorescently tagged short tandem repeats: a sensitive and accurate approach to human identification, Biotechniques 15 (1993) 100–119.
[4] N.J. Oldroyd, A.J. Urquhart, C.P. Kimpton, E.S. Millican, S.K. Watson, T. Downes, P. Gill, A highly discriminating octoplex short tandem repeat polymerase chain reaction system suitable for human individual identification, Electrophoresis 16 (1995) 334–337.
[5] K. Lazaruk, P.S. Walsh, F. Oaks, D. Gilbert, B.B. Rosenblum, S. Menchen, D. Scheibler, H.M. Wenz, C. Holt, J. Wallin, Genotyping of forensic short tandem repeat (STR) systems based on sizing precision in a capillary electrophoresis instrument, Electrophoresis 19 (1998) 86–93.
[6] J.S. Ziegle, Y. Su, K.P. Corcoran, L. Nie, P.E. Mayrand, L.B. Hoff, L.J. McBride, M.N. Kronick, S.R. Diehl, Application of automated DNA sizing technology for genotyping microsatellite loci, Genomics 14 (1992) 1026–1031.
[7] D.G. Wang, Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome, Science 280 (1998) 1077–1082.
[8] P.-Y. Kwok, C. Carlson, T. Yager, W. Ankener, D. Nickerson, Comparative analysis of human DNA variations by fluorescence-based sequencing of PCR products, Genomics 23 (1994) 138–149.
[9] D.N. Cooper, B.A. Smith, H.J. Cooke, S. Niemann, J. Schmidtke, An estimate of unique DNA sequence heterozygosity in the human genome, Hum. Genet. 69 (1985) 201–205.
[10] P. Gill, A. Urquhart, E. Millican, N. Oldroyd, S. Watson, R. Sparkes, C.P. Kimpton, A new method of STR interpretation using inferential logic – development of a criminal intelligence database, Int. J. Legal Med. 109 (1996) 14–22.
[11] P.S. Walsh, N.J. Fildes, R. Reynolds, Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus vWA, Nucleic Acids Res. 24 (1996) 2807–2812.
[12] P.S. Walsh, D.A. Metzger, R. Higuchi, Chelex® 100 as a medium for simple extraction of DNA for PCR-based typing from forensic material, Biotechniques 10 (1991) 506–513.
[13] M.D. Barber, B.J. McKeown, B.H. Parkin, Structural variation in the alleles of a short tandem repeat system at the human alpha fibrinogen locus, Int. J. Legal Med. 108 (1996) 180–185.
[14] C. Kimpton, A. Walton, P. Gill, A further tetranucleotide repeat polymorphism in the vWF gene, Hum. Mol. Genet. 1 (1992) 287.
[15] H. Li, L. Schmidt, M.-H. Wei, T. Hudstad, M.I. Leman, B. Zbar, K. Tory, Three tetranucleotide polymorphisms for loci: D3S1352; D3S1358; D3S1359, Hum. Mol. Genet. 2 (1993) 1327.
[16] DNA recommendations, 1994 report concerning further recommendations of the DNA Commission of the ISFH regarding PCR-based polymorphisms in STR (short tandem repeat) systems, Int. J. Legal Med. 107 (1994) 159–160.
[17] W. Bar, B. Brinkmann, B. Budowle, A. Carracedo, P. Gill, P. Lincoln, W. Mayr, B. Olaisen, DNA recommendations: further report of the DNA commission of the ISFH regarding the use of short tandem repeat systems, Int. J. Legal Med. 110 (1997) 175–176.
[18] P. Gill, E. d'Aloja, J. Andersen, B. Dupuy, M. Jangblad, V. Johnsson, A.D. Kloosterman, A. Kratzer, M.V. Lareu, M. Meldegaard, C. Phillips, H. Pfitzinger, S. Rand, M. Sabatier, R. Scheithauer, H. Schmitter, P. Schneider, M.C. Vide, Report of the European DNA profiling group (EDNAP): an investigation of the complex STR loci D21S11 and HUMFIBRA (FGA), Forensic Sci. Int. 86 (1997) 25–33.
[19] P. Gill, B. Brinkmann, E. d'Aloja, J. Andersen, W. Bar, A. Carracedo, B. Dupuy, B. Eriksen, M. Jangblad, V. Johnsson, A.D. Kloosterman, P. Lincoln, N. Morling, S. Rand, M. Sabatier, R. Scheithauer, P. Schneider, M.C. Vide, Considerations from the European DNA profiling group (EDNAP) concerning STR nomenclature, Forensic Sci. Int. 87 (1997) 185–192.
[20] A. Moller, E. Meyer, B. Brinkmann, Different types of structural variation in STRs: HumFES/FPS, HumVWA and HumD21S11, Int. J. Legal Med. 106 (1994) 319–323.
[21] M.D. Barber, B.J. McKeown, B.H. Parkin, Structural variation of novel alleles at the HumvWA and HumFES/FPS short tandem repeat loci, Int. J. Legal Med. 108 (1995) 31–35.
[22] A. Urquhart, C.P. Kimpton, T.J. Downes, P. Gill, Variation in short tandem repeat sequences – a survey of twelve microsatellite loci for use as forensic identification markers, Int. J. Legal Med. 107 (1994) 13–20.
[23] E. Meyer, P. Wiegand, S.P. Rand, D. Kuhlmann, M. Brack, B. Brinkmann, Microsatellite polymorphisms reveal phylogenetic relationships in primates, J. Mol. Evol. 41 (1995) 10–14.
[24] P. Wiegand, E. Meyer, B. Brinkmann, Microsatellite structures in the context of human evolution, Electrophoresis 21 (2000) 889–895.
[25] J. Wallin, M.R. Buoncristiani, K. Lazaruk, N. Fildes, C.L. Holt, P.S. Walsh, TWGDAM validation of the AmpFlSTR Blue PCR amplification kit for forensic casework analysis, J. Forensic Sci. 43 (1998) 854–870.
[26] D.J. Mancuso, E.A. Tuley, L.A. Westfield, N.K. Worrall, B.B. Shelton-Inloes, J.M. Sorace, Y.G. Alevy, J.E. Sadler, Structure of the Gene for Human von Willebrand Factor, J. Biol. Chem. 264 (1989) 19514–19527.
[27] A. Urquhart, M.J. Oldroyd, C.P. Kimpton, P. Gill, Highly discriminating heptaplex short tandem repeat PCR system for forensic identification, Biotechniques 18 (1995) 116–121.
[28] M.C. Kline, B. Jenkins, S. Rodgers, Non-amplification of a vWA allele, J. Forensic Sci. 43 (1998) 250.
[29] P.S. Walsh, Commentary on Kline MC, Jenkins B, Rogers S, Non-amplification of a vWA allele, J. Forensic Sci. 43 (1998) 1103–1104.
[30] H.A. Hammond, L. Jin, Y. Zhong, C.T. Caskey, R. Chakraborty, Evaluation of thirteen STR loci for use in personal identification applications, Am. J. Hum. Genet. 55 (1994) 175–189.
[31] R. Anker, T. Steinbrueck, H. Donis-Keller, Tetranucleotide repeat polymorphism at the human thyroid peroxidase (hTPO) locus, Hum. Mol. Genet. 1 (1992) 137.
[32] C. Puers, H.A. Hammond, L. Jin, C.T. Caskey, J.W. Schumm, Identification of repeat sequence heterogeneity at the polymorphic short tandem repeat locus HUMTH01 [AATG]n and reassignment of alleles in population analysis by using a locus-specific allelic ladder, Am. J. Hum. Genet. 53 (1993) 1–5.
