Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at http://www.jstor.org/page/
info/about/policies/terms.jsp
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content
in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship.
For more information about JSTOR, please contact support@jstor.org.
American Society of Plant Biologists (ASPB) is collaborating with JSTOR to digitize, preserve and extend access to Plant
Physiology.
http://www.jstor.org
This content downloaded from 103.254.86.9 on Sun, 11 Oct 2015 08:36:15 UTC
All use subject to JSTOR Terms and Conditions
Analyses of Expressed Sequence Tags from Apple

Richard D. Newcomb*, Ross N. Crowhurst, Andrew P. Gleave, Erik H.A. Rikkerink, Andrew C. Allan, Lesley L. Beuning, Judith H. Bowen, Emma Gera, Kim R. Jamieson, Bart J. Janssen, William A. Laing, Steve McArtney, Bhawana Nain, Gavin S. Ross, Kimberley C. Snowden, Edwige J.F. Souleyre, Eric F. Walton, and Yar-Khing Yauk

Horticultural and Food Research Institute of New Zealand Limited, Mt. Albert Research Centre, Auckland, New Zealand
The domestic apple (Malus domestica; also known as Malus pumila Mill.) has become a model fruit crop in which to study commercial traits such as disease and pest resistance, grafting, and flavor and health compound biosynthesis. To speed the discovery of genes involved in these traits, develop markers to map genes, and breed new cultivars, we have produced a substantial expressed sequence tag collection from various tissues of apple, focusing on fruit tissues of the cultivar Royal Gala. Over 150,000 expressed sequence tags have been collected from 43 different cDNA libraries representing 34 different tissues and treatments. Clustering of these sequences results in a set of 42,938 nonredundant sequences comprising 17,460 tentative contigs and 25,478 singletons, together representing what we predict are approximately one-half the expressed genes from apple. Many potential molecular markers are abundant in the apple transcripts. Dinucleotide repeats are found in 4,018 nonredundant sequences, mainly in the 5'-untranslated region of the gene, with a bias toward one repeat type (containing AG, 88%) and against another (repeats containing CG, 0.1%). Trinucleotide repeats are most common in the predicted coding regions and do not show a similar degree of sequence bias in their representation. Bi-allelic single-nucleotide polymorphisms are highly abundant with one found, on average, every 706 bp of transcribed DNA. Predictions of the numbers of representatives from protein families indicate the presence of many genes involved in disease resistance and the biosynthesis of flavor and health-associated compounds. Comparisons of some of these gene families with Arabidopsis (Arabidopsis thaliana) suggest instances where there have been duplications in the lineages leading to apple of biosynthetic and regulatory genes that are expressed in fruit. This resource paves the way for a concerted functional genomics effort in this important temperate crop.

Apples are recognized by consumers for their flavor, health, and nutritional attributes (Harker et al., 2003). Because of this, they have become the major temperate fruit crop and a significant horticultural component of fresh fruit traded internationally (Zohary and Hopf, 2000). The domestic apple (Malus domestica; also known as Malus pumila Mill.) belongs to the family Rosaceae. Together with other commercial fruit and ornamental species, it forms the subfamily Maloideae (Challice, 1974), which is thought to have evolved by hybridization from the families Spiraeoideae (x = 9) and Prunoideae (x = 8; Lespinasse et al., 2000). The resulting allopolyploid has a basic haploid number of x = 17 and an estimated genome size of 743 to 796 Mb (Arumuganathan and Earle, 1991).

This work was supported by the Foundation for Research, Science, and Technology (grant no. C06X0207), and the Horticultural and Food Research Institute of New Zealand Limited.
* Corresponding author; e-mail rnewcomb@hortresearch.co.nz; fax 64-9-8154200.
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Richard D. Newcomb (rnewcomb@hortresearch.co.nz).
Article, publication date, and citation information can be found at www.plantphysiol.org/cgi/doi/10.1104/pp.105.076208.
Plant Physiology, May 2006, Vol. 141, pp. 147-166, www.plantphysiol.org
Apple has become a model for understanding important traits in tree fruiting crops. The ability to graft scions to speed propagation and mass produce genetically uniform fruit from an outbreeding plant has contributed to the success of apple and many other horticultural crops. Also, other important traits, including dwarfing and some insect resistance traits, can be conferred by rootstocks (Ferree and Carlson, 1987).

Compounds in the skin and flesh of the fruit confer flavor, taste, and health benefits that are important consumer traits in apple. Presumably, these compounds evolved as attractants and bribes for their seed dispersers. Flavor compounds increase substantially during fruit ripening, which takes place toward the end of 20 to 21 weeks of fruit development. This increase in flavor is caused by an autocatalytic burst of ethylene production late in fruit development, characteristic of all climacteric fruit (Fellman et al., 2000). Also triggered by ethylene are a marked increase in cell wall and starch breakdown and a general progression through ripening, senescence, and breakdown (Giovannoni, 2001).
The genes involved in many of the aforementioned traits are yet to be identified in apple. However, with the advent of high-throughput sequencing, isolation of genes potentially involved in such traits is now more readily attainable. One approach is the single pass sequencing of cloned cDNAs representing RNA transcripts (mRNAs). These are otherwise known as expressed sequence tags (ESTs) and have become an established method for rapidly developing gene databases (Adams et al., 1993). By sequencing numbers of clones from cDNA libraries derived from RNA from different source tissues, the total set of genes sampled from the genome can be maximized. Bioinformatic sorting and clustering of the resulting sequences yield databases of putative genes that form the basis of a functional genomics program. Gene mining of these databases, aided by techniques such as microarrays, can be used to select candidate genes that are implicated in particular crop traits. In addition, ESTs have been identified as useful sources of both simple sequence repeats (SSRs) and single-nucleotide polymorphisms (SNPs), both useful markers for creating genetic maps in plants (Morgante et al., 2002; Rafalski, 2002).

© 2006 American Society of Plant Biologists
ESTs have been collected for many plant species. The most comprehensively surveyed are Arabidopsis (Arabidopsis thaliana; 418,563 in GenBank) and rice (Oryza sativa; 406,624 in GenBank), both of which have also had their entire genome sequenced (Arabidopsis Genome Initiative, 2000; International Rice Genome Sequencing Project, 2005). Whereas fruit crops have been less extensively surveyed using an EST approach, there have recently been a number of reports on fruit EST projects. There is an extensive EST collection available from tomato (Lycopersicon esculentum; Van der Hoeven et al., 2002), and genes likely to be involved in the ripening process have been identified by virtual northern analysis (Fei et al., 2004). ESTs of strawberry (Fragaria ananassa) fruit have been analyzed by microarray technology (Aharoni et al., 2000) also during ripening (Aharoni and O'Connell, 2002). In an EST collection from the fruit of pineapple (Ananas comosus), Moyle et al. (2005) found a very high abundance of metallothionein gene transcripts, whereas reports from grape (Vitis vinifera) identify many new SSRs useful for grape mapping (Moser et al., 2005) and many candidate genes involved in fruit development traits (Goes da Silva et al., 2005). The only other Rosaceae species to have a significant number of ESTs described is apricot (Prunus armeniaca; Grimplet et al., 2005).
Here we describe the first EST sequencing project in apple. We report the collection and analysis of 151,687 high-quality apple ESTs, largely from the commercial apple cultivar Royal Gala. From this sequence information, we put sequences into functional categories in preparation for functional genomics programs and describe SSRs and SNPs in the sequence data that will be useful in marker-assisted breeding programs. In addition, we show that there are many ESTs that potentially encode enzymes of important flavor and health compound biosynthetic pathways, and explore whether there has been an expansion of the number of genes from gene families involved in secondary metabolite biosynthesis and regulation that are expressed in fruit tissues.
RESULTS

EST Sequencing and Clustering

cDNA libraries were constructed from a range of different tissues and developmental time points using material from the apple cultivars Royal Gala, Pinkie, Pacific Rose, and the dwarfing rootstock M9. Libraries were also constructed from some tissues, plants, and cell lines that were subjected to biotic and abiotic stresses. The libraries were sequenced to varying depths (Table I), depending on library quality and novelty. Over the 43 cDNA libraries sequenced, 151,687 good quality sequences were recovered. The average edited length of the sequences was 468 bases.

Clustering of the sequences using a 95% threshold yielded 17,460 tentative consensus (TC) sequences with 25,478 sequences remaining unclustered (singletons). TC sequences range in length from 66 to 6,145 bases with an average of 745 bases, whereas singletons range in size from 47 to 790 bases with an average of 394 bases. The GC ratio of singletons ranged from 13% to 78%, with an average of 44%, whereas that for TC sequences ranged from 14% to 69%, also with an average of 44%. Together, the TC sequences and singletons yielded an apple EST dataset of 42,938 sequences. Hereafter, the singletons and TC sequences are collectively referred to as the nonredundant (NR) gene set. Clustering using a threshold of 90% generated fewer TC sequences (16,756) and singletons (17,858). However, using this lower threshold increased the number of instances of paralogs being incorporated into the same cluster and was therefore not used subsequently.
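The length and GC statistics reported for the TC sequences and singletons can be illustrated with a minimal sketch. The function names `gc_fraction` and `summarize` are hypothetical and are not taken from the pipeline used in this study; the sketch only assumes that sequences are available as plain strings.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G or C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def summarize(seqs):
    """Minimum, maximum, and mean length plus mean GC fraction for a list of sequences."""
    lengths = [len(s) for s in seqs]
    return {
        "n": len(seqs),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_len": sum(lengths) / len(lengths),
        "mean_gc": sum(gc_fraction(s) for s in seqs) / len(seqs),
    }


# Counts reported in the text for clustering at the 95% threshold.
tentative_contigs = 17_460
singletons = 25_478
nr_total = tentative_contigs + singletons  # size of the NR set
```

Applied separately to the TC sequences and the singletons, a summary of this kind would give the ranges and averages quoted above; `nr_total` simply restates that the NR set is the sum of the two classes.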
Codon usage was assessed on a set of 545 apple cDNA sequences predicted to contain full-length coding regions. These cDNAs were checked by manual inspection of BLASTx versus NRDB90 reports to make sure they were devoid of introns and frameshift errors. From these data, the open reading frames were defined and a codon usage table created from the 203,267 codons (Table II). All codons are found in the full-length cDNA dataset, with the least frequent codon represented over 100 times. The GC content at the third position of the codon is 52%.
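The quantities in a codon usage table of this kind (count, fraction among synonymous codons, frequency per 1,000 codons, and GC content at the third position) can be sketched as follows. This is a minimal illustration assuming in-frame open reading frames supplied as strings and the standard genetic code; `codon_usage` and `gc3` are hypothetical names, not the software used in this study.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codon_usage(orfs):
    """Count codons over in-frame ORFs and derive per-codon usage statistics."""
    counts = Counter()
    for orf in orfs:
        orf = orf.upper()
        # Walk the ORF in steps of three, ignoring any trailing partial codon.
        for i in range(0, len(orf) - len(orf) % 3, 3):
            codon = orf[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    total = sum(counts.values())
    per_aa = Counter()
    for codon, n in counts.items():
        per_aa[CODON_TABLE[codon]] += n
    return {
        codon: {
            "aa": CODON_TABLE[codon],
            "count": n,
            "fraction": n / per_aa[CODON_TABLE[codon]],  # share among synonymous codons
            "per_1000": 1000 * n / total,                # frequency per 1,000 codons
        }
        for codon, n in counts.items()
    }


def gc3(usage):
    """Fraction of codons with G or C at the third position."""
    total = sum(row["count"] for row in usage.values())
    gc = sum(row["count"] for codon, row in usage.items() if codon[2] in "GC")
    return gc / total
```

For example, `codon_usage(["ATGGCTGCATAA"])` counts four codons, so each has a frequency of 250 per 1,000, and GCT and GCA each account for one-half of the Ala codons.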
SSRs and SNPs

cDNA sequences are a useful source of microsatellites or SSRs in plants. SSRs are particularly common in the 5'-untranslated region (UTR) and, to a lesser extent, in the 3'-UTR of transcribed plant sequences (Morgante et al., 2002). We analyzed the nature of the perfect SSRs in the apple sequence dataset. Approximately 17% of the apple sequences contained one or more di-, tri-, or tetranucleotide SSRs. The relative frequency of di- and trinucleotide repeats is similar (4,018 versus 4,010, respectively; Table III). Just over one-half (57%) of the repeats were between 12 and 14 bases in length and only 17% of the di-, tri-, and tetranucleotide repeats were longer than 20 nucleotides in length.
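A search for perfect di-, tri-, and tetranucleotide SSRs can be sketched with regular expressions. The minimum repeat length of 12 bases used here is an assumption for illustration (the detection thresholds are not restated in this section), and `find_perfect_ssrs` and `repeat_class` are hypothetical names rather than the tools used in this study. Repeat classes group a motif with its rotations and reverse complements, in the way AG/GA/CT/TC are grouped in Table III.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _is_primitive(unit: str) -> bool:
    """True if the motif is not itself a repeat of a shorter motif (e.g. not AA or AGAG)."""
    k = len(unit)
    return not any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_perfect_ssrs(seq: str, min_len: int = 12):
    """Return sorted (start, motif, copies) tuples for perfect repeats of 2 to 4 bp motifs."""
    seq = seq.upper()
    hits = []
    for k in (2, 3, 4):
        min_copies = -(-min_len // k)  # ceiling division
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            if _is_primitive(unit):
                hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


def repeat_class(unit: str) -> str:
    """Canonical name for a motif: the alphabetically first rotation on either strand."""
    unit = unit.upper()
    revcomp = unit.translate(_COMPLEMENT)[::-1]
    rotations = [s[i:] + s[:i] for s in (unit, revcomp) for i in range(len(s))]
    return min(rotations)
```

With these definitions, a run of seven AG copies is reported once as a dinucleotide repeat and not again as a tetranucleotide AGAG repeat, and `repeat_class("TC")` returns the same class as `repeat_class("GA")`.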
Table I. Summary of apple ESTs
Columns: Library Code; Cultivar; Library Description; Minimum, Average, and Maximum Sequence Length; Total No. ESTs; No. ESTs Assigned to TC(a); No. Singletons; Percentage of Singletons per Library.
[Per-library rows are not recoverable from this extraction.]
Total for all libraries combined: 151,687 ESTs; average sequence length, 468 bases; 25,478 singletons; 17,460 TC sequences(d).
(a) Number of ESTs from each library that have been assigned to a TC sequence (contig).
(b) DAFB, Days after full bloom.
(c) These libraries also will contain fungal sequences derived from the pathogen V. inaequalis (K. Plummer, W. Cui, and M. Templeton, unpublished data).
(d) Total number of unique TC sequences in the entire database (i.e. this figure is not additive).
Table II. Codon usage calculated using 545 full-length apple cDNA sequences

Codon  Amino Acid  Fraction  Per 1,000  No.
GCA    A           0.27      18.92      3,846
GCC    A           0.26      18.42      3,745
GCG    A           0.14       9.82      1,997
GCT    A           0.33      23.36      4,748
TGC    C           0.59      10.38      2,110
TGT    C           0.41       7.20      1,464
GAC    D           0.44      23.91      4,860
GAT    D           0.56      29.98      6,095
GAA    E           0.45      28.70      5,834
GAG    E           0.55      34.67      7,047
TTC    F           0.53      22.14      4,500
TTT    F           0.47      19.95      4,056
GGA    G           0.29      19.85      4,034
GGC    G           0.23      16.09      3,270
GGG    G           0.22      15.40      3,130
GGT    G           0.25      17.43      3,542
CAC    H           0.51      13.34      2,712
CAT    H           0.49      12.67      2,575
ATA    I           0.21       9.98      2,028
ATC    I           0.37      17.49      3,556
ATT    I           0.42      19.74      4,012
AAA    K           0.39      23.77      4,832
AAG    K           0.60      36.10      7,337
CTA    L           0.09       7.77      1,579
CTC    L           0.21      18.83      3,828
CTG    L           0.17      15.33      3,116
CTT    L           0.21      18.76      3,813
TTA    L           0.09       8.09      1,644
TTG    L           0.23      21.11      4,291
ATG    M           1         24.87      5,056
AAC    N           0.51      23.02      4,679
AAT    N           0.49      21.98      4,467
CCA    P           0.30      17.37      3,530
CCC    P           0.20      11.60      2,358
CCG    P           0.21      11.84      2,406
CCT    P           0.29      16.37      3,328
CAA    Q           0.48      19.08      3,879
CAG    Q           0.52      20.52      4,172
AGA    R           0.26      13.02      2,646
AGG    R           0.27      13.60      2,764
CGA    R           0.11       5.28      1,073
CGC    R           0.12       6.17      1,254
CGG    R           0.12       6.17      1,255
CGT    R           0.10       5.14      1,044
AGC    S           0.16      14.65      2,977
AGT    S           0.14      12.18      2,476
TCA    S           0.19      16.63      3,381
TCC    S           0.19      16.99      3,453
TCG    S           0.13      11.21      2,279
TCT    S           0.19      17.28      3,512
ACA    T           0.26      12.97      2,636
ACC    T           0.30      15.15      3,079
ACG    T           0.14       7.23      1,469
ACT    T           0.29      14.68      2,985
GTA    V           0.11       7.27      1,478
GTC    V           0.23      14.47      2,942
GTG    V           0.33      20.81      4,231
GTT    V           0.33      20.57      4,181
TGG    W           1         14.01      2,847
TAC    Y           0.57      14.83      3,015
TAT    Y           0.43      11.16      2,269
TAA    *           0.34       0.91        185
TAG    *           0.25       0.66        135
TGA    *           0.41       1.11        225
Per 1,000: codon frequency normalized per 1,000 codons. *, stop codon.

[...] for Arabidopsis, followed by AC repeats at 4% in apple compared with 8% in Arabidopsis. CG repeats are very infrequent in plants at 0.05% in apple and 0.14% in Arabidopsis.

Next we investigated the position of the SSRs in relation to putative initiation (Met) and stop codons within the apple sequence dataset. First, we identified sequences containing a dinucleotide repeat with more than 100 bp of flanking DNA and ranked the sequences in order of significance of match to public sequences using BLASTx (Altschul et al., 1990). This ensured that we had identified the correct open reading frame and, therefore, the start and stop codons accurately (Fig. 1A). Of the top 100 in this ranking, we found that 83% contained dinucleotide repeats in the putative 5'-UTR, 2% in the putative coding region, and 15% in the putative 3'-UTR. These figures are similar for the Arabidopsis genome (5'-UTR 83%, coding region 0.4%, and 3'-UTR 16% as deduced from relative frequencies per Megabase pair given by Zhang et al. [2004]). These data suggested that repeats in the 5'-UTR are disproportionately high within 100 bp of the translation start site. We then analyzed all dinucleotide (Fig. 1B) and trinucleotide (Fig. 1C) repeats longer than six repeats present in the entire apple database using a BLASTx E-value significance cutoff criterion of e-20 to identify all sequences with a reasonable protein match in GenBank on which to base putative start sites. At least for the dinucleotide repeats, the pattern seen on this global data is consistent with that manually collected for the top 100 ranked genes used above. Both datasets show a consistent pattern, with SSRs clustered in the 5'-UTR closest to the start codon.

In addition to SSRs, EST sequences are also a useful source of SNPs, which can also be used in mapping and marker-assisted breeding. The major cultivar sequenced in this study was Royal Gala (78.9%). However, some sequences were also from other apple cultivars, including M9 (9.7%), Pinkie (3.8%), Braeburn (3.7%), Pacific Rose (1.9%), Aotea (1.1%), and Northern Spy (0.8%). Apple is also an outbreeder, which will increase levels of heterozygosity within cultivars. Together, these factors should increase the instances of SNPs in the apple EST data. This seems to be the case with evidence for 18,408 bi-allelic SNPs confirmed by more than one sequence per allele from the 13.0 Mb of aligned NR sequences analyzed. Bi-allelic SNPs are therefore found, on average, every 706 bp of transcribed DNA. Transitions were more common than transversions. There were 4,592 AG and 5,112 CT transitions compared with 2,032 AC, 2,372 AT, 2,228 CG, and 2,072 GT transversions (Table IV). Furthermore, one or more restriction endonuclease cleavage site polymorphisms were revealed in approximately 82% of NR sequences with predicted SNPs.
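The SNP figures quoted in the text can be cross-checked with a short calculation. The sketch below classifies each bi-allelic substitution as a transition (purine to purine or pyrimidine to pyrimidine) or a transversion and derives the average SNP spacing from the counts and aligned length given above; the helper name `snp_class` is hypothetical.

```python
TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}


def snp_class(a: str, b: str) -> str:
    """Classify a bi-allelic SNP as a transition or a transversion."""
    pair = frozenset((a.upper(), b.upper()))
    if len(pair) != 2 or not pair <= set("ACGT"):
        raise ValueError("expected two different bases from A, C, G, T")
    return "transition" if pair in TRANSITION_PAIRS else "transversion"


# SNP counts by allele pair, as quoted in the text.
snp_counts = {"AG": 4592, "CT": 5112, "AC": 2032, "AT": 2372, "CG": 2228, "GT": 2072}

transitions = sum(n for pair, n in snp_counts.items() if snp_class(*pair) == "transition")
transversions = sum(n for pair, n in snp_counts.items() if snp_class(*pair) == "transversion")
total_snps = transitions + transversions

aligned_bp = 13.0e6                    # 13.0 Mb of aligned NR sequence
bp_per_snp = aligned_bp / total_snps   # average spacing between SNPs
ts_tv_ratio = transitions / transversions
```

The per-pair counts sum to 9,704 transitions and 8,704 transversions, which together give the 18,408 SNPs and the average of one SNP every 706 bp reported above.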
Table III. Summary of microsatellites in apple

Dinucleotide
Repeat Composition  No. NR Sequences with Repeat  Percentage of Apple Di Repeats  Apple Rank  Percentage of Arabidopsis Di Repeats  Arabidopsis Rank
AC/CA/GT/TG           162                          4.0                           3           8.0                                  3
AG/GA/CT/TC         3,548                         88.3                           1          83.0                                  1
AT/TA                 306                          7.62                          2           8.8                                  2
CG/GC                   2                          0.1                           4           0.14                                 4
Totals              4,018                        100                                       100

Trinucleotide
Columns: Repeat Composition; No. NR Sequences with Repeat; Percentage of Apple Tri Repeats; Apple Rank; Arabidopsis Rank.
[Per-repeat rows are not recoverable from this extraction.]
Totals              4,010                        100%

Functional Categorization

[...] identified from apple. We used automated predictions based on comparisons to the Inter-Pro database to analyze in greater detail transcription factors and identified the most common transcription factor families in the apple sequences and compared the rankings of these with Arabidopsis (Table VII). The MYB transcription factor family is the most common within the apple NR sequences.

Genes Encoding Important Traits in Apple Fruit
This collection of ESTs contains signatures of many genes involved in important traits in apple. Whereas much of primary metabolism and basic plant physiological processes are not peculiar to apple, some elements of the biology of apple are unique to the species, or at least members of the Rosaceae or other climacteric fruit.
Fruit Ripening

Apples are a climacteric fruit, displaying a rapid increase in respiration at the onset of ripening simultaneous with an increase in the production of the hormone ethylene (Knee, 1993). This process alters the biochemistry and physiology of the fruit to produce the attributes we associate with fruit that are ready to eat, including color, texture, flavor, and nutritional content (Fellman et al., 2000). Many of these processes are under the control of ethylene, the synthesis of which is autocatalytic (McKeon and Yang, 1987; Fig. 2). In the first biosynthetic step, Met is converted to S-adenosyl-L-Met (SAM) by S-adenosyl-L-Met synthetase (EC 2.5.1.6), represented by 28 NR sequences in the apple sequence dataset. Next, SAM is converted to 1-aminocyclopropane-1-carboxylic acid (ACC) by ACC synthase (EC 4.4.1.14; 10 NR sequences) in what is the rate-limiting step in the pathway. Finally, ethylene is synthesized by ACC oxidase (EC 1.14.17.4; 13 NR sequences). In apple, an ACC synthase and an ACC oxidase gene have each been silenced in transgenic lines, revealing that many of the flavor and texture traits are under ethylene control (Dandekar et al., 2004).
Figure 1. A, Position of dinucleotide repeats relative to the putative initiating Met and stop codon. NR sequences were ranked in order of most significant to least significant BLASTx match to public domain databases to decrease the influence of incorrect start and stop identifications; the top-ranked 100 NR sequences with a BLASTx match more significant than e-38 were manually inspected for repeat position. Also shown are the lengths of the UTRs (% of total for the same dataset) that fit into the same bin set. The stop and start positions are shown, and the numbering in between these two indicates the distance from the putative initiating Met (5'-SSR) or stop codon (3'-SSR) in 50-base bin sets. Repeats more than 500 bases from the putative initiating Met (and UTRs longer than 500) have been combined into one bin set at the beginning and end of the range, respectively. B, Position of dinucleotide repeats predicted from an automated analysis including all NR sequences with a BLASTx match more significant than e-20. Stop sites have not been predicted with this automated analysis. C, Position of trinucleotide repeats in relation to the putative initiating Met in 15-base bin sets from an automated analysis, as in B.

Proteins involved in the perception of ethylene have been isolated almost exclusively through the positional cloning of genetic mutants in tomato and Arabidopsis (Giovannoni, 2001; Adams-Phillips et al., 2004). Using these sequences, many of the apple representatives can be found in the apple sequence dataset (Fig. 2). Ethylene receptors are members of the His kinase class of receptor of which there are 17 NR sequences in the apple sequence dataset. These receptors transduce their signal through a mitogen-activated protein (MAP) kinase cascade. MAP kinase kinase kinases are negative regulators of the receptors with mutants for these genes showing constitutive activation of the pathway. Two representative families have relatives in the apple sequence dataset (CTR1, 27 NR sequences; CTR2, eight NR sequences). Alternatively, MPK6, an ethylene-inducible MAP kinase, is represented by six NR sequences. Downstream of these are a membrane-bound receptor, ethylene insensitive-2 (EIN2; eight NR sequences), and then two sets of transcription factors, including the ethylene-insensitive-like (EIL) family (18 NR sequences) and the ethylene response factors (ERF1, 21 NR sequences; ERF2, six NR sequences; ERF3, 15 NR sequences; ERF4, 10 NR sequences). The targets of the ERFs are likely to include genes involved in flavor biosynthesis and texture modification in ripening fruit.

Flavor Biosynthesis

Sugars are important contributors and modulators of flavor in fruit. Whereas in most fruit species Suc is the major transported photosynthate, in members of the Rosaceae, including apple, sorbitol accounts for more than 50% of the fixed carbon and the carbon exported from the leaves (Bieleski, 1982). The enzymatic steps required for the synthesis of sorbitol in source tissues and its metabolism in sink tissues are known. Genes encoding these enzymes are well represented in this apple NR set (Fig. 3). In source leaves, sorbitol is derived from the same hexose phosphate pool as Suc. The enzyme aldose 6-P reductase (EC 1.1.1.200; 11 NR sequences) is the rate-limiting step for the conversion to sorbitol from the hexose phosphate pool (Negm and Loescher, 1981), synthesizing sorbitol 6-P from Glc 6-P. Antisense suppression of the aldose [...]
(EC 4.1.1.33; four NR sequences), and isopentenyl-diphosphate Δ-isomerase (EC 5.3.3.2; four NR sequences). The progenitors of the terpenoids, geranyl diphosphate, farnesyl diphosphate, and geranylgeranyl diphosphate, are synthesized by polyisoprene synthases (EC 2.5.1.x; 12 NR sequences). The sesquiterpenes (E,E)- and (Z,E)-α-farnesene are produced from farnesyl diphosphate by the enzyme α-farnesene synthase. The gene encoding α-farnesene synthase has been isolated and shown to be up-regulated in fruit during ripening (Pechous and Whitaker, 2004). The α-farnesene synthase gene is represented by three NR sequences in the apple dataset. Other sesquiterpenes (e.g. β-caryophyllene, β-farnesene, germacrene D) and monoterpenes (e.g. ocimene, linalool) are produced by apple (Bengtsson et al., 2001); however, the terpene synthases responsible for their biosynthesis are yet to be identified.

The major group of compounds produced from ripe fruit of apple cultivars, such as Royal Gala, is esters (Young et al., 1996, 2004), including straight-chain esters derived from fatty acids (Rowan et al., 1999) and branched-chain esters derived from branched-chain amino acids (Rowan et al., 1996). Of the straight-chain esters, C-6 constituents are thought to be derived from linoleic acid via the lipoxygenase pathway (Fig. 5). The first committed step is performed by members of the lipoxygenase family (EC 1.13.11.12), which produces 13-hydroperoxide linoleic acid from linoleic acid. A large number of candidate lipoxygenases are represented in the apple sequence dataset (41 NR sequences); however, not all of these will necessarily be involved in ester biosynthesis in fruit. In tomato, at least five lipoxygenase genes have been identified, but only one of these has been implicated directly in the production of flavor compounds (Chen et al., 2004). From 13-hydroperoxide linoleic acid, the cytochrome P450, hydroperoxide lyase, is responsible for the conversion to the aldehyde, hex-3-enal (four NR sequences), which can be converted to hex-2-enal by another cytochrome P450, hydroperoxide lyase (EC 4.2.1.92; three NR sequences). Alcohol dehydrogenases can reduce the aldehydes to alcohols (EC 1.1.1.1; 37 NR sequences). To date, one alcohol dehydrogenase has been identified from apple and shown not to be under the control of ethylene (Defilippi et al., 2005b). In addition to alcohols, aldehydes can be converted to acids by aldehyde dehydrogenases (EC 1.2.1.3; 27 NR sequences). Alcohols are then able to be esterified with CoA acids by alcohol acyl transferases (EC 2.3.1.84; three NR sequences). Branched-chain [...] are characteristic of many cultivars in apple [...] able to be hydrolyzed by esterases (EC 3.1.1.1; two NR sequences). Such esterases are perhaps responsible for the large quantities of alcohols found in very ripe apple fruit and apple juice.

Table IV. SNP analysis of apple NR sequences
Total NR sequences: 42,938 (17,460 TC + 25,478 singletons)
Cumulative length of analyzed NR sequences: 13.0 Mb
Total no. predicted bi-allelic SNPs: 18,408
Average occurrence of predicted bi-allelic SNPs: one in 706 bp
NR sequences with predicted bi-allelic SNPs: 20.61%
Average occurrence of predicted bi-allelic SNPs in polymorphic NR sequences: one in 144 bp
Average no. predicted bi-allelic SNPs per contig NR sequence: 1.05
Transitions and transversions
AG transitions        4,592
CT transitions        5,112
Total transitions     9,704
AC transversions      2,032
AT transversions      2,372
CG transversions      2,228
GT transversions      2,072
Total transversions   8,704
Total                18,408

Table V. MIPS FunCat analysis of apple NR sequences compared with Arabidopsis
Columns: No.; Functional Category; Apple NR Sequences (%); Arabidopsis (%).
Categories: 01 Metabolism; 02 Energy; 04 Storage protein; 10 Cell cycle and DNA processing; 11 Transcription; 12 Protein synthesis; 14 Protein fate; 16 Protein with binding function or cofactor requirement; 18 Protein activity regulation; 20 Cellular transport; 30 Cellular communication/signal transduction mechanism; 32 Cell rescue, defense, and virulence; 34 Interaction with the cellular environment; 36 Interaction with the environment; 38 Transposable elements, viral and plasmid proteins; 40 Cell fate; 41 Development; 42 Biogenesis of cellular components; 70 Subcellular localization; 98 Classification not yet clear-cut; 99 Unclassified proteins.
[Percentage values cannot be aligned to categories in this extraction.]

Table VI. Fifty most common Inter-Pro families found within the apple NR sequences

Inter-Pro No.  Frequency  Description
IPR000719      801        Protein kinase
IPR002290      359        Ser-Thr protein kinase
IPR001611      346        LRR
IPR008271      274        Ser-Thr protein kinase active site
IPR001245      269        Tyr protein kinase
IPR000504      202        RNA-binding region RNP-1 (RNA recognition motif)
IPR001680      193        G-protein β WD-40 repeat
IPR007090      170        LRR, plant specific
IPR001128      159        Cytochrome P450
IPR001841      156        Zinc finger, RING
IPR001005      133        Myb DNA-binding domain
IPR001806      124        Ras GTPase superfamily
IPR000626      124        Ubiquitin
IPR002048      118        Calcium-binding EF-hand
IPR002885      117        PPR repeat
IPR002110      111        Ankyrin
IPR001810      106        Cyclin-like F-box
IPR006662      100        Thioredoxin-type domain
IPR000608       98        Ubiquitin-conjugating enzymes
IPR001410       96        DEAD/DEAH box helicase
IPR001440       95        TPR repeat
IPR002401       95        E-class P450, group I
IPR001932       83        Protein phosphatase 2C-like
IPR001993       81        Mitochondrial substrate carrier
IPR001623       78        Heat shock protein DnaJ, N terminus
IPR001471       76        Pathogenesis-related transcriptional factor and ERF
IPR002198       73        Short-chain dehydrogenase/reductase SDR
IPR001087       65        Lipolytic enzyme, G-D-S-L
IPR001344       63        Chlorophyll a/b-binding protein
IPR001356       63        Homeobox
IPR007087       61        Zinc finger, C2H2 type
IPR002130       60        Peptidyl-prolyl cis-trans isomerase, cyclophilin type
IPR003439       60        ABC transporter
IPR002347       60        Glucose/ribitol dehydrogenase
IPR001878       57        Zinc finger, CCHC type
IPR000157       56        TIR
IPR000795       54        Protein synthesis factor, GTP binding
IPR000425       53        Major intrinsic protein
IPR002016       52        Haem peroxidase, plant/fungal/bacterial
IPR001395       51        Aldo/keto reductase
IPR000008       49        C2 domain
IPR002423       48        Chaperonin Cpn60/TCP-1
IPR004087       46        KH
IPR000916       45        Bet v I allergen
IPR001092       43        Basic helix-loop-helix dimerization domain bHLH
IPR001023       42        Heat-shock protein Hsp70
IPR000571       41        Zinc finger, C-x8-C-x5-C-x3-H type
IPR000217       40        Tubulin
IPR001938       39        Thaumatin, pathogenesis-related
IPR000823       38        Plant peroxidase

Table VII. The 10 most common transcription factor (TF) families in apple, using automated Inter-Pro predictions

Apple Rank  Top 10 TF Family Descriptions  No. Apple NR Sequences  Inter-Pro Accession Nos.
1           MYB                            138                     IPR001005, IPR006447, IPR000818
2           Pathogenesis related           76                      IPR001471
3           C2C2 Zn finger                 74                      IPR002991, IPR000315, IPR000679, IPR003851, IPR006780
4           Homeobox                       66                      IPR001356, IPR003106, IPR000047
5           C2H2 Zn finger                 64                      IPR007087, IPR003656
6           NAC                            52                      IPR008917, IPR003441
7           Basic helix-loop-helix         43                      IPR001092
8           C3H-type 1 Zn finger           41                      IPR000571
9           WRKY                           40                      IPR003657
10          bZip                           36                      IPR004827
[The TF family ranks for Arabidopsis(a) and rice(b) and the Total No. TFs row (values 952; 1,470; 1,306) cannot be aligned to columns in this extraction.]
(a) Based on data from Riechmann et al. (2000).
(b) Based on data from Goff et al. (2002).
(c) ND, Family not determined by Goff et al. (2002).
Color and Health-Related Compound Biosynthesis

Flavonoids, including anthocyanins and flavanols, are a class of secondary metabolites, derived from the amino acid Phe, that impart important beneficial health attributes probably through their antioxidant activity (Wolfe et al., 2003; Boyer and Liu, 2004). Representative anthocyanins and flavanols are found in apple (McGhie et al., 2005). The major anthocyanins in apple are the glycosylated cyanidins, which produce the red color observed in the fruit of many cultivars, including Royal Gala. The flavonol quercetin is found in apple fruit and has also been associated with health benefits. Representatives of the genes involved in flavonoid biosynthesis are present in the apple sequence dataset (Fig. 7). There are multiple NR sequences for all the genes in the pathway, including Phe ammonia lyase (EC 4.3.1.5; eight NR sequences), cinnamate 4-hydroxylase (EC 1.14.13.11; eight NR sequences), 4-coumarate CoA ligase (EC 6.2.1.12; 14 NR sequences), chalcone synthase (EC 2.3.1.74; 25 NR sequences), chalcone isomerase (EC 5.5.1.6; nine NR sequences), flavanone 3-hydroxylase (EC 1.14.11.9; seven NR sequences), and flavanone 3'-hydroxylase (EC 1.14.13.21; six NR sequences). From dihydroquercetin to the production of anthocyanins, the pathway
proceeds to leucoanthocyanidins produced by dihydroflavonol reductase (EC 1.1.1.219; eight NR sequences) then to anthocyanidins by anthocyanidin synthase (EC 1.14.11.19; six NR sequences). Finally, the red-colored cyanidin 3-glycosides are formed through the transfer of a sugar onto a hydroxyl group by a glycosyl transferase (EC 2.4.1.91; 26 NR sequences). Also, from dihydroquercetin, quercetin can be synthesized by flavonol synthase (EC 1.14.11.23; seven NR sequences), which in turn can be glycosylated by glycosyl transferases. Some members of these gene families that are expressed in the apple skin have been isolated and shown to be inducible by UV and coordinately up-regulated in the skins of red apple varieties (Kim et al., 2003; Ben-Yehudah et al., 2005). In addition, members of the MYB transcription factor family have been identified that can interact with promoters of these genes (Hellens et al., 2005). Furthermore, one MYB (MdMYB10; one NR sequence) has been identified that up-regulates this pathway in apple skin (R.V. Espley, R.P. Hellens, J. Putterill, and A.C. Allan, personal communication).

Figure. Ethylene synthesis and signal transduction. Columns in the figure: Compounds, Enzyme, ESTs.
methionine -> S-adenosyl-L-methionine synthase, EC 2.5.1.6 (28,17,11,323) -> S-adenosyl-L-methionine
S-adenosyl-L-methionine -> 1-aminocyclopropane-1-carboxylate synthase, EC 4.4.1.14 (10,10,0,10) -> 1-aminocyclopropane-1-carboxylic acid
1-aminocyclopropane-1-carboxylic acid -> 1-aminocyclopropane-1-carboxylate oxidase, EC 1.14.17.4 (13,7,6,139) -> ethylene
ethylene receptors: ETR1, ETR2, ERS1, ERS2, EIN4 (17,11,6,32)
MAP kinases: CTR1, CTR2, MPK6; entries as extracted: (27,12,15,59), (8,5,3,13), (6,5,1,7)
EIN2 (8,4,4,17)
ethylene insensitive transcription factors: EIN3, EIL1, EIL2 (18,7,11,51)
ethylene response factors: ERF1, ERF2, ERF3, ERF4; entries as extracted: (21,8,13,319), (6,2,4,19), (15,4,11,238), (10,3,7,104)
ethylene response genes

Gene Family Evolution
Figure. Sorbitol metabolism. Apple sequences encoding enzymes involved in sorbitol metabolism were identified by BLASTx (e-05 cutoff) using the PIR NREF database (Wu et al., 2003). Numbers in parentheses under ESTs refer to the number of apple NR sequences, singletons, TC sequences, and total number of ESTs, respectively. *, No PIR reference gene is currently available for EC 3.1.3.50 because the gene has yet to be isolated.
glucose 6-phosphate -> aldose 6-phosphate reductase, EC 1.1.1.200 (11,6,5,177) -> sorbitol 6-phosphate
sorbitol 6-phosphate -> sorbitol 6-phosphatase, EC 3.1.3.50 (*) -> sorbitol
sorbitol -> sorbitol transporters (18,7,11,97) -> sorbitol
sorbitol -> sorbitol dehydrogenase, EC 1.1.1.14 (23,15,8,299) -> fructose
Figure 4. The terpene biosynthesis pathway via the mevalonate pathway. Apple sequences encoding enzymes involved in the mevalonate pathway were identified by BLASTx (e-05 cutoff) using the PIR NREF database (Wu et al., 2003). Numbers in parentheses under ESTs refer to the number of apple NR sequences, singletons, TC sequences, and total number of ESTs, respectively.
mevalonate -> mevalonate kinase, EC 2.7.1.36 (3,3,0,3) -> 5-phosphomevalonate
5-phosphomevalonate -> phosphomevalonate kinase, EC 2.7.4.2 (1,0,1,4) -> 5-diphosphomevalonate
5-diphosphomevalonate -> EC 4.1.1.33 -> isopentenyl diphosphate -> isopentenyl-diphosphate delta-isomerase, EC 5.3.3.2 -> dimethylallyl diphosphate; entries for these two steps as extracted: (4,2,2,17), (4,1,3,22)
polyisoprene synthase, EC 2.5.1.x (12,3,9,103) -> [monoterpenes]
polyisoprene synthase, EC 2.5.1.x (12,3,9,103) -> farnesyl diphosphate -> [sesquiterpenes]
polyisoprene synthase, EC 2.5.1.x (12,3,9,103) -> geranylgeranyl diphosphate -> [diterpenes]
squalene synthase, EC 2.5.1.21 (4,0,4,26) -> squalene -> [triterpenes]
squalene monooxygenase, EC 1.14.99.7 (13,7,6,38) -> sterols
phytoene synthase, EC 2.5.1.32 (4,2,2,18) -> phytoene -> [carotenoids]
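The four numbers printed under each enzyme in the pathway figures can be sanity-checked against each other. The following is a minimal sketch in Python, not part of the original analysis. It assumes the order NR sequences, singletons, TC sequences, total ESTs, and it assumes that a singleton holds exactly one EST while a TC holds at least two. The function name `check_counts` is hypothetical.

```python
import re

# Matches one figure entry such as "(13,7,6,38)".
COUNT_TUPLE = re.compile(r"\((\d+),(\d+),(\d+),(\d+)\)")

def check_counts(entry):
    """Return a list of consistency problems for one parenthesized entry."""
    match = COUNT_TUPLE.fullmatch(entry.strip())
    if match is None:
        return ["not a four-number entry"]
    nr, singletons, tcs, ests = (int(g) for g in match.groups())
    problems = []
    # Every NR sequence is either a singleton or a TC.
    if nr != singletons + tcs:
        problems.append("NR != singletons + TC")
    # Assumption: one EST per singleton and at least two ESTs per TC.
    if ests < singletons + 2 * tcs:
        problems.append("too few ESTs for the stated singletons and TCs")
    return problems

# Entries copied from Figure 4.
figure4 = {
    "mevalonate kinase": "(3,3,0,3)",
    "phosphomevalonate kinase": "(1,0,1,4)",
    "squalene synthase": "(4,0,4,26)",
    "squalene monooxygenase": "(13,7,6,38)",
    "phytoene synthase": "(4,2,2,18)",
}
for enzyme, entry in figure4.items():
    print(enzyme, entry, check_counts(entry) or "consistent")
```

Under these assumptions all five Figure 4 entries pass. The lipoxygenase entry as printed in Figure 5, (41,34,17,165), fails the first check, which suggests a misread digit in that entry.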
Figure 5. The straight-chain ester biosynthetic pathway from fatty acids. Apple sequences encoding enzymes involved in straight-chain ester biosynthesis were identified by BLASTx (e-05 cutoff) using the PIR NREF database (Wu et al., 2003). Numbers in parentheses under ESTs refer to the number of apple NR sequences, singletons, TC sequences, and total number of ESTs, respectively.
linoleic acid -> lipoxygenase, EC 1.13.11.12 (41,34,17,165) -> 13-hydroperoxide linoleic acid
13-hydroperoxide linoleic acid -> hydroperoxide lyase [CYTP45074B] (4,1,3,29) -> hex-3-enal
hex-3-enal -> hydroperoxide isomerase, EC 4.2.1.92 (3,1,2,9) -> hex-2-enal
hex-3-enal -> aldehyde dehydrogenase, EC 1.2.1.3 (27,10,17,143) -> hex-3-enoic acid
hex-3-enal -> alcohol dehydrogenase, EC 1.1.1.1 (37,19,18,153) -> hex-3-enol
hex-3-enol -> [...] -> hex-3-enyl [...]
Overall, it is expected that 42,938 NR sequences is an overestimate of the number of protein-coding transcripts (protein-coding genes) represented in apple and that more sequencing, both of the cDNAs sampled here and novel cDNAs from apple, would reduce this number of NR sequences. Other EST projects undertaken in fruit crops of a similar size in terms of total number of ESTs collected have reported lower numbers of NR sequences. For example, a study of 152,635 tomato ESTs produced 31,012 NR sequences (Fei et al., 2004), whereas a collection of 146,075 grape ESTs rendered 25,746 NR sequences (Goes da Silva et al., 2005). This is likely due to the higher clustering threshold (95%) used in this study compared to the tomato and grape studies (90%). If the apple EST dataset is analyzed using a 90% clustering threshold, numbers of NR sequences similar to those in the other fruit EST studies are attained (34,614 NR sequences composed of 16,756 TC sequences and 17,858 singletons). However, even this lower number of NR sequences is likely to be an overestimate of the number of genes in the apple genome. The Arabidopsis unigene set estimated from all Arabidopsis ESTs produces a 35% overestimate of the actual number of protein-coding genes estimated from the genome sequence. Using this figure, the actual number of predicted apple genes may be approximately 27,000. However, when data from full cDNA sequences are incorporated and homology to Arabidopsis genes is taken into account, it is likely the apple NR set presented here represents approximately one-half the number of expressed genes found in apple.
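The effect of the clustering threshold can be made concrete with a little arithmetic. The sketch below is only an illustration: the 95% figures (17,460 tentative contigs and 25,478 singletons) are taken from the abstract, the 90% figures and the tomato and grape totals from the paragraph above, and the helper name `nr_count` is hypothetical.

```python
def nr_count(tc_sequences, singletons):
    """NR sequences are the tentative contigs (TCs) plus the singletons."""
    return tc_sequences + singletons

apple_95 = nr_count(17_460, 25_478)   # 95% clustering threshold (abstract)
apple_90 = nr_count(16_756, 17_858)   # 90% clustering threshold (text above)

# Fraction of NR sequences removed by relaxing the threshold to 90%.
relative_drop = (apple_95 - apple_90) / apple_95

# NR sequences obtained per EST in the two comparison studies.
nr_per_est = {
    "tomato": 31_012 / 152_635,
    "grape": 25_746 / 146_075,
}

print(apple_95, apple_90, f"{relative_drop:.1%}")
for crop, ratio in nr_per_est.items():
    print(crop, f"{ratio:.3f}")
```

The two sums reproduce the 42,938 and 34,614 NR totals, and the relaxed threshold removes roughly one-fifth of the NR sequences.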
Figure 6. Branched-chain ester biosynthesis via Ile. Apple sequences encoding enzymes involved in branched-chain ester biosynthesis were identified by BLASTx (e-05 cutoff) using the PIR NREF database (Wu et al., 2003). Numbers in parentheses under ESTs refer to the number of apple NR sequences, singletons, TC sequences, and total number of ESTs, respectively.
threonine -> threonine deaminase, EC 4.3.1.19 (3,3,0,3) -> 2-ketobutyrate
2-ketobutyrate -> acetolactate synthase, EC 2.2.1.6 (8,2,6,28) -> 2-acetohydroxybutyrate
2-acetohydroxybutyrate -> acetohydroxyacid isomeroreductase, EC 1.1.1.86 (2,1,1,14) -> 2,3-dihydroxy-3-methylvalerate
2,3-dihydroxy-3-methylvalerate -> dihydroxyacid dehydratase, EC 4.2.1.9 (2,0,2,10) -> 2-keto-3-methylvalerate
2-keto-3-methylvalerate -> aminotransferase, EC 2.6.1.42 (11,5,6,34) -> isoleucine
isoleucine -> aminotransferase, EC 2.6.1.42 (11,5,6,34) -> 2-oxo-3-methylpentanoic acid
2-oxo-3-methylpentanoic acid -> pyruvate decarboxylase, EC 4.1.1.1 (11,6,5,257) -> 2-methylbutanal
2-methylbutanal -> alcohol dehydrogenase, EC 1.1.1.1 (37,19,18,153) -> 2-methylbutanol
2-methylbutanol -> alcohol acyl transferase, EC 2.3.1.84 (3,1,2,10) -> 2-methylbutylacetate
2-methylbutylacetate: carboxylesterase, EC 3.1.1.1 (2,0,2,17)

A common feature of the cDNA sequences obtained from apple, and indeed other plants (Morgante et al., 2002), is the high frequency of SSRs contained within them, with 8,028 of the 42,938 apple NR sequences (19%) containing di- or trinucleotide repeats. Dinucleotide repeats were most frequent in the 100 bp immediately 5' of the presumptive start AUG, whereas trinucleotide repeats were more common in the coding region. By far the most common class of dinucleotide repeat was AG, making up 88.3% of all dinucleotide repeats. Least frequent were CG repeats at 0.1%. This bias toward AG and/or against CG repeats may be due to the tendency of CpG sequences to be methylated (Finnegan et al., 1998), which might potentially inhibit transcription. Another interesting feature of apple SSRs is the difference between the frequency of AG relative to the other dinucleotide repeat types in transcribed sequences compared with those found in genomic DNA (Guilford et al., 1997). For example, in apple genomic DNA, the AG repeats are approximately 60% more common than AC repeats, whereas in transcribed sequences AG repeats are almost 22 times (i.e. 2,200%) more common than AC repeats. A similar bias is found in Arabidopsis (Zhang et al., 2004). One possible explanation for this phenomenon is that there is an active role being played by these AG repeats in plant species. This could also account for why they are so common in transcribed sequences compared to other repeats. Factors that bind AG repeats in regulatory regions are known in both animals and plants (Epplen et al., 1996; Sangwan and O'Brian, 2002; Iglesias et al., 2004). Other potential ways SSRs affect regulation are by hypermethylation and/or secondary structure (Jacobsen et al., 2000).
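Counting SSR classes such as AG and AC, as discussed above, requires grouping each repeat unit with its rotations and reverse complements (AG, GA, CT, and TC all belong to one class). The following Python sketch shows one way this could be done. It is not the method used in the study: the minimum repeat numbers (six for dinucleotides, five for trinucleotides) are assumptions chosen for illustration, and the names `motif_class` and `find_ssrs` are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def motif_class(motif):
    """Canonical name of a repeat unit: the alphabetically smallest string
    among all rotations of the unit and of its reverse complement."""
    reverse_complement = motif.translate(_COMPLEMENT)[::-1]
    rotations = [m[i:] + m[:i]
                 for m in (motif, reverse_complement)
                 for i in range(len(m))]
    return min(rotations)

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, class, repeat_count) tuples for di- and
    trinucleotide repeats. The default thresholds are assumptions."""
    if min_repeats is None:
        min_repeats = {2: 6, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, minimum in min_repeats.items():
        # One unit followed by at least (minimum - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, minimum - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAA...
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), motif_class(unit), copies))
    return sorted(hits)

example = "TTT" + "AG" * 8 + "CCGT" + "CAG" * 5 + "T"
print(find_ssrs(example))
```

Tallying the class names over a set of sequences gives the kind of AG versus AC comparison described in the text.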
Numerous SNPs were detected in the apple sequence dataset. From a cumulative length of 13.0 Mb of contiguous NR sequences sampled, 18,408 bi-allelic SNPs were detected. Bi-allelic SNPs occur with a frequency of one in every 706 bp of sequence. This is a relatively high level of variation probably due to two factors. The apple NR sequences, while predominantly from the cultivar Royal Gala, also contain sequences from six other cultivars, including Aotea, Braeburn, Pacific Rose, Pinkie, M9, and Northern Spy. Also, apple utilizes a strong incompatibility system selecting against self-crosses. Therefore, high levels of heterozygosity are expected. The ratio of transitions to transversions
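The SNP frequency quoted above follows directly from the sampled length and the SNP count, and a transition to transversion ratio is obtained by classifying each bi-allelic SNP by its two alleles. The Python sketch below illustrates both calculations. The helper names are hypothetical and the three example SNPs are invented for illustration, not taken from the apple data.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def bp_per_snp(total_bp, snp_count):
    """Average number of base pairs per SNP."""
    return total_bp / snp_count

def is_transition(allele_a, allele_b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    pair = {allele_a.upper(), allele_b.upper()}
    if len(pair) != 2 or not pair <= (PURINES | PYRIMIDINES):
        raise ValueError("expected two different bases from A, C, G, T")
    return pair <= PURINES or pair <= PYRIMIDINES

def ts_tv_ratio(snps):
    """Transitions divided by transversions for (allele, allele) pairs."""
    transitions = sum(1 for a, b in snps if is_transition(a, b))
    transversions = len(snps) - transitions
    if transversions == 0:
        raise ValueError("no transversions in the input")
    return transitions / transversions

# 13.0 Mb of sequence and 18,408 bi-allelic SNPs, as stated in the text.
print(round(bp_per_snp(13_000_000, 18_408)))

# Invented example alleles: two transitions and one transversion.
print(ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "T")]))
```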
Figure 7. Anthocyanin and flavanol biosynthesis in apple. Apple sequences encoding enzymes involved in anthocyanin and flavanol biosynthesis were identified by BLASTx (e-05 cutoff) using the PIR NREF database (Wu et al., 2003). Numbers in parentheses under ESTs refer to the number of apple NR sequences, singletons, TC sequences, and total number of ESTs, respectively.
phenylalanine -> phenylalanine ammonia lyase, EC 4.3.1.5 (8,2,6,91) -> cinnamate
cinnamate -> cinnamate 4-hydroxylase, EC 1.14.13.11 (8,3,5,261) -> p-coumarate
p-coumarate -> 4-coumarate CoA ligase, EC 6.2.1.12 (14,5,9,63) -> p-coumaroyl-CoA
p-coumaroyl-CoA -> chalcone synthase, EC 2.3.1.74 (25,16,9,67) -> chalcone
chalcone -> chalcone isomerase, EC 5.5.1.6 (9,4,5,40) -> naringenin
naringenin -> flavanone 3-hydroxylase, EC 1.14.11.9 (7,4,3,17) -> dihydrokaempferol
dihydrokaempferol -> flavanone 3'-hydroxylase, EC 1.14.13.21 (6,3,3,33) -> dihydroquercetin
dihydroquercetin -> dihydroflavonol 4-reductase, EC 1.1.1.219 (8,3,5,22) -> leucoanthocyanidin
leucoanthocyanidin -> anthocyanidin synthase, EC 1.14.11.19 (6,3,3,35) -> anthocyanidin
anthocyanidin -> UDP Glyc, EC 2.4.1.91 (26,15,11,78) -> cyanidin 3-glycosides
dihydroquercetin -> flavonol synthase, EC 1.14.11.23 (7,1,6,84) -> quercetin
quercetin -> UDP Glyc, EC 2.4.1.91 (26,15,11,78) -> quercetin glycosides
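The e-05 cutoff named in the figure captions is an E-value threshold on BLASTx hits. As a minimal sketch of how such a cutoff can be applied, the Python code below filters hits held in BLAST's standard 12-column tabular layout, where the E-value is the 11th column. The paper does not say which output format was used, so the layout is an assumption, as are the sequence identifiers in the example and the choice to keep hits exactly at the cutoff.

```python
import csv
import io

def filter_hits(tabular_text, max_evalue=1e-5):
    """Keep (query, subject, evalue) for hits at or below the cutoff.

    Assumes the 12-column tabular layout: query id, subject id, percent
    identity, alignment length, mismatches, gap opens, query start,
    query end, subject start, subject end, E-value, bit score.
    """
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])
        if evalue <= max_evalue:
            kept.append((row[0], row[1], evalue))
    return kept

# Hypothetical hits: identifiers and scores are made up for illustration.
example = (
    "apple_nr_0001\tNF000001\t78.2\t210\t45\t1\t3\t632\t12\t221\t3e-40\t165\n"
    "apple_nr_0002\tNF000002\t31.0\t58\t40\t0\t10\t183\t5\t62\t0.002\t38.1\n"
)
print(filter_hits(example))
```

Only the first hypothetical hit passes, because 0.002 is above the 1e-05 threshold.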
[...] in its preference for an additional three codons, [...] is correlated with trinucleotide repeat class frequency (R² = 0.75). These are also correlated in Arabidopsis (Zhang et al., 2004), arguing that codon usage is selecting repeat classes because most trinucleotide repeats are found in the coding region (Morgante et al., 2002). CpG suppression is also evident in apple with a XCG:XCC ratio of 0.65, similar to that of tomato (0.58). This modest level of suppression of the CpG dinucleotides differs markedly from that of nearly no suppression in Arabidopsis (0.92) to the high level found in grape (0.35). This may well reflect different
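A ratio such as XCG:XCC can be computed by counting codons in predicted coding sequences. The sketch below assumes that XCG means any codon with C and G at the second and third positions and XCC any codon with C at both of those positions, with X standing for any first base. That reading, the function name `xcg_xcc_ratio`, and the toy input are assumptions made for illustration.

```python
def xcg_xcc_ratio(coding_seqs):
    """Ratio of NCG codons to NCC codons over in-frame coding sequences."""
    xcg = 0
    xcc = 0
    for cds in coding_seqs:
        cds = cds.upper()
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for i in range(0, len(cds) - 2, 3):
            last_two = cds[i + 1:i + 3]
            if last_two == "CG":
                xcg += 1
            elif last_two == "CC":
                xcc += 1
    if xcc == 0:
        raise ValueError("no XCC codons found")
    return xcg / xcc

# Toy input: codons ACG, GCC, TCC and GCG, ACC give two XCG and three XCC.
print(xcg_xcc_ratio(["ACGGCCTCC", "GCGACC"]))
```

A value near 1 would indicate little CpG suppression at these codon positions, and lower values stronger suppression, in line with the comparison among apple, tomato, Arabidopsis, and grape above.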
Figure 8. Phylogenetic tree of Arabidopsis (AtATs) and apple (numbers) members of the acyl transferase family (AT). Apple ATs that have duplicated in the apple lineage are colored green, orthologous apple ATs are colored red, and apple ATs for which assignment is ambiguous are colored blue. The whole apple image on the right of the apple ATs identifies those that include ESTs from fruit tissue cDNA libraries, whereas the crossed apple image identifies ATs that do not have ESTs from such libraries.
type of resistance genes, of which there are 59 NR sequences within the apple sequence dataset (e-20). In another Inter-Pro class (IPR007090) of plant-specific LRRs, 47 NR sequences were found in the apple sequence dataset. Disease-resistance gene candidates will also be common in other Inter-Pro classes; for example

MATERIALS AND METHODS

Library Construction and EST Sequencing

Tissues were [...] Havelock North [...]
plasmid apple (cv Pinkie) leaf library by production of M13 phage from the library and isolation of phage DNA, as described by Sambrook et al. (1989), and ssDNA and double-stranded DNA was selected using QiaexII resin, according to the manufacturer's instructions. ssDNA was hybridized with PCR-amplified driver DNA before isolation of rare ssDNA using hydroxyapatite chromatography. Rare ssDNA was made double stranded and transformed into DH10B using electroporation, as described by Bonaldo et al. (1996). Plasmids from the phage cDNA libraries were mass excised, according to the manufacturer's recommendations (Stratagene). Plasmid extractions were then undertaken on individual bacterial colonies of either the phage-derived or the plasmid-derived cDNA
Bioinformatics

[...] for Arabidopsis proteins following MIPS (http://mips.gsf.de) FunCat schema (Ruepp et al., 2004). Apple sequences encoding enzymes involved in secondary metabolite biosynthetic pathways were identified by BLASTx (e-05 cutoff) using the Protein Information Resource (PIR) NREF database (Wu et al., 2003).

Sequence data from this article can be found in the GenBank/EMBL data libraries under accession numbers CN848772 to CN851520, CN851527 to CN852114, CN854524 to CN860109, CN860111 to CN861528, CN861730 to CN862087, CN862091 to CN865258, CN865263 to CN870966, CN870969 to CN875894, CN875896 to CN881602, CN881608 to CN881609, CN881619 to CN884429, CN884434 to CN886998, CN887004 to CN890357, CN890361 to CN890409, CN890413 to CN896142, CN896144 to CN900284, CN900286 to CN901293, CN901299 to CN906863, CN906869 to CN907638, CN907715 to CN914192, CN914230 to CN914912, CN916097 to CN920835, CN920840 to CN925026, CN925028 to CN925934, CN925939 to CN929310, CN929396 to CN932721, CN932727 to CN933610, CN933676 to CN937515, CN937517 to CN943462, CN943466 to CN949201, CN949206 to CN949208, CN949216 to CN949629, CV126090 to CV126104, CV126106 to CV126115, DR033885 to DR033893, EB105831 to EB157590, and EB175250 to EB178034.

ACKNOWLEDGMENTS

Received February [...]; [...] February 22, [...].

LITERATURE CITED
[...] of strawberry [...] J Exp Bot 53: 2073-2087
Altschul [...]
[...] 80: 187-192
Bieleski [...]
Bonaldo MF, Lennon G, Soares MB (1996) Normalization and subtraction: two approaches to facilitate gene discovery. Genome Res 6: 791-806
Boyer J, Liu RH (2004) Apple phytochemicals and their health benefits. Nutr J 3: 5
Challice JS (1974) Rosaceae chemotaxonomy and the origin of the Pomoideae. Bot J Linn Soc 69: 239-259
Chang S, Puryear J, Cairney J (1993) A simple and efficient method for isolating RNA from pine trees. Plant Mol Biol Rep 11: 113-116
Chen G, Hackett R, Walker D, Taylor A, Lin Z, Grierson D (2004) Identification of a specific isoform of tomato lipoxygenase (TomloxC) involved in the generation of fatty acid-derived flavor compounds. Plant Physiol 136: 2641-2651
Cheng L, Zhou R, Reidel EJ, Sharkey TD, Dandekar AM (2005) Antisense inhibition of sorbitol synthesis leads to up-regulation of starch synthesis without [...]
Dandekar [...]
Defilippi BG, Dandekar AM, Kader AA (2005a) Relationship of ethylene biosynthesis to volatile production, related enzymes, and precursor availability in apple peel and flesh tissues. J Agric Food Chem 53: 3133-3141
Defilippi BG, Kader AA, Dandekar AM (2005b) Apple aroma: alcohol [...]
[...] targets for differential binding of nuclear proteins. FEBS Lett 389: 92-95
Fei ZJ, Tang X, Alba RM, White JA, Ronning CM, Martin GB, Tanksley SD, Giovannoni JJ (2004) Comprehensive EST analysis of tomato and comparative genomics of fruit ripening. Plant J 40: 47-59
Fellman JK, Miller TW, Mattinson DS, Mattheis JP (2000) Factors that influence biosynthesis of volatile flavor compound in apple fruits. HortScience 35: 1026-1033
[...] RF (1987) Apple rootstocks. In RC Rom, RF Carlson, [...] for Fruit Crops. John Wiley & Sons, New York, pp [...]
Guilford [...] polymorphism and cultivar identification. Theor Appl Genet 94: 249-254
Harker FR, Gunson FA, Jaeger SR (2003) The case for fruit quality: an interpretive review of consumer attitudes, and preferences for apples. Postharvest Biol Technol 28: 333-347
Hellens RP, Allan A, Friel E, Bolitho K, Grafton K, Templeton MD, Karunairetnam S, Gleave AP, Laing WA (2005) Transient expression vectors for functional genomics, quantification of promoter activity and RNA silencing in plants. Plant Methods 1: 13
Huang X, Madan A (1999) CAP3: a DNA sequence assembly program. Genome Res 9: 868-877
International Rice Genome Sequencing Project (2005) The map-based sequence of the rice genome. Nature 436: 793-800
Jacobsen SE, Sakai H, Finnegan EJ, Cao X, Meyerowitz EM (2000) Ectopic hypermethylation of flower-specific genes in Arabidopsis. Curr Biol 10: 179-186
Jander G, Norris SR, Rounsley SD, Bush DF, Levin IM, Last RL (2002) Arabidopsis map-based cloning in the post-genome era. Plant Physiol 129: 440-450
Ju Z, Curry E (2000) Lovastatin inhibits α-farnesene synthesis without affecting ethylene production during fruit ripening in 'Golden Supreme' apples. J Am Soc Hortic Sci 125: 105-110
Kim S-H, Lee J-R, Hong S-T, Yoo Y-K, An G, Kim S-R (2003) Molecular cloning and analysis of anthocyanin biosynthesis genes preferentially expressed in apple skin. Plant Sci 165: 403-413
Knee M (1993) Pome fruits. In GB Seymour, JE Taylor, GA Tucker, [...]
[...] 100: 975-984
[...] source-sink [...] 70: 335-339
Lopez-Gomez R, Gomez-Lim MA (1992) A method for extraction of intact RNA from fruits rich in polysaccharides using ripe mango mesocarp. HortScience 27: 440-442
[...] in plant genomes. Nat Genet [...]
[...] R, Grando [...] expressed sequence [...] Integr Genomics 5: 208-217
Ohno [...]
Rafalski A (2002) Applications of single nucleotide polymorphisms in crop genetics. Curr Opin Plant Biol 5: 94-100
Rice P, Longden I, Bleasby A (2000) EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet 16: 276-277
Riechmann JL, Heard J, Martin G, Reuber L, Jiang C, Keddie J, Adam L, Pineda O, Ratcliffe OJ, Samaha RR, Creelman R, et al (2000) Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes. Science 290: 2105-2110
Rowan DD, Allen JM, Fielder S, Hunt MB (1999) Biosynthesis of straight-chain ester volatiles in Red Delicious and Granny Smith apples using deuterium-labeled precursors. J Agric Food Chem 47: 2553-2562
Rowan DD, Lane HP, Allen JM, Fielder S, Hunt MB (1996) Biosynthesis of 2-methylbutyl, 2-methyl-2-butenyl, and 2-methylbutanoate esters in Red Delicious and Granny Smith apples using deuterium-labeled substrates. J Agric Food Chem 44: 3276-3285
[...] 129: 1788-1794
Schwab W (2003) Metabolome diversity: too few genes, too many metabolites? Phytochemistry 62: 837-849
Shiu S-H, Shih M-C, Li W-H (2005) Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol 139: 18-26
Souleyre EJF, Greenwood DR, Friel EN, Karunairetnam S, Newcomb RD (2005) An alcohol acyl transferase from apple (cv. Royal Gala), MpAAT1, produces esters involved in apple fruit flavour. FEBS J 272: 3123-3144
Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25: 4876-4882
Van der Hoeven R, Ronning C, Giovannoni J, Martin G, Tanksley S (2002) Deductions about the number, organization, and evolution of genes in the tomato genome based on analysis of a large expressed sequence tag collection and selective genomic sequencing. Plant Cell 14: 1441-1456
Watari J, Kobae Y, Yamaki S, Yamada K, Toyofuku K, Tabuchi T, Shiratake K (2004) Identification of sorbitol transporters expressed in the phloem of apple source leaves. Plant Cell Physiol 45: 1031-1041
Wolfe K, Wu XZ, Liu RH (2003) Antioxidant activity of apple peels. J Agric Food Chem 51: 609-614
Zdobnov EM, Apweiler R (2001) InterProScan: an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17: 847-848
Zhang L, Yuan D, Yu S, Li Z, Cao Y, Miao Z, Qian H, Tang K (2004) Preference of simple sequence repeats in coding and non-coding regions of Arabidopsis thaliana. Bioinformatics 20: 1081-1086
Zhou R, Cheng L, Wayne R (2003) Purification and characterization of sorbitol-6-phosphate phosphatase from apple leaves. Plant Sci 165: 227-232
of Plants
Ed 3.
Vol.
2006
Physiol.
141,