
Term Paper Submission

Topic: Plant gene structure, its importance and applications

Course Code : BTY 885
Course name: Agricultural biotechnology
Submitted to:
Miss Neha Kakked
Submitted By:
Roma Srivastava
BT MBA Biotech
Table of contents
Introduction - Plant genome
Genetic material
DNA content
Plant gene structure
Control sequences of different plants
TATA box and AGGA box
Other regulatory elements
Repeated sequences of regulating gene expression
Applications
References
Genetically, the plant genome is the most complex one found in living systems.
It comprises three interacting genomes: aside from the nuclear genome, complete
genetic systems are located in the plastids and the mitochondria. These
organelles are semiautonomous bodies; they have their own organizational and
functional properties but do not synthesize all their own proteins. The nuclear
genome plays an important role in organelle biogenesis.
Techniques in molecular cloning and deoxyribonucleic acid (DNA) sequencing have
provided the tools for studying the structure of genes at the nucleotide level.
Hence our knowledge of the structure, organization and expression of the plant
genome has come largely through the use of recombinant DNA techniques. This
technology allows isolation and characterization of specific pieces of DNA by
cloning the DNA sequences into bacterial cells, in which they can be replicated
and large quantities obtained for analysis.
In addition to supplying much basic information concerning gene structure and
expression, recombinant DNA technology enables specific manipulation of genetic
material and the transfer of such material between different organisms. These
types of genetic manipulation may ultimately give us the opportunity to
engineer agricultural crops and industrially important plants in order to
tailor them more closely to human needs.

Several lines of indirect evidence have long suggested that DNA contains the
genetic information of the living organism. Results obtained from several
different experimental procedures showed that most of the DNA is located in the
chromosomes, whereas the RNA and proteins are also abundant in the cytoplasm.
Moreover, a precise correlation exists between the amount of DNA per cell and
the number of sets of chromosomes per cell. Most somatic cells of diploid
organisms, for example, contain exactly twice the amount of DNA found in the
haploid germ cells or gametes of the same species. The molecular composition of
the DNA in all the different cells of an organism is the same, whereas the
composition of RNA and protein varies both qualitatively and quantitatively
from one cell type to another.
Direct evidence that the genetic material is DNA rather than protein or RNA was
published by Avery, MacLeod and McCarty in 1944. They demonstrated that the
component of the cell responsible for the phenomenon of transformation in the
bacterium Diplococcus pneumoniae was DNA.
The genetic information of all living organisms, except the RNA viruses, is
stored in DNA. Nucleic acids, originally called 'nuclein' because they were
isolated from cell nuclei by F. Miescher in 1869, are macromolecules composed
of repeating subunits called nucleotides.
Each nucleotide is composed of
(1) a phosphate group,
(2) a five-carbon sugar, and
(3) a cyclic nitrogen-containing compound called a base.
Four bases are commonly found in DNA: adenine, guanine, thymine, and cytosine.
RNA also usually contains adenine, guanine, and cytosine but has a different
base, uracil, in place of thymine. Adenine and guanine are double-ring bases
called purines; cytosine, thymine, and uracil are single-ring bases called
pyrimidines. Hence both DNA and RNA contain four different subunits or
nucleotides: two purine nucleotides and two pyrimidine nucleotides. RNA usually
exists as a single-stranded polymer composed of a long sequence of nucleotides.
DNA, however, has one very important additional level of organization: it is
usually a double-stranded molecule.
The correct structure of DNA was first deduced by Watson and Crick in 1953, as
a double helix in which the two polynucleotide chains are coiled about one
another in a spiral. Each polynucleotide chain consists of a sequence of
nucleotides linked together by phosphodiester bonds joining adjacent
deoxyribose moieties. The two polynucleotide strands are held together in their
helical configuration by hydrogen bonding between bases in opposing strands,
the resultant base pairs being stacked between the two chains perpendicular to
the axis of the molecule, like the steps of a spiral staircase. The base
pairing is specific: adenine is always paired with thymine, and guanine is
always paired with cytosine. Thus, all base pairs consist of one purine and one
pyrimidine.
Once the sequence of bases on one strand of a DNA double helix is known, the
sequence of bases in the other strand is also known because of specific base
pairing. The two strands of a DNA double helix are thus said to be
complementary.
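The pairing rule can be illustrated with a short Python sketch that derives one strand from the other. The function name and the example sequence are hypothetical, and the reversal step assumes the standard antiparallel orientation of the two strands, which the text above does not state explicitly.

```python
# Watson-Crick pairing rules for DNA: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complementary_strand(strand: str) -> str:
    """Return the complementary strand, written 5'->3'.

    Assumes the two strands run antiparallel, so the complement is
    built by reading the input strand in reverse order.
    """
    return "".join(PAIRING[base] for base in reversed(strand.upper()))


if __name__ == "__main__":
    # Hypothetical example sequence.
    print(complementary_strand("ATGCCG"))  # prints CGGCAT
```

Applying the function twice returns the original sequence, which is another way of saying that each strand fully specifies the other.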
However, the structure of DNA molecules changes as a function of their
environment. The exact conformation of a given DNA molecule or segment of a DNA
molecule will depend on the nature of the molecule with which it is
interacting.
Certain DNA sequences have been shown to exist in a unique, left-handed double
helical form called Z-DNA (Z for the zigzag path of the sugar-phosphate
backbones of the structure). In a high concentration of salts or in a
dehydrated state, DNA exists in the A-form, which has 11 nucleotide pairs per
turn.
It is very unlikely that DNA molecules ever exist in the A-form in vivo. The
helices of the A- and B-forms of DNA are wound clockwise (right-handed). DNA
sequences can undergo conformational shifts from the B-form to the Z-form and
vice versa. Certain regulatory proteins bind to the Z-form or the B-form of a
DNA sequence and cause it to shift to the other form.

Plant cells contain large amounts of DNA, and the amount varies widely between
species, from the smallest (Arabidopsis thaliana, 0.5 picogram (pg) per haploid
genome) to some members of the Loranthaceae (over 100 pg per haploid genome).
Even the smallest plant genome is about five times larger than that of
Drosophila melanogaster and contains much more DNA than is required to specify
all the proteins synthesized during the course of development.

Function of this additional DNA - By using renaturation kinetics analysis it
has been possible to analyze various plant genomes for additional (repetitive)
DNA. Analysis of renaturation kinetics relies on the fact that when
double-stranded DNA is denatured into two single strands, then, given a
suitable temperature and ionic environment, the two strands will anneal again.
The process occurs by random collision, so the rate of reannealing depends on
the initial concentrations of the different sequences present.
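The dependence on initial concentration can be made concrete with the ideal second-order rate law commonly used in reassociation (C0t) analysis, C/C0 = 1 / (1 + k * C0 * t). The rate law, the function names and the rate constants in this Python sketch are assumptions for illustration; none of them are given in the text above.

```python
def fraction_single_stranded(c0: float, t: float, k: float) -> float:
    """Fraction of DNA still single-stranded after time t.

    Assumes ideal second-order kinetics: C/C0 = 1 / (1 + k * C0 * t).
    c0: initial DNA concentration (mol nucleotides per litre)
    t:  time (seconds)
    k:  reassociation rate constant (litres per mol per second)
    """
    return 1.0 / (1.0 + k * c0 * t)


def half_cot(k: float) -> float:
    """C0*t value at which half of the DNA has reannealed (1 / k)."""
    return 1.0 / k


if __name__ == "__main__":
    # Hypothetical rate constants: a repeated sequence behaves as if it
    # were more concentrated, so it has a larger effective k.
    for label, k in [("single copy", 0.001), ("highly repeated", 10.0)]:
        print(label, "C0t(1/2) =", half_cot(k))
```

Under this model, sequences that reanneal at low C0t values are the repeated ones, and those that reanneal only at high C0t values are single or low copy.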
For example, in Pisum sativum only 15% of the DNA behaves as single or low copy
number sequences, while the remainder consists of sequences repeated many
times. In the larger plant genomes, which have higher percentages of repetitive
DNA, interspersed repeat sequences are likely to be found adjacent to other
repeats at least as often as they are found next to single-copy DNA. It is
sometimes possible to distinguish families of moderately repetitive and highly
repetitive sequences. Many families of repetitive DNA have now been
characterized in physical terms, but their function, along with that of excess
nonrepetitive DNA, is not well understood. One exception is the family of genes
coding for cytoplasmic ribosomal RNA; these genes are highly reiterated in
plant genomes and are present in DNA located at the nucleolar organizer.
Plant ribosomal RNA genes and a number of other structural genes from a variety
of species have now been analyzed in considerable detail. In common with many
animal genes, some plant genes have been found to have their coding sequences
interrupted by introns, or intervening sequences.
These introns are transcribed but are not represented in mature mRNA and hence
are not translated. No introns have been found in rRNA genes, but they have
been demonstrated in a number of other plant structural genes.
Introns and Exons
An intron is a noncoding section of a gene that is removed from the RNA before
translation in cells of higher organisms. An exon is a coding section of a
gene. The coding sequences are interrupted by intervening noncoding sequences.
Introns are detected as DNA sequences that lie within a gene yet do not appear
in mature mRNA.
The best way to detect and characterize introns is by comparing the complete
nucleotide sequence of the DNA with that of the mRNA, using genomic and cDNA
clones. Cloned genomic DNA is annealed with RNA and the products are examined
under an electron microscope.
Since the intron sequences are not contained in the RNA, they will form
single-stranded loops in the heteroduplex molecule. If these loops are
sufficiently large (more than 50 nucleotides in length), they can be seen and
their approximate positions in the gene mapped from an electron micrograph.
The evolutionary and functional significance of intron sequences has been much
discussed. One of the most appealing hypotheses, originally put forward by W.
Gilbert of Harvard University in 1981, is that the presence of introns is
related to an evolutionary process by which new genes arise from recombination
events that juxtapose previously existing genes or portions of genes.
Given the ability to splice transcripts from two or more DNA segments into a
single mRNA, "compound" genes may form. The number of introns is highly
variable in both plant and animal genes. A number of plant genes have no
introns at all, such as the zein storage protein genes of maize, two other
soybean protein genes, one rare-class leaf gene and most cab genes (nuclear
genes encoding the chlorophyll a/b-binding protein of the photosystem).
Other plant genes have two or three introns. Nucleotide sequence data for the
French bean (Phaseolus) genes, soybean leghemoglobin genes, a soybean actin
gene, a soybean glycinin gene, a rare-class leaf gene and genes for the small
subunit of RuBP carboxylase show that all these plant genes have two to three
introns.
A few plant genes have larger numbers of introns and/or longer intron
sequences. There are nine introns in the maize Adh genes and five introns in
the phytochrome gene. Where intervening sequences are present, all plant genes
are identical in the first two and last two bases of the introns: GT and AG,
respectively.
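The GT...AG boundary rule lends itself to a simple check. The following Python sketch tests whether a candidate intron obeys the rule and removes annotated introns from a gene sequence; the function names, the coordinate convention (0-based, end-exclusive) and the toy sequence are all hypothetical, and real splice-site recognition depends on more sequence context than these two dinucleotides.

```python
def follows_gt_ag_rule(intron: str) -> bool:
    """True if the intron starts with GT and ends with AG (DNA sense strand)."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def splice(gene: str, introns: list[tuple[int, int]]) -> str:
    """Join the exons of `gene` after removing the given introns.

    `introns` holds (start, end) positions, 0-based and end-exclusive.
    Each intron is checked against the GT...AG rule before removal.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        if not follows_gt_ag_rule(gene[start:end]):
            raise ValueError(f"intron {start}-{end} does not follow the GT...AG rule")
        exons.append(gene[position:start])
        position = end
    exons.append(gene[position:])
    return "".join(exons)


if __name__ == "__main__":
    # Toy gene: exon + intron (GT...AG) + exon; not a real sequence.
    gene = "ATGGCC" + "GTAAGTTTTCAG" + "GGATAA"
    print(splice(gene, [(6, 18)]))  # prints ATGGCCGGATAA
```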
Conservation of the two dinucleotides at the intron/exon junctions in all
eukaryotic genes suggests that similar RNA splicing mechanisms are involved.
Comparisons of plant storage proteins with animal storage proteins have shown
that plants have simpler intergenic structures.


By comparing DNA sequence data for several different plant genes it is possible
to identify a number of conserved regions which, by analogy with animal
systems, seem likely to be important for accurate transcription and processing
of RNA. Beginning at the 3' (downstream) end and working backward, one first
encounters the poly(A) addition signals.
Poly(A) is added post-transcriptionally to most eukaryotic mRNAs. In most
animal systems it has been shown that a short sequence near the 3'-end of the
mRNA contains the information necessary for proper 3'-end processing and
poly(A) addition. This sequence is AATAAA. It is located about 10-33 bases
upstream from the poly(A) tail. In vitro mutagenesis experiments have shown it
to be required for polyadenylation, although other sequences further downstream
may also be important. Two sequences with homology to the AATAAA sequence near
3'-ends have been observed in plants. Multiple polyadenylation sites occur more
often in plant genes than in animal genes, and there is more variation in the
sequences of these elements in plant genes.
Is the structure of the 3'-end of plant genes similar to that of animal or
yeast genes? The animal consensus sequence AATAAA has been found in all the
plant genes of the zein multigene family examined so far, with the exception of
the 849 subfamily of zein. This subfamily, which includes Z7, 849, and pZ22.3,
has the sequence AATAAT instead of the normal AATAAA.
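Locating these signals in a sequence is a simple motif search. The Python sketch below reports every AATAAA or AATAAT hit together with its distance from the poly(A) addition site; the function name, the convention that the input sequence ends at the poly(A) site, and the test sequence are assumptions made for illustration.

```python
import re

# AATAAA is the animal consensus; AATAAT is the variant the text reports
# for one zein subfamily. The lookahead allows overlapping hits.
POLY_A_SIGNAL = re.compile(r"(?=(AATAA[AT]))")


def find_poly_a_signals(three_prime_region: str) -> list[tuple[int, str]]:
    """Return (distance, motif) for each signal in the region.

    Assumes `three_prime_region` ends at the poly(A) addition site.
    `distance` counts the bases between the end of the motif and that site.
    """
    sequence = three_prime_region.upper()
    hits = []
    for match in POLY_A_SIGNAL.finditer(sequence):
        motif = match.group(1)
        motif_end = match.start() + len(motif)
        hits.append((len(sequence) - motif_end, motif))
    return hits


if __name__ == "__main__":
    # Hypothetical 3' region: signal followed by 20 bases.
    print(find_poly_a_signals("GCTGC" + "AATAAA" + "C" * 20))  # prints [(20, 'AATAAA')]
```

Comparing the reported distance with the distance from the stop codon would reproduce the kind of positional comparison between plant and animal genes that the surrounding text describes.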
Even though most plant genes have AATAAA, its location with respect to the
poly(A) tail differs from that in animal genes. In all the leghemoglobin genes,
most of the zein genes and the alcohol dehydrogenase genes of maize, the AATAAA
is closer to the stop codon of the protein coding sequence than to the poly(A)
tail.
Within the coding region, plant genes do not seem to differ much from animal
genes. Translation stop and intron splicing signals are similar, and the
initiator codon (ATG in DNA, AUG in RNA) occurs within a consensus sequence
very similar to its counterpart in animal genes. Most of the transcriptional
control sequences are located in the region 5' to the start of transcription.
Upstream from the cap site one normally encounters one or two short sequence
elements that are common to many eukaryotic genes and that seem to be involved
in normal transcription. Of the sequences thought to be important in
transcription initiation, the TATA box, or Goldberg-Hogness box, is the best
characterized.

This sequence, or one very similar to it, occurs at about nucleotide -30 (30
nucleotides upstream from the cap site) in almost all plant and animal genes.
This sequence is required for correct expression of eukaryotic genes in vitro
and for accurate, efficient initiation in vivo.
Since plant genes are also transcribed by RNA polymerase II, it is not
surprising that all the plant genes have a sequence analogous to the TATA box.
The distance between the TATA box (measured from the second T) and the cap site
is 29-33 nucleotides, as in other eukaryotes.
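The positional constraint can be expressed as a small search over the upstream sequence. This Python sketch looks for the literal string TATA and keeps only hits whose second T lies 29-33 nucleotides upstream of the cap site. Using the bare tetranucleotide instead of a fuller consensus, the function name and the coordinate convention (the last base of the input is position -1) are all simplifying assumptions.

```python
def find_tata_boxes(upstream: str, min_dist: int = 29, max_dist: int = 33) -> list[int]:
    """Return distances from the second T of each TATA to the cap site.

    Assumes `upstream` is the sequence immediately 5' of the cap site,
    so its last base is position -1. Only hits whose distance falls
    within [min_dist, max_dist] are reported.
    """
    sequence = upstream.upper()
    cap_index = len(sequence)
    distances = []
    start = sequence.find("TATA")
    while start != -1:
        # The second T of TATA sits two bases after the start of the motif.
        distance = cap_index - (start + 2)
        if min_dist <= distance <= max_dist:
            distances.append(distance)
        start = sequence.find("TATA", start + 1)
    return distances


if __name__ == "__main__":
    # Hypothetical promoter fragment with a TATA box near position -30.
    print(find_tata_boxes("G" * 10 + "TATAAA" + "C" * 26))  # prints [30]
```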
Another sequence that may be involved in regulating transcription of some
eukaryotic genes is the consensus sequence GG(C/T)CAATCT. This sequence is
often found between 40 and 180 nucleotides upstream of the cap site. The
sequence of this element is much more variable but often includes the sequence
CAAT, hence the term "CAAT box". In plants it is referred to as the AGGA box.
Two zein genomic clones, Z4 and Z7, have sequences with limited homology to the
CAAT box. The leghemoglobin genes have three sequences upstream of the coding
region, all with some homology to the animal sequence. All of the plant
sequences have an interesting symmetry of adenines surrounding the
trinucleotide (G/T)NG.
Other regulatory elements also occur in this region. In general, these elements
seem to modify the promoter specificity, making transcription of the gene
responsive to particular environmental signals. Promoter specificity can often
be demonstrated by attaching promoter-containing sequences of one gene to the
coding sequences of another and introducing the resultant hybrid gene into
cells in which it can be expressed.
Although a few cases are known wherein regulatory elements occur in the gene
itself (a prime example is the 5S rRNA genes transcribed by RNA polymerase
III), transcriptional responses usually seem to be determined by the sequences
5' to the start of transcription, often called 5'-flanking sequences.
A good example is provided by the heat shock genes, which are activated when
cells are subjected to thermal stress. In both animals and plants it has been
possible to show that the 5'-flanking sequences from a heat shock gene can
confer responsiveness on a gene not normally responsive to heat shock. A
specific element within these sequences seems to be required for responsiveness
to heat shock and is located in a region which can be shown by direct binding
experiments to interact with a protein transcription factor. For example, in
Drosophila the flanking sequences of different heat shock genes contain a
common element that is 14 nucleotides long and located 11-28 bases upstream
from the TATA box.
Another interesting class of control elements is the enhancer sequences. These
elements were originally discovered in viral genes, in which they were shown to
be required for high levels of transcription. An "enhancer core" sequence such
as GGTGTGGAAAG, or more generally GTGG(T/A)(T/A)(T/A)G ("T/A" means that either
T or A is found in this position), can be identified by sequence comparisons,
although it is clear that sequences other than the core are also important.
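A consensus with alternative bases maps naturally onto a regular expression. The Python sketch below reads the general core as GTGG followed by three positions that may each be T or A, then G; that reading of the consensus, along with the function name, is an assumption, and a match on the core alone does not establish enhancer activity.

```python
import re

# Enhancer core consensus, read here as GTGG(T/A)(T/A)(T/A)G.
ENHANCER_CORE = re.compile(r"GTGG[TA]{3}G")


def find_enhancer_cores(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start position, matched text) for each core-like motif."""
    return [(m.start(), m.group()) for m in ENHANCER_CORE.finditer(sequence.upper())]


if __name__ == "__main__":
    # The specific core sequence quoted in the text contains the general motif.
    print(find_enhancer_cores("GGTGTGGAAAG"))  # prints [(3, 'GTGGAAAG')]
```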
Enhancer sequences are often associated with regions of alternating purines and
pyrimidines, which are capable of forming Z-DNA. Enhancer-like sequences in
plants stimulate the expression of genes such as RuBP carboxylase in cells
grown in the light but not in those grown in the dark.
In a strict sense, some of the control sequences just described are really
repeated sequences. Short elements, such as the TATA or CAAT boxes, appear in
many promoters and hence are repeated, but because they are so short they are
not recognized in hybridization experiments.

Enhancers may sometimes contain several repeats of a somewhat longer sequence
(such as the 72-nucleotide repeats of the SV40 enhancer). By themselves,
however, these do not reach sufficiently high copy numbers to be detected in
classical reassociation experiments, and no extensive sequence homology exists
among enhancers as a class, although some short homologies have been found.
Therefore, none of the gene regulatory elements discussed thus far can be
considered "classical" repeated sequences.
Evidence for the involvement of "classical" repeated sequences in gene
regulation is far more circumstantial and is based on the occurrence of repeat
sequence transcripts in the RNA population. Although most mRNAs are transcripts
of single-copy genes, repeat transcripts can often be found when cloned repeat
sequence probes are used to increase sensitivity. This is especially true of
nuclear RNAs, which contain transcripts of many interspersed repeat sequences.
An example is the transcription factor which binds to the heat shock promoter
in Drosophila. Genes encoding this factor would be examples of trans-acting
control elements. As their name implies, such elements are capable of affecting
genes other than those located on the same chromosome. Trans-acting regulatory
genes are well known from genetic studies, but our understanding of the
molecular nature of such elements, and of the factors which presumably mediate
their effects, is still quite limited.
Alternative splicing of potato invertase pre-mRNAs
Splicing of very small exons (mini-exons) in vertebrate pre-mRNAs often
requires sequences in the flanking intron(s) which promote inclusion of the
mini-exon. Such elements, called intronic splicing enhancers, induce
spliceosome assembly at splice sites which would otherwise not be utilised by
the spliceosome. In plants, invertase is encoded by a family of genes, the
majority of which include a second exon of only 9 nt. This plant mini-exon
encodes a highly conserved motif thought to be essential to the function of the
enzyme. It has been shown to be alternatively spliced (skipped) under cold
stress. To examine control of splicing of the mini-exon, we have expressed a
transcript consisting of part of exon 1, intron 1, mini-exon 2, intron 2 and
part of exon 3 in tobacco protoplasts and shown inclusion of exon 2.
Subsequently, a series of intron exchange constructs was made in which introns
1 and 2 were swapped with introns 4 and 6 of another invertase gene. Splicing
analysis of these constructs has demonstrated that the intron upstream of the
mini-exon (intron 1) is required for mini-exon splicing.
A molecular characterisation of Sucrose phosphate synthase in Lolium multiflorum
The enzyme sucrose phosphate synthase (SPS) catalyses the penultimate step in
sucrose biosynthesis and is a key enzyme in the regulation of carbon
partitioning between sucrose and starch. SPS activity has been correlated with
the partitioning of carbon between sucrose, starch and amino acids; however, at
the molecular level the expression and regulation of this enzyme are largely
unresolved. SPS genes have been cloned from a number of species and transgenic
plants have been produced. Thus far these studies have concerned relatively few
species (tomato, Arabidopsis, spinach).
The aims of this work are to:
(1) Isolate native Lolium SPS genes and study the regulation of their
expression.
(2) Over-express SPS cDNAs from maize and spinach to study the effects of this
on the expression of the native genes and to determine whether increased foliar
SPS activity affects carbon partitioning.
By conventional screening of a cDNA bacteriophage library and by PCR we are
attempting to clone Lolium SPS cDNAs to study the expression of this enzyme in
wild-type and transformed plants at various developmental stages. We have
selected plasmids encoding maize or spinach SPS cDNAs suitable for the
biolistic transformation of Lolium multiflorum cell suspensions. Transformed
plants will be utilised to assess the impact of SPS over-expression on carbon
metabolism at ambient and low temperatures.
Functional studies into the role of ACCase I in Brassica napus
The first committed step in fatty acid biosynthesis is the ATP-dependent
carboxylation of acetyl-CoA to malonyl-CoA, catalysed by acetyl-CoA carboxylase
(ACCase EC). Dicot plants contain two forms of this enzyme. ACCase I is a
single multifunctional polypeptide of 220-250 kDa, whereas ACCase II is a
multisubunit form analogous to those identified in prokaryotes. The proposed
roles for each form have been based upon their respective cellular
localisations. ACCase I is cytosolic, where it is believed to provide
malonyl-CoA for fatty acid elongation and flavonoid synthesis, whereas ACCase
II has been localised to the chloroplast, the site of fatty acid synthesis.
These studies have been confirmed by the presence of putative chloroplast
transit peptides (CTP) on cDNAs of ACCase II. However, two isoforms of ACCase I
from Brassica napus have recently been reported that contain a predicted CTP
that targets green fluorescent protein (GFP) to chloroplasts of tobacco (1),
suggesting a possible role for ACCase I in fatty acid biosynthesis. In order to
further examine the role of ACCase I, antisense constructs have been made to
ACCase I. Several lines showing reduced enzyme activity have been identified.
The Regulation of Seed oil breakdown in Arabidopsis
Oilseeds such as Arabidopsis store carbon in the form of triacylglycerol (TAG),
and this is mobilised to fuel seedling establishment and post-germinative
growth. Fatty acids from TAG are catabolised by β-oxidation, producing
acetyl-CoA that is converted to sucrose through the glyoxylate cycle and
gluconeogenesis. The abundance of key enzymes in lipid mobilisation peaks at 2
days post germination, and northern blots demonstrate that this expression
pattern is regulated predominantly at the transcript level. Furthermore,
promoter::reporter fusions for PED1 (thiolase), malate synthase, isocitrate
lyase and phosphoenolpyruvate carboxykinase show that the rate of transcription
is the pivotal regulatory step. To further investigate the factors required for
this regulation, a transgenic line expressing the luciferase reporter gene
under the control of the malate synthase promoter was mutagenised. 14 mutants
designated REGULATOR OF THE GLYOXYLATE CYCLE (RGC) have been identified that
show an increase in luciferase activity after germination.
