

Cloning Vectors


Cloning vector
A cloning vector is a small piece of DNA, taken from a virus, a plasmid, or the cell of a higher organism,
that can be stably maintained in an organism, and into which a foreign DNA fragment can be inserted
for cloning purposes. The vector therefore contains features that allow for the convenient insertion of
a DNA fragment into the vector or its removal from it, for example by treating the vector and the foreign DNA
with a restriction enzyme that creates the same overhang, then ligating the fragments together. After a
DNA fragment has been cloned into a cloning vector, it may be further subcloned into another vector
designed for more specific use.
There are many types of cloning vectors, but the most commonly used ones are genetically
engineered plasmids. Cloning is generally first performed using Escherichia coli, and cloning vectors in
E. coli include plasmids, bacteriophages (such as phage λ), cosmids, and bacterial artificial
chromosomes (BACs). Some DNA, however, cannot be stably maintained in E. coli, for example very large
DNA fragments, and other organisms such as yeast may be used. Cloning vectors in yeast include yeast
artificial chromosomes (YACs).
Features of a cloning vector
All commonly-used cloning vectors in molecular biology have key features necessary for their function
such as a suitable cloning site and selectable marker. Others may have additional features specific to
their use. For ease and convenience, cloning is often performed using E. coli, so cloning
vectors often have elements necessary for their propagation and maintenance in E. coli,
such as a functional origin of replication (ori). The ColE1 origin of replication is found in many
plasmids. Some vectors also include elements that allow them to be maintained in another organism in
addition to E. coli, and these vectors are called shuttle vectors.
Cloning site
All cloning vectors have features that allow a gene to be conveniently inserted into the vector or removed
from it. This may be a multiple cloning site (MCS) which contains many unique restriction sites. The
restriction sites in the MCS are first cleaved by restriction enzymes, and a PCR-amplified target gene,
also digested with the same enzymes, is then ligated into the vector using DNA ligase. The target DNA
sequence can be inserted into the vector in a specific direction if so desired. The restriction sites may be
further used for sub-cloning into another vector if necessary.
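The digest-and-ligate step described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: sequences are treated as top-strand strings, the two enzymes (EcoRI cutting G^AATTC and BamHI cutting G^GATCC) are chosen purely for illustration, each site is assumed to occur once in the vector and once in the insert, and the function names are hypothetical.

```python
# Illustrative enzymes only: recognition site and the top-strand index
# after which the enzyme cuts (EcoRI: G^AATTC, BamHI: G^GATCC).
ENZYMES = {
    "EcoRI": ("GAATTC", 1),
    "BamHI": ("GGATCC", 1),
}

def digest(seq, enzyme):
    """Cut a linear top-strand sequence at every site of the given enzyme."""
    site, offset = ENZYMES[enzyme]
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])
    return fragments

def directional_clone(vector_mcs, insert, enzyme_5, enzyme_3):
    """Swap the MCS stretch between two different sites for the insert.

    Using two different enzymes fixes the orientation of the insert,
    because each end can only pair with its matching overhang.
    """
    left = digest(vector_mcs, enzyme_5)[0]        # vector arm before the 5' cut
    right = digest(vector_mcs, enzyme_3)[-1]      # vector arm after the 3' cut
    core = digest(digest(insert, enzyme_5)[-1], enzyme_3)[0]  # trimmed insert
    return left + core + right

vector = "TTTGAATTCAAAAGGATCCTTT"      # hypothetical MCS region
pcr = "CCGAATTCATGCATGCGGATCCGG"       # hypothetical PCR product with both sites
print(directional_clone(vector, pcr, "EcoRI", "BamHI"))
# -> TTTGAATTCATGCATGCGGATCCTTT
```

Both recognition sites are regenerated at the junctions, which is why the same sites can be reused for later sub-cloning.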
Other cloning vectors may use topoisomerase instead of ligase, and cloning may be done more rapidly
without the need for restriction digest of the vector or insert. In this TOPO cloning method a linearized
vector is activated by attaching topoisomerase I to its ends, and this "TOPO-activated" vector may then
accept a PCR product by ligating to both the 5' ends of the PCR product, releasing the topoisomerase and
forming a circular vector in the process. Another method of cloning without the use of ligase is by DNA
recombination, for example as used in the Gateway cloning system. The gene, once cloned into the
cloning vector (called an entry clone in this method), may be conveniently introduced into a variety of
expression vectors by recombination.
Selectable marker
A selectable marker is carried by the vector to allow the selection of
positively transformed cells. Antibiotic resistance is often used as a marker; an example is the
beta-lactamase gene, which confers resistance to the penicillin group of beta-lactam antibiotics such as ampicillin.
Some vectors contain two selectable markers; for example, the plasmid pACYC177 has both ampicillin
and kanamycin resistance genes. A shuttle vector, which is designed to be maintained in two different
organisms, may also require two selectable markers, although some selectable markers such as
resistance to zeocin and hygromycin B are effective in different cell types. Auxotrophic selection markers
that allow an auxotrophic organism to grow in minimal growth medium may also be used; examples of
these are LEU2 and URA3 which are used with their corresponding auxotrophic strains of yeast.
Reporter gene
To facilitate the screening of successful clones, some cloning vectors contain features that allow
successful clones to be identified. Such features may be the lacZα fragment for
α complementation in blue-white selection, and/or marker or reporter genes in frame with and
flanking the MCS to facilitate the production of fusion proteins. Examples of fusion partners that may be
used for screening are the green fluorescent protein (GFP) and luciferase.
Elements for expression
A cloning vector need not contain suitable elements for the expression of a cloned target gene; many,
however, do, and may then work as an expression vector. The target DNA may be inserted into a site that
is under the control of a particular promoter necessary for the expression of the target gene in the chosen
host. Where the promoter is present, the expression of the gene is preferably tightly controlled
and inducible so that proteins are only produced when required. Some commonly used promoters are the T7
and lac promoters. The presence of a promoter is necessary when screening techniques such
as blue-white selection are used.
Cloning vectors without promoter and ribosomal binding site (RBS) for the cloned DNA sequence are
sometimes used, for example when cloning genes whose products are toxic to E. coli cells. Promoter and
RBS for the cloned DNA sequence are also unnecessary when first making a genomic or cDNA library of
clones, since the cloned genes are normally subcloned into a more appropriate expression vector if their
expression is required.


Types of cloning vectors
A large number of cloning vectors are available, and the choice of vector may depend on a number of
factors, such as the size of the insert, copy number and cloning method. A large insert may not be stably
maintained in a general cloning vector, especially one with a high copy number, so cloning
large fragments may require a more specialized cloning vector.

The pUC plasmid has a high copy number, contains a multiple cloning site, a gene for ampicillin antibiotic
selection, and can be used for blue-white screening.

Plasmids are the standard cloning vectors and the most commonly used. Most general plasmids may be
used to clone DNA inserts of up to 15 kb in size. One of the earliest commonly used cloning vectors is
the pBR322 plasmid. Other cloning vectors include the pUC series of plasmids, and a large number of
different cloning plasmid vectors are available. Many plasmids have a high copy number, for
example pUC19, which has a copy number of 500-700 copies per cell, and a high copy number is useful as it
produces a greater yield of recombinant plasmid for subsequent manipulation. However, low-copy-number
plasmids may be preferred in certain circumstances, for example when the protein from the cloned
gene is toxic to the cells.
Some plasmids contain an M13 bacteriophage origin of replication and may be used to generate
single-stranded DNA. These are called phagemids, and examples are the pBluescript series of cloning vectors.
The bacteriophages used for cloning are the λ phage and M13 phage. There is an upper limit on the
amount of DNA that can be packed into a phage (a maximum of 53 kb), so to allow foreign DNA to
be inserted into phage DNA, phage cloning vectors need to have some non-essential genes deleted, for
example the genes for lysogeny in phage λ. There are two kinds of λ phage vectors: insertion vectors and
replacement vectors. Insertion vectors contain a unique cleavage site whereby foreign DNA with a size of
5–11 kb may be inserted. In replacement vectors, the cleavage sites flank a region containing genes not
essential for the lytic cycle, and this region may be deleted and replaced by the DNA insert in the cloning
process, so that a larger DNA of 8–24 kb may be inserted.
There is also a lower size limit for DNA that can be packed into a phage, and vectors that are too small
cannot be properly packaged into the phage. This property can be used for selection: a vector without
an insert may be too small, so only vectors with an insert are selected for propagation.
Cosmids are plasmids that incorporate a segment of bacteriophage λ DNA that has the cohesive end site
(cos), which contains elements required for packaging DNA into λ particles. They are normally used to clone
large DNA fragments of between 28 and 45 kb.
Bacterial artificial chromosome
Inserts of up to 350 kb can be cloned in a bacterial artificial chromosome (BAC). BACs are maintained
in E. coli with a copy number of only 1 per cell. BACs are based on the F plasmid; another artificial
chromosome, called the PAC, is based on the P1 phage.
Yeast artificial chromosome
Inserts of up to 3,000 kb may be carried by a yeast artificial chromosome (YAC).
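As a rough illustration of how insert size narrows the choice of vector, the capacities quoted in this section can be put into a small lookup. This is a sketch, not a selection tool: the lower bound of 0 kb for plasmids, BACs and YACs is an assumption (the text only gives upper limits for them), and the function name is hypothetical.

```python
# Approximate insert capacities in kb, taken from the figures quoted above.
# A lower bound of 0 is an assumption where the text gives only an upper limit.
CAPACITY_KB = [
    ("plasmid", 0, 15),
    ("phage insertion vector", 5, 11),
    ("phage replacement vector", 8, 24),
    ("cosmid", 28, 45),
    ("BAC", 0, 350),
    ("YAC", 0, 3000),
]

def candidate_vectors(insert_kb):
    """Return the vector types whose quoted capacity range covers the insert."""
    return [name for name, low, high in CAPACITY_KB if low <= insert_kb <= high]

print(candidate_vectors(40))    # -> ['cosmid', 'BAC', 'YAC']
print(candidate_vectors(1000))  # -> ['YAC']
```

In practice copy number, stability of the insert and the cloning method also matter, so size is only the first filter.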
Human artificial chromosome
Human artificial chromosomes may be potentially useful as gene transfer vectors for gene delivery into
human cells, and as a tool for expression studies and for determining human chromosome function. They can carry
very large DNA fragments (there is no upper limit on size for practical purposes), so they do not have
the problem of limited cloning capacity of other vectors, and they also avoid possible insertional
mutagenesis caused by integration into host chromosomes by viral vectors.
Screening: example of the blue/white screen
Many general purpose vectors such as pUC19 usually include a system for detecting the presence of a
cloned DNA fragment, based on the loss of an easily scored phenotype. The most widely used is the
gene coding for E. coli β-galactosidase, whose activity can easily be detected by the ability of the enzyme
it encodes to hydrolyze the soluble, colourless substrate X-gal
(5-bromo-4-chloro-3-indolyl-beta-D-galactoside) into an insoluble, blue product
(5,5'-dibromo-4,4'-dichloro indigo). Cloning a fragment of DNA
within the vector-based lacZα sequence of the β-galactosidase prevents the production of an active
enzyme. If X-gal is included in the selective agar plates, transformant colonies are generally blue in the
case of a vector with no inserted DNA and white in the case of a vector containing a fragment of cloned
DNA.
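The logic of the blue/white screen can be written out as a small decision function. This is a deliberately simplified sketch: it assumes that any insert fully inactivates the lacZ fragment (the text only says colonies are "generally" blue or white), and the function and parameter names are hypothetical.

```python
def colony_colour(has_plasmid, has_insert, x_gal=True, antibiotic=True):
    """Predict the appearance of a colony on a screening plate.

    Simplifying assumption: an insert always disrupts the lacZ fragment.
    """
    if antibiotic and not has_plasmid:
        return "no growth"   # untransformed cells are killed by the selection
    lacz_active = has_plasmid and not has_insert
    if x_gal and lacz_active:
        return "blue"        # intact lacZ fragment, X-gal is hydrolysed
    return "white"           # disrupted lacZ fragment, or no X-gal on the plate

print(colony_colour(has_plasmid=True, has_insert=False))  # -> blue
print(colony_colour(has_plasmid=True, has_insert=True))   # -> white
```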

SV40 is an abbreviation for Simian vacuolating virus 40 or Simian virus 40, a polyomavirus that is
found in both monkeys and humans. Like other polyomaviruses, SV40 is a DNA virus that has the
potential to cause tumors, but most often persists as a latent infection.
SV40 became a highly controversial subject after it was revealed that millions were exposed to the virus
after receiving a contaminated polio vaccine.
The virus was first identified by Bernice E. Eddy (building on her work with Sarah
Stewart on polyomaviruses) in 1960 in cultures of rhesus monkey kidney cells that were being used
to produce polio vaccine. It was named for the effect it produced on infected green monkey cells, which
developed an unusual number of vacuoles. This observation was repeated and confirmed by Hilleman
and Sweet, who were employed by Merck in their vaccine division. The complete viral genome was
sequenced by Walter Fiers and his team at the University of Ghent (Belgium) in 1978.
The virus is dormant and asymptomatic in rhesus monkeys. The virus has been found in
many macaque populations in the wild, where it rarely causes disease. However, in monkeys that
are immunodeficient due to, for example, infection with simian immunodeficiency virus, SV40 acts
much like the human JC and BK polyomaviruses, producing kidney disease and sometimes
a demyelinating disease similar to PML. In other species, particularly hamsters, SV40 causes a variety of
tumors, generally sarcomas. In rats, the oncogenic SV40 Large T-antigen was used to establish a brain
tumor model for PNETs and medulloblastomas.
The molecular mechanisms by which the virus reproduces and alters cell function were previously
unknown, and research into SV40 vastly increased biologists' understanding of gene expression and the
regulation of cell growth.
SV40 consists of an unenveloped icosahedral virion with a closed circular dsDNA genome of 5 kb. The
virion adheres to cell surface receptors of MHC class I by the virion glycoprotein VP1. Penetration into
the cell is through a caveolin vesicle. Inside the cell nucleus, the cellular RNA polymerase II acts to
promote early gene expression. This results in an mRNA that is spliced into two segments. The small and
large T antigens result from this. The large T antigen has two functions: 5% will go to the plasma
membrane of the cell and 95% will return to the nucleus. Once in the nucleus the large T antigen binds
three viral DNA sites, I, II, and III. Binding of sites I and II autoregulates early RNA synthesis. Binding to
site II takes place in each cell cycle. Binding site I initiates DNA replication at the origin of replication.
Early transcription gives two spliced RNAs that are both 19s. Late transcription gives both a longer 16s,
which synthesizes the major viral capsid protein VP1; and the smaller 19s, which gives VP2, and VP3
through leaky scanning. All of the proteins, besides the 5% of large T, return to the nucleus because
assembly of the viral particle happens in the nucleus. Eventual release of the viral particles is cytolytic
and results in cell death.
Multiplicity Reactivation
SV40 is capable of multiplicity reactivation (MR). MR is the process by which two, or more, virus genomes
containing otherwise lethal damage interact within an infected cell to form a viable virus genome.
Yamamoto and Shimojo observed MR when SV40 virions were irradiated with UV light and allowed to
undergo multiple infection of host cells. Hall studied MR when SV40 virions were exposed to the DNA
crosslinking agent 4,5',8-trimethylpsoralen. Under conditions where only a single virus particle entered
each host cell, approximately one DNA cross-link was lethal to the virus and could not be repaired. In
contrast, when multiple viral genomes infected a host cell, psoralen induced DNA cross-links were
repaired; that is, MR occurred. Hall suggested that the virions with cross-linked DNA were repaired by
recombinational repair. Michod et al. reviewed numerous examples of MR in different viruses, and
suggested that MR is a common form of sexual interaction that provides the advantage of
recombinational repair of genome damages.
The early promoter for SV40 contains three elements. The TATA box is located approximately 20 base-
pairs upstream from the transcriptional start site. The 21 base-pair repeats contain six GC boxes and are
the site that determines the direction of transcription. Also, the 72 base-pair repeats are transcriptional
enhancers. When the SP1 protein interacts with the 21 bp repeats it binds either the first or the last three
GC boxes. Binding of the first three initiates early expression and binding of the last three initiates late
expression. The function of the 72 bp repeats is to enhance the amount of stable RNA and increase the
rate of synthesis. This is done by binding (dimerization) with the AP1 (activator protein 1) to give a
primary transcript that is 3' polyadenylated and 5' capped.
Theorized role in human disease


The hypothesis that SV40 might cause cancer in humans has been a particularly controversial area of
research. Several different methods have been used to detect SV40 in a variety of human cancers,
although how reliable these detection methods are, and whether SV40 has any role in causing these
tumors, remains unclear. As a result of these uncertainties, academic opinion remains divided, with some
arguing that this hypothesis is not supported by the data, and others arguing that some cancers may
involve SV40. However, the United States National Cancer Institute announced in 2004 that although
SV40 does cause cancer in some animal models, "substantial epidemiological evidence has accumulated
to indicate that SV40 likely does not cause cancer in humans" This announcement is based on two recent
p53 Damage and carcinogenicity
SV40 is believed to suppress the transcriptional properties of the tumor-suppressing p53 in humans
through the SV40 Large T-antigen and SV40 Small T-antigen. p53 is responsible for initiating regulated
cell death ("apoptosis"), or cell cycle arrest when a cell is damaged. A mutated p53 gene may contribute
to uncontrolled cellular proliferation, leading to a tumor.
SV40 may act as a co-carcinogen with crocidolite asbestos to cause both peritoneal and pleural mesothelioma.
When SV40 infects nonpermissive cells, such as 3T3 mouse cells, the dsDNA of SV40 becomes
covalently integrated. In nonpermissive cells only the early gene expression occurs and this leads to
transformation, or oncogenesis. The nonpermissive host needs the Large T-antigen and the Small t-
antigen in order to function. The Small T-antigen interacts with and integrates with the cellular
phosphatase PP2A. This causes the cell to lose the ability to initiate transcription.
Polio vaccine contamination
Soon after its discovery, SV40 was identified in the oral form of the polio vaccine produced between 1955
and 1961 by American Home Products (dba Lederle). This is believed to be due to two sources:
1) SV40 contamination of the original seed strain (coded SOM); 2) contamination of the substrate -
primary kidney cells from infected monkeys used to grow the vaccine virus during production. Both
the Sabin vaccine (oral, live virus) and the Salk vaccine (injectable, killed virus) were affected; the
technique used to inactivate the polio virus in the Salk vaccine, by means of formaldehyde, did not
reliably kill SV40.
It was difficult to detect small quantities of virus until the advent of PCR; since then, stored samples of
vaccine made after 1962 have tested negative for SV40, but no samples prior to 1962 could initially be
found. Then, in 1997, Herbert Ratner of Oak Park, Illinois, gave some vials of 1954 Salk vaccine to
researcher Michele Carbone. Ratner, the Health Commissioner of Oak Park at the time the Salk vaccine
was introduced, had kept these vials of vaccine in a refrigerator for over forty years.
Upon testing this vaccine, Carbone discovered that it contained not only the SV40 strain already known to
have been in the Salk vaccine (containing two 72-bp enhancers) but also the same slow-growing SV40
strain currently being found in some malignant tumors and lymphomas (containing one 72-bp
enhancer). It is unknown how widespread the virus was among humans before the 1950s, though one
study found that 12% of a sample of German medical students in 1952 had SV40 antibodies.
Although horizontal transmission between people has been proposed, it is not clear if this actually
happens and, if it does, how frequently it occurs.
An analysis presented at the Vaccine Cell Substrate Conference in 2004 suggested that vaccines used
in the former Soviet bloc countries, China, Japan, and Africa could have been contaminated up to 1980,
meaning that hundreds of millions more could have been exposed to the virus unknowingly.

Baculoviral vectors:

Baculoviruses are a diverse group of insect viruses. Autographa californica multiple nucleopolyhedrovirus
(AcMNPV) and Bombyx mori nucleopolyhedrovirus (BmNPV) are just two examples.
The genome of these viruses is circular and about 128 to 200 kbp in size (AcMNPV = 128 kbp).
Insect cell lines used for the expression of viral genomes include those derived from Spodoptera
frugiperda (Sf-9 cell lines). On infection, the viruses multiply within epithelial cells of the gut.
Viral genes are expressed in a temporal fashion: early, late and very late.
Early gene products initiate viral DNA replication and also activate the expression of late genes. This
results in the multiplication of the genome as well as of viruses, which in turn bud off as enveloped
viruses (Env) that go on to infect a new set of cells.
Very late genes, which code for polyhedrin proteins, are expressed about 18 to 20 hrs after infection. At
this stage the host cell nuclear membrane proliferates and primary viruses are occluded in the
polyhedrin matrix protein (29 kDa).
To use them for the expression of foreign genes, first clone the polyhedrin gene along with its flanking
regions into a plasmid for recombination purposes.



~~~ = End of a circular plasmid.
V-DNA = Viral DNA as flanking regions.

Then delete the polyhedrin gene and in its place put the desired gene under the control of the polyhedrin
promoter. Then co-transfect insect cell lines with the plasmid DNA and the viral DNA. In insect cells, homologous
recombination results in the incorporation of the desired gene. The viral DNA replicates to 800-1000
copies per cell and later generates primary virions, which can infect further cells.
Mouse IgG against Pseudomonas aeruginosa lipoprotein has been expressed in baculoviral
recombinants. In this case the transfer vector was constructed in such a way that the light and heavy chain
genes were cloned separately under specific polyhedrin promoters with flanking DNA from the virus. When
wild-type viral DNA and the transfer DNA are co-transfected into insect cells, homologous recombination
results in the integration of both IgG genes under promoters that are expressed in insect cells. To
scale up production, one can directly infect larvae instead of cells.
For example, in recombinant Trichoplusia ni larvae containing the human adenosine deaminase gene, the
expressed protein level was found to be 3-5% of the cellular proteins. For continuous production, the
desired gene has to be constructed under an early gene promoter, since early promoters do not require any viral
gene products for expression. It is also possible to integrate the gene construct into host cells and
express it continuously to obtain more of the recombinant protein. An advantage of using baculoviruses is
that one can clone a DNA fragment 15-20 kbp long. Infection and growth can be achieved by directly
injecting the modified virus. Proteins expressed in insect cells are found to have the characteristic
post-translational modifications, although only in some cases is glycosylation the same as in mammalian systems.


pAC UW31 transfer Vector:

Derived from Baculo viral genome.

~~-M13 oriAmp^+->---ColE1 ori----ACNPV 2099SV40P-X<-EcoR1,
BglII, Xba1--Phy/P-BamH1X ACNPV 2493bp----~~

~~ = Circular end of the transfer vector.
X = foreign genes.
Two genes can be expressed, for example to understand protein-protein interactions.

Bac pAK transfer vector:

~~---<Bsu-361---i-P/PolyH-Lac-Z- I I- - -ORF1629-Ori-Amp^+~~

ORF1629 is required for viral replication.
The polyhedrin promoter drives lacZ, which gives colour to nonrecombinant plaques.

pBac pAK 8 and 9: transfer vectors:

These vectors are designed for high-level expression. They consist of a strong polyhedrin promoter flanked by
homologous sequences for recombination. The MCS has 18 sites, and the other features include an M13 origin,
Amp^+ and a ColE1 ori.

~~--M13ori --pUC ori----V-DNAP-PH-MCS-poly (A) V-DNA--~~~
~~ = Circular ends.
V-DNA = flanking viral DNA.
MCS in pAK8 = BamH1, Ssc1, Pst1, Stu1, Xho1, Bst-B, Xba1, BglII, Asp71811, Sma1, Eag1, Not1, Pac1, Pac1.
In pAK9 the MCS is in reverse order.

Retroviral derived vectors:

There are a large number of RNA viruses, among which the retroviruses are important. When they infect
compatible cells, the viral genome is replicated and integrates into the host
genome as a cDNA that is a little longer than the original viral RNA. Only after activation do the
cells carrying the integrated viral genome produce viral particles.
When the viral genomic RNA is replicated, it produces a ds cDNA with long terminal repeats (LTRs) at its
ends, hence its size is slightly larger than that of the viral genome.
Such ds viral DNA can be retrieved. The LTR segments and viral packaging sequences can be used to
construct an expression vector. The LTR sequences also contain promoter elements. Between the
LTR segments one can introduce the required genes under specific promoters, and also a
selection marker gene under a specific promoter. If the vector has viral packaging sequences, the viral DNA
can be packaged into viral particles, which can be obtained in large numbers and used to
infect a particular tissue. Upon infection the viral particle releases the DNA into the cytoplasm and the
circular DNA enters the nucleus, where the DNA integrates into the host
genome using the LTR sequences. The transfected cells can be screened using the selection marker gene and
amplified, and they can then be used to obtain the gene product, or the transgenic cells can be used for
gene therapy.

How to develop a live recombinant virus:


To develop a live recombinant virus, one first has to transfer the viral capsid genes into specific host
cells, where the genes are inserted in an expressible but regulatable form. Such cells can be maintained
and expanded; they are called helper cell lines.
The helper cells are transfected with the viral construct described above and are then stimulated to produce
capsid proteins. As the vector DNA has packaging sequences, the capsid proteins bind to them,
and complete packaging leads to the production of full viral particles. If the cell line also carries
genes for specific envelope proteins, these proteins are incorporated into the viral envelope, and the
virus can be targeted to a specific type of tissue.
When the tissue is infected with the recombinant viruses, the released DNA goes into the nucleus, where
it is integrated into the chromosomal DNA using the LTR sequences, and the rest of the DNA is degraded.
This method is important because one can deliver the viruses into specific tissues without
causing any adverse effects. Any construct with LTR sequences can also be transferred directly
into cells by any of the transfection methods. The transfected DNA
is integrated into the host genome using the LTR sequences, which is greatly facilitated if the cells are
mitotically active. This method has been employed in gene therapy.

-U3-R-U5-I~~-P---X-I-I-geneTtrI---PKan+ --TtrU3-R-U5-

U3RU5 = LTR sequences,
~~ = Packaging sequences
II = Introns.
Ttr = transcriptional terminator.


Expression vector
An expression vector, otherwise known as an expression construct, is usually a plasmid or virus
designed for protein expression in cells. The vector is used to introduce a specific gene into a target cell,
and can commandeer the cell's mechanism for protein synthesis to produce the protein encoded by the
gene. The plasmid is engineered to contain regulatory sequences that act
as enhancer and promoter regions and lead to efficient transcription of the gene carried on the expression
vector. The goal of a well-designed expression vector is the production of a significant amount of
stable messenger RNA, and therefore proteins. Expression vectors are basic tools for biotechnology and
the production of proteins, for example insulin, which is important for the medical treatment of diabetes.
Elements of expression vectors


An expression vector has features that any vector may have, such as an origin of replication, a selectable
marker, and a suitable site for the insertion of a gene such as the multiple cloning site. The cloned gene
may be transferred from a specialized cloning vector to an expression vector, although it is possible to
clone directly into an expression vector. The cloning process is normally performed in Escherichia coli,
and vectors used for protein expression in organisms other than E. coli may have, in addition to a suitable
origin of replication for their propagation in E. coli, elements that allow them to be maintained in another
organism; these vectors are called shuttle vectors.
Elements for expression
An expression vector must have elements necessary for protein expression. These include a strong
promoter, the correct translation initiation sequence such as a ribosomal binding site and start codon, a
strong termination codon, and a transcription termination sequence. There are differences in the
machinery for protein synthesis between prokaryotes and eukaryotes, so an expression vector
must have the elements for expression that are appropriate for the chosen host. For example, prokaryotic
expression vectors would have a Shine-Dalgarno sequence at the translation initiation site for the binding
of ribosomes, while eukaryotic expression vectors would contain the Kozak consensus sequence.
The promoter initiates transcription and is therefore the point of control for the expression of the
cloned gene. The promoters used in expression vectors are normally inducible, meaning that protein
synthesis is only initiated when required by the introduction of an inducer such as IPTG. Protein
expression, however, may also be constitutive (i.e. protein is constantly expressed) in some expression
vectors. A low level of constitutive protein synthesis may occur even in expression vectors with
tightly controlled promoters.
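The host-specific translation initiation signals mentioned above can be checked with a short script. The sketch below makes several assumptions that are not in the text: it uses GGAGG as a core Shine-Dalgarno motif, searches a 15-nucleotide window upstream of the start codon, and scores the Kozak context only by the two positions usually considered most important (a purine at -3 and a G at +4). Function names and example sequences are hypothetical.

```python
def has_shine_dalgarno(seq, atg, core="GGAGG", max_upstream=15):
    """Look for a Shine-Dalgarno-like motif shortly upstream of the start codon.

    The core motif and the window size are illustrative assumptions.
    """
    upstream = seq[max(0, atg - max_upstream):atg]
    return core in upstream

def kozak_strength(seq, atg):
    """Classify the Kozak context of the ATG that starts at index `atg`."""
    if seq[atg:atg + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    purine_at_minus3 = atg >= 3 and seq[atg - 3] in "AG"
    g_at_plus4 = atg + 3 < len(seq) and seq[atg + 3] == "G"
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"

# Hypothetical prokaryotic leader: motif a few bases upstream of ATG (index 15).
print(has_shine_dalgarno("TTAGGAGGTTAACATATGAAA", 15))  # -> True

# Hypothetical eukaryotic context GCCACCATGG: A at -3 and G at +4.
print(kozak_strength("GCCACCATGGCT", 6))                # -> strong
```

A check like this is only a first pass; spacing to the start codon and mRNA secondary structure also influence initiation.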
Protein tags
After the expression of the gene product, it is usually necessary to purify the expressed protein. However,
separating the protein of interest from the great majority of proteins of the host cell can be a protracted
process. To make this purification process easier, a purification tag may be added to the cloned gene.
This tag could be a histidine (His) tag, another marker peptide, or a fusion partner such as glutathione
S-transferase or maltose-binding protein. Other fusion partners such as green fluorescent protein may act
as a reporter.
The expression vector is transformed or transfected into the host cell for protein synthesis. Some
expression vectors may have elements for transformation or the insertion of DNA into the host
chromosome, for example the vir genes for plant transformation, and integrase sites for chromosomal
insertion.


Some vectors may include a targeting sequence that directs the expressed protein to a specific location
such as the periplasmic space of bacteria.
Expression systems
Different organisms may be used to express a target protein, and the expression vector used will therefore
have elements specific for use in the particular organism. The most commonly used organism for protein
expression is the bacterium Escherichia coli. However, not all proteins can be successfully expressed in
E. coli, and other systems may therefore be used.

An example of a bacterial expression vector is the pGEX-3x plasmid.
The expression host of choice for many proteins is Escherichia coli, as the production of
heterologous protein in E. coli is relatively simple and convenient, as well as being rapid and cheap. A
large number of E. coli expression plasmids are also available, suitable for a wide variety of needs. Other
bacteria used for protein expression include Bacillus subtilis.
Most proteins are expressed in the cytoplasm of E. coli, but where necessary, for example when the
protein can only fold correctly in an oxidizing environment due to the presence of disulphide bonds, the
protein may be targeted to the periplasmic space by the use of an N-terminal signal sequence.
More sophisticated systems are also being developed; such systems may allow for the expression of proteins
previously thought impossible in E. coli, such as glycosylated proteins.
The promoters used for these vectors are usually based on the promoter of the lac operon or
the T7 promoter, and they are normally regulated by the lac operator. These promoters may also be hybrids
of different promoters; for example, the tac promoter is a hybrid of the trp and lac promoters. Note that most
commonly used lac or lac-derived promoters are based on the lacUV5 mutant, which is insensitive to
catabolite repression. This mutant allows for expression of protein under the control of the lac promoter
when the growth medium contains glucose, since glucose would inhibit protein expression if the
wild-type lac promoter were used.
Examples of E. coli expression vectors are the pGEX series of vectors, where glutathione-S-transferase is
used as a fusion partner, and the pET series of vectors, which use a T7 promoter.
It is possible to simultaneously express two or more different proteins in E. coli using different plasmids.
However, when two or more plasmids are used, each plasmid needs to use a different antibiotic selection as
well as a different origin of replication; otherwise the plasmids may not be stably maintained. Another
approach would be to use a single bicistronic vector or design the coding sequences in tandem as a bi- or
poly-cistronic construct.
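The co-expression rule above (a distinct antibiotic selection and a distinct origin of replication for each plasmid) can be sketched as a simple check. This is a minimal illustration, not a design tool: the plasmid names are hypothetical, and comparing origins by name is a simplification, since in practice compatibility depends on incompatibility groups rather than on labels.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Plasmid:
    name: str    # hypothetical label, for illustration only
    origin: str  # origin of replication, compared by name (a simplification)
    marker: str  # antibiotic used for selection


def find_conflicts(plasmids):
    """Return (feature, value, names) for each origin or marker shared by 2+ plasmids."""
    conflicts = []
    for feature in ("origin", "marker"):
        groups = defaultdict(list)
        for plasmid in plasmids:
            groups[getattr(plasmid, feature)].append(plasmid.name)
        for value, names in groups.items():
            if len(names) > 1:
                conflicts.append((feature, value, names))
    return conflicts


compatible = [
    Plasmid("plasmid_A", origin="ColE1", marker="ampicillin"),
    Plasmid("plasmid_B", origin="p15A", marker="chloramphenicol"),
]
clashing = [
    Plasmid("plasmid_A", origin="ColE1", marker="ampicillin"),
    Plasmid("plasmid_C", origin="ColE1", marker="kanamycin"),
]

print(find_conflicts(compatible))  # nothing shared between the two plasmids
print(find_conflicts(clashing))    # both plasmids carry the same origin
```

An empty result means the set passes this naive check; any entry names the shared feature and the plasmids involved.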
A yeast commonly used for protein expression is Pichia pastoris. Examples of yeast expression vectors
in Pichia are the pPIC series of vectors, and these vectors use the AOX1 promoter, which is inducible
with methanol. The plasmids may contain elements for insertion of foreign DNA into the yeast genome
and a signal sequence for the secretion of the expressed protein. Proteins with disulphide bonds and
glycosylation can be efficiently produced in yeast. Another yeast used for protein expression
is Kluyveromyces lactis, in which expression is driven by a variant of the strong lactase LAC4 promoter.

Saccharomyces cerevisiae is also commonly used for protein expression, for example in the yeast two-hybrid
system for the study of protein-protein interaction. The vectors used in the yeast two-hybrid system contain
fusion partners for two cloned genes that allow the transcription of a reporter gene when there is
interaction between the two proteins expressed from the cloned genes.
In the baculovirus expression system, baculovirus, a rod-shaped virus that infects insect cells, is used as
the expression vector. Insect cell lines derived from Lepidopterans (moths and butterflies), such as
Spodoptera frugiperda, are used as the host. The shuttle vector is called a bacmid, and protein expression
is under the control of a strong promoter, pPolh. The system is normally used for production of
glycoproteins, although the glycosylations may be different from those found in vertebrates. It is safer to
use than mammalian viruses as it has a limited host range and does not infect vertebrates.
Many plant expression vectors are based on the Ti plasmid of Agrobacterium tumefaciens. In these
expression vectors, DNA to be inserted into the plant is cloned into the T-DNA, a stretch of DNA flanked by a
25-bp direct repeat sequence at either end, which can integrate into the plant genome. The T-DNA
also contains the selectable marker. The Agrobacterium provides a mechanism for transformation and
integration into the plant genome, and the promoters for its vir genes may also be used for the cloned
genes.
Some plant viruses may be used as vectors since Agrobacterium does not work for all plants. Examples
of plant viruses used are the tobacco mosaic virus (TMV), potato virus X, and cowpea mosaic virus. The
protein may be expressed as a fusion to the coat protein of the virus and displayed on the surface of
assembled viral particles, or as an unfused protein that accumulates within the plant. Expression in plants
using plant vectors is often constitutive, and a commonly used constitutive promoter in plant expression
vectors is the cauliflower mosaic virus (CaMV) 35S promoter.
Cultured mammalian cell lines such as the Chinese hamster ovary (CHO), HEK and COS cell lines are
used to produce protein. Vectors are transfected into the cells and the DNA may be integrated into the
genome by homologous recombination in the case of stable transfection, or the cells may be transiently
transfected. Examples of mammalian expression vectors include the adenoviral vectors, the pSV and the
pCMV series of vectors. The promoters for cytomegalovirus (CMV) and SV40 are commonly used in
mammalian expression vectors to drive protein expression.
Cell-free systems
E. coli cell lysate containing the cellular components required for transcription and translation is used in
this in vitro method of protein expression. The advantage of such a system is that protein may be produced
much faster than in vivo, but it is more expensive. Vectors used for E. coli expression can
be used in this system, although specifically designed vectors for this system are also available. Eukaryotic
cell extracts may also be used in other cell-free systems.
Laboratory use
Using an expression vector in an expression host is now the usual method used in laboratories to produce proteins
for research. Most proteins are produced in E. coli, but for glycosylated proteins and those with disulphide
bonds, yeast, baculovirus and mammalian systems may be used.
Production of peptide and protein pharmaceuticals
Most protein pharmaceuticals are now produced through recombinant DNA technology using expression
vectors. These peptide and protein pharmaceuticals may be hormones, vaccines, antibiotics, antibodies,
and enzymes. The first human recombinant protein used for disease management was insulin, which was
introduced in 1982. Biotechnology allows these peptide and protein pharmaceuticals, some of which were
previously rare or difficult to obtain, to be produced in large amounts. It also reduces the risks of
contaminants such as host viruses, toxins and prions. For example, growth hormone extracted
from pituitary glands harvested from human cadavers had caused Creutzfeldt-Jakob disease in patients
receiving treatment for dwarfism due to prion contamination, and viral contaminants in clotting factor
VIII isolated from human blood had resulted in the transmission of viral diseases such
as hepatitis and AIDS. Such risk is reduced or removed completely when these proteins are produced
in non-human cell lines.
Transgenic plants
In recent years, expression vectors have been used to introduce specific genes into organisms; for example,
in agriculture they are used to produce transgenic plants. Expression vectors have been used to introduce
a vitamin A precursor, beta-carotene, into rice plants. This product is called golden rice. The process has
also been used to introduce into plants a gene that produces an insecticide, called Bacillus thuringiensis
toxin or Bt toxin, which reduces the need for farmers to apply insecticides since it is produced by the
modified organism. In addition, expression vectors are used to prolong the ripeness of tomatoes by altering
the plant so that it produces less of the chemical that causes the tomatoes to rot. There has been
controversy over using expression vectors to modify crops because of possible unknown
health risks, the possibility of companies patenting certain genetically modified food crops, and ethical
concerns. Nevertheless, this technique is still being used and heavily researched.
Gene therapy
Gene therapy is a promising treatment for a number of diseases, in which a "normal" gene carried by the
vector is inserted into the genome to replace an "abnormal" gene or supplement the expression of a
particular gene. Viral vectors are generally used, but other nonviral methods of delivery are being
developed. The treatment remains risky because the viral vector used can cause ill effects, for
example insertional mutation that can result in cancer. However, there have been promising results.

Paired-end tag (PET)
Paired-end tags (PET) (sometimes "Paired-End diTags", or simply "ditags") are the short sequences at
the 5' and 3' ends of a DNA fragment which are unique enough that they (theoretically) exist together only
once in a genome, therefore making the sequence of the DNA in between them available upon search (if
full-genome sequence data is available) or upon further sequencing (since tag sites are unique enough to
serve as primer annealing sites). Paired-end tags exist in PET libraries with the intervening DNA
absent; that is, a PET "represents" a larger fragment of genomic DNA or cDNA by consisting of a short 5' linker
sequence, a short 5' sequence tag, a short 3' sequence tag, and a short 3' linker sequence. It was shown
conceptually that 13 bp is sufficient to map tags uniquely.

However, longer sequences are more practical for mapping reads uniquely.
The endonucleases (discussed below) used to produce PETs give longer tags (18/20 bp and 25/27 bp),
but sequences of 50 to 100 base pairs would be optimal for both mapping and cost efficiency. After
extracting the PETs from many DNA fragments, they are linked (concatenated) together for efficient
sequencing. On average, 20 to 30 tags could be sequenced with the Sanger method, which has a longer
read length. Since the tag sequences are short, individual PETs are well suited for next-generation
sequencing that has short read lengths and higher throughput. The main advantages of PET sequencing
are its reduced cost by sequencing only short fragments, detection of structural variants in the genome,
and increased specificity when aligning back to the genome compared to single tags, which involve only
one end of the DNA fragment.
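A rough way to see why a pair of tags aligns more specifically than a single tag is to count how many distinct sequences a given length allows. The sketch below assumes an idealized genome in which every base is independent and equally likely, and it uses an assumed genome size of 3 billion bp as a round figure. Real genomes contain repeats, so these numbers understate the true ambiguity.

```python
def distinct_sequences(length_bp):
    """Number of different DNA sequences of a given length (4 possible bases per position)."""
    return 4 ** length_bp


def expected_chance_matches(genome_size_bp, tag_length_bp):
    """Expected chance occurrences of one specific tag in an idealized random genome.

    Approximation: roughly genome_size_bp start positions, each matching the tag
    with probability 1 / 4**tag_length_bp.
    """
    return genome_size_bp / distinct_sequences(tag_length_bp)


GENOME_SIZE_BP = 3_000_000_000  # assumed round figure, not taken from the text

# 13 bp is the tag length quoted in the text; 26 bp stands for two such tags
# treated as a pair (ignoring the extra constraint on their spacing).
for length_bp in (13, 26):
    print(
        length_bp,
        distinct_sequences(length_bp),
        expected_chance_matches(GENOME_SIZE_BP, length_bp),
    )
```

Under these assumptions a single short tag can be expected to recur by chance in a large genome, while the paired tags are far less likely to, which is consistent with the specificity advantage described above.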
Constructing the PET library

Workflow of cloning-based and cloning-free PET library construction.
PET libraries are typically prepared by one of two general methods: cloning-based and cloning-free.
Cloning based
Fragmented genomic DNA or complementary DNA (cDNA) of interest is cloned into plasmid vectors. The
cloning sites are flanked with adaptor sequences that contain restriction sites for endonucleases
(discussed below). Inserts are ligated to the plasmid vectors and individual vectors are
then transformed into E. coli, making the PET library. PET sequences are obtained by purifying the plasmids
and digesting them with a specific endonuclease, leaving two short sequences on the ends of the vectors. Under
intramolecular (dilute) conditions, vectors are re-circularized and ligated, leaving only the ditags in the
vector. The sequences unique to the clone are now paired together. Depending on the next-generation
sequencing technique, PET sequences can be left singular, dimerized, or concatenated into long chains.
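The net effect of digestion and re-circularization, keeping only the two end tags of each insert and then optionally chaining ditags, can be sketched on plain strings. This is a toy model only: the fragment sequences, the 4-base tag length and the linker are made up for readability, and no enzyme chemistry or vector sequence is represented.

```python
def make_ditag(fragment, tag_length):
    """Join the first and last `tag_length` bases of a fragment, dropping the DNA in between."""
    if tag_length <= 0:
        raise ValueError("tag_length must be positive")
    if len(fragment) < 2 * tag_length:
        raise ValueError("fragment is too short to yield two non-overlapping tags")
    return fragment[:tag_length] + fragment[-tag_length:]


def concatenate(ditags, linker):
    """Chain ditags into one string, separated by a linker sequence."""
    return linker.join(ditags)


# Made-up fragments; real tags are longer (18 to 27 bp in the text).
fragments = ["AAAACCCCGGGGTTTT", "CCCCATATATATGGGG"]
ditags = [make_ditag(fragment, tag_length=4) for fragment in fragments]

print(ditags)
print(concatenate(ditags, linker="GATC"))  # "GATC" is a hypothetical linker
```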
Cloning-free based
Instead of cloning, adaptors containing the endonuclease recognition sequence are ligated to the ends of fragmented
genomic DNA or cDNA. The molecules are then self-circularized and digested with endonuclease,
releasing the PET. Before sequencing, these PETs are ligated to adaptors to which PCR primers anneal
for amplification. The advantage of cloning based construction of the library is that it maintains the
fragments or cDNA intact for future use. However, the construction process is much longer than the
cloning-free method. Variations on library construction have been produced by next-generation
sequencing companies to suit their respective technologies.
Unlike other endonucleases, the MmeI (type IIS) and EcoP15I (type III) restriction endonucleases cut
downstream of their target binding sites. MmeI cuts 18/20 base pairs downstream and EcoP15I cuts
25/27 base pairs downstream. As these restriction enzymes bind at their target sequences located in the
adaptors, they cut and release vectors that contain short sequences of the fragment or cDNA ligated to
them, producing PETs.
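How a downstream-cutting enzyme defines the tag length can be illustrated with a small coordinate calculation. The sketch below is an assumption-laden toy: the text gives only the cut offsets, so the recognition sequence used here (CAGCAG for EcoP15I) is an assumption that should be checked against the enzyme supplier's documentation. "25/27" is read as a top-strand cut 25 bases and a bottom-strand cut 27 bases past the end of the site, and only the top strand is modelled.

```python
def release_tag(sequence, site, top_offset, bottom_offset):
    """Find `site` on the top strand and report where a downstream-cutting enzyme would cut.

    Returns None if the site is absent or the cut would fall beyond the sequence.
    The tag is the top-strand sequence between the end of the site and the top-strand cut.
    """
    start = sequence.find(site)
    if start == -1:
        return None
    site_end = start + len(site)
    top_cut = site_end + top_offset
    bottom_cut = site_end + bottom_offset
    if max(top_cut, bottom_cut) > len(sequence):
        return None
    return {
        "top_cut": top_cut,
        "bottom_cut": bottom_cut,
        "tag": sequence[site_end:top_cut],
    }


ADAPTOR_SITE = "CAGCAG"               # assumed EcoP15I recognition sequence
fragment_end = "A" * 25 + "C" * 10    # made-up insert sequence next to the adaptor
construct = ADAPTOR_SITE + fragment_end

result = release_tag(construct, ADAPTOR_SITE, top_offset=25, bottom_offset=27)
print(result)
```

Because the recognition site sits in the adaptor and the cut falls inside the insert, the released tag carries a fixed-length piece of the fragment, which is what makes the two ends usable as a PET.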