
expressed sequence tags

What are ESTs & How are they made?


ESTs are short DNA sequences (usually 200 to 500 nucleotides
long).

They are generated by sequencing one or both ends of a cDNA
copy of an expressed gene.

The idea is to sequence bits of DNA that represent
genes expressed in certain cells, tissues, or organs
from different organisms.

And use these "tags" to fish a gene out of a portion of
chromosomal DNA by matching base pairs.
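As a minimal sketch of "fishing" by matching base pairs, the snippet below looks for an exact copy of a tag in a stretch of genomic DNA, on either strand. The sequences and the helper names `reverse_complement` and `find_tag` are made up for illustration. Real ESTs contain sequencing errors and can span exon junctions, so practical tools use error-tolerant alignment rather than the exact matching assumed here.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    pairing = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(pairing[base] for base in reversed(seq.upper()))


def find_tag(genomic: str, est: str):
    """Return (position, strand) of the first exact match, or None.

    Simplifying assumption: the tag matches the genomic DNA exactly,
    with no sequencing errors and no intron interrupting it.
    """
    genomic = genomic.upper()
    est = est.upper()
    pos = genomic.find(est)
    if pos != -1:
        return pos, "+"
    # The tag may have been read from the opposite strand.
    pos = genomic.find(reverse_complement(est))
    if pos != -1:
        return pos, "-"
    return None


# Toy sequences, invented for this example.
genomic = "TTGACCGTAGCTAGGCTTAACGGATCCGATTTGCA"
est_forward = "GCTAGGCTTAACG"
est_reverse = reverse_complement("GGATCCGATT")

hit_forward = find_tag(genomic, est_forward)
hit_reverse = find_tag(genomic, est_reverse)
print(hit_forward, hit_reverse)
```
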

Why use EST?

Gene identification is very difficult in higher organisms
such as humans.

This is because most of our genome is composed of non-coding
DNA (introns and the regions between genes) interspersed with
relatively few coding sequences, or genes.

These genes are expressed as proteins.

Each gene (DNA) must be converted, or transcribed, into
messenger RNA.

The resulting mRNA guides the synthesis of a protein.

Interestingly, mRNAs in a cell do not contain sequences from
the regions between genes, nor from the non-coding introns that
are present within many genes.

Therefore, isolating mRNA is key to finding expressed genes
in the vast expanse of the human genome.

Next problem:

mRNA is very unstable outside of a cell.

Solution: convert it to complementary DNA (cDNA).

cDNA is a much more stable compound and, importantly,
because it was generated from an mRNA in which the introns
have been removed, cDNA represents only expressed DNA
sequence.
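
The path from gene to intron-free cDNA can be sketched in a few lines. This is a toy model, not a laboratory protocol: the gene sequence, the exon coordinates, and the function names `transcribe_and_splice` and `reverse_transcribe` are all invented for illustration, and only the first cDNA strand is modelled.

```python
def transcribe_and_splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join the exons of a coding-strand gene and return the mature mRNA.

    `exons` holds half-open (start, end) coordinates; everything outside
    them (the introns) is dropped, mimicking splicing.
    """
    mature = "".join(gene[start:end] for start, end in exons)
    return mature.replace("T", "U")  # RNA uses U in place of T


def reverse_transcribe(mrna: str) -> str:
    """Return the first-strand cDNA, written 5'->3'.

    The cDNA is complementary to the mRNA, so it is the reverse
    complement expressed in the DNA alphabet.
    """
    pairing = {"A": "T", "U": "A", "G": "C", "C": "G"}
    return "".join(pairing[base] for base in reversed(mrna))


# Toy gene: exon 1 + intron + exon 2 (sequences are hypothetical).
gene = "ATGGCC" + "GTAAGTTTTCAG" + "GATTAA"
exons = [(0, 6), (18, 24)]

mrna = transcribe_and_splice(gene, exons)
first_strand_cdna = reverse_transcribe(mrna)
print(mrna)
print(first_strand_cdna)
```
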
From cDNAs to ESTs

Once cDNA is made, we can then sequence a few hundred
nucleotides from either end of the molecule to create two
different kinds of ESTs.

Sequencing only the beginning portion of the cDNA produces
what is called a 5' EST.

Sequencing the ending portion of the cDNA molecule produces
what is called a 3' EST.
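
A minimal sketch of the two kinds of ESTs, assuming the cDNA insert is given as its sense strand and that the 3' read is taken from the opposite strand (so it appears as the reverse complement of the insert's end). The sequence, the read length, and the function name `make_ests` are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    pairing = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairing[base] for base in reversed(seq))


def make_ests(cdna: str, read_length: int) -> tuple[str, str]:
    """Return (5' EST, 3' EST) for a sense-strand cDNA insert.

    The 5' EST is the first `read_length` bases. The 3' EST is read
    inward from the other end, assumed here to be on the opposite strand.
    """
    five_prime = cdna[:read_length]
    three_prime = reverse_complement(cdna[-read_length:])
    return five_prime, three_prime


# Toy insert with a short poly(A) tail; real reads are a few hundred bases.
cdna = "ATGGCCGATTACGGATCCTTAGCAAAAAA"
five_prime_est, three_prime_est = make_ests(cdna, read_length=8)
print(five_prime_est, three_prime_est)
```
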
An overview of how ESTs are generated
A cDNA library is constructed from a tissue or cell line of
interest.

The libraries are constructed by isolating mRNA from the
tissue or cell line of interest.

The mRNA is then reverse-transcribed into cDNA.

The resulting cDNA is cloned into a vector.
Individual clones are picked from the library, and one
sequence is generated from each end of the cDNA insert.

Thus, each clone normally has a 5' and 3' EST associated
with it.

The sequences average ~ 400 bases in length.

Because the ESTs are short, they generally represent only
fragments of genes, not complete coding sequences.

How to Access ESTs?

ESTs are submitted to one of the three international sequence
databases (GenBank, EMBL, and DDBJ), which exchange their
records under a data-sharing agreement.

All ESTs can be accessed through all of these databases,
regardless of where the sequence was originally submitted.

The same ESTs are also available from NCBI's dbEST,
the database of Expressed Sequence Tags.
Like other sequences in GenBank, ESTs can be accessed
through Entrez.

Single ESTs are retrieved by accession or GI number.

Advanced searches with multiple search terms can be
limited to ESTs by selecting the Properties limit and entering
EST.
Why ESTs Are of Interest

ESTs represent the most extensive available survey of the
transcribed portion of genomes.

ESTs are indispensable for gene structure prediction, gene
discovery and genomic mapping.

ESTs allow characterization of splice variants and
alternative polyadenylation.

High-volume and high-throughput data production at low cost.

There are 69,713,950 EST entries in GenBank (dbEST release
060111, June 1, 2011), of which 8,315,231 are human ESTs.

Limitations of EST Data

Data are not of as high a quality as sequences determined by
conventional means.

High error rates (~1/100) because the sequences are
single-pass reads.

ESTs may contain substitutions, deletions, or insertions
compared with the parent mRNA sequence.
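
To put the error rate in perspective, the short calculation below combines the ~1/100 per-base rate with the ~400-base average read length quoted earlier. It assumes errors are independent and uniformly distributed along the read, which is a simplification; the variable names are illustrative.

```python
# Figures taken from the slides: ~1 error per 100 bases, ~400-base reads.
error_rate = 1 / 100
read_length = 400

# Expected number of erroneous bases in an average read.
expected_errors = error_rate * read_length

# Probability that a read contains no error at all, assuming each
# base is miscalled independently with the same probability.
prob_error_free = (1 - error_rate) ** read_length

print(f"expected errors per read: {expected_errors:.1f}")
print(f"probability of an error-free read: {prob_error_free:.3f}")
```

Under these assumptions an average EST carries about four errors, and fewer than 2% of reads are entirely error-free.
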

ESTs may contain bacterial, mitochondrial, or vector
sequence contamination.

A single EST represents only a partial gene sequence.

Not a defined gene/protein product.

Not curated in a highly annotated form.

High redundancy in the data, hence a huge number of sequences
to analyze.
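
One simple way to see the redundancy is to collapse identical reads, counting a read and its reverse complement as the same tag. The sketch below does only that, on invented sequences and with hypothetical helper names (`canonical`, `collapse`); real EST clustering must also group overlapping, non-identical, error-containing reads, which this does not attempt.

```python
from collections import Counter


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    pairing = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairing[base] for base in reversed(seq))


def canonical(seq: str) -> str:
    """Pick one strand-independent form of a read.

    The alphabetically smaller of the read and its reverse complement
    is used, so both orientations map to the same key.
    """
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def collapse(ests: list[str]) -> Counter:
    """Count how many reads share each canonical sequence."""
    return Counter(canonical(est) for est in ests)


# Toy reads: the first three are the same tag in two orientations.
ests = ["ATGGCCGA", "TCGGCCAT", "ATGGCCGA", "GGATCCTT"]
counts = collapse(ests)
print(f"{len(ests)} reads collapse to {len(counts)} distinct tags")
```
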
