Sequence logo
A sequence logo consists of a stack of letters at each position. The relative sizes of the letters
indicate their frequencies in the sequences. The total height of the stack depicts the information
content of the position, in bits.
Highly conserved = low entropy = tall stack. Very variable = high entropy =
low stack.
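The relationship between entropy and stack height can be sketched in a few lines of Python. This is a minimal illustration, not the exact method of any particular logo tool: it assumes a DNA alphabet (maximum of 2 bits per position), ignores gaps and ambiguous symbols, and omits the small-sample correction that logo software usually applies. The function names column_information and logo_heights are made up for this example.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT"):
    """Return (information content in bits, {letter: letter height}) for one column.

    Assumption: information = log2(alphabet size) - Shannon entropy,
    with no small-sample correction.
    """
    counts = Counter(ch for ch in column.upper() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    # Relative frequency of each letter that actually occurs in the column.
    freqs = {a: counts[a] / total for a in alphabet if counts[a] > 0}
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    info = math.log2(len(alphabet)) - entropy
    # Each letter's height is its frequency times the total stack height.
    heights = {a: p * info for a, p in freqs.items()}
    return info, heights


def logo_heights(sequences, alphabet="ACGT"):
    """Compute stack and letter heights for every column of an alignment."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    return [
        column_information("".join(s[i] for s in sequences), alphabet)
        for i in range(length)
    ]


if __name__ == "__main__":
    aligned = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
    for pos, (info, heights) in enumerate(logo_heights(aligned), start=1):
        print(pos, round(info, 3), heights)
```

A fully conserved column reaches the 2-bit maximum, while a column in which all four bases are equally frequent has a height of zero.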
A consensus logo is a simplified variation of a sequence logo that can be embedded in text format.
Like a sequence logo, a consensus logo is created from a collection of aligned protein or DNA/RNA
sequences and conveys information about the conservation of each position of a sequence
motif or sequence alignment.
Gene and promoter prediction
PROMOTER ELEMENTS
1. Core promoter - the minimal portion of the promoter required to properly initiate
transcription
Approximately -34 (position relative to the transcription start site)
2. Proximal promoter - the proximal sequence upstream of the gene that tends to
contain primary regulatory elements
Approximately -250 (position relative to the transcription start site)
Prokaryotic promoters
In prokaryotes, the promoter consists of two short sequences at positions -10 and -35
upstream of the transcription start site.
The sequence at -10 is called the Pribnow box, or the -10 element, and usually
consists of the six nucleotides TATAAT. The Pribnow box is absolutely
essential to start transcription in prokaryotes.
The other sequence at -35 (the -35 element) usually consists of the six
nucleotides TTGACA. Its presence allows a very high transcription rate.
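As a rough illustration of how the two consensus elements can be used to locate candidate promoters, the sketch below scans a DNA string for a -35-like hexamer followed by a -10-like hexamer. It is a toy example under stated assumptions, not how BPROM or any other real predictor works: the spacer range of 15 to 19 bases between the two hexamers and the tolerance of one mismatch per element are assumptions chosen for the example, and the function names are hypothetical.

```python
def mismatches(a, b):
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_candidates(seq, minus35="TTGACA", minus10="TATAAT",
                             spacers=range(15, 20), max_mismatch=1):
    """Return (start of -35 element, start of -10 element, spacer length,
    total mismatches) for each candidate pair, using 0-based indices.

    Assumptions for illustration only: spacer of 15-19 bases and at most
    `max_mismatch` mismatches per hexamer.
    """
    seq = seq.upper()
    n35, n10 = len(minus35), len(minus10)
    hits = []
    for i in range(len(seq) - n35 + 1):
        m35 = mismatches(seq[i:i + n35], minus35)
        if m35 > max_mismatch:
            continue
        for gap in spacers:
            j = i + n35 + gap
            if j + n10 > len(seq):
                break  # longer spacers would run off the end of the sequence
            m10 = mismatches(seq[j:j + n10], minus10)
            if m10 <= max_mismatch:
                hits.append((i, j, gap, m35 + m10))
    return hits


if __name__ == "__main__":
    example = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GGGGGGG"
    print(find_promoter_candidates(example))
```

Exact consensus matches are rare in real genomes, which is why the sketch tolerates mismatches; practical tools typically replace this simple counting with position weight matrices.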
Eukaryotic promoters
Eukaryotic promoters are extremely diverse and are difficult to characterize. They
typically lie upstream of the gene and can have regulatory elements several kilobases
away from the transcriptional start site. In eukaryotes, the transcriptional complex can
cause the DNA to bend back on itself, which allows for placement of regulatory
sequences far from the actual site of transcription. Many eukaryotic promoters
contain a TATA box (sequence TATAAA), which in turn binds a TATA-binding protein
that assists in the formation of the RNA polymerase transcriptional complex. The
TATA box typically lies very close to the transcriptional start site (often within 50
bases).
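The "within 50 bases" rule of thumb can be turned into a simple check. The sketch below is only an illustration: it looks for exact matches to TATAAA in a window upstream of a known transcription start site, whereas real TATA boxes vary in sequence. The function name, the 0-based tss_index argument, and the default window size are assumptions made for this example.

```python
def tata_boxes_near_tss(seq, tss_index, window=50, motif="TATAAA"):
    """Return offsets (negative, relative to the TSS) of exact motif matches
    that start within `window` bases upstream of the transcription start site.

    `tss_index` is the 0-based index of the first transcribed base, so the
    base immediately upstream of it has offset -1.
    """
    seq = seq.upper()
    start = max(0, tss_index - window)
    upstream = seq[start:tss_index]
    offsets = []
    pos = upstream.find(motif)
    while pos != -1:
        offsets.append(start + pos - tss_index)
        pos = upstream.find(motif, pos + 1)
    return offsets


if __name__ == "__main__":
    # Toy sequence: a TATA box starting 30 bases upstream of the TSS.
    toy = "G" * 20 + "TATAAA" + "G" * 24 + "ATGGCC"
    print(tata_boxes_near_tss(toy, tss_index=50))
```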
Promoters: (Softberry) - choose from BPROM (bacterial), TSSP (plant) and TSSG &
TSSW (human)
Virtual Footprint - offers two types of analyses (a) Regulon Analysis - analysis of a
whole prokaryotic genome with one regulator pattern and (b) Promoter analysis -
analysis of a promoter region with several regulator patterns (Reference: R. Münch et
al. 2005. Bioinformatics 21: 4187-4189).
PePPER (University of Groningen, The Netherlands) is a webserver for prediction of
prokaryote promoter elements and regulons (Reference: de Jong, A. et al. 2012. BMC
Genomics 13:299). Also see Prokaryotic promoters.
B. Eukaryotic
There are two important aspects to any program for gene identification: one is the type of
information used by the program, and the other is the algorithm that is employed to combine
that information into a coherent prediction. Three types of information are used in predicting
gene structures: signals in the sequence, such as splice sites; content statistics, such as
codon bias; and similarity to known genes. The first two types have been used since the early
days of gene prediction, whereas similarity information has been used routinely only in recent
years. One of the reasons that the accuracy of gene-prediction programs has improved in the
last few years is the enormous increase in the number of examples of known coding sequences.
This much larger sample size allows for more reliable statistical measures to be developed, as
well as a much greater likelihood of encountering a gene that is related to one that has been
identified previously.
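To make the idea of a content statistic concrete, the sketch below estimates codon frequencies from a set of known coding sequences and scores a candidate reading frame by its log-odds against a uniform background. It is a simplified illustration, not the algorithm of any specific gene finder: the uniform background, the pseudocount, and the function names are assumptions, and the tiny training set in the usage example is invented purely to exercise the code.

```python
import math
from collections import Counter
from itertools import product

# All 64 possible codons over the DNA alphabet.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]
CODON_SET = set(CODONS)


def codon_frequencies(coding_seqs, pseudocount=1.0):
    """Estimate codon frequencies from known in-frame coding sequences.

    A pseudocount keeps unseen codons from getting zero probability,
    which matters when the training sample is small.
    """
    counts = Counter()
    for s in coding_seqs:
        s = s.upper()
        for i in range(0, len(s) - 2, 3):
            codon = s[i:i + 3]
            if codon in CODON_SET:
                counts[codon] += 1
    total = sum(counts.values()) + pseudocount * len(CODONS)
    return {c: (counts[c] + pseudocount) / total for c in CODONS}


def coding_score(seq, freqs):
    """Sum of log2(codon frequency / uniform background) over in-frame codons.

    Positive scores mean the codon usage resembles the training set.
    """
    background = 1.0 / len(CODONS)
    s = seq.upper()
    score = 0.0
    for i in range(0, len(s) - 2, 3):
        codon = s[i:i + 3]
        if codon in freqs:
            score += math.log2(freqs[codon] / background)
    return score


if __name__ == "__main__":
    # Invented toy training set, for demonstration only.
    known = ["ATGGCTGCTGCTTAA", "ATGGCTGCTAAATAA"]
    freqs = codon_frequencies(known)
    print(coding_score("GCTGCTGCT", freqs), coding_score("CGACGACGA", freqs))
```

With more known coding sequences the frequency estimates depend less on the pseudocount, which mirrors the point above about larger samples giving more reliable statistics. A full predictor would combine such a score with signal detection (for example splice sites) and similarity searches.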