
SEQUENCE FEATURES OF RNA POLYMERASE PAUSE SITES

HEMANTH PRABHAR
11B205

ABSTRACT:
During transcription, the highly regulated process that results
in the production of mRNA, the polymerase moving along the DNA
undergoes 1) pause, 2) arrest or 3) termination when it
encounters certain sequence motifs. This was identified from the
pausing of the polymerase molecule at certain sites. Transcription
arrest is also found to happen due to DNA slip-outs and to
RNA-DNA hybrid structures called R loops.
ABOUT THE PROJECT:
The project is about identifying the effect of sequence
features of DNA such as tandem repeats and R loops on
polymerase pausing.
BACKGROUND WORK:
Gene transcription takes place in three phases:
1) Initiation
2) Elongation
3) Termination
The elongation step is the most highly regulated, and this
regulation takes place through RNA polymerase pausing. The two
causes of polymerase pausing are:
1) Thermodynamic stability of the RNA-DNA hybrid (pausing at
A-T rich regions)
2) Structures formed by the RNA that displace the RNA from
the polymerase catalytic site (1)
STEPS INVOLVED:
1) The initial pause happens at an A-T rich region.
2) The polymerase backtracks to a thermodynamically stable
G-C rich region.
3) RNA cleavage takes place and stable pausing occurs.
Pausing also happens due to the presence of structures called
R loops. An R loop is a structure in which one strand of DNA is
partially or completely hybridized with RNA, leaving the other
strand unpaired (2).
R loops can cause pausing downstream of the poly(A) site. G-rich
regions are present downstream of the poly(A) site and stabilize
R loops. It has been proposed that R loops may be critical for
RNAPII to pause downstream of the poly(A) site (3).
AIM:

To show that polymerase pausing takes place due to sequence
features such as R loops and also due to the presence of tandem
repeats.

METHODOLOGY:
A total of 2200 genes were downloaded from BioMart, taking 100
genes at random from each chromosome so that the data set is a
random sample.
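The sampling step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the input mapping `genes_by_chromosome`, the function name `sample_genes`, the placeholder gene identifiers and the fixed seed are all assumptions made for the example. The total of 2200 is consistent with 22 chromosomes at 100 genes each.

```python
import random


def sample_genes(genes_by_chromosome, per_chromosome=100, seed=0):
    """Draw the same number of genes at random from every chromosome."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    sampled = {}
    for chrom, genes in genes_by_chromosome.items():
        if len(genes) < per_chromosome:
            raise ValueError(f"{chrom} has fewer than {per_chromosome} genes")
        # rng.sample draws without replacement, so no gene is picked twice
        sampled[chrom] = rng.sample(genes, per_chromosome)
    return sampled


# Hypothetical input: placeholder gene identifiers for 22 chromosomes.
genes_by_chromosome = {
    f"chr{n}": [f"chr{n}_gene{i}" for i in range(500)] for n in range(1, 23)
}
sampled = sample_genes(genes_by_chromosome)
total = sum(len(genes) for genes in sampled.values())
print(total)
```

Sampling per chromosome rather than from the pooled gene list keeps every chromosome equally represented, which is what the text describes.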
TOOLS USED:
GRO-seq (global run-on sequencing) data are used in this study to
locate paused genes throughout the human genome. GRO-seq is a
relatively new methodology that isolates nascent RNAs and
sequences them on a large scale, and in this way documents the
transcribed regions of the human genome. Polymerase locations can
thus be identified precisely, and active promoters and the
direction of transcription can also be identified (4).

The software used for this purpose is:
BOWTIE,
INTEGRATIVE GENOMICS VIEWER.
BOWTIE is a program for aligning short DNA sequences (reads)
to the human genome (5).

INTEGRATIVE GENOMICS VIEWER allows the viewing of several types
of data files involved in any NGS analysis that employs a
reference genome, including how reads from a dataset are
mapped (6).
The gene sequences that were downloaded from BioMart are
aligned with NCBI's Genetic Association Database and with the
results published by Core et al. (4) regarding the polymerase
pause sites. Bowtie was used for this purpose. The gene list
navigator option is used to compute the average read density for a
gene. From this, the efficiency of transcription can be estimated.
A threshold value is set to determine whether a gene is paused or
not. A typical representation is shown below, with the upper graph
representing the visualized gene and the second a duplicate.
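The thresholding step described above can be sketched in code. The report does not state the threshold or the exact criterion, so the sketch below assumes one common convention from GRO-seq analyses, the pausing index: the ratio of promoter-proximal read density to gene-body read density. The function names, the 250-base promoter window and the threshold of 2.0 are hypothetical values chosen for illustration.

```python
def mean_density(coverage, start, end):
    """Average number of reads per base over coverage[start:end]."""
    window = coverage[start:end]
    return sum(window) / len(window) if window else 0.0


def pausing_index(coverage, promoter_len=250):
    """Promoter-proximal density divided by gene-body density.

    `coverage` is a per-base read count list starting at the
    transcription start site (an assumed input format).
    """
    proximal = mean_density(coverage, 0, promoter_len)
    body = mean_density(coverage, promoter_len, len(coverage))
    if body == 0:
        # No reads in the gene body: infinite index if any proximal signal
        return float("inf") if proximal > 0 else 0.0
    return proximal / body


def is_paused(coverage, threshold=2.0, promoter_len=250):
    """Call a gene paused when its pausing index exceeds the threshold."""
    return pausing_index(coverage, promoter_len) > threshold


# Toy coverage profiles: one with a promoter-proximal peak, one flat.
paused_gene = [10] * 250 + [1] * 1000
uniform_gene = [3] * 1250
print(is_paused(paused_gene), is_paused(uniform_gene))
```

A ratio is used rather than the raw density so that highly expressed but unpaused genes are not called paused simply because they carry more reads overall.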

R LOOP DATABASE: A recent computational analysis of the human
genome has identified possible DNA hotspots for the formation of
R loops. Such R-loop forming sequences (RLFSs) are widespread
throughout the human genome. The results of that study are
compiled as the R-loop database. For each gene, the R-loop status
is queried within the 1st exon and 1st intron regions alone, and
the total number of RLFSs for each gene is computed (7).
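The per-gene RLFS count can be illustrated with a simple interval-overlap check. This is a sketch under stated assumptions, not the database's own query interface: the coordinates, the half-open interval convention and the names `overlaps` and `count_rlfs` are hypothetical.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end


def count_rlfs(rlfs_intervals, regions):
    """Count RLFS intervals that overlap any of the given regions."""
    return sum(
        1
        for r_start, r_end in rlfs_intervals
        if any(overlaps(r_start, r_end, s, e) for s, e in regions)
    )


# Hypothetical coordinates for one gene and its predicted RLFSs.
first_exon = (1000, 1200)
first_intron = (1200, 3000)
rlfs = [(900, 1050), (1500, 1600), (3000, 3100), (5000, 5100)]

n_rlfs = count_rlfs(rlfs, [first_exon, first_intron])
has_r_loop = n_rlfs > 0  # "R loop present" if at least one RLFS is found
print(n_rlfs, has_r_loop)
```

Restricting the regions to the first exon and first intron mirrors the query described in the text; any RLFS further downstream is ignored.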

EFFECTS OF REPEAT INSTABILITY: Repeat instability in and around
the pause sites is thought to be a cause of transcription pausing.
It has also been seen that RNA-DNA hybrids at G-C rich sequences
(R loops) and DNA slip-outs at CAG/CTG triplet repeats cause
transcription arrest (8).

IDENTIFYING TANDEM REPEATS WITHIN THE SEQUENCES:
The sequences for all the genes used in the study are batch
downloaded using Ensembl's BioMart tool. The program Tandem
Repeats Finder, which is based on a simple probabilistic model, is
used to identify sequence repeats within all the sequences. Since
sequence repeats are known to be very common in the human genome,
very stringent parameters are used so that only significant
repeats are selected by the program.
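To make the idea of a tandem repeat concrete, the following sketch finds exact tandem repeats with a regular expression. It is not the algorithm used by Tandem Repeats Finder, which relies on a probabilistic model and tolerates mismatches and indels; this simplified stand-in only detects perfect copies. The function name and the `max_unit` and `min_copies` parameters are assumptions for illustration, with `min_copies` playing the role of a stringency setting.

```python
import re


def find_tandem_repeats(seq, max_unit=6, min_copies=3):
    """Return (start, unit, copies) for exact tandem repeats in a DNA string.

    The lazy quantifier makes the regex prefer the shortest repeating unit,
    so "AAAAAA" is reported as six copies of "A" rather than three of "AA".
    """
    pattern = re.compile(
        rf"([ACGT]{{1,{max_unit}}}?)\1{{{min_copies - 1},}}"
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# A CAG triplet repeat flanked by unrelated bases.
hits = find_tandem_repeats("TTCAGCAGCAGCAGTT")
print(hits)
```

Raising `min_copies` or lowering `max_unit` makes the search stricter, which is the same trade-off the text describes when it speaks of stringent parameters.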

TRANSCRIPTION FACTORS:
The transcription factors associated with these genes may also have
an effect on the pausing of the polymerase. DBD is a database of
predicted transcription factors in completely sequenced genomes.
The predicted transcription factors all contain assignments to
sequence-specific DNA-binding domain families. The predictions are
based on domain assignments from the SUPERFAMILY and Pfam hidden
Markov model libraries. Benchmarks of the transcription factor
predictions show that they are accurate and have wide coverage on a
genomic scale (9).
The DBD consists of predicted transcription factor repertoires for
930 completely sequenced genomes. The transcription factors for the
human genome were taken, and the transcription factors for the
genes in the study were listed.

It was found that there was no correlation between the transcription
factors and the pause sites, as they were mostly random and did not
have a consistent pattern.

RESULTS:
The results for the correlation of tandem repeats with the pause sites:

TABLE 1:
Pausing status   Tandem repeats present   No tandem repeats   Total
Positive         322                      134                 456
Negative         703                      196                 899
Total            1025                     330                 1355

The results for the correlation of R loops with the pause sites:

TABLE 2:
Pausing status   R loop present   R loop absent   Total
Positive         286              61              347
Negative         1256             597             1853
Total            1542             658             2200

Also the transcription factors are not found to have any correlation
with the pause sites of the polymerase.
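One way to examine the association in the two tables is a chi-square test of independence. The report does not say which statistical test, if any, was applied, so the sketch below is only one possible check. It uses the standard shortcut formula for a 2x2 table without continuity correction; the function name is hypothetical, and the counts are copied from Tables 1 and 2.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    Uses N * (ad - bc)^2 / (row1 * row2 * col1 * col2),
    with no continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    return n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)


# Rows: pausing positive / negative. Columns: feature present / absent.
chi2_tandem = chi_square_2x2(322, 134, 703, 196)   # Table 1
chi2_r_loop = chi_square_2x2(286, 61, 1256, 597)   # Table 2

# Critical value for 1 degree of freedom at the 5% significance level.
CRITICAL_0_05 = 3.841

for name, value in [("tandem repeats", chi2_tandem), ("R loops", chi2_r_loop)]:
    print(f"{name}: chi-square = {value:.2f}, exceeds critical: {value > CRITICAL_0_05}")
```

A statistic above the critical value would indicate that pausing status and the feature are not independent at the 5% level, while a value below it would be consistent with no association.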
DISCUSSION:
1. The presence of R loops, tandem repeats and the transcription
factors is not found to have any influence on the pausing status
of the polymerase. This is taken to hold for the whole genome, as
the data set was chosen randomly.
2. Though R loops and sequence repeats may play a role in
determining pausing, no conclusive evidence is obtained from this
result.
3. While the DNA sequence must contain all the information needed
for the regulation of genes, our understanding of how the protein
machinery interprets this information is limited. In particular, it
is unclear what sequence features specify the location and duration
of promoter-proximal Pol II pausing on a given gene.
4. The transcription factors are random and do not form a
consistent pattern across the pause sites, and hence they do not
appear to have any effect on the pausing of the polymerase.

BIBLIOGRAPHY:
1) M. Rajeeva Lochan. Sequence features of RNA polymerase pause
sites.
2) Nechaev, S. & Adelman, K. Pol II waiting in the starting gates:
regulating the transition from transcription initiation into
productive elongation. Biochim Biophys Acta 1809, 34-45 (2011).
3) Wenjian Gan, Jie Liu, Keng Shen. R-loop-mediated genomic
instability is caused by impairment of replication fork progression.
4) Core LJ, Waterfall JJ, Lis JT. Nascent RNA sequencing reveals
widespread pausing and divergent initiation at human promoters.
Science (2008).
5) http://bowtie-bio.sourceforge.net/index.shtml
6) http://broadinstitute.org/software/igv/download
7) Wongsurawat T et al. Quantitative model of R-loop forming
structures reveals a novel level of RNA-DNA interactome complexity.
Nucleic Acids Research, 2011, doi:10.1093/nar/gkr1075
8) McIvor, Elizabeth I, Urszula Polak, and Marek Napierala. New
Insights into Repeat Instability. RNA Biology 7, no. 5 (2010):
551-558.
9) Derek Wilson, Varodom Charoensawan, Sarah K. Kummerfeld,
Sarah A. Teichmann. DBD - taxonomically broad transcription factor
predictions: new content and functionality. Nucleic Acids Research
2008 36(Database issue):D88-D92; doi:10.1093/nar/gkm964