
DNA Transcription

The genetic code is frequently referred to as a "blueprint" because it contains the instructions a cell requires in order to sustain itself. We now know that there is more to these instructions than simply the sequence of letters in the nucleotide code, however. For example, vast amounts of evidence demonstrate that this code is the basis for the production of various molecules, including RNA and protein. Research has also shown that the instructions stored within DNA are "read" in two steps: transcription and translation. In transcription, a portion of the double-stranded DNA template gives rise to a single-stranded RNA molecule. In some cases, the RNA molecule itself is a "finished product" that serves some important function within the cell. Often, however, transcription of an RNA molecule is followed by a translation step, which ultimately results in the production of a protein molecule.

Visualizing Transcription
The process of transcription can be visualized by electron microscopy (Figure 1); in fact, it was first observed using this method in 1970. In these early electron micrographs, the DNA molecules appear as "trunks," with many RNA "branches" extending out from them. When DNase and RNase (enzymes that degrade DNA and RNA, respectively) were added to the molecules, DNase eliminated the trunk structures, while RNase wiped out the branches.

DNA is double-stranded, but only one strand serves as a template for transcription at any given time. This template strand is called the noncoding strand; the other, nontemplate strand is referred to as the coding strand because its sequence matches that of the new RNA molecule (with uracil in place of thymine). In most organisms, the strand of DNA that serves as the template for one gene may be the nontemplate strand for other genes within the same chromosome.
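The relationship between the template strand, the coding strand, and the RNA transcript can be sketched in a few lines of Python. This is only an illustration of base-pairing rules; the function names and the nine-base example sequence are hypothetical and do not come from any real gene.

```python
# Base-pairing rules used when RNA is built on a DNA template.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
# Base-pairing rules between the two DNA strands.
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}

def transcribe(template_3_to_5: str) -> str:
    """Return the RNA (written 5'->3') made from a template strand written 3'->5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5.upper())

def coding_strand(template_3_to_5: str) -> str:
    """Return the nontemplate (coding) DNA strand, written 5'->3'."""
    return "".join(DNA_TO_DNA[base] for base in template_3_to_5.upper())

template = "TACGGCATT"             # hypothetical template strand, 3'->5'
rna = transcribe(template)         # "AUGCCGUAA"
coding = coding_strand(template)   # "ATGCCGTAA"

# The coding strand matches the RNA once T is swapped for U.
print(rna, coding, coding.replace("T", "U") == rna)
```

The last line prints `True`, which is the reason the nontemplate strand is the one conventionally written out when a gene sequence is reported.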

The Transcription Process


The process of transcription begins when an enzyme called RNA polymerase (RNA pol) attaches to the template DNA strand and begins to catalyze production of complementary RNA. Polymerases are large enzymes composed of approximately a dozen subunits, and when active on DNA, they are also typically complexed with other factors. In many cases, these factors signal which gene is to be transcribed. Three different types of RNA polymerase exist in eukaryotic cells, whereas bacteria have only one. In eukaryotes, RNA pol I transcribes the genes that encode most of the ribosomal RNAs (rRNAs), and RNA pol III transcribes the genes for one small rRNA, plus the transfer RNAs that play a key role in the translation process, as well as other small regulatory RNA molecules. Thus, it is RNA pol II that transcribes the messenger RNAs, which serve as the templates for production of protein molecules.

Transcription Initiation
The first step in transcription is initiation, when the RNA pol binds to the DNA upstream (5') of the gene at a specialized sequence called a promoter. In bacteria, promoters are usually composed of three sequence elements, whereas in eukaryotes, there are as many as seven elements.

In prokaryotes, most genes have a sequence called the Pribnow box, with the consensus sequence TATAAT positioned about ten base pairs upstream of the site at which transcription initiates. Not all Pribnow boxes have this exact nucleotide sequence; these nucleotides are simply the most common ones found at each position. Although substitutions do occur, each box nonetheless resembles this consensus fairly closely. Many genes also have the consensus sequence TTGACA at a position 35 bases upstream of the start site, and some have what is called an upstream element, which is an A-T rich region 40 to 60 nucleotides upstream that enhances the rate of transcription (Figure 2). In any case, upon binding, the RNA pol "core enzyme" binds to another subunit called the sigma subunit to form a holoenzyme capable of unwinding the DNA double helix in order to facilitate access to the gene. The sigma subunit confers promoter specificity on RNA polymerase; that is, it is responsible for telling RNA polymerase where to bind. There are a number of different sigma subunits that bind to different promoters and therefore assist in turning genes on and off as conditions change.

Eukaryotic promoters are more complex than their prokaryotic counterparts, in part because eukaryotes have the aforementioned three classes of RNA polymerase that transcribe different sets of genes. Many eukaryotic genes also possess enhancer sequences, which can be found at considerable distances from the genes they affect. Enhancer sequences control gene activation by binding with activator proteins and altering the 3-D structure of the DNA to help "attract" RNA pol II, thus regulating transcription. Because eukaryotic DNA is tightly packaged as chromatin, transcription also requires a number of specialized proteins that help make the template strand accessible.
In eukaryotes, the "core" promoter for a gene transcribed by pol II is most often found immediately upstream (5') of the start site of the gene. Most pol II genes have a TATA box (consensus sequence TATAAA) 25 to 35 bases upstream of the initiation site, which affects the transcription rate and determines the location of the start site. Eukaryotic RNA polymerases use a number of essential cofactors (collectively called general transcription factors), and one of these, TFIID, recognizes the TATA box and ensures that the correct start site is used. Another cofactor, TFIIB, recognizes a different common consensus sequence, G/C G/C G/C G C C C, approximately 38 to 32 bases upstream (Figure 3).

The terms "strong" and "weak" are often used to describe promoters and enhancers, according to their effects on transcription rates and thereby on gene expression. Alteration of promoter strength can have deleterious effects upon a cell, often resulting in disease. For example, some tumor-promoting viruses transform healthy cells by inserting strong promoters in the vicinity of growth-stimulating genes, while translocations in some cancer cells place genes that should be "turned off" in the proximity of strong promoters or enhancers.

Enhancer sequences do what their name suggests: They act to enhance the rate at which genes are transcribed, and their effects can be quite powerful. Enhancers can be thousands of nucleotides away from the promoters with which they interact, but they are brought into proximity by the looping of DNA. This looping is the result of interactions between the proteins bound to the enhancer and those bound to the promoter. The proteins that facilitate this looping are called activators, while those that inhibit it are called repressors. Transcription of eukaryotic genes by polymerases I and III is initiated in a similar manner, but the promoter sequences and transcriptional activator proteins vary.
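The idea that a promoter element "resembles" a consensus without matching it exactly can be made concrete with a small mismatch-tolerant scan. The sketch below is a simplification: real promoter prediction also weighs position and spacing, and the function name, the mismatch threshold, and the test sequence are all hypothetical.

```python
def consensus_matches(seq: str, consensus: str = "TATAAT", max_mismatches: int = 1):
    """Return (position, window, mismatches) for every window close to the consensus."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Count positions where the window differs from the consensus.
        mismatches = sum(a != b for a, b in zip(window, consensus))
        if mismatches <= max_mismatches:
            hits.append((i, window, mismatches))
    return hits

# Hypothetical sequence containing a near-consensus box (TATGAT) at index 3.
print(consensus_matches("CCGTATGATGGC"))
```

With the default settings, the only window reported for this sequence is `TATGAT` at index 3, which differs from TATAAT at a single position. Raising `max_mismatches` loosens the definition of "resembles" and quickly produces spurious hits, which is one reason short consensus motifs alone are weak predictors.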

Strand Elongation
Once transcription is initiated, the DNA double helix unwinds and RNA polymerase reads the template strand, adding nucleotides to the 3' end of the growing chain. At a temperature of 37 degrees Celsius, new nucleotides are added at a rate equivalent to about 15 to 20 amino acids per second in bacteria (Dennis & Bremer, 1974), while eukaryotes proceed at a much slower pace of approximately five to eight amino acids per second (Izban & Luse, 1992).

Transcription Termination
Terminator sequences are found close to the ends of coding sequences. Bacteria possess two types of these sequences. In rho-independent terminators, inverted repeat sequences are transcribed; they can then fold back on themselves in hairpin loops, causing RNA pol to pause and resulting in release of the transcript. On the other hand, rho-dependent terminators make use of a factor called rho, which actively unwinds the DNA-RNA hybrid formed during transcription, thereby releasing the newly synthesized RNA (Figure 4).

In eukaryotes, termination of transcription occurs by different processes, depending upon the exact polymerase utilized. For pol I genes, transcription is stopped using a termination factor, through a mechanism similar to rho-dependent termination in bacteria. Transcription of pol III genes ends after transcribing a termination sequence that includes a polyuracil stretch, by a mechanism resembling rho-independent prokaryotic termination.

Termination of pol II transcripts, however, is more complex. Transcription of pol II genes can continue for hundreds or even thousands of nucleotides beyond the end of a coding sequence. The RNA strand is then cleaved by a complex that appears to associate with the polymerase. Cleavage seems to be coupled with termination of transcription and occurs at a consensus sequence. Mature pol II mRNAs are polyadenylated at the 3' end, resulting in a poly(A) tail; this process follows cleavage and is also coordinated with termination.

Both polyadenylation and termination make use of the same consensus sequence, and the interdependence of the processes was demonstrated in the late 1980s by work from several groups. One group of scientists working with mouse globin genes showed that introducing mutations into the consensus sequence AATAAA, known to be necessary for poly(A) addition, inhibited both polyadenylation and transcription termination. They measured the extent of termination by hybridizing transcripts from the different poly(A) consensus sequence mutants with wild-type transcripts, and they observed a decrease in the hybridization signal, suggesting that proper termination was inhibited. They therefore concluded that polyadenylation was necessary for termination (Logan et al., 1987). Another group obtained similar results using a monkey viral system, SV40 (simian virus 40). They introduced mutations into a poly(A) site, which caused mRNAs to accumulate to levels far above wild type (Connelly & Manley, 1988).

The exact relationship between cleavage and termination remains to be determined. One model supposes that cleavage itself triggers termination; another proposes that polymerase activity is affected when passing through the consensus sequence at the cleavage site, perhaps through changes in associated transcriptional activation factors. Thus, research in the area of prokaryotic and eukaryotic transcription is still focused on unraveling the molecular details of this complex process, data that will allow us to better understand how genes are transcribed and silenced.
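The inverted repeats that form hairpins in rho-independent terminators, described earlier in this section, can be located with a simple search. The sketch below is a deliberately minimal model: it considers only Watson-Crick pairs (no G-U wobble, no folding energies), and the stem length, loop range, function names, and example transcript are assumptions chosen for illustration.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna: str) -> str:
    """Return the sequence that would base-pair with `rna`, read 5'->3'."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))

def find_hairpins(rna: str, stem: int = 5, min_loop: int = 3, max_loop: int = 8):
    """Return (start, left_arm, loop_length, right_arm) for perfect inverted repeats."""
    rna = rna.upper()
    hits = []
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        left = rna[i:i + stem]
        target = reverse_complement(left)   # what the right arm must be
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop             # where the right arm would start
            if rna[j:j + stem] == target:
                hits.append((i, left, loop, target))
    return hits

# Hypothetical transcript: GC-rich stem, 4-base loop, then a run of U residues.
transcript = "CCGCCGCUUUUGCGGCUUUUUU"
print(find_hairpins(transcript))
```

For this transcript the search reports a single hairpin: the arm `GCCGC` at index 2 pairs with `GCGGC` across a four-base loop, and the stem is followed by a run of uracils.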

Figure 2: In most prokaryotic promoters, the actual sequence is not TATAAT. The sequences shown are found in six E. coli promoters, including those of genes for tryptophan biosynthesis (trp), tyrosine tRNA (tRNATyr), lactose metabolism (lac), a recombination protein (recA), rRNA (rrnDI), and arabinose metabolism (araB, A, D). These sequences are on the non-template strand and read 5' to 3', left to right. Copyright 2005 W. H. Freeman. Pierce, Benjamin. Genetics: A Conceptual Approach, 2nd ed. (New York: W. H. Freeman and Company), 357. Used with permission. All rights reserved.

Figure 3: The promoters of genes transcribed by RNA polymerase II consist of a core promoter and a regulatory promoter that contain consensus sequences. Not all the consensus sequences shown are found in all promoters. Used with permission. Copyright 2005 by W. H. Freeman and Company. All rights reserved.

Figure 4: Termination by bacterial rho-independent terminators is a multistep process. Used with permission. Copyright 2005 by W. H. Freeman and Company. All rights reserved.

References and Recommended Reading


Connelly, S., & Manley, J. L. A functional mRNA polyadenylation signal is required for transcription termination by RNA polymerase II. Genes and Development 4, 440-452 (1988)

Dennis, P. P., & Bremer, H. Differential rate of ribosomal protein synthesis in Escherichia coli B/r. Journal of Molecular Biology 84, 407-422 (1974)

Dragon, F., et al. A large nucleolar U3 ribonucleoprotein required for 18S ribosomal RNA biogenesis. Nature 417, 967-970 (2002) doi:10.1038/nature00769

Izban, M. G., & Luse, D. S. Factor-stimulated RNA polymerase II transcribes at physiological elongation rates on naked DNA but very poorly on chromatin templates. Journal of Biological Chemistry 267, 13647-13655 (1992)

Kritikou, E. Transcription elongation and termination: It ain't over until the polymerase falls off. Nature Milestones in Gene Expression 8 (2005)

Lee, J. Y., Park, J. Y., & Tian, B. Identification of mRNA polyadenylation sites in genomes using cDNA sequences, expressed sequence tags, and trace. Methods in Molecular Biology 419, 23-37 (2008)

Logan, J., et al. A poly(A) addition site and a downstream termination region are required for efficient cessation of transcription by RNA polymerase II in the mouse beta maj-globin gene. Proceedings of the National Academy of Sciences 23, 8306-8310 (1987)

Nabavi, S., & Nazar, R. N. Nonpolyadenylated RNA polymerase II termination is induced by transcript cleavage. Journal of Biological Chemistry 283, 13601-13610 (2008)

Translation: DNA to mRNA to Protein


The genes in DNA encode protein molecules, which are the "workhorses" of the cell, carrying out all the functions necessary for life. For example, enzymes, including those that metabolize nutrients and synthesize new cellular constituents, as well as DNA polymerases and other enzymes that make copies of DNA during cell division, are all proteins.

In the simplest sense, expressing a gene means manufacturing its corresponding protein, and this multilayered process has two major steps. In the first step, the information in DNA is transferred to a messenger RNA (mRNA) molecule by way of a process called transcription. During transcription, the DNA of a gene serves as a template for complementary base-pairing, and an enzyme called RNA polymerase II catalyzes the formation of a pre-mRNA molecule, which is then processed to form mature mRNA (Figure 1). The resulting mRNA is a single-stranded copy of the gene, which next must be translated into a protein molecule.

During translation, which is the second major step in gene expression, the mRNA is "read" according to the genetic code, which relates the DNA sequence to the amino acid sequence in proteins (Figure 2). Each group of three bases in mRNA constitutes a codon, and each codon specifies a particular amino acid (hence, it is a triplet code). The mRNA sequence is thus used as a template to assemble, in order, the chain of amino acids that form a protein.

But where does translation take place within a cell? What individual substeps are a part of this process? And does translation differ between prokaryotes and eukaryotes? The answers to questions such as these reveal a great deal about the essential similarities between all species.
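Reading an mRNA as a triplet code can be illustrated with a short Python sketch. The codon table below is the standard genetic code written compactly (bases in the order U, C, A, G); the `translate` function is a simplified, hypothetical helper that starts at the first AUG it finds and ignores everything a real ribosome would also require, such as a ribosome-binding site.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, in UCAG order; "*" marks a stop codon.
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""                      # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":          # UAA, UAG, or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)

# Hypothetical mRNA: a short leader, then AUG GCU AAA UGA.
print(translate("GGAGGUAACAUGGCUAAAUGA"))   # MAK
```

The table has 64 entries, three of which are stop codons, so the remaining 61 codons are shared among 20 amino acids.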

Figure 1: DNA Is Transcribed to Form RNA. DNA is partially unwound by RNA polymerase to serve as a template for RNA synthesis. The RNA transcript is formed and then peels away, allowing the DNA that has already been transcribed to rewind into a double helix. Three distinct processes (initiation, elongation, and termination) constitute DNA transcription. RNA polymerase is much larger in reality than indicated here, covering about 50 base pairs.

Figure 2: The genetic code consists of 64 codons and the amino acids specified by these codons. The codons are written 5' to 3', as they appear in the mRNA. AUG is an initiation codon; UAA, UAG, and UGA are termination (stop) codons.

Where Translation Occurs


Within all cells, the translation machinery resides within a specialized organelle called the ribosome. In eukaryotes, mature mRNA molecules must leave the nucleus and travel to the cytoplasm, where the ribosomes are located. On the other hand, in prokaryotic organisms, ribosomes can attach to mRNA while it is still being transcribed. In this situation, translation begins at the 5' end of the mRNA while the 3' end is still attached to DNA.

In all types of cells, the ribosome is composed of two subunits: a large subunit and a small subunit (50S and 30S, respectively, in bacteria; 60S and 40S in eukaryotes). S, for svedberg unit, is a measure of sedimentation velocity and, therefore, mass. Each subunit exists separately in the cytoplasm, but the two join together on the mRNA molecule. The ribosomal subunits contain proteins and specialized RNA molecules: specifically, ribosomal RNA (rRNA) and transfer RNA (tRNA). The tRNA molecules are adaptor molecules: they have one end that can read the triplet code in the mRNA through complementary base-pairing, and another end that attaches to a specific amino acid (Chapeville et al., 1962; Grunberger et al., 1969). The idea that tRNA was an adaptor molecule was first proposed by Francis Crick, co-discoverer of DNA structure, who did much of the key work in deciphering the genetic code (Crick, 1958).

Within the ribosome, the mRNA and aminoacyl-tRNA complexes are held together closely, which facilitates base-pairing. The rRNA catalyzes the attachment of each new amino acid to the growing chain.

The Beginning of mRNA Is Not Translated


Interestingly, not all regions of an mRNA molecule correspond to particular amino acids. In particular, there is an area near the 5' end of the molecule that is known as the untranslated region (UTR) or leader sequence. This portion of mRNA is located between the first nucleotide that is transcribed and the start codon (AUG) of the coding region, and it does not affect the sequence of amino acids in a protein (Figure 3).

So, what is the purpose of the UTR? It turns out that the leader sequence is important because it contains a ribosome-binding site. In bacteria, this site is known as the Shine-Dalgarno box (AGGAGG), after scientists John Shine and Lynn Dalgarno, who first characterized it. A similar site in vertebrates was characterized by Marilyn Kozak and is thus known as the Kozak box. In bacterial mRNA, the 5' UTR is normally short; in human mRNA, the median length of the 5' UTR is about 170 nucleotides. If the leader is long, it may contain regulatory sequences, including binding sites for proteins, that can affect the stability of the mRNA or the efficiency of its translation.
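The layout of a bacterial leader (ribosome-binding site, a short spacer, then the start codon) can be sketched as a simple search. This is an illustration only: the function name, the requirement of an exact AGGAGG match, and the `max_gap` window between the box and the AUG are assumptions, and real Shine-Dalgarno sites often match the consensus only partially.

```python
def find_start_with_sd(mrna: str, sd: str = "AGGAGG", max_gap: int = 12):
    """Locate a Shine-Dalgarno box and the first AUG shortly downstream of it.

    Returns None if either element is missing; otherwise a dict of positions.
    """
    mrna = mrna.upper()
    sd_pos = mrna.find(sd)
    if sd_pos == -1:
        return None
    search_from = sd_pos + len(sd)
    # Only accept a start codon that begins within max_gap bases of the box.
    start = mrna.find("AUG", search_from, search_from + max_gap + 3)
    if start == -1:
        return None
    return {
        "sd_position": sd_pos,
        "start_codon": start,
        "spacer": start - search_from,   # bases between the box and the AUG
        "utr_length": start,             # everything 5' of the AUG is untranslated
    }

# Hypothetical leader: UU + AGGAGG box + 6-base spacer + AUG GCU AAA UGA.
print(find_start_with_sd("UUAAGGAGGUAACAUAUGGCUAAAUGA"))
```

For the example, the box is found at index 3, the start codon at index 15, and the 15 bases ahead of the AUG make up the 5' UTR.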

Figure 3: A transcription unit includes a promoter, an RNA-coding region, and a terminator. Used with permission. Copyright 2005 by W. H. Freeman and Company. All rights reserved.

Translation Begins After the Assembly of a Complex Structure


The translation of mRNA begins with the formation of a complex on the mRNA (Figure 4). First, three initiation factor proteins (known as IF1, IF2, and IF3) bind to the small subunit of the ribosome. This preinitiation complex and a methionine-carrying tRNA then bind to the mRNA, near the AUG start codon, forming the initiation complex.

Although methionine (Met) is the first amino acid incorporated into any new protein, it is not always the first amino acid in mature proteins: in many proteins, methionine is removed after translation. In fact, if a large number of proteins are sequenced and compared with their known gene sequences, methionine (or formyl methionine) occurs at the N-terminus of all of them. However, not all amino acids are equally likely to occur second in the chain, and the second amino acid influences whether the initial methionine is enzymatically removed. For example, many proteins begin with methionine followed by alanine. In both prokaryotes and eukaryotes, these proteins have the methionine removed, so that alanine becomes the N-terminal amino acid (Table 1). However, if the second amino acid is lysine, which is also frequently the case, methionine is not removed (at least in the sample proteins that have been studied thus far). These proteins therefore begin with methionine followed by lysine (Flinta et al., 1986).

Table 1 shows the N-terminal sequences of proteins in prokaryotes and eukaryotes, based on a sample of 170 prokaryotic and 120 eukaryotic proteins (Flinta et al., 1986). In the table, M represents methionine, A represents alanine, K represents lysine, S represents serine, and T represents threonine.

Table 1: N-Terminal Sequences of Proteins

N-Terminal Sequence   Percent of Prokaryotic Proteins   Percent of Eukaryotic Proteins
                      with This Sequence                with This Sequence
MA*                   28.24%                            19.17%
MK**                  10.59%                            2.50%
MS*                   9.41%                             11.67%
MT*                   7.65%                             6.67%

* Methionine was removed in all of these proteins
** Methionine was not removed from any of these proteins

Once the initiation complex is formed on the mRNA, the large ribosomal subunit binds to this complex, which causes the release of IFs (initiation factors). The large subunit of the ribosome has three sites at which tRNA molecules can bind. The A (amino acid) site is the location at which the aminoacyl-tRNA anticodon base pairs up with the mRNA codon, ensuring that the correct amino acid is added to the growing polypeptide chain. The P (polypeptide) site is the location at which the amino acid is transferred from its tRNA to the growing polypeptide chain. Finally, the E (exit) site is the location at which the "empty" tRNA sits before being released back into the cytoplasm to bind another amino acid and repeat the process. The initiator methionine tRNA is the only aminoacyl-tRNA that can bind in the P site of the ribosome, and the A site is aligned with the second mRNA codon. The ribosome is thus ready to bind the second aminoacyl-tRNA at the A site, which will be joined to the initiator methionine by the first peptide bond.
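The pattern in Table 1, in which the second amino acid predicts whether the initiator methionine is removed, can be written as a small lookup. This sketch encodes only the four cases listed in the table (MA, MS, MT, MK); the function names are hypothetical, and any other second residue is reported as not covered rather than guessed.

```python
# Second residues for which, in the Table 1 sample, the initiator methionine
# was always removed (MA, MS, MT) or never removed (MK).
MET_REMOVED_BEFORE = {"A", "S", "T"}
MET_RETAINED_BEFORE = {"K"}

def methionine_fate(protein: str) -> str:
    """Classify the initiator methionine of a one-letter protein sequence."""
    if not protein.startswith("M"):
        return "no initiator methionine"
    second = protein[1:2]              # empty string if the protein is just "M"
    if second in MET_REMOVED_BEFORE:
        return "removed"
    if second in MET_RETAINED_BEFORE:
        return "retained"
    return "not covered by Table 1"

def mature_n_terminus(protein: str) -> str:
    """Return the sequence after applying the Table 1 pattern."""
    if methionine_fate(protein) == "removed":
        return protein[1:]
    return protein

for seq in ("MAKL", "MKAL", "MGLL"):
    print(seq, methionine_fate(seq), mature_n_terminus(seq))
```

The first two sequences come out as `AKL` (methionine removed) and `MKAL` (methionine retained); the third is left unchanged because glycine in the second position is outside what the table reports.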

Figure 4 : The initiation of translation. Translation begins with the formation of an initiation complex.

The Elongation Phase


The next phase in translation is known as the elongation phase (Figure 5). First, the ribosome moves along the mRNA in the 5'-to-3' direction, which requires the elongation factor G, in a process called translocation. The tRNA that corresponds to the second codon can then bind to the A site, a step that requires elongation factors (in E. coli, these are called EF-Tu and EF-Ts), as well as guanosine triphosphate (GTP) as an energy source for the process. Upon binding of the tRNA-amino acid complex in the A site, GTP is cleaved to form guanosine diphosphate (GDP), then released along with EF-Tu to be recycled by EF-Ts for the next round.

Next, peptide bonds between the now-adjacent first and second amino acids are formed through a peptidyl transferase activity. For many years, it was thought that an enzyme catalyzed this step, but recent evidence indicates that the transferase activity is a catalytic function of rRNA (Pierce, 2000). After the peptide bond is formed, the ribosome shifts, or translocates, again, thus causing the tRNA to occupy the E site. The tRNA is then released to the cytoplasm to pick up another amino acid. In addition, the A site is now empty and ready to receive the tRNA for the next codon.

This process is repeated until all the codons in the mRNA have been read by tRNA molecules, and the amino acids attached to the tRNAs have been linked together in the growing polypeptide chain in the appropriate order. At this point, translation must be terminated, and the nascent protein must be released from the mRNA and ribosome.
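The movement of tRNAs through the A, P, and E sites can be followed with a toy simulation. This is a bookkeeping sketch, not a model of the chemistry: it ignores GTP and the elongation factors, uses a three-codon subset of the genetic code (`TOY_CODE`), and all names are hypothetical. Each site simply records which codon's tRNA currently occupies it.

```python
# Tiny subset of the genetic code, enough for the example below.
TOY_CODE = {"AUG": "Met", "GCU": "Ala", "AAA": "Lys"}
STOP_CODONS = {"UAA", "UAG", "UGA"}

def elongation_cycle(codons):
    """Return (polypeptide, history) where history lists site occupancy snapshots."""
    # After initiation, the initiator tRNA sits in the P site on the first codon.
    sites = {"E": None, "P": codons[0], "A": None}
    chain = [TOY_CODE[codons[0]]]
    history = [dict(sites)]
    for codon in codons[1:]:
        if codon in STOP_CODONS:
            break                          # no tRNA recognizes a stop codon
        sites["A"] = codon                 # aminoacyl-tRNA binds at the A site
        chain.append(TOY_CODE[codon])      # peptide bond joins the new residue
        history.append(dict(sites))
        # Translocation: P-site tRNA moves to E, A-site tRNA moves to P.
        sites = {"E": sites["P"], "P": sites["A"], "A": None}
        history.append(dict(sites))
        sites["E"] = None                  # empty tRNA leaves from the E site
    return chain, history

chain, history = elongation_cycle(["AUG", "GCU", "AAA", "UGA"])
print("-".join(chain))
for snapshot in history:
    print(snapshot)
```

The run yields the chain Met-Ala-Lys, and the snapshots show each tRNA entering at A, shifting to P, and leaving through E, with the A site emptied after every translocation.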

Figure 5 : The elongation of translation comprises three steps.

Termination of Translation
There are three termination codons that are employed at the end of a protein-coding sequence in mRNA: UAA, UAG, and UGA. No tRNAs recognize these codons. Thus, in the place of these tRNAs, one of several proteins, called release factors, binds and facilitates release of the mRNA from the ribosome and subsequent dissociation of the ribosome.

Comparing Eukaryotic and Prokaryotic Translation


The translation process is very similar in prokaryotes and eukaryotes. Although different elongation, initiation, and termination factors are used, the genetic code is generally identical. As previously noted, in bacteria, transcription and translation take place simultaneously, and mRNAs are relatively short-lived. In eukaryotes, however, mRNAs have highly variable half-lives, are subject to modifications, and must exit the nucleus to be translated; these multiple steps offer additional opportunities to regulate levels of protein production, and thereby fine-tune gene expression.

References and Recommended Reading


Chapeville, F., et al. On the role of soluble ribonucleic acid in coding for amino acids. Proceedings of the National Academy of Sciences 48, 1086-1092 (1962)

Crick, F. On protein synthesis. Symposia of the Society for Experimental Biology 12, 138-163 (1958)

Flinta, C., et al. Sequence determinants of N-terminal protein processing. European Journal of Biochemistry 154, 193-196 (1986)

Grunberger, D., et al. Codon recognition by enzymatically mischarged valine transfer ribonucleic acid. Science 166, 1635-1637 (1969) doi:10.1126/science.166.3913.1635

Kozak, M. Point mutations close to the AUG initiator codon affect the efficiency of translation of rat preproinsulin in vivo. Nature 308, 241-246 (1984) doi:10.1038/308241a0

Kozak, M. Point mutations define a sequence flanking the AUG initiator codon that modulates translation by eukaryotic ribosomes. Cell 44, 283-292 (1986)

Kozak, M. An analysis of 5'-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Research 15, 8125-8148 (1987)

Pierce, B. A. Genetics: A conceptual approach (New York, Freeman, 2000)

Shine, J., & Dalgarno, L. Determinant of cistron specificity in ribosomes. Nature 254, 34-38 (1975) doi:10.1038/254034a0