Transcription terminates at particular base sequences

What tells RNA polymerase to stop adding nucleotides to a growing RNA transcript? Just as initiation sites specify the start of transcription, particular base sequences in the DNA specify its termination. The mechanisms of termination are complex and of more than one kind. For some genes, the newly formed transcript simply falls away from the DNA template and the RNA polymerase. For others, a helper protein pulls the transcript away. In prokaryotes, in which there is no nuclear envelope and ribosomes can be near the chromosome, the translation of mRNA often begins near the 5′ end of the mRNA before transcription of the mRNA molecule is complete. In eukaryotes, the situation is more complicated. First, there is a spatial separation of transcription (in the nucleus) and translation (in the cytoplasm). Second, the first product of transcription is a pre-mRNA that is longer than the final mRNA and must undergo considerable processing before it can be translated.

The Genetic Code

How do transcription and translation produce specific and functional protein products? These processes require a genetic code that relates genes (DNA) to mRNA and mRNA to the amino acids of proteins. The genetic code specifies which amino acids will be used to build a protein. You can think of the genetic information in an mRNA molecule as a series of sequential, nonoverlapping three-letter “words.” Each sequence of three nucleotide bases (the three “letters”) along the chain specifies a particular amino acid. Each three-letter “word” is called a codon. Each codon is complementary to the corresponding triplet in the DNA molecule from which it was transcribed. Thus, the genetic code is the means of relating codons to their specific amino acids. The complete genetic code is shown in Figure . Notice that there are many more codons than there are different amino acids in proteins. Combinations of the four available “letters” (the bases) give 64 (4³) different three-letter codons, yet these codons determine only 20 amino acids. AUG, which codes for methionine, is also the start codon, the initiation signal for translation. Three of the codons (UAA, UAG, UGA) are stop codons, or termination signals for translation; when the translation machinery reaches one of these codons, translation stops, and the polypeptide is released from the translation complex. After describing the properties of the genetic code, we will examine some of the scientific thinking and experimentation that went into deciphering it.
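The reading of an mRNA as nonoverlapping three-letter words can be sketched in a few lines of Python. This is only an illustration, not a model of the real translation machinery: the names `CODON_TABLE` and `translate` are made up for this example, amino acids are written with their standard one-letter abbreviations, `*` marks a stop codon, and the sketch assumes that reading simply begins at the first AUG it finds.

```python
from itertools import product

# Standard genetic code, listed with codon bases in the order U, C, A, G.
# One-letter amino acid abbreviations; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read an mRNA (written 5' to 3') codon by codon.

    Simplifying assumption: translation starts at the first AUG
    and continues until a stop codon or the end of the message.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing is translated
    protein = []
    # Step through the message three bases at a time, without overlap.
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA: release the polypeptide
            break
        protein.append(amino_acid)
    return "".join(protein)


# AUG UUU UGG AAA UAA -> Met, Phe, Trp, Lys, then stop
print(translate("GGAUGUUUUGGAAAUAAGC"))
```

In this example the two bases before AUG and the bases after UAA are ignored, which mirrors the idea that the start and stop codons mark the boundaries of the translated message.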

The genetic code is redundant but not ambiguous

After the start and stop codons, the remaining 60 codons are far more than enough to code for the other 19 amino acids, and indeed there are repeats. Thus we say that the genetic code is redundant; that is, an amino acid may be represented by more than one codon. The redundancy is not evenly divided among the amino acids. For example, methionine and tryptophan are represented by only one codon each, whereas leucine is represented by six different codons. The term “redundancy” should not be confused with “ambiguity.” To say that the code were ambiguous would mean that a single codon could specify either of two (or more) different amino acids; there would then be doubt whether to put in, say, leucine or something else. The genetic code is not ambiguous. Redundancy in the code means that there is more than one clear way to say, “Put leucine here.” In other words, a given amino acid may be encoded by more than one codon, but a codon can code for only one amino acid. But just as people in different places prefer different ways of saying the same thing (“Good-bye!” “See you!” “Ciao!” and “So long!” have the same meaning), different organisms prefer one or another of the redundant codons.
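The difference between redundancy and ambiguity can be checked directly against the standard code table. The sketch below is illustrative only; the names `CODON_TABLE`, `codons_by_amino_acid`, and `GROUPS` are invented for this example. Because the table is a mapping from codon to amino acid, each codon has exactly one meaning (no ambiguity), while grouping the codons by amino acid shows how unevenly the redundancy is distributed.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codon bases in the order U, C, A, G.
# One-letter amino acid abbreviations; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_by_amino_acid(table):
    """Invert the codon table: amino acid -> list of its codons."""
    groups = defaultdict(list)
    for codon, amino_acid in table.items():
        groups[amino_acid].append(codon)
    return dict(groups)


GROUPS = codons_by_amino_acid(CODON_TABLE)

# Methionine (M) and tryptophan (W) have one codon each; leucine (L) has six.
for amino_acid in ("M", "W", "L"):
    print(amino_acid, len(GROUPS[amino_acid]), GROUPS[amino_acid])
```

Looking up any single codon in `CODON_TABLE` always returns one amino acid, whereas looking up leucine in `GROUPS` returns six codons: many ways to say the same thing, but no codon that says two things.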

The genetic code is (nearly) universal

Over 40 years of experiments on thousands of organisms from all the living domains and kingdoms reveal that the genetic code appears to be nearly universal, applying to all the species on our planet. Thus the code must be an ancient one that has been maintained intact throughout the evolution of living organisms. Exceptions are known: within mitochondria and chloroplasts, the code differs slightly from that in prokaryotes and in the nuclei of eukaryotic cells; in one group of protists, UAA and UAG code for glutamine rather than functioning as stop codons. The significance of these differences is not yet clear. What is clear is that the exceptions are few and slight. The common genetic code means that there is also a common language for evolution. As natural selection resulted in one species replacing another, the raw material of genetic variation has remained the same. The common code also has profound implications for genetic engineering since it means that a human gene is in the same language as a bacterial gene. The differences are more like dialects of a single language than entirely different languages. So the transcription and translation machinery of a bacterium could theoretically utilize genes from a human as well as its own genes.

The codons in Figure are mRNA codons. The base sequence on the DNA strand that was transcribed to produce the mRNA is complementary and antiparallel to these codons. Thus, for example, 3′-AAA-5′ in the template DNA strand corresponds to phenylalanine (which is encoded by the mRNA codon 5′-UUU-3′), and 3′-ACC-5′ in the template DNA corresponds to tryptophan (which is encoded by the mRNA codon 5′-UGG-3′). How were these codons assigned to specific amino acids?
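The complementary, antiparallel relationship between the template DNA strand and the mRNA codons can be made concrete with a short Python sketch. The function names here are hypothetical and chosen only for illustration. The key assumption is the direction in which the template is written: if it is written 3′ to 5′, as in the examples above, the mRNA read 5′ to 3′ is simply the base-by-base complement; if the template is written 5′ to 3′, it must be reversed first.

```python
# Base pairing between template DNA and the RNA transcribed from it.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def mrna_from_template(template_3_to_5):
    """Template DNA written 3'->5'  ->  mRNA written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


def mrna_from_template_5_to_3(template_5_to_3):
    """Template DNA written 5'->3': reverse it first (antiparallel)."""
    return mrna_from_template(template_5_to_3[::-1])


print(mrna_from_template("AAA"))          # 3'-AAA-5' -> UUU (phenylalanine)
print(mrna_from_template("ACC"))          # 3'-ACC-5' -> UGG (tryptophan)
print(mrna_from_template_5_to_3("CCA"))   # same triplet written 5'->3' -> UGG
```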

Biologists deciphered the genetic code by using artificial messengers

Molecular biologists broke the genetic code in the early 1960s. The problem was perplexing: How could more than 20 “code words” be written with an “alphabet” consisting of only four “letters”? How, in other words, could four bases (A, U, G, and C) code for 20 different amino acids? That the code was a triplet code, based on three-letter codons, was considered likely. Since there are only four letters (A, G, C, U), a one-letter code clearly could not unambiguously encode 20 amino acids; it could encode only four of them. A two-letter code could contain only 4 × 4 = 16 codons, still not enough. But a triplet code could contain up to 4 × 4 × 4 = 64 codons. This was more than enough to encode the 20 amino acids.

Marshall W. Nirenberg and J. H. Matthaei, at the National Institutes of Health, made the first decoding breakthrough in 1961 when they realized that they could use a simple artificial polynucleotide instead of a complex natural mRNA as a messenger. They could then identify the polypeptide that the artificial messenger encoded. They prepared an artificial mRNA in which all the bases were uracil (poly U). When poly U was added to a test tube containing all the ingredients necessary for protein synthesis (ribosomes, all the amino acids, activating enzymes, tRNAs, and other factors), a polypeptide formed. This polypeptide contained only one kind of amino acid: phenylalanine (Phe). Poly U coded for poly Phe. Accordingly, UUU appeared to be the mRNA code word (the codon) for phenylalanine. Following up on this success, Nirenberg and Matthaei soon showed that CCC codes for proline and AAA for lysine. (Poly G presented some chemical problems and was not tested initially.) UUU, CCC, and AAA were three of the easiest codons; different approaches were required to work out the rest.
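The counting argument for a triplet code, and the logic of the homopolymer experiments, can be sketched in Python. The function names are hypothetical, and the small lookup table contains only the three assignments named above (UUU, CCC, AAA); it is a toy illustration of the reasoning, not a simulation of the actual experiments.

```python
from itertools import product

ALPHABET = "AUGC"  # the four RNA bases


def codon_count(word_length):
    """Number of distinct code words of a given length: 4 ** length."""
    return len(list(product(ALPHABET, repeat=word_length)))


def shortest_word_length(n_amino_acids=20):
    """Smallest word length with at least one code word per amino acid."""
    length = 1
    while codon_count(length) < n_amino_acids:
        length += 1
    return length


# Only the three homopolymer assignments mentioned in the text.
HOMOPOLYMER_CODONS = {"UUU": "Phe", "CCC": "Pro", "AAA": "Lys"}


def read_homopolymer(base, n_codons):
    """Read an artificial messenger made of a single repeated base."""
    mrna = base * (3 * n_codons)
    return [HOMOPOLYMER_CODONS[mrna[i:i + 3]] for i in range(0, len(mrna), 3)]


for length in (1, 2, 3):
    print(length, codon_count(length))  # 4, 16, and 64 code words
print(shortest_word_length())           # one- and two-letter codes fall short of 20
print(read_homopolymer("U", 4))         # poly U is read as a run of Phe
```

Because a homopolymer contains the same triplet in every reading frame, the experiment gives a clean answer without needing to know where reading starts, which is part of why these three codons were the easiest to assign.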
Other scientists later found that simple artificial mRNAs only three nucleotides long (each amounting to a codon) could bind to a ribosome, and that the resulting complex could then cause the binding of the corresponding tRNA with its specific amino acid. Thus, for example, simple UUU caused the tRNA carrying phenylalanine to bind to the ribosome. After this discovery, complete deciphering of the genetic code was relatively simple. To find the “translation” of a codon, Nirenberg could use a sample of that codon as an artificial mRNA and see which amino acid became bound to it.