ALGE 3 - Gene Expression

1. Introduction

The information content of DNA, the genetic material, is in the form of specific sequences of nucleotides along the DNA strands. How is this information related to an organism’s inherited traits? What does a gene actually say? How is this message translated by cells into a specific trait?

2. What is a Gene?

A gene is a unit of genetic material that controls the inheritance of one “unit character” or one attribute of phenotype. A gene is a segment of double stranded DNA which carries information for the formation of a specific polypeptide. The following are the properties of a gene. It is stable from generation to generation. Despite its stability, it has a variable degree of mutability. It has alternative forms called alleles that produce distinct phenotypic effects. It can undergo self replication.

3. Directional Flow of Genetic Information

The flow of genetic information in cells generally proceeds from DNA to RNA to protein. The principle of directional information flow from DNA to RNA to protein is known as the central dogma of molecular biology.

Thus the flow of genetic information involves transcription of information carried by DNA into a form of mRNA using a strand of DNA as a template; followed by translation from the nucleotide sequence of mRNA molecule to the amino acid sequence of a polypeptide chain.

Since the structure, function, development and reproduction of an organism depend on the properties of proteins in each cell and tissue, we can say that genes determine an organism’s inherited traits.

4. Ribonucleic Acids

4.1 Differences between RNA and DNA

RNA is a single polynucleotide chain, often folded into a complex structure. It is shorter than DNA with a lower molecular mass. The pentose sugar is ribose. Nitrogenous bases present are A, G, C, and U. The ratios of A:U and C:G vary. It is manufactured in the nucleus but found throughout the cell. Its amount varies from cell to cell and within a cell, according to metabolic activity. It is chemically less stable than DNA and more reactive because of the additional –OH side group of ribose. Its occurrence is temporary and it has 3 main forms: messenger RNA (mRNA), transfer RNA (tRNA) and ribosomal RNA (rRNA).

DNA is a double polynucleotide chain. It is longer than RNA with a higher molecular mass. The pentose sugar is deoxyribose. Nitrogenous bases present are A, T, G, and C. The ratio of A:T and C:G is one. It is found almost entirely in the nucleus and some in the mitochondrion and chloroplasts. The amount is constant for all cells of a species except gametes and spores. It is chemically more stable and more resistant to alkali degradation because of the absence of the –OH side group. It is a permanent feature and has only one basic form but with an almost infinite variety within that form.
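The contrast in base ratios between double-stranded DNA and single-stranded RNA can be checked with a short script. This is a minimal Python sketch and not part of the original notes; the strand sequence and the function names are hypothetical, chosen only for illustration.

```python
from collections import Counter

def complement_dna(strand):
    """Return the complementary DNA strand, written 5' to 3'."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(strand))

def base_ratios(sequence, partner):
    """Return the (A:partner, C:G) count ratios for a sequence."""
    counts = Counter(sequence)
    return counts["A"] / counts[partner], counts["C"] / counts["G"]

# Hypothetical strand, chosen only for illustration.
strand = "ATGGCGTACGCTTGA"
duplex = strand + complement_dna(strand)    # both strands of the double helix

print(base_ratios(duplex, "T"))                     # double-stranded DNA
print(base_ratios(strand.replace("T", "U"), "U"))   # single-stranded RNA copy
```

In the duplex every A is matched by a T and every C by a G, so both ratios come out as one. The single strand has no such constraint, so its ratios depend on the particular sequence.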

4.2 Functions and Types of RNA

Messenger RNA carries information specifying amino acid sequences of proteins from DNA to ribosomes. Transfer RNA serves as an adaptor molecule in protein synthesis; it translates mRNA codons into amino acids. Ribosomal RNA plays structural and catalytic roles in ribosomes. Small nuclear RNA plays structural and catalytic roles in spliceosomes, the complexes of protein and RNA that splice pre-mRNA in the eukaryotic nucleus.

5. Transcription

Transcription is the synthesis of a single-stranded RNA molecule under the direction of DNA. Both DNA and RNA are nucleic acids, thus they share the same ‘language’ and the information is simply transcribed or copied from one molecule to another. Only one of the 2 DNA strands is transcribed into an RNA molecule following complementary base-pairing rules (A with U and G with C).

During transcription, RNA is synthesized in the 5’ to 3’ direction. The 3’ to 5’ DNA strand that is read to make the RNA strand is called the template strand / anti-sense strand / non-coding strand.

The 5’ to 3’ DNA strand complementary to the template strand that has the same polarity as the resulting RNA is called the non-template strand / sense strand / coding strand.
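The relationship between the two DNA strands and the RNA product can be sketched in a few lines of Python. This is an illustrative sketch only; the sequences and the function name are hypothetical.

```python
def transcribe(template_3to5):
    """Build the RNA (5' to 3') from a DNA template strand written 3' to 5'."""
    pairing = {"A": "U", "T": "A", "G": "C", "C": "G"}
    # Reading the template 3' to 5' yields the RNA in 5' to 3' order.
    return "".join(pairing[base] for base in template_3to5)

non_template_5to3 = "ATGGCGTACGCTTGA"   # hypothetical non-template strand
template_3to5 = "TACCGCATGCGAACT"       # its complement, written 3' to 5'

rna = transcribe(template_3to5)
print(rna)
# The RNA matches the non-template strand, with U in place of T.
print(rna == non_template_5to3.replace("T", "U"))
```

Because the RNA is complementary to the template strand, it carries the same sequence as the non-template strand, which is why both have the same 5’ to 3’ polarity.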

Specific sequences of nucleotides along the DNA mark where transcription of a gene begins and ends. The segment of DNA that is transcribed into a single, continuous RNA molecule is called a transcription unit.

In both prokaryotes and eukaryotes, transcription is catalyzed by an enzyme called RNA polymerase. Prokaryotes have a single type of RNA polymerase that synthesizes all types of RNA. Eukaryotes have 3 types of RNA polymerases in their nuclei: I, II and III. The one used for messenger RNA (mRNA) synthesis is RNA polymerase II.

In transcription of a protein-coding gene, the resulting mRNA is a faithful transcript of the gene’s protein-building instructions. It then carries a genetic message from DNA to the protein-synthesizing machinery of the cell. In prokaryotes, the mRNA transcript functions directly as the mRNA molecule for translation. In eukaryotes, the mRNA transcript must be modified by a series of events known as mRNA processing in order to produce the mature mRNA.

5.1 Transcription Process in Prokaryotes

A prokaryotic protein-coding gene may be divided into 3 sequences with respect to its transcription.

The promoter is a sequence upstream (5’) of the start of the RNA coding sequence with which RNA polymerase interacts to determine the start point for transcription.

The RNA coding sequence is the sequence of DNA base pairs transcribed by the RNA polymerase into single-stranded mRNA transcript.

The terminator is a sequence downstream (3’) of the end of the RNA coding sequence which specifies where transcription stops.

There are 3 stages of transcription of the protein coding gene.

Binding of RNA polymerase to a DNA promoter sequence, triggering local unwinding of the DNA double helix and initiating synthesis of the mRNA chain.

Elongation of the mRNA chain as RNA polymerase catalyzes the polymerization of nucleotides in an order determined by their base pairing with the DNA template strand.

Termination of mRNA synthesis, release of the completed mRNA molecule and dissociation of RNA polymerase from the DNA template.

5.1.1 RNA Polymerase Binding and Initiation of Transcription

The first step in mRNA synthesis is the binding of RNA polymerase to a DNA promoter site, that is, a specific sequence upstream (towards the 5’ end) of the mRNA coding sequence. The promoter determines where mRNA synthesis starts and which DNA strand is to serve as the template strand.

A prokaryotic promoter consists of a transcription start point, the –10 sequence (Pribnow box) approximately 10 bases upstream of the start point, which is denoted as +1, and the –35 sequence approximately 35 bases upstream of the start point.
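The numbering used for promoter elements can be made concrete with a small helper that converts a position relative to the start point into an index in a sequence string. This is a hedged Python sketch; the function name and the example index are hypothetical. It assumes the usual convention that there is no position 0, so that –1 lies immediately upstream of +1.

```python
def to_index(position, start_index):
    """Convert a promoter-relative position (+1 = start point) to a 0-based index.

    Assumes there is no position 0: -1 is the base immediately upstream of +1.
    """
    if position == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if position > 0:
        return start_index + position - 1
    return start_index + position

# Hypothetical example: the start point (+1) sits at index 40 of a DNA string.
start_index = 40
print(to_index(1, start_index))     # the start point itself
print(to_index(-10, start_index))   # around the -10 sequence
print(to_index(-35, start_index))   # around the -35 sequence
```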

Once the RNA polymerase has bound to a promoter site and locally unwound the DNA double helix, initiation of mRNA synthesis can occur. One of the 2 exposed segments of single-stranded DNA serves as the template for the synthesis of mRNA, using incoming ribonucleoside triphosphate molecules as substrates.

As soon as the first 2 incoming ribonucleoside triphosphate molecules are hydrogen bonded to the complementary bases of the DNA template strand at the start point, RNA polymerase catalyses the formation of a phosphodiester bond between the 3’-hydroxyl group of the first molecule and the 5’-phosphate of the second, accompanied by the release of pyrophosphate.

5.1.2 Elongation of the RNA Chain

As RNA polymerase moves along the DNA, it continues to untwist the double helix, exposing about 10-20 bases at a time for pairing with RNA nucleotides.

The enzyme moves along the template DNA strand from the 3’ towards the 5’ end. Because complementary base pairing between the DNA template strand and the newly formed mRNA chain is anti-parallel, the mRNA is elongated in the 5’ to 3’ direction as each successive nucleotide is added to the 3’ end of the growing chain.

As the RNA chain grows, the most recently added nucleotides remain base paired with the DNA template strand, forming a short RNA-DNA hybrid. The polymerase continues to move on, rewinding the DNA behind (causing the new mRNA to peel away from the DNA template) and unwinding the DNA ahead as it goes.

A single gene can be transcribed simultaneously by several molecules of RNA polymerase following each other. This increases the number of mRNA molecules and helps the cell make a protein in large amounts.

Unlike DNA polymerase, RNA polymerase has no 3’ to 5’ exonuclease activity which would allow it to correct mistakes by removing mismatched base pairs as RNA elongation proceeds. Thus RNA synthesis is subject to a higher error rate than DNA synthesis. This does not appear to create a problem, since multiple RNA molecules are usually transcribed from each gene, and so a small number of inaccurate copies can be tolerated.

5.1.3 Termination of Transcription

Elongation of the growing mRNA chain proceeds until shortly after RNA polymerase transcribes a sequence called the terminator or termination signal that triggers the end of transcription.

Transcription usually stops at the end of the termination signal; when the RNA polymerase reaches that point, it releases the mRNA and DNA. In prokaryotes, the initial transcript is the mature mRNA molecule and it is immediately translated on ribosomes.

5.2 Transcription and mRNA Processing in Eukaryotes

Transcription in eukaryotes involves the same 3 stages as described for prokaryotes, but the process is more complicated than in prokaryotes.

3 different RNA polymerases transcribe the DNA in the eukaryotic nucleus. Each synthesizes 1 or more classes of RNA.

Eukaryotic promoters are more varied than prokaryotic promoters. Some eukaryotic promoters are located downstream from the transcription start point.

Binding of eukaryotic RNA polymerase to the promoter requires the participation of proteins called transcription factors. Only after transcription factors are bound to the promoter does the RNA polymerase bind to it. The completed assembly of transcription factors and RNA polymerase bound to the promoter is called a transcription initiation complex.

At around -25, a short sequence called the TATA box (also called the Goldberg-Hogness box) is found. The binding stage of transcription by RNA polymerase II usually starts with the attachment of a particular transcription factor to the TATA box. The TATA element seems to specify a particular start point for transcription.

Enhancer elements located at a large distance from the promoter are also required for maximal transcription of the gene to occur.

In the eukaryotic cell, the polymerase continues past a termination signal and the transcript is cleaved 10-35 nucleotides downstream from the termination signal. This gives rise to a pre-mRNA.

Newly formed eukaryotic mRNA molecules typically undergo extensive processing after transcription.

5.2.1 mRNA Processing

Enzymes in the eukaryotic nucleus modify pre-mRNA in various ways before the genetic message is dispatched to the cytoplasm. During processing, both ends of the primary transcript are altered. In most cases, certain interior sections of the molecule are cut out and the remaining parts are joined together.

5.2.1.1 Altering of mRNA Ends

5’ capping: The 5’ end of the pre-mRNA is capped with a modified form of guanine nucleotide (7-methylguanosine). This helps to protect the mRNA from degradation by hydrolytic enzymes. After the mRNA reaches the cytoplasm, the cap is essential for the ribosome to bind to the 5’ end of the mRNA, an initial step of translation.

3’ polyadenylation: An enzyme adds a poly(A) tail consisting of 30-200 adenine nucleotides to the 3’ end. This protects the mRNA from degradation, and the tail may be recognized by specific proteins involved in exporting mRNA from the nucleus to the cytoplasm.

5.2.1.2 RNA Splicing

Most eukaryotic genes and their mRNA transcripts have long non-coding stretches of nucleotides, regions that are not translated. Most non-coding sequences are interspersed between coding segments of the pre-mRNA.

The non-coding segments that lie between coding regions are called intervening sequences or introns. The other regions are called exons because they are eventually expressed, that is, translated into amino acid sequences.

Before the mRNA leaves the nucleus, the introns are excised from the molecule and the exons are joined together to form an mRNA with a continuous coding sequence.

RNA splicing is very precise; the signals for RNA splicing are short nucleotide sequences at the ends of introns. These are called splice sites.

Particles called small nuclear ribonucleoproteins or snRNPs recognize these splice sites. snRNPs are located in the cell nucleus. They are composed of snRNA and protein molecules.

Several different snRNPs join with additional proteins to form a spliceosome, which interacts with splice sites at the ends of introns and cuts at specific sites to release the intron. It immediately joins together the 2 exons that flank the intron.

The idea of a catalytic role for snRNA arose from the discovery of ribozymes, RNA molecules that function as enzymes.
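The outcome of splicing, removing introns and joining the flanking exons, can be sketched as a simple string operation. This Python sketch is illustrative only: the pre-mRNA sequence and intron coordinates are hypothetical and are supplied by hand, whereas in the cell the spliceosome locates the splice sites itself.

```python
def splice(pre_mrna, introns):
    """Remove introns and join the remaining exons.

    Introns are given as (start, end) pairs of 0-based, half-open indices.
    """
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        exons.append(pre_mrna[cursor:start])   # exon lying before this intron
        cursor = end                           # skip over the intron
    exons.append(pre_mrna[cursor:])            # final exon
    return "".join(exons)

# Hypothetical pre-mRNA: three exons separated by two introns.
pre_mrna = "AUGGCU" + "GUAAGUCCCAG" + "UACGCU" + "GUACGUUUAG" + "UGA"
introns = [(6, 17), (23, 33)]

mature = splice(pre_mrna, introns)
print(mature)
```

The result is an mRNA with a continuous coding sequence made only of the exon segments, in their original order.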

5.3 Structure of Mature mRNA molecule

The mature, biologically active mRNA molecule has 3 main parts. At the 5’ end is a leader sequence or the 5’ untranslated region (UTR). Within this leader sequence is the coded information that the ribosome reads to orient it correctly for beginning protein synthesis; none of the bases of the leader sequence are translated into amino acids. The actual coding sequence of the mRNA determines the amino acid sequence of a protein during translation. At the 3’ end is the 3’ UTR.
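The three-part layout of a mature mRNA can be illustrated by splitting a sequence at its start codon and its first in-frame stop codon. This is a minimal Python sketch under simplifying assumptions: translation is taken to start at the first AUG, the stop codon is counted as part of the coding sequence, and the example sequence and function name are hypothetical.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}

def mrna_regions(mrna):
    """Split an mRNA into (5' UTR, coding sequence, 3' UTR).

    Assumes translation starts at the first AUG and ends at the first
    in-frame stop codon. Returns None if either signal is missing.
    """
    start = mrna.find("AUG")
    if start == -1:
        return None
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return mrna[:start], mrna[start:i + 3], mrna[i + 3:]
    return None

# Hypothetical mature mRNA: leader, coding sequence, trailer.
mrna = "GGAGGCUAACC" + "AUGGCUUACGCUUGA" + "CCUUAAAAAA"
five_utr, coding, three_utr = mrna_regions(mrna)
print(five_utr, coding, three_utr)
```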

6. Translation (Protein Synthesis)

Translation is the synthesis of a polypeptide which occurs under the direction of mRNA. There is a change in language: the cell must translate the base sequence of an mRNA molecule into the amino acid sequence of a polypeptide. The DNA base sequence information that specifies the amino acid sequence of a polypeptide is the genetic code.

The mRNA molecule is translated in the 5’ to 3’ direction and the polypeptide is made in the N terminal to the C terminal direction. The sites of translation are ribosomes, complex particles that facilitate the orderly linking of amino acids into a polypeptide chain.

6.1 The Nature of the Genetic Code

Triplets of nucleotide bases are the smallest units of uniform length that can code for all amino acids. The flow of information from gene to protein is based on a triplet code; that is, a set of 3 nucleotides (a codon) in mRNA codes for one amino acid in a polypeptide chain.
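The arithmetic behind the triplet code is short: with 4 bases, a codon of length n gives 4 to the power n combinations, and at least 20 are needed, one per amino acid. The Python sketch below works this out; the function name is hypothetical.

```python
def smallest_codon_length(n_bases=4, n_amino_acids=20):
    """Return the shortest uniform codon length giving enough combinations."""
    length = 1
    while n_bases ** length < n_amino_acids:
        length += 1
    return length

for length in (1, 2, 3):
    print(length, "base(s) per codon:", 4 ** length, "combinations")

print("smallest workable length:", smallest_codon_length())
```

One base gives 4 combinations and two bases give 16, both fewer than 20; three bases give 64, which is enough and leaves room for several codons per amino acid.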

A direct linear correspondence between the complete base sequence of a gene and the amino acid sequence of the polypeptide chain it encodes is typical for bacterial genes. This is known as the colinearity of gene and polypeptide. However, most eukaryotic genes contain introns interspersed between exons and hence do not show a complete linear correspondence with their polypeptide product.

The code is a triplet code. An amino acid is specified by 3 nucleotides that make up one mRNA codon.

The code is comma-free; that is it is continuous. The mRNA is read continuously, 3 nucleotides at a time, without skipping any nucleotides of the message.

The code is non-overlapping. The reading frame advances 3 nucleotides at a time along the mRNA strand, so that each nucleotide is read only once.
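The comma-free and non-overlapping properties amount to reading consecutive triplets from a fixed starting point. A minimal Python sketch, with a hypothetical message and function name, shows how the reading frame determines the codons:

```python
def codons(mrna, frame=0):
    """Split an mRNA into consecutive, non-overlapping triplets.

    Reading starts at index `frame`; trailing bases that do not fill a
    complete triplet are ignored.
    """
    return [mrna[i:i + 3] for i in range(frame, len(mrna) - 2, 3)]

message = "AUGGCUUACGCUUGA"        # hypothetical mRNA segment
print(codons(message))             # the intended reading frame
print(codons(message, frame=1))    # shifted by one base: different codons
```

Shifting the starting point by a single base changes every codon downstream, which is why the position of the start codon matters so much.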

The code is almost universal. All organisms share the same genetic language. Thus we can isolate an mRNA from one organism, translate it by using the machinery isolated from another organism, and produce the protein as if it had been translated in the original organism. However, the mitochondria of some mammals and the nuclear genome of the protozoan Tetrahymena have minor changes to the code.

The code is degenerate. More than one codon occurs for each amino acid except methionine, which is coded only by AUG, and tryptophan, which is coded only by UGG. This multiple coding is called the degeneracy of the code. When the first 2 nucleotides in a codon are identical and the third letter is U or C, the codon always codes for the same amino acid. Despite the degeneracy of the code, not all codons are used equally. Rather, some codons are used repeatedly while others are almost never used.
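Degeneracy can be checked directly against the standard codon table. The Python sketch below builds the table compactly and counts codons per amino acid. The one-letter amino acid symbols (with * for stop) and the variable names are additions for illustration and do not appear in these notes.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order, third base varying fastest; * marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
single_codon = sorted(aa for aa, n in codons_per_amino_acid.items() if n == 1)

# Third-position rule: XYU and XYC always specify the same amino acid.
u_c_rule_holds = all(
    CODON_TABLE[a + b + "U"] == CODON_TABLE[a + b + "C"]
    for a, b in product(BASES, repeat=2)
)

print(codons_per_amino_acid)
print("amino acids with a single codon:", single_codon)   # M = Met, W = Trp
print("U/C third-position rule holds:", u_c_rule_holds)
```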

The code has start and stop signals. In both prokaryotes and eukaryotes, AUG (which codes for methionine) is usually the start codon for protein synthesis. Only 61 out of 64 codons specify amino acids; these are called sense codons. The other 3 codons (UAG, UAA and UGA) do not specify an amino acid; they are called stop codons / nonsense codons / chain-terminating codons.
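The start and stop signals can be seen at work in a small translation routine. This is a simplified Python sketch, not a model of the ribosome: it assumes translation begins at the first AUG in the message, uses one-letter amino acid symbols with * for stop, and the example sequence and function name are hypothetical.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in UCAG order, third base varying fastest; * marks stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(triplet): aa
    for triplet, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    If no stop codon is found, translation runs to the end of the message.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":        # stop codon: no amino acid is added
            break
        peptide.append(amino_acid)
    return "".join(peptide)

sense = [codon for codon, aa in CODON_TABLE.items() if aa != "*"]
stops = sorted(codon for codon, aa in CODON_TABLE.items() if aa == "*")
print(len(sense), "sense codons;", "stop codons:", stops)
print(translate("GGAGGCUAACCAUGGCUUACGCUUGACCUUAAAAAA"))
```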

The code is essentially unambiguous. It is very rare that one codon specifies more than one amino acid.

Wobble occurs in the anti-codon. The complete set of 61 sense codons can be read by fewer than 61 distinct tRNAs because of the wobble in the anti-codon. The wobble hypothesis states that the base at the 5’ end of the anti-codon is not as constrained as the other 2 bases. This allows for less exact base pairing so that the base at the 5’ end of the anti-codon can pair with one of 3 different bases at the 3’ end of the codon. If the tRNA contains the modified nucleoside inosine at the 5’ end of the anti-codon, then 3 different codons can be read by that one tRNA.
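The effect of wobble on codon reading can be sketched by listing the codons a given anti-codon can pair with. The pairing table below follows the commonly cited wobble rules (G with U or C, U with A or G, I with U, C or A), which are not listed in these notes and are included here as an assumption; the function name and example anti-codons are hypothetical.

```python
# Pairing partners for the base at the 5' end of the anti-codon (wobble position).
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "UC", "I": "UCA"}
# Strict pairing for the other 2 anti-codon positions.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

def codons_read(anticodon_5to3):
    """Return the codons (5' to 3') that an anti-codon (5' to 3') can read."""
    wobble_base, middle, last = anticodon_5to3
    # Pairing is anti-parallel: the 3' base of the anti-codon pairs with
    # the 5' base of the codon, and the wobble base pairs with the 3' base.
    first_two = WATSON_CRICK[last] + WATSON_CRICK[middle]
    return [first_two + third for third in WOBBLE[wobble_base]]

print(codons_read("CAU"))   # initiator-type anti-codon: reads only AUG
print(codons_read("GAA"))   # G at the wobble position: 2 codons
print(codons_read("IGC"))   # inosine at the wobble position: 3 codons
```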

6.2 Structure of Ribosomes

Ribosomes bind to mRNA and facilitate the binding of the tRNA to the mRNA so that the polypeptide chain can be synthesized. In both prokaryotes and eukaryotes, the ribosomes consist of 2 unequally sized subunits. Each subunit contains at least one rRNA and a large number of ribosomal proteins.

The E. coli ribosome has a size of 70S and the sizes of the 2 subunits are 50S and 30S. The S value is a measure of the rate of sedimentation of a component in the centrifuge; this is related both to the molecular weight and the 3D shape of the component. Eukaryotic ribosomes are larger and more complex than their prokaryotic counterparts. Mammalian ribosomes have a size of 80S consisting of a 60S and a 40S subunit.

6.3 Structure of tRNA

A tRNA molecule consists of a single RNA strand that is only about 75-90 nucleotides long. Post-transcriptional modifications of prokaryotic and eukaryotic pre-tRNAs include the addition of the 5’-CCA-3’ sequence to the 3’ end and the extensive chemical modification of a number of nucleotides at many places within the chain.

The tRNA folds back upon itself to form a molecule reinforced by hydrogen bonds between nucleotide bases in certain regions of the tRNA strand and complementary bases of other regions. This results in the cloverleaf structure of tRNA that has 4 base paired stems and 4 loops.

The tRNA twists and folds into a compact 3D L-shaped structure. The loop protruding from one end of the L includes the anti-codon, the base triplet that recognizes and binds to a specific mRNA codon. From the other end of the L-shaped tRNA molecule protrudes the sequence 5’-CCA-3’. The tRNA molecule is linked to its corresponding amino acid by an ester bond that joins the specific amino acid to the 3’-OH group of the adenine nucleotide at the 3’ end.

6.4 The Translation Process

The cellular machinery for translating mRNAs into polypeptides involves 5 major components: ribosomes that carry out the process of polypeptide synthesis; tRNA molecules that bring amino acids to the ribosome-mRNA complex, where they are polymerized into protein chains in the translation process; aminoacyl-tRNA synthetases that attach amino acids to the appropriate tRNA molecules; mRNA molecules that encode the amino acid sequence information for polypeptides to be synthesized; and protein factors that facilitate several steps in the translation process.

The correct amino acid sequence is achieved as the result of two things: the specific binding of each amino acid to its own specific tRNA, and the specific binding between the codon of the mRNA and the complementary anti-codon in the tRNA.

6.4.1 Attachment of Amino Acid to tRNA (Activation of Amino Acids)

Before a tRNA molecule can bring its appropriate amino acid to the ribosome, the amino acid must first be covalently attached to the tRNA. The enzymes responsible for linking amino acids to their corresponding tRNA molecules are called aminoacyl-tRNA synthetases. Cells have 20 different types of these enzymes, one for each of the 20 amino acids used in protein synthesis.

Since degeneracy is common in the genetic code (a single amino acid may be specified by more than one codon), all the tRNAs that are specific for a particular amino acid must have a common recognition site, so that the amino acid can be added by the same aminoacyl-tRNA synthetase.

These enzymes catalyze the joining of an amino acid to its proper tRNAs via an ester bond, accompanied by the hydrolysis of ATP to AMP and pyrophosphate. The process of aminoacylation of a tRNA molecule is called amino acid activation because it not only links an amino acid to its proper tRNA but also activates it for subsequent peptide bond formation. The product is referred to as a charged tRNA or aminoacyl tRNA.

6.4.2 Initiation of Translation

In both prokaryotes and eukaryotes, translation usually begins at the AUG initiator codon in the mRNA which specifies methionine (in a few mRNAs, translation begins at a GUG codon).

6.4.2.1 In Prokaryotes

The initiation step of translation brings together mRNA, the initiator tRNA bearing the derivative N-formylmethionine, and the 2 subunits of the ribosome.

First, the small (30S) ribosomal subunit binds to a specific sequence within the leader segment at the 5’ (upstream) end of the mRNA. This binding places the mRNA’s 5’-AUG-3’ start codon at the ribosome’s P site, where it can bind to the initiator tRNA with the 5’-CAU-3’ anti-codon.

The union of mRNA, initiator tRNA and the small ribosomal subunit is followed by the attachment of the large (50S) ribosomal subunit, completing the translation initiation complex.

Proteins called initiation factors are required to bring all these components together. The cell also spends energy in the form of a GTP molecule to form the initiation complex.

At the completion of the initiation process, the initiator tRNA occupies the P site of the ribosome, and the vacant A site is ready for the next aminoacyl tRNA. The methionine forms the amino terminal end of the growing polypeptide.

6.4.2.2 In Eukaryotes

Ribosome-binding sites are not found in eukaryotic mRNAs. An initiation complex binds to the cap at the 5’ end of the mRNA and moves along the mRNA, scanning for the initiator AUG codon.

6.4.3 Elongation of the Polypeptide Chain

The elongation stage of polypeptide chain synthesis involves a repetitive 3 step cycle.

6.4.3.1 Codon Recognition

Binding of an aminoacyl tRNA to the A site of the ribosome brings a new amino acid into position to be joined to the polypeptide chain. At the onset of the elongation stage, the AUG start codon in the mRNA is located at the ribosomal P site and the second codon is at the A site. Elongation begins when an aminoacyl tRNA whose anti-codon is complementary to the second codon binds to the A site. Hydrogen bonds form between the codon and the anti-codon. The binding of this new aminoacyl tRNA requires 2 protein elongation factors and is driven by hydrolysis of 2 GTP molecules.

6.4.3.2 Peptide Bond Formation

Peptide bond formation links this amino acid to the initiating amino acid or growing polypeptide. The first step is the breakage of bonds between the carboxyl group of the amino acid and the tRNA at the P site. The second step is the formation of the peptide bond between the amino group of the amino acid at the A site and the carboxyl group of the amino acid at the P site. Peptide bond formation is catalyzed by a ribozyme (an enzyme made entirely of RNA). An rRNA molecule of the large ribosomal subunit has peptidyl transferase activity, thus functioning as a ribozyme.

6.4.3.3 Translocation

The mRNA is advanced by a distance of 3 nucleotides by the process of translocation to bring the next codon into position for translation. The tRNA at the A site, now attached to the growing polypeptide, is translocated to the P site. As the tRNA moves, its anti-codon remains hydrogen bonded to the mRNA codon; the mRNA moves along with it and brings the next codon to be translated into the A site. The empty tRNA that was in the P site moves to the E site and from there leaves the ribosome. The translocation process requires an elongation factor and energy which is provided by the hydrolysis of a GTP molecule. The mRNA moves through the ribosome in one direction only, 5’ end first; this is equivalent to the ribosome moving 5’ to 3’ on the mRNA.
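The 3 step elongation cycle can be followed as simple bookkeeping on the A, P and E sites. The Python sketch below is a toy model, not a description of real ribosome kinetics: the message, the small codon lookup and the function name are hypothetical, the message is assumed to end with a stop codon, and the GTP figures are simply the per-step values quoted in these notes, passed in as adjustable parameters.

```python
# Minimal lookup covering only the codons in the hypothetical message below.
CODON_TO_AMINO_ACID = {"AUG": "Met", "GCU": "Ala", "UAC": "Tyr"}
STOP_CODONS = {"UAA", "UAG", "UGA"}

def elongate(codons, gtp_recognition=2, gtp_translocation=1):
    """Step through elongation; return (polypeptide, per-cycle log, GTP used)."""
    chain = [CODON_TO_AMINO_ACID[codons[0]]]   # initiator starts in the P site
    p_site, gtp, log = 0, 0, []
    while codons[p_site + 1] not in STOP_CODONS:
        a_site = p_site + 1
        # 1. Codon recognition: the matching aminoacyl tRNA enters the A site.
        gtp += gtp_recognition
        # 2. Peptide bond formation: the chain is joined to the A-site amino acid.
        chain.append(CODON_TO_AMINO_ACID[codons[a_site]])
        # 3. Translocation: the A-site tRNA moves to P; the empty tRNA exits via E.
        gtp += gtp_translocation
        log.append({"exits_via_E": codons[p_site],
                    "now_in_P": codons[a_site],
                    "now_in_A": codons[a_site + 1]})
        p_site = a_site
    return chain, log, gtp

message = ["AUG", "GCU", "UAC", "GCU", "UGA"]
chain, log, gtp = elongate(message)
print("-".join(chain))
for cycle in log:
    print(cycle)
print("GTP hydrolysed during elongation:", gtp)
```

The loop ends when a stop codon arrives at the A site, which is the point at which termination takes over.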

6.4.4 Termination of Translation

The elongation process continues, reading one codon after another and adding successive amino acids to the polypeptide chain, until one of the 3 possible stop codons in the mRNA arrives at the ribosomal A site. There are no tRNA molecules that recognize stop codons. Instead, a protein called a release factor binds directly to the stop codon at the A site.

The release factor causes the addition of a water molecule instead of an activated amino acid. This reaction hydrolyses the completed polypeptide from the tRNA that is at the P site, producing a free carboxyl group at the C terminus of the polypeptide. As the polypeptide is released, the other components of the ribosomal complex come apart.

6.4.4.1 Polyribosomes or Polysomes

A single mRNA is used to make many copies of a polypeptide simultaneously, because several ribosomes work on translating the same message at the same time. Once a ribosome moves past the initiation codon, a second ribosome can attach to the mRNA and thus several ribosomes may trail along the same mRNA. Such strings of ribosomes are called polyribosomes. Polyribosomes are found in both prokaryotes and eukaryotes.

6.5 Post-translational Modifications

After polypeptide chains have been synthesized, they are chemically modified, e.g. by attachment of sugars, lipids, phosphate groups and other additions, before they can perform their normal functions.

In prokaryotes, the N-formyl group at the N terminus of polypeptide chains is always removed. Moreover, the methionine to which it was attached is often removed, as is the methionine that starts eukaryotic polypeptides.

A single polypeptide chain may be enzymatically cleaved into 2 or more pieces. Insulin synthesis is an example of this.

6.6 Protein Targeting in the Eukaryotic cell

In eukaryotic cells active in protein synthesis, 2 populations of ribosomes are evident; free and bound. Free ribosomes are suspended in the cytosol and mostly synthesize proteins that dissolve in the cytosol and function there. Bound ribosomes are attached to the cytosolic side of the ER. They make proteins of the endomembrane system and proteins that are secreted from the cell.

Synthesis of all proteins begins in the cytosol, when a free ribosome starts to translate an mRNA molecule. The process continues to completion unless the growing polypeptide cues the ribosome to attach to the ER.

The proteins destined for the endomembrane system or for secretion have a signal peptide which targets the protein to the ER. A signal recognition particle binds to the signal peptide of the newly forming polypeptide and then binds to the ER. Thus the ribosome-mRNA-polypeptide complex anchors on the ER surface. The polypeptide progressively crosses the ER membrane and enters the ER lumen.

The signal peptide is usually removed by an enzyme. The rest of the completed polypeptide, if it is to be a secretory protein, is released into the solution within the cisternal space. If the polypeptide is to be a membrane protein, it remains partially embedded in the ER membrane.

Other kinds of signal peptides are used to target polypeptides to mitochondria, chloroplasts and other organelles that are not part of the endomembrane system.