Posted on behalf of Retread
For years, stretches of DNA not coding for protein were called noncoding DNA. As we came to know more about DNA, the sites specifying just where the transcription of DNA to messenger RNA (mRNA) should begin, along with the DNA coding for the RNA in ribosomes, were grandfathered in. Then, about 30 years ago, we found that most genes coding for proteins contained large stretches of DNA not coding for amino acids at all.
Dystrophin, the protein that is defective in Duchenne muscular dystrophy, contains 3685 amino acids, but its gene stretches over 2.2 million contiguous positions in DNA. It only takes 11,055 positions to code for 3685 amino acids. However, the 11,055 occur in 79 stretches (called exons), separated by 78 much larger stretches of DNA (called introns). The whole 2.2 megabases is transcribed into a precursor mRNA, and then the introns are lopped (spliced) out by the spliceosome, a gigantic protein and RNA machine even larger and more complicated than the ribosome (300 proteins, 5 RNAs [see: Science vol. 307 pp. 863-864, 2005]).
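The arithmetic above is easy to check. Here is a minimal sketch in Python that uses only the figures quoted in this post. The variable names are mine, and the 2.2 million figure is a round number, so the coding fraction is approximate.

```python
# Figures quoted in the post (the gene length is a rounded value).
AMINO_ACIDS = 3685
GENE_LENGTH = 2_200_000   # positions in DNA spanned by the gene
EXONS = 79

# Three DNA positions (one codon) per amino acid.
coding_positions = AMINO_ACIDS * 3

# N exons in a row are separated by N - 1 introns.
introns = EXONS - 1

# Fraction of the gene that actually codes for amino acids.
coding_fraction = coding_positions / GENE_LENGTH

print(coding_positions)          # 11055
print(introns)                   # 78
print(f"{coding_fraction:.2%}")  # 0.50%
```

On these numbers, only about half a percent of the gene codes for amino acids. The rest is intron.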
Ever since the human genome project ended, people have wondered why we have so few protein coding genes (around 20,000 at last count). The humble E. coli contains 4300 [see: Nature vol. 385 p. 472, 1997]. Not to worry: we make lots of different proteins from the same gene by using different combinations of exons – some exons are skipped by the spliceosome when it removes introns. Different tissues (or different states of the same tissue) skip different exons depending on (as yet obscure) conditions, so lots of different variants of the same protein are made. The process is called alternative splicing and is quite common – it happens in 92-94% of human protein genes according to a recent paper [see: Nature vol. 456 pp. 470-476, 2008].
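To see how exon skipping multiplies the number of products from a single gene, here is a toy sketch in Python. Everything in it is an illustrative assumption: the four short "exons" are made up, the first and last exon are always kept, and every internal exon can be skipped independently of the others. Real splicing is far more constrained than that.

```python
from itertools import combinations

def splice(exons, skipped=()):
    """Join the exons in order, leaving out the indices listed in `skipped`."""
    return "".join(e for i, e in enumerate(exons) if i not in skipped)

def isoforms(exons):
    """Map each set of skipped internal exons to the transcript it yields.

    Toy assumption: first and last exons are always retained, and any
    subset of the internal exons may be skipped.
    """
    internal = range(1, len(exons) - 1)
    result = {}
    for k in range(len(exons) - 1):          # skip 0 .. all internal exons
        for skipped in combinations(internal, k):
            result[skipped] = splice(exons, skipped)
    return result

# Made-up exon sequences, for illustration only.
toy_exons = ["ATGGCC", "AAAGGG", "CCCTTT", "GATTAA"]

variants = isoforms(toy_exons)
print(len(variants))            # 4
print(splice(toy_exons, (1,)))  # ATGGCCCCCTTTGATTAA
```

With two skippable exons there are four possible transcripts. Under the same toy assumption the count doubles with every additional skippable exon, which is why a modest number of genes can yield a much larger number of proteins.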
What determines which exons are left in the final product and which are skipped? This is where it gets really interesting. There exist stretches of DNA called exonic splicing enhancers (ESEs) and other stretches inhibiting the splicing in of a particular exon – the exonic splicing inhibitors (ESIs). Where are the ESEs and ESIs found? In the exons themselves.
So what? And what does this have to do with synonymous codons? The commonest genetic disease of Caucasians is cystic fibrosis (CF). In experiments on the 12th exon of CFTR (the gene mutated in CF), switching one synonymous codon for another resulted, 25% of the time, in skipping of exon 12 and a defective protein [see: Proc. Natl Acad. Sci. USA vol. 102 pp. 6368-6372, 2005]. So synonymous codons aren’t synonymous at all. A completely different cellular use of synonymous codons will follow in the next post, but why should chemists be interested in any of this?
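The point can be made concrete with a small sketch in Python. The codon assignments used (CTG for Leu, GAA and GAG for Glu, ACC for Thr) are from the standard genetic code. The 12-base sequence and the enhancer motif GAAGAA are hypothetical, chosen for illustration, and are not taken from CFTR or from the paper cited above.

```python
# A small slice of the standard genetic code (DNA codons -> amino acids).
CODON_TABLE = {"CTG": "L", "GAA": "E", "GAG": "E", "ACC": "T"}

# Hypothetical splicing enhancer motif, assumed for illustration only.
HYPOTHETICAL_ESE = "GAAGAA"

def translate(dna):
    """Translate a DNA string codon by codon using CODON_TABLE."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))

original = "CTGGAAGAAACC"   # CTG GAA GAA ACC
variant  = "CTGGAAGAGACC"   # CTG GAA GAG ACC (one synonymous swap)

# Same protein either way...
print(translate(original), translate(variant))                     # LEET LEET

# ...but the enhancer motif survives only in the original.
print(HYPOTHETICAL_ESE in original, HYPOTHETICAL_ESE in variant)   # True False
```

The protein-level reading of the two sequences is identical, yet a signal written in the same bases has been erased. That is the sense in which a "synonymous" change can still cause an exon to be skipped.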
Because DNA isn’t sitting there passively waiting to be read in just one way. All sorts of new chemistry is involved. There is not enough space in this post for the next two examples, but their chemistry does not involve protein-DNA interaction.
So even if we had 15 amino acids and a stop codon to begin with (as per the last post), we could never give up that extra position and all that redundancy now. We need the coding overkill because it is being used for other things. This work also has profound implications for our understanding of protein evolution. That’s also for next time.