Posted on behalf of Retread
Making DNA is metabolically expensive. Four ATPs are consumed making adenine (and that’s even when you start with 5-phosphoribosyl-α-pyrophosphate, PRPP). This is why parasites living inside cells have such small genomes: as soon as they figure out a way to get the host to do their metabolic work, they jettison the (now redundant) DNA. The leprosy organism (Mycobacterium leprae), which lives inside the cells sheathing nerve processes, has only two-thirds the DNA of its cousin, the tuberculosis organism (Mycobacterium tuberculosis). There are many similar examples, and not all are bacterial.
As you know, ‘the’ genetic code is made of nucleotides, which come in four varieties (abbreviated A, T, G, C). There are 16 possible combinations when nucleotides are taken 2 at a time and 64 when they are taken 3 at a time. Sixty-four combinations is clearly overkill for just 20 amino acids, so most amino acids are coded by several different triplets of nucleotides (called codons); these are the synonymous codons. Three amino acids (leucine, serine, arginine) have 6 synonymous codons, 2 have none (just one codon each: methionine and tryptophan), and the rest fall in between.
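To check those codon counts, here is a minimal Python sketch that tabulates the standard genetic code and groups amino acids by how many codons encode them. The 64-letter table string (NCBI translation table 1, codons enumerated in T, C, A, G order, with ‘*’ for stop) is typed in by hand rather than pulled from any library, so treat it as an assumption worth checking against a reference table.

```python
from collections import Counter
from itertools import product

# Standard genetic code, NCBI translation table 1 (typed in by hand).
# Codons are enumerated TTT, TTC, TTA, TTG, TCT, ... ; '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_codon_counts(table):
    """Return {amino acid: number of codons}, ignoring stop codons."""
    return Counter(aa for aa in table.values() if aa != "*")


counts = synonymous_codon_counts(CODON_TABLE)

# Group amino acids (one-letter codes) by how many codons encode them.
by_degeneracy = {}
for aa, n in sorted(counts.items()):
    by_degeneracy.setdefault(n, []).append(aa)

print(f"{len(CODON_TABLE)} codons, {sum(counts.values())} sense, {len(counts)} amino acids")
for n in sorted(by_degeneracy, reverse=True):
    print(f"{n}-fold: {' '.join(by_degeneracy[n])}")
```

Run as written, it should report 61 sense codons out of 64 (the other 3 are stops), list leucine (L), arginine (R) and serine (S) as the six-fold amino acids, and leave methionine (M) and tryptophan (W) alone at one codon each.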
If proteins contained only 15 amino acids, every codon could be two nucleotides long instead of three, and you could cut genome size by one-third. That’s 4 billion or so ATPs per cell if the 3 other nucleotides are as expensive to make as adenine. As the late Senator Dirksen used to say, a billion here, a billion there, pretty soon you’re talking real money (this was in an older, happier, pre-bailout time).
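The arithmetic behind the 4 billion figure can be reconstructed roughly like this. It is a back-of-the-envelope sketch under assumptions that are mine rather than the post’s: about 3 billion nucleotides per genome (roughly one copy of the human genome, one strand), 4 ATPs per nucleotide as for adenine, and, as in the paragraph above, the whole genome shrinking in proportion to codon length.

```python
from fractions import Fraction

# Assumptions (not from the post): genome size and that every nucleotide
# costs as much as adenine. Integer arithmetic avoids float rounding.
GENOME_NT = 3_000_000_000   # assumed nucleotides per genome copy
ATP_PER_NT = 4              # the post's figure for adenine

TRIPLET, DOUBLET = 3, 2     # nucleotides per codon: today vs. a 15-amino-acid code

fraction_saved = Fraction(TRIPLET - DOUBLET, TRIPLET)
nt_saved = GENOME_NT * (TRIPLET - DOUBLET) // TRIPLET
atp_saved = nt_saved * ATP_PER_NT

print(f"fraction of genome saved: {fraction_saved}")
print(f"nucleotides saved: {nt_saved:,}")
print(f"ATP saved per genome copy: {atp_saved:,}")
```

Under those assumptions the savings come to 1,000,000,000 nucleotides and 4,000,000,000 ATPs, which is where ‘4 billion or so’ lands.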
Why 15 and not 16 amino acids? Because one of the 16 two-nucleotide codons is needed to tell the machinery when to stop. Such codons were known as ‘nonsense’ back in the day when all the genome was thought to do was code for protein.
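A small helper makes the ‘why 15’ logic explicit: find the shortest codon length L whose 4**L combinations cover every amino acid plus one stop signal. The function name min_codon_length is made up for illustration.

```python
def min_codon_length(n_amino_acids: int, n_stop: int = 1, alphabet: int = 4) -> int:
    """Smallest codon length L such that alphabet**L >= n_amino_acids + n_stop."""
    needed = n_amino_acids + n_stop
    length = 1
    while alphabet ** length < needed:
        length += 1
    return length


for n in (15, 16, 20):
    length = min_codon_length(n)
    print(f"{n} amino acids + 1 stop -> codon length {length} ({4 ** length} codons)")
```

Fifteen amino acids plus a stop fill the 16 doublets exactly; adding a sixteenth amino acid pushes the code back to triplets, which is why 15 is the magic number.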
Look at the side chains of the 20 amino acids with your chemist’s eye. Some are so similar as to be redundant. Glutamic acid and aspartic acid are chemically almost identical, differing only by a methylene group: get rid of one. Glutamine and asparagine are just the amides of the two acids (why they aren’t called glutamide and asparamide is beyond me). Get rid of one of them. Similarly, threonine and serine differ only by an extra methyl group. Not only that, but the several hundred different enzymes which add phosphate to them (inappropriately called kinases) don’t bother to tell them apart: get rid of one. Do we really need 4 different hydrocarbon side chains (methyl, isopropyl, sec-butyl, isobutyl)? Maddeningly, sec-butyl belongs to isoleucine, and isobutyl belongs to leucine. Get rid of two of them, probably a long one and a short one. Other chemists might choose different amino acids to let go.
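One way to write the pruning down is as set arithmetic over one-letter amino acid codes. Which member of each near-redundant group gets dropped is a hypothetical choice on my part (the post deliberately leaves it open), so REDUNDANT_GROUPS below is illustrative only: it drops aspartate, asparagine and threonine, and removes valine and isoleucine from the hydrocarbons, leaving one short (alanine) and one long (leucine) side chain.

```python
# The 20 standard amino acids, keyed by one-letter code.
STANDARD = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}

# (near-redundant group, members dropped): one hypothetical choice among many.
REDUNDANT_GROUPS = [
    ({"E", "D"}, {"D"}),                  # glutamic vs. aspartic acid
    ({"Q", "N"}, {"N"}),                  # glutamine vs. asparagine
    ({"S", "T"}, {"T"}),                  # serine vs. threonine
    ({"A", "V", "I", "L"}, {"V", "I"}),   # methyl, isopropyl, sec-butyl, isobutyl
]

# Sanity check: each dropped residue really comes from its own group.
assert all(drop <= group for group, drop in REDUNDANT_GROUPS)

dropped = set().union(*(drop for _, drop in REDUNDANT_GROUPS))
reduced = sorted(set(STANDARD) - dropped)

print(f"{len(dropped)} dropped, {len(reduced)} left: {' '.join(reduced)}")
print(", ".join(STANDARD[aa] for aa in reduced))
```

Any choice that drops exactly one of each pair and two of the four hydrocarbons leaves the same count of 15.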
Removing these 5 amino acids leaves 15, which together with a stop signal fit into two-nucleotide codons, so the DNA needed to code for a protein shrinks by one-third, saving all that synthetic ATP. Of course, synonymous codons disappear in the process: each amino acid gets exactly one codon. Nonetheless, we should be able to build pretty decent proteins from the 15 amino acids we have left, since no chemical functionality present in the original 20 has been lost.
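To see the synonymous codons vanish, here is a hypothetical doublet code. It assigns the 16 two-nucleotide codons, in arbitrary T, C, A, G order, to the 15 amino acids left after one possible pruning (aspartate, asparagine, threonine, valine and isoleucine removed) plus a single stop. Neither the assignment nor the translate helper comes from the post; they are made up for illustration.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet (one-letter codes) and a single stop symbol.
REDUCED = "ACEFGHKLMPQRSWY"
STOP = "*"

# 16 doublets, assigned arbitrarily: 15 amino acids plus one stop codon.
doublets = ["".join(p) for p in product("TCAG", repeat=2)]
doublet_code = dict(zip(doublets, list(REDUCED) + [STOP]))


def translate(dna: str, code: dict) -> str:
    """Read non-overlapping doublets from the start until a stop codon."""
    protein = []
    for i in range(0, len(dna) - 1, 2):
        aa = code[dna[i:i + 2]]
        if aa == STOP:
            break
        protein.append(aa)
    return "".join(protein)


# Every symbol, stop included, is encoded by exactly one doublet: no synonyms.
print(set(Counter(doublet_code.values()).values()))
print(translate("ATCGTTGG", doublet_code))
```

With this assignment the set of codon counts is {1}, and translate("ATCGTTGG", doublet_code) yields "MLA": three residues plus a stop in 8 nucleotides, where the triplet code needs 12.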
Clearly this hasn’t happened in the real world. Just why not is probably a matter of history, and an endless source of armchair speculation (like this post). Could there be a reason for all this coding redundancy, or at least could there be mechanisms to keep it in place?
I think such mechanisms exist, but you’ll have to give up the protein-centric notion that all DNA does is code for protein. Even better, there is excellent recent hard experimental data to back this up. But that’s the subject of the next post.