Chemiotics: How many proteins can we make?

Posted on behalf of Retread

The mass of the earth is given by my physics book (Halliday 6th Ed.) as 6 × 10^27 grams. If we made just one molecule of each protein containing n amino acids linked together, when would we run out of material? Make a guess. I found the results surprising.

Assume the earth is made of nothing but hydrogen, oxygen, nitrogen, carbon and sulfur. Clearly not true, but we’re going for what mathematicians call an upper bound. If mathematicians can get away with things like “consider a spherical cow” I can get away with this. (The cognoscenti may wish to go for a least upper bound.) Proteins are linear chains of 20 different amino acids ranging in mass from glycine at 75 Daltons to tryptophan at 204. Each time two amino acids are linked by an amide (peptide) bond, 18 Daltons of mass is lost (water is split out), so within a chain they run from about 57 to 186 Daltons. So figure the average amino acid at 100 Daltons (roughly).

So there are 20 × 20 = 400 distinct proteins of 2 amino acids, 8,000 with 3, 160,000 with 4, and 3,200,000 with just 5. Shorties like this are called peptides (or polypeptides), and exactly when you start calling them proteins seems to be a matter of taste.

We’re figuring the mass of the typical amino acid at 100 Daltons, but a Dalton doesn’t have much mass. It is 1/12 the mass of a single atom of carbon-12, and Avogadro’s number (about 6 × 10^23) of carbon-12 atoms has a mass of 12 grams. So one Dalton is 1 gram divided by 6 × 10^23, or about 1.66 × 10^-24 grams. Call it 10^-24 grams (roughly).

The number of distinct proteins containing n amino acids is 20^n. The mass of each protein (in Daltons) is roughly 100 × n, depending on the amino acids chosen. The mass of the collection of distinct proteins of length n in grams is (20^n) × (100 × n) × (10^-24). It’s clear that we’re over 1 gram for the collection at only 24 amino acids (as 20^24 is much larger than 10^24). How far over? Since 20^24 = 2^24 × 10^24, the powers of ten cancel, leaving 2^24 × 100 × 24 = 40,265,318,400, or about 4 × 10^10 grams.
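For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula above. It uses the same round numbers as the post (100 Daltons per amino acid, 10^-24 grams per Dalton); the function and constant names are made up for illustration.

```python
# Round numbers taken from the post; all names here are illustrative assumptions.
AMINO_ACID_KINDS = 20
DALTONS_PER_AMINO_ACID = 100   # rough average mass of an amino acid in a chain
GRAMS_PER_DALTON = 1e-24       # rounded down from about 1.66e-24


def collection_mass_grams(n):
    """Mass in grams of one molecule of every possible protein of length n."""
    count = AMINO_ACID_KINDS ** n                       # 20^n, an exact integer
    total_daltons = count * DALTONS_PER_AMINO_ACID * n  # still an exact integer
    return total_daltons * GRAMS_PER_DALTON             # converted to a float


if __name__ == "__main__":
    # For n = 24 this prints about 4.027e+10, matching the figure above.
    print(f"{collection_mass_grams(24):.3e} grams")
```

Python integers have no size limit, so 20^n is computed exactly and only the last multiplication is approximate.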

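As noted, the mass of the earth is 6 × 10^27 grams. So we’re not too far away at 24 amino acids: we need to grow by a factor of only about 10^17. Each added amino acid multiplies the mass of the collection by more than 20, so we are certainly no farther away than another 17 amino acids, as 20^17 (about 1.3 × 10^22) is much greater than 10^17.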
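The "no farther than another 17" estimate is deliberately generous. As a rough check, the sketch below steps through chain lengths until the collection outweighs the earth. It assumes the post's round numbers (100 Daltons per amino acid, 6 × 10^27 grams for the earth); the names are hypothetical, and the grams-per-Dalton value is a parameter so the rounded and the more precise figure can be compared.

```python
# Figures from the post; function and constant names are illustrative assumptions.
EARTH_MASS_GRAMS = 6e27
DALTONS_PER_AMINO_ACID = 100


def collection_mass_grams(n, grams_per_dalton=1e-24):
    """Mass in grams of one molecule of every possible protein of length n."""
    return (20 ** n) * DALTONS_PER_AMINO_ACID * n * grams_per_dalton


def first_length_exceeding(limit_grams, grams_per_dalton=1e-24):
    """Smallest chain length n whose full collection outweighs limit_grams."""
    n = 1
    while collection_mass_grams(n, grams_per_dalton) <= limit_grams:
        n += 1
    return n


if __name__ == "__main__":
    print(first_length_exceeding(EARTH_MASS_GRAMS))            # 38 with 1e-24 g/Dalton
    print(first_length_exceeding(EARTH_MASS_GRAMS, 1.66e-24))  # 37 with 1.66e-24 g/Dalton
```

Under these assumptions the earth runs out at 37 or 38 amino acids, comfortably inside the bound of 41 used here.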

So, the mass of the earth (which isn’t all carbon, hydrogen, etc.) isn’t enough to make just one molecule of each of the possible proteins 41 amino acids long. 41 amino acids is a very small protein (some would call it a polypeptide). Just about every protein of biological interest is much larger. The champ is a muscle protein called titin, which has 27,000+ amino acids.

So what? It means that chemists will never be able to explore more than a tiny morsel of the space of possible proteins. Perhaps computationally we will (I doubt it), but that’s the subject of a future post.