How many reactions are there? This question has been on our minds, especially now that automating the invention of novel chemical reactions looks increasingly feasible. So how many reactions are there, and which ones should we focus on first? We realized that there were no codified strategies for addressing these questions, so we decided to develop an enumeration algorithm that maps out all of the conceivable reactions in a particular area. Because reactions affect molecular properties, such a map could one day be used in discovery, for instance to identify a reaction that is better able to target drugs to specific organs, such as the brain or the liver.
Our mapping concept is simple enough: keep the starting materials of a reaction constant, such as an amine and a carboxylic acid, and change only the reaction conditions to produce various products with different properties. An amine and a carboxylic acid are traditionally coupled to form an amide bond, but our findings suggest there are hundreds of other ways to bring these two functional groups together. Beyond a fascination with hacking the amide coupling, we chose amines and carboxylic acids because they are cheap, common, and available in more flavors than many other building blocks. We are excited that our study will provide a bounded space in which to invent new amine–acid coupling reactions, likely with the help of automated reaction-discovery methods such as high-throughput experimentation. Indeed, we used a high-throughput strategy to invent a novel esterification reaction, which we applied both to simple molecules and to a complex drug substrate.
Behind the scenes, one of the big developments of this project was defining an interface between traditional chemical synthesis and data science. We are a brand-new lab, this is our first paper (!), and building a community where coders and chemists could share ideas was key. Using Python, we mapped 320 different amine–acid coupling transformations. Nearly all of these reactions are unknown, yet the vast majority produce substructures that appear in drugs and natural products, suggesting that these new reactions could accelerate the synthesis of important molecules.
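To give a feel for what "enumerating" couplings means, here is a toy sketch. It is not our published algorithm; it is a hypothetical, heavily simplified model that assumes each partner offers a few reactive sites (the amine either keeps its nitrogen or loses it in a deaminative step; the acid reacts at the acyl carbon, the carboxylate oxygen, or a decarboxylative carbon), that the two sites are joined directly or through a single bridging atom, and that one toy rule (no O–O bonds) weeds out implausible connections. The names `Site`, `AMINE_SITES`, `ACID_SITES`, `LINKERS`, and `bonds_are_plausible` are illustrative assumptions.

```python
from itertools import product
from typing import NamedTuple


class Site(NamedTuple):
    """A hypothetical reactive site on one coupling partner."""
    label: str
    atom: str  # element of the atom that forms the new bond


# Hypothetical, simplified choices; NOT the published enumeration rules.
AMINE_SITES = [
    Site("amine N retained", "N"),
    Site("deaminative C (N lost)", "C"),
]
ACID_SITES = [
    Site("acyl C (OH lost)", "C"),
    Site("carboxylate O", "O"),
    Site("decarboxylative C (CO2 lost)", "C"),
]
# "" means the two sites bond directly; otherwise one bridging atom is inserted.
LINKERS = ["", "C", "O", "N"]


def bonds_are_plausible(chain: list[str]) -> bool:
    """Toy filter: reject any adjacent O-O (peroxide-like) bond."""
    return not any(a == "O" and b == "O" for a, b in zip(chain, chain[1:]))


def enumerate_couplings() -> list[tuple[str, str, str]]:
    """Return every (amine site, linker, acid site) combination that passes the filter."""
    results = []
    for amine, linker, acid in product(AMINE_SITES, LINKERS, ACID_SITES):
        # Atoms along the new connection, from the amine side to the acid side.
        chain = [amine.atom, *linker, acid.atom]
        if bonds_are_plausible(chain):
            results.append((amine.label, linker or "direct bond", acid.label))
    return results


if __name__ == "__main__":
    couplings = enumerate_couplings()
    print(len(couplings))  # 22 of the 24 raw combinations survive the O-O filter
    for pattern in couplings:
        print(pattern)
```

In this toy model, 22 connection patterns survive, and one of them (amine nitrogen bonded directly to the acyl carbon) is the familiar amide. A real enumeration needs much richer chemistry rules, so this toy count should not be compared with the 320 transformations mentioned earlier.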
From a medicinal chemist's perspective, these reactions might be used to fine-tune the physicochemical properties of druglike molecules. An exciting aspect is that our approach offers atom-level resolution of properties, whereas the traditional approach of swapping in new building blocks changes 10–15 atoms at a time. Again, our combination of computational and chemical expertise came in handy: we first used cheminformatics to develop a proof of concept, and then demonstrated the concept experimentally. Two simple molecules, o-toluic acid and p-toluidine, were coupled under six different reaction conditions to produce six different products (three shown in Figure 2). One product is basic and another is acidic; one is highly lipophilic, while others are hydrophilic. This shows that a wide range of physicochemical properties can be accessed from the same two starting materials. We then repeated these experiments on complex molecules to achieve late-stage diversification of drugs and natural products.
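Because every product comes from the same two starting materials, simple formula bookkeeping is one way to see how few atoms separate a product from its inputs. The sketch below is an illustration we added, not part of the published workflow. It assumes formulas are written without parentheses, uses the standard formulas of o-toluic acid (C8H8O2) and p-toluidine (C7H9N), and checks only the classic amide coupling, which releases one molecule of water. The helper names (`parse_formula`, `to_formula`, `product_formula`, `atom_difference`) are hypothetical, and formulas alone say nothing about basicity or lipophilicity; those need proper cheminformatics tools.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Parse a flat formula such as 'C8H8O2' into element counts (no parentheses supported)."""
    matches = re.findall(r"([A-Z][a-z]?)(\d*)", formula)
    if "".join(e + n for e, n in matches) != formula:
        raise ValueError(f"unsupported formula: {formula!r}")
    counts = Counter()
    for element, n in matches:
        counts[element] += int(n) if n else 1
    return counts


def to_formula(counts: Counter) -> str:
    """Write counts in Hill order: C, then H, then the rest alphabetically (all alphabetical if no C)."""
    present = [e for e, n in counts.items() if n > 0]
    if "C" in present:
        rest = sorted(e for e in present if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in present else []) + rest
    else:
        order = sorted(present)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def product_formula(reactants: list[str], byproducts: list[str]) -> str:
    """Mass balance: sum of reactant formulas minus the formulas of released byproducts."""
    total = Counter()
    for f in reactants:
        total.update(parse_formula(f))
    for f in byproducts:
        total.subtract(parse_formula(f))
    if min(total.values(), default=0) < 0:
        raise ValueError("byproducts contain atoms the reactants do not supply")
    return to_formula(total)


def atom_difference(a: str, b: str) -> int:
    """Total number of atoms gained or lost between two formulas."""
    ca, cb = parse_formula(a), parse_formula(b)
    return sum(abs(ca[e] - cb[e]) for e in set(ca) | set(cb))


if __name__ == "__main__":
    acid, amine = "C8H8O2", "C7H9N"  # o-toluic acid, p-toluidine
    amide = product_formula([acid, amine], ["H2O"])
    print(amide)  # C15H15NO
    print(atom_difference(product_formula([acid, amine], []), amide))  # 3
```

For the amide coupling this gives C15H15NO, just three atoms (one water) away from the combined starting materials. Other coupling conditions would release or retain different fragments, and the same bookkeeping would apply.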
So how many reactions are there? It’s hard to say, but we believe that our map of the amine–acid coupling system is a step toward answering this question. The article is available here: https://www.nature.com/article...