Transfer learning enables the molecular transformer to predict regio- and stereoselective reactions on carbohydrates

How to specialise deep learning models to predict complex reactions using small data sets.

We are currently witnessing a machine learning-driven revolution in chemistry, not only in de novo molecule design but also in reaction prediction, synthesis planning and lab automation. Most of the recent work in chemical reaction prediction, the task of predicting the most likely products given precursors (reactants and reagents), uses a public dataset of reactions from patents, called the USPTO MIT dataset, as a benchmark. This common dataset allows different methods to be compared directly. One drawback, however, is that the USPTO MIT dataset mostly contains simple reactions and lacks complex transformations involving stereochemistry.

The most successful approach for reaction prediction to date is the Molecular Transformer [1]. This transformer architecture, initially introduced for neural machine translation [2], works with a text-based representation of molecules called SMILES. In contrast to human-language transformers, the Molecular Transformer does not learn a translation function from one language (e.g. English) to another (e.g. French), but a translation function from precursors to products (Figure 1). Similar to a human, the model learns from examples. By repeatedly seeing millions of chemical reactions, it captures the underlying patterns of the transformations. After training, given an unseen input (precursors), the trained model can predict the most likely outcomes of the reaction (products).
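To make the translation analogy concrete, here is a minimal illustrative sketch (not from the paper): a reaction SMILES string encodes a whole reaction as text, with precursors and products separated by `>>`. The example reaction below, a simple esterification of acetic acid with ethanol, is chosen for illustration only.

```python
# A reaction SMILES encodes a chemical reaction as plain text:
# precursors (reactants and reagents) on the left, products on the
# right, separated by ">>". This text is the "source" and "target"
# of the translation task the Molecular Transformer learns.
reaction_smiles = "CC(=O)O.OCC>>CC(=O)OCC"  # acetic acid + ethanol -> ethyl acetate

# Split the reaction into the two sides of the "translation".
precursors, products = reaction_smiles.split(">>")

# Individual molecules within each side are separated by ".".
print(precursors.split("."))  # ['CC(=O)O', 'OCC']
print(products)               # CC(=O)OCC
```

In the actual model, these strings are further tokenized (e.g. atom-wise) before being fed to the transformer, just as sentences are tokenized in machine translation.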

Figure 1: Analogy between human language and chemistry, enabled by the textual representation of molecules and chemical reactions. 

When we started experimenting with complex chemical reactions, we realized that stereochemistry is a major weakness of reaction prediction models, which are typically trained on simple reactions that almost completely lack stereochemistry. For instance, we found that reaction prediction was particularly poor for reactions involving carbohydrates. Carbohydrates represent some of the most complex molecules in terms of stereochemistry, because nearly every carbon atom is functionalized and stereogenic, resulting in a reactivity that can be surprising for non-experts. For example, mannose and glucose differ only by one stereocentre, which dramatically changes their reactivity.

Figure 2: Comparing the different training strategies. Transfer learning enables good performance on specific reaction classes, given a small representative training data set. 

In "Transfer Learning Enables the Molecular Transformer to Predict Regio- and Stereoselective Reactions on Carbohydrates" [3], we investigated different strategies to push the limits of current reaction prediction models and teach them to predict more complex reactions, which are challenging to predict even for human experts. Inspired by current trends in Natural Language Processing, we applied transfer learning and explored two real-world scenarios (Figure 2). Transfer learning exploits the knowledge extracted from abundant generic data (here, patent reactions) to improve predictions on a specific task where less data is available (carbohydrate reactions). While in the multi-task training strategy the model has access to both data sets simultaneously (48 hours on 1 GPU), in the sequential training strategy a model pretrained on the patent reactions is then specialised on the carbohydrate reactions (1.5 hours on 1 GPU). The latter training is not only much faster but also particularly useful when the generic data set (e.g. proprietary data) cannot be shared.
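The difference between the two strategies can be sketched schematically. The snippet below is a purely conceptual, stdlib-only illustration (not the paper's code): `train`, `generic` and `specific` are hypothetical stand-ins, and the "model" merely records which reactions it has been trained on instead of performing real gradient updates.

```python
import random

def train(model, reactions):
    """Stand-in for one training pass; a real model would update weights."""
    for rxn in reactions:
        model["seen"].append(rxn)
    return model

generic = [f"patent_rxn_{i}" for i in range(5)]        # large generic data set
specific = [f"carbohydrate_rxn_{i}" for i in range(2)]  # small specific data set

# Multi-task strategy: a single model sees both data sets at once,
# shuffled together into one training stream.
mixed = random.sample(generic + specific, len(generic) + len(specific))
multitask = train({"seen": []}, mixed)

# Sequential strategy (transfer learning): first pretrain on the
# abundant generic data, then fine-tune the same model on the small
# specific set -- the generic data never needs to be shared.
pretrained = train({"seen": []}, generic)
finetuned = train(pretrained, specific)
```

The sequential variant is what makes the privacy argument work: only the pretrained weights, not the proprietary generic reactions, need to be handed over for the fine-tuning step.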

We demonstrated how transfer-learned Molecular Transformers capture many of the subtle differences between monosaccharides and protecting groups. As a consequence, the prediction quality of crucial regio- and stereoselective transformations such as glycosylation, regioselective protection and epimerization increases significantly.

Moreover, we reported the first experimental validation of reaction prediction models on an in-house developed and previously unpublished 14-step synthesis of a lipid-linked oligosaccharide. To further assess the method, the models were evaluated on a recently published total synthesis [4] and showed a similar increase in performance.

We anticipate that the methods and models we presented in our work will not only accelerate carbohydrate synthesis but be applicable to any complex reaction subspace of interest. 

[1] P Schwaller, T Laino, T Gaudin, P Bolgar, C Bekas, AA Lee. Molecular Transformer: A Model for Uncertainty-Calibrated Chemical Reaction Prediction. ACS Cent. Sci., 5, 9, 1572–1583 (2019)
[2] A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones, A Gomez, L Kaiser, I Polosukhin. Attention is all you need. Adv. Neural Inf. Process. Syst., 30, 5998–6008 (2017)
[3] G Pesciullesi, P Schwaller, T Laino, JL Reymond. Transfer learning enables the molecular transformer to predict regio- and stereoselective reactions on carbohydrates. Nat. Commun. (2020)
[4] A Behera, D Rai, S Kulkarni. Total Syntheses of Conjugation-Ready Trisaccharide Repeating Units of Pseudomonas aeruginosa O11 and Staphylococcus aureus Type 5 Capsular Polysaccharide for Vaccine Development. J. Am. Chem. Soc., 142, 1, 456–467 (2020)

Philippe Schwaller

PhD student, IBM Research - Europe | University of Bern