Enumerating the DEL design chemical space
Druggable chemical space is very large, but new technologies such as DNA-encoded libraries (DELs) are sampling a larger fraction of it. However, we still need to learn how to navigate DEL chemical space to get the best out of it.
We were in a DEL videoconference discussing the cheminformatics challenges the DEL group faced, such as how to better evaluate the diversity and properties of DEL libraries and how to select building blocks. When the meeting was nearly over, Miguel stood up, moved to the whiteboard and started drawing some kind of diagram. He then described his dream: a tool capable of enumerating the complete DEL chemical space to help the DEL team select the libraries providing the best compounds. We started to laugh because Miguel deliberately used the famous phrase “I have a dream …”. Both Christos (over the phone) and I explained that a tool like that was nearly impossible because there were more molecules in the DEL space than we could possibly enumerate. However, Miguel is a persevering person and started to list all the benefits this imaginary tool would have. The meeting ended but we did not leave the room immediately. Rather, we spent a while discussing Miguel’s drawing, and by the end of that discussion I told Miguel that there might be a way. A week later we had a plan.
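The plan was based on shifting our focus from enumerating the DEL compound space to enumerating the DEL design space. Each design can comprise millions, even billions, of compounds, so there are far fewer designs than compounds. For example, a three-cycle design with 1,000 building blocks per cycle already describes 1,000 × 1,000 × 1,000 = one billion compounds.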
DEL designs consist of reactions and building blocks. We already had experience in managing those concepts at large scale in the Proximal Lilly Collection (PLC)1 and had ways to code them appropriately. Christos had developed many of these tools. He had also developed internal techniques to evaluate library quality and diversity. However, PLC enumerations were based on single-step reactions, whereas a DEL compound is a different beast: it is built via multi-step synthesis, with multiple reactions using many different building blocks. Christos had already prototyped PLC tool extensions to address requests for specific DEL compound structure enumeration, so we knew that, at least conceptually, the compound enumeration problem could be solved.
The first step was to replace the building block (BB) concept with that of the building block type (BBT). A BBT is a generic BB with homogeneous functionality, so we can assume that all BBs belonging to the same BBT will show the same reactivity. We then designed a way to couple BBTs with reactions, generating designs as a kind of graph in which reactions are edges and BBTs are nodes.
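To make the graph idea concrete, here is a minimal Python sketch. It is a toy illustration only, not the eDESIGNER implementation: the BBT names, the reaction table `REACTION_EDGES` and the function `enumerate_designs` are all hypothetical, and real designs also have to handle protecting groups, deprotection steps and DNA compatibility. Each design is simply a walk through the graph, one BBT per synthesis cycle.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph (an assumption for illustration only).
# Key: (BBT added in the previous cycle, BBT added in the next cycle)
# Value: the reaction (edge) that joins them.
REACTION_EDGES: Dict[Tuple[str, str], str] = {
    ("amino acid", "amino acid"): "amide coupling",
    ("amino acid", "carboxylic acid"): "amide coupling",
    ("amino acid", "aldehyde"): "reductive amination",
    ("halo-aryl acid", "boronic acid"): "Suzuki coupling",
}


def enumerate_designs(n_cycles: int) -> List[List[str]]:
    """Return every walk with n_cycles BBT nodes.

    Each design is an alternating list: [BBT, reaction, BBT, reaction, ...].
    No compound is enumerated; only BBTs and reactions are combined.
    """
    nodes = sorted(
        {src for src, _ in REACTION_EDGES} | {dst for _, dst in REACTION_EDGES}
    )
    designs = [[node] for node in nodes]
    for _ in range(n_cycles - 1):
        # Extend each partial design by every edge leaving its last BBT.
        designs = [
            design + [reaction, dst]
            for design in designs
            for (src, dst), reaction in REACTION_EDGES.items()
            if src == design[-1]
        ]
    return designs


if __name__ == "__main__":
    for design in enumerate_designs(3):
        print(" -> ".join(design))
```

With this toy table there are four two-cycle designs and three three-cycle designs. The number of designs depends only on the number of BBTs and reactions, not on how many individual BBs sit behind each BBT.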
Christos developed a modified version of the PLC technology to classify BBs by their functional groups. With that, we determined the BBTs we had available and I built the first version of eDESIGNER, which gave millions of designs. Then Christos developed an enumeration engine for them and started to profile compound enumeration samples. A major breakthrough came when we incorporated the libDESIGN concept, which groups primary designs (eDESIGNs) using synthetic chemistry considerations. This solved the problem of design redundancy and substantially reduced the number of designs we had to work with. With this version we started to produce libraries in the lab from our libDESIGNs.
The final improvement came from a new suggestion by Miguel. He wanted a DEL collection with physicochemical properties similar to those of our diversity screening collection, but covering a different and larger chemical space. The solution was to introduce an algorithmic constraint that filtered libDESIGNs down to those meeting a target heavy atom count distribution while maintaining sufficient size. This process automatically selected the BBs required to build each library, all without actual compound enumeration.
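One way to see why no compound enumeration is needed: if we assume, as a simplification, that a compound's heavy atom count is the sum of the contributions of its BBs, then the distribution for a whole library follows from per-cycle histograms alone. The Python sketch below illustrates that idea. It is not the algorithm used in eDESIGNER; the function names and the example numbers are hypothetical, and atoms lost or gained in each reaction are ignored.

```python
from collections import Counter
from typing import Dict, List

Histogram = Dict[int, int]  # heavy atom count -> number of BBs (or compounds)


def library_hac_distribution(cycles: List[Histogram]) -> Histogram:
    """Combine per-cycle BB histograms into a library-level histogram.

    Assumption: heavy atom counts are additive across cycles. The work scales
    with the number of distinct heavy atom counts, not with library size.
    """
    total: Histogram = {0: 1}
    for cycle in cycles:
        combined: Counter = Counter()
        for hac_so_far, n_partial in total.items():
            for hac_bb, n_bb in cycle.items():
                combined[hac_so_far + hac_bb] += n_partial * n_bb
        total = dict(combined)
    return total


def fraction_within(dist: Histogram, max_hac: int) -> float:
    """Fraction of the library with at most max_hac heavy atoms."""
    size = sum(dist.values())
    return sum(n for hac, n in dist.items() if hac <= max_hac) / size


if __name__ == "__main__":
    # Hypothetical three-cycle library: {heavy atoms: number of BBs} per cycle.
    cycles = [{8: 2, 10: 1}, {6: 3}, {7: 1, 12: 2}]
    dist = library_hac_distribution(cycles)
    print("library size:", sum(dist.values()))
    print("fraction with <= 26 heavy atoms:", fraction_within(dist, 26))
```

In this toy case the 27-compound library has 21 compounds with 26 or fewer heavy atoms. A filter of this kind could drop BBs from a cycle until the fraction reaches a chosen threshold, then check that the remaining library is still large enough.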
Christos then developed a new way of selecting libraries based on spread design methodology.2 This method is now used to decide which libraries go into production, and it influences building block purchasing campaigns as well as on-DNA reaction optimization and development campaigns, which were also part of the initial “dream”.
Two years later, that initial dream and lively brainstorming session led to the introduction of eDESIGNER,3 a tool that capitalizes on advances in cheminformatics and high-throughput calculation to map and assess DEL designs that can be experimentally produced. We strongly believe that in the near future, eDESIGNER and similar algorithms combining expert synthetic knowledge with computational intelligence will enable DEL practitioners to design and select appropriate libraries to access desirable areas of chemical space and advance discovery efforts.
- Nicolaou, C.A., Watson, I.A., Hu, H. & Wang, J. The Proximal Lilly Collection: Mapping, Exploring and Exploiting Feasible Chemical Space. J Chem Inf Model 56, 1253-1266 (2016).
- Higgs, R.E., Bemis, K.G., Watson, I.A. & Wikel, J.H. Experimental Designs for Selecting Molecules from Large Chemical Databases. J Chem Inf Comput Sci 37, 861-870 (1997).
- Martín, A., Nicolaou, C.A. & Toledo, M.A. Navigating the DNA encoded libraries chemical space. Commun Chem 3, 127 (2020). https://doi.org/10.1038/s42004-020-00374-1