Our goal is to discover new catalysts that can help us convert renewable energy into fuels or other valuable chemicals. In the field of computational catalysis, we pursue this goal by performing atomic simulations that give us insight into potential catalysts. This type of research often involves performing thousands of simulations across multiple computing clusters, where each calculation can take hours, days, or even weeks to complete. Managing these simulations therefore demands a substantial amount of overhead time, on top of the time needed to analyze the data and design new simulations. The classical solution to this problem is to throw more graduate students at it. We, along with a few others, believe in a different solution.
“Choose a lazy person to do a difficult job because a lazy person will find an easy way to do it.”
I am lazy. That is why my advisor and I created the framework described in this paper. In short, we built upon existing software created by Lawrence Berkeley National Laboratory (LBNL) and Spotify to create a new framework that manages our simulations for us. We especially thank Anubhav Jain et al. at LBNL for their work, namely The Materials Project, pymatgen, and FireWorks, which provided the foundation for automating our work.
After we automated the boring stuff, we thought to ourselves: “Why not automate another one of our jobs, like designing new studies?” This is what spurred us to integrate active machine learning and optimization into our framework. The result is an even more capable machine that not only manages our calculations, but also designs new studies and then performs them automatically. Of course, the studies that our machine designs are much more naive than those that veteran researchers, such as the members of Jens Nørskov’s group, can design. We are still very far from being able to automate everything that an experienced computationalist does.
The real strength of our framework is not measured by how well a human performs against the framework, but by how well a human performs with it. I no longer need to spend time managing calculations, which frees me to focus on other research tasks. I also no longer worry about whether I am fully using my computing allocations, because the framework automatically fills all of my unused allocations with systematically chosen simulations. And if I don’t like the framework’s suggestions, I am free to direct it toward specific search spaces as I see fit. Once the framework is built, there are few downsides to using it. The main downside is building the framework in the first place.
“It’s better to wait for a productive programmer to become available than it is to wait for the first available programmer to become productive.”
Unfortunately, the field of computational catalysis is not as rife with productive programmers as Silicon Valley is. We are left with engineers and chemists who are not trained in software engineering, ourselves included. We do our best to write efficient, well-documented, and well-tested code, but it takes time and patience to teach ourselves these new skills while simultaneously practicing them. Thankfully, people in the computational catalysis community, such as Heather Mayes, Andy Peterson, and AJ Medford, are recognizing this challenge and are beginning to integrate the appropriate training at the academic level. We hope that, one day, we can begin working with incoming graduate students who are already adept programmers.