Machine learning a non-observable concept
Oxidation states are at the core of chemistry, but they are not quantum mechanical observables. Can we leverage a large crystallographic database to predict the oxidation states of cations in metal-organic frameworks?
Chemists use many fuzzy concepts in their reasoning, for example to explain reactivity (Gonthier et al., 2012). One of the most widely used “fuzzy concepts” is the oxidation state. It goes back to the early days of chemistry (Lavoisier, Wöhler) and assigns “charges” to atoms assuming an ionic split of the electrons. This quantity, however, is not a quantum mechanical observable. For this reason there is no fundamental equation for computing the oxidation state; instead, we have to introduce rules or references to assign those numbers.
One of the most popular rules is the bond valence sum method, which goes back to Pauling and was developed further by Brown. It states that the oxidation state of an atom can be estimated as a sum of bond valence terms, each of which is a simple exponential function of the bond length.
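To make the rule concrete, here is a minimal sketch of a bond valence sum in its commonly used exponential form, where each bond contributes exp((R0 - R) / b). The function names, the reference length `R0_EXAMPLE`, and the bond lengths are illustrative assumptions, not values from our work; b = 0.37 Å is the value often used in published parameter tables, and real R0 values must be looked up for each metal, ligand, and oxidation state.

```python
import math

# Commonly used value for the "softness" parameter b, in angstrom.
B_DEFAULT = 0.37


def bond_valence(bond_length, r0, b=B_DEFAULT):
    """Valence contributed by one bond: exp((r0 - bond_length) / b)."""
    return math.exp((r0 - bond_length) / b)


def bond_valence_sum(bond_lengths, r0, b=B_DEFAULT):
    """Estimate the oxidation state as the sum over all bonds to the metal."""
    return sum(bond_valence(r, r0, b) for r in bond_lengths)


# Hypothetical six-coordinate metal site (all numbers are placeholders).
R0_EXAMPLE = 1.70  # assumed reference bond length in angstrom
lengths = [2.05, 2.05, 2.06, 2.06, 2.20, 2.21]

bvs = bond_valence_sum(lengths, R0_EXAMPLE)
print(f"bond valence sum: {bvs:.2f}")
```

The estimate is usually rounded to the nearest integer, which is where the choice of parameterization starts to matter: a small change in R0 or b can move the sum across a rounding boundary.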
There are many different parameterizations of this exponential function and, in practice, we found that this technique is not satisfactory for assigning the oxidation states of metal cations in metal-organic frameworks (MOFs). Experimentally, the oxidation state can be determined using, for example, X-ray photoelectron spectroscopy (XPS). Here, too, there is no underlying fundamental principle; the assignment relies on a tabulation of the signatures of the metals in their different oxidation states.
In practice, chemists usually assign the oxidation states of the metal centres in their compounds with a method of their choice (e.g., the bond valence sum method, spectroscopic evidence, or intuition). These assignments are encoded in the chemical names of the MOFs in the Cambridge Structural Database (CSD). As the oxidation state is such an important concept in chemistry, one can assume that, while errors may occur in individual cases, chemists will collectively assign the correct oxidation state. This is not much different from the popular TV show “Who Wants to Be a Millionaire?”: when we do not know the answer, our lifeline is to ask the audience, in this case an audience of all chemists. A more practical way than organizing a game show is to harvest those assignments from the CSD and then use a machine learning approach to correlate the local environments of metal centres with the oxidation state. We found that this approach outperforms the bond valence sum method and, interestingly, “reasons” in a way similar to how chemists do.
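The harvesting step can be pictured with a small sketch. It assumes that oxidation states appear in chemical names as Roman numerals in parentheses directly after the metal name (as in "copper(ii)"); the example names, the short `METALS` table, and the function name are hypothetical and only illustrate the idea, not our actual pipeline.

```python
import re
from collections import Counter

# Roman numerals as they might appear in chemical names.
ROMAN = {"i": 1, "ii": 2, "iii": 3, "iv": 4,
         "v": 5, "vi": 6, "vii": 7, "viii": 8}

# Tiny illustrative lookup; a real table would cover all metals.
METALS = {"copper": "Cu", "iron": "Fe", "zinc": "Zn", "manganese": "Mn"}

# Matches a metal name immediately followed by "(0)" or "(<roman numeral>)".
PATTERN = re.compile(
    r"(" + "|".join(METALS) + r")\((0|[ivx]+)\)", re.IGNORECASE
)


def oxidation_states_from_name(name):
    """Return {element symbol: set of oxidation states} found in a name."""
    labels = {}
    for metal, numeral in PATTERN.findall(name):
        numeral = numeral.lower()
        state = 0 if numeral == "0" else ROMAN.get(numeral)
        if state is not None:  # skip numerals outside the lookup table
            labels.setdefault(METALS[metal.lower()], set()).add(state)
    return labels


# Hypothetical names, written in the style of database entries.
names = [
    "catena-(bis(mu-terephthalato)-di-copper(ii))",
    "catena-(tris(mu-fumarato)-di-iron(iii))",
    "catena-((mu-oxalato)-copper(ii))",
]

# Tally how often each (element, oxidation state) label occurs.
tally = Counter(
    (element, state)
    for name in names
    for element, states in oxidation_states_from_name(name).items()
    for state in states
)
print(tally)
```

Labels harvested this way would then be paired with descriptors of each metal's local environment to train a classifier; entries whose names carry no oxidation state simply yield no label.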
Our work not only provides a useful tool to assign oxidation states but also showcases how machine learning can be used to provide a “consensus definition” for some of the “fuzzy” concepts we encounter so often in chemistry.
Gonthier, J. F., Steinmann, S. N., Wodrich, M. D., & Corminboeuf, C. (2012). Quantification of “fuzzy” chemical concepts: a computational perspective. Chemical Society Reviews, 41(13), 4671. https://doi.org/10.1039/c2cs35037h