Machine learning a non-observable concept

Oxidation states are at the core of chemistry, but they are not quantum mechanical observables: can we leverage a large crystallographic database to predict the oxidation states of cations in metal-organic frameworks?


Chemists use many fuzzy concepts [1] in their reasoning, for example to explain reactivity. One of the most widely used of these “fuzzy concepts” is the oxidation state, which goes back to the early days of chemistry (Lavoisier, Wöhler) and assigns “charges” to atoms assuming an ionic split of the electrons. This quantity, however, is not a quantum mechanical observable. For this reason there is no fundamental equation for computing the oxidation state; instead, we have to introduce rules or references to assign those numbers.

One of the most popular rules is the bond valence sum method, which goes back to Pauling and was further developed by Brown. It states that the oxidation state of an atom can be estimated as a sum of bond valence terms, each of which is a simple exponential function of the bond length.
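To make the bond valence sum concrete, here is a minimal sketch in Python. It assumes the commonly used exponential form s = exp((R0 - R) / b) with the widely quoted default b = 0.37 Å. The function names and the R0 value and bond lengths in the usage lines are illustrative placeholders, not parameters taken from the paper.

```python
import math


def bond_valence(bond_length, r0, b=0.37):
    """Valence contributed by a single bond: exp((r0 - bond_length) / b).

    r0 and b are empirical parameters (in angstrom) that depend on the
    element pair and on the parameterization chosen.
    """
    return math.exp((r0 - bond_length) / b)


def bond_valence_sum(bond_lengths, r0, b=0.37):
    """Sum the bond valences over all bonds around one metal centre."""
    return sum(bond_valence(d, r0, b) for d in bond_lengths)


def estimate_oxidation_state(bond_lengths, r0, b=0.37):
    """Round the bond valence sum to the nearest integer oxidation state."""
    return round(bond_valence_sum(bond_lengths, r0, b))


# Illustrative placeholder values (assumptions, not data from the paper):
# one metal centre with four metal-ligand distances in angstrom.
example_lengths = [1.95, 1.95, 1.97, 1.97]
example_r0 = 1.68
print(bond_valence_sum(example_lengths, example_r0))
print(estimate_oxidation_state(example_lengths, example_r0))
```

Because the estimate depends directly on the choice of r0 and b, two parameterizations can give different integers for the same structure.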

There are many different parameterizations of this exponential function and, in practice, we found that this technique is not satisfactory for assigning the oxidation states of metal cations in metal-organic frameworks (MOFs). Experimentally, the oxidation state can be measured using, for example, X-ray photoelectron spectroscopy (XPS). Here, too, there is no underlying fundamental principle, only a tabulation of the signatures of all metals in their different oxidation states.

In practice, chemists usually assign the oxidation states of the metal centres in their compounds with a method of their choice (e.g., using the bond valence sum method, spectroscopic evidence, or intuition). These assignments are encoded in the chemical names of the MOFs in the Cambridge Structural Database (CSD). As the oxidation state is such an important concept in chemistry, one can assume that, while errors may occur in individual cases, chemists will collectively assign the correct oxidation state. This is not much different from the popular TV show “Who Wants to Be a Millionaire?”: when we do not know the answer, our lifeline is to ask the audience, which here consists of all chemists. A more practical way than organizing a game show is to harvest those assignments from the database and then use a machine learning approach to correlate the local environments of the metal centres with the oxidation state. We found that this approach outperforms the bond valence sum method and, interestingly, “reasons” in a way similar to how chemists do.
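The idea of asking the audience can be sketched in a few lines of Python. The post does not say which model or which descriptors were used, so this sketch swaps in a plain k-nearest-neighbour majority vote purely for illustration. The two features (mean metal-ligand distance in Å and coordination number) and every value in the toy table are invented placeholders, not data from the database.

```python
from collections import Counter
import math


def predict_oxidation_state(query, labelled_sites, k=3):
    """Majority vote among the k labelled sites closest to the query.

    labelled_sites is a list of (feature_tuple, oxidation_state) pairs;
    each label plays the role of one chemist's assignment.
    """
    nearest = sorted(
        labelled_sites,
        key=lambda site: math.dist(site[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy, made-up training table (an assumption for illustration only):
# (mean metal-ligand distance in angstrom, coordination number) -> label
toy_sites = [
    ((1.95, 4), 2),
    ((1.97, 5), 2),
    ((2.00, 6), 2),
    ((2.10, 2), 1),
    ((2.05, 3), 1),
    ((2.08, 2), 1),
]

print(predict_oxidation_state((1.96, 5), toy_sites))
print(predict_oxidation_state((2.09, 2), toy_sites))
```

A single wrong label in the table is outvoted by its neighbours, which is the sense in which the collective assignment can be more reliable than any individual one.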

Our work not only provides a useful tool to assign oxidation states but also showcases how machine learning can be used to provide a “consensus definition” for some of the “fuzzy” concepts we encounter so often in chemistry. 

[1] Gonthier, J. F., Steinmann, S. N., Wodrich, M. D., & Corminboeuf, C. (2012). Quantification of “fuzzy” chemical concepts: a computational perspective. Chemical Society Reviews, 41(13), 4671. https://doi.org/10.1039/c2cs35037h

Kevin Maik Jablonka

PhD Student, EPFL

Comments

Isatou Sarr
16 days ago

Hi,

Excellent technique and great paper. Just a comment:

The oxidation state of the elements is critical in reporting the chemical names of compounds as well as their ionic charge, so it is fundamental in chemistry. Although decoding the oxidation state of a single element is easy, it is difficult to decipher in compounds made up of an array of elements.

Bond valence theory has been used to evaluate oxidation states from the distances between the atoms of a compound's constituent elements, but it has potential flaws, particularly in grouped materials with crystal structures, since the geometry of the metal complex also plays an important role in determining the molecular stability as well as the valency of the compound. Since reproducibility is a key feature of science, AI algorithms that can more efficiently determine oxidation states by material categorization and determination of organic frameworks are important in minimizing the error rates in assigning oxidation states to elements/compounds.

Oxidation states have been problematic in research due to latent interactions with host/environmental elements such as reactive oxygen species/electron clouds; these affect the shelf life/half-life of products, rendering them less potent in “latent” ways and, by extension, creating variability of response, particularly in therapeutics. Correct valency/oxidation state assignment will open the opportunity to develop more robust health intervention tools with enhanced predictive response value for progressive science, as well as good overall patient outcomes at both the inter- and intra-population level.

Thank you.
