oxidation state
MirrorMind: Empowering OmniScientist with the Expert Perspectives and Collective Knowledge of Human Scientists
Zeng, Qingbin, Fan, Bingbing, Chen, Zhiyu, Ren, Sijian, Zhou, Zhilun, Zhang, Xuhua, Zhen, Yuanyi, Xu, Fengli, Li, Yong, Liu, Tie-Yan
The emergence of AI Scientists has demonstrated remarkable potential in automating scientific research. However, current approaches largely conceptualize scientific discovery as a solitary optimization or search process, overlooking that knowledge production is inherently a social and historical endeavor. Human scientific insight stems from two distinct yet interconnected sources. The first is the individual cognitive trajectory, in which a researcher's unique insight is shaped by their evolving research history and stylistic preferences; the second is the collective disciplinary memory, in which knowledge is sedimented into vast, interconnected networks of citations and concepts. Existing LLMs still struggle to represent these structured, high-fidelity cognitive and social contexts. To bridge this gap, we introduce MirrorMind, a hierarchical cognitive architecture that integrates dual-memory representations within a three-level framework. The Individual Level constructs high-fidelity cognitive models of individual researchers by capturing their episodic, semantic, and persona memories; the Domain Level maps collective knowledge into structured disciplinary concept graphs; and the Interdisciplinary Level acts as an orthogonal orchestration engine. Crucially, our architecture separates memory storage from agentic execution, enabling AI scientist agents to flexibly draw on individual memories for unique perspectives or on collective structures for reasoning. We evaluate MirrorMind on four comprehensive tasks: author-level cognitive simulation, complementary reasoning, cross-disciplinary collaboration promotion, and multi-agent scientific problem solving. The results show that by integrating individual cognitive depth with collective disciplinary breadth, MirrorMind moves beyond simple fact retrieval toward structural, personalized, and insight-generating scientific reasoning.
Enhancing Materials Discovery with Valence Constrained Design in Generative Modeling
Cheng, Mouyang, Luo, Weiliang, Tang, Hao, Yu, Bowen, Cheng, Yongqiang, Xie, Weiwei, Li, Ju, Kulik, Heather J., Li, Mingda
Diffusion-based deep generative models have emerged as powerful tools for inverse materials design. Yet many existing approaches overlook essential chemical constraints such as oxidation-state balance, which can lead to chemically invalid structures. Here we introduce CrysVCD (Crystal generator with Valence-Constrained Design), a modular framework that integrates chemical rules directly into the generative process. CrysVCD first employs a transformer-based elemental language model to generate valence-balanced compositions, followed by a diffusion model that generates crystal structures. The valence constraint makes chemical valence checking orders of magnitude more efficient than purely data-driven approaches that rely on post-screening. When fine-tuned on stability metrics, CrysVCD achieves 85% thermodynamic stability and 68% phonon stability. Moreover, CrysVCD supports conditional generation of functional materials, enabling the discovery of candidates such as high-thermal-conductivity semiconductors and high-$\kappa$ dielectric compounds. Designed as a general-purpose plugin, CrysVCD can be integrated into diverse generative pipelines to promote chemical validity, offering a reliable, scientifically grounded path for materials discovery.
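The valence constraint boils down to asking whether a composition admits a charge-neutral assignment of oxidation states. Below is a minimal Python sketch of such a check, not CrysVCD's implementation. The oxidation-state table is a small illustrative subset chosen for this example, each element is assumed to take a single oxidation state, and `is_valence_balanced` is a hypothetical helper name.

```python
from itertools import product

# Illustrative subset of common oxidation states (assumption, not exhaustive).
COMMON_OXIDATION_STATES = {
    "Li": [1], "Na": [1], "Mg": [2], "Al": [3], "Ti": [2, 3, 4],
    "Fe": [2, 3], "Mn": [2, 3, 4], "Sr": [2], "O": [-2], "S": [-2], "Cl": [-1],
}

def is_valence_balanced(composition):
    """Return True if some choice of common oxidation states sums to zero.

    composition maps element symbol -> atom count in the formula unit.
    Every atom of an element is assumed to share one oxidation state,
    so mixed-valence compounds are not captured by this sketch.
    """
    elements = list(composition)
    choices = [COMMON_OXIDATION_STATES[el] for el in elements]
    # Try every combination of one oxidation state per element.
    for states in product(*choices):
        total = sum(s * composition[el] for s, el in zip(states, elements))
        if total == 0:
            return True
    return False

print(is_valence_balanced({"Fe": 2, "O": 3}))   # Fe3+ with O2- balances
print(is_valence_balanced({"Na": 1, "Cl": 2}))  # no neutral assignment
```

Because every atom of an element gets the same state, a mixed-valence compound such as Fe3O4 is rejected here; a more complete check would need to allow different oxidation states for atoms of the same element.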
MP-ALOE: An r2SCAN dataset for universal machine learning interatomic potentials
Kuner, Matthew C., Kaplan, Aaron D., Persson, Kristin A., Asta, Mark, Chrzan, Daryl C.
Covering 89 elements, MP-ALOE was created using active learning and consists primarily of off-equilibrium structures. We train a machine learning interatomic potential on MP-ALOE and evaluate it on a series of benchmarks: predicting the thermochemical properties of equilibrium structures; predicting forces of far-from-equilibrium structures; maintaining physical soundness under static extreme deformations; and maintaining molecular dynamics stability under extreme temperatures and pressures. MP-ALOE shows strong performance on all of these benchmarks and is made publicly available to the broader community.
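The abstract does not say how structures were selected during active learning. One common strategy, swapped in here purely for illustration, is query by committee: train several potentials and send for labeling the structures on which their force predictions disagree most. The sketch assumes predicted forces are stored in a NumPy array of shape (n_models, n_structures, n_atoms, 3), with structures padded to a common atom count; `select_by_committee` is a hypothetical name.

```python
import numpy as np

def select_by_committee(force_preds, n_select):
    """Return indices of the structures with the largest committee disagreement.

    force_preds: array (n_models, n_structures, n_atoms, 3) of predicted forces.
    Assumes padded structures (same n_atoms). Uncertainty per structure is the
    maximum over atoms of the norm of the across-model force standard deviation.
    """
    std = force_preds.std(axis=0)              # (n_structures, n_atoms, 3)
    per_atom = np.linalg.norm(std, axis=-1)    # (n_structures, n_atoms)
    uncertainty = per_atom.max(axis=-1)        # (n_structures,)
    # Sort descending and keep the most uncertain structures.
    return np.argsort(uncertainty)[::-1][:n_select]

# Toy committee of 3 models, 4 structures, 2 atoms each.
preds = np.zeros((3, 4, 2, 3))
preds[:, 2, 0, 0] = [0.0, 1.0, 2.0]   # strong disagreement on structure 2
preds[:, 1, 1, 1] = [0.0, 0.1, 0.2]   # weak disagreement on structure 1
print(select_by_committee(preds, 2))
```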
Boltzmann Classifier: A Thermodynamic-Inspired Approach to Supervised Learning
Amin, Muhamed, Brooks, Bernard R.
We present the Boltzmann classifier, a novel distance-based probabilistic classification algorithm inspired by the Boltzmann distribution. Unlike traditional classifiers that produce hard decisions or uncalibrated probabilities, the Boltzmann classifier assigns class probabilities based on the average distance to the nearest neighbors within each class, providing interpretable, physically meaningful outputs. We evaluate the method in three application domains: molecular activity prediction, oxidation state classification of transition metal complexes, and breast cancer diagnosis. In the molecular activity task, the classifier achieved the highest accuracy in predicting active compounds against two protein targets, with strong correlations between the predicted probabilities and experimental pIC50 values. For metal complexes, the classifier accurately distinguished between oxidation states II and III for Fe, Mn, and Co using only metal-ligand bond lengths extracted from crystallographic data, and its predictions were highly consistent with known chemical trends. On the breast cancer dataset, the classifier achieved 97% accuracy, with low-confidence predictions concentrated in inherently ambiguous cases. Across all tasks, the Boltzmann classifier performed competitively with, or better than, standard models such as logistic regression, support vector machines, random forests, and k-nearest neighbors. Its probabilistic outputs correlated with continuous physical or biological properties, highlighting its potential utility in both classification and regression contexts. The results suggest that the Boltzmann classifier is a robust and interpretable alternative to conventional machine learning approaches, particularly in scientific domains where underlying structure-property relationships are important.
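One way to read the description is as a softmax over distances: p_c ∝ exp(-d_c / T), where d_c is the mean distance from the query to its k nearest training points of class c. The sketch below assumes Euclidean distance and treats k and the temperature T as free hyperparameters; these choices and the function name are assumptions, not details taken from the paper.

```python
import numpy as np

def boltzmann_probabilities(X_train, y_train, x, k=3, temperature=1.0):
    """Sketch: class probabilities p_c proportional to exp(-d_c / T).

    d_c is the mean Euclidean distance from x to its k nearest training
    points of class c (all points if the class has fewer than k).
    k and temperature are hypothetical hyperparameters.
    """
    classes = np.unique(y_train)
    mean_dists = []
    for c in classes:
        members = X_train[y_train == c]
        dists = np.sort(np.linalg.norm(members - x, axis=1))
        mean_dists.append(dists[:k].mean())
    energies = np.array(mean_dists) / temperature
    # Subtract the minimum "energy" before exponentiating for numerical stability.
    weights = np.exp(-(energies - energies.min()))
    return classes, weights / weights.sum()

# Toy data: two well-separated 2D clusters.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
y = np.array([0, 0, 0, 1, 1, 1])
classes, probs = boltzmann_probabilities(X, y, np.array([0.2, 0.2]), k=2)
print(dict(zip(classes.tolist(), probs.round(4).tolist())))
```

As in the physical Boltzmann distribution, a larger temperature flattens the probabilities toward uniform, while a smaller one sharpens them toward the nearest class.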
From structure mining to unsupervised exploration of atomic octahedral networks
Xian, R. Patrick, Morelock, Ryan J., Hadar, Ido, Musgrave, Charles B., Sutton, Christopher
Networks of atom-centered coordination octahedra commonly occur in inorganic and hybrid solid-state materials. Characterizing their spatial arrangements and characteristics is crucial for relating structure to properties in many materials families. Traditional case-by-case inspection becomes prohibitive for discovering trends and similarities in large datasets. Here, we operationalize chemical intuition to automate the geometric parsing, quantification, and classification of coordination octahedral networks. We find axis-resolved tilting trends in ABO$_{3}$ perovskite polymorphs, which assist in detecting oxidation state changes. Moreover, we develop a scale-invariant encoding scheme to represent these networks, which, combined with human-assisted unsupervised machine learning, allows us to taxonomize the inorganic framework polytypes in hybrid iodoplumbates (A$_x$Pb$_y$I$_z$). Consequently, we uncover a violation of Pauling's third rule and the design principles underpinning the polytypes' topological diversity. Our results offer a glimpse into the vast design space of atomic octahedral networks and inform high-throughput, targeted screening of specific structure types.
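A quick, widely used proxy for octahedral tilting in corner-sharing networks such as ABO$_{3}$ perovskites is the B-X-B bridging angle θ, with the tilt approximated as (180° - θ)/2. The sketch below computes that single-angle estimate from Cartesian coordinates. It is a simplification for illustration, not the axis-resolved procedure described above, and the coordinates are hypothetical.

```python
import numpy as np

def bond_angle(b1, x, b2):
    """Return the B1-X-B2 angle (degrees) at the bridging anion x."""
    v1 = np.asarray(b1, dtype=float) - np.asarray(x, dtype=float)
    v2 = np.asarray(b2, dtype=float) - np.asarray(x, dtype=float)
    cos_t = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip guards against rounding slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0))))

def tilt_from_bond_angle(theta_deg):
    """Approximate octahedral tilt (degrees); 0 for a straight 180-degree bridge."""
    return (180.0 - theta_deg) / 2.0

# Hypothetical coordinates (angstrom): two B cations bridged by one anion.
theta = bond_angle([0.0, 0.0, 0.0], [1.95, 0.3, 0.0], [3.9, 0.0, 0.0])
print(theta, tilt_from_bond_angle(theta))
```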
Composition based oxidation state prediction of materials using deep learning
Fu, Nihang, Hu, Jeffrey, Feng, Ying, Morrison, Gregory, Loye, Hans-Conrad zur, Hu, Jianjun
Oxidation states (OS) are the hypothetical charges of atoms when all of their bonds are approximated as ionic. They are fundamental attributes of elements that help explain redox reactions, reactivity, chemical bonding, and the chemical properties of different elements and compounds, and they have been widely used in charge-neutrality verification, crystal structure determination, and reaction estimation. In electrochemistry, oxidation states are used to represent relevant compounds and ions in Latimer and Frost diagrams, and they can also be used to check the charge neutrality of chemical compounds in order to screen hypothetical materials generated by computational design algorithms. Oxidation states have also been used to study transition-metal complexes. Currently, only heuristic rules, with many exceptions, exist for guessing the oxidation states of a given compound. Recent work has developed machine learning models based on heuristic structural features for predicting the oxidation states of metal ions. However, composition-based oxidation state prediction remains elusive, even though it matters more for new materials discovery, where structures are often not yet available. This work proposes BERTOS, a novel deep learning model based on the BERT transformer language model, for predicting the oxidation states of all elements of inorganic compounds given only their chemical composition.
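A composition-only model needs some way to turn a formula into a token sequence to which per-element labels can be attached. The sketch below shows one plausible preprocessing step, expanding a formula into one element token per atom, followed by a charge-neutrality check on a per-atom oxidation-state assignment. Both functions are hypothetical illustrations; whether BERTOS tokenizes formulas exactly this way is an assumption. The parser handles only simple formulas without parentheses or hydrates.

```python
import re

def expand_formula(formula):
    """Expand a simple formula such as 'SrTiO3' into one element token per atom.

    Hypothetical preprocessing; parenthesized groups and hydrates are unsupported.
    """
    tokens = []
    # Each match is (element symbol, optional integer count).
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        tokens.extend([element] * (int(count) if count else 1))
    return tokens

def is_charge_neutral(tokens, states):
    """True if the per-atom oxidation states (one per token) sum to zero."""
    if len(tokens) != len(states):
        raise ValueError("need exactly one oxidation state per atom token")
    return sum(states) == 0

tokens = expand_formula("Fe3O4")
print(tokens)
# A per-atom assignment can express mixed valence: one Fe2+ and two Fe3+.
print(is_charge_neutral(tokens, [2, 3, 3, -2, -2, -2, -2]))
```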
MnEdgeNet -- Accurate Decomposition of Mixed Oxidation States for Mn XAS and EELS L2,3 Edges without Reference and Calibration
Accurate decomposition of mixed Mn oxidation states is highly important for characterizing the electronic structure, charge transfer, and redox centers of electronic, electrocatalytic, and energy storage materials that contain Mn. Electron energy loss spectroscopy (EELS) and soft X-ray absorption spectroscopy (XAS) measurements of the Mn L2,3 edges are widely used for this purpose. To date, although measuring the Mn L2,3 edges is straightforward provided the sample is properly prepared, accurately decomposing the mixed valence states of Mn remains non-trivial. For both EELS and XAS, Mn2+, Mn3+, and Mn4+ reference spectra need to be acquired on the same instrument or beamline, preferably in the same experimental session, because the instrumental resolution and the energy axis offset can vary from one session to another. To circumvent this hurdle, we adopted a deep learning approach and developed a calibration-free and reference-free method to decompose the oxidation states of Mn L2,3 edges for both EELS and XAS. To synthesize physics-informed training datasets with ground-truth labels, we created a forward model that accounts for plural scattering, instrumental broadening, noise, and energy axis offset. With it, we created a database of 1.2 million spectra, each labeled with its three-component oxidation state composition. The library includes a sufficient variety of data, covering both EELS and XAS spectra. Trained on this large database, our convolutional neural network achieves 85% accuracy on the validation dataset. We tested the model and found it robust against noise (down to a PSNR of 10) and plural scattering (up to $t/\lambda = 1$). We further validated the model against spectral data that were not used in training.
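The forward model can be pictured as a chain of transformations applied to a labeled mixture of reference spectra. The NumPy sketch below mixes three references with known fractions (the label), applies Gaussian instrumental broadening, shifts the energy axis, and adds Gaussian noise at a target PSNR. It is an illustration under stated assumptions, not the authors' code: plural scattering is omitted, the reference spectra are toy Gaussian peaks at made-up positions, and PSNR is taken as 20·log10(peak / noise std).

```python
import numpy as np

def gaussian(energy, center, width):
    return np.exp(-0.5 * ((energy - center) / width) ** 2)

def synthesize_spectrum(energy, refs, fractions, sigma_ev, offset_ev, psnr_db, rng):
    """Mix, broaden, shift, and add noise (plural scattering omitted).

    refs: array (3, n) of Mn2+, Mn3+, Mn4+ reference spectra sampled on `energy`.
    fractions: three nonnegative weights summing to 1 (the training label).
    """
    mixed = np.asarray(fractions) @ refs
    de = energy[1] - energy[0]
    # Instrumental broadening: convolve with a normalized, symmetric Gaussian kernel.
    half = int(np.ceil(4 * sigma_ev / de))
    kernel = gaussian(np.arange(-half, half + 1) * de, 0.0, sigma_ev)
    broadened = np.convolve(mixed, kernel / kernel.sum(), mode="same")
    # Energy-axis offset: the output at E takes the broadened value at E - offset.
    shifted = np.interp(energy - offset_ev, energy, broadened)
    # Noise std chosen so that 20*log10(peak / std) equals psnr_db (assumed definition).
    noise_std = shifted.max() / 10 ** (psnr_db / 20)
    return shifted + rng.normal(0.0, noise_std, size=energy.shape)

rng = np.random.default_rng(0)
energy = np.linspace(630.0, 660.0, 601)
# Hypothetical toy references: single peaks at made-up energies (eV).
refs = np.stack([gaussian(energy, c, 0.8) for c in (640.0, 641.5, 643.0)])
spectrum = synthesize_spectrum(energy, refs, [0.2, 0.5, 0.3], 0.3, 0.5, 30.0, rng)
print(spectrum.shape)
```

Randomizing `sigma_ev`, `offset_ev`, and `psnr_db` per sample is what would let a network learn to ignore session-dependent resolution and calibration, which is the motivation the abstract gives for avoiding reference spectra.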
Machine Learning guided high-throughput search of non-oxide garnets
Schmidt, Jonathan, Wang, Haichen, Schmidt, Georg, Marques, Miguel
Garnets, known since the early stages of human civilization, have found important applications in modern technologies including magnetostriction, spintronics, and lithium batteries. The overwhelming majority of experimentally known garnets are oxides, while explorations (experimental or theoretical) of the rest of the chemical space have been limited in scope. A key issue is that the garnet structure has a large primitive unit cell, so calculations require an enormous amount of computational resources. To perform a comprehensive search of the complete chemical space for new garnets, we combine recent progress in graph neural networks with high-throughput calculations. We apply the machine learning model to identify potentially (meta-)stable garnet systems before performing systematic density-functional calculations to validate the predictions. In this way, we discover more than 600 ternary garnets with distances to the convex hull below 100 meV/atom and a variety of physical and chemical properties, including sulfide, nitride, and halide garnets. For these, we analyze the electronic structure and discuss the connection between the value of the electronic band gap and charge balance.
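The workflow described above is a two-stage screen: a cheap ML model filters the candidate space, and only the survivors are passed to expensive density-functional calculations. A minimal sketch follows. The predictor and DFT functions are passed in as callables and stand in for real models. The 100 meV/atom cutoff comes from the text, while the extra ML margin, which lets borderline predictions still reach DFT, is a hypothetical design choice.

```python
from typing import Callable, Iterable, List

def two_stage_screen(candidates: Iterable[str],
                     ml_predict: Callable[[str], float],
                     dft_evaluate: Callable[[str], float],
                     threshold: float = 0.100,
                     ml_margin: float = 0.050) -> List[str]:
    """Screen by ML-predicted hull distance (eV/atom), then confirm with DFT.

    ml_margin (hypothetical) loosens the ML cut to reduce false negatives
    caused by model error; only the DFT value decides the final list.
    """
    shortlisted = [c for c in candidates if ml_predict(c) < threshold + ml_margin]
    return [c for c in shortlisted if dft_evaluate(c) < threshold]

# Toy stand-ins for an ML model and DFT results (eV/atom above the hull).
ml = {"A": 0.02, "B": 0.12, "C": 0.30}
dft = {"A": 0.05, "B": 0.15, "C": 0.00}
print(two_stage_screen(ml, ml.__getitem__, dft.__getitem__))
```

Candidate C illustrates the trade-off: it would pass DFT, but the ML stage never sends it there, so the ML cutoff and margin control how many stable compounds are missed versus how many DFT calculations are spent.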
Machine learning cracks the oxidation states of crystal structures
Chemical elements make up pretty much everything in the physical world. As of 2016, we know of 118 elements, all of which can be found categorized in the famous periodic table that hangs in every chemistry lab and classroom. Each element in the periodic table appears as a one- or two-letter abbreviation (e.g. O for oxygen, Al for aluminum) along with its atomic number, which shows how many protons there are in the element's nucleus. The number of protons is enormously important, as it also determines how many electrons orbit the nucleus, which essentially makes the element what it is and gives it its chemical properties.