chemical composition
Ancient sharks once swam in this landlocked state
'Sharkansas' contains entire fossilized skeletons dating back 320 million years. Breakthroughs, discoveries, and DIY tips sent six days a week. Arkansas is hundreds of miles from the Gulf of Mexico, but it's home to countless sharks . A trove of the fossilized predator's remains are embedded within the Fayetteville Shale --a roughly 350-million-year-old geological formation in the state's northwestern corner. Because a shark's cartilage skeleton decomposes so quickly, they usually only leave teeth behind when they die.
Explainable AI for Curie Temperature Prediction in Magnetic Materials
Ajaib, M. Adeel, Nasir, Fariha, Rehman, Abdul
Traditional approaches based on quantum mechanical computations or empirical models are often limited in scalability and accuracy. In recent years, machine learning (ML) has emerged as a promising alternative for property prediction across materials science domains [1-9]. Building on this momentum, several recent studies have proposed the use of ML models trained on curated magnetic datasets. In particular, the recent study [10] introduced the NE-MAD database, which aggregates experimentally measured magnetic transition temperatures and compositions. Similarly, the study by [11] utilized two of the largest available datasets of experimental Curie temperatures--comprising over 2,500 materials for training and more than 3,000 entries for validation--to compare machine learning strategies for predicting Curie temperature solely from chemical composition. Our work is inspired by these prior efforts and aims to improve the predictive accuracy and gain insights into model in-terpretability. We develop a pipeline that starts from the NE-MAD dataset, augments it with compositional and elemental features, and evaluates several ML models. A key contribution of our work is the integration of explainable AI (XAI) through SHAP (SHapley Additive exPlanations) analysis, which allows us to quantify how each input feature contributes to the model's prediction. Moreover, we benchmark our models on external datasets from literature to demonstrate generalization.
Learning simple heuristic rules for classifying materials based on chemical composition
In the past decade, there has been a significant interest in the use of machine learning approaches in materials science research. Conventional deep learning approaches that rely on complex, nonlinear models have become increasingly important in computational materials science due to their high predictive accuracy. In contrast to these approaches, we have shown in a recent work that a remarkably simple learned heuristic rule -- based on the concept of topogivity -- can classify whether a material is topological using only its chemical composition. In this paper, we go beyond the topology classification scenario by also studying the use of machine learning to develop simple heuristic rules for classifying whether a material is a metal based on chemical composition. Moreover, we present a framework for incorporating chemistry-informed inductive bias based on the structure of the periodic table. For both the topology classification and the metallicity classification tasks, we empirically characterize the performance of simple heuristic rules fit with and without chemistry-informed inductive bias across a wide range of training set sizes. We find evidence that incorporating chemistry-informed inductive bias can reduce the amount of training data required to reach a given level of test accuracy.
Motional representation; the ability to predict odor characters using molecular vibrations
Harada, Yuki, Maeda, Shuichi, Shen, Junwei, Misonou, Taku, Hori, Hirokazu, Nakamura, Shinichiro
The prediction of odor characters is still impossible based on the odorant molecular structure. We designed a CNN-based regressor for computed parameters in molecular vibrations (CNN\_vib), in order to investigate the ability to predict odor characters of molecular vibrations. In this study, we explored following three approaches for the predictability; (i) CNN with molecular vibrational parameters, (ii) logistic regression based on vibrational spectra, and (iii) logistic regression with molecular fingerprint(FP). Our investigation demonstrates that both (i) and (ii) provide predictablity, and also that the vibrations as an explanatory variable (i and ii) and logistic regression with fingerprints (iii) show nearly identical tendencies. The predictabilities of (i) and (ii), depending on odor descriptors, are comparable to those of (iii). Our research shows that odor is predictable by odorant molecular vibration as well as their shapes alone. Our findings provide insight into the representation of molecular motional features beyond molecular structures.
Wine Characterisation with Spectral Information and Predictive Artificial Intelligence
Yao, Jianping, Tran, Son N., Nguyen, Hieu, Sawyer, Samantha, Longo, Rocco
The purpose of this paper is to use absorbance data obtained by human tasting and an ultraviolet-visible (UV-Vis) scanning spectrophotometer to predict the attributes of grape juice (GJ) and to classify the wine's origin, respectively. The approach combined machine learning (ML) techniques with spectroscopy to find a relatively simple way to apply them in two stages of winemaking and help improve the traditional wine analysis methods regarding sensory data and wine's origins. This new technique has overcome the disadvantages of the complex sensors by taking advantage of spectral fingerprinting technology and forming a comprehensive study of the employment of AI in the wine analysis domain. In the results, Support Vector Machine (SVM) was the most efficient and robust in both attributes and origin prediction tasks. Both the accuracy and F1 score of the origin prediction exceed 91%. The feature ranking approach found that the more influential wavelengths usually appear at the lower end of the scan range, 250 nm (nanometers) to 420 nm, which is believed to be of great help for selecting appropriate validation methods and sensors to extract wine data in future research. The knowledge of this research provides new ideas and early solutions for the wine industry or other beverage industries to integrate big data and IoT in the future, which significantly promotes the development of 'Smart Wineries'.
Predicting band gap from chemical composition: A simple learned model for a material property with atypical statistics
Ma, Andrew, Dugan, Owen, Soljaฤiฤ, Marin
In solid-state materials science, substantial efforts have been devoted to the calculation and modeling of the electronic band gap. While a wide range of ab initio methods and machine learning algorithms have been created that can predict this quantity, the development of new computational approaches for studying the band gap remains an active area of research. Here we introduce a simple machine learning model for predicting the band gap using only the chemical composition of the crystalline material. To motivate the form of the model, we first analyze the empirical distribution of the band gap, which sheds new light on its atypical statistics. Specifically, our analysis enables us to frame band gap prediction as a task of modeling a mixed random variable, and we design our model accordingly. Our model formulation incorporates thematic ideas from chemical heuristic models for other material properties in a manner that is suited towards the band gap modeling task. The model has exactly one parameter corresponding to each element, which is fit using data. To predict the band gap for a given material, the model computes a weighted average of the parameters associated with its constituent elements and then takes the maximum of this quantity and zero. The model provides heuristic chemical interpretability by intuitively capturing the associations between the band gap and individual chemical elements.
A simple DNN regression for the chemical composition in essential oil
Harada, Yuki, Maeda, Shuichi, Kiyama, Masato, Nakamura, Shinichiro
Although experimental design and methodological surveys for mono-molecular activity/property has been extensively investigated, those for chemical composition have received little attention, with the exception of a few prior studies. In this study, we configured three simple DNN regressors to predict essential oil property based on chemical composition. Despite showing overfitting due to the small size of dataset, all models were trained effectively in this study.
Unleashing the power of novel conditional generative approaches for new materials discovery
Novitskiy, Lev, Lazarev, Vladimir, Tiutiulnikov, Mikhail, Vakhrameev, Nikita, Eremin, Roman, Humonen, Innokentiy, Kuznetsov, Andrey, Dimitrov, Denis, Budennyy, Semen
For a very long time, computational approaches to the design of new materials have relied on an iterative process of finding a candidate material and modeling its properties. AI has played a crucial role in this regard, helping to accelerate the discovery and optimization of crystal properties and structures through advanced computational methodologies and data-driven approaches. To address the problem of new materials design and fasten the process of new materials search, we have applied latest generative approaches to the problem of crystal structure design, trying to solve the inverse problem: by given properties generate a structure that satisfies them without utilizing supercomputer powers. In our work we propose two approaches: 1) conditional structure modification: optimization of the stability of an arbitrary atomic configuration, using the energy difference between the most energetically favorable structure and all its less stable polymorphs and 2) conditional structure generation. We used a representation for materials that includes the following information: lattice, atom coordinates, atom types, chemical features, space group and formation energy of the structure. The loss function was optimized to take into account the periodic boundary conditions of crystal structures. We have applied Diffusion models approach, Flow matching, usual Autoencoder (AE) and compared the results of the models and approaches. As a metric for the study, physical PyMatGen matcher was employed: we compare target structure with generated one using default tolerances. So far, our modifier and generator produce structures with needed properties with accuracy 41% and 82% respectively. To prove the offered methodology efficiency, inference have been carried out, resulting in several potentially new structures with formation energy below the AFLOW-derived convex hulls.
Northeast Materials Database (NEMAD): Enabling Discovery of High Transition Temperature Magnetic Compounds
Itani, Suman, Zhang, Yibo, Zang, Jiadong
The discovery of novel magnetic materials with greater operating temperature ranges and optimized performance is essential for advanced applications. Current data-driven approaches are challenging and limited due to the lack of accurate, comprehensive, and feature-rich databases. This study aims to address this challenge by introducing a new approach that uses Large Language Models (LLMs) to create a comprehensive, experiment-based, magnetic materials database named the Northeast Materials Database (NEMAD), which consists of 26,706 magnetic materials (www.nemad.org). The database incorporates chemical composition, magnetic phase transition temperatures, structural details, and magnetic properties. Enabled by NEMAD, machine learning models were developed to classify materials and predict transition temperatures. Our classification model achieved an accuracy of 90% in categorizing materials as ferromagnetic (FM), antiferromagnetic (AFM), and non-magnetic (NM). The regression models predict Curie (N\'eel) temperature with a coefficient of determination (R2) of 0.86 (0.85) and a mean absolute error (MAE) of 62K (32K). These models identified 62 (19) FM (AFM) candidates with a predicted Curie (N\'eel) temperature above 500K (100K) from the Materials Project. This work shows the feasibility of combining LLMs for automated data extraction and machine learning models in accelerating the discovery of magnetic materials.
Machine learning enhances chemical analysis at the nanoscale
Introducing a non-negative matrix factorization based pan-sharpening (PSNMF) method to determine chemical compositions from noisy x-ray spectroscopy data. "Nanomaterials" is a broad term used to describe chemical substances or materials in which a single unit is sized between 1 and 100 nanometers (a nanometer is a billionth of a meter). They include exotic materials such as carbon nanotubes, silver nanoparticles (used as antimicrobials), nanoporous materials, and many types of catalysts used for efficiently driving chemical reactions. Nanomaterials are currently used in a wide range of fields, from medicine to electronics, which means that the ability to determine their exact chemical composition is essential. Nonetheless, this proves challenging, because traditional methods for analyzing nanomaterials tend to be susceptible to low signal-to-noise ratios.