Goto

Collaborating Authors

 chemical composition


Ancient sharks once swam in this landlocked state

Popular Science

'Sharkansas' contains entire fossilized skeletons dating back 320 million years. Breakthroughs, discoveries, and DIY tips sent six days a week. Arkansas is hundreds of miles from the Gulf of Mexico, but it's home to countless sharks . A trove of the fossilized predator's remains are embedded within the Fayetteville Shale --a roughly 350-million-year-old geological formation in the state's northwestern corner. Because a shark's cartilage skeleton decomposes so quickly, they usually only leave teeth behind when they die.


Explainable AI for Curie Temperature Prediction in Magnetic Materials

Ajaib, M. Adeel, Nasir, Fariha, Rehman, Abdul

arXiv.org Artificial Intelligence

Traditional approaches based on quantum mechanical computations or empirical models are often limited in scalability and accuracy. In recent years, machine learning (ML) has emerged as a promising alternative for property prediction across materials science domains [1-9]. Building on this momentum, several recent studies have proposed the use of ML models trained on curated magnetic datasets. In particular, the recent study [10] introduced the NE-MAD database, which aggregates experimentally measured magnetic transition temperatures and compositions. Similarly, the study by [11] utilized two of the largest available datasets of experimental Curie temperatures--comprising over 2,500 materials for training and more than 3,000 entries for validation--to compare machine learning strategies for predicting Curie temperature solely from chemical composition. Our work is inspired by these prior efforts and aims to improve the predictive accuracy and gain insights into model in-terpretability. We develop a pipeline that starts from the NE-MAD dataset, augments it with compositional and elemental features, and evaluates several ML models. A key contribution of our work is the integration of explainable AI (XAI) through SHAP (SHapley Additive exPlanations) analysis, which allows us to quantify how each input feature contributes to the model's prediction. Moreover, we benchmark our models on external datasets from literature to demonstrate generalization.


Motional representation; the ability to predict odor characters using molecular vibrations

Harada, Yuki, Maeda, Shuichi, Shen, Junwei, Misonou, Taku, Hori, Hirokazu, Nakamura, Shinichiro

arXiv.org Artificial Intelligence

The prediction of odor characters is still impossible based on the odorant molecular structure. We designed a CNN-based regressor for computed parameters in molecular vibrations (CNN\_vib), in order to investigate the ability to predict odor characters of molecular vibrations. In this study, we explored following three approaches for the predictability; (i) CNN with molecular vibrational parameters, (ii) logistic regression based on vibrational spectra, and (iii) logistic regression with molecular fingerprint(FP). Our investigation demonstrates that both (i) and (ii) provide predictablity, and also that the vibrations as an explanatory variable (i and ii) and logistic regression with fingerprints (iii) show nearly identical tendencies. The predictabilities of (i) and (ii), depending on odor descriptors, are comparable to those of (iii). Our research shows that odor is predictable by odorant molecular vibration as well as their shapes alone. Our findings provide insight into the representation of molecular motional features beyond molecular structures.


Wine Characterisation with Spectral Information and Predictive Artificial Intelligence

Yao, Jianping, Tran, Son N., Nguyen, Hieu, Sawyer, Samantha, Longo, Rocco

arXiv.org Artificial Intelligence

The purpose of this paper is to use absorbance data obtained by human tasting and an ultraviolet-visible (UV-Vis) scanning spectrophotometer to predict the attributes of grape juice (GJ) and to classify the wine's origin, respectively. The approach combined machine learning (ML) techniques with spectroscopy to find a relatively simple way to apply them in two stages of winemaking and help improve the traditional wine analysis methods regarding sensory data and wine's origins. This new technique has overcome the disadvantages of the complex sensors by taking advantage of spectral fingerprinting technology and forming a comprehensive study of the employment of AI in the wine analysis domain. In the results, Support Vector Machine (SVM) was the most efficient and robust in both attributes and origin prediction tasks. Both the accuracy and F1 score of the origin prediction exceed 91%. The feature ranking approach found that the more influential wavelengths usually appear at the lower end of the scan range, 250 nm (nanometers) to 420 nm, which is believed to be of great help for selecting appropriate validation methods and sensors to extract wine data in future research. The knowledge of this research provides new ideas and early solutions for the wine industry or other beverage industries to integrate big data and IoT in the future, which significantly promotes the development of 'Smart Wineries'.


Predicting band gap from chemical composition: A simple learned model for a material property with atypical statistics

Ma, Andrew, Dugan, Owen, Soljačić, Marin

arXiv.org Artificial Intelligence

In solid-state materials science, substantial efforts have been devoted to the calculation and modeling of the electronic band gap. While a wide range of ab initio methods and machine learning algorithms have been created that can predict this quantity, the development of new computational approaches for studying the band gap remains an active area of research. Here we introduce a simple machine learning model for predicting the band gap using only the chemical composition of the crystalline material. To motivate the form of the model, we first analyze the empirical distribution of the band gap, which sheds new light on its atypical statistics. Specifically, our analysis enables us to frame band gap prediction as a task of modeling a mixed random variable, and we design our model accordingly. Our model formulation incorporates thematic ideas from chemical heuristic models for other material properties in a manner that is suited towards the band gap modeling task. The model has exactly one parameter corresponding to each element, which is fit using data. To predict the band gap for a given material, the model computes a weighted average of the parameters associated with its constituent elements and then takes the maximum of this quantity and zero. The model provides heuristic chemical interpretability by intuitively capturing the associations between the band gap and individual chemical elements.


A simple DNN regression for the chemical composition in essential oil

Harada, Yuki, Maeda, Shuichi, Kiyama, Masato, Nakamura, Shinichiro

arXiv.org Artificial Intelligence

Although experimental design and methodological surveys for mono-molecular activity/property has been extensively investigated, those for chemical composition have received little attention, with the exception of a few prior studies. In this study, we configured three simple DNN regressors to predict essential oil property based on chemical composition. Despite showing overfitting due to the small size of dataset, all models were trained effectively in this study.


Unleashing the power of novel conditional generative approaches for new materials discovery

Novitskiy, Lev, Lazarev, Vladimir, Tiutiulnikov, Mikhail, Vakhrameev, Nikita, Eremin, Roman, Humonen, Innokentiy, Kuznetsov, Andrey, Dimitrov, Denis, Budennyy, Semen

arXiv.org Artificial Intelligence

For a very long time, computational approaches to the design of new materials have relied on an iterative process of finding a candidate material and modeling its properties. AI has played a crucial role in this regard, helping to accelerate the discovery and optimization of crystal properties and structures through advanced computational methodologies and data-driven approaches. To address the problem of new materials design and fasten the process of new materials search, we have applied latest generative approaches to the problem of crystal structure design, trying to solve the inverse problem: by given properties generate a structure that satisfies them without utilizing supercomputer powers. In our work we propose two approaches: 1) conditional structure modification: optimization of the stability of an arbitrary atomic configuration, using the energy difference between the most energetically favorable structure and all its less stable polymorphs and 2) conditional structure generation. We used a representation for materials that includes the following information: lattice, atom coordinates, atom types, chemical features, space group and formation energy of the structure. The loss function was optimized to take into account the periodic boundary conditions of crystal structures. We have applied Diffusion models approach, Flow matching, usual Autoencoder (AE) and compared the results of the models and approaches. As a metric for the study, physical PyMatGen matcher was employed: we compare target structure with generated one using default tolerances. So far, our modifier and generator produce structures with needed properties with accuracy 41% and 82% respectively. To prove the offered methodology efficiency, inference have been carried out, resulting in several potentially new structures with formation energy below the AFLOW-derived convex hulls.


Northeast Materials Database (NEMAD): Enabling Discovery of High Transition Temperature Magnetic Compounds

Itani, Suman, Zhang, Yibo, Zang, Jiadong

arXiv.org Artificial Intelligence

The discovery of novel magnetic materials with greater operating temperature ranges and optimized performance is essential for advanced applications. Current data-driven approaches are challenging and limited due to the lack of accurate, comprehensive, and feature-rich databases. This study aims to address this challenge by introducing a new approach that uses Large Language Models (LLMs) to create a comprehensive, experiment-based, magnetic materials database named the Northeast Materials Database (NEMAD), which consists of 26,706 magnetic materials (www.nemad.org). The database incorporates chemical composition, magnetic phase transition temperatures, structural details, and magnetic properties. Enabled by NEMAD, machine learning models were developed to classify materials and predict transition temperatures. Our classification model achieved an accuracy of 90% in categorizing materials as ferromagnetic (FM), antiferromagnetic (AFM), and non-magnetic (NM). The regression models predict Curie (N\'eel) temperature with a coefficient of determination (R2) of 0.86 (0.85) and a mean absolute error (MAE) of 62K (32K). These models identified 62 (19) FM (AFM) candidates with a predicted Curie (N\'eel) temperature above 500K (100K) from the Materials Project. This work shows the feasibility of combining LLMs for automated data extraction and machine learning models in accelerating the discovery of magnetic materials.


Machine learning enhances chemical analysis at the nanoscale

AIHub

Introducing a non-negative matrix factorization based pan-sharpening (PSNMF) method to determine chemical compositions from noisy x-ray spectroscopy data. "Nanomaterials" is a broad term used to describe chemical substances or materials in which a single unit is sized between 1 and 100 nanometers (a nanometer is a billionth of a meter). They include exotic materials such as carbon nanotubes, silver nanoparticles (used as antimicrobials), nanoporous materials, and many types of catalysts used for efficiently driving chemical reactions. Nanomaterials are currently used in a wide range of fields, from medicine to electronics, which means that the ability to determine their exact chemical composition is essential. Nonetheless, this proves challenging, because traditional methods for analyzing nanomaterials tend to be susceptible to low signal-to-noise ratios.


Diffusion Models Are Promising for Ab Initio Structure Solutions from Nanocrystalline Powder Diffraction Data

Guo, Gabe, Saidi, Tristan, Terban, Maxwell, Billinge, Simon JL, Lipson, Hod

arXiv.org Artificial Intelligence

A major challenge in materials science is the determination of the structure of nanometer sized objects. Here we present a novel approach that uses a generative machine learning model based on a Diffusion model that is trained on 45,229 known structures. The model factors both the measured diffraction pattern as well as relevant statistical priors on the unit cell of atomic cluster structures. Conditioned only on the chemical formula and the information-scarce finite-size broadened powder diffraction pattern, we find that our model, PXRDnet, can successfully solve simulated nanocrystals as small as 10 angstroms across 200 materials of varying symmetry and complexity, including structures from all seven crystal systems. We show that our model can determine structural solutions with up to $81.5\%$ accuracy, as measured by structural correlation. Furthermore, PXRDnet is capable of solving structures from noisy diffraction patterns gathered in real-world experiments. We suggest that data driven approaches, bootstrapped from theoretical simulation, will ultimately provide a path towards determining the structure of previously unsolved nano-materials.