perovskite
Enhancing Dimensionality Prediction in Hybrid Metal Halides via Feature Engineering and Class-Imbalance Mitigation
Karabin, Mariia, Armstrong, Isaac, Beck, Leo, Apanel, Paulina, Eisenbach, Markus, Mitzi, David B., Terletska, Hanna, Heinz, Hendrik
We present a machine learning framework for predicting the structural dimensionality of hybrid metal halides (HMHs), including organic-inorganic perovskites, using a combination of chemically informed feature engineering and advanced class-imbalance handling techniques. The dataset, consisting of 494 HMH structures, is highly imbalanced across dimensionality classes (0D, 1D, 2D, 3D), posing significant challenges to predictive modeling. This dataset was later augmented to 1336 samples via the Synthetic Minority Oversampling Technique (SMOTE) to mitigate the effects of the class imbalance. We developed interaction-based descriptors and integrated them into a multi-stage workflow that combines feature selection, model stacking, and performance optimization to improve dimensionality prediction accuracy. Our approach significantly improves F1-scores for underrepresented classes, achieving robust cross-validation performance across all dimensionalities.
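The SMOTE step named in this abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and toy descriptor vectors; the function name `smote_oversample` and its parameters `k` and `seed` are hypothetical and are not taken from the paper, whose own implementation is not described here.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples for a minority class.

    Each synthetic point lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours
    (Euclidean distance). Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared distance to the base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class with two illustrative (made-up) descriptors per structure.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_synthetic = smote_oversample(toy_minority, n_new=6, k=2, seed=1)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by that class. In a cross-validated workflow, oversampling is normally applied to the training folds only, so that no synthetic neighbour of a test sample leaks into training.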
Training-Free Active Learning Framework in Materials Science with Large Language Models
Wang, Hongchen, Castañeda, Rafael Espinosa, Werber, Jay R., Fehlis, Yao, Kim, Edward, Hattrick-Simpers, Jason
Active learning (AL) accelerates scientific discovery by prioritizing the most informative experiments, but traditional machine learning (ML) models used in AL suffer from cold-start limitations and domain-specific feature engineering, restricting their generalizability. Large language models (LLMs) offer a new paradigm by leveraging their pretrained knowledge and universal token-based representations to propose experiments directly from text-based descriptions. Here, we introduce an LLM-based active learning framework (LLM-AL) that operates in an iterative few-shot setting and benchmark it against conventional ML models across four diverse materials science datasets. We explored two prompting strategies: one using concise numerical inputs suited for datasets with more compositional and structured features, and another using expanded descriptive text suited for datasets with more experimental and procedural features to provide additional context. Across all datasets, LLM-AL could reduce the number of experiments needed to reach top-performing candidates by over 70% and consistently outperformed traditional ML models. We found that LLM-AL performs broader and more exploratory searches while still reaching the optima with fewer iterations. We further examined the stability boundaries of LLM-AL given the inherent non-determinism of LLMs and found its performance to be broadly consistent across runs, within the variability range typically observed for traditional ML approaches. These results demonstrate that LLM-AL can serve as a generalizable alternative to conventional AL pipelines for more efficient and interpretable experiment selection and potential LLM-driven autonomous discovery.
Enhanced Conditional Generation of Double Perovskite by Knowledge-Guided Language Model Feedback
Lee, Inhyo, Lee, Junhyeong, Park, Jongwon, Lim, KyungTae, Ryu, Seunghwa
Double perovskites (DPs) are promising candidates for sustainable energy technologies due to their compositional tunability and compatibility with low-energy fabrication, yet their vast design space poses a major challenge for conditional materials discovery. This work introduces a multi-agent, text gradient-driven framework that performs DP composition generation under natural-language conditions by integrating three complementary feedback sources: LLM-based self-evaluation, DP-specific domain knowledge-informed feedback, and ML surrogate-based feedback. Analogous to how knowledge-informed machine learning improves the reliability of conventional data-driven models, our framework incorporates domain-informed text gradients to guide the generative process toward physically meaningful regions of the DP composition space. Systematic comparison of three incremental configurations, (i) pure LLM generation, (ii) LLM generation with LLM reasoning-based feedback, and (iii) LLM generation with domain knowledge-guided feedback, shows that iterative guidance from knowledge-informed gradients improves stability-condition satisfaction without additional training data, achieving over 98% compositional validity and up to 54% stable or metastable candidates, surpassing both the LLM-only baseline (43%) and prior GAN-based results (27%). Analyses of ML-based gradients further reveal that they enhance performance in in-distribution (ID) regions but become unreliable in out-of-distribution (OOD) regimes. Overall, this work provides the first systematic analysis of multi-agent, knowledge-guided text gradients for DP discovery and establishes a generalizable blueprint for MAS-driven generative materials design aimed at advancing sustainable technologies.
Combining feature-based approaches with graph neural networks and symbolic regression for synergistic performance and interpretability
Gouvêa, Rogério Almeida, De Breuck, Pierre-Paul, Pretto, Tatiane, Rignanese, Gian-Marco, Santos, Marcos José Leite
To avoid the featurization bottleneck of traditional descriptors, we also leverage GNNs to generate fast, latent-space approximations of MatMiner (ℓ-MM) and Orbital Field Matrix (ℓ-OFM) features. Finally, we augment this feature set with new descriptors derived via symbolic regression. This multifaceted strategy aims to create a more robust, accurate, and versatile featurizer that capitalizes on the distinct strengths of each approach to be useful for a wider range of dataset sizes. To simplify the generation of all those features, a package was developed named MatterVial, standing for MATerials feaTuRe Extraction Via Interpretable Artificial Learning, which, besides producing all latent-space features from the GNN models, aids in obtaining the interpretable chemical descriptors that correlate to these high-level features. This is achieved through techniques such as SHapley Additive exPlanations (SHAP) analysis in surrogate models and symbolic regression via Sure Independence Screening and Sparsifying Operator (SISSO) to obtain an approximate formula from the most important features. Our results demonstrate an overall improvement in all analyzed datasets compared with the baseline MatMiner featurizer. In addition, it surpassed the performance of the individual GNN models in several cases, indicating that the combination of traditional and latent-space features leads to a more robust generalization.
MatMMFuse: Multi-Modal Fusion model for Material Property Prediction
Bhattacharya, Abhiroop, Cloutier, Sylvain G.
Graph-based encodings of crystal structures have recently proven quite successful for high-throughput material property prediction. However, using a single-modality model prevents us from exploiting the advantages of an enhanced feature space obtained by combining different representations. Specifically, pre-trained large language models (LLMs) can encode a large amount of knowledge, which is beneficial for training models. Moreover, the graph encoder is able to learn the local features while the text encoder is able to learn global information such as space group and crystal symmetry. In this work, we propose Material Multi-Modal Fusion (MatMMFuse), a fusion-based model which uses a multi-head attention mechanism to combine structure-aware embeddings from the Crystal Graph Convolution Network (CGCNN) with text embeddings from the SciBERT model. We train our model in an end-to-end framework using data from the Materials Project Dataset. Our proposed model improves on the vanilla CGCNN and SciBERT models for all four key properties: formation energy, band gap, energy above hull, and Fermi energy. Specifically, we observe an improvement of 40% compared to the vanilla CGCNN model and 68% compared to the SciBERT model for predicting the formation energy per atom. Importantly, we demonstrate the zero-shot performance of the trained model on small curated datasets of perovskites, chalcogenides, and the Jarvis Dataset. The results show that the proposed model exhibits better zero-shot performance than the individual vanilla CGCNN and SciBERT models. This enables researchers to deploy the model for specialized industrial applications where collection of training data is prohibitively expensive.
Machine Learning Reveals Composition Dependent Thermal Stability in Halide Perovskites
Hering, Abigail R., Dubey, Mansha, Hosseini, Elahe, Srivastava, Meghna, An, Yu, Correa-Baena, Juan-Pablo, Homayoun, Houman, Leite, Marina S.
The whiskers extend to 4x the IQR (Supplementary Figure 1), which is a conservative threshold that ensures only the most extreme variations in PL are classified as outliers (denoted by diamond symbols). Outliers in PL property distributions may indicate experimental errors, sample inconsistencies, or data processing anomalies; thus, they are removed from the ML analysis. Data Visualization: PCA orthogonally transforms the original variables into a new set of linearly uncorrelated variables termed principal components (PCs). The first PC captures the maximum variance present in the data, and each subsequent component has the highest variance possible under the constraint of being orthogonal to the preceding ones. The methodology involves standardizing the dataset, calculating the covariance matrix, and then extracting the eigenvalues and eigenvectors of this matrix, which, in turn, dictate the magnitude and direction of the new space, respectively. By projecting the original data along these new axes, PCA provides a means to reduce the dimensionality of the dataset. Supplementary Figure 1A illustrates the distribution of the samples in the space defined by the PCs, with each point representing a single sample's location within this novel coordinate system. Here, the colors indicate the value of each PL property, offering a visual insight into how these factors correlate with the PCs.
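The PCA steps described in this excerpt (standardize, build the covariance matrix, extract eigenpairs, project) can be sketched in a few lines of dependency-free Python. This is an illustrative sketch only: it swaps in power iteration with deflation for a full eigen-solver, assumes no feature column is constant, and all function names are hypothetical rather than taken from the paper.

```python
import math


def standardize(rows):
    """Scale each column to zero mean and unit sample standard deviation."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def covariance(z):
    """Sample covariance matrix of already-centred data."""
    n, d = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigenpair(c, iters=1000):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix,
    found by power iteration (a stand-in for a library eigen-solver)."""
    d = len(c)
    v = [1.0 / (j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left in this matrix
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(d)) for i in range(d))
    return lam, v


def pca(rows, k):
    """Return (eigenvalues, eigenvectors, scores) for the first k PCs."""
    z = standardize(rows)
    c = covariance(z)
    values, vectors = [], []
    for _ in range(k):
        lam, v = top_eigenpair(c)
        values.append(lam)
        vectors.append(v)
        # Deflate: remove the variance already explained by this PC.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(len(v))]
             for i in range(len(v))]
    # Project each standardized sample onto the principal axes.
    scores = [[sum(r[j] * v[j] for j in range(len(r))) for v in vectors]
              for r in z]
    return values, vectors, scores
```

Each eigenvalue is the variance captured along its eigenvector, so dividing an eigenvalue by the number of standardized features gives the fraction of total variance that component explains. For real datasets a library eigen-solver would be the more robust choice; power iteration is used here only to keep the sketch self-contained.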
Energy-GNoME: A Living Database of Selected Materials for Energy Applications
De Angelis, Paolo, Trezza, Giovanni, Barletta, Giulio, Asinari, Pietro, Chiavazzo, Eliodoro
Artificial Intelligence (AI) in materials science is driving significant advancements in the discovery of advanced materials for energy applications. The recent GNoME protocol identifies over 380,000 novel stable crystals. From these, we identify over 33,000 materials with potential as energy materials, forming the Energy-GNoME database. Leveraging Machine Learning (ML) and Deep Learning (DL) tools, our protocol mitigates cross-domain data bias using feature spaces to identify potential candidates for thermoelectric materials, novel battery cathodes, and novel perovskites. Classifiers with both structural and compositional features identify domains of applicability, where we expect enhanced accuracy of the regressors. Such regressors are trained to predict key materials properties such as the thermoelectric figure of merit (zT), band gap (Eg), and cathode voltage ($\Delta V_c$). This method significantly narrows the pool of potential candidates, serving as an efficient guide for experimental and computational chemistry investigations and accelerating the discovery of materials suited for electricity generation, energy storage and conversion.
Physics-based material parameters extraction from perovskite experiments via Bayesian optimization
Zhan, Hualin, Ahmad, Viqar, Mayon, Azul, Tabi, Grace, Bui, Anh Dinh, Li, Zhuofeng, Walter, Daniel, Nguyen, Hieu, Weber, Klaus, White, Thomas, Catchpole, Kylie
The ability to extract material parameters of perovskite from quantitative experimental analysis is essential for rational design of photovoltaic and optoelectronic applications. However, the difficulty of this analysis increases significantly with the complexity of the theoretical model and the number of material parameters for perovskite. Here we use Bayesian optimization to develop an analysis platform that can extract up to 8 fundamental material parameters of an organometallic perovskite semiconductor from a transient photoluminescence experiment, based on a complex full physics model that includes drift-diffusion of carriers and dynamic defect occupation. An example study of thermal degradation reveals that the carrier mobility and trap-assisted recombination coefficient are reduced noticeably, while the defect energy level remains nearly unchanged. The reduced carrier mobility can dominate the overall effect on thermal degradation of perovskite solar cells by reducing the fill factor, despite the opposite effect of the reduced trap-assisted recombination coefficient on increasing the fill factor. In future, this platform can be conveniently applied to other experiments or to combinations of experiments, accelerating materials discovery and optimization of semiconductor materials for photovoltaics and other applications.
Machine learning accelerates discovery of solar-cell perovskites
Through the generation of a dataset of accurate band gaps for perovskite materials and the use of machine learning methods, several promising halide perovskites are identified for photovoltaic applications. As we integrate solar energy into our daily lives, it has become important to find materials that efficiently convert sunlight into electricity. While silicon has dominated solar technology so far, there is also a steady turn towards materials known as perovskites due to their lower costs and simpler manufacturing processes. The challenge, however, has been to find perovskites with the right "band gap": a specific energy range that determines how efficiently a material can absorb sunlight and convert it into electricity without losing it as heat. Now, an EPFL research project led by Haiyuan Wang and Alfredo Pasquarello, with collaborators in Shanghai and in Louvain-La-Neuve, has developed a method that combines advanced computational techniques with machine learning to search for optimal perovskite materials for photovoltaic applications.
Hybrid Quantum Graph Neural Network for Molecular Property Prediction
Vitz, Michael, Mohammadbagherpoor, Hamed, Sandeep, Samarth, Vlasic, Andrew, Padbury, Richard, Pham, Anh
To accelerate the process of materials design, materials science has increasingly used data-driven techniques to extract information from collected data. Specifically, machine learning (ML) algorithms have demonstrated the ability to predict various properties of materials with a level of accuracy similar to explicit quantum mechanical calculations, but with significantly reduced run time and computational resources. Within ML, graph neural networks have emerged as an important class of algorithms, since they can accurately predict a wide range of important physical, chemical and electronic properties owing to their higher learning ability, which is based on the graph representation of material and molecular descriptors and the aggregation of information embedded within the graph. In parallel with the development of state-of-the-art classical machine learning applications, the fusion of quantum computing and machine learning has created a new paradigm in which classical machine learning models can be augmented with quantum layers that are able to encode high-dimensional data more efficiently. Leveraging the structure of existing algorithms, we developed a novel gradient-free hybrid quantum-classical convoluted graph neural network (HyQCGNN) to predict formation energies of perovskite materials. The performance of our hybrid statistical model is competitive with the results obtained purely from a classical convoluted graph neural network and from other classical machine learning algorithms, such as XGBoost. Consequently, our study suggests a new pathway to explore how quantum feature encoding and parametric quantum circuits can yield drastic improvements in complex ML algorithms such as graph neural networks.