Thermodynamic Prediction Enabled by Automatic Dataset Building and Machine Learning
Liu, Juejing, Anderson, Haydn, Waxman, Noah I., Kovalev, Vsevolod, Fisher, Byron, Li, Elizabeth, Guo, Xiaofeng
–arXiv.org Artificial Intelligence
New discoveries in chemistry and materials science, with an ever-expanding volume of requisite knowledge and experimental workload, provide unique opportunities for machine learning (ML) to take critical roles in accelerating research efficiency. Here, we demonstrate (1) the use of large language models (LLMs) for automated literature reviews, and (2) the training of an ML model to predict chemical knowledge (thermodynamic parameters). Our LLM-based literature review tool (LMExt) successfully extracted chemical and other information into a machine-readable structure, including stability constants for metal cation-ligand interactions, thermodynamic properties, and broader data types (medical research papers and financial reports), effectively overcoming the challenges inherent in each domain. Using the autonomously acquired thermodynamic data, an ML model was trained with the CatBoost algorithm to accurately predict thermodynamic parameters (e.g., enthalpy of formation) of minerals. This work highlights the transformative potential of integrated ML approaches to reshape chemistry and materials science research.
Keywords: Thermodynamics, Machine Learning, Large Language Model, Data Mining, Database
Introduction
Chemical thermodynamics is fundamental for understanding chemical reactions, proposing novel methods to control these reactions, and predicting chemical equilibria and reactions for new materials. Although scientific breakthroughs occur regularly, contributing to these advances becomes progressively more complex. A typical research project necessitates a comprehensive literature review that covers the current state of the field and identifies knowledge gaps. Subsequently, rigorous experimentation and modeling are performed to fill such gaps or check hypothesis-driven predictions. Both steps are essential and not unique to chemical research, but they are inherently mentally intensive and time-consuming.
Jul-11-2025
- Country:
- Asia > Japan (0.04)
- Europe > Portugal
- North America
- Canada > British Columbia
- Vancouver (0.04)
- United States > Washington
- Whitman County > Pullman (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Health & Medicine (0.88)
- Technology: