Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials

Prein, Thorben, Pan, Elton, Jehkul, Janik, Weinmann, Steffen, Olivetti, Elsa A., Rupp, Jennifer L. M.

Jun-17-2025–arXiv.org Machine Learning

Inorganic synthesis planning currently relies primarily on heuristic approaches or machine-learning models trained on limited datasets, which constrains its generality. We demonstrate that language models, without task-specific fine-tuning, can recall synthesis conditions. Off-the-shelf models, such as GPT-4.1, Gemini 2.0 Flash and Llama 4 Maverick, achieve a Top-1 precursor-prediction accuracy of up to 53.8 % and a Top-5 performance of 66.1 % on a held-out set of 1,000 reactions. They also predict calcination and sintering temperatures with mean absolute errors below 126 °C, matching specialized regression methods. Ensembling these language models further enhances predictive accuracy and reduces inference cost per prediction by up to 70 %. We subsequently employ language models to generate 28,548 synthetic reaction recipes, which we combine with literature-mined examples to pretrain a transformer-based model, SyntMTE. After fine-tuning on the combined dataset, SyntMTE reduces mean-absolute error in sintering temperature prediction to 73 °C and in calcination temperature to 98 °C. This strategy improves models by up to 8.7 % compared with baselines trained exclusively on experimental data. Finally, in a case study on Li7La3Zr2O12 solid-state electrolytes, we demonstrate that SyntMTE reproduces the experimentally observed dopant-dependent sintering trends. Our hybrid workflow enables scalable, data-efficient inorganic synthesis planning.

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

Jun-17-2025

arXiv.org PDF

Add feedback

Country:
- Africa > Togo (0.04)
- North America > United States
  - Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > Germany
  - Bavaria > Upper Bavaria > Munich (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > New Finding (0.93)

Industry:
- Energy > Energy Storage (1.00)
- Electrical Industrial Apparatus (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (1.00)
  - Machine Learning
    - Statistical Learning (1.00)
    - Neural Networks > Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found