Language Models Enable Data-Augmented Synthesis Planning for Inorganic Materials

Prein, Thorben, Pan, Elton, Jehkul, Janik, Weinmann, Steffen, Olivetti, Elsa A., Rupp, Jennifer L. M.

arXiv.org Machine Learning 

Inorganic synthesis planning currently relies primarily on heuristic approaches or machine-learning models trained on limited datasets, which constrains its generality. We demonstrate that language models, without task-specific fine-tuning, can recall synthesis conditions. Off-the-shelf models, such as GPT-4.1, Gemini 2.0 Flash and Llama 4 Maverick, achieve a Top-1 precursor-prediction accuracy of up to 53.8 % and a Top-5 performance of 66.1 % on a held-out set of 1,000 reactions. They also predict calcination and sintering temperatures with mean absolute errors below 126 °C, matching specialized regression methods. Ensembling these language models further enhances predictive accuracy and reduces inference cost per prediction by up to 70 %. We subsequently employ language models to generate 28,548 synthetic reaction recipes, which we combine with literature-mined examples to pretrain a transformer-based model, SyntMTE. After fine-tuning on the combined dataset, SyntMTE reduces mean-absolute error in sintering temperature prediction to 73 °C and in calcination temperature to 98 °C. This strategy improves models by up to 8.7 % compared with baselines trained exclusively on experimental data. Finally, in a case study on Li7La3Zr2O12 solid-state electrolytes, we demonstrate that SyntMTE reproduces the experimentally observed dopant-dependent sintering trends. Our hybrid workflow enables scalable, data-efficient inorganic synthesis planning.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found