ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System

Han, Dong, Ai, Zhehong, Cai, Pengxiang, Lu, Shanya, Chen, Jianpeng, Ye, Zihao, Sun, Shuzhou, Gao, Ben, Ge, Lingli, Wang, Weida, Zhou, Xiangxin, Liu, Xihui, Su, Mao, Ouyang, Wanli, Bai, Lei, Zhou, Dongzhan, Xu, Tao, Li, Yuqiang, Zhang, Shufei

arXiv.org Artificial Intelligence 

Bayesian optimization (BO) is a powerful tool for scientific discovery in chemistry, yet its efficiency is often hampered by sparse experimental data and vast search spaces. Here, we introduce ChemBOMAS, a large language model (LLM)-enhanced multi-agent system that accelerates BO through synergistic data-driven and knowledge-driven strategies. First, the data-driven strategy uses an 8B-scale LLM regressor, fine-tuned on only 1% of the labeled samples, to generate pseudo-data that robustly initializes the optimization process. Second, the knowledge-driven strategy employs a hybrid Retrieval-Augmented Generation approach to guide an LLM in partitioning the search space while mitigating LLM hallucinations. An Upper Confidence Bound algorithm then identifies high-potential subspaces within this partition. Operating within the LLM-refined subspaces and supported by the LLM-generated data, BO improves in both effectiveness and efficiency. Comprehensive evaluations across multiple scientific benchmarks demonstrate that ChemBOMAS sets a new state of the art, accelerating optimization by up to 5-fold compared to baseline methods.
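To make the subspace-selection step concrete, the following is a minimal sketch of an Upper Confidence Bound rule applied to a fixed partition of the search space. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say which UCB variant is used, so the standard UCB1 score is assumed here, and the names `ucb_select`, `update`, and the exploration constant `c` are hypothetical.

```python
import math


def ucb_select(counts, means, c=2.0):
    """Pick the index of the subspace with the highest UCB1 score.

    counts[i]: number of evaluations made so far in subspace i.
    means[i]:  running mean of the observed objective in subspace i.
    c:         exploration constant (assumed value, not from the paper).
    """
    # Evaluate every subspace at least once before comparing scores.
    for i, n in enumerate(counts):
        if n == 0:
            return i
    total = sum(counts)
    scores = [
        mean + math.sqrt(c * math.log(total) / n)
        for mean, n in zip(means, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)


def update(counts, means, i, reward):
    """Fold one new observation into subspace i's running mean, in place."""
    counts[i] += 1
    means[i] += (reward - means[i]) / counts[i]
```

In a loop, one would call `ucb_select`, run BO inside the chosen subspace for one or more evaluations, and pass the observed objective value to `update`. The square-root bonus shrinks as a subspace accumulates evaluations, so rarely visited subspaces are still revisited occasionally while well-performing ones receive most of the budget.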