Large Language Models to Enhance Bayesian Optimization

Tennison Liu, Nicolás Astorga, Nabeel Seedat, Mihaela van der Schaar

arXiv.org Artificial Intelligence 

Bayesian optimization (BO) is a powerful approach for optimizing complex and expensive-to-evaluate black-box functions. Its importance is underscored in many applications, notably including hyperparameter tuning, but its efficacy depends on efficiently balancing exploration and exploitation. While there has been substantial progress in BO methods, striking this balance remains a delicate process. We present LLAMBO, an approach that integrates the capabilities of large language models (LLMs) into BO. At a high level, we frame the BO problem in natural language, enabling LLMs to iteratively propose promising solutions conditioned on historical evaluations. More specifically, we explore how the contextual understanding, few-shot learning proficiency, and domain knowledge of LLMs can enhance various components of model-based BO. Our findings illustrate that LLAMBO is effective at zero-shot warmstarting, and improves surrogate modeling and candidate sampling, especially in the early stages of search when observations are sparse. Our approach operates entirely in context and requires no LLM finetuning. Additionally, it is modular by design, allowing individual components to be integrated into existing BO frameworks, or to function cohesively as an end-to-end method.

Expensive black-box functions are common in many disciplines and applications, including robotics (11, 35), experimental design (25), drug discovery (32), interface design (8) and, in machine learning, hyperparameter tuning (6, 34, 49). Bayesian optimization (BO) is a widely adopted and efficient model-based approach for globally optimizing these functions (31, 33). BO's effectiveness lies in its ability to operate on a limited set of observations without direct access to the objective function or its gradients. Broadly, BO uses observed data to construct a surrogate model as an approximation to the objective function, and then iteratively generates potentially good points, from which the acquisition function selects the one with the highest utility.
This chosen point undergoes evaluation, and the cycle continues. For BO, the name of the game is efficient search, but the efficiency of this search largely depends on the quality of the surrogate model and its capacity to quickly identify high-potential regions (16).
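The loop described above (fit a surrogate, propose candidates, score them with an acquisition function, evaluate the best one, repeat) can be sketched with a standard Gaussian-process surrogate and expected improvement. This is a minimal illustration of generic model-based BO, not the paper's method; the toy objective, search bounds, and candidate count are placeholders.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)

def objective(x):
    # Hypothetical stand-in for an expensive black-box evaluation.
    return np.sin(3 * x) + 0.5 * x

# A few initial observations (maximization over [0, 2]).
X = rng.uniform(0, 2, size=(3, 1))
y = objective(X).ravel()

for _ in range(10):
    # 1) Fit the surrogate model to observed data.
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    # 2) Generate candidate points.
    cand = rng.uniform(0, 2, size=(256, 1))
    mu, sigma = gp.predict(cand, return_std=True)
    # 3) Acquisition: expected improvement over the current best.
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
    # 4) Evaluate the highest-utility candidate and record it.
    x_next = cand[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, objective(x_next).ravel())

print(f"best observed value: {y.max():.3f}")
```

The quality of the surrogate's posterior mean and uncertainty estimates directly determines how well the acquisition step trades off exploration against exploitation, which is exactly the component LLAMBO targets when observations are sparse.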