Are we Forgetting about Compositional Optimisers in Bayesian Optimisation?
Grosnit, Antoine; Cowen-Rivers, Alexander I.; Tutunov, Rasul; Griffiths, Ryan-Rhys; Wang, Jun; Bou-Ammar, Haitham
Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.
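The abstract does not specify an implementation of the acquisition-maximisation subroutine. The sketch below illustrates what that subroutine does in its conventional, non-compositional form. It makes several assumptions: a zero-mean Gaussian process with a fixed RBF kernel, the closed-form expected improvement (EI) acquisition, and multi-start projected gradient ascent with finite-difference gradients. All function names (`gp_posterior`, `expected_improvement`, `maximise_acquisition`) and the toy objective are hypothetical. This is a baseline for orientation only. It is not the compositional optimisers or the compositional acquisition forms that the authors derive.

```python
import math

import numpy as np


def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf_kernel(x_train, x)  # shape (n_train, n_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 1e-12, None)  # prior k(x, x) = 1
    return mean, np.sqrt(var)


def expected_improvement(x, x_train, y_train):
    """Closed-form expected improvement (maximisation) at 1-D query points x."""
    mean, std = gp_posterior(x_train, y_train, np.atleast_1d(x))
    improvement = mean - y_train.max()
    z = improvement / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return improvement * cdf + std * pdf


def maximise_acquisition(acq, bounds=(0.0, 1.0), n_starts=8, n_steps=100,
                         lr=0.05, seed=0):
    """Multi-start projected gradient ascent on a scalar 1-D acquisition.

    Gradients are central finite differences; a step is accepted only if it
    increases the acquisition value, otherwise the step size is halved.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    best_x, best_val = None, -np.inf
    for x in rng.uniform(lo, hi, size=n_starts):
        x, val, step = float(x), acq(x), lr
        for _ in range(n_steps):
            h = 1e-5
            grad = (acq(x + h) - acq(x - h)) / (2.0 * h)
            candidate = float(np.clip(x + step * grad, lo, hi))  # stay in bounds
            cand_val = acq(candidate)
            if cand_val > val:
                x, val = candidate, cand_val
            else:
                step *= 0.5
        if val > best_val:  # keep the best local optimum across restarts
            best_x, best_val = x, val
    return best_x, best_val


# Toy usage: four observations of an assumed objective sin(6x) on [0, 1].
x_train = np.array([0.1, 0.4, 0.7, 0.9])
y_train = np.sin(6.0 * x_train)


def acq(x):
    return float(expected_improvement(x, x_train, y_train)[0])


x_next, ei_next = maximise_acquisition(acq)
print(f"next query x = {x_next:.3f}, EI = {ei_next:.4f}")
```

The sketch restarts a local ascent from several random points because, as the abstract notes, acquisition functions tend to be non-convex. A single local run can stall in a poor optimum, which is why the choice of maximiser can affect overall Bayesian optimisation performance.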
Dec-17-2020
- Country:
  - North America
    - United States > Massachusetts (0.14)
    - Canada (0.14)
  - Europe > United Kingdom
    - England > Cambridgeshire > Cambridge (0.14)
  - Asia > Middle East
    - Qatar (0.14)
- Genre:
  - Research Report > New Finding (0.46)
  - Instructional Material > Course Syllabus & Notes (0.45)
- Industry:
  - Health & Medicine (0.96)
  - Energy > Oil & Gas (0.67)
- Technology:
  - Information Technology
    - Mathematics of Computing (0.67)
    - Data Science > Data Mining (0.67)
    - Artificial Intelligence
      - Natural Language (0.67)
      - Representation & Reasoning
        - Optimization (1.00)
        - Agents (1.00)
        - Uncertainty > Bayesian Inference (0.45)
      - Machine Learning
        - Evolutionary Systems (1.00)
        - Statistical Learning > Gradient Descent (0.45)
        - Neural Networks > Deep Learning (0.45)