Experimenting, Fast and Slow: Bayesian Optimization of Long-term Outcomes with Online Experiments
Feng, Qing, Daulton, Samuel, Letham, Benjamin, Balandat, Maximilian, Bakshy, Eytan
Online experiments in internet systems, also known as A/B tests, are used for a wide range of system tuning problems, such as optimizing recommender system ranking policies and learning adaptive streaming controllers. Decision-makers generally wish to optimize for the long-term treatment effects of system changes, which often requires running experiments for a long time because short-term measurements can be misleading due to non-stationarity in treatment effects over time. Sequential experimentation strategies, which typically involve several iterations, can therefore be prohibitively long in such cases. We describe a novel approach that combines fast experiments (e.g., biased experiments run only for a few hours or days) and/or offline proxies (e.g., off-policy evaluation) with long-running, slow experiments to perform sequential Bayesian optimization over large action spaces in a short amount of time.
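The general idea of pairing a cheap, biased signal with a few expensive, trustworthy measurements can be illustrated with a toy sketch. This is not the authors' method: it swaps in a much simpler technique (a Gaussian process fitted to the gap between slow and fast measurements, followed by an upper-confidence-bound choice). The functions `long_term` and `short_term`, the kernel settings, and the one-dimensional action space are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

def rbf(a, b, lengthscale=0.3, variance=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-4):
    """Zero-mean GP regression: posterior mean and variance at x_test."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, np.maximum(var, 0.0)

# Hypothetical outcomes (assumptions, not from the paper):
def long_term(x):
    return -(x - 0.6) ** 2   # what we care about; slow to measure

def short_term(x):
    return -(x - 0.3) ** 2   # fast but biased proxy

candidates = np.linspace(0.0, 1.0, 101)

# Fast signal: cheap enough to evaluate at every candidate.
fast_obs = short_term(candidates)

# Slow signal: only a handful of long-running experiments.
x_slow = np.array([0.1, 0.5, 0.9])
residual = long_term(x_slow) - short_term(x_slow)

# Model the slow-minus-fast gap, then correct the fast signal with it.
mean_r, var_r = gp_posterior(x_slow, residual, candidates)
pred_long = fast_obs + mean_r

# Upper confidence bound: favor high predictions and uncertain regions.
ucb = pred_long + 2.0 * np.sqrt(var_r)
next_x = candidates[np.argmax(ucb)]
print("next slow experiment at x =", next_x)
```

In this sketch the fast proxy alone would point to the wrong optimum (x = 0.3), while the corrected prediction matches the slow measurements wherever they exist and the uncertainty term steers the next slow experiment.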
Jul-1-2025
- Country:
  - Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
  - North America > Canada > Ontario > Toronto (0.05)
  - North America > United States > California > San Mateo County > Menlo Park (0.04)
  - North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
  - North America > United States > New York > New York County > New York City (0.04)
- Genre:
  - Research Report > Experimental Study (1.00)
  - Research Report > Strength High (0.68)
- Industry:
  - Information Technology (0.46)
- Technology: