AExGym: Benchmarks and Environments for Adaptive Experimentation
Wang, Jimmy, Che, Ethan, Jiang, Daniel R., Namkoong, Hongseok
arXiv.org Artificial Intelligence
Innovations across science and industry are evaluated using randomized trials (i.e., A/B tests). While simple and robust, such static designs are inefficient or infeasible for testing many hypotheses. Adaptive designs can greatly improve statistical power in theory, but they have seen limited adoption due to their fragility in practice. We present a benchmark for adaptive experimentation based on real-world datasets, highlighting prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity. Our benchmark aims to spur methodological development that puts practical performance (e.g., robustness) as a central concern, rather than mathematical guarantees on contrived instances. We release an open-source library, AExGym, which is designed with modularity and extensibility in mind to allow experimentation practitioners to develop and benchmark custom environments and algorithms.
Aug-8-2024
- Country:
- North America
  - United States
    - Pennsylvania (0.05)
    - California (0.04)
    - North Carolina > Wake County > Raleigh (0.04)
    - New York > New York County > New York City (0.04)
    - New Jersey > Mercer County > Princeton (0.04)
    - Georgia > Fulton County > Atlanta (0.14)
  - Canada > British Columbia
- Europe
  - United Kingdom > England > Cambridgeshire > Cambridge (0.04)
  - Portugal > Braga > Braga (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Genre:
- Research Report
  - Experimental Study (1.00)
  - Strength High (0.88)
- Industry:
- Health & Medicine (1.00)
- Information Technology (0.93)
- Government (0.68)
- Education > Educational Setting (0.46)
- Technology: