AExGym: Benchmarks and Environments for Adaptive Experimentation
Jimmy Wang, Ethan Che, Daniel R. Jiang, Hongseok Namkoong
arXiv.org Artificial Intelligence
Innovations across science and industry are evaluated using randomized trials (i.e., A/B tests). While simple and robust, such static designs are inefficient or infeasible for testing many hypotheses. Adaptive designs can greatly improve statistical power in theory, but they have seen limited adoption due to their fragility in practice. We present a benchmark for adaptive experimentation based on real-world datasets, highlighting prominent practical challenges to operationalizing adaptivity: non-stationarity, batched/delayed feedback, multiple outcomes and objectives, and external validity. Our benchmark aims to spur methodological development that puts practical performance (e.g., robustness) as a central concern, rather than mathematical guarantees on contrived instances. We release an open-source library, AExGym, which is designed with modularity and extensibility in mind to allow experimentation practitioners to develop and benchmark custom environments and algorithms.
Aug-8-2024