Characterizing the robustness of Bayesian adaptive experimental designs to active learning bias
Sloman, Sabina J., Oppenheimer, Daniel M., Broomell, Stephen B., Shalizi, Cosma Rohilla
Bayesian adaptive experimental design is a form of active learning, which chooses samples to maximize the information they give about uncertain parameters. Prior work has shown that other forms of active learning can suffer from active learning bias, where unrepresentative sampling leads to inconsistent parameter estimates. We show that active learning bias can also afflict Bayesian adaptive experimental design, depending on model misspecification. We analyze the case of estimating a linear model, and show that worse misspecification implies more severe active learning bias. At the same time, model classes incorporating more "noise" -- i.e., specifying higher inherent variance in observations -- suffer less from active learning bias. Finally, we demonstrate empirically that insights from the linear model can predict the presence and degree of active learning bias in nonlinear contexts, namely in a (simulated) preference learning experiment.

Statistical theory often assumes learners' access to large amounts of representative training data, drawn from the distribution which is the target of inference or prediction. Nonetheless, such access is not feasible for many applications. Training data may be scarce (e.g., learning to identify a rare medical condition; Henry, Hager, Pronovost, and Saria (2015)), difficult or expensive to obtain (e.g., requiring human coders for text; Chen, Lasko, Mei, Denny, and Xu (2015)), or time-consuming to collect (e.g., obtaining user preferences online; Cavagnaro, Gonzalez, Myung, and Pitt (2013); Golovin, Krause, and Ray (2010)). One response is to abandon random sampling for adaptive sampling methods, choosing data points in sequence to be as informative as possible.
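To make the idea of "choosing data points in sequence to be as informative as possible" concrete, the following is a minimal sketch for a Bayesian linear model. It is not the authors' implementation. It assumes a two-parameter model (intercept and slope), a standard normal prior, a known observation variance `noise_var`, and a greedy one-step-ahead selection rule; the function names, the design grid `z_grid`, and the quadratic data-generating function are all hypothetical choices made for illustration. Under these assumptions the expected information gain of a design x has the closed form 0.5 * log(1 + x' Sigma x / noise_var), where Sigma is the current posterior covariance.

```python
import numpy as np

def expected_information_gain(x, cov, noise_var):
    # Mutual information (in nats) between the parameters and the outcome at
    # design x, for a Gaussian linear model with known noise variance.
    return 0.5 * np.log1p(x @ cov @ x / noise_var)

def posterior_update(mean, cov, x, y, noise_var):
    # Conjugate Gaussian update after observing outcome y at design x.
    cov_x = cov @ x
    gain = cov_x / (noise_var + x @ cov_x)
    return mean + gain * (y - x @ mean), cov - np.outer(gain, cov_x)

def run_adaptive_design(true_fn, z_grid, noise_var, n_steps, rng):
    # Assumed model: y = b0 + b1 * z + noise, so each design is x = (1, z).
    candidates = np.column_stack([np.ones_like(z_grid), z_grid])
    mean, cov = np.zeros(2), np.eye(2)  # assumed standard normal prior
    chosen = []
    for _ in range(n_steps):
        gains = [expected_information_gain(x, cov, noise_var) for x in candidates]
        best = int(np.argmax(gains))  # greedy: most informative design
        x = candidates[best]
        # Data come from true_fn, which need not be linear (misspecification).
        y = true_fn(z_grid[best]) + rng.normal(scale=np.sqrt(noise_var))
        mean, cov = posterior_update(mean, cov, x, y, noise_var)
        chosen.append(z_grid[best])
    return mean, cov, np.array(chosen)

rng = np.random.default_rng(0)
z_grid = np.linspace(-1.0, 1.0, 21)
# Hypothetical misspecified setting: the true response is quadratic in z.
mean, cov, chosen = run_adaptive_design(
    lambda z: z + 0.5 * z**2, z_grid, 0.25, 20, rng
)
print("distinct designs chosen:", np.unique(chosen))
print("posterior mean (intercept, slope):", mean)
```

In this sketch, x' Sigma x is a convex quadratic in z, so the greedy rule only ever selects the endpoints of the grid. That is one simple way to see how an adaptive design can sample unrepresentatively: the fitted line depends only on the response at the extremes, whatever shape the true function takes in between.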
Nov-28-2022
- Country:
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- North America > United States > New Jersey > Hudson County > Hoboken (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
- North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Genre:
- Research Report > New Finding (1.00)
- Industry:
- Health & Medicine (1.00)