Adaptive Experimental Design for Policy Learning
Kato, Masahiro, Okumura, Kyohei, Ishihara, Takuya, Kitagawa, Toru
This study designs an adaptive experiment for decision-making given multiple treatment arms, such as arms in slot machines, diverse therapies, and distinct unemployment assistance programs. The primary objective is to identify, at the end of the experiment, the best treatment arm for individuals given their covariates, often referred to as contexts. Our problem is termed contextual fixed-budget best arm identification (BAI), an instance of the stochastic multi-armed bandit (MAB) problem (Thompson, 1933; Lai and Robbins, 1985). Our setting generalizes the fixed-budget BAI problem, which aims to minimize the expected simple regret at the end of a fixed number of rounds of an adaptive experiment, called the budget or sample size (Bubeck, Munos, and Stoltz, 2009, 2011; Audibert, Bubeck, and Munos, 2010). In our setting, at each round of the adaptive experiment, a decision-maker assigns one of the treatment arms to a research subject based on past observations and on contextual information observed before the treatment assignment. At the end of the experiment, the decision-maker recommends an estimated best treatment arm for future experimental subjects.
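The protocol described above (observe a context, assign an arm, observe an outcome, then recommend an arm per context once the budget is spent) can be illustrated with a minimal simulation. This is only a sketch under stated assumptions, not the design proposed in the study: contexts are discrete and drawn uniformly, outcomes are Gaussian with unit variance, the assignment rule is a uniform-random placeholder rather than an adaptive one, and the names `run_experiment`, `simple_regret`, and `true_means` are hypothetical.

```python
import random


def run_experiment(true_means, budget, seed=0):
    """Simulate a contextual fixed-budget experiment (illustrative only).

    true_means[x][a] is the assumed expected outcome of arm a in context x.
    Returns one recommended arm per context.
    """
    rng = random.Random(seed)
    n_contexts = len(true_means)
    n_arms = len(true_means[0])
    sums = [[0.0] * n_arms for _ in range(n_contexts)]
    counts = [[0] * n_arms for _ in range(n_contexts)]

    for _ in range(budget):
        x = rng.randrange(n_contexts)         # context observed before assignment
        # Placeholder rule: uniform assignment. An adaptive design would
        # choose the arm using x together with the past sums and counts.
        a = rng.randrange(n_arms)
        y = rng.gauss(true_means[x][a], 1.0)  # noisy outcome of the assigned arm
        sums[x][a] += y
        counts[x][a] += 1

    # Recommend, for each context, the arm with the highest sample mean.
    policy = []
    for x in range(n_contexts):
        means = [
            sums[x][a] / counts[x][a] if counts[x][a] else float("-inf")
            for a in range(n_arms)
        ]
        policy.append(max(range(n_arms), key=means.__getitem__))
    return policy


def simple_regret(true_means, policy):
    """Gap between the best arm and the recommended arm, averaged over
    contexts (assumed equally likely)."""
    gaps = [max(row) - row[policy[x]] for x, row in enumerate(true_means)]
    return sum(gaps) / len(gaps)


true_means = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical: the best arm differs by context
policy = run_experiment(true_means, budget=2000)
regret = simple_regret(true_means, policy)
```

Here `simple_regret` evaluates a single realized recommendation; the expected simple regret mentioned above would additionally average this quantity over the randomness of the experiment, for example by repeating the simulation across many seeds.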
Jan-9-2024
- Country:
  - Asia > Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
    - Tōhoku (0.04)
  - Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
  - North America > United States
    - California > Los Angeles County > Los Angeles (0.04)
    - Illinois > Cook County > Chicago (0.04)
    - New York (0.04)
- Genre:
- Research Report > Experimental Study (1.00)
- Industry:
- Education (0.45)