Best Arm Identification in Generalized Linear Bandits
Kazerouni, Abbas; Wein, Lawrence M.
The multi-armed bandit problem is a prototypical model for optimizing the tradeoff between exploration and exploitation. We consider a pure-exploration version of the bandit problem known as the best-arm identification problem, where the goal is to minimize the number of arm pulls required to select an arm that is, with sufficiently high probability, sufficiently close to the best arm. We assume that each arm has an observable vector of covariates or features, and there is an unknown vector of parameters (of the same dimension as the vector of features) that is common across arms. Whereas in a linear bandit the mean reward of an arm is the linear predictor (i.e., the inner product of the parameter vector and the feature vector), in our generalized linear model the mean reward is related to the linear predictor via a link function, which allows for mean rewards that are nonlinear in the linear predictor, as well as binary or integer rewards (via, e.g., logistic or
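To make the distinction between the linear and generalized linear reward models concrete, here is a minimal Python sketch. It is an illustration only, not the authors' algorithm: the parameter vector `theta`, the arm features in `arms`, the tolerance `eps`, and the function names are all hypothetical, and the logistic sigmoid is assumed as the inverse link. The sketch computes each arm's mean reward under both models and lists the arms whose mean is within `eps` of the best arm's mean. A real best-arm identification procedure would not know `theta` and would have to estimate it from arm pulls.

```python
import math

def linear_predictor(theta, x):
    """Inner product of the parameter vector and an arm's feature vector."""
    return sum(t * xi for t, xi in zip(theta, x))

def logistic_mean_reward(theta, x):
    """GLM mean reward, assuming a logistic (sigmoid) inverse link.

    Suitable for binary rewards: the result lies in (0, 1).
    """
    return 1.0 / (1.0 + math.exp(-linear_predictor(theta, x)))

# Hypothetical shared parameter vector and per-arm feature vectors.
theta = [0.5, -1.0]
arms = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 0.0]}

# Mean reward of each arm under the linear model and under the logistic GLM.
linear_means = {name: linear_predictor(theta, x) for name, x in arms.items()}
glm_means = {name: logistic_mean_reward(theta, x) for name, x in arms.items()}

# Best arm, and the arms that are "sufficiently close" to it (within eps).
best_arm = max(glm_means, key=glm_means.get)
eps = 0.15  # hypothetical tolerance
eps_close = [name for name, m in glm_means.items()
             if glm_means[best_arm] - m <= eps]

print(best_arm, eps_close)
```

Because the sigmoid is monotone increasing, the ranking of arms is the same under both models in this sketch, but the gaps between mean rewards, which determine which arms count as sufficiently close, are not.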
May-20-2019