Best Arm Identification in Generalized Linear Bandits

May-20-2019–arXiv.org Machine Learning

The multi-armed bandit problem is a prototypical model for optimizing the tradeoff between exploration and exploitation. We consider a pure-exploration version of the bandit problem known as the best-arm identification problem, where the goal is to minimize the number of arm pulls required to select an arm that is, with sufficiently high probability, sufficiently close to the best arm. We assume that each arm has an observable vector of covariates or features, and that there is an unknown parameter vector (of the same dimension as the feature vector) common to all arms. Whereas in a linear bandit the mean reward of an arm is the linear predictor (i.e., the inner product of the parameter vector and the feature vector), in our generalized linear model the mean reward is related to the linear predictor via a link function, which allows for mean rewards that are nonlinear in the linear predictor, as well as binary or integer rewards (via, e.g., logistic or
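
As a rough illustration of the distinction drawn in the abstract, the following minimal sketch compares the mean reward of each arm under a linear model and under a logistic link, and lists the arms that are within a tolerance of the best arm. The parameter vector, the feature vectors, the tolerance `epsilon`, and all function names are hypothetical values chosen for the example; they are not taken from the paper, and no arm-pulling strategy is shown because the excerpt does not describe one.

```python
import math


def linear_predictor(theta, x):
    """Inner product of the parameter vector and an arm's feature vector."""
    return sum(t * xi for t, xi in zip(theta, x))


def logistic_mean(theta, x):
    """Mean reward when the linear predictor passes through a logistic link."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(theta, x)))


def epsilon_optimal_arms(means, epsilon):
    """Indices of arms whose mean reward is within epsilon of the best mean."""
    best = max(means)
    return [i for i, m in enumerate(means) if best - m <= epsilon]


# Hypothetical parameter vector, shared by all arms (unknown in practice).
theta = [0.5, -1.0, 2.0]

# Hypothetical feature vectors, one per arm.
arms = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]

linear_means = [linear_predictor(theta, x) for x in arms]
logistic_means = [logistic_mean(theta, x) for x in arms]

# The logistic link is increasing, so the best arm is the same in both models.
best_linear = linear_means.index(max(linear_means))
best_logistic = logistic_means.index(max(logistic_means))

# The set of "sufficiently close" arms can differ, because the link
# compresses gaps between large linear predictors.
close_linear = epsilon_optimal_arms(linear_means, 0.1)
close_logistic = epsilon_optimal_arms(logistic_means, 0.1)
```

In this toy setup the last two arms differ by 0.5 in the linear predictor but by well under 0.1 in logistic mean reward, so the same tolerance admits one arm under the linear model and two under the logistic one.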

Country:
- North America > United States > California > Santa Clara County (0.14)

Genre:
- Research Report (1.00)

Industry:
- Energy > Oil & Gas
  - Upstream (0.34)
- Health & Medicine > Pharmaceuticals & Biotechnology (0.94)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning
    - Statistical Learning (0.89)
  - Data Science > Data Mining
    - Big Data (1.00)
