OSOM: A Simultaneously Optimal Algorithm for Multi-Armed and Linear Contextual Bandits
Chatterji, Niladri S., Muthukumar, Vidya, Bartlett, Peter L.
We consider the stochastic linear (multi-armed) contextual bandit problem with the possibility of hidden simple multi-armed bandit structure, in which the rewards are independent of the contextual information. Algorithms that are designed solely for one of the regimes are known to be sub-optimal for the alternate regime. We design a single computationally efficient algorithm that simultaneously obtains problem-dependent optimal regret rates in the simple multi-armed bandit regime and minimax optimal regret rates in the linear contextual bandit regime, without knowing a priori which of the two models generates the rewards. These results are proved under the condition of stochasticity of contextual information over multiple rounds. Our results should be viewed as a step towards principled data-dependent policy class selection for contextual bandits.
Jun-11-2019
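
The abstract states the model-selection guarantee without describing the mechanism. As a rough illustration of the underlying idea only (not the paper's OSOM algorithm), the Python sketch below starts with a context-free UCB learner and switches to a LinUCB-style linear contextual learner once the observed rewards are better explained by a linear model of the contexts. The reward model, the running-residual test statistic, and the switching threshold here are placeholder assumptions chosen for the demo, not the statistical test or regret bounds from the paper.

```python
# Illustrative sketch of switch-on-evidence model selection between a simple
# multi-armed bandit learner (UCB1) and a linear contextual learner (LinUCB-
# style ridge regression). All thresholds below are placeholders.
import numpy as np

rng = np.random.default_rng(0)

K, d, T = 5, 3, 5000            # arms, context dimension, horizon
theta = rng.normal(size=d)      # hidden linear parameter (contextual regime)

counts = np.zeros(K)            # simple-model state: pull counts per arm
means = np.zeros(K)             # simple-model state: empirical means per arm

lam = 1.0
A = lam * np.eye(d)             # ridge-regression Gram matrix (shared model)
b = np.zeros(d)

switched = False
resid_simple = 0.0              # running squared error of the simple model
resid_linear = 0.0              # running squared error of the linear model

for t in range(1, T + 1):
    contexts = rng.normal(size=(K, d))      # one context vector per arm

    if not switched:
        # UCB1 arm choice, ignoring contexts entirely
        ucb = means + np.sqrt(2 * np.log(t) / np.maximum(counts, 1))
        ucb[counts == 0] = np.inf
        arm = int(np.argmax(ucb))
    else:
        # LinUCB-style optimistic choice (unit bonus coefficient, a placeholder)
        theta_hat = np.linalg.solve(A, b)
        A_inv = np.linalg.inv(A)
        bonus = np.array([np.sqrt(x @ A_inv @ x) for x in contexts])
        arm = int(np.argmax(contexts @ theta_hat + bonus))

    x = contexts[arm]
    reward = x @ theta + 0.1 * rng.normal()  # demo rewards follow the linear model

    # update simple-model statistics
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]

    # update linear-model statistics
    A += np.outer(x, x)
    b += reward * x

    # crude model-selection test: compare running one-step prediction errors
    theta_hat = np.linalg.solve(A, b)
    resid_simple += (reward - means[arm]) ** 2
    resid_linear += (reward - x @ theta_hat) ** 2
    if not switched and t > 50 and resid_simple > resid_linear + np.sqrt(t):
        switched = True          # commit to the contextual model from here on

print("switched to linear model:", switched)
```

Because the demo draws rewards from the linear model, the context-free residual grows linearly while the ridge model's stays small, so the sketch eventually switches; if rewards were generated by the simple model instead, the test would (with high probability) never fire and the UCB learner would run to the horizon.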