CORe: Capitalizing On Rewards in Bandit Exploration
Nan Wang, Branislav Kveton, Maryam Karimzadehgan
We propose a bandit algorithm that explores purely by randomizing its past observations. In particular, the sufficient optimism in the mean reward estimates is achieved by exploiting the variance in the past observed rewards. We name the algorithm Capitalizing On Rewards (CORe). The algorithm is general and can be easily applied to different bandit settings. The main benefit of CORe is that its exploration is fully data-dependent.

A multi-armed bandit can be considered as a special case of linear bandits, where the feature vector of each arm is a one-hot vector indicating the index of the arm, and the parameter vector is a vector of corresponding mean rewards. Arguably, the most popular and well-studied exploration strategies for solving bandit problems are Thompson sampling (TS) [Thompson, 1933, Agrawal and Goyal, 2013] and Optimism in the Face of Uncertainty (OFU) [Auer et al., 2002]. TS maintains a posterior distribution over each arm's
March 7, 2021
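To illustrate the general idea of exploring purely by randomizing past observations, the following is a minimal sketch of a history-randomization bandit, in the spirit of the abstract. It is an illustrative stand-in, not the exact CORe algorithm: the perturbation here is a simple bootstrap resample of each arm's observed rewards, and Gaussian rewards with unit variance are an assumption for the simulation.

```python
import numpy as np

def randomized_history_bandit(means, horizon, seed=None):
    """Sketch: explore by randomizing each arm's past observations.

    At every round, each arm's history of observed rewards is resampled
    with replacement (a bootstrap perturbation), and the arm with the
    highest perturbed mean is pulled. The variance of the perturbed mean
    scales with the empirical variance of that arm's rewards, so the
    exploration is fully data-dependent. Illustrative only, not CORe.
    """
    rng = np.random.default_rng(seed)
    k = len(means)
    histories = [[] for _ in range(k)]
    pulls = np.zeros(k, dtype=int)

    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once to initialize its history
        else:
            # Perturbed mean: bootstrap-resample each arm's past rewards.
            scores = [rng.choice(h, size=len(h), replace=True).mean()
                      for h in histories]
            arm = int(np.argmax(scores))
        reward = rng.normal(means[arm], 1.0)  # assumed Gaussian rewards
        histories[arm].append(reward)
        pulls[arm] += 1
    return pulls

pulls = randomized_history_bandit([0.2, 0.5, 1.0], horizon=500, seed=0)
```

Because the randomness comes only from resampling observed rewards, an arm with a tightly concentrated history receives little perturbation, while an arm with noisy or scarce data remains optimistic enough to be retried.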