Finite-Time Logarithmic Bayes Regret Upper Bounds

Atsidakou, Alexia, Kveton, Branislav, Katariya, Sumeet, Caramanis, Constantine, Sanghavi, Sujay

Nov-3-2023–arXiv.org Machine Learning

We derive the first finite-time logarithmic Bayes regret upper bounds for Bayesian bandits. In Gaussian bandits, we obtain $O(c_\Delta \log n)$ and $O(c_h \log^2 n)$ bounds for an upper confidence bound algorithm, where $c_h$ and $c_\Delta$ are constants depending on the prior distribution and the gaps of random bandit instances sampled from it, respectively. The latter bound asymptotically matches the lower bound of Lai (1987). Our proofs are a major technical departure from prior works, while being simple and general. To show the generality of our techniques, we apply them to linear bandits. Our results provide insights on the value of prior in the Bayesian setting, both in the objective and as a side information given to the learner. They significantly improve upon existing $\tilde{O}(\sqrt{n})$ bounds, which have become standard in the literature despite the existing lower bounds.

bandit, data mining, machine learning, (21 more...)

arXiv.org Machine Learning

Nov-3-2023

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- North America > United States
  - New York > New York County
    - New York City (0.04)
  - Texas > Travis County
    - Austin (0.04)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning (1.00)
    - Representation & Reasoning (0.93)
  - Data Science > Data Mining
    - Big Data (0.47)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found