Stochastic Rising Bandits

Metelli, Alberto Maria, Trovò, Francesco, Pirola, Matteo, Restelli, Marcello

Dec-7-2022–arXiv.org Artificial Intelligence

This paper studies stochastic Multi-Armed Bandits (MABs), i.e., sequential selection techniques that learn online using only the feedback from the chosen option (a.k.a. arm). We study a particular case of rested and restless bandits in which the arms' expected payoff is monotonically non-decreasing. This characteristic allows designing specifically crafted algorithms that exploit the regularity of the payoffs to provide tight regret bounds. We design an algorithm for the rested case (R-ed-UCB) and one for the restless case (R-less-UCB), providing a regret bound that depends on the properties of the instance and, under certain circumstances, is of order $\widetilde{\mathcal{O}}(T^{\frac{2}{3}})$. Using several synthetically generated tasks and an online model selection problem on a real-world dataset, we empirically compare our algorithms with state-of-the-art methods for non-stationary MABs and illustrate their effectiveness.
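To make the rested rising setting concrete, here is a minimal sketch of an environment in which each arm's expected payoff grows with the number of times that arm has been pulled. It is not the paper's R-ed-UCB or R-less-UCB: the learner is plain UCB1, a stationary baseline, and the saturating payoff curve, the Bernoulli rewards, and the names `rested_rising_mean` and `run_ucb1` are all assumptions made for illustration only.

```python
import math
import random


def rested_rising_mean(n_pulls, ceiling, rate):
    """Expected payoff of an arm after it has been pulled n_pulls times.

    Hypothetical non-decreasing curve (an assumption, not taken from the
    paper): ceiling * (1 - exp(-rate * n_pulls)).
    """
    return ceiling * (1.0 - math.exp(-rate * n_pulls))


def run_ucb1(arms, horizon, seed=0):
    """Play plain UCB1 (a stationary baseline, NOT R-ed-UCB) on rested rising arms.

    arms: list of (ceiling, rate) pairs with ceiling <= 1.
    Returns the number of pulls of each arm after `horizon` rounds.
    """
    rng = random.Random(seed)
    k = len(arms)
    pulls = [0] * k
    reward_sums = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # UCB1 index: empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: reward_sums[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        ceiling, rate = arms[arm]
        # Rested setting: the mean depends only on this arm's own pull count
        mean = rested_rising_mean(pulls[arm], ceiling, rate)
        reward = 1.0 if rng.random() < mean else 0.0  # Bernoulli reward
        pulls[arm] += 1
        reward_sums[arm] += reward
    return pulls


if __name__ == "__main__":
    # Arm 0 rises slowly to a high ceiling; arm 1 rises quickly to a low one.
    print(run_ucb1([(0.9, 0.01), (0.5, 0.5)], horizon=2000))
```

In this toy instance the arm that pays best in the long run looks worst early on. A baseline that averages all past rewards can therefore be slow to recognise it, which is the kind of regularity the abstract says the proposed algorithms exploit.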

artificial intelligence, data mining, machine learning

Country:
- Europe (0.45)
- North America > United States (0.27)

Genre:
- Research Report > New Finding (0.67)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning > Statistical Learning (0.87)
    - Representation & Reasoning (1.00)
  - Data Science > Data Mining
    - Big Data (0.66)