Improved Regret Bounds for Bandit Combinatorial Optimization

Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

Neural Information Processing Systems 

In this paper, we aim to reveal the property that makes bandit combinatorial optimization hard. Recently, Cohen et al. \citep{cohen2017tight} obtained a regret lower bound of $\Omega(\sqrt{d k^3 T / \log T})$, where $d$ is the dimension of the action set, $k$ is the maximum $\ell_1$-norm of the action vectors, and $T$ is the number of rounds. This lower bound was achieved by considering a continuous, strongly correlated distribution of losses. Our main contribution is to improve this bound by a factor of $\sqrt{\log T}$, to $\Omega(\sqrt{d k^3 T})$, which can be done by means of strongly correlated losses with \textit{binary} values. The improved bound yields better regret bounds for three specific examples of bandit combinatorial optimization: the multitask bandit, the bandit ranking, and the multiple-play bandit.
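To make the quantitative gain explicit, the improvement amounts to removing the $1/\sqrt{\log T}$ factor from the earlier lower bound; the following display is merely a restatement of the claim above, with the same symbols $d$, $k$, and $T$:
\[
\underbrace{\Omega\!\left(\sqrt{\frac{d k^3 T}{\log T}}\right)}_{\text{Cohen et al. \citep{cohen2017tight}}}
\quad\xrightarrow{\ \times \sqrt{\log T}\ }\quad
\underbrace{\Omega\!\left(\sqrt{d k^3 T}\right)}_{\text{this paper}}
\]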