Improved Regret Bounds for Bandit Combinatorial Optimization

Shinji Ito, Daisuke Hatano, Hanna Sumita, Kei Takemura, Takuro Fukunaga, Naonori Kakimura, Ken-ichi Kawarabayashi

Neural Information Processing Systems 

In this paper, we aim to reveal the property that makes bandit combinatorial optimization hard. Recently, Cohen et al. \citep{cohen2017tight} obtained a regret lower bound of $\Omega(\sqrt{d k^3 T / \log T})$, where $d$ is the dimension of the action set, $k$ is the maximum $\ell_1$-norm of the action vectors, and $T$ is the number of rounds. This lower bound was achieved by considering a continuous, strongly correlated distribution of losses. Our main contribution is to improve this bound by a factor of $\sqrt{\log T}$, to $\Omega(\sqrt{d k^3 T})$, which can be done by means of strongly correlated losses with \textit{binary} values. The improved bound yields better regret bounds for three specific examples of bandit combinatorial optimization: the multitask bandit, the bandit ranking, and the multiple-play bandit.
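To make the quantitative gain explicit, the improvement amounts to removing the $1/\sqrt{\log T}$ factor from the earlier lower bound; the following display is merely a restatement of the claim above, with the same symbols $d$, $k$, and $T$:
\[
\underbrace{\Omega\!\left(\sqrt{\frac{d k^3 T}{\log T}}\right)}_{\text{Cohen et al. \citep{cohen2017tight}}}
\quad\xrightarrow{\ \times \sqrt{\log T}\ }\quad
\underbrace{\Omega\!\left(\sqrt{d k^3 T}\right)}_{\text{this paper}}
\]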