Improved Regret Bounds for Bandit Combinatorial Optimization

Neural Information Processing Systems 

In this paper, we aim to reveal the property that makes bandit combinatorial optimization hard. Recently, Cohen et al. \citep{cohen2017tight} obtained a lower bound of $\Omega(\sqrt{d k^3 T / \log T})$ on the regret, where $d$ is the dimension of the action vectors, $k$ is the maximum $\ell_1$-norm of the action vectors, and $T$ is the number of rounds. This lower bound was achieved by considering a continuous, strongly correlated distribution of losses. Our main contribution is to improve this bound to $\Omega(\sqrt{d k^3 T})$ by removing the $\sqrt{\log T}$ factor, which we achieve by means of strongly correlated losses with \textit{binary} values. The improved bound yields better regret bounds for three specific examples of bandit combinatorial optimization: the multitask bandit, the bandit ranking, and the multiple-play bandit.
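The improvement can be summarized by the following pair of bounds (a sketch of the comparison; writing $R_T$ for the worst-case regret after $T$ rounds is a notational assumption, with $d$ and $k$ as defined above):
\begin{align*}
  R_T &= \Omega\!\left(\sqrt{\frac{d k^3 T}{\log T}}\right) && \text{(Cohen et al., via continuous strongly correlated losses),}\\
  R_T &= \Omega\!\left(\sqrt{d k^3 T}\right) && \text{(this work, via strongly correlated binary-valued losses).}
\end{align*}
Since the latter bound is larger by a $\sqrt{\log T}$ factor, it is a strictly stronger statement about the hardness of the problem.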