Reviews: Improved Regret Bounds for Bandit Combinatorial Optimization

Neural Information Processing Systems 

In particular, the gap in the analysis is due to my mis-reading the formula, and the response convinced me. However, the paper overall looks incremental, so it is a paper nice to have, but its acceptance seems to be depending on the quality of other papers.] The paper studies the bandit combinatorial optimization problem and improve the lower bound of the problem from \Omega(\sqrt{dk 3T/log T}) in the prior work [8] to \Omega(\sqrt{dk 3T}), removing a factor of 1/\sqrt{\log T} . This makes the regret dependency on T and k, d tight up to a logarithmic factor. The analysis is built upon prior work [2,8], with the major innovation being a design of new distribution of loss vectors (given in Eq.(8)) that leads to a better lower bound.