Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

In this paper, the authors extend the "resource allocation with semi-bandit feedback", proposed by Lattimore et al. [2014], to the multi-resource case. The paper has provided two regret bounds, one for the worst case (Theorem 2) and the other for the "resource-laden" case (Theorem 7). The authors also provide a new result on the "weighted least squares estimation", which is independently interesting. The paper is well-written and very interesting, the analysis in this paper is also rigorous. The extension to the multi-resource case is non-trivial, and the new result on the "weighted least squares estimation" is very interesting and might be reused by researchers in the field of bandit/RL in the future. Thus, I think this paper meets the acceptance threshold.