Review for NeurIPS paper: Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits
Neural Information Processing Systems
I must first admit that judging this paper was a fairly challenging task, given the mixed opinions expressed by the reviewers and my own impressions after scrutinizing the manuscript in detail. The reviewers largely agree that the paper deserves credit for tackling the challenging, relevant, and (relatively) scarcely studied topic of restless bandit learning. I believe the main value of the paper lies in its introduction of a birth-death Markov chain structure for the arms of a restless bandit, together with monotonicity and positive-correlation assumptions on rewards and transitions. These are not unnatural assumptions, as evidenced by the modeling literature on scheduling over wireless channels and on queueing systems, and they appear to greatly reduce the computational complexity of part of the learning process. On the other hand, the reviewers are not fully convinced of the significance of the proposed algorithm and of the regret bound proven in the paper, given that the analysis is carried out for a highly structured class of Markov decision processes.
Jan-26-2025, 10:06:14 GMT