Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits
Siwei Wang
Neural Information Processing Systems
We study the online restless bandit problem, in which the state of each arm evolves according to a Markov chain and the reward of pulling an arm depends on both the arm pulled and the current state of its Markov chain. In this paper, we propose Restless-UCB, a learning policy that follows the explore-then-commit framework. In Restless-UCB, we present a novel method to construct offline instances that requires only O(N) time complexity (where N is the number of arms), which is exponentially better than the complexity of existing learning policies.
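To make the explore-then-commit structure concrete, below is a minimal Python sketch. Everything in it (the names restless_ucb_sketch and make_arm, the round-robin exploration, and the UCB-style commit index) is an illustrative assumption, not the paper's actual construction, which builds offline instances from the estimated chain statistics in O(N) time.

```python
import numpy as np

def restless_ucb_sketch(arms, horizon, explore_frac=0.3):
    """Hedged sketch of an explore-then-commit policy for restless bandits.

    `arms` is a list of callables; arms[i]() advances arm i's Markov chain
    one step and returns the observed reward. This is a generic sketch,
    not the Restless-UCB construction from the paper.
    """
    n = len(arms)
    explore_len = int(explore_frac * horizon)

    # Exploration phase: pull arms round-robin to collect empirical reward
    # estimates (a stand-in for estimating each chain's statistics).
    counts = np.zeros(n)
    sums = np.zeros(n)
    total = 0.0
    for t in range(explore_len):
        i = t % n
        r = arms[i]()
        counts[i] += 1
        sums[i] += r
        total += r

    # Commit phase: form an optimistic (UCB-style) index from the offline
    # estimates and commit to the best arm for the remaining horizon.
    means = sums / np.maximum(counts, 1)
    bonus = np.sqrt(2.0 * np.log(max(horizon, 2)) / np.maximum(counts, 1))
    best = int(np.argmax(means + bonus))
    for _ in range(explore_len, horizon):
        total += arms[best]()
    return total, best

# Illustrative usage: each arm is a two-state Markov chain ("bad"/"good")
# that flips state with probability 1 - p_stay and pays 1 in the good state.
def make_arm(p_stay, rng):
    state = [0]
    def pull():
        if rng.random() > p_stay:
            state[0] = 1 - state[0]
        return float(state[0])
    return pull

rng = np.random.default_rng(0)
arms = [make_arm(p, rng) for p in (0.6, 0.8, 0.95)]
total_reward, committed_arm = restless_ucb_sketch(arms, horizon=10_000)
```

Note that a fixed exploration budget followed by a single commitment is what keeps the per-round decision cost low; the paper's contribution is making the offline-instance construction itself linear in the number of arms.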