Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits

Oct-10-2024, 17:29:46 GMT–Neural Information Processing Systems

We study the online restless bandit problem, where the state of each arm evolves according to a Markov chain, and the reward of pulling an arm depends on both the pulled arm and the current state of the corresponding Markov chain. In this paper, we propose Restless-UCB, a learning policy that follows the explore-then-commit framework. In Restless-UCB, we present a novel method to construct offline instances, which only requires O(N) time-complexity ( N is the number of arms) and is exponentially better than the complexity of existing learning policy. We also prove that Restless-UCB achieves a regret upper bound of \tilde{O}((N M 3)T {2\over 3}), where M is the Markov chain state space size and T is the time horizon. Compared to existing algorithms, our result eliminates the exponential factor (in M,N) in the regret upper bound, due to a novel exploitation of the sparsity in transitions in general restless bandit problems.

efficient and low-complexity algorithm, online restless bandit, restless-ucb, (2 more...)

Neural Information Processing Systems

Oct-10-2024, 17:29:46 GMT

Conferences Web Page

Add feedback

Genre:
- Research Report > New Finding (0.46)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)