Non-Stationary Restless Multi-Armed Bandits with Provable Guarantee

Hung, Yu-Heng, Hsieh, Ping-Chun, Wang, Kai

Aug-15-2025–arXiv.org Artificial Intelligence

Online restless multi-armed bandits (RMABs) typically assume that each arm follows a stationary Markov Decision Process (MDP) with fixed state transitions and rewards. However, in real-world applications such as healthcare and recommendation systems, these assumptions often break down because the dynamics are non-stationary, which poses significant challenges for traditional RMAB algorithms. In this work, we consider $N$-armed RMABs with non-stationary transitions constrained by a bounded variation budget $B$. Our proposed algorithm integrates sliding-window reinforcement learning (RL) with an upper confidence bound (UCB) mechanism to simultaneously learn the transition dynamics and their variations. We further establish that the proposed algorithm achieves an $\widetilde{\mathcal{O}}(N^2 B^{\frac{1}{4}} T^{\frac{3}{4}})$ regret bound by leveraging a relaxed definition of regret, providing the first foundational theoretical framework for non-stationary RMAB problems.
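The combination of a sliding window with a UCB-style confidence term can be illustrated with a minimal sketch for a single arm. This is not the paper's algorithm: the class name `SlidingWindowTransitionEstimator`, the window length, the failure probability `delta`, and the Hoeffding-style radius are all assumptions chosen for illustration. The sketch only shows how discarding old samples lets the estimate track drifting transitions while the radius shrinks as more recent samples accumulate.

```python
import math
from collections import deque


class SlidingWindowTransitionEstimator:
    """Hypothetical helper: estimates one arm's transition probabilities
    from only the most recent `window` observed transitions."""

    def __init__(self, n_states, window):
        self.n_states = n_states
        # deque with maxlen drops the oldest sample once the window is full
        self.samples = deque(maxlen=window)

    def update(self, state, action, next_state):
        """Record one observed transition (state, action) -> next_state."""
        self.samples.append((state, action, next_state))

    def estimate(self, state, action, delta=0.05):
        """Return (p_hat, radius) for the given state-action pair.

        p_hat is the empirical next-state distribution inside the window.
        radius is a Hoeffding-style confidence width (an assumption here,
        not the bound used in the paper).
        """
        counts = [0] * self.n_states
        for s, a, s_next in self.samples:
            if s == state and a == action:
                counts[s_next] += 1
        n = sum(counts)
        if n == 0:
            # No recent data: uniform guess with the widest possible radius
            return [1.0 / self.n_states] * self.n_states, 1.0
        p_hat = [c / n for c in counts]
        radius = min(1.0, math.sqrt(math.log(2.0 / delta) / (2.0 * n)))
        return p_hat, radius


def optimistic_probability(p_hat, radius, target_state):
    """Upper confidence bound on the probability of reaching target_state."""
    return min(1.0, p_hat[target_state] + radius)


# Usage: the arm's behavior changes midway; the window forgets the old regime.
est = SlidingWindowTransitionEstimator(n_states=2, window=4)
for _ in range(4):
    est.update(0, 1, 0)  # old regime: action 1 in state 0 leads to state 0
for _ in range(4):
    est.update(0, 1, 1)  # new regime: the same pair now leads to state 1
p_hat, radius = est.estimate(0, 1)
ucb_good = optimistic_probability(p_hat, radius, target_state=1)
```

A shorter window reacts faster to change but gives a wider radius, which is the trade-off that a variation budget such as $B$ is meant to govern.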

Keywords: data mining, machine learning, reinforcement learning


Genre:
- Research Report (0.40)

Industry:
- Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:
- Information Technology
  - Data Science > Data Mining
    - Big Data (0.85)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Machine Learning
      - Reinforcement Learning (0.67)
      - Learning Graphical Models > Undirected Networks > Markov Models (0.48)
