MARBLE: Multi-Armed Restless Bandits in Latent Markovian Environment
Mohsen Amiri, Konstantin Avrachenkov, Ibtihal El Mimouni, Sindri Magnússon
arXiv.org Artificial Intelligence
Restless Multi-Armed Bandits (RMABs) are powerful models for decision-making under uncertainty, yet classical formulations typically assume fixed dynamics, an assumption often violated in nonstationary environments. We introduce MARBLE (Multi-Armed Restless Bandits in a Latent Markovian Environment), which augments RMABs with a latent Markov state that induces nonstationary behavior. In MARBLE, each arm evolves according to a latent environment state that switches over time, making policy learning substantially more challenging. We further introduce the Markov-Averaged Indexability (MAI) criterion as a relaxed indexability assumption and prove that, despite unobserved regime switches, under the MAI criterion, synchronous Q-learning with Whittle Indices (QWI) converges almost surely to the optimal Q-function and the corresponding Whittle indices. We validate MARBLE on a calibrated simulator-embedded (digital twin) recommender system, where QWI consistently adapts to a shifting latent state and converges to an optimal policy, empirically corroborating our theoretical findings.
Nov-13-2025
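The two-timescale structure behind Q-learning with Whittle indices (QWI), as referenced in the abstract, can be illustrated with a minimal sketch for a single restless arm. Everything below is an illustrative assumption rather than the paper's model: the two-state arm, its transition matrices and rewards, the discount factor, and the step-size schedules are made up, and the latent environment state is omitted for brevity. Q-values are updated on a fast timescale while the Whittle index estimate for each reference state is nudged, on a slower timescale, toward the passivity subsidy that makes the arm indifferent between acting and staying passive in that state.

```python
import numpy as np

rng = np.random.default_rng(0)

S = 2                                   # states of the (hypothetical) arm
# P[a, s, s']: transition probability under action a (0 = passive, 1 = active)
P = np.array([[[0.9, 0.1],
               [0.4, 0.6]],
              [[0.5, 0.5],
               [0.1, 0.9]]])
r = np.array([[0.0, 0.2],               # r[s, a]: reward in state s, action a
              [0.0, 1.0]])
gamma = 0.9                             # discount factor (assumed)

# One Q-table per reference state s_hat: Q[s_hat] is learned under the
# passivity subsidy lam[s_hat]. The Whittle index of s_hat is the subsidy
# at which both actions are equally attractive in s_hat.
Q = np.zeros((S, S, 2))
lam = np.zeros(S)

for n in range(1, 20001):
    alpha = n ** -0.6                   # fast timescale: Q-value updates
    beta = 1.0 / n                      # slow timescale: index updates
    for s_hat in range(S):
        for s in range(S):
            for a in (0, 1):            # synchronous sweep over all (s, a)
                s_next = rng.choice(S, p=P[a, s])
                target = (r[s, a] + (1 - a) * lam[s_hat]
                          + gamma * Q[s_hat, s_next].max())
                Q[s_hat, s, a] += alpha * (target - Q[s_hat, s, a])
        # Drift the subsidy toward the indifference point at s_hat.
        lam[s_hat] += beta * (Q[s_hat, s_hat, 1] - Q[s_hat, s_hat, 0])

print(np.round(lam, 3))                 # estimated Whittle indices
```

In the paper's setting these updates would additionally contend with the latent Markovian regime switching the arm's dynamics; the Markov-Averaged Indexability assumption is what allows convergence to be established despite those unobserved switches.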