neurwin
- North America > United States > Texas > Brazos County > College Station (0.14)
- Asia > Taiwan (0.04)
- Oceania > New Zealand (0.04)
- (3 more...)
- Transportation (0.48)
- Government (0.46)
- Health & Medicine (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL
Whittle index policy is a powerful tool to obtain asymptotically optimal solutions for the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains a difficult problem for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices. We show that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems.
- North America > United States > Texas > Brazos County > College Station (0.14)
- Asia > Taiwan (0.04)
- Oceania > New Zealand (0.04)
- (3 more...)
- Transportation (0.48)
- Government (0.46)
- Health & Medicine (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.92)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
ContextWIN: Whittle Index Based Mixture-of-Experts Neural Model For Restless Bandits Via Deep RL
This study introduces ContextWIN, a novel architecture that extends the Neural Whittle Index Network (NeurWIN) model to address Restless Multi-Armed Bandit (RMAB) problems with a context-aware approach. By integrating a mixture of experts within a reinforcement learning framework, ContextWIN adeptly utilizes contextual information to inform decision-making in dynamic environments, particularly in recommendation systems. A key innovation is the model's ability to assign context-specific weights to a subset of NeurWIN networks, thus enhancing the efficiency and accuracy of the Whittle index computation for each arm. The paper presents a thorough exploration of ContextWIN, from its conceptual foundation to its implementation and potential applications. We delve into the complexities of RMABs and the significance of incorporating context, highlighting how ContextWIN effectively harnesses these elements. The convergence of both the NeurWIN and ContextWIN models is rigorously proven, ensuring theoretical robustness. This work lays the groundwork for future advancements in applying contextual information to complex decision-making scenarios, recognizing the need for comprehensive dataset exploration and environment development for full potential realization.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.48)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL
Whittle index policy is a powerful tool to obtain asymptotically optimal solutions for the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains a difficult problem for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices. We show that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems. We demonstrate the utility of NeurWIN by evaluating its performance for three recently studied restless bandit problems.Our experiment results show that the performance of NeurWIN is significantly better than other RL algorithms.
Tabular and Deep Learning for the Whittle Index
Relaño, Francisco Robledo, Borkar, Vivek, Ayesta, Urtzi, Avrachenkov, Konstantin
The Whittle index policy is a heuristic that has shown remarkably good performance (with guaranteed asymptotic optimality) when applied to the class of problems known as Restless Multi-Armed Bandit Problems (RMABPs). In this paper we present QWI and QWINN, two reinforcement learning algorithms, respectively tabular and deep, to learn the Whittle index for the total discounted criterion. The key feature is the use of two time-scales, a faster one to update the state-action Q -values, and a relatively slower one to update the Whittle indices. In our main theoretical result we show that QWI, which is a tabular implementation, converges to the real Whittle indices. We then present QWINN, an adaptation of QWI algorithm using neural networks to compute the Q -values on the faster time-scale, which is able to extrapolate information from one state to another and scales naturally to large state-space environments. For QWINN, we show that all local minima of the Bellman error are locally stable equilibria, which is the first result of its kind for DQN-based schemes. Numerical computations show that QWI and QWINN converge faster than the standard Q -learning algorithm, neural-network based approximate Q-learning and other state of the art algorithms.
- Asia > India (0.14)
- Europe > France (0.04)
- Europe > Spain > Basque Country (0.04)
- (2 more...)
NeurWIN: Neural Whittle Index Network For Restless Bandits Via Deep RL
Nakhleh, Khaled, Ganji, Santosh, Hsieh, Ping-Chun, Hou, I-Hong, Shakkottai, Srinivas
Whittle index policy is a powerful tool to obtain asymptotically optimal solutions for the notoriously intractable problem of restless bandits. However, finding the Whittle indices remains a difficult problem for many practical restless bandits with convoluted transition kernels. This paper proposes NeurWIN, a neural Whittle index network that seeks to learn the Whittle indices for any restless bandits by leveraging mathematical properties of the Whittle indices. We show that a neural network that produces the Whittle index is also one that produces the optimal control for a set of Markov decision problems. This property motivates using deep reinforcement learning for the training of NeurWIN. We demonstrate the utility of NeurWIN by evaluating its performance for three recently studied restless bandit problems. Our experiment results show that the performance of NeurWIN is significantly better than other RL algorithms.
- North America > United States > Texas > Brazos County > College Station (0.14)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Asia > Taiwan (0.04)
- (3 more...)
- Transportation > Ground > Road (0.68)
- Transportation > Electric Vehicle (0.68)
- Automobiles & Trucks (0.68)