Deep reinforcement learning for weakly coupled MDP's with continuous actions
Robledo, Francisco, Ayesta, Urtzi, Avrachenkov, Konstantin
arXiv.org Artificial Intelligence
This paper introduces the Lagrange Policy for Continuous Actions (LPCA), a reinforcement learning algorithm specifically designed for weakly coupled MDP problems with continuous action spaces. LPCA addresses the challenge of resource constraints dependent on continuous actions by introducing a Lagrange relaxation of the weakly coupled MDP problem within a neural network framework for Q-value computation. This approach effectively decouples the MDP, enabling efficient policy learning in resource-constrained environments. We present two variations of LPCA: LPCA-DE, which uses differential evolution for global optimization, and LPCA-Greedy, which incrementally and greedily selects actions based on Q-value gradients. Comparative analysis against other state-of-the-art techniques across various settings highlights LPCA's robustness and efficiency in managing resource allocation while maximizing rewards.
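The greedy variant described above can be illustrated with a minimal toy sketch. The code below is an assumption-laden stand-in, not the authors' implementation: `make_q` replaces the paper's learned neural Q-networks with simple concave functions, and `greedy_allocate` mimics the high-level LPCA-Greedy idea of repeatedly granting a small action increment to the subproblem with the largest marginal Q-value gradient until the shared resource budget is exhausted.

```python
import math

def make_q(weight):
    # Hypothetical stand-in for a learned per-subproblem Q-value
    # function; concave and increasing in the continuous action a.
    return lambda a: weight * math.log1p(a)

def greedy_allocate(q_funcs, budget, step=0.01, eps=1e-6):
    """Greedy incremental allocation sketch: at each iteration, give a
    small action increment to the subproblem whose Q-value has the
    largest (finite-difference) gradient, until the resource budget
    is spent. Details beyond this high-level idea are assumptions."""
    n = len(q_funcs)
    actions = [0.0] * n
    spent = 0.0
    while spent + step <= budget + 1e-12:
        # Approximate each subproblem's marginal gain by forward difference.
        grads = [(q(actions[i] + eps) - q(actions[i])) / eps
                 for i, q in enumerate(q_funcs)]
        best = max(range(n), key=lambda i: grads[i])
        actions[best] += step
        spent += step
    return actions

# Three subproblems with increasing marginal value of the action.
qs = [make_q(w) for w in (1.0, 2.0, 3.0)]
alloc = greedy_allocate(qs, budget=3.0)
```

Under these toy Q-functions the allocation concentrates on the subproblems with the steepest Q-value gradients, while the total action never exceeds the budget, which is the coupling constraint the Lagrange relaxation handles in the paper.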
Jun-12-2024