Policy Gradient for Rectangular Robust Markov Decision Processes

Dec-26-2025, 15:41:06 GMT–Neural Information Processing Systems

Policy gradient methods have become a standard approach for training reinforcement learning agents in a scalable and efficient manner. However, they do not account for transition uncertainty, whereas learning robust policies can be computationally expensive. In this paper, we introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs). We provide a closed-form expression for the worst occupation measure. Incidentally, we find that the worst kernel is a rank-one perturbation of the nominal kernel.
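To make the closing remark of the abstract concrete, here is a minimal numerical sketch of what "a rank-one perturbation of the nominal kernel" means. It is not the paper's closed-form expression, which the abstract does not give: the vectors `b` and `k`, the helper `rank_one_perturbation`, and the random nominal kernel are all hypothetical, chosen only so that the perturbed matrix remains a valid transition kernel.

```python
import numpy as np

def rank_one_perturbation(P, b, k):
    """Hypothetical helper: subtract the rank-one matrix outer(b, k) from P."""
    return P - np.outer(b, k)

rng = np.random.default_rng(0)
n_states = 4

# Nominal kernel (assumed for illustration): row-stochastic with entries >= 1/8.
P = rng.random((n_states, n_states)) + 1.0
P /= P.sum(axis=1, keepdims=True)

# Assumed perturbation: b scales the shift per state, k is a zero-sum direction,
# so every row of the perturbed kernel still sums to one.
b = np.array([0.10, 0.05, 0.20, 0.00])
k = np.array([0.50, -0.50, 0.25, -0.25])

P_perturbed = rank_one_perturbation(P, b, k)

rows_sum_to_one = np.allclose(P_perturbed.sum(axis=1), 1.0)
entries_nonnegative = bool((P_perturbed >= 0).all())
difference_rank = np.linalg.matrix_rank(P_perturbed - P, tol=1e-10)

print(rows_sum_to_one, entries_nonnegative, difference_rank)
```

Because `k` sums to zero and the products `b[i] * k[j]` are smaller than the smallest nominal entry, the perturbed matrix is still a transition kernel, and it differs from the nominal one by a matrix of rank one.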

name change, policy gradient, rectangular robust markov decision process


