Policy Gradient for Rectangular Robust Markov Decision Processes
Neural Information Processing Systems
However, policy gradient methods do not account for transition uncertainty, and learning robust policies can be computationally expensive.
Feb-16-2026, 19:26:39 GMT