Policy Optimization for Robust Average Cost MDPs
Neural Information Processing Systems
This paper studies first-order policy optimization for robust average cost Markov decision processes (MDPs), focusing on ergodic Markov chains. In a robust average cost MDP, the goal is to optimize the worst-case average cost over an uncertainty set of transition kernels. We first derive a sub-gradient of the robust average cost. Building on this sub-gradient, we then propose a robust policy mirror descent approach.
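As a rough illustration of the approach described in the abstract, the sketch below runs a robust policy mirror descent loop on a small tabular average-cost MDP. It assumes a finite uncertainty set of transition kernels, each inducing an ergodic chain, and updates the policy using the differential Q-values at a worst-case kernel (a Danskin-type sub-gradient of the max over a finite set). The problem sizes, kernels, costs, step size, and helper names (`evaluate`, `worst_case`, `mirror_descent_step`) are hypothetical placeholders, not the paper's algorithm or experiments.

```python
# Minimal sketch: robust policy mirror descent on a tabular average-cost MDP
# with a finite uncertainty set of transition kernels (all quantities are
# hypothetical placeholders for illustration only).
import numpy as np

rng = np.random.default_rng(0)
S, A = 4, 3                                  # state / action counts (hypothetical)
COST = rng.uniform(0.0, 1.0, size=(S, A))    # stage cost c(s, a)

def random_kernel():
    """Sample a transition kernel P[s, a, s'] with strictly positive entries."""
    P = rng.uniform(0.1, 1.0, size=(S, A, S))
    return P / P.sum(axis=2, keepdims=True)

KERNELS = [random_kernel() for _ in range(5)]   # finite uncertainty set (stand-in)

def evaluate(policy, P):
    """Average cost rho and differential Q-values of `policy` under kernel P."""
    P_pi = np.einsum("sa,sap->sp", policy, P)    # induced state transition matrix
    c_pi = (policy * COST).sum(axis=1)           # per-state expected cost
    # Poisson equation (I - P_pi) h + rho * 1 = c_pi, with h[0] pinned to 0:
    # replace column 0 of (I - P_pi) by the all-ones vector and solve for
    # x = (rho, h[1], ..., h[S-1]).
    M = np.eye(S) - P_pi
    M[:, 0] = 1.0
    x = np.linalg.solve(M, c_pi)
    rho, h = x[0], x.copy()
    h[0] = 0.0
    q = COST - rho + np.einsum("sap,p->sa", P, h)   # differential Q-function
    return rho, q

def worst_case(policy):
    """Worst-case average cost over the uncertainty set and the Q-values there.

    For a finite set, a sub-gradient of the robust average cost can be taken at
    a worst-case kernel, so the update below only needs that kernel's Q-values.
    """
    return max((evaluate(policy, P) for P in KERNELS), key=lambda t: t[0])

def mirror_descent_step(policy, q, eta):
    """Entropic (KL) mirror descent: pi(.|s) <- pi(.|s) * exp(-eta * q(s, .))."""
    new_policy = policy * np.exp(-eta * q)
    return new_policy / new_policy.sum(axis=1, keepdims=True)

policy = np.full((S, A), 1.0 / A)            # start from the uniform policy
for _ in range(200):
    _, q_worst = worst_case(policy)
    policy = mirror_descent_step(policy, q_worst, eta=0.5)

print("worst-case average cost:", worst_case(policy)[0])
```

The multiplicative-exponential update is the closed form of a KL-regularized mirror descent step over each state's action simplex; swapping the finite uncertainty set for a structured one (e.g., a ball around a nominal kernel) would only change how the worst-case kernel is found.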