Policy Optimization for Robust Average Reward MDPs
Neural Information Processing Systems
This paper studies first-order policy optimization for robust average-cost Markov decision processes (MDPs), focusing on ergodic Markov chains. In the robust average-cost setting, the goal is to optimize the worst-case average cost over an uncertainty set of transition kernels. We first develop a sub-gradient of the robust average cost, and based on this sub-gradient we propose a robust policy mirror descent approach.
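The approach described above can be illustrated with a minimal numerical sketch. The code below is not the paper's algorithm: it assumes a tabular policy, a finite (two-kernel) uncertainty set, and approximates the sub-gradient of the worst-case average cost by finite differences rather than the paper's sub-gradient formula; the mirror descent step uses the standard KL (exponentiated-gradient) update. The toy kernels and costs are hypothetical.

```python
import numpy as np

def avg_cost(policy, P, c):
    """Average cost of `policy` under transition kernel P (nS x nA x nS)
    with per-step costs c (nS x nA), assuming the induced chain is ergodic."""
    Ppi = np.einsum('sa,saj->sj', policy, P)          # induced Markov chain
    w, v = np.linalg.eig(Ppi.T)                        # stationary distribution:
    mu = np.real(v[:, np.argmax(np.real(w))])          # eigenvector for eigenvalue 1
    mu = np.abs(mu) / np.abs(mu).sum()
    return float(mu @ np.einsum('sa,sa->s', policy, c))

def worst_case_cost(policy, kernels, c):
    """Robust average cost: worst case over a finite uncertainty set."""
    return max(avg_cost(policy, P, c) for P in kernels)

def robust_pmd(kernels, c, eta=0.5, iters=200, eps=1e-5):
    """Robust policy mirror descent sketch with a finite-difference sub-gradient."""
    nS, nA = c.shape
    policy = np.full((nS, nA), 1.0 / nA)               # start from the uniform policy
    for _ in range(iters):
        base = worst_case_cost(policy, kernels, c)
        g = np.zeros_like(policy)                      # approximate sub-gradient
        for s in range(nS):
            for a in range(nA):
                pert = policy.copy()
                pert[s, a] += eps
                pert[s] /= pert[s].sum()               # stay on the simplex
                g[s, a] = (worst_case_cost(pert, kernels, c) - base) / eps
        # KL mirror descent step: multiplicative update toward lower cost
        policy = policy * np.exp(-eta * g)
        policy /= policy.sum(axis=1, keepdims=True)
    return policy

# Toy 2-state, 2-action example (hypothetical numbers): action 1 is cost-free,
# and the uncertainty set contains two strictly positive (hence ergodic) kernels.
P1 = np.array([[[0.7, 0.3], [0.4, 0.6]],
               [[0.5, 0.5], [0.2, 0.8]]])
P2 = np.array([[[0.6, 0.4], [0.3, 0.7]],
               [[0.55, 0.45], [0.25, 0.75]]])
c = np.array([[1.0, 0.0],
              [1.0, 0.0]])
pol = robust_pmd([P1, P2], c)
```

On this toy problem the update concentrates the policy on the cost-free action, driving the worst-case average cost toward zero.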
May-26-2025, 18:13:49 GMT