Policy Optimization for Robust Average Cost MDPs

May-28-2025, 17:17:30 GMT–Neural Information Processing Systems

This paper studies first-order policy optimization for robust average cost Markov decision processes (MDPs), focusing on ergodic Markov chains. In a robust average cost MDP, the goal is to optimize the worst-case average cost over an uncertainty set of transition kernels. We first derive a sub-gradient of the robust average cost and then, based on this sub-gradient, propose a robust policy mirror descent approach.
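
To make the objective concrete, the following is a minimal sketch of how the worst-case average cost of a fixed policy could be evaluated. It is an illustration under stated assumptions, not the paper's method: the uncertainty set is taken to be a small finite list of kernels (the abstract does not specify its form), and the helper names `stationary_distribution`, `average_cost`, and `robust_average_cost` are hypothetical.

```python
import numpy as np


def stationary_distribution(P):
    """Stationary distribution d of an ergodic chain with row-stochastic matrix P."""
    n = P.shape[0]
    # Stack the balance equations d^T (P - I) = 0 with the normalization sum(d) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    d, *_ = np.linalg.lstsq(A, b, rcond=None)
    return d


def average_cost(policy, kernel, cost):
    """Long-run average cost of a stationary policy under one transition kernel.

    policy: shape (S, A), each row a distribution over actions
    kernel: shape (S, A, S), kernel[s, a, t] = probability of moving from s to t under a
    cost:   shape (S, A)
    """
    P_pi = np.einsum("sa,sat->st", policy, kernel)  # state-to-state chain induced by the policy
    c_pi = np.sum(policy * cost, axis=1)            # expected one-step cost in each state
    return float(stationary_distribution(P_pi) @ c_pi)


def robust_average_cost(policy, kernels, cost):
    """Worst case over an ASSUMED finite uncertainty set, given as a list of kernels."""
    return max(average_cost(policy, k, cost) for k in kernels)


# Toy instance (hypothetical numbers): 2 states, 2 actions, cost 1 whenever the state is 1.
policy = np.full((2, 2), 0.5)
cost = np.array([[0.0, 0.0], [1.0, 1.0]])
# Each kernel moves to state 0 with probability p regardless of the current state and action.
kernels = [np.tile(np.array([p, 1.0 - p]), (2, 2, 1)) for p in (0.5, 0.2)]
worst = robust_average_cost(policy, kernels, cost)
```

In this toy instance the stationary distribution of each kernel is simply `[p, 1 - p]`, so the adversary prefers the kernel that spends the most time in the costly state. A policy optimization method would then adjust `policy` to reduce this worst-case value.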

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States (1.00)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Government > Regional Government > North America Government > United States Government (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.69)
    - Reinforcement Learning (0.47)
  - Representation & Reasoning (1.00)