Planning and Learning in Average Risk-aware MDPs

Jun-22-2026, 20:17:06 GMT–Neural Information Processing Systems

For continuing tasks, average-cost Markov decision processes have well-documented value and can be solved using efficient algorithms. However, this framework explicitly assumes that the agent is risk-neutral. In this work, we extend risk-neutral algorithms to accommodate the more general class of dynamic risk measures. Specifically, we propose a relative value iteration (RVI) algorithm for planning and design two model-free Q-learning algorithms: a generic algorithm based on the multi-level Monte Carlo (MLMC) method, and an off-policy algorithm dedicated to utility-based shortfall risk measures. Both the RVI and the MLMC-based Q-learning algorithms are proven to converge to optimality. Numerical experiments validate our analysis, empirically confirm the convergence of the off-policy algorithm, and demonstrate that our approach identifies policies finely tuned to the intricate risk-awareness of the agents they serve.
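The abstract does not spell out the planning operator, so the following is only a minimal sketch, assuming a small finite unichain MDP with known costs and transition probabilities. It runs the textbook relative value iteration scheme with the expectation over next states replaced by a one-step utility-based shortfall risk (UBSR) map, SR(X) = inf{m : E[l(X - m)] <= level}, for a cost X and an increasing loss l. Because UBSR is translation-invariant, subtracting the value at a reference state each sweep does not change the greedy policy. The names `ubsr`, `risk_aware_rvi`, `neutral`, `entropic`, the bisection solver, the entropic parameter `beta`, and the two-state toy MDP are all hypothetical illustrations, not the authors' algorithm, and the sketch does not check the conditions under which RVI converges.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical illustration only; not the paper's RVI algorithm.
RiskMap = Callable[[Sequence[float], Sequence[float]], float]


def ubsr(outcomes: Sequence[float], probs: Sequence[float],
         loss: Callable[[float], float], level: float, tol: float = 1e-10) -> float:
    """Utility-based shortfall risk of a discrete cost X (larger = worse):
    SR(X) = inf{m : E[loss(X - m)] <= level}, found by bisection.
    Assumes `loss` is increasing, so E[loss(X - m)] is non-increasing in m."""
    def excess(m: float) -> float:
        return sum(p * loss(x - m) for x, p in zip(outcomes, probs)) - level

    lo, hi = min(outcomes), max(outcomes)
    step = max(hi - lo, 1.0)
    for _ in range(100):  # widen the bracket until excess(lo) >= 0 >= excess(hi)
        if excess(lo) >= 0 and excess(hi) <= 0:
            break
        if excess(lo) < 0:
            lo -= step
        if excess(hi) > 0:
            hi += step
        step *= 2
    else:
        raise ValueError("could not bracket the shortfall root")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid  # constraint violated at mid: the infimum lies above it
        else:
            hi = mid
    return 0.5 * (lo + hi)


def risk_aware_rvi(costs: List[List[float]], trans: List[List[List[float]]],
                   risk: RiskMap, ref: int = 0, iters: int = 10_000,
                   tol: float = 1e-9) -> Tuple[float, List[float], List[int]]:
    """Relative value iteration with the next-state expectation replaced by a
    one-step risk map: (T h)(s) = min_a [c(s, a) + risk(h, P(. | s, a))]."""
    n = len(costs)
    h = [0.0] * n
    gain = 0.0

    def q(s: int, a: int, hv: List[float]) -> float:
        return costs[s][a] + risk(hv, trans[s][a])

    for _ in range(iters):
        th = [min(q(s, a, h) for a in range(len(costs[s]))) for s in range(n)]
        gain = th[ref]                   # gain estimate read off the reference state
        new_h = [v - gain for v in th]   # renormalize so that h(ref) = 0
        done = max(abs(x - y) for x, y in zip(new_h, h)) < tol
        h = new_h
        if done:
            break
    # Greedy policy with respect to the final relative values.
    policy = [min(range(len(costs[s])), key=lambda a: q(s, a, h)) for s in range(n)]
    return gain, h, policy


def neutral(v: Sequence[float], p: Sequence[float]) -> float:
    return ubsr(v, p, lambda x: x, 0.0)  # l(x) = x, level 0: plain expectation


def entropic(v: Sequence[float], p: Sequence[float], beta: float = 1.0) -> float:
    # l(x) = exp(beta * x), level 1: (1 / beta) * log E[exp(beta * X)]
    return ubsr(v, p, lambda x: math.exp(beta * x), 1.0)


# Hypothetical toy MDP. State 0: action 0 is "safe" (cost 1, stay in 0),
# action 1 is "risky" (cost 0, then state 0 or 1 with probability 1/2 each).
# State 1: a single action with cost 2.5 that returns to state 0.
COSTS = [[1.0, 0.0], [2.5]]
TRANS = [[[1.0, 0.0], [0.5, 0.5]], [[1.0, 0.0]]]

if __name__ == "__main__":
    print(risk_aware_rvi(COSTS, TRANS, neutral))
    print(risk_aware_rvi(COSTS, TRANS, entropic))
```

In this toy example, the risk-neutral map picks the risky action in state 0 (long-run average cost 5/6), while the entropic map with `beta = 1` switches to the safe action (average cost 1). This is the kind of risk-dependent policy change the abstract refers to, shown here under the assumptions above.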

artificial intelligence, machine learning, reinforcement learning
