Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model

Mortensen, Oliver; Talebi, Mohammad Sadegh

Jun-3-2025–arXiv.org Machine Learning

In this paper we analyze the sample complexity of learning the optimal state-action value function $Q^*$ and an optimal policy $π^*$ in a discounted Markov decision process (MDP) where the agent has recursive entropic risk preferences with risk parameter $β\neq 0$ and where a generative model of the MDP is available. We provide and analyze a simple model-based approach, which we call model-based risk-sensitive $Q$-value iteration (MB-RS-QVI), and which yields $(ε,δ)$-PAC bounds on $\|Q^*-Q_k\|$ and $\|V^*-V^{π_k}\|$, where $Q_k$ is the output of MB-RS-QVI after $k$ iterations and $π_k$ is the greedy policy with respect to $Q_k$. Both PAC bounds depend exponentially on the effective horizon $\frac{1}{1-γ}$, and the strength of this dependence grows with the learner's risk sensitivity $|β|$. We also provide two lower bounds which show that exponential dependence on $|β|\frac{1}{1-γ}$ is unavoidable in both cases. The lower bounds reveal that both PAC bounds are tight in $ε$ and $δ$, that the PAC bound on learning $Q^*$ is tight in the number of actions $A$, and that the PAC bound on policy learning is nearly tight in $A$.
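As a rough illustration of the kind of procedure the abstract describes, the following is a minimal sketch and not the authors' implementation. It assumes the recursive entropic Bellman backup takes the commonly used form $Q(s,a) = r(s,a) + \frac{γ}{β}\log \sum_{s'} \hat{P}(s'|s,a)\, e^{β \max_{a'} Q(s',a')}$, where $\hat{P}$ is an empirical model built from a fixed number of generative-model samples per state-action pair. The function names, the `sample_next_state` callback, and the known reward table are all hypothetical choices made for this example.

```python
import math


def entropic_risk(values, probs, beta):
    """Return (1/beta) * log E[exp(beta * X)], computed with log-sum-exp."""
    support = [(beta * v, p) for v, p in zip(values, probs) if p > 0]
    m = max(x for x, _ in support)  # shift to avoid overflow in exp
    s = sum(p * math.exp(x - m) for x, p in support)
    return (m + math.log(s)) / beta


def estimate_model(sample_next_state, n_states, n_actions, n_samples):
    """Empirical transition model from n_samples generative-model calls per (s, a)."""
    p_hat = []
    for s in range(n_states):
        row = []
        for a in range(n_actions):
            counts = [0] * n_states
            for _ in range(n_samples):
                counts[sample_next_state(s, a)] += 1
            row.append([c / n_samples for c in counts])
        p_hat.append(row)
    return p_hat


def risk_sensitive_qvi(p_hat, reward, gamma, beta, n_iters):
    """Iterate the assumed entropic Bellman backup on the empirical model."""
    n_states, n_actions = len(p_hat), len(p_hat[0])
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(n_iters):
        v = [max(row) for row in q]  # greedy state values of the current iterate
        q = [[reward[s][a] + gamma * entropic_risk(v, p_hat[s][a], beta)
              for a in range(n_actions)]
             for s in range(n_states)]
    return q


def greedy_policy(q):
    """Greedy policy with respect to a Q-table."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]
```

Under this convention, and with rewards rather than costs, $β<0$ weights bad outcomes more heavily (risk-averse) and $β>0$ weights good outcomes more heavily (risk-seeking); the paper's own sign convention may differ.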

machine learning, natural language, reinforcement learning


Country:
- Asia > Singapore (0.04)
- Europe
  - United Kingdom > England
    - Greater London > London (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)

Genre:
- Research Report (0.63)

Industry:
- Health & Medicine (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Generation (0.61)
  - Representation & Reasoning > Optimization (0.46)
  - Machine Learning
    - Reinforcement Learning (0.69)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.49)
