Policy Gradient Methods for Distortion Risk Measures

Jan-30-2023–arXiv.org Artificial Intelligence

We propose policy gradient algorithms which learn risk-sensitive policies in a reinforcement learning (RL) framework. Our proposed algorithms maximize the distortion risk measure (DRM) of the cumulative reward in an episodic Markov decision process in on-policy as well as off-policy RL settings. We derive a variant of the policy gradient theorem that caters to the DRM objective, and use this theorem in conjunction with a likelihood ratio-based gradient estimation scheme. We derive non-asymptotic bounds that establish the convergence of our proposed algorithms to an approximate stationary point of the DRM objective.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

Jan-30-2023

arXiv.org PDF

Add feedback

Country:
- North America > Mexico (0.04)
- Asia > India (0.04)

Genre:
- Research Report (0.40)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.93)
  - Machine Learning
    - Reinforcement Learning (0.48)
    - Learning Graphical Models > Undirected Networks
      - Markov Models (0.34)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found