Risk-Averse Reinforcement Learning: An Optimal Transport Perspective on Temporal Difference Learning

Feb-22-2025–arXiv.org Artificial Intelligence

-- The primary goal of reinforcement learning is to develop decision-making policies that prioritize optimal performance, frequently without considering risk or safety. In contrast, safe reinforcement learning seeks to reduce or avoid unsafe states. This letter introduces a risk-averse temporal difference algorithm that uses optimal transport theory to direct the agent toward predictable behavior . By incorporating a risk indicator, the agent learns to favor actions with predictable consequences. We evaluate the proposed algorithm in several case studies and show its effectiveness in the presence of uncertainty. The results demonstrate that our method reduces the frequency of visits to risky states while preserving performance. I. INTRODUCTION Reinforcement learning (RL) algorithms focus on maximizing performance, primarily through long-term reward optimization. However, this objective alone does not always prevent negative or high-risk outcomes.

agent, algorithm, probability, (13 more...)

arXiv.org Artificial Intelligence

Feb-22-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - New York > Monroe County > Rochester (0.04)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.14)

Genre:
- Research Report (0.84)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.95)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found