Value-at-Risk Constrained Policy Optimization

Feb-2-2026–arXiv.org Machine Learning

We introduce the Value-at-Risk Constrained Policy Optimization algorithm (VaR-CPO), a sample efficient and conservative method designed to optimize Value-at-Risk (VaR) constraints directly. Empirically, we demonstrate that VaR-CPO is capable of safe exploration, achieving zero constraint violations during training in feasible environments, a critical property that baseline methods fail to uphold. To overcome the inherent non-differentiability of the VaR constraint, we employ the one-sided Chebyshev inequality to obtain a tractable surrogate based on the first two moments of the cost return. Additionally, by extending the trust-region framework of the Constrained Policy Optimization (CPO) method, we provide rigorous worst-case bounds for both policy improvement and constraint violation during the training process.

constraint, machine learning, reinforcement learning, (18 more...)

arXiv.org Machine Learning

Feb-2-2026

arXiv.org PDF

Add feedback

Country:
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.14)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.83)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Reinforcement Learning (0.70)
  - Representation & Reasoning
    - Constraint-Based Reasoning (0.70)
    - Optimization (0.48)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found