SPoRt -- Safe Policy Ratio: Certified Training and Deployment of Task Policies in Model-Free RL

Cloete, Jacques, Vertovec, Nikolaus, Abate, Alessandro

Jun-24-2025–arXiv.org Artificial Intelligence

To apply reinforcement learning to safety-critical applications, we ought to provide safety guarantees during both policy training and deployment. In this work, we present theoretical results that place a bound on the probability of violating a safety property for a new task-specific policy in a model-free, episodic setting. This bound, based on a maximum policy ratio computed with respect to a 'safe' base policy, can also be applied to temporally-extended properties (beyond safety) and to robust control problems. To utilize these results, we introduce SPoRt, which provides a data-driven method for computing this bound for the base policy using the scenario approach, and includes Projected PPO, a new projection-based approach for training the task-specific policy while maintaining a user-specified bound on property violation. SPoRt thus enables users to trade off safety guarantees against task-specific performance. Complementing our theoretical results, we present experimental results demonstrating this trade-off and comparing the theoretical bound to posterior bounds derived from empirical violation rates.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

Jun-24-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.93)
- Europe (0.92)

Genre:
- Research Report > New Finding (0.34)

Industry:
- Transportation (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Representation & Reasoning > Optimization (0.68)
  - Machine Learning
    - Reinforcement Learning (0.67)
    - Statistical Learning (0.45)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found