Physics-model-guided Worst-case Sampling for Safe Reinforcement Learning
Cao, Hongpeng, Mao, Yanbing, Sha, Lui, Caccamo, Marco
arXiv.org Artificial Intelligence
Real-world accidents in learning-enabled CPS frequently occur in challenging corner cases. When training a deep reinforcement learning (DRL) policy, the standard setup either fixes the training conditions at a single initial condition or samples them uniformly from the admissible state space. This setup often overlooks the challenging but safety-critical corner cases. To bridge this gap, this paper proposes a physics-model-guided worst-case sampling strategy for training safe policies that can handle safety-critical cases toward guaranteed safety. Furthermore, we integrate the proposed worst-case sampling strategy into the physics-regulated deep reinforcement learning (Phy-DRL) framework to build a more data-efficient and safe learning algorithm for safety-critical CPS. We validate the proposed training strategy with Phy-DRL through extensive experiments on a simulated cart-pole system, a 2D quadrotor, and simulated and real quadruped robots, demonstrating markedly improved sampling efficiency in learning more robust safe policies.
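The core idea, worst-case sampling guided by a physics model, can be illustrated with a minimal sketch. Assuming the safety envelope is a model-based ellipsoid {x : xᵀPx ≤ 1} (e.g., P obtained from a Lyapunov analysis of a linearized plant), training episodes can start from states hugging the boundary of that set rather than from a fixed or uniformly drawn state. The matrix values, state dimension, and function name below are illustrative, not taken from the paper:

```python
import numpy as np

# Illustrative sketch only: sample initial states near the boundary of a
# model-based safety envelope {x : x^T P x <= 1}. In practice P would come
# from a Lyapunov analysis of the linearized physics model; the diagonal
# values here are placeholders for a 4-D cart-pole-like state.
rng = np.random.default_rng(0)
P = np.diag([1.0, 0.5, 4.0, 0.8])

def worst_case_sample(n, margin=0.05):
    """Draw n states with V(x) = x^T P x in [1 - margin, 1] (near-boundary)."""
    d = rng.normal(size=(n, 4))
    d /= np.linalg.norm(d, axis=1, keepdims=True)          # random directions
    # radius along each direction d that reaches the boundary x^T P x = 1
    r_max = 1.0 / np.sqrt(np.einsum('ij,jk,ik->i', d, P, d))
    v = rng.uniform(1.0 - margin, 1.0, size=n)             # target V values
    return d * (np.sqrt(v) * r_max)[:, None]

states = worst_case_sample(256)
V = np.einsum('ij,jk,ik->i', states, P, states)
print(V.min(), V.max())  # all values lie in [0.95, 1.0]
```

Replacing a uniform initial-state distribution with such near-boundary draws concentrates training on the states where safety is hardest to maintain, which is the intuition behind the improved sampling efficiency reported in the abstract.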
Dec-16-2024