Model-based Safe Deep Reinforcement Learning via a Constrained Proximal Policy Optimization Algorithm

Dec-24-2025, 20:54:29 GMT–Neural Information Processing Systems

During the initial iterations of training in most Reinforcement Learning (RL) algorithms, agents perform a significant number of random exploratory steps. In the real world, this can limit the practicality of these algorithms, as it can lead to potentially dangerous behavior. Hence, safe exploration is a critical issue in applying RL algorithms in the real world. This problem has recently been well studied under the Constrained Markov Decision Process (CMDP) framework, in which, in addition to single-stage rewards, an agent also receives single-stage costs or penalties that depend on the state transitions. The prescribed cost functions map undesirable behavior at any given time step to a scalar value.
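In the standard CMDP formulation, the agent maximizes the expected discounted sum of rewards subject to an upper bound on the expected discounted sum of costs. The minimal Python sketch below, assuming a discount factor `gamma` and a hypothetical cost limit `cost_limit`, estimates both quantities from sampled episodes and checks the constraint. The names `Transition`, `discounted_sum`, and `estimate_cmdp_objectives` are illustrative assumptions, not part of the paper's method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transition:
    """One step of experience in a CMDP: a single-stage reward and cost."""
    reward: float
    cost: float


def discounted_sum(values: List[float], gamma: float) -> float:
    """Return sum_t gamma**t * values[t]."""
    total = 0.0
    for t, v in enumerate(values):
        total += (gamma ** t) * v
    return total


def estimate_cmdp_objectives(
    episodes: List[List[Transition]], gamma: float, cost_limit: float
) -> Tuple[float, float, bool]:
    """Monte Carlo estimates of the expected discounted return and cost.

    `cost_limit` is a hypothetical threshold chosen for illustration; the
    constraint is satisfied when the estimated expected cost is <= it.
    """
    returns = [discounted_sum([tr.reward for tr in ep], gamma) for ep in episodes]
    costs = [discounted_sum([tr.cost for tr in ep], gamma) for ep in episodes]
    mean_return = sum(returns) / len(returns)
    mean_cost = sum(costs) / len(costs)
    return mean_return, mean_cost, mean_cost <= cost_limit


# Usage: one episode incurs a cost at its second (unsafe) step, one does not.
unsafe_ep = [Transition(1.0, 0.0), Transition(1.0, 1.0), Transition(1.0, 0.0)]
safe_ep = [Transition(1.0, 0.0), Transition(1.0, 0.0), Transition(1.0, 0.0)]
print(estimate_cmdp_objectives([unsafe_ep, safe_ep], gamma=0.5, cost_limit=0.25))
```

The estimate averages over episodes because the CMDP constraint is stated on an expectation: a single episode's cost can exceed the limit even when the policy satisfies the constraint on average.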

algorithm, constrained proximal policy optimization algorithm, model-based safe deep reinforcement learning
