Counterexample Guided RL Policy Refinement Using Bayesian Optimization

Jan-18-2025, 23:56:55 GMT–Neural Information Processing Systems

Constructing Reinforcement Learning (RL) policies that adhere to safety requirements is an emerging field of study. RL agents learn by trial and error with the objective of optimizing a reward signal. Policies designed to accumulate rewards often fail to satisfy safety specifications. We present a methodology for counterexample guided refinement of a trained RL policy against a given safety specification. Our approach has two main components.

bayesian optimization, counterexample guided rl policy refinement, safety specification

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.67)