Counterexample Guided RL Policy Refinement Using Bayesian Optimization
Neural Information Processing Systems
Constructing Reinforcement Learning (RL) policies that adhere to safety requirements is an emerging field of study. RL agents learn via trial and error with the objective of optimizing a reward signal. Policies trained purely to accumulate reward often fail to satisfy safety specifications. We present a methodology for counterexample-guided refinement of a trained RL policy against a given safety specification. Our approach has two main components.
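The abstract describes searching for counterexamples to a safety specification with Bayesian optimization. The sketch below illustrates the general idea under stated assumptions: a hypothetical `safety_margin` function stands in for rolling out the trained policy from an initial state (negative margin means the trajectory violates the property), and a minimal Gaussian-process surrogate with an expected-improvement acquisition drives the search toward worst-case initial states. All names and the toy margin function are illustrative, not the paper's actual method.

```python
import numpy as np
from math import erf

rng = np.random.default_rng(0)

def safety_margin(x):
    # Hypothetical stand-in for: simulate the trained policy from initial
    # state x and return the minimum distance to the unsafe set.
    # A negative value would mark x as a counterexample.
    return np.sin(3.0 * x[0]) + 0.5 * x[0] ** 2 - 0.2

def rbf(a, b, ls=0.3):
    # Squared-exponential kernel between two sets of points.
    d = a[:, None, :] - b[None, :, :]
    return np.exp(-0.5 * np.sum(d ** 2, axis=-1) / ls ** 2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # GP posterior mean/variance at candidate points Xs given data (X, y).
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(Xs, X)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y
    var = 1.0 - np.sum((Ks @ Kinv) * Ks, axis=1)
    return mu, np.maximum(var, 1e-12)

def expected_improvement(mu, var, best):
    # EI for *minimization*: expected amount by which a candidate
    # undercuts the best (lowest) margin observed so far.
    sigma = np.sqrt(var)
    z = (best - mu) / sigma
    Phi = 0.5 * (1.0 + np.vectorize(erf)(z / np.sqrt(2.0)))
    phi = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return (best - mu) * Phi + sigma * phi

# Bayesian-optimization loop over 1-D initial states in [-2, 2].
X = rng.uniform(-2.0, 2.0, size=(5, 1))
y = np.array([safety_margin(x) for x in X])
for _ in range(25):
    cand = rng.uniform(-2.0, 2.0, size=(256, 1))
    mu, var = gp_posterior(X, y, cand)
    x_next = cand[np.argmax(expected_improvement(mu, var, y.min()))]
    X = np.vstack([X, x_next])
    y = np.append(y, safety_margin(x_next))

best = X[np.argmin(y)]
print(f"worst-case margin {y.min():.3f} at initial state {best}")
```

In a counterexample-guided loop, any state found with a negative margin would be fed back into policy refinement; the BO surrogate keeps the number of expensive policy rollouts small.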