Counterexample Guided RL Policy Refinement Using Bayesian Optimization

Neural Information Processing Systems

Constructing Reinforcement Learning (RL) policies that adhere to safety requirements is an emerging field of study.